I’d like to share some considerations about maintenance mode. From a design point of view maintenance mode is a great feature: I can suspend monitoring of single objects or groups of objects, so I can suspend a single computer as well as my distributed app, down to a single database instance. OK, the UI lacks the ability to schedule maintenance mode, but there are plenty of PowerShell scripts out there that can help you with this. So we thought it would be a good idea to schedule maintenance mode to match the maintenance windows defined for our systems. It was only when we started to use maintenance mode that the nasty side (IMHO) came out.
The net result of maintenance mode is a performance hit and alerts that get closed when they should not. The alert closing behavior is the issue that pushed us to drop scheduled maintenance in favor of custom scripts that suppress alerts generated during maintenance windows.
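The idea behind that kind of suppression can be sketched in a few lines. This is only an illustration in Python, not a production script: the `WINDOWS` table, the computer name `sql01`, the dates and the `should_suppress` function are all hypothetical, and a real script would read the windows and the alerts from your own systems.

```python
from datetime import datetime

# Hypothetical maintenance windows per computer, as (start, end) pairs.
WINDOWS = {
    "sql01": [(datetime(2009, 3, 7, 22, 0), datetime(2009, 3, 8, 2, 0))],
}


def should_suppress(computer, raised_at, windows=WINDOWS):
    """Return True if the alert was raised inside one of the computer's windows."""
    return any(start <= raised_at < end
               for start, end in windows.get(computer, []))


# An alert raised during the window is suppressed, a later one is not.
print(should_suppress("sql01", datetime(2009, 3, 7, 23, 30)))  # True
print(should_suppress("sql01", datetime(2009, 3, 8, 9, 0)))    # False
```

The point of this approach is that monitors keep running and keep their state, so nothing is reset when the window ends.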
What happens? When a managed entity is put into maintenance mode, the status of all its monitors is changed to “Unmonitored”. The state change is used to track availability vs. maintenance vs. unavailability periods. When the maintenance window comes to an end all the monitors are reset: for monitors with On Demand Detections, the detections are run to check the monitor status; monitors without On Demand Detections are reset to “green”.
So when we wake up from maintenance we have a status change from unknown to the “true” status for monitors with On Demand Detections, and to green for monitors without. This status change hits the RMS and then the DB. Old open alerts are closed and, when needed, new alerts are opened.
Fig 1. Alerts closed and then reopened after the maintenance window.
But that’s not all: monitors without On Demand Detections are turned green and then recalculated. This means some of them will remain green, but some will probably turn red or yellow (a state change once again).
Fig 2. Typical behavior of a monitor with on demand detections. Red – Unknown – Red
Fig 3. Typical behavior for a monitor without on demand detections. Red – Unknown – Green – Red
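The two sequences in the figures can be reproduced with a small model. This is a simplified Python sketch of the behavior described in this post, not of the product's internals; `State`, `maintenance_cycle` and `state_changes` are names made up for the illustration.

```python
from enum import Enum


class State(Enum):
    GREEN = "Green"
    YELLOW = "Yellow"
    RED = "Red"
    UNKNOWN = "Unknown"  # what the monitor shows while in maintenance


def maintenance_cycle(state_before, actual_health, has_on_demand_detection):
    """Return the states a monitor goes through around one maintenance window."""
    history = [state_before]
    # Entering maintenance: the monitor stops reporting health.
    history.append(State.UNKNOWN)
    if has_on_demand_detection:
        # Leaving maintenance: the detection runs and reports the real state.
        history.append(actual_health)
    else:
        # Leaving maintenance: the monitor is reset to green first...
        history.append(State.GREEN)
        # ...and the next regular evaluation reports the real state.
        if actual_health is not State.GREEN:
            history.append(actual_health)
    return history


def state_changes(history):
    """Count the transitions; each one has to be processed and stored."""
    return sum(1 for a, b in zip(history, history[1:]) if a is not b)


with_odd = maintenance_cycle(State.RED, State.RED, True)
without_odd = maintenance_cycle(State.RED, State.RED, False)
print(" - ".join(s.value for s in with_odd), state_changes(with_odd))
print(" - ".join(s.value for s in without_odd), state_changes(without_odd))
```

For a monitor that is red before and after the window, the model gives two state changes with an on demand detection and three without, even though nothing about the monitored object has changed.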
When monitors with “auto resolve alert” are reset (after maintenance mode), all their open alerts get closed, along with any properties you may have changed on them (e.g. resolution status, trouble ticket id, custom properties, …); for monitors still in trouble a brand new alert gets opened. This caused a lot of confusion for our support people and pushed us away from scheduled maintenance.
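To make the effect on the alert properties concrete, here is a toy model in Python. It only mirrors the behavior described above; the `Alert` class, the `leave_maintenance` function, the monitor name and the ticket id `INC-1234` are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # simple id generator for the illustration


@dataclass
class Alert:
    monitor: str
    resolution_state: str = "New"
    ticket_id: str = ""
    custom: dict = field(default_factory=dict)
    alert_id: int = field(default_factory=lambda: next(_ids))


def leave_maintenance(open_alerts, unhealthy_monitors):
    """Close every open alert, then raise a fresh one per monitor still unhealthy."""
    for alert in open_alerts:
        alert.resolution_state = "Closed"
    # The new alerts start from defaults: nothing is copied from the old ones.
    reopened = [Alert(monitor=name) for name in unhealthy_monitors]
    return open_alerts, reopened


# An operator has already assigned the alert and linked a ticket to it.
old = Alert("DiskSpace", resolution_state="Assigned", ticket_id="INC-1234")
closed, reopened = leave_maintenance([old], ["DiskSpace"])
new = reopened[0]
print(closed[0].resolution_state, closed[0].ticket_id)  # Closed INC-1234
print(new.resolution_state, repr(new.ticket_id))        # New ''
```

The problem is still the same, but the alert that support now sees has a new id, is back in the “New” state and has lost the link to the ticket.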
In the next few weeks I’ll check whether maintenance mode behavior has changed with R2.