The nasty side of maintenance mode

I’d like to share some considerations about maintenance mode. From a design point of view, maintenance mode is a great feature: I can suspend monitoring of single objects or groups of objects, so I can suspend a single computer as well as my distributed app, down to a single database instance. OK, the UI lacks the ability to schedule maintenance mode, but there are plenty of PowerShell scripts out there that can help you with this. So we thought it would be a good idea to schedule maintenance mode to adhere to the defined maintenance windows for our systems. It was only when we started to use maintenance mode that the nasty side (IMHO) came out.

The net result of maintenance mode is a performance hit and closed alerts that should not have been closed. The alert closing behavior is the issue that pushed us to get rid of scheduled maintenance in favor of custom scripts that suppress alerts generated during maintenance windows.
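To give an idea of what "suppress alerts generated during maintenance windows" can mean in practice, here is a minimal sketch of the decision logic only. It is not our actual script: the `MaintenanceWindow` class, the `should_suppress` function and the `sql01` target are hypothetical names made up for illustration, and the sketch assumes alerts are matched to a window by target name and raise time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    """A planned window for one target (hypothetical model, not a SCOM object)."""
    target: str
    start: datetime
    end: datetime

    def covers(self, target: str, raised_at: datetime) -> bool:
        # An alert is covered when it is for this target and was raised
        # inside the half-open interval [start, end).
        return target == self.target and self.start <= raised_at < self.end


def should_suppress(target: str, raised_at: datetime, windows) -> bool:
    """Return True when any known window covers the alert."""
    return any(w.covers(target, raised_at) for w in windows)


# Example data (assumed): one window for a hypothetical server "sql01".
windows = [
    MaintenanceWindow("sql01", datetime(2009, 3, 1, 22, 0), datetime(2009, 3, 2, 2, 0)),
]
```

With this approach the monitors keep running during the window, so there is no reset at the end of it; only the notification side is filtered.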

What happens? When a managed entity is put into maintenance mode, the status of all its monitors is changed to “Unmonitored”. The state change is used to track availability vs maintenance vs unavailability periods. When the maintenance window comes to an end, all the monitors are reset: for monitors with On Demand Detections, the detections are run to check the monitor status; monitors without On Demand Detections are reset to “green”.

So when we wake up from maintenance we have a status change from unknown to the “true” status for monitors with on demand detections, and to green for monitors without. This status change hits the RMS and then the DB. Old open alerts are closed and, when needed, new alerts are opened.

Fig 1. Alerts closed and then reopened after the maintenance window.

But that’s not all: monitors without On Demand Detections are turned green and then recalculated. This means some of them will remain green, but some will probably turn red or yellow (a state change once again).

Fig 2. Typical behavior of a monitor with on demand detections. Red – Unknown – Red

Fig 3. Typical behavior for a monitor without on demand detections. Red – Unknown – Green – Red
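The two sequences in Fig 2 and Fig 3 can be written down as a tiny model, which makes it easy to count the extra state changes. This is only a sketch of the behavior as I observed it, not of the product internals; the function name and the state strings are my own assumptions.

```python
def states_after_maintenance(state_before, has_on_demand_detection, real_state):
    """Return the sequence of states a monitor shows around a maintenance window.

    state_before: state when maintenance mode starts, e.g. "Red"
    real_state:   state the monitor would really have when the window ends
    """
    # Entering maintenance mode always moves the monitor to "Unknown".
    sequence = [state_before, "Unknown"]
    if has_on_demand_detection:
        # The on demand detection runs at exit, so the real state shows up directly.
        sequence.append(real_state)
    else:
        # No on demand detection: the monitor is reset to green first...
        sequence.append("Green")
        # ...and changes again only if the recalculation finds a problem.
        if real_state != "Green":
            sequence.append(real_state)
    return sequence


def state_changes(sequence):
    """Number of state changes that have to be processed and stored."""
    return len(sequence) - 1
```

For a monitor that is red before and after the window, the model gives two state changes with on demand detections and three without, and every one of them has to travel to the RMS and the DB.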

When monitors with “auto resolve alert” are reset (after maintenance mode), all their open alerts get closed, together with any associated properties you may have changed (e.g. resolution status, trouble ticket id, custom properties, …); for monitors still in trouble, a brand new alert gets opened. This caused a lot of confusion for our support people and pushed us away from scheduled maintenance.
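To make the problem concrete, here is a toy model of what happens to an alert that support has already worked on. Everything in it is an assumption for illustration: the `Alert` class, its field names and the `reset_after_maintenance` function are hypothetical and do not correspond to the real alert schema.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_next_id = count(1)  # simple id generator for the toy model


@dataclass
class Alert:
    monitor: str
    alert_id: int = field(default_factory=lambda: next(_next_id))
    resolution_state: str = "New"
    ticket_id: Optional[str] = None
    custom: dict = field(default_factory=dict)


def reset_after_maintenance(alert: Alert, still_unhealthy: bool) -> Optional[Alert]:
    """Model the monitor reset: close the old alert, open a fresh one if needed."""
    # The old alert is closed together with everything attached to it.
    alert.resolution_state = "Closed"
    if still_unhealthy:
        # The replacement is a brand new alert: new id, no ticket, no custom fields.
        return Alert(monitor=alert.monitor)
    return None
```

An alert that was assigned and linked to a ticket before the window comes back as a different alert with no ticket id, so from the support point of view it looks like a new incident.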

In the next few weeks I’ll check whether maintenance mode behavior has changed with R2.

  1. #1 by RogerM on March 6, 2009 - 8:30 am

    I totally agree with your opinion on MM.
    The problem is more complex than what the product currently offers.

    We have not gone so far as not to use it but there are issues.

    Thanx for a great article, it sure opened my eyes to reasons why we should NOT hardwire the incident system and SCOM, yet…
