KB 933061 and some interesting effects


In one of my latest posts (Tired of WMI Probe Failed Execution-) I stated that the WMI fix cited in KB 933061 has very positive effects on monitor reliability. I must say that WMI errors have not disappeared altogether, but they have decreased by two orders of magnitude. In my quest for the perfect monitoring agent I also noticed a significant decrease in CPU utilization by HealthService. As you can see from the following screenshots, this is a huge improvement.

[Screenshots: HealthService CPU utilization before and after the fix]

So the fix is recommended not only for monitor reliability, but for reducing the monitoring impact as well. Not every environment will see the same improvement: if you have a quiet HealthService you won’t notice any difference, but if you have tons of 21025 events (see the event saga in previous and future posts) on your RMS, then you’ll benefit from the fix.
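To get a rough idea of whether your RMS falls into the "tons of 21025" category, you could count those events per hour. The following is only a minimal sketch: it assumes you have already exported the Operations Manager events into a list of (timestamp, event id) pairs by whatever means you prefer, and the function name and sample data are made up for illustration.

```python
from collections import Counter
from datetime import datetime

CONFIG_CHANGE_EVENT_ID = 21025  # the event discussed in this post


def count_21025_per_hour(events):
    """Count 21025 events per hour.

    `events` is an iterable of (datetime, event_id) tuples; how you export
    them from the event log is up to you (assumption, not shown here).
    Returns a dict mapping the start of each hour to the number of 21025s.
    """
    per_hour = Counter()
    for timestamp, event_id in events:
        if event_id == CONFIG_CHANGE_EVENT_ID:
            hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
            per_hour[hour_start] += 1
    return dict(per_hour)


# Made-up sample data; 9999 is an arbitrary unrelated event id.
sample_events = [
    (datetime(2009, 1, 1, 10, 5), 21025),
    (datetime(2009, 1, 1, 10, 40), 21025),
    (datetime(2009, 1, 1, 10, 45), 9999),
    (datetime(2009, 1, 1, 11, 2), 21025),
]

hourly = count_21025_per_hour(sample_events)
for hour_start in sorted(hourly):
    print(hour_start, hourly[hour_start])
```

What counts as "too many" per hour depends on your environment; the sketch only gives you numbers to compare before and after applying the fix.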

The reason is hidden in monitor state, rollups and internal tasks. The sequence is more or less the following: WMI is not reliable, so monitors based on WMI probes are not always up to date or, worse, are in an unknown health state. WMI errors also affect the discovery process: some properties and objects are not discovered at every iteration, and there are cases where they flip-flop between iterations. These changes in discovered entities fire 21025 events on the agents and, worse, on the RMS. At every 21025 on the RMS (caused by these and other reasons, see the wrap-up post on 21025) the RMS triggers a recalculation of every monitor involved in a rollup whose state it doesn’t know. The internal task reaches the agent, the agent reloads its workflows (a huge CPU spike, as you can see from the previous screenshots) and tries to recalculate the monitor state. Alas, WMI is unreliable and the agent is not able to recalculate it. At the next 21025 on the RMS the entire process restarts. If 21025 events on the RMS are frequent, the recalculation basically never ends. The fix makes the agent more reliable in calculating monitor state and, as you can see, HealthService becomes quieter.
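The loop described above can be pictured with a toy model. This is not how OpsMgr is implemented, just a deliberately simplified sketch: every name and number in it is hypothetical, each 21025 is one loop iteration, a workflow reload stands in for the CPU spike, and a monitor leaves the unknown state only when its (simulated) WMI probe succeeds.

```python
import random


def simulate(config_changes, monitors, wmi_success_rate, seed=0):
    """Toy model of the recalculation loop (illustrative only).

    config_changes   -- number of 21025 events seen on the RMS
    monitors         -- monitors that start in an unknown state
    wmi_success_rate -- probability that a WMI probe succeeds (0.0 to 1.0)
    Returns (workflow_reloads, monitors_still_unknown).
    """
    rng = random.Random(seed)
    unknown = monitors
    reloads = 0
    for _ in range(config_changes):  # one iteration per 21025 on the RMS
        if unknown == 0:
            continue  # the RMS knows every state, so the agent stays quiet
        reloads += 1  # the agent reloads its workflows: the CPU spike
        # A monitor stays unknown when its WMI probe fails.
        unknown = sum(
            1 for _ in range(unknown) if rng.random() >= wmi_success_rate
        )
    return reloads, unknown


# Reliable WMI: one reload is enough, later 21025s cause no extra work.
print(simulate(config_changes=10, monitors=50, wmi_success_rate=1.0))
# Broken WMI: every 21025 triggers another reload and nothing converges.
print(simulate(config_changes=10, monitors=50, wmi_success_rate=0.0))
```

The model ignores the fact that flaky discoveries generate additional 21025 events themselves, which in practice makes the unreliable case even worse than the sketch suggests.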

For more troubleshooting tips on WMI failures I would suggest the following post: Getting lots of Script Failed To Run alerts? WMI Probe Failed Execution? Backward Compatibility Script Error?

– Daniele

This posting is provided "AS IS" with no warranties, and confers no rights.
