Quae Nocent Docent

What hurts, teaches – Ordinary tales from management trenches

Archive for January 2009

Finally we got those reports (Virtual Machine Manager)

On January 28th Microsoft finally released the complete VMM 2008 MP. Without the reporting components, PRO tips were almost useless (at least in my environment).

Written by Daniele Grandini

January 31, 2009 at 11:03 am

Posted in Uncategorized

How to schedule a web site monitor during Business hours

Following my previous post on how to run a monitor during business hours only, I’ve been asked how to achieve the same result with a web site monitor and a service monitor. There are two approaches: create the monitor via the Authoring space in the console, or create a brand new MP from scratch. I suspect the first method is the easiest for most of us, so that’s the one I will explain. In a second post I will do the same for service monitoring which, as we’ll see, is a little more difficult.

So let’s start the Add Monitoring Wizard and select the Web Application template; remember to save it in a new MP. We can then complete the wizard and add any custom steps we think appropriate. Once the monitoring is ready we simply need to export the related MP. The wizard has created one or more DataSourceModuleType entries like the following:

<DataSourceModuleType ID="WebApplication_5cdfdc9c54904bec93bb75fb13d32cc3.UrlDataSource" Accessibility="Public" Batching="false">
  <Configuration />
  <ModuleImplementation Isolation="Any">
    <Composite>
      <MemberModules>
        <DataSource ID="Scheduler" TypeID="System!System.Scheduler">
          <Scheduler>
            <SimpleReccuringSchedule>
              <Interval>300</Interval>
            </SimpleReccuringSchedule>
            <ExcludeDates />
          </Scheduler>
          <!-- excerpt: the rest of the module is omitted -->

These data sources are the basis for all the monitors implemented in the MP; their composition looks like this:

<Composition>
  <Node ID="Probe">
    <Node ID="Scheduler" />
  </Node>
</Composition>

This should sound familiar if you read my previous post on this topic. All we need to do is add a System.SchedulerFilter condition detection to the member modules and inject it into the module composition:

  <ConditionDetection ID="Filter" TypeID="System!System.SchedulerFilter">
    <SchedulerFilter>
      <ProcessDataMode>OnSchedule</ProcessDataMode>
      <Schedule>
        <WeeklySchedule>
          <Windows>
            <Daily>
              <Start>07:00</Start>
              <End>19:00</End>
              <!-- 62 = Monday to Friday (2 + 4 + 8 + 16 + 32) -->
              <DaysOfWeekMask>62</DaysOfWeekMask>
            </Daily>
          </Windows>
        </WeeklySchedule>
        <ExcludeDates />
      </Schedule>
      <UseCurrentTime>true</UseCurrentTime>
    </SchedulerFilter>
  </ConditionDetection>
</MemberModules>
<Composition>
  <Node ID="Probe">
    <Node ID="Filter">
      <Node ID="Scheduler" />
    </Node>
  </Node>
</Composition>
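The DaysOfWeekMask value of 62 used above is a bit field with one bit per day. This Python sketch encodes and decodes such masks; it assumes the layout Sunday = 1, Monday = 2, Tuesday = 4 and so on up to Saturday = 64, and the helper names are made up for illustration.

```python
# DaysOfWeekMask bit values (assumed layout: Sunday = 1, Monday = 2, ... Saturday = 64).
DAY_BITS = {
    "Sunday": 1,
    "Monday": 2,
    "Tuesday": 4,
    "Wednesday": 8,
    "Thursday": 16,
    "Friday": 32,
    "Saturday": 64,
}


def encode_mask(days):
    """Return the DaysOfWeekMask value for an iterable of day names."""
    mask = 0
    for day in days:
        mask |= DAY_BITS[day]
    return mask


def decode_mask(mask):
    """Return the names of the days whose bit is set in mask, Sunday first."""
    return [day for day, bit in DAY_BITS.items() if mask & bit]


weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
print(encode_mask(weekdays))  # 62
print(decode_mask(62))        # ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
```

With the same layout, 127 selects every day and 65 (Sunday plus Saturday) only the weekend.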

Obviously this approach has some shortcomings:

1) the business-hours period is not configurable via overrides but hard-coded into the MP. To make it configurable we would have to modify the configuration of the DataSource and of all the unit monitors that use it.

2) if we have several URLs to monitor, this needs to be repeated for every data source.

Remember, we can always let the monitor run all day long and then suppress the alerts generated outside business hours via a PowerShell script, use a scheduled subscription to avoid receiving notifications during the night shift, and use the business-hours filter in availability reports. So running monitors only during business hours is nice, but if it takes more work than it’s worth, we can implement these alternative strategies instead.
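Whichever alternative you pick, the core question is the same: did an alert fire inside or outside business hours? Here is a minimal Python sketch of that test, assuming a Monday to Friday, 07:00 to 19:00 window with the end time excluded. It only illustrates the logic and is not the PowerShell script mentioned above; the alert IDs and times are invented.

```python
from datetime import datetime, time

# Assumed window: Monday to Friday, 07:00 inclusive to 19:00 exclusive.
BH_DAYS = {0, 1, 2, 3, 4}  # datetime.weekday(): Monday == 0, Sunday == 6
BH_START = time(7, 0)
BH_END = time(19, 0)


def in_business_hours(ts):
    """True if ts falls inside the assumed business-hours window."""
    return ts.weekday() in BH_DAYS and BH_START <= ts.time() < BH_END


def split_alerts(alerts):
    """Split (alert_id, raised_at) pairs into business-hours and off-hours IDs."""
    inside, outside = [], []
    for alert_id, raised_at in alerts:
        (inside if in_business_hours(raised_at) else outside).append(alert_id)
    return inside, outside


# Hypothetical alerts: 2009-01-26 was a Monday, 2009-01-31 a Saturday.
alerts = [
    ("A1", datetime(2009, 1, 26, 10, 15)),
    ("A2", datetime(2009, 1, 26, 23, 40)),
    ("A3", datetime(2009, 1, 31, 11, 0)),
]
business, off_hours = split_alerts(alerts)
print(business)   # ['A1']
print(off_hours)  # ['A2', 'A3']
```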

I’ve attached a complete MP that monitors nocentdocent.wordpress.com: (http://cid-558ec647eef17f8d.skydrive.live.com/self.aspx/.Public/Sample%20MPs/Progel.Web.BH.xml)

Written by Daniele Grandini

January 27, 2009 at 8:24 pm

Posted in MP, SCOM

Tuning the DPM MP to avoid excessive CPU Usage

The brand new DPM MP is finally becoming usable. It completely lacks any serious performance collection for trending and statistics, but at least we have some decent monitors. From a backup system MP I need, to cite just one example, evidence of any substantial change in backup volumes and speed (maybe I should work on it).

But what’s going to hurt you when you implement such an MP is high CPU usage on your DPM servers, caused by the logical disk monitoring of the OS MP. If you have implemented DPM you know that every protected resource has its own disk, and these disks are implemented as mount points at the OS level. It’s not uncommon to have hundreds of such disks on a DPM server. All these disks are discovered as logical disks and monitored the way logical disks deserve:

  • availability check every 5 minutes
  • space availability check every hour
  • avg disk seconds per transfer check every minute

All these checks are implemented via WMI, so these WMI queries run for hundreds of disks and, guess what, the CPU is constantly overloaded. And that’s without counting the performance collection rules.
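A quick back-of-the-envelope count shows how fast this adds up. The Python sketch below uses the three intervals listed above; the figure of 200 mount points is hypothetical, and the performance collection rules are deliberately left out.

```python
# Intervals of the three logical disk checks listed above, in seconds.
CHECK_INTERVALS = {
    "availability": 5 * 60,
    "free space": 60 * 60,
    "avg disk sec/transfer": 60,
}


def checks_per_hour(disk_count, intervals=CHECK_INTERVALS):
    """Number of WMI-based check executions per hour for disk_count disks."""
    per_disk = sum(3600 // seconds for seconds in intervals.values())
    return disk_count * per_disk


print(checks_per_hour(1))    # 73
print(checks_per_hour(200))  # 14600
```

At 200 disks that is roughly four check executions every second (14,600 / 3,600 ≈ 4.06), before any performance collection rule is counted.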

The solution is pretty easy and, IMHO, should have been implemented in the DPM MP or at least documented in the MP guide. Since these disks are managed by DPM there’s no need to double-check them: if any problem arises, the DPM MP will take care of letting us know. Given this, we can simply disable the “Mount Point Discovery Rule” (from the OS MP) for the group “DPM Server Group” (exposed by the DPM MP) with an override. And if you want to clean up your console, don’t forget to run Remove-DisabledMonitoringObject from the Command Shell.

Written by Daniele Grandini

January 24, 2009 at 12:38 pm

Posted in MP, SCOM

Running a monitor during business hours

One question customers of mine keep asking is how to run specific monitors only during business hours. Let’s say I want to check CPU performance only from 7:00 to 19:00, Monday to Friday. This was easy with MOM 2005, but the capability was lost with OpsMgr 2007. Actually it’s not lost, but it requires some manual work for your own rules and monitors, and a lot of work for rules and monitors shipped in sealed MPs. First of all, this post won’t talk about rules, since that topic has been covered by Boris Yanushpolsky in his post “Configuring rules to run during business hours only”.

To achieve the same result with a monitor, you just need to build a composite module with a System.SchedulerFilter condition detection in its workflow. Alas, the OpsMgr team didn’t think of creating a composite data source combining a Scheduler and a SchedulerFilter to use as the base scheduler data source, and that’s a pity, because it would have been a good idea. So for sealed MPs you should:

  1. Export the MP in XML format
  2. Disable the original monitor
  3. Create your own MP
  4. Copy and paste the relevant elements from the exported MP, not forgetting to add a reference to the original MP in yours
  5. Add a SchedulerFilter to the monitor workflow

I suspect no one will try this mess on a regular basis.

But what Microsoft didn’t do, you can do in your own MPs. Let’s say we have a monitor based on a script that needs to run on a schedule, but only during business hours. The workflow of such a monitor is typically based on a single data source of type Microsoft.Windows.TimedScript.PropertyBagProvider. This precooked data source combines a Scheduler and a script probe; what we need to do is recombine the two with a condition detection in the middle.

So what today looks like this:

<DataSourceModuleType ID="My.Script" Accessibility="Internal" Batching="false">
  <Configuration> … </Configuration>
  <OverrideableParameters> … </OverrideableParameters>
  <ModuleImplementation Isolation="Any">
    <Composite>
      <MemberModules>
        <DataSource ID="Script" TypeID="Windows!Microsoft.Windows.TimedScript.PropertyBagProvider"> … </DataSource>
      </MemberModules>
      <Composition>
        <Node ID="Script" />
      </Composition>
    </Composite>
  </ModuleImplementation>
  <OutputType>System!System.PropertyBagData</OutputType>
</DataSourceModuleType>

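Should become like this, with the Scheduler, a SchedulerFilter condition detection and the script probe as separate member modules wired together in the composition: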

<Composition>
  <Node ID="Script">
    <Node ID="Filter">
      <Node ID="Scheduler" />
    </Node>
  </Node>
</Composition>

The Scheduler triggers based on its configuration, the Filter lets the trigger pass only during business hours and, in turn, the script runs only during the intended time.
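The flow through the Composition tree can be modelled with three chained Python generators: the innermost node (Scheduler) produces the triggers, Filter drops those outside the window, and Script runs once for each trigger that gets through. This is only a model of the behaviour described above, not how the HealthService works internally; the 300-second interval, the start date and the end-exclusive window are assumptions.

```python
from datetime import datetime, timedelta, time


def scheduler(start, interval_seconds, count):
    """Innermost node: emit one trigger every interval_seconds."""
    for i in range(count):
        yield start + timedelta(seconds=i * interval_seconds)


def schedule_filter(triggers, first, last, days):
    """Middle node: pass a trigger only inside the weekly window."""
    for ts in triggers:
        if ts.weekday() in days and first <= ts.time() < last:
            yield ts


def script(triggers):
    """Outer node: 'run' the script once per trigger that reaches it."""
    for ts in triggers:
        yield {"RunAt": ts.isoformat()}


# One week of 5-minute ticks starting Monday 2009-01-19 00:00.
ticks_per_week = 7 * 24 * 3600 // 300
runs = list(
    script(
        schedule_filter(
            scheduler(datetime(2009, 1, 19), 300, ticks_per_week),
            time(7, 0), time(19, 0), {0, 1, 2, 3, 4},
        )
    )
)
print(ticks_per_week, len(runs))  # 2016 720
```

Nesting the calls as script(schedule_filter(scheduler(...))) mirrors the nesting of the Node elements: data flows from the innermost node outwards, so the script runs 720 times a week instead of 2016.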

Let’s expand our code into a more complete example. It schedules a script, which basically writes an event to the event log and returns a property bag, so that it runs only Monday to Friday between the From and To times (for example 7:00 to 19:00):

      <DataSourceModuleType ID="Progel.Test.Filter.DS" Accessibility="Internal" Batching="false">
        <Configuration>
          <xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="IntervalSeconds" type="xsd:int" />
          <xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="SyncTime" type="xsd:string" />
          <xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="Message" type="xsd:string" />
          <xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="From" type="xsd:string" />
          <xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="To" type="xsd:string" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
          <OverrideableParameter ID="SyncTime" Selector="$Config/SyncTime$" ParameterType="string" />
          <OverrideableParameter ID="Message" Selector="$Config/Message$" ParameterType="string" />
          <OverrideableParameter ID="From" Selector="$Config/From$" ParameterType="string" />
          <OverrideableParameter ID="To" Selector="$Config/To$" ParameterType="string" />
        </OverrideableParameters>
        <ModuleImplementation Isolation="Any">
          <Composite>
            <MemberModules>
              <!-- Scheduler: fires every IntervalSeconds -->
              <DataSource ID="Scheduler" TypeID="System!System.Scheduler">
                <Scheduler>
                  <SimpleReccuringSchedule>
                    <Interval>$Config/IntervalSeconds$</Interval>
                    <SyncTime>$Config/SyncTime$</SyncTime>
                  </SimpleReccuringSchedule>
                  <ExcludeDates />
                </Scheduler>
              </DataSource>
              <!-- Script: runs once for every trigger that reaches it -->
              <ProbeAction ID="Script" TypeID="Windows!Microsoft.Windows.ScriptPropertyBagProbe">
                <ScriptName>ProgelDebug.vbs</ScriptName>
                <Arguments>$Config/Message$</Arguments>
                <ScriptBody>
                <![CDATA[
Set g_API = CreateObject("MOM.ScriptAPI")
Set oArgs = WScript.Arguments
For I = 0 To oArgs.Count - 1
    sCmdLine = sCmdLine & " " & oArgs(I)
Next
LogEvent 100, 1, "Starting script. " & sCmdLine
Set oBag = g_API.CreatePropertyBag()
Call oBag.AddValue("Message", oArgs(0))
' RowLength: length of the message passed as the first argument
Call oBag.AddValue("RowLength", Len(oArgs(0)))
Call g_API.Return(oBag)

Sub LogEvent(eventID, eventType, msg)
    WScript.Echo "Logging event. " & WScript.ScriptName & " EventID: " & eventID & " eventType: " & eventType & " --> " & msg
    Call g_API.LogScriptEvent(WScript.ScriptName, eventID, eventType, msg)
End Sub
                ]]>
                </ScriptBody>
                <TimeoutSeconds>60</TimeoutSeconds>
              </ProbeAction>
              <!-- Filter: lets triggers through only between From and To, Monday to Friday -->
              <ConditionDetection ID="Filter" TypeID="System!System.SchedulerFilter">
                <SchedulerFilter>
                  <ProcessDataMode>OnSchedule</ProcessDataMode>
                  <Schedule>
                    <WeeklySchedule>
                      <Windows>
                        <Daily>
                          <Start>$Config/From$</Start>
                          <End>$Config/To$</End>
                          <!-- 62 = Monday to Friday -->
                          <DaysOfWeekMask>62</DaysOfWeekMask>
                        </Daily>
                      </Windows>
                    </WeeklySchedule>
                    <ExcludeDates />
                  </Schedule>
                  <UseCurrentTime>true</UseCurrentTime>
                </SchedulerFilter>
              </ConditionDetection>
            </MemberModules>
            <Composition>
              <Node ID="Script">
                <Node ID="Filter">
                  <Node ID="Scheduler" />
                </Node>
              </Node>
            </Composition>
          </Composite>
        </ModuleImplementation>
        <OutputType>System!System.PropertyBagData</OutputType>
      </DataSourceModuleType>

Now you can build your own monitor based on this data source; you can also add one more parameter for the DaysOfWeekMask so that it can be overridden, and so on. Hope this helps.
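Once From, To and a DaysOfWeekMask parameter are exposed as overrides, a typo in an override value is easy to make. A small Python check like the following, run against the values you plan to push, can catch the obvious mistakes. It is a hypothetical helper that assumes the HH:MM format used in the example, a window that does not cross midnight, and a mask built from the seven day bits (1 to 127).

```python
import re

# Two-digit hours 00-23, colon, two-digit minutes 00-59.
TIME_PATTERN = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")


def validate_bh_overrides(start, end, mask):
    """Return a list of problems found in the business-hours override values."""
    problems = []
    for name, value in (("From", start), ("To", end)):
        if not TIME_PATTERN.match(value):
            problems.append(f"{name} '{value}' is not in HH:MM format")
    # Zero-padded HH:MM strings compare correctly as text.
    if not problems and start >= end:
        problems.append("From must be earlier than To")
    if not 1 <= mask <= 127:
        problems.append(f"DaysOfWeekMask {mask} is outside 1..127")
    return problems


print(validate_bh_overrides("07:00", "19:00", 62))  # []
print(validate_bh_overrides("7:00", "19:00", 128))
```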

Written by Daniele Grandini

January 20, 2009 at 9:09 am

Posted in MP, SCOM

DNS MP – guilty once again?

When a DNS zone is Active Directory integrated and you have several DCs, the DNS Domain discovery generates useless traffic and high CPU usage. One of the discovered properties is “Primary Server”; alas, for AD-integrated zones every DC is a primary server for the zone, and since the DNS Domain is discovered on every DC, we have a race condition in which each DC may overwrite the value written by the previous discovery. This in turn generates, as detailed by Fabrizio in a previous post, a configuration reload on every DC. The configuration reload is, at present, a CPU-intensive operation…

This is the definition of the DNS Domain class: as you can see, PrimaryServer is not a key, so every instance is identified by the DomainName property alone, and all the DCs end up writing to the same instance.

<ClassType ID="Microsoft.Windows.DNSServer.Library.DNSDomain" Accessibility="Public" Abstract="false" Base="Microsoft.Windows.DNSServer.Library.Component" Hosted="false" Singleton="false">
  <Property ID="DomainName" Type="string" Key="true" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="PrimaryServer" Type="string" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="DynamicUpdates" Type="string" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="NameServers" Type="string" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
</ClassType>

By default the discovery runs every 6 hours with no sync time. If, for example, the zone is replicated to 6 DCs, the 6 discovery runs spread over each 6-hour period can cause one configuration reload per DC per hour, and more DCs mean more configuration reloads. This can quickly become an issue: once again, a configuration reload is a CPU-intensive operation for the Health Service process.
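The arithmetic behind that estimate, as a Python sketch: with N DCs each running the discovery every 6 hours and no sync time, roughly N runs land in every 6-hour period; in the worst case each run rewrites PrimaryServer, and each rewrite makes every DC reload its configuration. The even spreading of the runs and the "every run changes the property" condition are simplifying assumptions.

```python
def reloads_per_dc_per_hour(dc_count, interval_hours=6):
    """Worst case: each of dc_count discovery runs per interval changes
    PrimaryServer, and every change triggers a reload on every DC."""
    return dc_count / interval_hours


def total_reloads_per_day(dc_count, interval_hours=6):
    """Configuration reloads per day, summed over all DCs."""
    return reloads_per_dc_per_hour(dc_count, interval_hours) * dc_count * 24


for dcs in (6, 12):
    print(dcs, reloads_per_dc_per_hour(dcs), total_reloads_per_day(dcs))
# 6 1.0 144.0
# 12 2.0 576.0
```

Doubling the number of DCs doubles the reloads each DC suffers and quadruples the total across the management group.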

The DNS MP potentially has another issue: for each DNS zone it discovers the zone SerialNumber. The DNS server increments the SerialNumber on every change to the zone, so we can assume it will change on every discovery cycle. From my initial research this property is never used for monitoring purposes. Since the discovery runs every 6 hours, this means a reload every 6 hours. That is not dramatic in itself, but when bad MP coding habits add up, they increase the total HealthService CPU usage and push the net impact of monitoring to an unacceptable level. We have DCs and ISA servers well above 5% average CPU utilization for the HealthService process; with the previous DNS MP, HealthService CPU usage was over 10% on average.

<ClassType ID="Microsoft.Windows.DNSServer.Library.Zone" Accessibility="Public" Abstract="false" Base="Microsoft.Windows.DNSServer.Library.Component" Hosted="true" Singleton="false">
  <Property ID="ZoneName" Type="string" Key="true" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="ZoneType" Type="string" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="IsReverseZone" Type="bool" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="IsADIntegrated" Type="bool" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="AllowDynamicUpdates" Type="string" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="AllowZoneTransfers" Type="int" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="PrimaryServerName" Type="string" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="ZoneFileName" Type="string" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="UseWINS" Type="bool" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="SerialNumber" Type="int" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="HostName" Type="string" Key="true" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="NameServers" Type="string" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="MasterServers" Type="string" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="WINSServers" Type="string" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
  <Property ID="Interval" Type="int" Key="false" CaseSensitive="false" Length="256" MinLength="0" />
</ClassType>

My modest advice:

  1. closely monitor instance space changes every time you import a new MP (this can be done with a SQL query). Frequent changes in a static environment mean bad discovery processes.
  2. define a custom rule to collect HealthService CPU usage, and report on servers whose average usage is above 5%
  3. override bad discovery scripts with more polite ones (as we did for ISA and DNS)
  4. raise your voice and ask Microsoft for more testing effort on MPs, patches and so on. We should concentrate on bringing value by building monitoring blocks and reports for LOB apps, not on fixing bugs in Microsoft code, shouldn’t we?
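The first piece of advice can be prototyped without committing to any database schema: compare successive snapshots of discovered property values and count how often each property changes. The Python sketch below works on plain dictionaries; how you extract the snapshots (the SQL query mentioned above or another export) is left open, and the sample zone data is invented to mirror the SerialNumber case.

```python
from collections import Counter


def property_churn(snapshots):
    """Count, per (instance, property), how many times the value changed
    between consecutive discovery snapshots."""
    churn = Counter()
    for previous, current in zip(snapshots, snapshots[1:]):
        for instance, props in current.items():
            old = previous.get(instance, {})
            for prop, value in props.items():
                if prop in old and old[prop] != value:
                    churn[(instance, prop)] += 1
    return churn


def noisy_properties(snapshots, min_ratio=0.5):
    """Properties that changed in at least min_ratio of the discovery cycles."""
    cycles = len(snapshots) - 1
    return sorted(key for key, n in property_churn(snapshots).items()
                  if cycles and n / cycles >= min_ratio)


# Hypothetical snapshots of one zone taken on three discovery cycles.
snapshots = [
    {"contoso.com": {"ZoneType": "Primary", "SerialNumber": 1001}},
    {"contoso.com": {"ZoneType": "Primary", "SerialNumber": 1007}},
    {"contoso.com": {"ZoneType": "Primary", "SerialNumber": 1012}},
]
print(noisy_properties(snapshots))  # [('contoso.com', 'SerialNumber')]
```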

Written by Daniele Grandini

January 19, 2009 at 7:06 pm

Posted in Uncategorized

ISA 2006 MP – Discovery WORST PRACTICE

After investigating the HealthService CPU issue described in my previous post, I created a rule to collect the HealthService processor usage counter in our customer’s environment. What I found was that some servers still had abnormal CPU usage. All those servers are running ISA 2006, so I looked inside the ISA 2006 MP (version 6.0.6351.0) to see if there was a discovery issue.

I found 2 discoveries that update a property every time they run:

1) Microsoft.ISAServer.2006.Array.Discovery: the following code snippet is taken from the ISA MP. As you can see, the property EnterprisePersistentName is set to a string concatenation of PersistentName, taken from the Enterprise object, and the difference in seconds between the installation date (CreatedTime) and the current time:

Call objIsaArrayClass.AddProperty("$MPElement[Name='Microsoft.ISAServer.2006.Array']/EnterprisePersistentName$", objEnterprise.PersistentName & "_" & DateDiff("s", objEnterprise.CreatedTime, Date & " " & Time))

2) Microsoft.ISAServer.2006.CSS.ServerRole.Discovery: the following code snippet is taken from the ISA MP. As you can see, the property EnterpriseHostedPersistentName is set to a string concatenation of PersistentName, taken from the Enterprise object, and the difference in minutes between the installation date (CreatedTime) and the current time:

m_EnterpriseHostedPersistentName = enterpriseObj.PersistentName & "_" & DateDiff("n", enterpriseObj.CreatedTime, Date & " " & Time)

Call cssServerRoleInst.AddProperty("$MPElement[Name='Microsoft.ISAServer.2006.CSS.ServerRole']/EnterpriseHostedPersistentName$", m_EnterpriseHostedPersistentName)

Those discoveries run every 15 minutes; I used an override to change the interval to 6 hours and saw a benefit in terms of CPU usage. I tried to understand why those properties are built this way, but I didn’t find a good answer.
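To see why these values can never settle, the Python sketch below rebuilds the first property the way the snippet does (PersistentName, an underscore, and the seconds elapsed since CreatedTime) and evaluates it on two consecutive 15-minute runs. The object name and dates are invented for illustration; the last lines show what the 6-hour override buys in runs per day.

```python
from datetime import datetime, timedelta


def enterprise_persistent_name(persistent_name, created, now):
    """Mimic PersistentName & "_" & DateDiff("s", CreatedTime, Now)."""
    return f"{persistent_name}_{int((now - created).total_seconds())}"


created = datetime(2008, 6, 1, 9, 0, 0)   # hypothetical CreatedTime
run1 = datetime(2009, 1, 18, 22, 0, 0)
run2 = run1 + timedelta(minutes=15)       # next discovery cycle

v1 = enterprise_persistent_name("ISAEnterprise", created, run1)
v2 = enterprise_persistent_name("ISAEnterprise", created, run2)
print(v1 != v2)  # True: every run reports a "new" property value

runs_per_day_default = 24 * 60 // 15  # 96 property updates per day
runs_per_day_override = 24 // 6       # 4 after the 6-hour override
print(runs_per_day_default, runs_per_day_override)  # 96 4
```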

Written by Fabrizio Guaitolini

January 19, 2009 at 12:00 am

Posted in MP, SCOM

Class properties that get updated frequently are a WORST PRACTICE, not only for the RMS

One of our customers complained that the HealthService consumed too much CPU on his servers, so I started investigating the cause of the problem.

The following picture shows the HealthService performance graph taken from Process Explorer:

HealthService Performance Graph

What I found is that the high CPU usage was caused by a discovery rule that updated a LastRun property every time it ran (it was scheduled every 5 minutes). As outlined in the blog post “WORST PRACTICE: Class properties that get updated frequently”, this is not a good idea because it has a performance impact on the RMS. In my tests I also noticed a performance impact on the HealthService of the agent that executes the discovery. Every time the discovery rule updates a class property, the RMS appears to force the agent to reload its configuration, and the following event is logged in the RMS event log:

Event Source: OpsMgr Config Service | Event ID: 29102 | Date: 17/01/2009 | Time: 22.22.38 | Computer: RMS
Description: Configuration state of OpsMgr Health Service “{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}” running on “<agentfqdn>” may be out of date. It should contact OpsMgr Config Service to synchronize its configuration state.

AGENT requests an updated configuration:

Event Source: OpsMgr Connector | Event ID: 21024 | Date: 17/01/2009 | Time: 22.22.40 | Computer: AGENT
Description: OpsMgr’s configuration may be out-of-date for management group <MGNAME>, and has requested updated configuration from the Configuration Service. The current(out-of-date) state cookie is “XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX”

RMS receives the request:

Event Source: OpsMgr Config Service | Event ID: 29103 | Date: 17/01/2009 | Time: 22.22.43 | Computer: RMS
Description: OpsMgr Health Service “{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}” running on “<agentfqdn>” has contacted OpsMgr Config Service to synchronize its configuration state. The configuration state cookie for the OpsMgr Health Service running on “<agentfqdn>” is “YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY”.

AGENT receives the new configuration:

Event Source: OpsMgr Connector | Event ID: 21025 | Date: 17/01/2009 | Time: 22.22.49 | Computer: AGENT
Description: OpsMgr has received new configuration for management group <MGNAME> from the Configuration Service. The new state cookie is “YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY”.

AGENT loads the new configuration:

Event Source: HealthService | Event ID: 1210 | Date: 17/01/2009 | Time: 22.23.08 | Computer: AGENT
Description: New configuration became active. Management group “<MGNAME>”, configuration id:”YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY YY”
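The timestamps in the five events above give a rough idea of how long one reload round trip takes. The Python sketch below parses lines in the summary format used above (a simplification, not the raw Event Log export format) and measures the time from event 29102 on the RMS to event 1210 on the agent; it assumes the RMS and agent clocks are in sync.

```python
import re
from datetime import datetime

EVENT_RE = re.compile(
    r"Event ID:\s*(\d+)\s*\|\s*Date:\s*(\S+)\s*\|\s*Time:\s*(\S+)"
)


def parse_events(lines):
    """Map event ID -> timestamp for summary lines like the ones quoted above."""
    events = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            event_id, date, clock = match.groups()
            events[int(event_id)] = datetime.strptime(
                f"{date} {clock}", "%d/%m/%Y %H.%M.%S")
    return events


log = [
    "Event Source: OpsMgr Config Service | Event ID: 29102 | Date: 17/01/2009 | Time: 22.22.38 | Computer: RMS",
    "Event Source: OpsMgr Connector | Event ID: 21024 | Date: 17/01/2009 | Time: 22.22.40 | Computer: AGENT",
    "Event Source: OpsMgr Config Service | Event ID: 29103 | Date: 17/01/2009 | Time: 22.22.43 | Computer: RMS",
    "Event Source: OpsMgr Connector | Event ID: 21025 | Date: 17/01/2009 | Time: 22.22.49 | Computer: AGENT",
    "Event Source: HealthService | Event ID: 1210 | Date: 17/01/2009 | Time: 22.23.08 | Computer: AGENT",
]
events = parse_events(log)
elapsed = events[1210] - events[29102]
print(elapsed.total_seconds())  # 30.0
```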

The configuration reload is a CPU-intensive operation: for example, the agent needs to parse all the management packs (I saw it with FileMon). After removing the LastRun property from the discovery, CPU usage dropped to an acceptable level.

If you want to try it in your lab, here is a sample MP with a discovery that runs every 2 minutes to simulate this behaviour:
Discovery.HighCPU.xml (remove the .doc extension, which was added only to upload the MP) – DON’T USE IT IN A PRODUCTION ENVIRONMENT

Written by Fabrizio Guaitolini

January 18, 2009 at 12:00 am

Posted in MP, SCOM

Finally we can start tracking the agent patch level

On January 12th Microsoft released a hotfix that addresses the limitation of the current Patch List field. This hotfix is critical to keeping control of your agents: “When you try to view the Patch List property, the list of Operations Manager agent hotfixes may be truncated on System Center Operations Manager 2007 Service Pack 1 systems”.

Written by Daniele Grandini

January 17, 2009 at 5:39 pm

Posted in Bug, SCOM

DNS MP – DNS 2003 External Resolution Monitor minor bug

The new DNS MP (6.0.6480.0) fixes several nasty bugs, but it introduces a minor typo that breaks the External Resolution Monitor. By default the monitor queries for the name servers (querytype=ns) of a given domain name; alas, the default name is set to www.microsoft.com, which is a host name rather than a domain, so the monitor always fails. Just override the monitor and set it to microsoft.com (or some other external DNS domain) to make it work.

Written by Daniele Grandini

January 17, 2009 at 12:01 pm

Posted in MP