SCCM MP scripts keep failing


This task has been hanging around for a while. It's no mystery that I consider it a miserable waste of time, but that's life, and since SCCM is a major component of our service offering I had to put a stop to those nasty alerts about SCCM MP script failures. I'm sure you've had many alerts related to the %PROCESSOR_ARCHITECTURE% environment variable if you're using this MP.


The SCCM MP is still written using the backward compatibility modules (come on, it's 2012), so it's really a waste of time trying to fix something that should be rewritten from scratch. And indeed I didn't fix it; rather, I found a way to keep it quieter. There's a good discussion thread on the topic on Kevin's blog: http://blogs.technet.com/b/kevinholman/archive/2011/09/30/mp-update-new-configmgr-2007-mp-version-6-0-6000-3-resolves-top-issues.aspx

The proposed solution is to edit the downloaded copy of the script and replace the WMI check for environment variables. This solution has the following drawbacks:

1)       If, for any reason, the local agent cache is rebuilt from scratch, the mod is lost.

2)       If you have many SCCM servers, you must repeat the mod on every server.

But before going any further, let's take a step backward (;-)) and set the issue straight. The SCCM MP needs to assess whether it is running on a 32-bit or a 64-bit system, and to do this it checks the environment variable PROCESSOR_ARCHITECTURE. So far everything makes sense. What doesn't make sense is the way the environment variable is checked: instead of using the handy and lightweight WScript.Shell object, the author decided to use WMI and query the Win32_Environment class. You know what? WMI is not the most polished piece of code we have on Windows, and sometimes it fails or times out. Every time it fails, the script posts an error event in the pipe, and a specific rule matches the event and raises an alert. In this poorly written MP we have at least 6 different places where the same code is used (copy and paste, oh yeah) and about 10 rules that execute the code every 5 minutes. If we combine WMI's "disturbed" behavior with this coding technique, it's no surprise we have tons of alerts, even though the rules are mostly running fine. What I mean is that a rule fails on one run but executes fine on the next. What we have is pure noise.
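To make the difference concrete, here is a minimal sketch of the two ways of reading the variable, written as a ScriptBody fragment in the same form the write action later in this post uses. It is an illustration only, not the code shipped in the SCCM MP: the variable names are made up, and reading the "System" environment through WScript.Shell is an assumption about what a replacement could look like.

```xml
<ScriptBody>
  <![CDATA[
  ' Illustration only: not the code shipped in the SCCM MP.

  ' Lightweight way: read the system environment through WScript.Shell.
  Set oShell = CreateObject("WScript.Shell")
  sArchShell = oShell.Environment("System")("PROCESSOR_ARCHITECTURE")

  ' WMI way: query Win32_Environment. This is the call that can fail or time out.
  sArchWmi = ""
  Set oWmi = GetObject("winmgmts:\\.\root\cimv2")
  Set colVars = oWmi.ExecQuery("SELECT VariableValue FROM Win32_Environment WHERE Name = 'PROCESSOR_ARCHITECTURE'")
  For Each oVar In colVars
    sArchWmi = oVar.VariableValue
  Next

  WScript.Echo "WScript.Shell: " & sArchShell & " / WMI: " & sArchWmi
  ]]>
</ScriptBody>
```

The WScript.Shell read involves no WMI provider at all, so it is not exposed to the failures and timeouts that produce the noisy events.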

After a closer look at the MP I faced several options:

1)       Completely rewrite the MP. Not an option: I didn't have a few weeks to spend on it, I had a few hours at most.

2)       Try to disable the guilty rules and rewrite them. Not an option: it would take time, the code is spread all over the MP, and it uses that nasty (and poorly documented) backward compatibility syntax.

3)       Disable the noisy alert-generating rules (those that match the event generated by the monitoring rule) and get notified only when failures keep repeating. This is what I tried.

To implement my choice I took several shortcuts: my time was scarce, the job was uninteresting.

So these are the conditions. If they fit your needs, you're free to use this MP (at your own risk):

•  I decided to use one catch-all monitor: I don't care which rule is failing, just how many are.

•  I couldn't differentiate by type of error, since the event ID is always the same and I didn't want to rewrite tens of rules to check the event description. So the monitor checks whether too many script-related errors occur in a given timeframe; if so, it raises an alert, and it's up to you to check the collected events for the actual error.

•  I simply disabled the old error-generating rules using an override.

•  I didn't bother to look up documentation on the backward compatibility modules. I simply used a rule to dump backward compatible events into the OpsMgr event log and then picked them up from there with a Microsoft.Windows.RepeatedEventLogTimer2StateMonitorType.

•  I decided to give a minimum of instructions to the poor operator using a knowledge article.
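Disabling a rule through an override, as in the third point above, takes only a few lines of MP XML. What follows is a minimal sketch under several assumptions: it reuses the SCCM alias that appears in the monitor definition in this post, it assumes the rule name taken from the list at the end of this post is also the rule's ID, and it assumes the rule targets the SMS server class. The override ID is made up for illustration, so check the real rule ID and target class before using anything like this.

```xml
<Overrides>
  <!-- Hypothetical override ID. Context is an assumption and should match the class the rule targets. -->
  <RulePropertyOverride ID="QND.SCCM.Fix.DisableManagementPointScriptError.Override"
                        Context="SCCM!Microsoft.SystemCenter.ConfigurationManager.2007.SMS_Server_Class"
                        Enforced="false"
                        Rule="SCCM!SMSv4ManagementPointHealthScripterror17Rule"
                        Property="Enabled">
    <Value>false</Value>
  </RulePropertyOverride>
</Overrides>
```

One such element would be needed for each rule to be silenced.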

In the MP we'll have two distinct workflows. The first one picks up all the backward compatible events generated in case of a script failure and dumps them into the OpsMgr event log. The second one is the repeated event monitor, which checks for failure events in the OpsMgr event log and raises an alert if 10 are recorded within one hour. The alert will auto-resolve after 30 minutes if the errors stop.


QND.SCCM.EventDumper is a simple rule based on a simple WriteAction:

   <WriteActionModuleType ID="QND.BackwardEventDumper.WA" Accessibility="Public" Batching="false">

      <Configuration></Configuration>

      <ModuleImplementation>

        <Composite>

          <MemberModules>

            <WriteAction ID="Dumper" TypeID="Windows!Microsoft.Windows.ScriptWriteAction">

              <ScriptName>QND.EventDumper.vbs</ScriptName>

              <Arguments>$Data/EventDisplayNumber$ "$Data/PublisherName$" $Data/EventLevel$ "$Data/EventCategory$" "$Data/EventDescription$" "$Data/Params/Param[1]$"</Arguments>

              <ScriptBody>

                <![CDATA[

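                ' Arguments: event number, publisher name, event level, category, description, Param[1]
                Set oArgs = WScript.Arguments

                Set g_API = CreateObject("MOM.ScriptAPI")

                ' LogScriptEvent only accepts event IDs up to 20000, so convert and cap the number
                Number = CLng(oArgs(0))

                If Number > 20000 Then Number = 20000

                sMessage = "EventLevel: " & oArgs(2) & " Category: " & oArgs(3) & VbCrLf & oArgs(4) & vbCrLf & "Param1: " & oArgs(5)

                ' Re-log the event in the Operations Manager log under the original publisher name
                Call g_API.LogScriptEvent(oArgs(1), Number, oArgs(2), sMessage)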

                ]]>

              </ScriptBody>

              <TimeoutSeconds>30</TimeoutSeconds>

            </WriteAction>

          </MemberModules>

          <Composition>

            <Node ID="Dumper" />

          </Composition>

        </Composite>

      </ModuleImplementation>

      <OutputType>System!System.CommandOutput</OutputType>

      <InputType>BB!System.Mom.BackwardCompatibility.Event.Data</InputType>

    </WriteActionModuleType>

 

The monitor itself uses the standard Microsoft.Windows.RepeatedEventLogTimer2StateMonitorType:

      <UnitMonitor ID="QND.SCCM.Fix.RepeatedScriptErrors" Accessibility="Public" Enabled="true" ConfirmDelivery="false" Priority="Normal" Remotable="true"

                   TypeID="Windows!Microsoft.Windows.RepeatedEventLogTimer2StateMonitorType" ParentMonitorID="SystemHealth!System.Health.AvailabilityState" Target="SCCM!Microsoft.SystemCenter.ConfigurationManager.2007.SMS_Server_Class">

        <Category>AvailabilityHealth</Category>

        <AlertSettings AlertMessage="QND.SCCM.Fix.RepeatedScriptErrors.AlertMessage">

          <AlertOnState>Warning</AlertOnState>

          <AutoResolve>true</AutoResolve>

          <AlertPriority>Normal</AlertPriority>

          <AlertSeverity>Warning</AlertSeverity>

          <AlertParameters>

            <AlertParameter1>$Data/Context/EventDescription$</AlertParameter1>

          </AlertParameters>

        </AlertSettings>

          <OperationalStates>

            <OperationalState ID="Warning" MonitorTypeStateID="RepeatedEventRaised" HealthState="Warning" />

            <OperationalState ID="Success" MonitorTypeStateID="TimerEventRaised" HealthState="Success" />

          </OperationalStates>

        <Configuration>

          <RepeatedComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</RepeatedComputerName>

          <RepeatedLogName>Operations Manager</RepeatedLogName>

          <RepeatedExpression>

              <And>

                <Expression>

                  <SimpleExpression>

                    <ValueExpression>

                      <XPathQuery>PublisherName</XPathQuery>

                    </ValueExpression>

                    <Operator>Equal</Operator>

                    <ValueExpression>

                      <Value>Health Service Script</Value>

                    </ValueExpression>

                  </SimpleExpression>

                </Expression>

                <Expression>

                  <SimpleExpression>

                    <ValueExpression>

                      <XPathQuery>EventDisplayNumber</XPathQuery>

                    </ValueExpression>

                    <Operator>Equal</Operator>

                    <ValueExpression>

                      <Value>1102</Value>

                    </ValueExpression>

                  </SimpleExpression>

                </Expression>

                <Expression>

                  <RegExExpression>

                    <ValueExpression>

                      <XPathQuery>Params/Param[1]</XPathQuery>

                    </ValueExpression>

                    <Operator>MatchesWildcard</Operator>

                    <Pattern>ConfigMgr 2007 Monitor*</Pattern>

                  </RegExExpression>

                </Expression>

              </And>

          </RepeatedExpression>

          <Consolidator>

            <ConsolidationProperties />

            <TimeControl>

              <WithinTimeSchedule>

                <Interval>3600</Interval>

              </WithinTimeSchedule>

            </TimeControl>

            <CountingCondition>

              <Count>10</Count>

              <CountMode>OnNewItemTestOutputRestart_OnTimerSlideByOne</CountMode>

            </CountingCondition>

          </Consolidator>

          <TimerWaitInSeconds>1800</TimerWaitInSeconds>

        </Configuration>       

      </UnitMonitor>

    </Monitors>
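The AlertMessage attribute in the monitor points at a string resource, and the value of AlertParameter1 is substituted into the alert description wherever {0} appears. A minimal sketch of the matching declarations follows. The display text is invented for illustration and the wording in the actual MP may differ.

```xml
<Presentation>
  <StringResources>
    <!-- Must match the AlertMessage attribute of the monitor's AlertSettings -->
    <StringResource ID="QND.SCCM.Fix.RepeatedScriptErrors.AlertMessage" />
  </StringResources>
</Presentation>
<LanguagePacks>
  <LanguagePack ID="ENU" IsDefault="true">
    <DisplayStrings>
      <DisplayString ElementID="QND.SCCM.Fix.RepeatedScriptErrors.AlertMessage">
        <!-- Hypothetical text; {0} receives $Data/Context/EventDescription$ -->
        <Name>Repeated SCCM MP script errors</Name>
        <Description>Too many SCCM MP script errors were logged in the last hour. Last error: {0}</Description>
      </DisplayString>
    </DisplayStrings>
  </LanguagePack>
</LanguagePacks>
```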

One final word of warning: if you import the MP as is, be aware that the following rules will be disabled:

•  SMSv4PXEServicePointHealthScripterror11Rule

•  SMSv4SoftwareUpdatePointHealthScripterror3Rule

•  SMSv4StateMigrationPointHealthScripterror4Rule

•  SMSv4SiteDatabaseServerHealthScripterror1Rule

•  SMSv4ManagementPointHealthScripterror17Rule

•  SMSv4SiteMaintenanceTasksScripterror1Rule

•  SMSv4NLBManagementPointHealthScripterror17Rule

•  SMSv4ComponentHealthMonitoringscripterror2Rule

You can find the MP in the new repository I set up on SugarSync: https://www.sugarsync.com/pf/D6284134_0813286_45094

– Daniele

This posting is provided "AS IS" with no warranties, and confers no rights.

 

 
