Archive for August 2009
Alert raised time (local)
Recently we added a global customer to our portfolio. It spans 6 time zones with several sites and branch offices. This moved our monitoring scenarios one step further and sadly exposed what I see as a major gap in OpsMgr. The first thing our support people asked me was: “I need to know when this happened in local time. It’s ok to have alerts in my local time, but I need to know when this happened in agent local time.” Easy, I thought: I remembered that in MOM 2005 times the alert table had a local time raised field. Alas, this is no longer the case with OpsMgr.
We needed a solution, so I started drilling into the fields the alert exposes. First, the extended alert properties visible in the console are no help: they all just report the alert time in console local time.
I took a closer look at the database table and confirmed there’s no direct or indirect reference to the agent local time of the alert. Things were not so bad, though: at least the alert context had all the data I needed.
From my research *almost* every alert has a context (rollup generated alerts are an exception), and the alert context always carries a time property expressed in agent local time.
So it was time to bring this piece of information into the console, and the natural way to do so is via an alert targeted task. The logic is very simple in PowerShell:
    # Get the named parameters
    param($alertGUID)

    # Let's register the OpsMgr snap-in
    # (required error checking is missing)
    $Setup = Get-Item -Path "HKLM:\Software\Microsoft\Microsoft Operations Manager\3.0\Setup"
    $dir = $Setup.GetValue("InstallDirectory")
    cd $dir
    add-pssnapin Microsoft.EnterpriseManagement.OperationsManager.Client
    .\Microsoft.EnterpriseManagement.OperationsManager.ClientShell.Functions.ps1;
    .\Microsoft.EnterpriseManagement.OperationsManager.ClientShell.NonInteractiveStartup.ps1

    $alert = Get-Alert -id:$alertGUID
    if ($alert -ne $null)
    {
        if ($alert.context -ne $null -and $alert.context -ne "")
        {
            # The context is XML; the first 19 characters of its time value
            # (yyyy-MM-ddTHH:mm:ss) are the agent's local wall-clock time
            $c = [xml] $alert.context
            $localtime = [datetime] $c.DataItem.Time.Substring(0,19)
        }
        else
        {
            $localtime = "unknown"
        }
        $message = ($alert.Name) + "`nLocal Agent Time Raised: " + $localtime + "`nUTC Time Raised: " + ($alert.TimeRaised) `
            + "`nUTC Time Last Modified: " + ($alert.LastModified)
    }
    else
    {
        $message = "alert not found"
    }

    # Finally return the raw message
    $message
The net result is not elegant, but it works.
This is an internal tool so it lacks error checking and a proper installation procedure. To have a better interface I could have written a small managed code utility with a few lines of OpsMgr SDK and some UI, but it was out of scope at this time. Maybe in the future if more data is needed.
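For readers who want to try the time extraction without an OpsMgr console at hand, here is a minimal standalone sketch in Python of the same idea: take the time value from the alert context and keep only the first 19 characters, so the UTC offset is dropped and the agent’s wall-clock time remains. The sample context, its attribute name (`time`) and the function name are assumptions made for illustration, not something taken from a real alert.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical alert context: only the time value on DataItem matters here.
SAMPLE_CONTEXT = (
    '<DataItem type="System.XmlData" time="2009-08-20T09:15:30.1234567-05:00">'
    "<Property>example</Property>"
    "</DataItem>"
)


def agent_local_time(context_xml):
    """Return the agent's wall-clock time from an alert context, or None."""
    if not context_xml:
        return None  # e.g. rollup generated alerts have no context
    root = ET.fromstring(context_xml)
    stamp = root.get("time")
    if stamp is None:
        return None
    # Keep "yyyy-MM-ddTHH:mm:ss" and discard fractional seconds and offset,
    # which mirrors Substring(0,19) in the PowerShell script.
    return datetime.strptime(stamp[:19], "%Y-%m-%dT%H:%M:%S")


print(agent_local_time(SAMPLE_CONTEXT))
```

Dropping the offset is deliberate: the goal is the time as seen on the agent’s clock, not a conversion to the console’s time zone.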
To make the whole thing work you have to:
- Copy the PowerShell script into the OpsMgr installation directory on every console computer
- Import the attached management pack
Hope this helps.
You can find the attachments here: Get Alert local time
- Daniele
This posting is provided “AS IS” with no warranties, and confers no rights.
On Demands and cookdown errata corrige
I want to make you aware that I corrected my previous post on On Demand Detections. I must thank Pete Zerger of SystemCenterCentral, who pointed out that cookdown does work between rules and monitors. He was referring to these sentences of mine:
cookdown seems to work for workflow type, it cooks for monitors and cooks for rules but not across the two this is why I have two S1 runs at sync time when I have both monitors and rules even if they’re time synced
the statement above implies we can have race conditions between monitors and rules when they use the same data source / provider and they’re time synced
This proved to be incorrect, but it took me a long time to understand why I was observing this behavior. I did indeed have two runs, one for the rule and one for the monitors, but only because the monitors had the ConfirmDelivery property set to false while the rule had it set to true (a copy and paste error). This makes the HS split the datasource in two, and in fact I have two copies of the script on the file system. The sentences above have now been corrected in the original post.
– Daniele
On Demand Detections caveats (a cookdown story)
I’m a big fan of On Demand Detections, but in the past few weeks a note from Raphael Burri and then this thread (http://www.systemcentercentral.com/tabid/60/indexId/21097/tag/Forums+MP_Development/Default.aspx#vindex21461) on System Center Central made me wonder whether I’m right on this topic.
In this post I’m going to explain On Demand Detections (ODDs) and their relationship with cookdown, which determines the overall performance impact of the agent on monitored systems.
For those who are new to ODDs, they’re the way to ask a monitor to recalculate its state. As you know, many monitors are schedule based, in the sense that they check the state every x minutes/hours. For these monitors what you see in the console is not the real time picture of what’s going on, it’s just the picture of the last monitor run. ODDs are the way you can ask for a refresh of the picture without waiting for the next polling cycle. If a monitor has ODDs it will respond to “Recalculate Now” in Health Explorer; if not, the button is useless (yes, I’ve already asked the team to gray out the button if the monitor has no ODDs).
If present, ODDs are called when a managed entity exits maintenance mode or is initialized at agent startup. This is why I consider ODDs a strategic part of every scheduler based monitor. Finally, I must add that you need one ODD for every monitor state, so typically two or three ODDs per monitor.
As usual, when developing a MP and defining the monitor strategy, it’s fundamental to consider what will happen on the agents and whether the load we’re going to add is acceptable. This is especially true if we’re going to monitor multiple instances of our monitored class on a single agent. For example, if I write a monitor for a SQL database I know I will have many of them on a SQL box, and if I write a monitor to check folder size or age I know I can potentially have several of them on a single agent.
Let’s think of a script based monitor that targets a SQL database: if I have 50 dbs I will launch 50 scripts all together, right? Not exactly, and this is where “cookdown” comes to help. Cookdown is an agent functionality that consolidates identical data sources into one. So if I write my monitoring script properly, I can have it run just once for all my 50 dbs. In this case the script needs to calculate the state for all dbs and return a property bag with all the states, and each monitor has a filter to grab just the state for the db it is targeted to. Given the level of my English, it is harder to explain than to do. Anyway, what we need to remember is that with properly written data sources I can limit the load on my agents by executing the script just once, regardless of the number of instances monitored.
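The pattern can be illustrated with a small Python sketch. This is only a toy model of the idea (one shared run that returns a bag per database, plus a per-monitor filter), not how the agent implements cookdown; the database names, states and function names are all hypothetical.

```python
run_count = 0  # counts how many times the shared "script" executes


def database_states():
    """Stand-in for the shared script: one run returns one bag per database."""
    global run_count
    run_count += 1
    # Hypothetical results; a real script would query the SQL instance here.
    return [
        {"Database": "Sales", "State": "Healthy"},
        {"Database": "HR", "State": "Critical"},
        {"Database": "Archive", "State": "Healthy"},
    ]


def state_for(bags, database):
    """Per-monitor filter: keep only the bag for the targeted database."""
    for bag in bags:
        if bag["Database"] == database:
            return bag["State"]
    return None


# One execution serves every monitor instance.
bags = database_states()
states = {db: state_for(bags, db) for db in ("Sales", "HR", "Archive")}
print(run_count, states)
```

The important design point is that the script takes no per-instance parameter: as soon as the data source configuration differs between instances, the data sources are no longer identical and cannot be consolidated.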
What Raphael told me basically was “cookdown doesn’t work for ODDs”, but I was sure I had tested them, so something was not clear to me. Marius, in his post, confirmed that cookdown for ODDs works only per instance. So if I have a monitor with 3 ODDs, according to Marius, it will run just once, but if I have a monitor with 3 ODDs targeted to 10 instances, it will run 10 times, once for each instance. If cookdown were not in place (as was the case with SP1) I would have 3 x 10 = 30 runs.
Since I have many MPs with ODDs it was time to clear things up, so I’m going to share with you what I’ve found.
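The arithmetic behind those three cases can be written down as a short Python sketch. The function and the mode names are made up for illustration; the numbers are the ones from the paragraph above (3 ODDs, 10 instances).

```python
def script_runs(instances, odds_per_monitor, cookdown):
    """Script executions for one monitor under a given cookdown scope."""
    if cookdown == "none":
        # Every ODD of every instance launches its own run.
        return instances * odds_per_monitor
    if cookdown == "per_instance":
        # The ODDs of one instance share a run, instances do not.
        return instances
    if cookdown == "across_instances":
        # What we would like to have: a single shared run.
        return 1
    raise ValueError("unknown cookdown scope: " + cookdown)


for scope in ("none", "per_instance", "across_instances"):
    print(scope, script_runs(instances=10, odds_per_monitor=3, cookdown=scope))
```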
The testing case:
- one script based probe (P1). Script will be referenced as S1.
- one trigger probe composed with P1 (T1)
- one datasource (DS) composed with a scheduler and P1
- two monitors using DS for standard detection and T1 for on demand detections (2 detections defined)
- one collecting rule composed with DS and a perf mapper condition
The monitors are targeted to Class C, the environment has 3 instances of class C managed by one Healthservice (HS).
The two monitors and the collecting rule are time synced and have the same recurring schedule.
What I was expecting:
- One run of S1 at synced time for the two monitors and the rule
- One run of S1 at agent startup caused by ODDs
- One run of S1 when exiting maintenance mode
Observed behavior:
- At HS initialization (that is, at service startup and when an instance exits maintenance mode) one instance of S1 is run for every monitor and every instance. So we have 6 runs (2 monitors x 3 instances).
- 8 copies of S1 are present under the Health Service State directory.
- At sync time two instances of S1 are run; if I disable the collecting rule, just one instance of S1 is run.
- Removing the sync time from the monitors and the rule has no effect on cookdown: when a new instance of class C is discovered it triggers a config reload (event id 21025), and this brings all instances on par.
- Removing the sync time means that at agent startup I’ll have 8 runs of S1: 6 runs from on demand detections, plus 1 run for the two monitors and 1 run for the rule.
I repeated the same tests with an OleDB module as a probe with the same results. So in this respect the behavior is consistent.
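To make the gap between expectation and observation explicit, the tally for this test case can be sketched in a few lines of Python. The variable names are mine; the figures are the ones listed above.

```python
MONITORS = 2   # monitors using DS and T1
INSTANCES = 3  # instances of class C on the Healthservice

# Expected: everything cooks down to a single run of S1 at startup.
expected_runs = 1

# Observed: ODDs run once per monitor and per instance...
odd_runs = MONITORS * INSTANCES
# ...plus one scheduled run shared by the two monitors and one for the rule.
scheduled_runs = 1 + 1

observed_runs = odd_runs + scheduled_runs
print("ODD runs:", odd_runs)
print("observed runs at startup:", observed_runs, "expected:", expected_runs)
```

The total also matches the 8 copies of S1 found under the Health Service State directory.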
Notes and conclusions:
- Cookdown for On Demand Detections works only per entity and per monitor, so having On Demand Detections on classes with multiple instances can lead to serious performance issues.
- Since cookdown for ODDs is per instance and per monitor, we can assume it is a mistake to reuse the data source / provider used for standard detections; in fact we can have race conditions between the detections. Remember, you’ll have one run for each instance, so if the provider calculates the state for every single instance you’re going to recalculate the state of every instance n times, where n is the number of instances. Suppose, for example, that you put a lock (directly or indirectly) on a resource to get its state.
- Cookdown doesn’t work between workflows with different ConfirmDelivery properties. So if you have workflows that use the same datasource but with a ConfirmDelivery mismatch, you will have two runs of the datasource (and no more than two, since the ConfirmDelivery property is a boolean and accepts just true or false). This is why in my testing case I had two runs of S1.
- The statement above implies we can have race conditions between workflows when they use the same data source / provider but have a different ConfirmDelivery property.
- As things stand, I must discourage the use of ODDs for monitoring multiple instances.
- At the same time I raise a call to the product team: ODDs are so useful that, if it’s too difficult to cook down across entities and monitors, you could just disable them at startup and after maintenance mode; after all they’re called On Demand. I speculate this shouldn’t be a huge QFE to implement. In any case, please say a final word on how you’re going to implement ODDs in future releases, so that we can understand in which cases to use them and in which not. In particular, will they cook down across entities and monitors?
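The ConfirmDelivery effect described in the notes above can be pictured with a toy model in Python: workflows share a run only when both the data source and the ConfirmDelivery value match. This is an assumption-laden illustration of the observed behavior, not the agent’s real algorithm, and the workflow names are hypothetical.

```python
from collections import defaultdict


def cookdown_groups(workflows):
    """Group workflows that could share one run of the data source."""
    groups = defaultdict(list)
    for name, data_source, confirm_delivery in workflows:
        groups[(data_source, confirm_delivery)].append(name)
    return dict(groups)


# The test case as I had it: the rule carried ConfirmDelivery = true by mistake.
mismatched = [
    ("Monitor A", "DS", False),
    ("Monitor B", "DS", False),
    ("Collecting rule", "DS", True),
]
# The same workflows once the property is aligned.
aligned = [(name, ds, False) for name, ds, _ in mismatched]

print(len(cookdown_groups(mismatched)), "runs with the mismatch")
print(len(cookdown_groups(aligned)), "run once aligned")
```

Since the property is a boolean, a single data source can never split into more than two groups for this reason.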
- Daniele