OpsMgr – Is alert reporting reliable?


The short answer is no, it is not reliable in OpsMgr 2007 R2 nor in OM2012 Beta, at least in the environments I manage.

First of all, for me reliable means the data is consistent between the live database and the data warehouse that reporting targets. In my experience, some alerts present in the LiveDB are missing from the data warehouse, and some alerts present in the data warehouse with a resolution state other than closed (<> 255) are not present in the LiveDB. (Obviously, alerts with a closed state can be in the DW and no longer in the LiveDB, where they are purged much faster.)

Let's see how we can get the resolution state for alerts in the DW. We need to join the vAlertResolutionState view, which contains a history of all state changes; we can assume the last recorded state change is the current alert resolution state:

select ResolutionState
from Alert.vAlertResolutionState ARS
where ARS.AlertGuid = 'some alert guid'
  AND ARS.StateSetDateTime = (select max(StateSetDateTime) from Alert.vAlertResolutionState where AlertGuid = ARS.AlertGuid)
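
As a minimal sketch of the "last recorded state change wins" rule, here is the same correlated subquery run against an in-memory SQLite table from Python. The table `ars` and the sample rows are assumptions made up for illustration; they only mimic the three columns of vAlertResolutionState used above and say nothing about the real DW schema.

```python
import sqlite3


def latest_states(rows):
    """Return {alert_guid: resolution_state} taken from the most recent state change.

    rows: (alert_guid, resolution_state, state_set_datetime) tuples, a simplified
    stand-in (assumption) for Alert.vAlertResolutionState.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE ars (AlertGuid TEXT, ResolutionState INTEGER, StateSetDateTime TEXT)"
    )
    con.executemany("INSERT INTO ars VALUES (?, ?, ?)", rows)
    # Same shape as the query above: keep the row whose timestamp is the
    # maximum recorded for that alert.
    result = con.execute(
        """
        SELECT a.AlertGuid, a.ResolutionState
        FROM ars AS a
        WHERE a.StateSetDateTime = (
            SELECT MAX(b.StateSetDateTime) FROM ars AS b
            WHERE b.AlertGuid = a.AlertGuid
        )
        """
    ).fetchall()
    con.close()
    return dict(result)


# Made-up history: alert-1 was opened and later closed, alert-2 is still new.
history = [
    ("alert-1", 0, "2011-10-01 08:00:00"),
    ("alert-1", 255, "2011-10-02 09:30:00"),
    ("alert-2", 0, "2011-10-03 10:00:00"),
]
print(latest_states(history))
```

The rule only gives a trustworthy answer if every state change actually reaches the history and carries a distinct timestamp.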

I'm curious whether this is a problem of mine or a general issue. If you want to give it a try, here are the queries; they assume the LiveDB and the DW are on the same SQL Server instance using default names, and they must be run in the context of the LiveDB.

Alerts modified in the last 30 days present on LiveDB but missing on DW or present on DW with a different resolution state

-- DW.ResolutionState is the latest resolution state recorded in the data warehouse
select DW.ResolutionState 'DW'
,A.ResolutionState 'Live', A.*
from dbo.AlertView A
left join (
    select A.AlertGuid, ARS.ResolutionState
    from OperationsManagerDW.Alert.vAlert A
    inner join (
        -- most recent state change per alert
        select MAX(R.StateSetDateTime) As 'StateSetDateTime',
        A.AlertGuid from OperationsManagerDW.Alert.vAlert A
        inner join OperationsManagerDW.Alert.vAlertResolutionState R WITH(NOLOCK) on A.AlertGuid=R.AlertGuid
        group by A.AlertGuid) AL on AL.AlertGuid=A.AlertGuid
    inner join OperationsManagerDW.Alert.vAlertResolutionState ARS on ARS.AlertGuid=A.AlertGuid AND ARS.StateSetDateTime=AL.StateSetDateTime
) DW on DW.AlertGuid=A.Id
where A.StateLastModified>DATEADD(dd,-30, getutcdate())
-- 189 stands in for "missing in DW"; it is assumed not to be a resolution state in use
AND ISNull(DW.ResolutionState, 189) <> A.ResolutionState
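
To make the comparison logic easier to follow, here is a reduced sketch of the same left join, run on in-memory SQLite tables from Python. The tables `live` and `dw`, their rows and the sentinel value -1 are assumptions for illustration only; the point is that a missing DW row is replaced by a value no alert can have, so "missing" and "different state" both show up as mismatches.

```python
import sqlite3


def find_mismatches(live, dw, sentinel=-1):
    """Return (alert_id, live_state, dw_state) for alerts missing or different in the DW.

    live: (alert_id, resolution_state) rows standing in for dbo.AlertView.
    dw:   (alert_guid, current_state) rows standing in for the DW side of the join.
    sentinel: hypothetical value assumed never to be a real resolution state.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE live (Id TEXT, ResolutionState INTEGER)")
    con.execute("CREATE TABLE dw (AlertGuid TEXT, ResolutionState INTEGER)")
    con.executemany("INSERT INTO live VALUES (?, ?)", live)
    con.executemany("INSERT INTO dw VALUES (?, ?)", dw)
    rows = con.execute(
        """
        SELECT l.Id, l.ResolutionState, d.ResolutionState
        FROM live AS l
        LEFT JOIN dw AS d ON d.AlertGuid = l.Id
        WHERE COALESCE(d.ResolutionState, ?) <> l.ResolutionState
        ORDER BY l.Id
        """,
        (sentinel,),
    ).fetchall()
    con.close()
    return rows


# Made-up data: "a" agrees, "b" has a different state in the DW, "c" is missing in the DW.
live_rows = [("a", 0), ("b", 255), ("c", 0)]
dw_rows = [("a", 0), ("b", 0)]
print(find_mismatches(live_rows, dw_rows))
```

If the sentinel were equal to a state that is really in use, a missing alert whose live state happens to be that value would be silently skipped, so the value has to be chosen with care.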

In my experience the first query returns alerts whose creation time is older than the grooming days set in the DW. Example: if you groom alerts out of the DW after 90 days and you have an unresolved alert that was created more than 90 days ago, it won't be in the DW even if it's still present in the LiveDB. As a matter of fact, the grooming procedure in the DW takes into account neither the resolution state of the alert nor the DW last modified date (which IMHO should be the field to use); it just checks the creation date:

SET @Statement = 'INSERT #AlertGroom (AlertGuid)'
               + ' SELECT TOP ' + CAST(@MaxRowsToGroom AS varchar(15)) + ' AlertGuid'
               + ' FROM ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@MainTableName)
               + ' WHERE ([DateTime] < CONVERT(datetime, ''' + CONVERT(varchar(50), @CutoffDateTime, 120) + ''', 120))'
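
The effect of that WHERE clause can be sketched in a few lines of Python. The first function mimics the creation-date check quoted above; the second is a purely hypothetical alternative (not product behavior) that grooms only closed alerts based on their last modified date. Field names, dates and the 90-day retention are assumptions chosen to match the example in the text.

```python
from datetime import datetime, timedelta

CLOSED = 255  # resolution state used for closed alerts in this post


def groomed_by_creation_date(alerts, cutoff):
    """Mimic the grooming criterion quoted above: only the creation date is checked."""
    return [a["guid"] for a in alerts if a["created"] < cutoff]


def groomed_by_last_modified(alerts, cutoff):
    """Hypothetical alternative: groom only closed alerts not modified since the cutoff."""
    return [
        a["guid"]
        for a in alerts
        if a["state"] == CLOSED and a["last_modified"] < cutoff
    ]


now = datetime(2011, 10, 1)
cutoff = now - timedelta(days=90)  # assumed 90-day alert retention

# Made-up alerts: one old but still open, one old and closed, one recent.
alerts = [
    {"guid": "old-open", "created": now - timedelta(days=120),
     "last_modified": now - timedelta(days=1), "state": 0},
    {"guid": "old-closed", "created": now - timedelta(days=120),
     "last_modified": now - timedelta(days=100), "state": CLOSED},
    {"guid": "recent", "created": now - timedelta(days=10),
     "last_modified": now - timedelta(days=10), "state": 0},
]

print(groomed_by_creation_date(alerts, cutoff))
print(groomed_by_last_modified(alerts, cutoff))
```

With the creation-date criterion the old but still open alert is groomed together with the closed one, which is exactly the kind of alert that later shows up as present in the LiveDB and missing in the DW.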

 

Alerts modified in the last 30 days not closed in DW but missing in LiveDB or present with a different resolution state

-- DW = every alert in the data warehouse with its latest recorded resolution state
select DW.ResolutionState 'DW'
,DW.AlertGuid, A.ResolutionState 'Live', A.Name, DW.*

from (
    select ARS.ResolutionState, ARS.StateSetDateTime, A.*
    from OperationsManagerDW.Alert.vAlert A
    inner join (
        -- most recent state change per alert
        select MAX(R.StateSetDateTime) As 'StateSetDateTime',
        A.AlertGuid from OperationsManagerDW.Alert.vAlert A
        inner join OperationsManagerDW.Alert.vAlertResolutionState R WITH(NOLOCK) on A.AlertGuid=R.AlertGuid
        group by A.AlertGuid) AL on AL.AlertGuid=A.AlertGuid
    inner join OperationsManagerDW.Alert.vAlertResolutionState ARS on ARS.AlertGuid=A.AlertGuid AND ARS.StateSetDateTime=AL.StateSetDateTime
) DW
left join dbo.AlertView A on A.Id=DW.AlertGuid
where
DW.ResolutionState<255 AND
DW.DWLastModifiedDateTime>DATEADD(dd,-30, getutcdate())
-- 198 stands in for "missing in LiveDB"; it is assumed not to be a resolution state in use
AND DW.ResolutionState <> IsNull(A.ResolutionState, 198)

In my experience the second query returns alerts present in the DW but not in the LiveDB (Live = NULL), or alerts present in the DW with a resolution state <> 255 and present in the LiveDB with resolution state = 255 (these are recent alerts not yet purged from the LiveDB). This means there are situations where alerts are closed in or deleted from the LiveDB and the change is not propagated to the DW. I observed this behavior in the following cases (but I fear the list is not exhaustive):

– an entity with open alerts is deleted

– an alert state change occurs within the same second and/or is written at the same time in the LiveDB, so that the changes cannot be ordered properly (for example, a close and an open are seen together). This accounts for about 5% of the mismatched alerts

The latter is the more intriguing of the two, since I discovered that the DW sets StateSetDateTime to the live DB TimeResolutionStateLastModifiedinDB or TimeAdded, and not to TimeResolutionStateLastModified as I was expecting (this is a bug IMHO):

From the LiveDB:

[image]

From vAlertResolutionState in the DW:

[image]
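
The ordering problem with state changes recorded in the same second can be illustrated with a small Python sketch. It only shows why "take the row with the maximum timestamp" stops being well defined when two changes share a timestamp; it does not model how the DW synchronization really works, and the sample history is made up.

```python
from collections import defaultdict


def current_state_candidates(history):
    """Return {alert_guid: sorted states that share the latest timestamp}.

    history: (alert_guid, resolution_state, timestamp) tuples, a simplified
    stand-in (assumption) for the state change history kept in the DW.
    """
    latest = {}
    for guid, _state, ts in history:
        if guid not in latest or ts > latest[guid]:
            latest[guid] = ts
    candidates = defaultdict(set)
    for guid, state, ts in history:
        if ts == latest[guid]:
            candidates[guid].add(state)
    return {guid: sorted(states) for guid, states in candidates.items()}


# alert-1 changes state 5 seconds apart; alert-2 is closed and opened in the same second.
history = [
    ("alert-1", 0, "2011-10-01 08:00:00"),
    ("alert-1", 255, "2011-10-01 08:00:05"),
    ("alert-2", 255, "2011-10-01 09:00:00"),
    ("alert-2", 0, "2011-10-01 09:00:00"),
]

ambiguous = {
    guid: states
    for guid, states in current_state_candidates(history).items()
    if len(states) > 1
}
print(ambiguous)
```

For alert-2 both 0 and 255 qualify as the "last" state, so a report built on the maximum timestamp cannot tell whether the alert is open or closed.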

So I conclude that every alert is written to the DW (almost… I had a few exceptions), but not every alert closure state change is recorded in the DW; from my observations, other state changes are recorded correctly. I made the same observation on the current beta of OpsMgr 2012. This means you cannot use your DW data to produce reports like "all active (not closed) alerts", since you'll have false positives (i.e. alerts reported as active that in reality are closed).

I will follow up with further observations and, if possible, workarounds while I complete my alert reporting task.

– Daniele

This posting is provided "AS IS" with no warranties, and confers no rights.

  1. #1 by Ramesh on October 13, 2011 - 6:26 am

    Daniele – I ran these queries in my environment and observed the following:

    1. Alerts modified in the last 30 days not closed in DW but (missing in LiveDB or present with a different resolution state)

    [Returned 1041 rows]

    2. Alerts modified in the last 30 days present on LiveDB but (missing on DW or present on DW with a different resolution state)
    [Returned 46 rows]

    What does it mean to me? Also for a particular ‘ProblemId’ I verified that the DW does not set the StateSetDateTime to the live db TimeResolutionStateLastModifiedinDB or TimeAdded as you’ve mentioned. Appreciate if you could highlight more on the above queries and their results discrepancies

    Regards
    Ramesh

    • #2 by Daniele Grandini on October 14, 2011 - 8:13 am

      Hi Ramesh,
      first of all, your findings confirm mine. What this all means is that you can have statistical reporting for alerts in the DW, but you cannot use the DW to answer questions like:
      – return all the alerts not closed
      – return all alerts with the xyz resolution state
      In particular, the result of the first query tells you that many alert resolution state changes are missing in the DW. How many? 1041. So you have 1041 alerts in the DW with an incorrect resolution state; it is highly probable that all these 1041 alerts are to be considered closed.
      The results of the second query tell you that you have alerts still open in the console that have been purged from the DW. The date raised for these alerts should be far in the past, older than the alert retention time in the DW.

      btw I’m still working on possible workaround avoiding a lookup on the LiveDB so that the report doesn’t introduce dependencies on such a db.
      – Daniele

  1. OpsMgr – Is alert reporting reliable? Reprise with some light. « Quae Nocent Docent
  2. OpsMgr – Is alert reporting reliable? Reprise « Quae Nocent Docent
