The case of post dated monitoring data


A few weeks ago one of our customers was hit by a time issue: the internal time reference server jumped to the year 2020. Since this happened over the weekend, it took a few hours to fix; in the meantime the OpsMgr agents did their job and kept posting data to the OpsMgr infrastructure. The net result was a bunch of unhealthy monitors with their last changed time set to 2020. Annoying, I thought: I would need to reset all the broken monitors via PowerShell. Alas, this was not only annoying but blocking as well: the monitors would not reset, nor would their state change in any case.

First consideration: the OpsMgr data access layer should block any data insertion whose timestamp is skewed too far from its own date/time reference (10 to 15 minutes should be the maximum threshold, IMHO).

Anyway, time for some reverse engineering once again.

First of all, a few queries to identify the bogus state data. Two tables are involved here: StateChangeEvent and State. The former collects all state change events, the ones you can check in Health Explorer; the latter reports the last known state for any given managed entity / monitor pair.

Easy enough, let’s check for all data dated after December 31st, 2009:

select * from dbo.StateChangeEvent
where TimeGenerated > '12-31-2009'

select ME.FullName, M.MonitorName, State.* from dbo.State with (nolock)
inner join dbo.BaseManagedEntity ME with (nolock) on ME.BaseManagedEntityId=State.BaseManagedEntityId
inner join dbo.Monitor M with (nolock) on M.MonitorId=State.MonitorId
where State.LastModified > '12-31-2009'

Obviously my first thought was: let’s modify the LastModified field. But here we are in the unsupported realm, and before making any modification a further analysis of the inner workings is needed. The core stored procedure for any state change turned out to be:

PROCEDURE [dbo].[p_StateChangeEventProcess]
(      
    @BaseManagedEntityId uniqueidentifier,
    @EventOriginId uniqueidentifier,
    @MonitorId uniqueidentifier,
    @NewHealthState tinyint,
    @OldHealthState tinyint,    
    @TimeGenerated datetime,
    @Context nvarchar(max) = NULL
)

this one in turn calls

PROCEDURE [dbo].[p_StateUpsert]
(
    @BaseManagedEntityId uniqueidentifier,
    @MonitorId uniqueidentifier,
    @HealthState tinyint,
    @LastModified datetime
)

and if p_StateUpsert returns with success it will insert a row in the StateChangeEvent table.

p_StateUpsert, among other checks, compares the date/time of the incoming state update with the last time the monitor state was updated: if the incoming update is earlier in the timeline, the state change is discarded. This makes sense, since state changes are not guaranteed to arrive in chronological order. At the same time, without a check on time skew, we have a potential denial of service here.
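To make the effect concrete, here is a minimal sketch of that ordering check, written in Python rather than T-SQL. It is a simplified model and not the actual code of p_StateUpsert: the `StateRow` class, the `state_upsert` function, the `max_skew` guard and the health state numbers are all assumptions made for illustration. The guard is the hypothetical input validation suggested at the start of this post, which the product does not apply.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical threshold, taken from the 10 to 15 minutes suggested in the post.
MAX_SKEW = timedelta(minutes=15)


@dataclass
class StateRow:
    """Simplified stand-in for one row of the State table."""
    health_state: int
    last_modified: datetime


def state_upsert(row: StateRow, new_state: int, time_generated: datetime,
                 now: Optional[datetime] = None,
                 max_skew: Optional[timedelta] = None) -> bool:
    """Return True if the state change is accepted, False if it is discarded."""
    # Hypothetical guard (NOT in the product): refuse data stamped too far
    # ahead of the server's own clock.
    if now is not None and max_skew is not None:
        if time_generated > now + max_skew:
            return False
    # Ordering check described in the post: an update older than the last
    # recorded one is dropped, because events may arrive out of order.
    if time_generated < row.last_modified:
        return False
    row.health_state = new_state
    row.last_modified = time_generated
    return True


if __name__ == "__main__":
    now = datetime(2009, 11, 21, 12, 0)
    bogus = datetime(2020, 1, 1)

    # Without the guard: the post-dated update is accepted and from then on
    # every correctly timestamped update is discarded.
    row = StateRow(health_state=1, last_modified=datetime(2009, 11, 20))
    print(state_upsert(row, 3, bogus))   # accepted
    print(state_upsert(row, 1, now))     # discarded: "older" than 2020

    # With the hypothetical guard: the bogus update never lands.
    row = StateRow(health_state=1, last_modified=datetime(2009, 11, 20))
    print(state_upsert(row, 3, bogus, now=now, max_skew=MAX_SKEW))
    print(state_upsert(row, 1, now, now=now, max_skew=MAX_SKEW))
```

In this model a single post-dated row is enough to freeze a monitor until its LastModified value is pulled back into the present, which matches what we saw at the customer site.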

Anyway from my analysis the LastModified field can be safely changed (still unsupported realm):

update dbo.StateChangeEvent set TimeGenerated=TimeAdded
where TimeGenerated > '12-31-2009'

update dbo.State set LastModified = GETUTCDATE()
where State.LastModified > '12-31-2009'

Once this change is made, state changes will start flowing in again.

Issues: the monitors still need to be reset. You can wait for the first state change to update them, use Marius’ utility (Tool - OpsMgr 2007 - RuntimeHealthExplorer), or use a PowerShell script to reset all the post-dated monitors. The basic statements need to be:

$obj = Get-MonitoringObject -id:<<basemanagedentityid from previous queries>>

$obj.ResetMonitoringState([guid]'<<monitorid from previous queries>>')

 

Last warning: if you reset the monitor from the UI, starting from a Watcher view, the new health state won’t roll up (at least in my environment). For example, if you reset any unit monitor related to a HealthService starting from the Health Explorer of the related HealthServiceWatcher, the unit monitor will reset but the new status won’t roll up. If you do the same reset from the HealthService Health Explorer view, it will roll up.

– Daniele

This posting is provided "AS IS" with no warranties, and confers no rights.

  1. #1 by Daniele Muscetta on November 22, 2009 - 10:42 am

    You notice this because states won’t reset, etc., but events and performance data collected in that timeframe are also stored, and they won’t be groomed for 11 years if you don’t change their dates too…

    I did raise the “input validation” issue to the Product Group in the past… :-(

    • #2 by Daniele Grandini on November 25, 2009 - 6:17 pm

      Daniele, as you said before: you’re not alone ! :-)

  1. Post dated data – reprise « Quae Nocent Docent
