SCOM Data Warehouse grooming

So my data warehouse has reached the 150 GB size limit and it’s time to check if everything’s ok. Not that this size is too much, with about 300 agents and 170 MPs in production this is not huge. Nevertheless I learned not to be too confident on OpsMgr reliability and I don’t want my dw to grow too much.

My retention period has always been 13 months, but really I don’t need 13 months of performance data or health state. 6 months are more than enough for our constantly changing environment. I opened the StandardDataSetAggregation table and changed accordingly the MaxDataAgeDays field then I waited 24 hours before doing any check (another lesson I learned is not to be in a hurry with OpsMgr, even if the dw promises to groom data very often, about every 4 hours).

Alas, after this timeframe, my dw size was almost the same. I run a few sql statements on the common views (Alert.vAlert, Perf.vPerfHourly and so on) and I found out that my data was still there, I started to fear the grooming process wasn’t working. I was wrong.

First of all I tried to understand the inner workings of grooming, I would recommend these two posts:

From these we can summarize as follows:

  • the core of all the grooming stuff is dbo.StandardDataSetGrooming stored procedure
  • data is not groomed on a row by row basis, but instead, as soon as you have more than one partition for a given dataset, it is groomed based on a partition by partition basis. In this situation the current partition is never groomed at least until it becomes the last partition
  • data partitioning is not based on a age rule (i.e. one partition per day/week/month) but on a size rule on rows or allocated MB.  Every dataset/aggregation has it own max table size specified in dbo.StandardDatasetAggregationStorage

The net result is the MaxDataAgeDays needs to be read as “at least these days” + one data partition. The size of the data partition depends on the dataset configuration. This explains why I still have data older than the MaxDataAgeDays limit.

To be sure and to have a way to regularly check the health state of the grooming process  I developed a TSQL script, I’d like to share with the community.

The following tables are involved in the grooming process:


Basically the check takes into account every Dataset / Aggregation and calculates oldest and newest data in inactive partitions, oldest data in current partition (the partition that is being used to insert data), rows and size of the current partition, maximum row count and size configured for the dataset / aggregation.

- Daniele

This posting is provided "AS IS" with no warranties, and confers no rights.

Technorati Tags: ,,

, ,

  1. #1 by Ivan on April 8, 2016 - 1:54 pm

    Still working for SCOM2012, thx a lot

  2. #2 by David on October 16, 2015 - 6:03 am

    Does not appear this TSQL is available in either location now… Do you still have a copy?

    • #3 by Daniele Grandini on October 16, 2015 - 7:08 am

      Here you go, haven’t used it in a while, so I don’t know if it still works in 2012 R2:

      DECLARE @DataSet TABLE (
      DatasetId uniqueidentifier NOT NULL
      ,AggregationTypeId tinyint NOT NULL
      ,DataSetName varchar(255) NOT NULL
      ,Aggregation varchar(255) NOT NULL
      ,MaxDataAge datetime
      ,OldestInArchived datetime
      ,OldestInInsert datetime
      ,LastDataInArchived datetime
      ,CurrentTable varchar(255)
      ,[RowCount] bigint
      ,Reserved int
      ,Data int
      ,IndexSize int
      ,Unused int
      ,MaxTableRowCount int
      ,MaxTableSizeKb int

      DECLARE @TableSize TABLE (
      DatasetId uniqueidentifier
      ,TableName sysname NOT NULL
      ,[RowCount] bigint
      ,Reserved varchar(30)
      ,Data varchar(30)
      ,IndexSize varchar(30)
      ,Unused varchar(30)

      insert @DataSet (DataSetId, DatasetName, AggregationTypeId, Aggregation, MaxDataAge)
      select SD.DatasetId, SD.SchemaName, SDA.AggregationTypeId, AggregationTypeDefaultName, DATEADD(d, -MaxDataAgeDays, GETDATE())
      from dbo.StandardDatasetAggregation SDA inner join dbo.StandardDataset SD on SDA.DataSetId=SD.DataSetId
      inner join dbo.AggregationType AT on SDA.AggregationTypeId = AT.AggregationTypeId

      update @DataSet Set LastDataInArchived=EndDateTime
      FROM @DataSet TS
      inner join StandardDatasetTableMap TM on TM.dataSetID=TS.datasetId AND TM.AggregationTypeID=TS.AggregationTypeID
      AND TM.EndDateTime = (select MAX(EndDateTime) FROM StandardDatasetTableMap TM1 where TM1.DatasetId=TM.DatasetID AND TM1.AggregationTypeId=TM.AggregationTypeID)

      update @DataSet Set OldestInArchived=StartDateTime
      FROM @DataSet TS
      inner join StandardDatasetTableMap TM on TM.dataSetID=TS.datasetId AND TM.AggregationTypeID=TS.AggregationTypeID
      AND TM.StartDateTime = (select MIN(StartDateTime) FROM StandardDatasetTableMap TM1 where TM1.DatasetId=TM.DatasetID AND TM1.AggregationTypeId=TM.AggregationTypeID)

      — assume just 1 insert table, as of today this is the case for every dataset
      — get the selection code from [dbo].[StandardDatasetAllocateStorage]

      @MapRowId int
      ,@StorageRowId int
      ,@TableRowId int
      ,@BaseTableName sysname
      ,@TableNameSuffix varchar(50)
      ,@TableName sysname
      ,@TableGuid uniqueidentifier
      ,@MessageText nvarchar(max)
      ,@TableStatement nvarchar(max)
      ,@NewTableGuid uniqueidentifier
      ,@NewTableSuffix varchar(50)
      ,@MaxTableRowCount int
      ,@MaxTableSizeKb int
      ,@TableTemplate nvarchar(max)
      ,@SchemaName sysname

      declare @InsMin TABLE (
      datasetid uniqueidentifier
      ,mindate datetime

      DECLARE dataset_cursor CURSOR
      FOR SELECT datasetID, AggregationTypeID FROM @DataSet
      declare @datasetid uniqueidentifier
      declare @AggregationTypeID tinyint
      declare @stmt varchar(1000)

      OPEN dataset_cursor
      FETCH NEXT FROM dataset_cursor INTO @datasetID, @AggregationTypeID


      SELECT TOP 1
      @MapRowId = StandardDatasetTableMapRowId
      ,@TableNameSuffix = TableNameSuffix
      ,@TableGuid = TableGuid
      FROM StandardDatasetTableMap
      WHERE (DatasetId = @DatasetId)
      AND (AggregationTypeId = @AggregationTypeId)
      AND (InsertInd = 1)
      –AND (StandardDatasetTableMapRowId > @MapRowId)
      ORDER BY StandardDatasetTableMapRowId

      SELECT TOP 1
      @StorageRowId = StandardDatasetAggregationStorageRowId
      ,@BaseTableName = BaseTableName
      ,@MaxTableRowCount = MaxTableRowCount
      ,@MaxTableSizeKb = MaxTableSizeKb
      FROM StandardDatasetAggregationStorage
      WHERE (DatasetId = @DatasetId)
      AND (AggregationTypeId = @AggregationTypeId)
      –AND (StandardDatasetAggregationStorageRowId > @StorageRowId)

      update @Dataset SET MaxTableRowCount=@MaxTableRowCount, MaxTableSizeKb=@MaxTableSizeKB
      where datasetid=@datasetid and aggregationtypeid=@aggregationtypeid
      @SchemaName = SchemaName
      FROM StandardDataset
      WHERE DatasetId = @DatasetId

      SET @TableName = QUOTENAME(@SchemaName) + ‘.’ + QUOTENAME(@BaseTableName + ‘_’ + @TableNameSuffix)
      INSERT @TableSize (TableName, [RowCount], Reserved, Data, IndexSize, Unused)
      EXEC sp_spaceused @TableName

      set @stmt = ‘SELECT ”’ + CAST(@datasetid As varchar(255)) + ”’, MIN(dateTime) from ‘ + @TableName
      insert into @InsMin
      execute (@stmt)

      Update @Dataset SET OldestInInsert=mindate from @dataset DS inner join @InsMin IM on DS.datasetid=IM.datasetid
      delete from @InsMin
      UPDATE @TableSize SET DataSetId=@datasetid

      UPDATE @DataSet SET Currenttable=@TableName,
      Reserved = CAST(REPLACE(TS.Reserved, ‘KB’, ”) AS int),
      Data = CAST(REPLACE(TS.Data, ‘KB’, ”) AS int),
      IndexSize = CAST(REPLACE(TS.IndexSize, ‘KB’, ”) AS int),
      Unused = CAST(REPLACE(TS.Unused, ‘KB’, ”) AS int)
      from @dataSet DS inner join @TableSize TS on DS.DatasetId=TS.datasetID
      where DS.Datasetid=@datasetId AND DS.AggregationTypeID=@AggregationTypeID

      delete from @TableSize

      FETCH NEXT FROM dataset_cursor INTO @datasetID, @AggregationTypeID

      CLOSE dataset_cursor
      DEALLOCATE dataset_cursor

      select * from @Dataset

      select * from @dataSet where MaxDataAge > LastDataInArchived OR [RowCount] > MaxTableRowCOunt OR MaxDataAge > OldestInInsert

  3. #4 by daniel queen on March 1, 2012 - 9:23 pm

    Great article! The link to the TSQL is broken though. Is there an alternate link?

  4. #6 by Ravi Gupta on July 22, 2009 - 9:33 am


    one question on DW DB grooming:

    let say:

    1. We have Data retention for alerts is 90 days.
    2. Fact: Maximum row count is 10Lac.
    3. After 90 days we have only 50k row in the table.

    i. Does it mean that Alert Data will not groomed until the table is filled with 10Lac rows?

    ii. What type of data is been represented by the columns Reserved, Data, & Unused?


    • #7 by Daniele Grandini on July 23, 2009 - 6:12 pm

      Hi ravi,
      yes you’re right the grooming process will truncate once it will fill up (10Lac row). Reserved stands for “allocated” so this is the space currently allocated for the table, data is the actual spce used for data and unused is the difference between the space allocated and the spce used. SQL allocates by discreet quantities.


  5. #8 by Jonathan Almquist on February 23, 2009 - 9:21 pm

    One suggestion, for the benefit of your future visitors. Format this post so the T-SQL script does copy into one long line. I had to break this up in Query Builder.

  6. #9 by Jonathan Almquist on February 23, 2009 - 9:19 pm

    This is a great idea. Thank you for sharing this with the community.

  1. A tale of Identity and bad agent behavior « Quae Nocent Docent
  2. 2010 in review « Quae Nocent Docent

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: