So my data warehouse has reached the 150 GB size limit and it’s time to check if everything’s ok. Not that this size is too much, with about 300 agents and 170 MPs in production this is not huge. Nevertheless I learned not to be too confident on OpsMgr reliability and I don’t want my dw to grow too much.
My retention period has always been 13 months, but really I don’t need 13 months of performance data or health state. 6 months are more than enough for our constantly changing environment. I opened the StandardDataSetAggregation table and changed accordingly the MaxDataAgeDays field then I waited 24 hours before doing any check (another lesson I learned is not to be in a hurry with OpsMgr, even if the dw promises to groom data very often, about every 4 hours).
Alas, after this timeframe, my dw size was almost the same. I run a few sql statements on the common views (Alert.vAlert, Perf.vPerfHourly and so on) and I found out that my data was still there, I started to fear the grooming process wasn’t working. I was wrong.
First of all I tried to understand the inner workings of grooming, I would recommend these two posts:
From these we can summarize as follows:
- the core of all the grooming stuff is dbo.StandardDataSetGrooming stored procedure
- data is not groomed on a row by row basis, but instead, as soon as you have more than one partition for a given dataset, it is groomed based on a partition by partition basis. In this situation the current partition is never groomed at least until it becomes the last partition
- data partitioning is not based on a age rule (i.e. one partition per day/week/month) but on a size rule on rows or allocated MB. Every dataset/aggregation has it own max table size specified in dbo.StandardDatasetAggregationStorage
The net result is the MaxDataAgeDays needs to be read as “at least these days” + one data partition. The size of the data partition depends on the dataset configuration. This explains why I still have data older than the MaxDataAgeDays limit.
To be sure and to have a way to regularly check the health state of the grooming process I developed a TSQL script, I’d like to share with the community.
The following tables are involved in the grooming process:
Basically the check takes into account every Dataset / Aggregation and calculates oldest and newest data in inactive partitions, oldest data in current partition (the partition that is being used to insert data), rows and size of the current partition, maximum row count and size configured for the dataset / aggregation.
This posting is provided "AS IS" with no warranties, and confers no rights.