#DPM 2012 Addendum management pack #sysctr

Today is an exceptionally snowy day and the power keeps going on and off: a lovely day to write something on my Surface's battery. Today's subject is DPM SLA monitoring.

In my experience monitoring is just a matter of perspective. Whichever is yours, it’s always useful to have very actionable alerts and to be able to predict system behavior, so that adequate capacity can be provisioned and service loss avoided.

System Center Data Protection Manager (DPM), after a long period of under-development and tough deployments, has risen to new heights in development priorities. In fact, if you consider Microsoft's data protection story for the hybrid cloud, you can see tons of new features: Azure Backup, Azure Site Recovery, DPM support for cloud storage, DPM support for data deduplication, DPM support for Azure based protection and much more.

One of the characteristics of DPM is its ability to self-repair: sometimes recovery points fail, but on the following runs they complete just fine. The current DPM Central Console, which leverages Operations Manager, is a great place to consolidate all the DPM alerts and performance statistics. With Update Rollup 4 first, and now with Update Rollup 5, a new monitoring scenario has been introduced: SLA based alerting.

SLA Based alerting and more

SLA based alerting leverages the new capability of defining a service level for every protection group. The SLA is set in hours using PowerShell. If a data source has a recovery point older than the defined SLA, an alert is raised. This check is done daily.

The new cmdlets are:

  • Set-DPMProtectionGroupSLA
  • Get-DPMProtectionGroupSLA

On the ProtectionGroup (Microsoft.Internal.EnterpriseStorage.Dls.UI.ObjectModel.OMCommon.ProtectionGroup) class we have two new methods to accomplish the same results:

  • SetProtectedGroupSLAForAlert
  • ReadProtectedGroupSLAForAlert

If no SLA is set, the value is 0. Any other value is the maximum allowed age, in hours, of the latest recovery point for any data source within the protection group.
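The rule just described (0 means no SLA, otherwise the value is a maximum recovery point age in hours) can be sketched in a few lines of PowerShell. This is only an illustration of the logic, not how DPM implements its own check: `Test-SlaBreach` and its parameters are hypothetical names, and the function works on plain values rather than on DPM objects.

```powershell
# Hypothetical helper, not part of DPM: applies the SLA rule to plain values.
function Test-SlaBreach {
    param(
        [Parameter(Mandatory)] [int]      $SlaInHours,          # 0 means "no SLA defined"
        [Parameter(Mandatory)] [datetime] $LatestRecoveryPoint, # time of the newest recovery point
        [datetime] $Now = (Get-Date)                            # injectable clock, handy for testing
    )

    # No SLA defined: there is nothing to breach.
    if ($SlaInHours -le 0) { return $false }

    # Breach when the newest recovery point is older than the SLA.
    $ageInHours = ($Now - $LatestRecoveryPoint).TotalHours
    return ($ageInHours -gt $SlaInHours)
}

# Example with a fixed reference time (assumed values).
$reference = Get-Date '2015-02-05T12:00:00'
Test-SlaBreach -SlaInHours 24 -LatestRecoveryPoint $reference.AddHours(-30) -Now $reference
Test-SlaBreach -SlaInHours 24 -LatestRecoveryPoint $reference.AddHours(-6)  -Now $reference
```

With these sample values the first call reports a breach (the recovery point is 30 hours old against a 24 hour SLA) and the second does not.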

Setting an SLA is straightforward:

Get-DPMProtectionGroup | where {$_.Name -ieq 'MyProtectionGroup'} | Set-DPMProtectionGroupSLA -SLAInHours 24

I stumbled onto this new feature while I was thinking about a new way of monitoring DPM. To start, I had three topics in mind:

Incidentally, the last two points are both addressed by the Protection Group SLA scenario. However, the out-of-the-box feature doesn’t fit all our requirements:

  • It checks for SLA breaches only once a day, while we need a much shorter check interval
  • It doesn’t disable the standard alerting, while we want SLA breaches to be the only data source related alerts

This led to a new addendum Management Pack. It can be downloaded here.

The MP documentation can be found here.

MP design choices

As usual, every MP has its own challenges. In this case I had to be careful about creating new objects, since the DPM MP instance space is pretty crowded, so much so that Microsoft advises a dedicated Management Group for large deployments. The second challenge is that the new way of alerting should apply only to UR4 enabled DPM servers, and only to those Protection Groups with an SLA defined.

Eventually I decided to go with the following design choices:

  • I added just one new class, Protection Group with SLA
    • Protection Groups are few in number compared to data sources
    • Since I don’t define any additional key, I’m simply extending the existing Protection Group class, which only adds properties (a single property) to existing class instances / objects / managed entities
  • I then defined a group, “Data sources with SLA”, containing all the data sources in Protection Groups with an SLA. This group is the target for the overrides that disable the old-style alerting
  • Lastly, a complementary group, “Data Sources without SLA”, is the target for the override that disables the new alerting monitors on data sources contained in Protection Groups without an SLA defined (or on DPM servers not at UR4)
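To make the two complementary groups concrete, here is a minimal PowerShell sketch of the partitioning idea. In the MP itself group membership is populated by the management pack's discoveries, not by a script like this. The sample objects, their property names (`Name`, `ProtectionGroup`, `SlaInHours`) and the values are all assumptions made up for illustration.

```powershell
# Illustrative data only: names and property layout are assumptions,
# not the DPM or Operations Manager object model.
$dataSources = @(
    [pscustomobject]@{ Name = 'SQL01\Sales'; ProtectionGroup = 'PG-SQL';   SlaInHours = 24 }
    [pscustomobject]@{ Name = 'FS01\D:';     ProtectionGroup = 'PG-Files'; SlaInHours = 0  }
    [pscustomobject]@{ Name = 'SQL02\HR';    ProtectionGroup = 'PG-SQL';   SlaInHours = 24 }
)

# "Data sources with SLA": target for the overrides that disable old-style alerting.
$withSla = @($dataSources | Where-Object { $_.SlaInHours -gt 0 })

# "Data Sources without SLA": target for the override that disables the new SLA monitors.
$withoutSla = @($dataSources | Where-Object { $_.SlaInHours -le 0 })

"With SLA:    $($withSla.Name -join ', ')"
"Without SLA: $($withoutSla.Name -join ', ')"
```

Because the two filters are exact complements, every data source lands in exactly one group, so each one is covered either by the old-style alerts or by the SLA monitors, never both and never neither.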

In this release I’ve been really prudent in disabling existing alerts; I need to work with my operations team to understand how to tune it. As usual, any suggestion is welcome.
