CSV monitoring with #scom #sysctr reprise


Things never come easy. I should have known better: this Cluster Shared Volume (CSV) monitoring story was bound to lead to more work. But, after all, this is what Quae Nocent Docent (what hurts teaches) is here for.

In my last post and project on CSV monitoring I started from a wrong assumption based on faulty observations: that the cluster group keeps count of everything that happens on CSVs. This is not the case and, in hindsight, it's obvious: CSV access is distributed across nodes, so every node knows only its own share of IOs and responsiveness for the shared volume. As you can see in the following screenshot, the total CSV IO is the sum of the IOs on the nodes (2A, 2B), while the cluster group network name just collects the IOs from the node it is currently hosted on.

[Screenshot: per-node CSV IO counters on nodes 2A and 2B, summing to the total CSV IO]
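If you want to see the split for yourself, you can sample the per-node CSV counters directly. Below is a minimal Python sketch using pywin32; the "Cluster CSV File System" object and the counter names are assumptions on my part and may differ on your OS version, so verify them in perfmon first.

```python
# Minimal sketch (Python + pywin32): sample the per-node CSV counters on one
# cluster node. The "Cluster CSV File System" object and the counter names
# are assumptions -- verify the exact names in perfmon on your own nodes.
import time
import win32pdh

CSV_OBJECT = "Cluster CSV File System"
COUNTERS = ["Reads/sec", "Writes/sec", "Read Latency", "Write Latency"]

def sample_csv_counters():
    # Each node only sees the share of IO it served for every CSV instance.
    _, instances = win32pdh.EnumObjectItems(
        None, None, CSV_OBJECT, win32pdh.PERF_DETAIL_WIZARD)
    query = win32pdh.OpenQuery()
    handles = {}
    for instance in instances:
        for counter in COUNTERS:
            path = win32pdh.MakeCounterPath(
                (None, CSV_OBJECT, instance, None, 0, counter))
            handles[(instance, counter)] = win32pdh.AddCounter(query, path)
    win32pdh.CollectQueryData(query)
    time.sleep(1)                      # rate counters need two samples
    win32pdh.CollectQueryData(query)
    values = {}
    for key, handle in handles.items():
        _, values[key] = win32pdh.GetFormattedCounterValue(
            handle, win32pdh.PDH_FMT_DOUBLE)
    return values

if __name__ == "__main__":
    for (instance, counter), value in sorted(sample_csv_counters().items()):
        print(f"{instance:<40} {counter:<25} {value:>10.2f}")
```

Running this on each node and comparing the figures makes the distribution obvious: no single node reports the whole picture for a CSV.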

Standard cluster disks, on the contrary, aren't shared by multiple nodes, so it is correct to use the owning group network name to collect performance data and monitor their status.

The net result is that I had to rewrite the management pack from scratch, modifying the health model to take this distributed model into account. The first consequence is that the new management pack is not backward compatible with the previous one: the old one must be removed before installing version 1.0.0.100 and above. The second consequence is that I got in touch with Sergei Jemeljanov, who added some views to the original management pack and suggested disabling all monitors and rules by default (with the exception of discovery rules) and then delivering a couple of override management packs to enable the monitoring. As a result, the management pack is now available not only on the TechNet Gallery but also as a CodePlex project you are encouraged to contribute to.

I had the following requirements:

– integrate the new health model with the existing one, where the CSV is monitored by the cluster group in terms of availability and space

– CSV state must contribute to Windows Cluster state

– Since CSV access is distributed, it's handy to have a container object that sums up all the contributing nodes for a specific CSV. This way performance views, dashboards and reports for each and every CSV can be easily set up starting from this container.

– I don't want, by default, to be alerted by every single node when a CSV has performance issues. Rather, I need a single alert and the ability to use Health Explorer to drill into the issue.

In the following diagram the green boxes are the classes added by the management pack and the red arrows are the added relationships.

[Diagram: classes added by the management pack (green boxes) and the added relationships (red arrows)]

The resulting health model, starting from the cluster object, rolls up through the availability and performance nodes. Availability is based on the free space monitoring delivered by the Microsoft management pack, while the performance node is built from the QND performance monitors. Both have a group for every single CSV, and under the performance node each group contains the performance perspective from every contributing node.

[Screenshot: Health Explorer view of the resulting health model]
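Conceptually the rollup is a worst-of aggregation over the tree of per-node perspectives. The following Python sketch (plain code, not SCOM, and the object names are made up) shows why a single CSV-level alert is enough while Health Explorer can still point at the offending node.

```python
# Conceptual sketch only: a worst-of rollup over the tree of per-node
# performance perspectives. Names and states below are made up.
from dataclasses import dataclass, field

HEALTHY, WARNING, CRITICAL = 0, 1, 2

@dataclass
class HealthNode:
    name: str
    state: int = HEALTHY
    children: list = field(default_factory=list)

    def rolled_up_state(self) -> int:
        # A parent is as unhealthy as its worst child (worst-of rollup).
        return max([self.state] + [c.rolled_up_state() for c in self.children])

# Per-node performance perspectives for one CSV on nodes 2A and 2B
csv1 = HealthNode("CSV1 performance group", children=[
    HealthNode("CSV1 perf perspective on node 2A", HEALTHY),
    HealthNode("CSV1 perf perspective on node 2B", CRITICAL),
])
cluster = HealthNode("Windows Cluster", children=[
    HealthNode("Performance", children=[csv1]),
])

# A single alert fires at the CSV group level; drilling down shows that
# node 2B is the unhealthy member.
print(cluster.rolled_up_state())  # -> 2 (CRITICAL)
```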

The management pack implements the following collection rules and monitors:

  • Cluster Disk – Monitor – Disk latency
  • Cluster Disk – Collection – Overall disk response time (sec/transfer)
  • Cluster Disk – Collection – Disk reads per second
  • Cluster Disk – Collection – Disk writes per second
  • Cluster Disk – Collection – Free MB
  • Cluster Disk – Collection – Free space %
  • CSV – Monitor – Disk Read Latency
  • CSV – Monitor – Disk Write Latency
  • CSV – Monitor – Disk Redirected Read Latency
  • CSV – Monitor – Disk Redirected Write Latency
  • CSV – Collection – Disk Read Latency
  • CSV – Collection – Disk Write Latency
  • CSV – Collection – Disk Reads per second
  • CSV – Collection – Disk Writes per second
  • CSV – Collection – Disk Redirected Read Latency
  • CSV – Collection – Disk Redirected Write Latency
  • CSV – Collection – Disk Redirected Reads per second
  • CSV – Collection – Disk Redirected Writes per second

The management pack adds a few views under the Microsoft Windows Cluster folder tree:

[Screenshot: views added under the Microsoft Windows Cluster folder]

In the "CSV State" view all the CSVs are listed with their contributing nodes and the specific monitoring scope:

[Screenshot: CSV State view listing each CSV with its contributing nodes]

Release Notes

The management pack monitors CSVs on Windows Server 2012 and above. I don't have a Windows Server 2008 R2 cluster to test the MP on, so I set the discovery to target only Windows Server 2012 and later.

The management pack still lacks reporting. Given the distributed nature of CSVs, the standard performance reports fall short: we need a report that sums IOs across nodes and computes an IO-weighted average of response times across nodes, as sketched below. More work to be done.
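For the record, the aggregation such a report needs is simple: sum the IOPS across nodes and weight each node's latency by its share of the IOs. A quick Python sketch with made-up node names and figures:

```python
# Sketch of the per-CSV aggregation a report would need: IOPS summed across
# nodes, latency as an IOPS-weighted average (node names and numbers invented).
node_samples = {
    "node-2A": {"iops": 1200.0, "latency_ms": 4.0},
    "node-2B": {"iops":  300.0, "latency_ms": 9.0},
}

total_iops = sum(s["iops"] for s in node_samples.values())
weighted_latency = sum(
    s["iops"] * s["latency_ms"] for s in node_samples.values()) / total_iops

print(f"total IOPS: {total_iops:.0f}")                 # 1500
print(f"weighted latency: {weighted_latency:.2f} ms")  # 5.00 ms
```

A plain average of the latencies (6.5 ms here) would overweight the node doing the least work, which is exactly why the standard per-instance reports fall short for CSVs.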

– Daniele

This posting is provided “AS IS” with no warranties, and confers no rights.

  1. #1 by Janez on May 12, 2016 - 4:01 pm

    Hi Daniele,
    do you have any information about this MP monitoring a Windows 2008 R2 cluster? In my case we have mount point cluster disks but no performance collection is done. I also imported this MP in an environment where CSVs are present and it is working fine. The CSV is also discovered as \\?\Volume{2a8869b9-eb1d-11e5-a3ed-d8d385ab6a54}
    Could the problem be with 2008 R2?
    These performance rules would really make things easier because we are investigating some performance issues in SQL.
    Thanks

    • #2 by Daniele Grandini on May 13, 2016 - 4:32 pm

      Hi Janez,
      the MP has been tested only on 2012 R2, specifically on Hyper-V systems. It should work on any CSV disk as far as it is on 2012 R2 and it is not remote (i.e. no SOFS support yet).

      – Daniele

  2. #3 by Sergei on April 20, 2014 - 2:12 am

    didn't know you had plans to release a new version so soon. I have more ideas on how to add/update the views, will do this later, as at the moment I'm short on time.

    But thanks for the updated version.

    • #4 by Janez on May 14, 2016 - 4:03 pm

      Hi Daniele,
      thanks for the quick reply. Yes, in the customer environment where the CSVs are on Server 2012 and Hyper-V is present it is working fine.
      It is strange to me because I see this mount point disk under discovered inventory as CLUSTER DISK. Also, some performance data (free space MB and %) is collected by the Windows Server Cluster Disk Monitoring MP, but none by the QND Addendum MP.
      Maybe the target in the QND rule should be different; it is now '$Target/Property[Type="ClusterDisk!Microsoft.Windows.Server.ClusterDisksMonitoring.ClusterDisk"]/PartitionName$'

      Do you have any advice regarding this?
      Janez

      • #5 by Daniele Grandini on May 20, 2016 - 4:53 pm

        Free space is collected using a script, while the other metrics come through perf counters. You need to check in perfmon whether you have the required counters for CSVs and whether the instance name is the CSV mount point path.

