** 25th July 2013 UPDATE ** the issue has been fixed with Update Rollup 3, so happy dashboarding **
Dashboards are a new UI addition introduced with System Center 2012 Operations Manager (OpsMgr). The goal was to build a completely new UI with the same look, feel and behavior on every console (fat, web and SharePoint). The goal was right; it's just the implementation that falls short.
Anyone who has tried the dashboards outside a lab or PoC environment has experienced the sluggishness of the performance-related views (widgets). While attending the MVP Summit (virtually) I took a couple of hours off to drill into the issue. My aim was to build a task-based dashboard with the basic performance indicators.
The dashboard included:
– CPU usage
– Memory usage
– The response time for all disks
– The bandwidth usage for all NICs
Alas, the dashboard was so slow as to be unusable, and sometimes some counters weren't displayed at all.
I started troubleshooting from the MP definition (Microsoft.SystemCenter.Visualization.Library) to get to the SQL Server side quickly: all the data interaction is performed against the Data Warehouse, using newly defined stored procedures with the SDK schema/prefix. With that I had all the information I needed to set up a proper trace, so it was time for a good SQL Profiler session. What I found was astonishing: tens, even hundreds, of calls like the following for every dashboard refresh:
exec SDK.Microsoft_SystemCenter_Visualization_Library_SinglePerformanceDataSeriesGet @ManagementGroup='6934C4FC-5C84-2C84-C5A0-88726754720D', @StartTime='2013-02-17 14:57:26.730', @EndTime='2013-02-20 14:57:26.730', @ManagedEntityGuid='CD4CBEB2-85A0-0F82-DFC3-12186EDB7F5A', @PerfRuleInstanceRowId=132633, @NumberOfDataPoints=100, @RequestedDataPointType=4
It was clear something was wrong (badly wrong): not only is the perf widget slow, it taxes the SQL engine as well. At the beginning of the trace another stored procedure was called, this time 4 times with different parameters: clearly once for every widget in the dashboard.
Trying to execute each one of these showed the culprit:
– The memory one returned just one row in no time
– The CPU one returned two rows
– The network one returned about 200 rows in 30 seconds
– The disk one returned about 5100 rows in more than 3 minutes. The dashboards are supposed to refresh every 60 seconds, ahem.
What was happening is that the stored procedure returned one row for every instance (and rule, more on this later) of the requested performance counter. The dashboard logic then executed SinglePerformanceDataSeriesGet for every single row, and that call returned data only for the instances actually present on the computer targeted by the dashboard (to be precise, on the targeted managed entity). This is what I call a bad (very bad) design.
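The fan-out described above can be sketched in a few lines of Python. This is only an illustration of the call pattern, not the widget's actual code: the row fields, the function names and the tiny data set (3 servers with 2 disks each) are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterRow:
    entity: str       # managed entity the instance belongs to (hypothetical field)
    instance: str     # counter instance, e.g. one disk
    rule_row_id: int  # hypothetical stand-in for @PerfRuleInstanceRowId


def list_counters_unfiltered(rows, target_entity):
    """Mimics the observed behaviour: every instance comes back, whatever the target."""
    return list(rows)


def single_series_get(row, target_entity):
    """Stand-in for the per-row series call: data exists only for the target's instances."""
    return [1.0, 2.0] if row.entity == target_entity else []


def refresh_widget(rows, target_entity, list_counters):
    """One refresh: list the counters, then issue one series call per listed row."""
    calls = 0
    series = {}
    for row in list_counters(rows, target_entity):
        calls += 1  # one database round trip per row
        data = single_series_get(row, target_entity)
        if data:
            series[row.instance] = data
    return calls, series


# Made-up inventory: 3 servers, 2 disks each.
ROWS = [
    CounterRow(f"srv{s}", f"srv{s}-disk{d}", s * 10 + d)
    for s in range(3)
    for d in range(2)
]

calls, series = refresh_widget(ROWS, "srv1", list_counters_unfiltered)
print(calls, len(series))
```

In this toy inventory the widget makes 6 calls to draw 2 series. The number of calls grows with the number of instances in the whole management group, not with the number of instances on the targeted server.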
Time for some serious T-SQL code review of SDK.Microsoft_SystemCenter_Visualization_Library_PerformanceCounterListByManagedEntityUsingContainerME.
The results of the review are embarrassing. In summary:
– Management pack versioning was not considered, so multiple performance rule instances are returned for the same counter. This happens when the MP defining the rule has been upgraded one or more times, quite a common scenario in a production environment.
– Managed entity lifetime was not considered, so managed entities that had been retired were still taken into account, adding to the execution time.
– All the performance rule instances were returned, even when the instance doesn't exist on the target managed entity.
– Lastly, the logic never considers the possibility of multiple collection rules for the same performance counter.
Not bad for a single stored procedure.
Before going further, I want to make the scope of this post clear: I modified only the stored procedure needed for my specific goal. There are other similar stored procedures that, I guess, suffer from the same issues. I'm going to explain how I modified the stored procedure; if you want to go this way, just remember it is at your own risk.
Lastly, I won't post the complete T-SQL; you must go through the process of modifying the stored procedure yourself. Remember that any update to the MP will overwrite the stored procedure.
Step 1. Create a script for ALTER of the stored procedure
Step 2. In SQL Server Management Studio, rename the original stored procedure so that you can restore it if needed.
Step 3. Modify the script. The required modifications follow; they are all tagged with [QND] and a short description.
The first correction is to the statement that populates all the possible targets for the given performance counter: I added a validity check and removed the expansion to all the instances.
The second point of attack is the final select, where we need to perform several tasks:
– Filter only valid instances for the targets
– Remove duplicated collection rules
– Consider only valid objects, just in case something has slipped through the previous queries
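The three tasks in the final select can be sketched in Python as a filter over the candidate rows. This is not the modified stored procedure: the row fields, the sample data and in particular the tie-break (keep the newest MP version, then the rule whose name sorts last) are assumptions made only to have something runnable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleInstanceRow:
    entity: str         # managed entity that owns the counter instance
    entity_valid: bool  # False once the entity has been retired
    counter: str
    instance: str
    rule: str           # collection rule name (hypothetical)
    mp_version: int     # version of the MP that defined the rule


def select_for_target(rows, target_entity):
    """Return one row per (counter, instance) that exists on a valid target."""
    best = {}
    for row in rows:
        # Keep only instances on the target, and only for valid objects.
        if row.entity != target_entity or not row.entity_valid:
            continue
        key = (row.counter, row.instance)
        current = best.get(key)
        # Collapse duplicates: newest MP version wins, rule name breaks ties (assumed policy).
        if current is None or (row.mp_version, row.rule) > (current.mp_version, current.rule):
            best[key] = row
    return sorted(best.values(), key=lambda r: r.instance)


COUNTER = "Avg. Disk sec/Transfer"  # illustrative counter name
ROWS = [
    RuleInstanceRow("srv1", True, COUNTER, "C:", "RuleA", 1),     # old MP version
    RuleInstanceRow("srv1", True, COUNTER, "C:", "RuleA", 2),     # after an MP upgrade
    RuleInstanceRow("srv1", True, COUNTER, "C:", "RuleB", 2),     # second collection rule
    RuleInstanceRow("srv1", True, COUNTER, "D:", "RuleA", 2),
    RuleInstanceRow("srv2", True, COUNTER, "C:", "RuleA", 2),     # other server
    RuleInstanceRow("srv-old", False, COUNTER, "C:", "RuleA", 1), # retired entity
]

result = select_for_target(ROWS, "srv1")
print([(r.instance, r.rule, r.mp_version) for r in result])
```

With this sample data the six candidate rows collapse to two, one per disk of the targeted server, so the dashboard would issue two follow-up calls instead of six.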
The net result is that perf widget dashboards now perform acceptably. For example, the selection of disk instances, which used to return more than 5K rows in more than 3 minutes, now returns 4 rows for a specific server (one for each disk) in 5 seconds. In turn this generates 4 stored procedure calls instead of the 5K of the previous implementation.
With such a design it is no mystery (now) why performance dashboards were so slow.
This posting is provided “AS IS” with no warranties, and confers no rights.