So we eventually completed our SC 2012 – Operations Manager upgrade and we have a lot of tales to tell and share.
In this installment I will try to outline changes in Sites and Gateways.
Just a little background before proceeding further.
Gateways are special purpose management servers: they are a contact point for agents, but do not talk directly with the SQL backend in fact they need to report to a standard management server. Gateways are useful with untrusted forests and for bandwidth optimizations for remote locations.
Sites are a construct OM inherited from System Center Essentials and are basically a way to group resources managed by a one or more gateways. They are an handy way to organize managed objects in different locations (both geographical or logical), plus every alert gets stamped with the generating site. Sites are useful for console scoping and alert management.
To create a site (without using the SDK) you need to create a gateway, more a site can only be created at gateway approval time, this the command line to create MySite site associated with dummygw.dom.lab gateway:
Sites are slightly changed in OM2012, nothing you can get in the UI, but the code has been reviewed and a new relationship of type Microsoft.SystemCenter.SiteContext has been added to track site memberships.
This change has an important implication (probably a bug, but not confirmed yet): you must avoid orphaned sites. Orphaned sites are sites without any gateway associated with. If you set yourself in this situation you will mess the entire sites relationships and all managed objects will be set as members of the orphaned site.
You can detect you’re in this situation checking the groups hierarchy and the site property of your alerts. The group hierarchy in a normal situation is like this:
While if you’re having the issue it will have the orphaned site as the root element:
This is where things get interesting, aren’t they?
Avoiding orphaned sites
The first thing to think about is how to avoid an orphaned site, this is pretty easy don’t (ever) delete a gateway associated with a site from Operations Console.
This implies you have to delete the gateway and its associated site with the same tool you used to create them:
And… you’ll be surprised (maybe not if you’re a fellow QND reader) it doesn’t work:
This is not a new bug, in fact the exact same issue was present in OM2007R2 sigh (Gateway decomissioning post updated). The recovery from this specific error is similar, but need some updates due to the fact the DB schema has changed.
Before going on you just remember the following database updates and queries
ARE TOTALLY UNSUPPORTED IF YOU DO IT, YOU DO IT AT YOUR OWN RISK
The following update will remove the relationship HealthServiceShoudManageEntity between the gateway and site, so that the command line can be executed successfully:
Get out of troubles
Now, let’s face the case where you deleted the gateway from the Console and you’re in troubles. Here again some SQL queries will bring the day home. First thing to know is which is the orphaned site, obviously you can get it in Console in the groups view, but chances are that you can have multiple orphaned sites, so better have a comprehensive list.
The sites list is in the following tables: MT_Site if you upgraded from OM2007, MT_Microsoft$SystemCenter$Site if you have a clean OM2012 installation. In the rest of the post I will use MT_Site in my queries ‘cause it is shorter J. I tested the query in both scenarios fresh install and upgraded DB.
The sites not listed by the following query are orphaned
Now that we have a list of orphaned sites, we have basically two options:
1. Add a dummy gateway for each site and then remove the gateway following the steps I outlined
2. Delete the site from the Operations Manager database
Option 1 should be clear by now, to implement option 2 we need to unlink the site from the discovery rule, this way Operations Manager “knows” it must remove the managed entity (the site itself) and replicate the removal to the data warehouse. To do this we must use the p_RemoveEntityFromDiscoverySourceScope stored procedure and pass the TypedManagedEntityId and DiscoverySOurceId for our site. Incidentally the DiscoverySourceId is sdk generated so you won’t find any actual discovery rule associated (Discovery table join on DiscoveryId field), this is appears to be a constant ’85AB926D-6E0F-4B36-A951-77CCD4399681′, getting the TypeManagedEntityId is then just a matter of proper table joining
I strongly suggest you execute the first SELECT statement with SELECT * to check for the returned data.
One last bug
Once you get rid of all the orphaned sites you’ll see your groups hierarchy returning flat (almost), but you’ll have an incorrect assignment for alerts for about two days. The statement above marks the objects for deletion (IsDeleted=1), but it doesn’t physically remove them. The internal grooming process will take care of it, right now it hard deletes object deleted from more than 48 hours. Sadly the p_InsertAlert stored procedura has a bug when it relates the alert to the site, in fact it doesn’t take into consideration the IsDeleted flag. The stored procedure should filter out logically deleted items, but it does not.
Supporting queries and other info
Troubleshooting the issue I used several supporting queries, I list them all for future reference
The process that creates the SiteContext relationship is “SC DAL – EDP”