This is an uncommon scenario, but it does happen: DCs get demoted and not reinstalled. What would you expect from OpsMgr? I would expect it to simply catch up with the new situation and remove all references to the DC roles from the computer namespace. But, as you can see in many newsgroup posts, this is not the case. This has been on my shortlist of issues to investigate for a while. (By the way, if you are in this situation, you can override the DC role discovery to disable it for the given instance.)
So I started my investigation by following the discovery chain for a generic Windows 2003 AD domain controller. This is what I found.
The DC role discovery (Microsoft.Windows.Server.2003.AD.DomainControllerRole.DCDiscovery) is targeted to Microsoft.Windows.Server.DC.Computer; the discovery of the latter is in turn targeted to Microsoft.Windows.Computer. So a Windows Computer is first checked for being a generic DC (via a WMI discovery) and then, if this is the case, the exact role and version are discovered by a version-specific discovery script.
<Discovery ID="Microsoft.SystemCenter.DiscoverWindowsServerDCComputer" Comment="Discover Windows Domain Controller Computers" Enabled="true" Target="Windows!Microsoft.Windows.Computer" ConfirmDelivery="false" Remotable="true" Priority="Normal">
<Discovery ID="Microsoft.Windows.Server.2003.AD.DomainControllerRole.DCDiscovery" Enabled="onEssentialMonitoring" Target="Windows!Microsoft.Windows.Server.DC.Computer" ConfirmDelivery="false" Remotable="false" Priority="Normal">
- a Microsoft.Windows.Server.DC.Computer is a Windows Server computer, which in turn is a Windows Computer
- a Microsoft.Windows.Server.2003.AD.DomainControllerRole is a Computer Role (inherits from Microsoft.Windows.ComputerRole and so inherits the hosting relationship between Windows Computer and ComputerRole)
<RelationshipType ID="Microsoft.Windows.ComputerHostsComputerRole" Accessibility="Public" Abstract="false" Base="System!System.Hosting">
  <Source>Microsoft.Windows.Computer</Source>
  <Target>Microsoft.Windows.ComputerRole</Target>
</RelationshipType>
This means that I should have, for my ghost DC, a Windows Server DC Computer and a Domain Controller (role), but this is not the case. The Windows Server DC Computer is gone, but the Domain Controller role is still there with all the related and derived services. I always assumed that when a hosting instance is deleted, all the hosted instances are removed as well. In fact, if I just remove the Windows Server DC Computer, the DC role discovery loses its target; if the discovery doesn’t run, it never gets a chance to remove the instances it discovered. Can this be the case? If so, we have a serious issue with discovery: when a hosting instance is removed, the hosted instances do not get a chance to be un-discovered.
To check this I built a simple MP that uses the registry provider to discover this class hierarchy:
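The suspected mechanism can be sketched as a toy model in Python. This is not OpsMgr code and says nothing about how the agent is really implemented: ToyAgent, add_discovery and simulate_demotion are hypothetical names, and the model simply assumes that a discovery runs only while its target instance exists and that removing a host does not cascade to what it hosts.

```python
# Toy model of the suspected mechanism. This is not OpsMgr code:
# ToyAgent, add_discovery and simulate_demotion are hypothetical names.


class ToyAgent:
    def __init__(self, base_instance):
        # The base instance (the Windows Computer) is always present.
        self.instances = {base_instance}
        self.discoveries = []  # list of (target, discovered, probe)

    def add_discovery(self, target, discovered, probe):
        """Register a discovery that runs against `target`.

        `probe` is a zero-argument callable returning True when the
        `discovered` instance should exist.
        """
        self.discoveries.append((target, discovered, probe))

    def run_cycle(self):
        """Run every discovery whose target instance currently exists."""
        for target, discovered, probe in self.discoveries:
            if target not in self.instances:
                # Assumption: the workflow is unloaded, so it can
                # neither add nor remove the instance it discovered.
                continue
            if probe():
                self.instances.add(discovered)
            else:
                self.instances.discard(discovered)


def simulate_demotion():
    """Return the instance lists before and after the DC is demoted."""
    state = {"is_dc": True}
    agent = ToyAgent("WindowsComputer")
    agent.add_discovery("WindowsComputer", "DCComputer", lambda: state["is_dc"])
    agent.add_discovery("DCComputer", "DCRole", lambda: state["is_dc"])

    agent.run_cycle()
    before = sorted(agent.instances)

    state["is_dc"] = False  # the DC is demoted
    agent.run_cycle()
    agent.run_cycle()  # further cycles change nothing
    after = sorted(agent.instances)
    return before, after
```

In this model the role instance survives the demotion because the only workflow that could remove it lost its target first. Whether the real agent behaves this way is exactly what the tests in this post try to establish.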
- ClassA is a Windows Computer
- ClassB is a Computer Role and the discovery is targeted to ClassA
- ClassC is a Computer Role and the discovery is targeted to ClassB
- ClassD is a generic Entity hosted by Windows Computer, the discovery is targeted to Windows Computer
- ClassE is a generic Entity hosted by ClassD, the discovery is targeted to ClassD
- ClassF is a Windows Computer
- ClassG is a generic Entity hosted by ClassF, the discovery is targeted to ClassF
The discovery for the parent classes (A, D, F) runs every 60 seconds; the discovery for the "child" (not strictly) classes runs every 301 seconds. This way I’m reasonably sure that the parent class discoveries get a chance to run and remove instances before the chained discoveries run (or before the agent recognizes it has no parent instances and unloads the related workflows). You can find this mock MP at this link QND.Chained.Disco.Test.xml.
Test Case 1. ClassA and B (with C as an option) simulate the DC discovery behavior.
Test Case 2. ClassD and E use a hosting relationship independent of the Windows Computer class, so I can remove that dependency.
Test Case 3. ClassF and G use a Windows Computer host and a generic entity as the hosted class, so I can remove any dependency on the ComputerRole class.
Test Case 1
The MP discovered the three classes, then I removed them one at a time starting from ClassC up to ClassA, just to be sure the removal works. I waited for another discovery cycle and then removed ClassA… ClassB and ClassC remained (I waited half an hour to be sure), and inspecting the running workflows on the agent showed that the ClassB discovery targeted to ClassA was (obviously) gone. Bad. If I remove ClassB (with ClassA still there), ClassC goes away as well. Fine.
Test Case 2
The MP discovered the two classes. If I remove ClassD, ClassE is removed. Fine.
Test Case 3
The MP discovered both classes. If I remove ClassF, ClassG remains there. Bad.
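The outcomes of the three test cases can be tabulated in a few lines of Python to make the pattern easier to see. This is only a record of what I observed, not a model of OpsMgr; the Observation type and the function name are hypothetical, and the base classes are the ones listed in the hierarchy above.

```python
from dataclasses import dataclass

# Base class of each test class, as listed in the hierarchy.
BASE_CLASS = {
    "ClassA": "Windows Computer",
    "ClassB": "Computer Role",
    "ClassC": "Computer Role",
    "ClassD": "Entity",
    "ClassE": "Entity",
    "ClassF": "Windows Computer",
    "ClassG": "Entity",
}


@dataclass(frozen=True)
class Observation:
    test_case: int
    removed: str        # class whose instance was un-discovered
    dependents: tuple   # classes discovered further down the chain
    orphaned: bool      # True if the dependents stayed behind


# What was observed in the three test cases.
OBSERVATIONS = [
    Observation(1, "ClassA", ("ClassB", "ClassC"), True),
    Observation(1, "ClassB", ("ClassC",), False),
    Observation(2, "ClassD", ("ClassE",), False),
    Observation(3, "ClassF", ("ClassG",), True),
]


def orphans_follow_windows_computer(observations=OBSERVATIONS):
    """Check that orphans appear exactly when the removed class
    derives from Windows Computer."""
    return all(
        obs.orphaned == (BASE_CLASS[obs.removed] == "Windows Computer")
        for obs in observations
    )
```

With only four observations this is a pattern, not a proof, but it holds for every removal I tried.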
Houston, we've got a problem! From my tests it seems that if a hosting class instance (inherited from Windows Computer) is un-discovered, the hosted class instances are not. To me this doesn’t make sense; in fact, if I have hosting relationships between different classes not involving Windows Computer, everything works as I expect. The behavior could be random and may depend on the time delta between the hosting class discovery (ClassA in my example or Windows Server DC Computer for DCs) and the hosted class discovery (ClassB and ClassC in my sample, Domain Controller Role for DCs). I didn’t dig that far. Basically, what happens is that the hosting class is removed while the hosted classes are not, the discovery targeted to the host is unloaded, and you get the orphans.