Decommissioning a gateway server when used with Sites


** Updated on Feb 08, 2011 **

Our outsourced service based on OpsMgr uses a gateway infrastructure to monitor customers’ sites. Every customer has an appropriate number of gateways reporting to our data center. We used the site concept to map our customers, so every customer is a site. When we approve a gateway we specify the site name. From time to time gateways need to be replaced, moved or added.

Replacing a gateway with a new one is the main topic of this article, and no, it’s neither easy nor straightforward.

Let’s recap the scenario:

  • gateway server needs to be replaced
  • agents are assigned to the gateway at install time, no Active Directory integration
  • gateway is associated to a Site at approval time

The common action plan is the following:

  1. add a new gateway and associate it with the same Site as the old one
  2. from OpsMgr console move the agents to the newly added gateway
  3. uninstall the old gateway
  4. remove the gateway approval with microsoft.enterprisemanagement.gatewayapprovaltool and action=delete

Adding a new gateway is a no brainer, the product documentation is clear enough so I won’t spend time on it. Moving the agent to a new MS is straightforward as well, and you can do it from the UI, right? Wrong.

Let’s start with a quick background on how Management Servers (MS) and agents work together; remember that a gateway is a type of MS. When you change the MS for an agent, the old MS removes the agent from its managed agents and the new MS adds the agent to its authorized list. If an agent is not in the authorized list, the MS won’t respond to it and will log an event saying the agent is not authorized for this MS. Agents get their configuration from their assigned MS, and an MS switch is part of that configuration.

Now the question is: how can an agent learn it should refer to a new MS? In this situation it simply cannot. We have locked our agent out. The agent cannot receive any configuration from the old MS because it is no longer in that MS’s authorized list, so it never learns it should get its configuration from the new MS. I saw a few agents trying to refer to the RMS in this situation, logging something like "failing over to secondary MS (RMS)". I don’t know if this is the expected behavior to avoid this deadlock, but it certainly cannot work for gateways: gateways sit in untrusted domains, and their managed agents are not able to reach the RMS.

If you are in this situation, the first thing to do is to regain control of your agents. To do so, reset your agents to the old gateway.
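As a rough sketch, the reset can also be scripted with the same SDK calls used later in this post. It assumes the misassigned agents are exactly those now listed under the new gateway; the FQDNs are the sample names used in this post, and if the new gateway already manages other agents you would have to filter the list first.

```powershell
# Sketch only: put the agents back on the old gateway so they are authorized again.
# gw1/gw2 are the sample FQDNs used in this post; adjust to your environment.
$old = Get-ManagementServer | where {$_.Name -eq 'gw1.somedomain.it'}
$new = Get-ManagementServer | where {$_.Name -eq 'gw2.somedomain.it'}

# Agents whose primary MS was switched to the new gateway from the UI
foreach ($a in $new.GetAgentManagedComputers())
{
    # Old gateway becomes primary again, with no failover servers for now
    $a.SetManagementServers($old, $null)
}
```

Run it from the OpsMgr Command Shell, then give the agents some time to reconnect to the old gateway before going on.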

Now that we’re in control once again, let’s follow a different action plan:

  1. Add the new MS as a secondary (failover) MS of the agents.
  2. Wait for the new configuration to propagate (event 21025).
  3. Switch the primary and secondary servers so that the new MS becomes the primary one.
  4. Remove the old MS from the agents.

This can be done via Active Directory integration or via PowerShell. Since we cannot use AD, let me share a simple PowerShell script:

#old ms
$msp = Get-ManagementServer | where {$_.Name -eq 'gw1.somedomain.it'}
#new ms
$ms = Get-ManagementServer | where {$_.Name -eq 'gw2.somedomain.it'}
# generic List<ManagementServer> holding the failover servers
$failoverServers = New-Object 'System.Collections.Generic.List`1[[Microsoft.EnterpriseManagement.Administration.ManagementServer,Microsoft.EnterpriseManagement.OperationsManager,Version=6.0.4900.0,Culture=neutral,PublicKeyToken=31bf3856ad364e35]]'
$failoverServers.Add($ms)
# keep the old gateway as primary and add the new one as failover
$agents = $msp.GetAgentManagedComputers()
foreach ($a in $agents)
{
    $a.SetManagementServers($msp, $failoverServers)
}

The script adds the new gateway ($ms) as a failover server to the agents managed by the old one ($msp). You can then test the move from the UI. Once you have moved a few agents and checked they’re working as expected with the new gateway, you can move the other agents over. But first remember to give the agents enough time to get the new configuration (they must learn the new failover MS list); if you don’t, you will lock the agents out once again.
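One way to check that an agent has picked up a new configuration is to look for event 21025 in its Operations Manager event log. This is only a sketch, meant to be run locally on the agent (agents behind a gateway are usually not reachable from the data center); the one-hour window and the 500-entry limit are arbitrary assumptions.

```powershell
# Run locally on an agent: list recent 21025 events (new configuration received)
$since = (Get-Date).AddHours(-1)   # arbitrary window, adjust as needed

Get-EventLog -LogName 'Operations Manager' -Newest 500 |
    where {$_.EventID -eq 21025 -and $_.TimeGenerated -gt $since} |
    select TimeGenerated, Source, Message
```

If nothing comes back, the agent has probably not received the updated configuration yet, so it’s safer to wait before switching its primary MS.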

# wait for 21025 on every MS and agent and reset the primary GW and the failover MS list
$agents = $msp.GetAgentManagedComputers()

foreach ($a in $agents)
{
    $a.SetManagementServers($ms, $null)
}

Steps 1 and 2 completed.
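Before touching the old gateway it may be worth double-checking that nothing still points to it. A minimal sketch, reusing the sample FQDN from the scripts above and the same SDK method as the move script:

```powershell
# Sketch: list agents that still have the old gateway as their primary MS
$old = Get-ManagementServer | where {$_.Name -eq 'gw1.somedomain.it'}
$left = @($old.GetAgentManagedComputers())

"Agents still assigned to {0}: {1}" -f $old.Name, $left.Count
$left | foreach {$_.Name}
```

Keep in mind this reflects the assignment stored in the management group, not what each agent has actually received, so it complements the 21025 check rather than replacing it.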

Removing the gateway binaries from the old gateway is a matter of running the uninstall procedure; alternatively, you can simply turn off the gateway and get the same effect. In either case you must manually remove the gateway from the approved gateways list of your OpsMgr management group.

Let’s move on to step 4. Using the gateway approval tool with /Action=delete we are supposed to be able to remove the gateway:

[Screenshot: error returned by the gateway approval tool]

Interesting, isn’t it? We just removed all the agents from the gateway management space, but still we got this error, and the gateway stays there.
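For reference, the delete call looks roughly like the following. The two FQDNs are placeholders: rms.somedomain.it is a hypothetical name for the management server the gateway reports to, and gw1.somedomain.it is the sample gateway name used in the scripts above.

```powershell
# Run from the folder containing the tool, on a management server
.\Microsoft.EnterpriseManagement.GatewayApprovalTool.exe `
    /ManagementServerName=rms.somedomain.it `
    /GatewayName=gw1.somedomain.it `
    /Action=Delete
```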

Using a SQL trace, it’s simple to track down the steps performed by the tool:

  1. it looks for a Microsoft.SystemCenter.GatewayManagementServer with the given gateway name
  2. it looks for relationships of type Microsoft.SystemCenter.HealthServiceCommunication and Microsoft.SystemCenter.HealthServiceShouldManageEntity related to the gateway

Replaying the queries, it becomes evident that the gateway still has a relationship of type HealthServiceShouldManageEntity with the associated Site. To remove the gateway we must remove that relationship, but we have no supported way to do that, at least as far as I know. So this is my *unsupported* way to decommission a gateway when it has been associated with a Site.

First, check that we really are in the situation where the only relationship still in place is the one with the Site (if not, we must check what’s left). For this reason I split the work into two queries: the first returns the list of relationships and the second marks the relationship as deleted:

-- replace gatewayfqdn with the FQDN of the gateway to remove
declare @nodeHS nvarchar(255)

set @nodeHS = N'Microsoft.SystemCenter.HealthService:gatewayfqdn'

SELECT DSR.DiscoverySourceId, DSR.RelationshipId, RGV.SourceMonitoringObjectFullName, RGV.TargetMonitoringObjectFullName,
    RGV.MonitoringRelationshipClassId
FROM dbo.RelationshipGenericView RGV
inner join dbo.DiscoverySourceToRelationship DSR on DSR.RelationshipId = RGV.Id
WHERE RGV.[MonitoringRelationshipClassId] = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceShouldManageEntity()
  AND RGV.[IsDeleted] = 0
  AND RGV.[SourceMonitoringObjectId] IN (select BaseManagedEntityId from BaseManagedEntity where FullName = @nodeHS)

If we have just one relationship left, the one with the Site, we can mark it as deleted (unsupported, you know):

declare @utc datetime
set @utc = GETUTCDATE()
exec dbo.p_RemoveRelationshipFromDiscoverySourceScope @RelationshipID='RelationshipID returned from previous query',
    @DiscoverySourceId='Discovery Source GUID returned from previous query', @TimeGenerated=@utc

In conclusion, decommissioning and replacing a gateway server has some caveats; if you add a Site relationship to the equation, you need to perform the above hack.

Let me know if this works for you.

For future reference I list the queries performed by the gateway approval tool:

exec sp_executesql N'-- MTV_SelectProperty_c1721bcc-35f7-5a49-5d5f-6880687c3d48 <ManagedTypeId,PrincipalName0>

SELECT [MTV_HealthService].[BaseManagedEntityId], [MTV_HealthService].[DisplayName_55270A70_AC47_C853_C617_236B0CFF9B4C], [MTV_HealthService].[ActionAccountIdentity], [MTV_HealthService].[ActiveDirectoryManaged], [MTV_HealthService].[AuthenticationName], [MTV_HealthService].[CreateListener], [MTV_HealthService].[HeartbeatEnabled], [MTV_HealthService].[HeartbeatInterval], [MTV_HealthService].[InstalledBy], [MTV_HealthService].[InstallTime], [MTV_HealthService].[IsAgent], [MTV_HealthService].[IsGateway], [MTV_HealthService].[IsManagementServer], [MTV_HealthService].[IsManuallyInstalled], [MTV_HealthService].[IsRHS], [MTV_HealthService].[MaximumQueueSize], [MTV_HealthService].[MaximumSizeOfAllTransferredFiles], [MTV_HealthService].[PatchList], [MTV_HealthService].[Port], [MTV_HealthService].[ProxyingEnabled], [MTV_HealthService].[RequestCompression], [MTV_HealthService].[Version], [MTV_HealthService].[AutoApproveManuallyInstalledAgents_9189A49E_B2DE_CAB0_2E4F_4925B68E335D], [MTV_HealthService].[ManagementServerSCP_9189A49E_B2DE_CAB0_2E4F_4925B68E335D], [MTV_HealthService].[NumberOfMissingHeartBeatsToMarkMachineDown_9189A49E_B2DE_CAB0_2E4F_4925B68E335D], [MTV_HealthService].[ProxyAddress_9189A49E_B2DE_CAB0_2E4F_4925B68E335D], [MTV_HealthService].[ProxyPort_9189A49E_B2DE_CAB0_2E4F_4925B68E335D], [MTV_HealthService].[RejectManuallyInstalledAgents_9189A49E_B2DE_CAB0_2E4F_4925B68E335D], [MTV_HealthService].[UseProxyServer_9189A49E_B2DE_CAB0_2E4F_4925B68E335D], [MTV_HealthService].[WebConsoleUrl_F9069CA9_A790_E274_0C2C_DE210E57F67C], [MTV_HealthService].[SiteId_CECAAFDA_33B6_B628_0CDA_445E21B7291D], [MTV_HealthService].[SiteName_CECAAFDA_33B6_B628_0CDA_445E21B7291D], [MTV_HealthService].[PrincipalName] FROM dbo.[MTV_HealthService]

INNER JOIN dbo.[TypedManagedEntity] AS TME ON TME.[BaseManagedEntityId] = [MTV_HealthService].[BaseManagedEntityId]

WHERE (MTV_HealthService.[PrincipalName] LIKE @PrincipalName0) AND (((TME.[ManagedTypeId] = @ManagedTypeId)))',N'@ManagedTypeId uniqueidentifier,@PrincipalName0 ntext',@ManagedTypeId='C1721BCC-35F7-5A49-5D5F-6880687C3D48',@PrincipalName0=N'gateways.somedomain.it'

-- TypeID 'C1721BCC-35F7-5A49-5D5F-6880687C3D48' = Microsoft.SystemCenter.GatewayManagementServer

exec sp_executesql N'-- RelationshipWithCriteria <TargetEntityId0,IsDeleted0,RelationshipTypeId0>

SELECT [Relationship].[RelationshipId], [Relationship].[TargetEntityId], [Relationship].[IsDeleted] FROM dbo.Relationship

WHERE Relationship.[TargetEntityId] = @TargetEntityId0 AND Relationship.[IsDeleted] = @IsDeleted0 AND Relationship.[RelationshipTypeId] = @RelationshipTypeId0',N'@TargetEntityId0 uniqueidentifier,@IsDeleted0 bit,@RelationshipTypeId0 uniqueidentifier',@TargetEntityId0='FC75E426-26C5-B237-FE9F-F14F540CCB0E',@IsDeleted0=0,@RelationshipTypeId0='37848E16-37A2-B81B-DAAF-60A5A626BE93'

-- Relationship Microsoft.SystemCenter.HealthServiceCommunication

exec sp_executesql N'-- RelationshipWithCriteria <SourceEntityId0,IsDeleted0,RelationshipTypeId0>

SELECT [Relationship].[RelationshipId], [Relationship].[SourceEntityId], [Relationship].[IsDeleted] FROM dbo.Relationship

WHERE Relationship.[SourceEntityId] = @SourceEntityId0 AND Relationship.[IsDeleted] = @IsDeleted0 AND Relationship.[RelationshipTypeId] = @RelationshipTypeId0',N'@SourceEntityId0 uniqueidentifier,@IsDeleted0 bit,@RelationshipTypeId0 uniqueidentifier',@SourceEntityId0='FC75E426-26C5-B237-FE9F-F14F540CCB0E',@IsDeleted0=0,@RelationshipTypeId0='2F71C644-E092-B80A-040B-5C81BA1EC353'

-- Relationship Microsoft.SystemCenter.HealthServiceShouldManageEntity

– Daniele

This posting is provided "AS IS" with no warranties, and confers no rights.

  1. #1 by Raphael Burri on March 20, 2012 - 1:12 pm

    Thanks Daniele; I’d read the post when it was new and then never used it – but now it just saved me some time as it’s still an issue on SCOM2012 (RC at least). Since the DB has slightly changed I needed to tweak the query slightly, though.

    Here’s the “not so elegant” query to list the relationships between ‘Site’ and ‘Gateway’ on SCOM2012:

    declare @nodeHS nvarchar(255)

    set @nodeHS = N'Microsoft.SystemCenter.HealthService:___GATEWAY_FQDN_HERE___'

    SELECT
    *
    FROM dbo.Relationship REL
    WHERE
    (REL.[IsDeleted] = 0)
    AND (REL.[SourceEntityId] IN (select BaseManagedEntityId from BaseManagedEntity where FullName = @nodeHS))
    AND REL.TargetEntityId in (
    -- get any sites
    select BaseManagedEntityId from dbo.MTV_Microsoft$SystemCenter$Site
    )

  2. #3 by Daniele Muscetta on December 13, 2009 - 9:26 am

    And once again (more and more often…): hooray for reverse engineering :-)
    Great job, Daniele!
