VMM wrong Virtual Switch discovery


** Updated Feb 10, 2011 **

Microsoft eventually fixed the issue in the February 2011 VMM hotfix rollup: http://support.microsoft.com/kb/2492980

Over the last month I’ve been engaged in some serious System Center Virtual Machine Manager (VMM) debugging. It all started when I could no longer manage a virtual switch (VS) from VMM, and it ended with filing a bug against VMM.

This post is going to be quite long: it starts from the evidence of the problem and drills down to VMM tracing and winrm debugging, with some registry hacking as well. So, as usual, I’ll state my conclusions up front.

It all began with a network change on the Hyper-V cluster: two separate NICs were joined in a team. One of these two NICs was bound to the virtual switch, so the VS binding became invalid and the switch was reconfigured to bind to the newly created team. On the Hyper-V side this went perfectly fine; on the VMM side it broke the VS configuration on three nodes out of four (Hyper-V is in a failover cluster configuration). VMM blocked any virtual switch change, terminating with an error.

Long story short, it turned out that switch discovery in VMM in the presence of network teaming just sucks or, if you prefer, just doesn’t work. My hypothesis is that this behavior will emerge every time you have multiple NICs with the same MAC address on a Hyper-V host. Mind you, Hyper-V works perfectly; it is just VMM’s fault. Right now I don’t have a solution and I’m working with PSS to file a bug. If anything new arises I’ll update this post.

** August 31st Update **

Eventually MS PSS confirmed the bug; they anticipate that a QFE will be released this fall (October? November?). At the moment we can only wait.

The issue

Let’s start from the VMM behavior: for any change to the virtual switch, the errors returned were 2915 and 12700.

image

image

I immediately switched to Hyper-V Manager, but there things were working fine.

Diagnosing the issue further, I found a curious mismatch between the VS configuration in Hyper-V and in VMM (obviously Hyper-V is right and VMM is wrong).

the VMM view

image

As you can see, for Hyper-V the VS1-Inside switch is bound to Team #1, while for VMM it is bound to NIC5.

Debugging the VMM discovery and tracing the winrm calls, I reached a point where two external bindings were returned instead of just one. I will skip all the tracing details; what VMM does is something similar to this:

  1. enumerate all possible external ports ([SELECT * FROM Msvm_ExternalEthernetPort WHERE EnabledState = 2])

  2. enumerate all the switches (wmi/root/virtualization/Msvm_VirtualSwitch)
  3. enumerate all ports
  4. enumerate all external ports (wmi/root/virtualization/Msvm_GlobalEthernetPortSAPImplementation)
  5. for every external port, even if it is not associated with any VS, read all the properties
  6. then go on to enumerate the NICs and get their properties (Win32_NetworkAdapterConfiguration, Win32_NetworkAdapter, Win32_NetworkAdapterSetting)
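The steps that name a WMI class can be sketched as winrm command lines. This is only an illustration of the sequence, not what VMM literally executes: the host URL is a placeholder, the `root/virtualization` namespace for Msvm_ExternalEthernetPort and `root/cimv2` for the Win32 classes are assumptions based on the other queries in this post, and step 3 is left out because its class is not named here. The function names are made up for this example.

```python
# Sketch only: builds winrm command lines mirroring the discovery steps above.
# The helper names and the host URL are hypothetical placeholders.

def enumerate_cmd(namespace, class_name, host_url, wql=None):
    """Return a 'winrm e' command for a WMI class, optionally with a WQL filter."""
    if wql is None:
        return f"winrm e wmi/root/{namespace}/{class_name} -r:{host_url}"
    # With a WQL filter, winrm expects the resource to be '<namespace>/*'.
    return f'winrm e wmi/root/{namespace}/* -filter:"{wql}" -r:{host_url}'


# (namespace, class, optional WQL) for the steps whose class is known.
DISCOVERY_STEPS = [
    ("virtualization", "Msvm_ExternalEthernetPort",
     "SELECT * FROM Msvm_ExternalEthernetPort WHERE EnabledState = 2"),
    ("virtualization", "Msvm_VirtualSwitch", None),
    ("virtualization", "Msvm_GlobalEthernetPortSAPImplementation", None),
    ("cimv2", "Win32_NetworkAdapterConfiguration", None),
    ("cimv2", "Win32_NetworkAdapter", None),
    ("cimv2", "Win32_NetworkAdapterSetting", None),
]


def build_discovery_commands(host_url):
    """Return the list of winrm commands for the known discovery steps."""
    return [enumerate_cmd(ns, cls, host_url, wql)
            for ns, cls, wql in DISCOVERY_STEPS]


if __name__ == "__main__":
    for command in build_discovery_commands("http://XXX:80"):
        print(command)
```

Running the printed commands one by one against a Hyper-V node and comparing the output between a working and a broken node is a reasonable way to spot where the two diverge.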

The first thing I noticed is that in step #5, when I replayed the winrm query, two Ethernet ports were returned. Digging deeper and querying for Msvm_SwitchLANEndpoint, I obtained two endpoints: the team and the NIC5 one.

It seemed I had an orphan "Msvm_SwitchLANEndpoint". I looked for a way to delete it, but found no supported way to get rid of this orphan. So I had to delete the corresponding registry key under HKLM\System\CurrentControlSet\Services\VMSMP\Parameters\NICList on the Hyper-V node. Alas, as it turned out, this was just part of the issue: after I removed the orphan endpoint I could manage a little more, but not everything; basically I had just moved the target. Now I could accomplish simple VS operations, but could not change the binding to another NIC. This is what I got in my test plan:

1) Deleted the VS – success

2) Created a new VS bound to the NIC team – failure

image

image

image

Error (12700)
VMM cannot complete the Hyper-V operation on the XXXXX server because of the error: <INSTANCE CLASSNAME="Msvm_Error"><PROPERTY NAME="CIMStatusCode" TYPE="uint32"><VALUE>1</VALUE></PROPERTY><PROPERTY NAME="CIMStatusCodeDescription" PROPAGATED="true" TYPE="string"></PROPERTY><PROPERTY NAME="ErrorSource" PROPAGATED="true" TYPE="string"></PROPERTY><PROPERTY NAME="ErrorSourceFormat" TYPE="uint16"><VALUE>0</VALUE></PROPERTY><PROPERTY NAME="ErrorType" TYPE="uint16"><VALUE>4</VALUE></PROPERTY><PROPERTY NAME="Message" TYPE="string"><VALUE>Switch set up failed, name=’VS1-Inside-7f31205c-6c3b-4382-87f8-2036fff33435′, external port=’External-462f92e8-6a14-4dcc-bf49-7c3eca6d3f67′, internal port=’Internal-462f92e8-6a14-4dcc-bf49-7c3eca6d3f67′, NIC='{871FF47B-12BB-41FF-92EA-14450DB3A8C0}’, internal name=’Internal-462f92e8-6a14-4dcc-bf49-7c3eca6d3f67′, internal friendly name=’Internal-462f92e8-6a14-4dcc-bf49-7c3eca6d3f67′, error=2147500037, mof code=32786.</VALUE></PROPERTY><PROPERTY.ARRAY NAME="MessageArguments" TYPE="string"><VALUE.ARRAY><VALUE>VS1-Inside-7f31205c-6c3b-4382-87f8-2036fff33435</VALUE><VALUE>External-462f92e8-6a14-4dcc-bf49-7c3eca6d3f67</VALUE><VALUE>Internal-462f92e8-6a14-4dcc-bf49-7c3eca6d3f67</VALUE><VALUE>{871FF47B-12BB-41FF-92EA-14450DB3A8C0}</VALUE><VALUE>Internal-462f92e8-6a14-4dcc-bf49-7c3eca6d3f67</VALUE><VALUE>Internal-462f92e8-6a14-4dcc-bf49-7c3eca6d3f67</VALUE><VALUE>2147500037</VALUE><VALUE>32786</VALUE></VALUE.ARRAY></PROPERTY.ARRAY><PROPERTY NAME="MessageID" TYPE="string"><VALUE>14070</VALUE></PROPERTY><PROPERTY NAME="OtherErrorSourceFormat" PROPAGATED="true" TYPE="string"></PROPERTY><PROPERTY NAME="OtherErrorType" PROPAGATED="true" TYPE="string"></PROPERTY><PROPERTY NAME="OwningEntity" TYPE="string"><VALUE>Microsoft-Windows-Hyper-V-Network</VALUE></PROPERTY><PROPERTY NAME="PerceivedSeverity" TYPE="uint16"><VALUE>3</VALUE></PROPERTY><PROPERTY NAME="ProbableCause" TYPE="uint16"><VALUE>0</VALUE></PROPERTY><PROPERTY NAME="ProbableCauseDescription" 
PROPAGATED="true" TYPE="string"></PROPERTY><PROPERTY.ARRAY NAME="RecommendedActions" PROPAGATED="true" TYPE="string"></PROPERTY.ARRAY></INSTANCE>
(Unknown error (0x8012))

3) Created the new VS in Hyper-v then refreshed the Hyper-v host in VMM – same story

image

4) Changed VS configuration to internal only – successful

image

5) Changed VS configuration to the NIC team again – failure (same error as in step 2)

6) Changed VS configuration to Internal Network – a bad, bad failure even though the job completed successfully: a new NIC, "Local Area Connection 3", was created on the host, and this invalidated the cluster configuration. Recovery step: bind the VS to private from VMM; if this is done from Hyper-V Manager, the new NIC becomes orphaned.

image

As you can see removing the orphan was just part of the story.

From here on I can only speculate:

  1. I fear VMM matches the NIC bound to the VS to the physical NIC by MAC address
  2. If you have multiple NICs with the same MAC address, the last one returned by the WMI query is assumed to be the right one and is associated with the VS. Clearly this is not the correct way to discover VSs: since the virtual switch class reports the NIC ID, that should be used instead.
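To make the speculation concrete, here is a small sketch of the suspected behavior. It is not VMM code: the `Nic` type, the two matching functions, the device IDs and the MAC address are all made up for illustration. It only shows why a "last match by MAC wins" lookup picks a team member when the team and its members share a MAC, while a lookup by NIC ID does not.

```python
from dataclasses import dataclass


@dataclass
class Nic:
    device_id: str  # hypothetical NIC identifier (a GUID in real life)
    name: str
    mac: str


def match_by_mac_last_wins(nics, mac):
    """Suspected behavior: keep overwriting, so the last NIC with this MAC wins."""
    match = None
    for nic in nics:
        if nic.mac.lower() == mac.lower():
            match = nic
    return match


def match_by_device_id(nics, device_id):
    """Unambiguous lookup: the NIC ID is unique even when MACs are shared."""
    for nic in nics:
        if nic.device_id == device_id:
            return nic
    return None


# Made-up inventory in the order a query might return it on a broken node:
# the team and its members all report the same MAC address.
SHARED_MAC = "00-11-22-33-44-55"
nics = [
    Nic("{TEAM-ID}", "Team #1", SHARED_MAC),
    Nic("{NIC4-ID}", "NIC4", SHARED_MAC),
    Nic("{NIC5-ID}", "NIC5", SHARED_MAC),
]

if __name__ == "__main__":
    print(match_by_mac_last_wins(nics, SHARED_MAC).name)  # NIC5
    print(match_by_device_id(nics, "{TEAM-ID}").name)     # Team #1
```

With this ordering the MAC-based lookup lands on NIC5, which is what VMM shows; if the team happened to be returned last, the same lookup would look correct, which would fit the one working node.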

As you can see in the following snippet the last returned NIC for the team is #5

image

On the working node the last returned NIC for the team is the Team NIC.

How to replay VMM logs

As you can see from the snippets in this post, VMM uses winrm to perform remote WMI queries. Once you have the log, it is really easy to replay a query and get the results. For example, a log entry like this:

[844] 034C.1728::04/01-15:51:56.942#04:WsmanAPIWrapper.cs(1163): WSMAN: URL: [http://XXX:80] Verb: [ENUMERATE], resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_NetworkAdapterSetting], wqlQuery: []

can be translated into: winrm e wmi/root/cimv2/Win32_NetworkAdapterSetting -r:http://XXX:80

So basically it is possible to replay VMM discovery step by step with this simple technique.
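The translation can also be scripted. The sketch below assumes the log entries look like the one shown above and only handles ENUMERATE entries with an empty wqlQuery; the function name and the regular expression are my own, and other verbs or entries carrying a WQL query are deliberately skipped rather than guessed at.

```python
import re

# Prefix of the WS-Management resource URI that winrm lets you omit.
RESOURCE_PREFIX = "http://schemas.microsoft.com/wbem/wsman/1/"

# Matches the WSMAN part of a VMM trace line like the example above.
LOG_PATTERN = re.compile(
    r"URL: \[(?P<url>[^\]]*)\] "
    r"Verb: \[(?P<verb>[^\]]*)\], "
    r"resource: \[(?P<resource>[^\]]*)\], "
    r"wqlQuery: \[(?P<wql>[^\]]*)\]"
)


def log_entry_to_winrm(line):
    """Turn a VMM WSMAN trace line into a winrm command, or None if unsupported."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    # Only plain ENUMERATE calls without a WQL query are handled in this sketch.
    if match.group("verb") != "ENUMERATE" or match.group("wql"):
        return None
    resource = match.group("resource")
    if resource.startswith(RESOURCE_PREFIX):
        resource = resource[len(RESOURCE_PREFIX):]
    return f"winrm e {resource} -r:{match.group('url')}"


if __name__ == "__main__":
    entry = (
        "[844] 034C.1728::04/01-15:51:56.942#04:WsmanAPIWrapper.cs(1163): "
        "WSMAN: URL: [http://XXX:80] Verb: [ENUMERATE], resource: "
        "[http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/"
        "Win32_NetworkAdapterSetting], wqlQuery: []"
    )
    print(log_entry_to_winrm(entry))
```

Feeding a whole trace file through this line by line and dropping the `None` results gives the list of commands needed to replay the plain enumerations of a discovery run.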

References

How to collect traces in System Center Virtual Machine Manager http://support.microsoft.com/kb/970066

WMI tracing on Windows 2008 http://blogs.technet.com/askperf/archive/2008/03/04/wmi-debug-logging.aspx

winrm samples http://blogs.technet.com/jonjor/archive/2009/01/09/winrm-windows-remote-management-troubleshooting.aspx

– Daniele

This posting is provided "AS IS" with no warranties, and confers no rights.
