Quae Nocent Docent

What hurts, teaches – Ordinary tales from management trenches

The case of post dated monitoring data


A few weeks ago a customer of ours was hit by a time issue: the internal time reference server jumped to anno Domini 2020. Since this happened during the weekend it took a few hours to fix, and in the meantime the OpsMgr agents did their job, posting data to the OpsMgr infrastructure. The net result was a bunch of unhealthy monitors with the last changed time set to 2020. Annoying, I thought: I need to reset all the broken monitors via PowerShell. Alas, this was not only annoying but blocking as well: the monitors wouldn't reset, nor would their state change in any case.

First consideration: the OpsMgr data access layer should block any data insertion with too large a time skew from its own date and time reference (10 to 15 minutes should be the maximum threshold, IMHO).
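A minimal sketch of the kind of guard suggested above, written in Python only for illustration. The function name, the 15 minute default and the choice to reject only future-dated samples are my assumptions; nothing here is OpsMgr code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold, taken from the 10 to 15 minute range suggested above.
MAX_SKEW = timedelta(minutes=15)


def is_acceptable(time_generated, server_now, max_skew=MAX_SKEW):
    """Return True unless the sample is dated too far ahead of the server clock.

    Only future skew is rejected (an assumption): late data from an agent
    that was offline for a while is still considered legitimate.
    """
    return time_generated - server_now <= max_skew


now = datetime(2009, 11, 14, 12, 0, tzinfo=timezone.utc)
print(is_acceptable(now + timedelta(minutes=5), now))                 # small skew
print(is_acceptable(datetime(2020, 1, 1, tzinfo=timezone.utc), now))  # post-dated
```

With a check like this at insertion time, the post-dated 2020 samples would have been dropped instead of poisoning the state tables.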

Anyway, time for some reverse engineering once again.

First of all, a few queries to identify the bogus state data. Two tables are involved here: StateChangeEvent and State. The former collects all state change events, the ones you can check in Health Explorer; the latter reports the last known state for any given managed entity / monitor pair.

Easy enough, let's check for all data updated after December 31st, 2009:

select * from dbo.StateChangeEvent
where TimeGenerated > '12-31-2009'

select ME.FullName, M.MonitorName, State.* from dbo.State with (nolock)
inner join dbo.BaseManagedEntity ME with (nolock) on ME.BaseManagedEntityId=State.BaseManagedEntityId
inner join dbo.Monitor M with (nolock) on M.MonitorId=State.MonitorId
where State.LastModified > '12-31-2009'

Obviously my first thought was: let's modify the LastModified field. But here we're in the unsupported realm, and before any modification a further analysis of the inner workings is needed. The core stored procedure for any state change turned out to be:

PROCEDURE [dbo].[p_StateChangeEventProcess]
(      
    @BaseManagedEntityId uniqueidentifier,
    @EventOriginId uniqueidentifier,
    @MonitorId uniqueidentifier,
    @NewHealthState tinyint,
    @OldHealthState tinyint,    
    @TimeGenerated datetime,
    @Context nvarchar(max) = NULL
)

which in turn calls

PROCEDURE [dbo].[p_StateUpsert]
(
    @BaseManagedEntityId uniqueidentifier,
    @MonitorId uniqueidentifier,
    @HealthState tinyint,
    @LastModified datetime
)

and if p_StateUpsert returns successfully it inserts a row into the StateChangeEvent table.

p_StateUpsert, among other checks, compares the date and time of the state update with the last time the monitor state was updated: if the update is earlier in the timeline, the state change is discarded. This makes sense, since state changes are not guaranteed to arrive in chronological order. In our case, though, with LastModified set to 2020, every genuine state change looks older than the stored one and is discarded. Without a control on time skew we can have a denial of service here.
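The effect can be reproduced with a toy model of the check described above. This is a sketch in Python of the behavior as I understand it, not the code of the stored procedure; the class and its names are hypothetical.

```python
from datetime import datetime


class StateStore:
    """Toy stand-in for the State table: one row per (entity, monitor) pair."""

    def __init__(self):
        self.rows = {}  # (entity_id, monitor_id) -> (health_state, last_modified)

    def upsert(self, entity_id, monitor_id, health_state, last_modified):
        """Store the state unless it is older than the one already recorded."""
        key = (entity_id, monitor_id)
        current = self.rows.get(key)
        if current is not None and last_modified < current[1]:
            return False  # out-of-order state change: discarded
        self.rows[key] = (health_state, last_modified)
        return True


store = StateStore()
# A post-dated state change is accepted, because nothing checks the skew.
print(store.upsert("entity", "monitor", 3, datetime(2020, 1, 1)))    # True
# Every later, correctly dated state change is now treated as stale.
print(store.upsert("entity", "monitor", 1, datetime(2009, 11, 15)))  # False
```

Until the stored timestamp is brought back to a sane value, no genuine state change for that monitor can get through.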

Anyway from my analysis the LastModified field can be safely changed (still unsupported realm):

update dbo.StateChangeEvent set TimeGenerated=TimeAdded
where TimeGenerated > '12-31-2009'

update dbo.State set LastModified = GETUTCDATE()
where State.LastModified > '12-31-2009'

From this change on, state changes will start flowing in again.

Issues: the monitors need to be reset, or you must wait for the first state change for them to be updated. Alternatively you could use Marius' utility (Tool - OpsMgr 2007 - RuntimeHealthExplorer), or a PowerShell script to reset all the postdated monitors. The basic statements are:

$obj = Get-MonitoringObject -id:<<basemanagedentityid from previous queries>>

$obj.ResetMonitoringState([guid]'<<monitorid from previous queries>>')

 

Last warning: if you reset the monitor from the UI starting from a Watcher view, the new health state won't roll up (at least in my environment). For example, if you reset any unit monitor related to a HealthService starting from the Health Explorer of the related HealthServiceWatcher, the unit monitor will reset but the new status won't roll up. If you do the same reset from the HealthService Health Explorer view, it will roll up.

– Daniele

This posting is provided "AS IS" with no warranties, and confers no rights.

Written by Daniele Grandini

November 16, 2009 at 8:51 am

Posted in Bug, Debugging, SCOM

A rollup you don’t want to miss


If you're still running OpsMgr 2007 SP1 you definitely need to apply the rollup package that was released yesterday (finally): Update Rollup for Operations Manager 2007 Service Pack 1 (KB971541).

It should put an end to the patching blues I complained about so many times.

Hopefully, from what I know, a similar rollup will be delivered for R2 sooner rather than later. Some issues fixed in SP1 are still present in R2.

In the end, my advice is: if you're still on SP1, it is still a good idea to move to R2.

– Daniele


Written by Daniele Grandini

November 7, 2009 at 12:02 pm

Posted in KB, SCOM

Failover cluster monitoring – quick insight


While I was trying to understand why my cluster nodes couldn't be decommissioned, I dug a little deeper into cluster monitoring with OpsMgr. As usual I have no access to the source code, so some of my assumptions may be wrong.

The core logic behind cluster discovery and management is coded (natively) in mommodules.dll.

image

The dll exports:

  • ClusterGroupStateChange: the name gives us some clues
  • ClusterDiscovery: this one is used by the discovery workflow

Basically, every cluster node discovers every resource group (Virtual Server, VS) in the cluster and establishes a relationship of type HealthServiceShouldManageEntity. This tells the OpsMgr infrastructure to route the workflows for any Virtual Server to every cluster node, so every cluster node receives all the workflows, even for the VSs it does not own at the moment. Obviously we just want the owning node to monitor each VS. Without some custom logic this would be an issue; in fact the agent has built-in logic to understand which VSs it is supposed to manage (i.e. the ones it owns), and on the passive nodes the workflows get unloaded.

The second issue is the management of VS failover (i.e. when a resource group changes owning node). From what I understand, the agent uses ClusterGroupStateChange to detect when a VS changes ownership; I measured a maximum delay of 60 seconds from resource group failover to workflow reload on the proper node. So far so good: the agent (as we expect) is able to manage the VS wherever it is. I had a couple of cases where this was not working properly on SP1, and they were resolved by restarting the health service.

One more thing to add: the VSs are managed by the health service as proxied systems. This has an important implication if you're an MP author: all the workflows you want to execute against a VS must be tagged as Remotable="true".

image

If you miss this important requirement you'll get event ID 1207 after the agent reloads the workflows:

Event Type: Warning

Event Source: HealthService

Event Category: Health Service

Event ID: 1207

Date: 10/17/2009

Time: 3:01:23 PM

User: N/A

Computer: ARES1

Description:

Rule/Monitor "QND.Test.Cluster.LogEvent" running for remote instance "XXXX.progel.org" with id:"{A16D2CDA-378D-E9AC-7913-404A9999BEEE}" will be disabled as it is not remotable. Management group "Progel Labs".

That's it: straightforward, but useful if you need to debug issues on your clustered agents.

– Daniele


Written by Daniele Grandini

November 7, 2009 at 11:57 am

Posted in Failover cluster, SCOM

Windows Scheduled task MP on MP catalog


The Progel Windows Scheduled Task MP is now available from the MP catalog. We have received a single report that it has issues on extended character set locales (e.g. Traditional Chinese), but with too few details at the moment to assess whether it's our own bug or something external to us. If you try the MP and encounter any issue, drop an email to sst@progel.it.

– Daniele


Written by Daniele Grandini

October 31, 2009 at 12:17 pm

Posted in MP, SCOM

Disk performance reporting


Disk performance reporting and trending is probably the most difficult part of performance troubleshooting and capacity planning. Over the years I have used several counters, only to find that none of them gives a concise and accurate answer on disk performance. Ever tried to use % Idle Time or % Disk Time? Or Avg. Disk Queue Length, for that matter. In the last few years I have standardized on Avg. Disk sec/Read, sec/Write and sec/Transfer as a single indicator of disk responsiveness. Generally I take 20 ms to 30 ms as a threshold that should not be exceeded on average.

However, even these counters are prone to errors (I don't know when the OS team will change the perf counters architecture, but it can never be too soon). First of all you must be aware of the following issues that arise with virtualized Windows 2003 servers with more than one core:

But there are issues to hit on Windows 2008 as well. Look at the following table, which reports Avg. Disk sec/Transfer on a Windows 2008 server:

image

As you can see, among "normal" values (0,xxx; the comma is the decimal separator in Italy) we have clearly bad ones. Obviously the server is not taking 93 or 3139 seconds (a little less than 1 hour) to execute an I/O on average. The presence of these bad values can wreak havoc on your reporting experience: the average of the above values is 285 seconds, which clearly cannot be the case.

I didn't find a root cause for this behavior. I can observe it on several Windows 2008 servers, predominantly Hyper-V hosts; it could be hardware related or just a bug in the perf counter (IMHO both). In any case, your reports are doomed.

The only thing I can advise is to change your SQL query to filter out obviously bad values. For example, filtering out response times above 2 seconds changes my average for the period to 0.029 seconds, or 29 ms, which denotes a fairly busy storage subsystem.
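The same filtering idea, sketched in Python rather than SQL. The sample values below are made up for illustration (they are not the ones from my table), and the function name is hypothetical; only the 2 second cutoff comes from the text above.

```python
def filtered_average(samples, max_seconds=2.0):
    """Average the response times, ignoring values above max_seconds.

    Returns None when no sample survives the filter.
    """
    valid = [s for s in samples if 0 <= s <= max_seconds]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Illustrative Avg. Disk sec/Transfer samples: three plausible, two bogus.
samples = [0.020, 0.030, 0.040, 93.0, 3139.0]

print(sum(samples) / len(samples))  # raw average, dominated by the bogus values
print(filtered_average(samples))    # roughly 0.03 seconds once they are dropped
```

The cutoff is a judgment call: it must be high enough not to hide a genuinely slow disk, and low enough to drop values that are physically impossible.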

If I manage to find more info on this issue I’ll keep you posted, in the meantime take your reports on disk response time with a grain of salt.

– Daniele


Written by Daniele Grandini

October 31, 2009 at 12:11 pm

Posted in Bug, Reporting, SCOM

R2 and cluster monitoring – issues and pitfalls


I think it's time to wrap up the various answers we can find on cluster monitoring and decommissioning in R2.

Failover clustering is a high availability solution, and my motto is "if you want to keep it HA you need to monitor it". So we all expect OpsMgr to do an excellent job in monitoring failover clusters. In some respects it does, but you must be aware of glitches that are still around.

I observed at least 4 issues on failover cluster monitoring:

  1. abnormal CPU usage on the cluster node and on the controlling management server
  2. simply put, you cannot decommission a cluster in a supported way
  3. the Agentless Managed view in the Administration space of the monitoring console is just useless
  4. cluster discovery is noisy

Abnormal CPU usage is caused by internal tasks taking the wrong route. In some cases the config service on the RMS thinks resources are on the wrong node (i.e. a node that does not own the resources) and sends health recalculation tasks to that node. In these cases you'll see the following:

  • the healthservice edb grows (we had examples of > 3 GB); this growth is caused by tasks queuing up
  • healthservice.exe on the affected node consumes up to one CPU core
  • healthservice.exe on the primary management server shows increased CPU usage

We're waiting for a fix from PSS; at this time the only workaround is a group failover. But hey, I'm supposed to run critical applications on my clusters, so a failover cannot be considered a workaround. Btw, things can turn bad again after a while, so I would define this a temporary patch that will break again sooner or later.

Decommissioning is another issue. The short story is that you can remove the agents from the cluster nodes, but you cannot remove them from your console. In other words, you have zombies in the console.

There’s an *unsupported* workaround here http://ops-mgr.spaces.live.com/blog/cns!3D3B8489FCAA9B51!163.entry and a discussion thread here http://social.technet.microsoft.com/Forums/en-US/operationsmanagergeneral/thread/10ffee08-b875-47af-b788-db07dbfa1b56.

See my own unsupported way later in this post; I'm not comfortable with the workaround cited above.

I want to say that the following rapid publishing KB just doesn’t work in my case (and from what I found it should never work): OpsMgr 2007: How to decommission a cluster monitored by System Center Operations Manager 2007

I want to add that the release notes are not clear either:

“You cannot uninstall or delete an agent from a node in a cluster

When you try to uninstall or delete an agent from a node in cluster, the following error message is displayed:

Agent is managing other devices and cannot be uninstalled. Please resolve this issue via Agentless managed view in Administration prior to attempting uninstall again.

Notice that the agent can be uninstalled from the node that is agentlessly managing the virtual servers. However, the agent cannot be uninstalled from the node that is managing the virtual servers.

Workaround: None at this time.” http://technet.microsoft.com/en-us/library/dd827187.aspx

The Agentless Managed view is useless in that the results you see are unpredictable and the change proxy action won't work. This is all related to how agentless management works for the UI.

An agentless managed Windows Computer (this is the class) is defined as a Windows Computer managed by a health service on another Windows Computer (using the HealthServiceShouldManageEntity relationship). This is the query the SDK runs against your live DB, where the guid is the id of Microsoft.Windows.Computer:

exec sp_executesql N'-- AgentlessManagedDevicesByType <ManagedTypeId>
SELECT [T].[Id], [T].[Name], [T].[Path], [T].[FullName], [T].[DisplayName], [T].[IsManaged], [T].[IsDeleted], [T].[LastModified], [T].[TypedManagedEntityId], [T].[MonitoringClassId], [T].[TypedMonitoringObjectIsDeleted], [T].[HealthState], [T].[StateLastModified], [T].[IsAvailable], [T].[AvailabilityLastModified], [T].[InMaintenanceMode], [T].[MaintenanceModeLastModified], [PXH].[BaseManagedEntityId] AS [HealthServiceId], [PXH].[DisplayName] AS [ProxyAgentPrincipalName] FROM dbo.ManagedEntityGenericView AS T
INNER JOIN dbo.BaseManagedEntity AS BME
    ON BME.[BaseManagedEntityId] = T.[Id]
    AND BME.[BaseManagedTypeId] = @ManagedTypeId
INNER JOIN dbo.Relationship AS R
    ON R.[TargetEntityId] = T.[Id]
INNER JOIN dbo.BaseManagedEntity AS PXH
    ON PXH.[BaseManagedEntityId] = R.[SourceEntityId]
WHERE ((
    T.[IsDeleted] = 0 AND T.[TypedMonitoringObjectIsDeleted] = 0 AND R.[IsDeleted] = 0 AND
    R.[RelationshipTypeId] = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceShouldManageEntity()
))',N'@ManagedTypeId uniqueidentifier',@ManagedTypeId='EA99500D-8D52-FC52-B5A5-10DCD1E9D2BD'

What happens for clusters is that we have two HealthServiceShouldManageEntity relationships discovered. One comes from an internal discovery source (in my case with guid "85AB926D-6E0F-4B36-A951-77CCD4399681"); I would call it the standard one, i.e. the discovery source the SDK call SetProxyAgent works with. The other is discovered by the cluster management pack through the discovery rule "Microsoft.Windows.Cluster.Classes.Discovery". This discovery is implemented as a native COM module in MOMModules.dll.

Then you must add that every cluster node discovers the entire hierarchy of cluster resources (bad, bad, bad, and a huge load on every node for complex cluster implementations), so every cluster virtual server is managed by every single node. Let's take a simple example: a basic two-node cluster (nodeA and nodeB) with just the cluster virtual server (CLUSTER). The Microsoft.Windows.Computer CLUSTER is discovered by both nodes, so both nodeA and nodeB have a HealthServiceShouldManageEntity relationship with CLUSTER with source Microsoft.Windows.Cluster.Classes.Discovery, and another one with the standard discovery source.

So what happens here is that every cluster node ShouldManage every cluster Virtual Server; for this reason the proxy assignment you see in the console is unpredictable. The selection is based on the HealthServiceShouldManageEntity relationship, which returns one row for each cluster node for each cluster virtual server.

Using SetProxyAgent from the SDK (or the change proxy action in the UI) is useless, because it resets just the standard discovery source and not the Microsoft.Windows.Cluster.Classes.Discovery one.
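The two-node example above can be modeled in a few lines of Python. This is a toy model of the relationships as I described them, not the OpsMgr schema; the function names and the "standard" label are hypothetical.

```python
DISCOVERY_STANDARD = "standard"
DISCOVERY_CLUSTER = "Microsoft.Windows.Cluster.Classes.Discovery"


def discover(nodes, virtual_servers):
    """Every node gets a relationship to every virtual server from both sources."""
    return {(node, vs, source)
            for node in nodes
            for vs in virtual_servers
            for source in (DISCOVERY_STANDARD, DISCOVERY_CLUSTER)}


def set_proxy_agent(relationships, vs, node):
    """Mimic the proxy change: only the standard discovery source is rewritten."""
    kept = {r for r in relationships
            if not (r[1] == vs and r[2] == DISCOVERY_STANDARD)}
    kept.add((node, vs, DISCOVERY_STANDARD))
    return kept


def managing_nodes(relationships, vs):
    """Nodes that still 'should manage' the virtual server, from any source."""
    return sorted({node for node, target, _ in relationships if target == vs})


rels = discover(["nodeA", "nodeB"], ["CLUSTER"])
rels = set_proxy_agent(rels, "CLUSTER", "nodeA")
print(managing_nodes(rels, "CLUSTER"))  # ['nodeA', 'nodeB']
```

Even after the proxy is set to nodeA, nodeB still appears as a manager through the cluster discovery source, which is why the view and the delete operation keep misbehaving.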

The root cause of your inability to delete the virtual servers first and then the cluster nodes is this discovered relationship. So if you delete the relationship you'll find your way home.

Here is a SQL snippet that given the cluster node name (FQDN) will delete all the relationships to cluster virtual servers:

declare @nodeHS nvarchar(255)
declare @relId uniqueidentifier
declare @discoSource uniqueidentifier
declare @now datetime

-- Replace FQDN with the fully qualified name of the cluster node
set @nodeHS = N'Microsoft.SystemCenter.HealthService:FQDN'
set @now = GETUTCDATE()

-- All live HealthServiceShouldManageEntity relationships whose source is the node's health service
DECLARE Rel_Cursor CURSOR FOR
SELECT [RelationshipGenericView].[Id]
FROM dbo.RelationshipGenericView
WHERE [RelationshipGenericView].[MonitoringRelationshipClassId] = dbo.fn_ManagedTypeId_MicrosoftSystemCenterHealthServiceShouldManageEntity()
    AND [RelationshipGenericView].[IsDeleted] = 0
    AND [RelationshipGenericView].[SourceMonitoringObjectId]
        IN (select BaseManagedEntityId from BaseManagedEntity where FullName = @nodeHS)

OPEN Rel_Cursor;

FETCH NEXT FROM Rel_Cursor INTO @relId;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Reset, so that a relationship without a cluster discovery source is skipped
    SET @discoSource = NULL

    SELECT @discoSource = DSTR.[DiscoverySourceId]
    --SELECT *
    FROM dbo.[DiscoverySourceToRelationship] DSTR
        inner join dbo.[DiscoverySource] DS on DS.DiscoverySourceId = DSTR.DiscoverySourceId
        inner join Discovery on Discovery.DiscoveryId = DS.DiscoveryRuleId
    WHERE [DiscoveryName] = 'Microsoft.Windows.Cluster.Classes.Discovery'
        AND [RelationshipId] = @relId
        AND DSTR.[IsDeleted] = 0

    IF @discoSource IS NOT NULL
        exec dbo.p_RemoveRelationshipFromDiscoverySourceScope
            @RelationshipId = @relId,
            @DiscoverySourceId = @discoSource,
            @TimeGenerated = @now

    FETCH NEXT FROM Rel_Cursor INTO @relId;
END;

CLOSE Rel_Cursor;

DEALLOCATE Rel_Cursor;

After executing this SQL statement the cluster node and virtual servers can be deleted from the Operations Console. Obviously this is totally unsupported. So do this at your own risk, but if you followed my analysis it is clear it should be safe.

Cluster discovery is noisy: this means too many 21025 events (i.e. agent reloads), and the way it is performed on every node can have a serious impact on cluster nodes. While I was about to post the details of the noisy discovery (starting with the MAC address mismatch with OS discovery), a new cluster MP was released (6.0.6720.0), which promises to solve a lot of the issues I detected in previous versions. So this one is strongly recommended. Btw, it adds support for Windows Server 2008 R2 failover clusters (yes, CSV included).

- Daniele


Written by Daniele Grandini

October 18, 2009 at 6:28 pm

Posted in Bug, Failover cluster, SCOM

USMT4 + SCCM 2007 SP2 RC – Downlevel Manifests folder is not present.


I encountered this problem while preparing a demo of a migration from Windows XP to Windows 7. I was using USMT 4, MDT 2010 and SCCM 2007 SP2 RC. In my task sequence I used hard links to capture user data from a Windows XP installation and then restore it after applying a previously created Windows 7 image. The migration process completed successfully, but no system component settings were migrated. I looked at the scanstate.log created in %systemroot%\system32\ccm\logs\SMSTSLog to see if something went wrong, and I found the following line:

2009-10-17 17:19:44, Info [0x000000] Downlevel Manifests folder is not present. System component settings will not be gathered.

The message refers to a missing manifest folder, so I used FileMon to verify in which location scanstate was looking:

image

Scanstate looked for that folder in %Systemroot%\system32, same location as the “Current Directory” of the process (as shown by Process Explorer).

By doing a simple search with Google I found that I'm not alone and that other people have experienced the same issue (http://systemcenterideas.com/2009/09/usmt-issues-with-mdt-2010). The fix proposed in that blog does not apply to me, because I use a "Capture Task" and not a script to execute scanstate.

So while waiting to see if it will be fixed in RTM (I hope so) I needed a workaround.

I decided to create a wrapper that :

  • executes a renamed scanstate.exe as a child process.
  • passes the same parameters received from the command line to the child process
  • passes the Application path as the Current Directory of the child process
  • returns the child process exit code.

I used:

  • GetCommandLine: to retrieve the parameters passed to the wrapper
  • GetModuleFileName: to get the application path (after stripping the application file name)
  • CreateProcess: to call the renamed scanstate and to pass the application folder as the Current Directory
  • GetExitCodeProcess: to retrieve the child process exit code and pass it to the task sequence.

I renamed scanstate.exe to _scanstate.exe and put my wrapper in the USMT x86 folder, naming it scanstate.exe.

image

As can be seen in the previous picture, my wrapper calls the real scanstate application (named _scanstate.exe) as a child process and forces the "Current Directory" to be the folder where the wrapper is located. With the correct "Current Directory", _scanstate.exe is able to access the manifest folder.
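The logic of the wrapper can be sketched in a few lines. My wrapper uses the Win32 calls listed above; the Python version below is only an illustration of the same four steps using the standard subprocess module, and the function name is hypothetical.

```python
import os
import subprocess
import sys


def run_from_own_folder(real_exe_name, args, app_dir=None):
    """Run real_exe_name (found in app_dir) with args and return its exit code.

    app_dir defaults to the folder of the running script and is used both to
    locate the executable and as the child's current directory.
    """
    if app_dir is None:
        app_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
    child_path = os.path.join(app_dir, real_exe_name)
    # cwd= plays the role of the current directory argument of CreateProcess.
    completed = subprocess.run([child_path, *args], cwd=app_dir)
    return completed.returncode
```

A wrapper script built on this would end with something like sys.exit(run_from_own_folder("_scanstate.exe", sys.argv[1:])), so that the task sequence receives the exit code of the real scanstate.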

This is a sort of hack; it is not supported and it is probably not the best solution to this problem, but in this case I'm running an RC version of SCCM 2007 SP2 and I'm only interested in having the demo work properly. I really hope that this will be fixed in the RTM version of SP2, or that a supported alternative solution, if there is one, will be published soon (I know that I can copy the manifest folder to system32 before calling scanstate, but I don't like doing that).

– Fabrizio


Written by Fabrizio Guaitolini

October 17, 2009 at 3:23 pm

Posted in OSD, SCCM


Strongly recommended non OpsMgr patches


I read several posts about issues agents are facing even with R2. Yes, R2 still has issues (clustering, anyone?), but before pointing your finger at OpsMgr you should consider that a monitoring agent uses interfaces that are not normally used, and this can lead to "new" bugs being discovered in OS or application components. So it's not an agent issue, but the bug surfaces only after the agent is installed. Bottom line: monitoring is never free, even when agentless.

Before pointing your finger at OpsMgr this is our recommended fix list.

On every OS

  • KB 968967: resolves high CPU utilization related to MSXML
  • KB 968760: resolves a handle leak in .NET Framework 2.0 that affects MonitoringHost (in our experience)

    On Windows Server 2008

    • Service Pack 2

    On Windows Server 2003

    • Windows Script Host 5.7 on Windows 2003 (and Windows 2000)
    • KB 952523 on Windows 2003, to address a memory leak in WMI
    • KB 931320, another issue with WMI on Windows 2003
    • KB 943071, an issue with the event provider in managed code and WMI on Windows 2003
    • KB 933061, which fixes several issues in WMI on Windows 2003; it is of great help with WMI issues even if it won't resolve them all

    On Biztalk 2007 servers (very noisy MP btw)

    I'll try to keep this post up to date with any new fix we consider useful for agent health.

    – Daniele


    Written by Daniele Grandini

    October 15, 2009 at 12:20 pm

    Posted in Agent health, SCOM

    Publishing Operations Manager 2007 Web Console with ISA Server 2006 – Performance view problem


    I came across this problem months ago, but I didn't post anything on this blog because I thought it wasn't a common scenario. Today I found a post on a Microsoft newsgroup from someone looking for help with this, so I decided to post an article with the solution.

    If you publish an OpsMgr Web console by using a publishing rule in Microsoft Internet Security and Acceleration (ISA) Server 2006 using Forms-based authentication the following error may appear if you try to visit a performance view :

    image

    and at the same time an error will appear in the Eventlog of the server holding the Web Console Role :

    Log Name:      Operations Manager
    Source:        Web Console
    Date:          10/10/2009 4:01:27 PM
    Event ID:      10
    Task Category: None
    Level:         Warning
    Keywords:      Classic
    User:          N/A
    Computer:      OpsMgr-RMS.domain.lab
    Description:
    Instance: heraycn1hpnzfe45a0lvit45.

    View request processing error:

    Microsoft.EnterpriseManagement.OperationsManager.WebConsole.Utility.WebRequestArgumentException: Invalid format of the list of selected performance counters.
    Parameter name: Counters —> System.FormatException: Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).
       at System.Guid..ctor(String g)
       at ViewTypePerformance.GetSelectionsFromCookie()
       at ViewTypePerformance.GetCountersList(String countersList)
       — End of inner exception stack trace —
       at ViewTypePerformance.GetCountersList(String countersList)
       at ViewTypePerformance.ProcessViewRequest()
       at ResultPaneBase.Base_Load(Object sender, EventArgs e)

    It seems that the code is trying to get the list of counters to show from a cookie, and that an invalid or malformed GUID is found. The problem occurs only if you access the site through ISA, so I supposed that ISA makes some modification to the request. To verify this hypothesis I captured a Network Monitor trace on the requesting client side and another one on the server side (on the server holding the Web Console role).

    The following picture contains a network packet fragment captured on the client side. In this fragment we can see a part of the cookie contained in the request :

    image

    The following picture contains the same network packet fragment captured on the server side. In this fragment we can see the same part of the cookie contained in the request :

    image

    The two frames are different: we can see that the commas used to separate the numbers in the color definitions are replaced with semicolons in the second frame. At this point it was clear that ISA replaced commas with semicolons in the cookie content, and I thought this could be the cause of the issue.
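To see why such a rewrite can end in a GUID parsing error, here is a small Python sketch. The cookie layout below (entries separated by semicolons, each made of a GUID, a pipe and an r,g,b color) is purely hypothetical; I don't know the real format used by the Web Console, only that it contains GUIDs and comma separated color values.

```python
import uuid


def parse_selections(cookie_value):
    """Parse 'guid|r,g,b;guid|r,g,b' into a list of (UUID, color) tuples."""
    selections = []
    for entry in cookie_value.split(";"):
        guid_text, _, color_text = entry.partition("|")
        counter_id = uuid.UUID(guid_text)  # raises ValueError on a bad GUID
        color = tuple(int(part) for part in color_text.split(","))
        selections.append((counter_id, color))
    return selections


original = ("a16d2cda-378d-e9ac-7913-404a9999beee|255,0,0;"
            "ea99500d-8d52-fc52-b5a5-10dcd1e9d2bd|0,128,0")
rewritten = original.replace(",", ";")  # what the proxy does to the cookie

print(len(parse_selections(original)))
try:
    parse_selections(rewritten)
except ValueError as error:
    print("parsing failed:", error)
```

Once the commas become semicolons, the color components are split into entries of their own and the parser tries to read them as GUIDs, which is the same kind of failure reported in the event above.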

    I did a little research with google and I found the following KB that confirms my hypothesis:

    Web pages do not appear as expected when you publish a Web site by using a publishing rule in Internet Security and Acceleration (ISA) Server 2006

    “You publish a Web site by using a publishing rule in Microsoft Internet Security and Acceleration (ISA) Server 2006. When a user visits the Web site, the Web pages do not appear as expected. For example, the page layout may be incorrect, or parts of a Web page may not appear.
    You experience this problem if the following conditions are true:

    • You use Forms-based authentication (CookieAuth) in ISA Server 2006 to authenticate the users who visit the Web site.
    • The Web site is running a Web application that uses one or more commas as part of the cookie content.”

    I executed the script contained in the article on my ISA server to change this behavior and now I’m able to access the performance view without any issue.

    – Fabrizio


    Written by Fabrizio Guaitolini

    October 10, 2009 at 4:38 pm

    Posted in SCOM


    Discoveries, multihoming and cookdown


    We have a few customers that are using multihoming for OpsMgr agents. They all complain about slow discovery in the added MG. I've been asked about this delay online as well, so I'm going to wrap up my answers and give my view of the issue.

    From my internal tests, the slow discovery in the added MG is due to cookdown. Cookdown applies to every workflow, discoveries included. Take for example a discovery of your own for component C that targets component B, which in turn targets Windows.Computer. Important: you're discovering the components in both MGs. When you add a new MG to an agent, the install process does a very basic discovery and restarts the agent; when the agent restarts, all the non time synced discoveries are run. After the restart Windows.Computer is discovered for MG2; this causes a reload (event 21025) for that MG, which in turn forces a download of all the workflows targeted at Windows.Computer. What we expect now is that the newly downloaded discoveries (in our example the one for component B) run. But since the discovery workflow for component B is already there for MG1, and it already ran at agent startup after the new MG was added, cookdown steps in and says "this is the same workflow, with the same signature, so I can safely wait for the next scheduled time". If the schedule is 24 hours, you must wait 24 hours (actually a little less) for component B to appear in MG2. The same process then applies to component C, and so on.

    So what can you do to speed up the discovery process for newly added MGs?

    First, to check if this is really the issue you're facing, restart the agent on one sample system and check if discovery data flows in; restarting the agent forces all non time synced discoveries to run. Between one restart and the next, give the agent enough time (typically 5 to 10 minutes) to complete the discovery cycle.

    Second, if you're the MP author, you should use the System.Discovery.Scheduler data source instead of System.Scheduler. This has been adopted by the latest OS MPs, so:

    Third, install the latest OS MPs, at least version 6.0.6667.0.

    For the curious among you, the difference between System.Scheduler and System.Discovery.Scheduler is only in the signature: in the latter the target managed entity Id has been added, so it won't be cooked down (the signature will always be different).

         <DataSourceModuleType ID="System.Scheduler" Accessibility="Public" Batching="false">

            <Configuration>

              <IncludeSchemaTypes>

                <SchemaType>System.ExpressionEvaluatorSchema</SchemaType>

              </IncludeSchemaTypes>

              <xsd:element name="Scheduler" type="PublicSchedulerType" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />

            </Configuration>

            <ModuleImplementation Isolation="Any">

              <Native>

                <ClassID>C3339855-80B3-4c06-B7AB-5C5D97B59A0D</ClassID>

              </Native>

            </ModuleImplementation>

            <OutputType>System.TriggerData</OutputType>

          </DataSourceModuleType>

     

        <DataSourceModuleType ID="System.Discovery.Scheduler" Accessibility="Public" Batching="false">

            <Configuration>

              <IncludeSchemaTypes>

                <SchemaType>System.ExpressionEvaluatorSchema</SchemaType>

              </IncludeSchemaTypes>

              <xsd:element name="Scheduler" type="PublicSchedulerType" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />

            </Configuration>

            <ModuleImplementation Isolation="Any">

              <Composite>

                <MemberModules>

               

                  <DataSource ID="DS1" TypeID="System.Discovery.Scheduler.Internal">

                    <Scheduler>$Config/Scheduler$</Scheduler>

                    <ManagedEntityId>$Target/Id$</ManagedEntityId>

                    <RuleId>$MPElement$</RuleId>

                  </DataSource>

                 

                </MemberModules>

                <Composition>

                  <Node ID="DS1" />

                </Composition>

              </Composite>

            </ModuleImplementation>

            <OutputType>System.TriggerData</OutputType>

          </DataSourceModuleType>

         

          <DataSourceModuleType ID="System.Discovery.Scheduler.Internal" Accessibility="Internal" Batching="false">

            <Configuration>

              <IncludeSchemaTypes>

                <SchemaType>System.ExpressionEvaluatorSchema</SchemaType>

              </IncludeSchemaTypes>

              <xsd:element name="Scheduler" type="PublicSchedulerType" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />

              <xsd:element name="ManagedEntityId" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />

              <xsd:element name="RuleId" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />

            </Configuration>

            <ModuleImplementation Isolation="Any">

              <Native>

                <ClassID>C3339855-80B3-4c06-B7AB-5C5D97B59A0D</ClassID>

              </Native>

            </ModuleImplementation>

            <OutputType>System.TriggerData</OutputType>

          </DataSourceModuleType>
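The effect of the extra ManagedEntityId and RuleId parameters in the module definitions above can be illustrated with a toy signature function. This Python sketch only assumes that the signature depends on the module type and its resolved configuration; how the agent really computes it is not something I have access to, and all names and values here are hypothetical.

```python
import hashlib
import json


def workflow_signature(module_type, config):
    """Hash the module type and its resolved configuration into a signature."""
    payload = json.dumps({"type": module_type, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# System.Scheduler: only the schedule is part of the configuration.
plain_a = workflow_signature("System.Scheduler", {"Scheduler": "every 24h"})
plain_b = workflow_signature("System.Scheduler", {"Scheduler": "every 24h"})

# System.Discovery.Scheduler: target and rule ids are part of the configuration.
disco_a = workflow_signature(
    "System.Discovery.Scheduler.Internal",
    {"Scheduler": "every 24h", "ManagedEntityId": "entity-1", "RuleId": "rule-1"})
disco_b = workflow_signature(
    "System.Discovery.Scheduler.Internal",
    {"Scheduler": "every 24h", "ManagedEntityId": "entity-2", "RuleId": "rule-1"})

print(plain_a == plain_b)  # True: identical modules can be cooked down
print(disco_a == disco_b)  # False: each target gets its own scheduler
```

Two schedulers with the same signature can share a single running instance, which is exactly what delays the second discovery; once the ids are part of the configuration, the signatures no longer match.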

    – Daniele


    Written by Daniele Grandini

    October 9, 2009 at 5:25 pm

    Posted in Cookdown, SCOM, multi homing