Microsoft Operations Management Suite (OMS) is a cloud-born management suite that makes it possible to monitor, automate and protect your workloads wherever they are: Azure, AWS, on-premises and more.
My good friend and fellow MVP Stan (https://cloudadministrator.wordpress.com/) has a very comprehensive series of articles on this new suite. If you haven’t given OMS a try yet, I strongly encourage you to do so; there’s a free plan that lets you test the solution (http://www.microsoft.com/oms).
For my part, I have started deploying OMS on production systems for customers who want to leverage a cloud architecture but still manage on-premises systems. Even though OMS is cloud-born, it’s an agent-based solution (and yes, I find it hard to think of a different way to implement it), so there are some decisions to make for on-premises systems. This article shares some first-hand experience in operating an OMS architecture, in particular regarding the data flow from managed systems to the Microsoft cloud.
But, first of all, let’s take a step back and see how OMS works in terms of agents and connections to the cloud.
OMS data flow
OMS-managed systems need an agent; incidentally, that agent is the good old Microsoft Monitoring Agent (MMA), the same agent used for System Center OpsMgr and Application Insights. This agent is able to send data to both OpsMgr and OMS. Basically we have two models:
- Agents operated through an OpsMgr infrastructure: a super easy way to add value to your OpsMgr deployment. When an agent uses OpsMgr to communicate with OMS, all the data is funneled through the OpsMgr infrastructure up to the Management Server pool in charge of OMS communication.
- Agents directly connected to OMS
In the real world the two architectures are often mixed, for at least a couple of reasons:
- For some high-volume scenarios (such as Windows Security Events or IIS W3C logs) the agents always connect directly to OMS
- Since OMS is not yet multi-tenant and systems cannot be scoped for different management teams, there are cases where specific systems, even if managed via OpsMgr, are directly connected to OMS. This way every team uses its own separate OMS workspace (today the smallest entity that can be used to delegate administrative rights)
In both cases the traffic is plain HTTPS to a well-known and documented set of destinations (see https://azure.microsoft.com/en-gb/documentation/articles/operational-insights-proxy-firewall/ and related articles).
Controlling OMS data flow
The real challenge in deploying OMS is getting the data to the cloud. In my experience there are a couple of common blockers:
- Typically the security team doesn’t want individual systems to be able to send data outside the company perimeter
- The networking and security team wants to do all sorts of fancy stuff with HTTPS traffic, first and foremost tampering with it through SSL inspection.
In the very early days of OMS, when it was simply the System Center Advisor preview, I was screaming for a solution for directly connected agents. Now we know that basically all agents connect directly for some types of traffic, and as far as I know this is the direction the team is taking for every type of traffic. After all, this way each agent has its own queue/buffer and can send more data when it needs to, without affecting the OpsMgr infrastructure.
For traffic funneled through OpsMgr, the endpoints are already a restricted set (the Management Server pool), and they’re probably already able to communicate with the cloud, for example if you leverage Global Service Monitor (GSM).
My first request was for an OpsMgr gateway-like architecture: “let me define a pool of gateways and use them to send data to the cloud”, I asked. After 10 years of OpsMgr architectures it seemed the most logical thing to do. But it was not, as the team correctly pointed out.
OMS traffic can be very high volume, and it’s HTTPS, not the MOM channel we’re accustomed to (TCP/5723). So why reinvent the wheel? For HTTPS traffic we have the good old standard HTTP proxy: put in one or more of them for redundancy and we’re set. Over the last few months the MMA has been updated to support an agent-specific proxy and to expose the proxy setting through the UI (rather than only via the command line or PowerShell). Now each agent comes with proxy settings that it will honor: only the OMS agent is made proxy-aware, the rest of the machine doesn’t have to be.
In an OpsMgr infrastructure the proxy settings are centralized (you can enter them through the Console) and are propagated to all agents. So the agents send data ‘directly’, but they are able to pass through a commodity HTTP proxy.
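To make ‘proxy-aware’ concrete: a client reaching an HTTPS endpoint through a plain HTTP proxy first sends the proxy a CONNECT request naming only the destination host and port, then runs TLS end to end through the resulting tunnel, which the proxy simply relays. That is why a commodity proxy can restrict OMS traffic by destination name without any SSL inspection. The Python sketch below illustrates the mechanism with the standard library’s http.client, whose set_tunnel() performs the CONNECT step; it is not how the MMA is implemented, and any proxy name, port or workspace host name you pass to it is a placeholder.

```python
import http.client


def connect_request(host: str, port: int = 443) -> str:
    """Build a typical CONNECT request, as sent to an HTTP proxy before TLS starts.

    Only the target host and port travel in clear text: that is all the proxy
    sees, and all it needs to apply destination-based rules.
    """
    authority = f"{host}:{port}"
    return f"CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n"


def open_tunnel(proxy_host: str, proxy_port: int, target_host: str,
                target_port: int = 443) -> http.client.HTTPSConnection:
    """Prepare an HTTPS connection to target_host that goes through an HTTP proxy.

    The TCP connection is made to the proxy; set_tunnel() makes http.client send
    a CONNECT for target_host:target_port and then negotiate TLS end to end with
    the target. Nothing goes on the wire until the first request is issued.
    """
    conn = http.client.HTTPSConnection(proxy_host, proxy_port, timeout=30)
    conn.set_tunnel(target_host, target_port)
    return conn
```

Calling open_tunnel() with your proxy and a workspace endpoint, then issuing a normal conn.request(...), sends one CONNECT to the proxy before any encrypted bytes flow; the exact CONNECT text a given client emits may differ slightly from connect_request(), which only shows the typical shape.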
Problem solved? Not yet.
You know, networking and security guys are like bureaucrats: they set rules just to make sure they continue to exist (but that’s another story), so our blockers changed into:
- Typically the security team doesn’t want the burden of setting specific rules for individual servers, nor do they want to treat them as a group. What the heck, they’re special, they’re servers!
- The networking and security team wants to do all sorts of fancy things with HTTPS traffic, first and foremost tampering with it through SSL inspection.
At the end of the day an OMS gateway/proxy is still needed: this way we can ask the networking guys to apply specific rules to just a limited set of systems (i.e. the proxy system(s)).
Solutions for OMS data flow
Now that the challenge is stated, I want to share with the community the two approaches I’m currently using. These solutions work regardless of whether the OMS agents are OpsMgr-attached or directly attached.
Use the existing proxy infrastructure: if possible, this is the easiest way to centrally control the data flow. Just remember not to apply SSL inspection to OMS traffic, and that the OMS data flow can be high volume, so make sure the proxy infrastructure can support the increased load.
Use a custom-tailored, OMS-dedicated proxy infrastructure: in this case you have one or more systems used specifically for OMS agents.
In the latter case the proxy itself can be any commodity proxy software or hardware that you trust. Unfortunately Microsoft TMG is a product that Microsoft stopped investing in and that is reaching end of life, but ‘funneling data through a single point’ is a solved problem, and there are plenty of options (open source, commercial, etc.) to choose from if you need this kind of setup.
In both cases you get the benefit of a central place for your OMS communication; it just doesn’t use proprietary protocols or queues the way OpsMgr does, but rather industry-standard protocols and patterns. And if your single proxy machine has too much load, or you want to make it redundant, it’s just an HTTP proxy: you can scale it horizontally by adding more almost-identical boxes and placing them behind a load balancer. Again, these are all commodity pieces, with no need to worry about proprietary failover behaviors and whatnot. One more (incidental) advantage is that with a proxy you have logs of where your agents connect, when, and how much data they transfer. And, by the way, you can collect those logs with OMS itself.
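As an example of what those proxy logs give you, the Python sketch below aggregates lines in Squid’s default native access.log format (time, elapsed, client, result/status, bytes, method, URL, user, hierarchy/peer, type) per client address and destination host. summarize() is a hypothetical helper, not part of Squid or OMS. Note that, as far as I know, the bytes field in the native format is the size of the reply sent to the client, so measuring upload volume would need a custom logformat; check the Squid logformat documentation before relying on the numbers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize(lines: Iterable[str]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Aggregate Squid native-format access.log lines per (client IP, destination host).

    Native format fields: time elapsed client result/status bytes method URL
    user hierarchy/peer type. Returns {(client, host): (requests, bytes)}.
    """
    totals: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        client, size, method, url = fields[2], fields[4], fields[5], fields[6]
        if method == "CONNECT":
            host = url.rsplit(":", 1)[0]  # HTTPS tunnels are logged as "host:port"
        else:
            # Plain HTTP requests are logged with the full URL: keep only the host part.
            host = url.split("://", 1)[-1].split("/", 1)[0].split(":", 1)[0]
        entry = totals[(client, host.lower())]
        entry[0] += 1
        entry[1] += int(size) if size.isdigit() else 0
    return {key: (count, nbytes) for key, (count, nbytes) in totals.items()}
```

Feeding it the lines of your access.log returns a dict keyed by (client, host) with (request count, bytes) values; sorting it by bytes quickly shows which agents are the chattiest.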
Before posting this article I had the luck and pleasure of sharing these ideas with Daniele Muscetta from the OMS team (@dani3l3), and our chat confirmed that this approach is what the team currently suggests. So, as a SysCtr / OMS community, we can start working from here and build sound architectures for our OMS deployments.
Choosing a dedicated proxy for OMS doesn’t necessarily mean you have to pay for a commercial product: you can use Squid, for example.
An Example Setup with Squid
That’s enough theory; below is an example of such a setup. It’s not meant to be prescriptive, just an example. Squid (http://www.squid-cache.org/) runs on Linux as well as on Windows; it’s really up to you which version to choose, based on your support statements and the skills of the team in charge of the OMS communication infrastructure. The Windows version comes with an MSI installer and is hosted at https://github.com/diladele/squid3-windows; that’s the one I used this time when experimenting with this configuration. The Windows installer does all the nice things you’d expect on Windows: it registers Squid as a service, installs a tray icon to stop/start it and access the configuration from your desktop, and writes startup events and errors to the ‘Application’ event log. It took me 5 minutes to install and probably 20 minutes or so to work out the right configuration.
I set it up with the following policy: no authentication, just traffic allowed from a specific subnet (172.16.0.0/16) that corresponds to the subnet where my servers are. This is likely the only setting you’d have to change to describe your ‘localnet’ (note that the file below also includes a couple of example address ranges, IPv4 and IPv6). The rest of the configuration makes sure those machines are only allowed to connect over HTTP/HTTPS to the set of destinations that make up the OMS (Operational Insights) service. Trying to use this proxy to browse anywhere else is blocked.
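The policy just described is simple enough to model in a few lines, which can be handy for sanity-checking a list of agent addresses and host names before changing the proxy. The Python sketch below is a simplified, hypothetical model (LOCALNET, OMS_DOMAINS, domain_matches and is_allowed are illustrative names), not Squid’s ACL engine. Its networks and domains mirror the squid.conf that follows, and domain_matches follows Squid’s dstdomain convention that a leading dot matches the domain itself and all its subdomains.

```python
import ipaddress

# Hypothetical copies of the 'localnet' and 'omsdst' ACLs from the squid.conf;
# adapt both lists to your own networks and the current OMS endpoint list.
LOCALNET = [ipaddress.ip_network(n)
            for n in ("10.0.0.0/8", "172.16.0.0/16", "fc00::/7", "fe80::/10")]
OMS_DOMAINS = [
    "scadvisor.accesscontrol.windows.net", "scadvisorservice.accesscontrol.windows.net",
    ".blob.core.windows.net", ".ods.opinsights.azure.com", ".oms.opinsights.azure.com",
    ".systemcenteradvisor.com", ".live.com", ".microsoft.com", ".microsoftonline.com",
    "login.windows.net",
]


def domain_matches(host: str, pattern: str) -> bool:
    """dstdomain-style match: '.example.com' covers example.com and its subdomains."""
    host = host.lower().rstrip(".")
    if pattern.startswith("."):
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern


def is_allowed(src_ip: str, dst_host: str, port: int, method: str = "CONNECT") -> bool:
    """Simplified policy: only localnet clients, only HTTP/HTTPS, only OMS domains."""
    src = ipaddress.ip_address(src_ip)
    if not any(src in net for net in LOCALNET):  # mixed IPv4/IPv6 never matches
        return False
    if method == "CONNECT" and port != 443:      # tunnels only to the HTTPS port
        return False
    if port not in (80, 443):                    # plain requests only to HTTP/HTTPS ports
        return False
    return any(domain_matches(dst_host, p) for p in OMS_DOMAINS)
```

For instance, is_allowed("172.16.5.10", "myws.ods.opinsights.azure.com", 443) should be allowed, while the same request from a 192.168.x.x address, or to a non-OMS host, should not.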
Without further ado, here’s the configuration file (squid.conf) I am using:
```
# Rule allowing access from your local networks.
# Adapt to list your (internal) IP networks from where connection
# to OMS should be allowed
acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
acl localnet src 172.16.0.0/16 # RFC1918 possible internal network
acl localnet src fc00::/7 # RFC 4193 local private network range
acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged) machines
acl SSL_ports port 443 # https
acl Safe_ports port 80 # http
acl Safe_ports port 443 # https
acl CONNECT method CONNECT
#destinations for OMS
acl omsdst dstdomain scadvisor.accesscontrol.windows.net
acl omsdst dstdomain scadvisorservice.accesscontrol.windows.net
acl omsdst dstdomain .blob.core.windows.net
acl omsdst dstdomain .ods.opinsights.azure.com
acl omsdst dstdomain .oms.opinsights.azure.com
acl omsdst dstdomain .systemcenteradvisor.com
#destinations for console auth
acl omsdst dstdomain .live.com
acl omsdst dstdomain .microsoft.com
acl omsdst dstdomain .microsoftonline.com
acl omsdst dstdomain login.windows.net
# Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager
# Deny requests to certain unsafe ports (checked before the allow rule,
# so port restrictions also apply to the internal network)
http_access deny !Safe_ports
# Deny CONNECT to other than secure SSL ports
http_access deny CONNECT !SSL_ports
# allow the internal network to connect to OMS domain names
http_access allow localnet omsdst
# Deny requests to any other destination
http_access deny !omsdst
# And finally deny all other access to this proxy
http_access deny all
# Squid normally listens to port 3128
http_port 3128
# Leave coredumps in the first cache dir
# DNS servers Squid uses to resolve destination host names; adjust to your environment
dns_nameservers 18.104.22.168 22.214.171.124
```
Obviously the same configuration can be used on Squid on Linux.
Conclusion and call for proxy ‘recipes’
I hope the above information is useful when planning how to have your infrastructure send data to OMS. If you have configured other HTTP proxy/firewall software (anything other than Squid) in a similar way, please share the love: describe or blog about your configuration and let me know in the comments! It will help other users in the community set up their own.
This posting is provided “AS IS” with no warranties, and confers no rights.