Extending #MSOMS with the Linux Agent



Introduction

Log Analytics is a great platform to store and analyze machine data: the more data you ingest, the more value you can extract. While Microsoft keeps adding data sources, there will always be cases where your specific device or application isn't covered. So why not develop your own integration and leverage the Linux agent to add your data to Log Analytics? This guide sets the basis for adding new data to Log Analytics.
The idea is to keep a single reference point on how to extend OMS Log Analytics through the Linux agent.

DRAFT: if you find empty or incomplete sections this is normal, writing down all the things I've learned will take time.

Previous posts on the same subject:

You can find a working solution based on Linux for OMS in this GitHub project.

Table of Contents

Inside the OMS Agent for Linux
OMS Agent Architecture
Configuring the OMS Agent for Linux
Important output plugins
Extending the OMS Agent for Linux
Adding FluentD plugins
Developing custom code for OMS Agent for Linux
Prepare your dev environment
Common classes
Anatomy of an input plugin
Anatomy of a filter plugin
Debugging your code
Deploying your code
Ingesting custom data with the Linux agent
Ingesting custom dataset
Ingesting standard datasets
Building a complete OMS Solution

Inside the OMS Agent for Linux

The OMS Linux Agent is an open source project hosted on GitHub and currently based on FluentD (v0.12.24). The agent leverages the FluentD framework and adds its own plugins.

You can find the complete documentation for the agent package on GitHub.

The agent payload includes two other components:

  • OMI, which essentially ports the CIM/WBEM standards to Linux and is used mostly for performance data gathering
  • SCX, an open source OMI provider for cross-platform monitoring originally developed for System Center Operations Manager, used to enhance the data that OMI can get from the system. SCX was originally published as open source on CodePlex and later moved to GitHub.

The agent is implemented as a standard service/daemon called "omsagent".
It runs in a non-privileged context under the omsagent user account. The omsagent rights are defined in the /etc/sudoers file or in /etc/sudoers.d/omsagent.

The agent binaries and components are installed in the following directories:

  • /opt/omi, for the OMI package
  • /opt/microsoft/scx for the SCX provider
  • /opt/microsoft/omsconfig for the components that continuously install and configure the agent
  • and finally /opt/microsoft/omsagent for the running FluentD engine

Conversely, the log files can be consulted in the /var/opt tree:

  • /var/opt/omi/log, for the OMI package
  • /var/opt/microsoft/scx/log for the SCX provider
  • /var/opt/microsoft/omsconfig/log to check the configuration auto-update mechanism for the agent
  • and finally /var/opt/microsoft/omsagent/log for the running FluentD engine

The running configuration for the agent can be found in /etc/opt/microsoft/omsagent/conf/omsagent.conf and in the files in /etc/opt/microsoft/omsagent/conf/omsagent.d

OMS Agent Architecture

The OMS agent architecture is a standard FluentD stack, with input plugins getting the data inside the system, filter plugins processing it and finally a set of custom output plugins that manage the core buffering, authentication and encryption to the OMS cloud.

[Image: fluentd.png]

A single data flow can be summarized as follows.

[Image: OMS-Linux-1.png]

For those new to FluentD, note that every plugin has its own type and configuration properties (see the following paragraph). Every data document is identified by a tag (set in the input plugin configuration) that is used throughout the entire flow to identify that specific data type.

Configuring the OMS Agent for Linux

It should be clear by now that the OMS agent configuration is just plain FluentD configuration. As indicated previously, the main configuration file is /etc/opt/microsoft/omsagent/conf/omsagent.conf. This file must not be modified unless explicitly requested by Microsoft support or within the limits of the Microsoft public documentation. This configuration file and the companion files under /etc/opt/microsoft/omsagent/conf/omsagent.d are managed by the platform and updated based on the properties that are explicitly or implicitly set in the OMS workspace. If you check the files' last modification datetime you'll find they're updated pretty often behind the curtain, every time a solution is silently updated at the OMS platform level.

The agent continuously updates its configuration, and possibly some modules, using a PowerShell DSC provider for OMI. The OMS workspace is configured as a DSC server that gets polled every 5 minutes by default. The update process is orchestrated via OMI, which implements the DSC client needed by the configuration. The default configuration for the DSC component is as follows:

...
RefreshMode = "Pull";
AllowModuleOverwrite = True;
RefreshFrequencyMins = 5;
RebootNodeIfNeeded = False;
ConfigurationModeFrequencyMins = 5;
ConfigurationMode = "ApplyAndAutoCorrect";

In the end, unless you know what you're doing, the only way to configure the OMS agent is to add your own configuration file to the omsagent.d folder.
Your configuration file must comply with the standard FluentD format:

<source>
  @type XXXXXX
  tag oms.tag.tag
  ...
</source>

<filter oms.tag.**>
  @type filter_plugin_type
  ...
</filter>

<match oms.tag.**>
  @type output_plugin_type
  ...
</match>

Important output plugins

  • out_oms: native data types. No hard requirements for the tag.
  • out_oms_blob: native and custom data types; the record must conform to a specific format (see below). The tag must have at least 4 parts: oms.blob..
  • out_oms_api: HTTP ingestion API, posts the FluentD record. The tag must have 3 or 4 parts: the third element is the log name in OMS; the fourth, if present, is the timestamp field.

For out_oms_blob the FluentD record must have the following schema:

{
  "DataType": {"type": "string"},
  "IPName": {"type": "string"},
  "DataItems": {
    "type": "array",
    "items": {"type": "object"}
  }
}

Example:

wrapper = {
  "DataType" => "LINUX_PERF_BLOB",
  "IPName" => "LogManagement",
  "DataItems" => records
}

Extending the OMS Agent for Linux

Adding FluentD plugins

Since the OMS agent is based on FluentD its functionalities can be extended with standard plugins from the community as well as with custom developed plugins.

The first approach doesn't require any development skills, so it's the easiest path. As an example, let's try to ingest data into OMS from the Nginx status page.

First of all you need a system with Nginx installed. The latest builds of Nginx already contain the status module; all you have to do is enable it. Let's say we want it enabled for the default website. To install Nginx just type:

sudo apt-get install nginx
sudo ufw allow 'Nginx Full'

Then you need to edit the configuration file for the website:

sudo nano /etc/nginx/sites-enabled/default

and add a location directive to the server{} section

...
location /nginx_status {
    stub_status on;
    access_log off;
}
...

[Image: Nginx-config.png]

More info on how to configure Nginx on Ubuntu:

Once your Nginx server is properly configured you must add the needed FluentD plugins. In our case we need the in_nginx plugin and the filter_record_modifier plugin. The latter will be used to format the record in a way that is more useful for OMS, without writing any code.

Warning: you cannot use sudo /opt/microsoft/omsagent/ruby/bin/fluent-gem install fluent-plugin-nginx-status as you would on a standard FluentD installation. This is due to the fact that the default directories for the Ruby gems are misconfigured in the omsagent build. Just run:

sudo wget -O /opt/microsoft/omsagent/plugin/in_nginx.rb https://raw.githubusercontent.com/robertpitt/fluent-plugin-nginx-status/master/lib/fluent/plugin/in_nginx.rb
sudo wget -O /opt/microsoft/omsagent/plugin/filter_record_modifier.rb https://raw.githubusercontent.com/repeatedly/fluent-plugin-record-modifier/master/lib/fluent/plugin/filter_record_modifier.rb

Once the plugins are in place you just have to configure the OMS agent to start ingesting data; you can find a sample configuration later in the article.

Developing custom code for OMS Agent for Linux

Right or wrong, I consider Linux machines as GUI-less, and I like them that way. For me, developing on a GUI-less system is not very productive, especially considering that I'm a hobbyist developer and not a full-time one. So my FluentD and Ruby development environment is on Windows. If you have the same need you can follow the step-by-step guide below.

Prepare your dev environment

You can indeed install Ruby on your Windows system: just go to RubyInstaller and download the version you need. Ruby is an interpreted language, so multiple versions can run side by side without any issue.

As a development environment I opted for the free and open source Visual Studio Code. While not the best Ruby dev environment in town, it does its job and can be used with different languages as well.

After you installed Ruby and Visual Studio Code, you need to configure VSCode for Ruby.

Inside VSCode install the Ruby extension (ext install Ruby); this installs the Peng Lv plugin for Ruby.

After that you need to gem-install some Ruby modules to help VSCode parse, autocomplete and debug your code. The following is my shortlist:

gem install rcodetools
gem install fastri
gem install ruby-debug-ide
gem install debase

gem install rubocop
gem install ruby-lint

Once your Ruby environment is up and running you need to prepare and install FluentD; again it is a matter of gem installs:

gem install fluentd
gem install win32-ipc
gem install win32-event

Lastly you need to install your preferred test framework. Not being an experienced Ruby programmer, I just chose the one used in the OMS Linux Agent repo: Test::Unit.

gem install test-unit

Now your Ruby environment is ready to code and debug.
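To verify the toolchain works end to end, you can run a minimal Test::Unit file. The file and test names below are just illustrative placeholders, not from the OMS repo:

```ruby
# smoke_test.rb - minimal Test::Unit check that the environment is sane
require 'test/unit'

class SmokeTest < Test::Unit::TestCase
  def test_ruby_is_working
    # a trivial assertion: if this runs, the interpreter and the
    # test framework are both installed correctly
    assert_equal 4, 2 + 2
  end
end
```

Running `ruby smoke_test.rb` should report one passing test.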

Common classes

require_relative 'omslog'
require_relative 'oms_common'

OMS::IPcache

Anatomy of an input plugin
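This section is still a stub, but the overall shape is standard FluentD v0.12 (the API level bundled with the agent): subclass Fluent::Input, register a type name, declare config_param properties, and push records into the pipeline with router.emit. The skeleton below is a hypothetical sketch (all qnd_example names are mine); the record-building logic is kept in a plain module, mirroring the *_lib.rb pattern used in the OMS-Agent-for-Linux repo, so it can be unit tested without a running FluentD engine, and the defined?(Fluent) guard lets the file load outside the agent:

```ruby
# in_qnd_example.rb - hypothetical skeleton of a FluentD v0.12 input plugin
require 'socket'

# Pure record-building logic, unit testable without FluentD.
module QndExampleCollector
  def self.build_record
    { 'Computer' => Socket.gethostname, 'Message' => 'hello from qnd_example' }
  end
end

# The plugin shim itself; guarded so this file also loads outside the agent.
if defined?(Fluent)
  module Fluent
    class QndExampleInput < Input
      # the @type name you reference in the configuration file
      Fluent::Plugin.register_input('qnd_example', self)

      # configuration properties, e.g. "tag oms.qnd.Example" and "interval 30"
      config_param :tag, :string
      config_param :interval, :integer, default: 30

      def start
        super
        @running = true
        @thread = Thread.new(&method(:run))   # poll on a background thread
      end

      def shutdown
        @running = false
        @thread.join
        super
      end

      def run
        while @running
          # emit tag/time/record into the FluentD pipeline
          router.emit(@tag, Engine.now, QndExampleCollector.build_record)
          sleep @interval
        end
      end
    end
  end
end
```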

Anatomy of a filter plugin
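A filter plugin is simpler: subclass Fluent::Filter, register a type name, and implement a filter(tag, time, record) method that returns the modified record (or nil to drop it). Again a hypothetical sketch with the transformation logic split out so it can be tested without FluentD; the enrichment shown (adding a Computer key, converting a metric to numeric) matches what the Nginx example later in the article does with record_modifier:

```ruby
# filter_qnd_example.rb - hypothetical skeleton of a FluentD v0.12 filter plugin
require 'socket'

# Pure transformation logic, unit testable without FluentD.
module QndExampleTransform
  def self.apply(record)
    record['Computer'] ||= Socket.gethostname                     # correlation key
    record['active'] = record['active'].to_f if record.key?('active')  # string -> number
    record
  end
end

if defined?(Fluent)
  module Fluent
    class QndExampleFilter < Filter
      Fluent::Plugin.register_filter('qnd_example', self)

      # called once per record; return the modified record, or nil to drop it
      def filter(tag, time, record)
        QndExampleTransform.apply(record)
      end
    end
  end
end
```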

Debugging your code

To test Ruby regular expressions use http://rubular.com/

fluentd -p C:\Users\grandinid\SkyDrive\Dev\GitHub\OMS-Agent-for-Linux\source\code\plugins -c "C:\Users\grandinid\SkyDrive\Dev\GitHub\OMS-Agent-for-Linux\LocalOnly\kempdebug.conf" -vv

sudo su omsagent -c "/opt/microsoft/omsagent/bin/omsagent -vv -c /etc/opt/microsoft/omsagent/conf/omsagentdebug.conf"

When running the debug agent alongside the production agent:

  • beware of overlapping plugins between the two configurations
  • raise the log_level property from info to debug

Basic troubleshooting:

sudo service omsagent restart
sudo cat /var/opt/microsoft/omsagent/log/omsagent.log
sudo service omid restart

For general guidance on troubleshooting the Linux OMS agent check the troubleshooting page on GitHub.

Deploying your code
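Based on the directory layout described earlier, a minimal deployment boils down to copying the plugin into the agent's plugin directory, adding its configuration fragment under omsagent.d and restarting the agent. The file names below are hypothetical:

```shell
# copy the plugin where the agent loads plugins from
sudo cp in_qnd_example.rb /opt/microsoft/omsagent/plugin/
# add the matching configuration fragment
sudo cp qnd_example.conf /etc/opt/microsoft/omsagent/conf/omsagent.d/
# restart the agent and check the log for plugin load errors
sudo service omsagent restart
sudo tail -n 50 /var/opt/microsoft/omsagent/log/omsagent.log
```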

Ingesting custom data with the Linux agent

Ingesting data is pretty easy with the OMS ingestion API (also called the HTTP Data Collector API), but once you want to create a reliable solution things get a little more complex. There are two different aspects that need to be addressed:

  1. First, the ingestion must take into account that the service can have outages: connectivity outages as well as issues in the service itself. So the process leveraging the API must have its own buffering and retry mechanism to maintain the dataset integrity. Obviously it won't be able to buffer data forever, but it must have some time-limited resiliency to outages.
  2. Secondly, it is highly probable that the solution would benefit from ingesting standard datasets; the first that comes to mind is the performance dataset. Leveraging standard datasets means inheriting all the perspectives and built-in analyses the OMS platform provides for those types of data.

While the first topic, resiliency, can be addressed on Windows with some custom coding, the second one cannot be implemented on Windows today, unless the agent is also connected to a SCOM management group, and in that case you must develop a custom management pack.
On the Linux agent, on the other hand, resiliency and standard dataset ingestion come for free. To be more precise, resiliency is built into the OMS FluentD output plugins, and standard dataset ingestion just requires a minimum of reverse engineering. This is why today Linux is the platform of choice for developing custom OMS solutions.

Ingesting custom dataset

Ingesting custom data using the OMS HTTP API is fairly simple and fully integrated in the Linux agent. Once the data is in the FluentD pipeline you just have to match the tag with the out_oms_api plugin. There are a couple of important settings you want to be aware of:

  • The tag must have at least three elements and a maximum of four. The third element is the name of the custom log in OMS. So if you have a tag like oms.qnd.LibraEsva your custom log will be named LibraEsva_CL.
  • It is a good idea to have a Computer or Host property if you need to correlate the entries you're ingesting with other sources based on the hostname.
  • You can define your own timestamp property: in this case it must be declared with the time_generated_field configuration parameter, or you can use a 4-part tag where the last part is the name of the field containing the custom timestamp. Beware: if both time_generated_field and a 4-part tag are specified, the tag wins.
  • It is a good idea to specify dedicated buffer files for each match configuration.
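The tag-to-log-name mapping can be sketched in a few lines of Ruby. This is a hypothetical helper mirroring the documented behavior, not the plugin's actual code:

```ruby
# Derive the OMS custom log name from a FluentD tag:
# the third tag element plus the _CL suffix.
def custom_log_name(tag)
  parts = tag.split('.')
  raise ArgumentError, 'tag must have 3 or 4 parts' unless (3..4).cover?(parts.size)
  "#{parts[2]}_CL"
end

puts custom_log_name('oms.qnd.LibraEsva')   # prints "LibraEsva_CL"
```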
As a simple example, let's take the need to ingest the Nginx status as provided by the built-in status page. In this case we can use the HTTP API to send the activity data to OMS. All we have to do is use the community-based nginx_status FluentD plugin:
<source>
  @type nginx_status
  tag oms.qnd.NginxStatus
  host localhost
  port 80
  path /nginx_status
  interval 10
</source>


<match oms.qnd.NginxStatus>
  @type out_oms_api
  log_level info
  num_threads 1
  buffer_chunk_limit 1m
  buffer_type file
  buffer_path /var/opt/microsoft/omsagent/state/out_oms_nginx*.buffer
  buffer_queue_limit 10
  buffer_queue_full_action drop_oldest_chunk
  flush_interval 10s
  retry_limit 10
  retry_wait 30s
  max_retry_wait 10m
</match>

With this simple configuration you can start to ingest data in a log called NginxStatus_CL. Alas, if you try it you'll discover that the metrics are ingested as strings and that you don't have a Computer property to match the data with other sources for the same Linux system.
A more complete configuration, which adds the Computer property and transforms the data into numeric values, uses the record_modifier FluentD filter:

<source>
  @type nginx_status
  tag oms.qnd.NginxStatus
  host localhost
  port 80
  path /nginx_status
  interval 10
</source>

<filter oms.qnd.NginxStatus>
  @type record_modifier
  # remove unused properties
  remove_keys accepted,handled,total
  <record>
    Computer "#{Socket.gethostname}"
    Provider QND
    active ${record['active'].to_f}
    reading ${record['reading'].to_f}
    writing ${record['writing'].to_f}
    waiting ${record['waiting'].to_f}
  </record>
</filter>

<match oms.qnd.NginxStatus>
  @type out_oms_api
  log_level info
  num_threads 1
  buffer_chunk_limit 1m
  buffer_type file
  buffer_path /var/opt/microsoft/omsagent/state/out_oms_nginx*.buffer
  buffer_queue_limit 10
  buffer_queue_full_action drop_oldest_chunk
  flush_interval 10s
  retry_limit 10
  retry_wait 30s
  max_retry_wait 10m
</match>

Ingesting standard datasets

As I previously explained, the OMS Linux agent provides several output plugins for FluentD; to ingest standard datasets you can use the out_oms_blob and out_oms plugins. All you have to know is the schema required by the plugin and by the platform in the cloud. In this article I'll show how to ingest a performance dataset using the out_oms_blob plugin.
First of all you need to know the schema for the blob payload; for performance data items the JSON schema is:

//pseudo schema for LINUX_PERF_BLOB DataItems
{
  "Timestamp": {"type": "string"},
  "Host": {"type": "string"},
  "ObjectName": {"type": "string"},
  "InstanceName": {"type": "string"},
  "Collections": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "CounterName": {"type": "string"},
        "Value": {"type": "number"}
      }
    }
  }
}

As you can see the blob is organized by performance Object Name, Instance Name and then a collection of counters with their values. Exactly as we're accustomed to on the Windows platform. Once you know the schema, it's a matter of creating an array of LINUX_PERF_BLOB data items and preparing the hashtable the out_oms_blob plugin expects, setting the DataType property to "LINUX_PERF_BLOB" and the IPName to "LogManagement", as in the following example.

# build one performance data item
object = {}
object['Timestamp'] = OMS::Common.format_time(time)
object['Host'] = 'myhostname'
object['ObjectName'] = 'QNDProcessor'
object['InstanceName'] = '_Total'
object['Collections'] = []
object['Collections'] << {'CounterName' => '% Processor Time', 'Value' => 80.0}

# wrap the data items in the envelope expected by out_oms_blob
data_items = [object]
wrapper = {
  'DataType' => 'LINUX_PERF_BLOB',
  'IPName' => 'LogManagement',
  'DataItems' => data_items
}
router.emit(@tag, time, wrapper)

In the case of an input plugin, as in the above example, you just have to call the router.emit FluentD method to insert your data into the FluentD pipeline.
Working code that uses the above technique can be found in the in_qnd_kemp_rest plugin in the GitHub repository of the project.
In case you're coding a FluentD filter, you just have to return the wrapper object from your filter method. You can find a working sample in the filter_kemp module in the project source code.

Building a complete OMS Solution

  • OMS Solutions format: ARM templates
  • Can deploy any Azure ARM artifact
  • Currently not a streamlined process:
  • Create your views
  • Export the view
  • Copy and paste it in the template
  • Add the other artifacts needed
  • Deploy using standard ARM methods
