
Machine Learning for your Infrastructure: Anomaly Detection with Elastic + X-Pack

Priyanka Verma

Artificial Intelligence / Machine Learning

Introduction

The world continues to go through digital transformation at an accelerating pace. Modern applications and infrastructure continue to expand, and operational complexity continues to grow. According to a recent ManageEngine Application Performance Monitoring Survey:

  • 28 percent use ad-hoc scripts to detect issues in over 50 percent of their applications.
  • 32 percent learn about application performance issues from end users.
  • 59 percent trust monitoring tools to identify most performance deviations.

Most enterprises and web-scale companies already have instrumentation and monitoring capabilities built around an Elasticsearch cluster. They collect large amounts of data but struggle to use it effectively. This data can be used to improve performance and uptime, speed up root cause analysis, and predict incidents.

IT Operations & Machine Learning

Here is the main question: How do we make sense of the huge piles of collected data? The first step is to understand the correlations between the time series data. But understanding correlations alone is not enough, since correlation does not imply causation. We need a practical, scalable approach to understand the cause-effect relationships between data sources and events across a complex infrastructure of VMs, containers, networks, microservices, regions, etc.

It is quite common for a problem in one component to cause something to go wrong in another. In such cases, historical operational data can be used to identify the root cause by tracing through a series of intermediate causes and effects. Machine learning is particularly useful for problems where we need to identify “what changed”, since machine learning algorithms can analyze existing data to learn its patterns, making it easier to recognize the cause. This is known as unsupervised learning: the algorithm learns from experience and identifies similar patterns when they come along again.
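To make the idea concrete, here is a deliberately simplified sketch of what an unsupervised anomaly detector does: learn what “normal” looks like from historical data, then flag new points that deviate from that baseline. This is only an illustration; X-Pack's actual models are far more sophisticated than a mean-and-standard-deviation band.

```python
from statistics import mean, stdev

def find_anomalies(history, new_points, n_sigma=3.0):
    """Learn a baseline from history, then flag points outside
    mean +/- n_sigma * stdev. A toy stand-in for unsupervised
    anomaly detection, not Elastic's actual algorithm."""
    mu = mean(history)
    sigma = stdev(history)
    lower, upper = mu - n_sigma * sigma, mu + n_sigma * sigma
    return [(p, not (lower <= p <= upper)) for p in new_points]

# CPU utilization hovering around 30%, then a sudden spike
history = [28, 31, 30, 29, 32, 30, 31, 29, 30, 31]
print(find_anomalies(history, [30, 32, 95]))
# [(30, False), (32, False), (95, True)] -- only the spike is flagged
```

The key property carries over to the real system: no labels are needed, the model infers “normal” purely from the data it has seen.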

Let's see how you can set up Elastic + X-Pack to enable anomaly detection for your infrastructure and applications.

Anomaly Detection using Elastic's machine learning with X-Pack

Step I: Setup

1. Setup Elasticsearch: 

Elastic's documentation recommends the Oracle JDK version 1.8.0_131. Check that you have the required Java version installed on your system. It should be at least Java 8; install or upgrade if necessary.

  • Download the Elasticsearch tarball and untar it

CODE: https://gist.github.com/velotiotech/82b3a8e28e033b3a85c5a89780bb1cc6.js

  • It will then create a folder named elasticsearch-5.5.1. Go into the folder.

CODE: https://gist.github.com/velotiotech/7974ab22750ab24089e5e4c1a4bc0d50.js

  • Install X-Pack into Elasticsearch

CODE: https://gist.github.com/velotiotech/01f3dab07737432a2be60046c07f64de.js

  • Start Elasticsearch

CODE: https://gist.github.com/velotiotech/ced3b3d4adc17b82962147a063806b5a.js

2. Setup Kibana

Kibana is an open source analytics and visualization platform designed to work with Elasticsearch.

  • Download the Kibana tarball and untar it

CODE: https://gist.github.com/velotiotech/4761c67eeaaee0416ad9350098282250.js

  • It will then create a folder named kibana-5.5.1. Go into the directory.

CODE: https://gist.github.com/velotiotech/d89c557777c07219d9650e367ec321fa.js

  • Install X-Pack into Kibana

CODE: https://gist.github.com/velotiotech/712bb09f80f64e61c6fb2b5902cbd04e.js

  • Run Kibana

CODE: https://gist.github.com/velotiotech/113b75f08f605be2cc159860f9ac19ec.js

  • Navigate to Kibana at http://localhost:5601/
  • Log in as the built-in user elastic with the password changeme.
  • You will see the below screen:
Kibana: X-Pack Welcome Page

3. Setup Metricbeat:

Metricbeat helps monitor servers and the services they host by collecting metrics from the operating system and from the services themselves. In this blog, we will use it to collect CPU utilization metrics from our local system.

  • Download the Metricbeat tarball and untar it

CODE: https://gist.github.com/velotiotech/7fa47f82727d4900f44ffc15d09cce8e.js

  • It will create a folder named metricbeat-5.5.1-linux-x86_64. Go into the folder.

CODE: https://gist.github.com/velotiotech/fc74e3ba89bd9976e918ec3f188b7cb5.js

  • By default, Metricbeat is configured to send collected data to Elasticsearch running on localhost. If your Elasticsearch is hosted on another server, change the IP address and authentication credentials in the metricbeat.yml file.
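The relevant change is in the output section of metricbeat.yml. A sketch of what it might look like, where the host address and credentials below are placeholders for your own values:

```yaml
# metricbeat.yml -- point Metricbeat at a remote Elasticsearch
# instead of the default localhost. Host, username, and password
# here are illustrative placeholders.
output.elasticsearch:
  hosts: ["10.0.0.12:9200"]
  username: "elastic"
  password: "changeme"
```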
 Metricbeat Config

  • Metricbeat provides the following stats:
    - System load
    - CPU stats
    - IO stats
    - Per filesystem stats
    - Per CPU core stats
    - File system summary stats
    - Memory stats
    - Network stats
    - Per process stats
  • Start Metricbeat as a daemon process

CODE: https://gist.github.com/velotiotech/a8f9c2b9b4aa62e1c73a963522981dde.js

Now the setup is done. Let's move on to Step II and create machine learning jobs.

Step II: Time Series data

  • Real-time data: Metricbeat provides us real-time series data, which will be used for unsupervised learning. Follow the steps below to define the index pattern metricbeat-* in Kibana so you can search against it in Elasticsearch:
    - Go to Management -> Index Patterns  
    - Provide Index name or pattern as metricbeat-*
    - Select Time filter field name as @timestamp
    - Click Create

You will not be able to create the index pattern if Elasticsearch does not contain any Metricbeat data yet. Make sure Metricbeat is running and its output is configured to point to Elasticsearch.

Kibana - Time Series Data
  • Saved historic data: To quickly see how machine learning detects anomalies, you can also use sample data provided by Elastic. Download the sample data by clicking here.
  • Unzip the files into a folder: tar -zxvf server_metrics.tar.gz
  • Download this script. It will be used to upload the sample data to Elasticsearch.
  • Give the file execute permissions: chmod +x upload_server-metrics.sh
  • Run the script.
  • Just as we created an index pattern for the Metricbeat data, create the index pattern server-metrics*

Step III: Creating Machine Learning jobs

There are two scenarios in which data is considered anomalous. First, when the behavior of a key indicator changes over time relative to its previous behavior. Second, when the behavior of one entity within a population deviates from the other entities in the population on a single key indicator.

To detect these anomalies, there are three types of jobs we can create:

  1. Single metric job: Detects Scenario 1 anomalies over a single key performance indicator.
  2. Multi-metric job: Also detects Scenario 1 anomalies, but can track more than one performance indicator, such as CPU utilization along with memory utilization.
  3. Advanced job: Detects Scenario 2 anomalies.
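For reference, the single metric job that the Kibana UI builds corresponds roughly to a job definition submitted to the X-Pack machine learning API (PUT _xpack/ml/anomaly_detectors/&lt;job_id&gt;). The sketch below is illustrative only; the field name (system.cpu.user.pct) and the bucket span format should be checked against the API documentation for your exact Elastic version:

```json
{
  "description": "Single metric job: mean CPU utilization (illustrative)",
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      { "function": "mean", "field_name": "system.cpu.user.pct" }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```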

For simplicity, we will create the following single metric jobs:

  1. Tracking CPU utilization, using Metricbeat data
  2. Tracking total requests made to the server, using the sample server data

Follow the steps below to create the single metric jobs:

Job 1: Tracking CPU utilization

Job 2: Tracking total requests made to the server

  • Go to http://localhost:5601/
  • Go to the Machine Learning tab in the left panel of Kibana.
  • Click Create new job.
  • Click Create single metric job.
  • Select the index pattern we created in Step II, i.e. metricbeat-* and server-metrics* respectively.
  • Configure the job by providing the following values:
    1. Aggregation: Select the aggregation function to apply to the field being analyzed.
    2. Field: A drop-down listing all the fields available in the index pattern.
    3. Bucket span: The time interval for analysis. The aggregation function is applied to the selected field over each interval of this length.
  • If your data contains many empty buckets, i.e. the data is sparse, and you do not want to treat that as anomalous, check the sparse data checkbox (if it appears).
  • Click Use full <index pattern> data to use all the available data for analysis.
 Metricbeats Description
 Server Description
  • Click the play symbol.
  • Provide a job name and description.
  • Click Create Job.
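The aggregation and bucket span settings above can be sketched in a few lines: raw readings are grouped into fixed-width time buckets, and the chosen aggregation function is applied to each bucket. This is a simplified illustration of the concept, not Elastic's implementation.

```python
from collections import defaultdict

def bucket_aggregate(points, span, agg=lambda vs: sum(vs) / len(vs)):
    """Group (timestamp, value) points into fixed-width time buckets
    of `span` seconds and apply `agg` (mean by default) to each bucket.
    Mirrors what 'bucket span' + 'aggregation' mean in the job config."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % span].append(value)
    return {start: agg(vals) for start, vals in sorted(buckets.items())}

# One reading every 60 s, aggregated with mean over a 300 s (5 min) span
points = [(0, 10), (60, 20), (120, 30), (300, 40), (360, 50)]
print(bucket_aggregate(points, span=300))
# {0: 20.0, 300: 45.0}
```

A larger bucket span smooths out short spikes; a smaller one catches brief anomalies but produces noisier baselines.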

After the job is created, the available data is analyzed. Click View results and you will see a chart showing the actual values along with the upper and lower bounds of the predicted value. If an actual value lies outside this range, it is considered anomalous. The color of the circles represents the severity level.
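The bound check described above can be sketched as follows. The scoring here is a crude stand-in for X-Pack's anomaly score (which is what drives the severity colors); only the inside/outside-the-band logic matches the chart's behavior.

```python
def classify(actual, lower, upper):
    """Flag a value as anomalous when it falls outside the model's
    predicted [lower, upper] band, with a crude 0-100 severity score
    based on how far outside the band it lies."""
    if lower <= actual <= upper:
        return "normal", 0.0
    distance = (lower - actual) if actual < lower else (actual - upper)
    width = upper - lower
    score = min(100.0, 100.0 * distance / width)
    return "anomaly", score

print(classify(45, lower=20, upper=50))   # inside the band
print(classify(95, lower=20, upper=50))   # far above the band
```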

Data Prediction
Here the prediction bounds are wide because the job has only just started learning. As more data arrives, the predictions will improve.
Data Prediction
Here the predictions are quite accurate, since there is plenty of data from which to learn the pattern.
  • Click the Machine Learning tab in the left panel. The jobs we created will be listed here.
  • You will see a list of actions for every job you have created.
  • Since Metricbeat stores data every minute for Job 1, we can feed data to the job in real time. Click the play button to start the datafeed. As more data arrives, the predictions will improve.
  • You can see details of the anomalies by clicking Anomaly Viewer.
  Anomaly in metricbeats data
  Server metrics anomalies

We have seen how machine learning can be used to find patterns across different statistics and to detect anomalies. After identifying anomalies, we need to find the context of those events, for example, what other factors are contributing to the problem. In such cases, we can troubleshoot by creating multi-metric jobs.


Did you like the blog? If yes, we're sure you'll also like to work with the people who write them - our best-in-class engineering team.

We're looking for talented developers who are passionate about new emerging technologies. If that's you, get in touch with us.

Explore current openings
