Data Science

Real Time Analytics for IoT Data using Mosquitto, AWS Kinesis and InfluxDB

The Internet of Things (IoT) is maturing rapidly and finding applications across various industries. Everyday devices are becoming smart devices, which are essentially IoT devices. These devices capture various parameters in and around their environment, generating a huge amount of data. This data needs to be collected, processed, stored and analyzed in order to get actionable insights from it. To do so, we need to build a data pipeline.

In this blog we will build such a pipeline using Mosquitto, Kinesis, InfluxDB and Grafana. We will discuss each component of the pipeline and the steps to build it.

Exploring OpenAI Gym: A Platform for Reinforcement Learning Algorithms

According to the OpenAI Gym GitHub repository, “OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to a standardized set of environments.”

OpenAI Gym has an environment-agent arrangement. Simply put, an “agent” performs specific actions in an “environment” and, in return, receives an observation and a reward as a consequence of each action.
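
The agent-environment loop described above can be sketched in a few lines. This is only an illustration: `CoinFlipEnv`, `run_episode` and `copy_policy` are hypothetical names made up for this example, and the toy environment merely imitates the classic reset/step shape of a Gym environment instead of importing the gym library.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment (not part of Gym).

    It imitates the reset/step shape of a Gym environment:
    the observation is a coin face (0 or 1) and the agent is
    rewarded when its action matches the face currently shown.
    """

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self._rng = random.Random(seed)
        self._steps = 0
        self._coin = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self._steps = 0
        self._coin = self._rng.randint(0, 1)
        return self._coin

    def step(self, action):
        # Reward the action against the coin the agent just observed.
        reward = 1.0 if action == self._coin else 0.0
        self._steps += 1
        # Flip a new coin; it becomes the next observation.
        self._coin = self._rng.randint(0, 1)
        done = self._steps >= self.episode_length
        return self._coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
    return total_reward


def copy_policy(observation):
    # A trivial agent: always repeat the observed coin face.
    return observation
```

Calling `run_episode(CoinFlipEnv(), copy_policy)` plays one ten-step episode. A real Gym environment would take the place of `CoinFlipEnv`, and a learning algorithm would take the place of the fixed policy.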

Lessons learnt while building an ETL pipeline for MongoDB & Amazon Redshift using Apache Airflow

Recently, I was involved in building a custom ETL (Extract-Transform-Load) pipeline using Apache Airflow, which extracted data from MongoDB collections and loaded it into Amazon Redshift tables.

Each ETL pipeline comes with specific business requirements around processing data that are hard to achieve using off-the-shelf ETL solutions. This is why a majority of ETL solutions are built manually, from scratch. In this blog, I am going to talk about what I learned while building an optimized, efficient, near-real-time and fault-tolerant custom ETL solution using Apache Airflow to move data from MongoDB to Redshift.

Real Time Text Classification using Kafka and scikit-learn

Text classification is one of the important tasks in supervised machine learning (ML). Assigning categories to text, such as tweets, Facebook posts, web pages, library books, media articles or galleries, has many applications, including spam filtering and sentiment analysis.

In this blog we build a text classification engine to classify topics in an incoming Twitter stream using Apache Kafka and scikit-learn, a Python-based machine learning library.

Machine Learning for your Infrastructure: Anomaly Detection with Elastic + X-Pack

We need a practical and scalable approach to understand the cause-effect relationship between data sources and events across a complex infrastructure of VMs, containers, networks, micro-services, regions, etc. Machine learning is particularly useful for such problems, where we need to identify “what changed”, since machine learning algorithms can analyze existing data to understand its patterns, making it easier to recognize the cause. This is known as unsupervised learning, where the algorithm learns from experience and identifies similar patterns when they come along again.
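
To make the idea of spotting “what changed” concrete, here is a minimal sketch of one simple approach: flag a value when it deviates strongly from the mean of the readings just before it. This is not how X-Pack models data; `find_anomalies`, the window size and the threshold are assumptions chosen for illustration, and the sample readings are made up.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return the indices of values that deviate from the recent baseline.

    Each value is compared with the mean and standard deviation of the
    `window` values that precede it (a trailing z-score). The function
    name and default parameters are illustrative assumptions.
    """
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any different value is a change.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up CPU utilisation readings with one obvious spike at index 6.
cpu_percent = [50, 52, 49, 51, 50, 51, 95, 50, 52, 49]
print(find_anomalies(cpu_percent))  # [6]
```

No labels are needed here, which is the sense in which the approach is unsupervised: the baseline is learned from the data itself.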

Let's see how you can set up Elastic + X-Pack to enable anomaly detection for your infrastructure & applications.

Surviving & thriving in the age of software accelerations

Enterprises need to adopt a new approach to software development and digital innovation. At Velotio, we are helping customers modernize and transform their business with all of the approaches and best practices listed in this blog. We talk in detail about how to achieve agility, cloud-native development, DevOps maturity, micro-services adoption and digital transformation, and how to build intelligent applications using data science in a secure environment.

Explanatory vs. Predictive Models in Machine Learning

My view of Data Analysis is that there is a continuum between explanatory models on one side and predictive models on the other. The decisions you make during the modeling process depend on your goal. Take Customer Churn as an example: you can ask yourself why customers are leaving, or you can ask yourself which customers are leaving. The primary goal of the first question is to explain churn, while the primary goal of the second is to predict churn. These are two fundamentally different questions, and this has implications for the decisions you take along the way. The predictive side of Data Analysis is closely related to terms like Data Mining and Machine Learning.