Machine Learning

Exploring OpenAI Gym: A Platform for Reinforcement Learning Algorithms

According to the OpenAI Gym GitHub repository, “OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to a standardized set of environments.”

OpenAI Gym follows an environment-agent arrangement. An “agent” performs specific actions in an “environment” that Gym provides. In return, the agent gets an observation and a reward as a consequence of performing a particular action in the environment.
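The loop described above can be sketched without installing Gym. The toy `LineWorldEnv` class and the `always_right` policy below are hypothetical and only mimic the classic `reset`/`step` convention; they are not part of the Gym library.

```python
class LineWorldEnv:
    """Hypothetical toy environment: walk along a line from 0 to a target."""

    def __init__(self, target=3):
        self.target = target
        self.position = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.position = 0
        return self.position

    def step(self, action):
        # Apply the action, then report observation, reward, done flag and info.
        self.position += action
        done = self.position == self.target
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps=20):
    """Let the agent (policy) act in the environment until the episode ends."""
    observation = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


def always_right(observation):
    # Trivial agent: ignore the observation and always move one step right.
    return 1


total_reward, steps = run_episode(LineWorldEnv(target=3), always_right)
print(total_reward, steps)
```

A learning agent would replace `always_right` with a policy that is updated from the rewards it observes.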

Lessons learnt while building an ETL pipeline for MongoDB & Amazon Redshift using Apache Airflow

Recently, the author was involved in building a custom ETL (Extract-Transform-Load) pipeline using Apache Airflow, which included extracting data from MongoDB collections and putting it into Amazon Redshift tables.

Each ETL pipeline comes with specific business requirements around processing data that are hard to meet using off-the-shelf ETL solutions. This is why a majority of ETL solutions are built manually, from scratch. In this blog, I am going to talk about my learnings around building an optimized, efficient, near real-time and fault-tolerant custom ETL solution using Apache Airflow, which involved moving data from MongoDB to Redshift.

Real Time Text Classification using Kafka and scikit-learn

Text classification is one of the important tasks in supervised machine learning (ML). Assigning categories to text, which can be tweets, Facebook posts, web pages, library books, media articles, galleries, etc., has many applications like spam filtering and sentiment analysis.

In this blog, we build a text classification engine to classify topics in an incoming Twitter stream using Apache Kafka and scikit-learn, a Python-based machine learning library.

Building Stateless Bots using Rasa Stack

This blog explores the Rasa Stack to create a stateless chatbot. We will look into how the recently released Rasa Core, which provides machine learning based dialogue management, helps in maintaining the context of conversations in an efficient way.

We will also build a sample chatbot using Rasa Core.

Chatbots with Google DialogFlow: Build a fun Reddit chatbot in 30 minutes

Google DialogFlow (formerly api.ai) is a platform for building use-case-specific, engaging voice and text-based conversations, powered by AI. In this blog, we will learn about DialogFlow and proceed to build a chatbot that can interact with Reddit.

Machine Learning for your Infrastructure: Anomaly Detection with Elastic + X-Pack

We need a practical and scalable approach to understand the cause-effect relationship between data sources and events across a complex infrastructure of VMs, containers, networks, micro-services, regions, etc. Machine learning is particularly useful for such problems where we need to identify “what changed”, since machine learning algorithms can easily analyze existing data to understand the patterns, thus making it easier to recognize the cause. This is known as unsupervised learning, where the algorithm learns from experience and identifies similar patterns when they come along again.

Let's see how you can set up Elastic + X-Pack to enable anomaly detection for your infrastructure & applications.

Surviving & thriving in the age of software accelerations

Enterprises need to adopt a new approach to software development and digital innovation. At Velotio, we are helping customers to modernize and transform their business with all of the approaches and best practices listed here in this blog. We talk in detail about how to achieve agility, cloud native development, DevOps maturity, micro-services adoption, digital transformation and build intelligent applications using data science in a secure environment.

Quickstart guide for building a serverless chatbot with Amazon Lex

Amazon Lex is an AWS service for building conversational interfaces into any application using voice and text. Amazon Lex provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, to enable you to build applications with highly engaging user experiences and lifelike conversational interactions.

This blog is a detailed step-by-step tutorial for developing smart chatbots with serverless functions (AWS Lambda).

Building an intelligent chatbot using Botkit and Rasa NLU

Bots are the flavour of the season. Every day, we hear about a new bot being launched for domains like travel, social, legal, support, sales, etc. Facebook Messenger alone had more than 11,000 bots when I last checked and has probably added thousands more as I write this article. The first generation of bots were dumb since they could understand only a limited set of queries based on keywords in the conversation. But the commoditisation of NLP (Natural Language Processing) and machine learning by services like Wit.ai, API.ai, Luis.ai, Amazon Lex, IBM Watson, etc. has resulted in the growth of intelligent bots like donotpay and chatShopper. I don’t know if bots are just hype or the real deal. But I can say with certainty that building a bot is fun and challenging at the same time. In this article, I would like to introduce you to some of the tools to build an intelligent chatbot.

As the title says, we used Botkit and Rasa (NLU) to build our bot. Before getting into the technicalities, I would like to share the reason for choosing these two platforms and how they fit our use case.

Bot development framework: Howdy's Botkit and Microsoft (MS) Bot Framework were good contenders for this. Both these frameworks:
- are open source
- have integrations with popular messaging platforms like Slack, Facebook Messenger, Twilio etc
- have good documentation
- have an active developer community

Due to compliance issues, we had chosen AWS to deploy all our services, and we wanted the same for the bot as well.

NLU (Natural Language Understanding): API.ai (acquired by Google) and Wit.ai (acquired by Facebook) are two popular NLU tools in the bot industry which we first considered for this task. Both the solutions:
- are hosted as a cloud service
- have Node.js and Python SDKs and a REST interface
- have good documentation
- support state or contextual intents, which makes it very easy to build a conversational platform on top of them.

As stated before, we couldn’t use any of these hosted solutions due to compliance. That is where we came across an open source NLU called Rasa, which was a perfect replacement for API.ai and Wit.ai and which we could host and manage on AWS.

You may be wondering why I used the term NLU for API.ai and Wit.ai and not NLP (Natural Language Processing).
* NLP refers to all systems that handle interactions with humans in a way humans find natural. It means that we could converse with a system just the way we talk to other human beings.
* NLU is a subfield of NLP which handles the narrow but complex challenge of converting unstructured inputs into a structured form which a machine can understand and act upon. So when you say “Book a hotel for me in San Francisco on 20th April 2017”, the bot uses NLU to extract date=20th April 2017, location=San Francisco and action=book hotel, which the system can understand.

RASA NLU

In this section, I would like to explain Rasa in detail and some terms used in NLP which you should be familiar with.
* Intent: This tells us what the user would like to do. Ex: raise a complaint, request a refund, etc.

* Entities: These are the attributes which give details about the user’s task. Ex: complaint regarding service disruptions, refund cost, etc.

* Confidence score: This is a score which indicates how confidently the NLU could classify the input into one of the known intents.

Here is an example to help you understand the above-mentioned terms:
Input: “My internet isn’t working since morning”.
- intent: “service_interruption”
- entities: “service=internet”, “duration=morning”
- confidence score: 0.84 (this could vary based on your training)
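One way a bot might consume such a result is sketched below. The `NLUResult` class, the `resolve_intent` helper, the `fallback` intent name and the 0.7 threshold are all assumptions made for illustration; they are not part of Rasa.

```python
from dataclasses import dataclass, field
from typing import Dict

FALLBACK_INTENT = "fallback"  # hypothetical name for "I did not understand"


@dataclass
class NLUResult:
    """Structured form of one parsed user message (illustrative only)."""
    intent: str
    confidence: float
    entities: Dict[str, str] = field(default_factory=dict)


def resolve_intent(result, threshold=0.7):
    # Trust the predicted intent only when the confidence is high enough;
    # the threshold is an arbitrary assumption and should be tuned.
    if result.confidence >= threshold:
        return result.intent
    return FALLBACK_INTENT


result = NLUResult(
    intent="service_interruption",
    confidence=0.84,
    entities={"service": "internet", "duration": "morning"},
)
print(resolve_intent(result))                 # accepted at the default threshold
print(resolve_intent(result, threshold=0.9))  # rejected at a stricter threshold
```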

NLU’s job (Rasa in our case) is to accept a sentence/statement and give us the intent, entities and a confidence score which could be used by our bot. Rasa basically provides a high-level API over various NLP and ML libraries which do intent classification and entity extraction. These NLP and ML libraries are called backends in Rasa, and they bring the intelligence to Rasa. These are some of the backends used with Rasa:

  • MITIE: This is an all-inclusive library, meaning that it has an NLP library for entity extraction as well as an ML library for intent classification built into it.
  • spaCy + sklearn: spaCy is an NLP library which, in this setup, only does entity extraction. sklearn is used with spaCy to add ML capabilities for intent classification.
  • MITIE + sklearn: This uses the best of both worlds: the good entity recognition available in MITIE along with the fast and good intent classification in sklearn.

I have used the MITIE backend to train Rasa. For the demo, I’ve taken a “Live Support ChatBot” which is trained for messages like this:
* My phone isn’t working.
* My phone isn’t turning on.
* My phone crashed and isn’t working anymore.

My training data looks like this:
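A minimal sketch of how such training data could be assembled, assuming the JSON layout used by early Rasa NLU releases (`rasa_nlu_data` with a list of `common_examples`). The `make_example` helper is hypothetical; the intent `device_failure` and the entity `device` are taken from the example used later in this article.

```python
import json


def make_example(text, intent, entities):
    """Build one training example.

    `entities` maps an entity name to the substring of `text` that carries it;
    the character offsets are computed from the first occurrence.
    """
    annotated = []
    for name, value in entities.items():
        start = text.index(value)
        annotated.append(
            {"start": start, "end": start + len(value), "value": value, "entity": name}
        )
    return {"text": text, "intent": intent, "entities": annotated}


messages = [
    "My phone isn't working.",
    "My phone isn't turning on.",
    "My phone crashed and isn't working anymore.",
]

training_data = {
    "rasa_nlu_data": {
        "common_examples": [
            make_example(m, "device_failure", {"device": "phone"}) for m in messages
        ]
    }
}

print(json.dumps(training_data, indent=2))
```

A real training file would contain many more examples per intent, and several intents.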

NOTE: We have observed that MITIE gives better accuracy than spaCy + sklearn for a small training set, but as you keep adding more intents, training on MITIE gets slower and slower. For a training set of 200+ examples with about 10–15 intents, MITIE takes about 35–45 minutes for us to train on a c4.4xlarge instance (16 cores, 30 GB RAM) on AWS.

This is a good tutorial on training Rasa with MITIE backend. If you are a beginner then you can refer to this doc to install Rasa.

Botkit-Rasa Integration

Botkit is an open source bot development framework designed by the creators of Howdy. It basically provides a set of tools for building bots on Facebook Messenger, Slack, Twilio, Kik and other popular platforms. They have also come up with an IDE for bot development called Botkit Studio. To summarize, Botkit is a tool which allows us to write the bot once and deploy it on multiple messaging platforms.

Botkit also supports middleware, which can be used to extend the functionality of Botkit. Integrations with databases, CRM, NLU and statistical tools are provided via middleware, which makes the framework extensible. This design also allows us to easily add integrations with other tools and software by just writing middleware modules for them.

I’ve integrated Slack and Botkit for this demo. You can use this boilerplate template to set up Botkit for Slack. We have extended the Botkit-Rasa middleware, which you can find here.

Botkit-Rasa has 2 functions, receive and hears, which override the default Botkit behaviour.
1. receive: This function is invoked when Botkit receives a message. It sends the user’s message to Rasa and stores the intent and entities in the Botkit message object.

2. hears: This function overrides the default Botkit hears method, i.e. controller.hears. The default hears method uses regex to search for the given patterns in the user’s message, while the hears method from the Botkit-Rasa middleware matches on the intent.
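The difference between the two matching strategies can be illustrated with a small, language-agnostic sketch. It is written in Python rather than JavaScript and is not the Botkit or Botkit-Rasa API: `stub_nlu`, `receive`, `hears_regex` and `hears_intent` are hypothetical stand-ins, and the stub returns a hard-coded parse instead of calling Rasa.

```python
import re


def stub_nlu(text):
    # Stand-in for the request to the NLU service: a hard-coded,
    # hypothetical parse, NOT a real Rasa response.
    if "phone" in text.lower():
        return {"intent": "device_failure", "entities": {"device": "phone"}}
    return {"intent": None, "entities": {}}


def receive(message, nlu=stub_nlu):
    # Enrich the incoming message with the intent and entities from the NLU.
    parsed = nlu(message["text"])
    message["intent"] = parsed["intent"]
    message["entities"] = parsed["entities"]
    return message


def hears_regex(patterns, message):
    # Default-style matching: search the raw text for any of the patterns.
    return any(re.search(p, message["text"], re.IGNORECASE) for p in patterns)


def hears_intent(intents, message):
    # Middleware-style matching: compare against the intent stored by receive.
    return message.get("intent") in intents


message = receive({"text": "my phone is not turning on"})
print(hears_regex([r"isn't working"], message))   # False: the wording differs
print(hears_intent(["device_failure"], message))  # True: the intent matches
```

Matching on the intent keeps handlers independent of the exact wording the user chooses.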

Let’s try an example: “my phone is not turning on”.
Rasa will return the following:
1. Intent: device_failure
2. Entities: device=phone

If you notice carefully, the input I gave, i.e. “my phone is not turning on”, is not present in my training file. Rasa has some intelligence built into it to identify the intent and entities correctly for such combinations.

We need to add a hears method listening to the intent “device_failure” to process this input. Remember that the intent and entities returned by Rasa will be stored in the message object by the Botkit-Rasa middleware.
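The idea of registering a handler for an intent can be sketched as follows. This is a conceptual Python sketch, not Botkit code: the `Controller` class, its `hears` decorator and `dispatch` method are hypothetical, and the reply text is invented for illustration. The message is assumed to already carry the intent and entities.

```python
class Controller:
    """Hypothetical, minimal intent-to-handler registry."""

    def __init__(self):
        self._handlers = {}

    def hears(self, intent):
        # Register the decorated function as the handler for `intent`.
        def register(handler):
            self._handlers[intent] = handler
            return handler
        return register

    def dispatch(self, message):
        # Route the message to the handler registered for its intent, if any.
        handler = self._handlers.get(message.get("intent"))
        if handler is None:
            return "Sorry, I did not understand that."
        return handler(message)


controller = Controller()


@controller.hears("device_failure")
def handle_device_failure(message):
    # Use the extracted entity to personalise the reply.
    device = message["entities"].get("device", "device")
    return f"Sorry to hear your {device} is not working. Have you tried restarting it?"


message = {
    "text": "my phone is not turning on",
    "intent": "device_failure",
    "entities": {"device": "phone"},
}
print(controller.dispatch(message))
```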

You should be able to run this bot with Slack and see the output as shown below (support_bot is the name of my bot).

You are now familiar with the process of building chatbots with a bot development framework and an NLU. Hope this helps you get started on your bot very quickly. If you have any suggestions, questions or feedback, then tweet me @harjun1601. Keep following our blogs for more articles on bot development, ML and AI.


Full stack engineer, tech enthusiast, aspiring entrepreneur and coffee addict.
Arjun has strong experience in designing and developing cloud native micro-services. He has also worked as a DevOps engineer and has played a crucial role in setting up DevOps culture at various enterprises. He is currently exploring Machine Learning and NLP. Find him at @harjun1601.

Explanatory vs. Predictive Models in Machine Learning

My view of Data Analysis is that there is a continuum between explanatory models on one side and predictive models on the other side. The decisions you make during the modeling process depend on your goal. Let’s take Customer Churn as an example. You can ask yourself: why are customers leaving? Or you can ask yourself: which customers are leaving? The first question has as its primary goal to explain churn, while the second question has as its primary goal to predict churn. These are two fundamentally different questions and this has implications for the decisions you take along the way. The predictive side of Data Analysis is closely related to terms like Data Mining and Machine Learning.

Bots: Disruption or Bubble?

Bots are the new black! The entire tech industry seems to be buzzing with “bot” fever. My co-founders and I often see a “bot” company and discuss its business model. We should consider that there are many types of “bots”: chat bots, voice bots, AI assistants, robotic process automation (RPA) bots, conversational agents within apps or websites, etc.

Over the last year, we have been building some interesting chat and voice based bots, which has given me some interesting insights. I hope to lay down my thoughts on bots in some detail and with some structure.