Bots are the flavour of the season. Every day, we hear about a new bot being launched for domains like travel, social, legal, support, sales, etc. Facebook Messenger alone had more than 11,000 bots when I last checked, and has probably added thousands more as I write this article. The first generation of bots was dumb, since they could understand only a limited set of queries based on keywords in the conversation. But the commoditisation of NLP (Natural Language Processing) and machine learning by services like Wit.ai, API.ai, Luis.ai, Amazon Lex, IBM Watson, etc. has resulted in the growth of intelligent bots like donotpay and chatShopper. I don’t know if bots are just hype or the real deal. But I can say with certainty that building a bot is fun and challenging at the same time. In this article, I would like to introduce you to some of the tools for building an intelligent chatbot.
As the title says, we used Botkit and Rasa (NLU) to build our bot. Before getting into the technicalities, I would like to share the reasons for choosing these two platforms and how they fit our use case.
Bot development framework: Botkit (from the creators of Howdy) and Microsoft (MS) Bot Framework were good contenders for this. Both these frameworks:
- are open source
- have integrations with popular messaging platforms like Slack, Facebook Messenger, Twilio etc
- have good documentation
- have an active developer community
Due to compliance requirements, we had already chosen AWS to deploy all our services, and we wanted to deploy the bot there as well.
NLU (Natural Language Understanding): API.ai (acquired by Google) and Wit.ai (acquired by Facebook) are two popular NLU tools in the bot industry, and they were the first ones we considered for this task. Both solutions:
- are hosted as a cloud service
- have Node.js and Python SDKs and a REST interface
- have good documentation
- support state or contextual intents, which makes it very easy to build a conversational platform on top of them.
As stated before, we couldn’t use either of these hosted solutions due to compliance. That is how we came across Rasa, an open source NLU which was a perfect replacement for API.ai and Wit.ai and which we could host and manage ourselves on AWS.
You may be wondering why I used the term NLU for API.ai and Wit.ai and not NLP (Natural Language Processing).
* NLP refers to all the systems which handle interactions with humans in a way that humans find natural. It means that we could converse with a system just the way we talk to other human beings.
* NLU is a subfield of NLP which handles the narrow but complex challenge of converting unstructured input into a structured form which a machine can understand and act upon. So when you say “Book a hotel for me in San Francisco on 20th April 2017”, the bot uses NLU to extract
date=20th April 2017, location=San Francisco and action=book hotel,
which the system can understand.
In this section, I would like to explain Rasa in detail and some terms used in NLP which you should be familiar with.
* Intent: This tells us what the user would like to do.
Example: raising a complaint, requesting a refund, etc.
* Entities: These are the attributes which give details about the user’s task. Example: a complaint regarding service disruptions, the refund cost, etc.
* Confidence score: This is a metric which indicates how confidently the NLU could classify the input into one of the known intents.
Here is an example to help you understand these terms:
Input: “My internet isn’t working since morning”.
- entities: “service=internet”,
- confidence score: 0.84 (This could vary based on your training)
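A bot would typically act on a result like this only when the confidence is high enough. The sketch below is a minimal illustration and not Rasa's API: the object shape, the intent name `service_interruption`, the `pickIntent` helper and the 0.7 threshold are all assumptions chosen for the example.

```javascript
// Hypothetical shape of an NLU result; the field names are assumptions for illustration.
const nluResult = {
  text: "My internet isn't working since morning",
  intent: { name: 'service_interruption', confidence: 0.84 }, // hypothetical intent name
  entities: [{ entity: 'service', value: 'internet' }],
};

// Return the intent name only when the classifier is confident enough;
// otherwise return null so the bot can ask the user to rephrase.
function pickIntent(result, threshold = 0.7) {
  if (!result.intent || result.intent.confidence < threshold) {
    return null;
  }
  return result.intent.name;
}

console.log(pickIntent(nluResult));      // service_interruption
console.log(pickIntent(nluResult, 0.9)); // null
```

Falling back to a clarifying question when `pickIntent` returns `null` is usually safer than acting on a low-confidence guess.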
The NLU’s job (Rasa in our case) is to accept a sentence and give us the intent, entities and a confidence score which our bot can use. Rasa basically provides a high-level API over various NLP and ML libraries which do the intent classification and entity extraction. These NLP and ML libraries are called backends in Rasa, and they are what bring the intelligence to Rasa. These are some of the backends used with Rasa:
- MITIE: This is an all-inclusive library, meaning that it has an NLP library for entity extraction as well as an ML library for intent classification built into it.
- spaCy + sklearn: spaCy is an NLP library which only does entity extraction. sklearn is used with spaCy to add ML capabilities for intent classification.
- MITIE + sklearn: This uses the best of both worlds: the good entity recognition available in MITIE along with the fast and good intent classification in sklearn.
I have used the MITIE backend to train Rasa. For the demo, I’ve taken a “Live Support ChatBot” which is trained on messages like these:
* My phone isn’t working.
* My phone isn’t turning on.
* My phone crashed and isn’t working anymore.
My training data looks like this:
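As an illustration only (this is not the author's actual training file), the sketch below builds a small training set in the JSON layout used by early Rasa NLU releases, with examples listed under `rasa_nlu_data.common_examples`. The `makeExample` helper is hypothetical, and the `device_failure` intent and `device` entity names are borrowed from the example later in this post. The helper computes the entity offsets from the text so that `start` and `end` cannot drift out of sync with the annotated value.

```javascript
// Build one training example, computing entity offsets from the text
// so that start/end always match the annotated value.
function makeExample(text, intent, entities = {}) {
  const annotated = Object.entries(entities).map(([entity, value]) => {
    const start = text.indexOf(value);
    if (start === -1) {
      throw new Error(`"${value}" not found in "${text}"`);
    }
    return { start, end: start + value.length, value, entity };
  });
  return { text, intent, entities: annotated };
}

// Assumed layout: examples grouped under rasa_nlu_data.common_examples.
const trainingData = {
  rasa_nlu_data: {
    common_examples: [
      makeExample("My phone isn't working.", 'device_failure', { device: 'phone' }),
      makeExample("My phone isn't turning on.", 'device_failure', { device: 'phone' }),
      makeExample("My phone crashed and isn't working anymore.", 'device_failure', { device: 'phone' }),
    ],
  },
};

console.log(JSON.stringify(trainingData, null, 2));
```

The printed JSON can be saved to a file and passed to the Rasa trainer; check the format against the documentation of the Rasa version you use, since it has changed over time.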
NOTE: We have observed that MITIE gives better accuracy than spaCy + sklearn for a small training set, but as you keep adding more intents, training on MITIE gets slower and slower. For a training set of 200+ examples with about 10–15 intents, MITIE takes about 35–45 minutes for us to train on a c4.4xlarge instance (16 cores, 30 GB RAM) on AWS.
Botkit is an open source bot development framework designed by the creators of Howdy. It basically provides a set of tools for building bots on Facebook Messenger, Slack, Twilio, Kik and other popular platforms. Its creators have also come up with an IDE for bot development called Botkit Studio. To summarize, Botkit is a tool which allows us to write the bot once and deploy it on multiple messaging platforms.
Botkit also has support for middleware, which can be used to extend its functionality. Integrations with databases, CRMs, NLU and statistical tools are provided via middleware, which makes the framework extensible. This design also allows us to easily add integrations with other tools and software by just writing middleware modules for them.
Botkit-Rasa, the middleware that connects Botkit to Rasa, has two functions, receive and hears, which override the default Botkit behaviour.
1. receive: This function is invoked when Botkit receives a message. It sends the user’s message to Rasa and stores the intent and entities in the Botkit message object.
2. hears: This function overrides the default Botkit hears method, i.e., controller.hears. The default hears method uses regular expressions to search for the given patterns in the user’s message, while the hears method from the Botkit-Rasa middleware matches on the intent.
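The two functions described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual Botkit-Rasa source: `createMiddleware` and `fakeRasaParse` are hypothetical names, and the call to the Rasa server is replaced by a stub that returns a canned result so the example runs on its own.

```javascript
// Stand-in for the HTTP call to the Rasa server. A real middleware would send
// the text to Rasa and use its reply; this canned result is hypothetical.
function fakeRasaParse(text) {
  return Promise.resolve({
    intent: { name: 'device_failure', confidence: 0.9 },
    entities: [{ entity: 'device', value: 'phone' }],
  });
}

function createMiddleware(parse) {
  return {
    // receive: enrich the incoming message with the NLU output, then continue.
    receive(bot, message, next) {
      if (!message.text) {
        next();
        return;
      }
      parse(message.text)
        .then((result) => {
          message.intent = result.intent;
          message.entities = result.entities;
        })
        .catch(() => { /* leave the message untouched if the NLU is unavailable */ })
        .then(() => next());
    },
    // hears: match on the intent name instead of running a regex on the text.
    hears(patterns, message) {
      return patterns.some(
        (pattern) => message.intent !== undefined && message.intent.name === pattern
      );
    },
  };
}

const middleware = createMiddleware(fakeRasaParse);
const incoming = { text: 'my phone is not turning on' };
middleware.receive(null, incoming, () => {
  console.log(middleware.hears(['device_failure'], incoming)); // true
  console.log(middleware.hears(['greeting'], incoming));       // false
});
```

Passing the parse function in as a parameter keeps the matching logic testable without a running Rasa server.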
Let’s try an example: “my phone is not turning on”.
Rasa will return the following:
1. Intent: device_failure
2. Entities: device=phone
If you look carefully, the input I gave, i.e., “my phone is not turning on”, is not present in my training file. Rasa has some intelligence built into it to identify the intent and entities correctly for such variations.
We need to add a hears method listening for the intent “device_failure” to process this input. Remember that the intent and entities returned by Rasa will be stored in the message object by the Botkit-Rasa middleware.
You should be able to run this bot with Slack and see the output as shown below (support_bot is the name of my bot).
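A handler for this intent could look roughly like the sketch below. The function name `handleDeviceFailure`, the reply text and the shape of the `entities` array are assumptions for illustration. The commented-out registration assumes a Botkit `controller` and a `rasa` middleware object that have been set up elsewhere.

```javascript
// Build a reply from the intent and entities that the middleware stored on the message.
function handleDeviceFailure(message) {
  const entities = message.entities || [];
  const device = entities.find((e) => e.entity === 'device');
  const name = device ? device.value : 'device';
  return `Sorry to hear that your ${name} is not working. Let me help you with that.`;
}

// With Botkit and the middleware in place, registration would look roughly like:
//   controller.hears(['device_failure'], 'direct_message,direct_mention,mention',
//     rasa.hears, (bot, message) => bot.reply(message, handleDeviceFailure(message)));
// `controller` and `rasa` are assumed to exist; they are not defined in this sketch.

const sample = {
  text: 'my phone is not turning on',
  intent: { name: 'device_failure' },
  entities: [{ entity: 'device', value: 'phone' }],
};
console.log(handleDeviceFailure(sample));
// Sorry to hear that your phone is not working. Let me help you with that.
```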
You are now familiar with the process of building chatbots with a bot development framework and an NLU. I hope this helps you get started on your bot quickly. If you have any suggestions, questions or feedback, tweet me @harjun1601. Keep following our blogs for more articles on bot development, ML and AI.
Full stack engineer, tech enthusiast, aspiring entrepreneur and coffee addict.
Arjun has strong experience in designing and developing cloud native microservices. He has also worked as a DevOps engineer and has played a crucial role in setting up a DevOps culture at various enterprises. He is currently exploring Machine Learning and NLP. Find him at @harjun1601.