Machine Learning

BigQuery 101: All the Basics You Need to Know

Google BigQuery is an enterprise data warehouse built on Google's Dremel technology and offered as part of Google Cloud Platform. It’s serverless and fully managed. BigQuery works well with data of all sizes, from a 100-row Excel spreadsheet to petabytes of data. Most importantly, it can execute complex queries on that data within a few seconds. One note before we proceed: BigQuery is not a transactional database. It takes around 2 seconds to run a simple query like ‘SELECT * FROM bigquery-public-data.object LIMIT 10’ on a 100 KB table with 500 rows. Hence, it shouldn’t be thought of as an OLTP (Online Transaction Processing) database. BigQuery is for Big Data!

BigQuery supports a SQL-like query language, which makes it user-friendly and approachable for beginners. It’s accessible via its web UI, command-line tool, or client libraries (available for C#, Go, Java, Node.js, PHP, Python, and Ruby). You can also take advantage of its REST APIs and get your job done by sending a JSON request.
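
As a rough sketch of the JSON-request route, the snippet below builds the body of a query request in Python. It assumes the REST API's `jobs.query` method. The project ID `my-project` is a placeholder, and the SQL string is simply the example query quoted earlier. The code only constructs and prints the request. It does not send it, since a real call also needs OAuth credentials.

```python
import json

# Hypothetical project ID, used for illustration only.
PROJECT_ID = "my-project"


def build_query_request(sql, max_results=10):
    """Return (url, json_body) for a BigQuery jobs.query REST call."""
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{PROJECT_ID}/queries"
    )
    body = {
        "query": sql,
        "useLegacySql": False,      # ask for standard SQL
        "maxResults": max_results,  # cap the rows returned in the response
    }
    return url, json.dumps(body)


url, payload = build_query_request(
    "SELECT * FROM bigquery-public-data.object LIMIT 10"
)
print(url)
print(payload)
```

In practice, one of the client libraries usually hides this request building, but seeing the raw payload makes it clear that a query is just a small JSON document posted to an HTTPS endpoint.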

Exploring OpenAI Gym: A Platform for Reinforcement Learning Algorithms

According to the OpenAI Gym GitHub repository “OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to a standardized set of environments.”

OpenAI Gym has an environment-agent arrangement. It simply means Gym gives you access to an “environment” in which an “agent” can perform specific actions. In return, the agent gets an observation and a reward as a consequence of performing a particular action in the environment.
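
The loop described above can be sketched without installing anything. The `LineWorldEnv` class below is a hypothetical toy, not one of Gym's environments. It only mimics the classic Gym shape, where `reset()` returns an observation and `step(action)` returns an observation, a reward, a done flag, and an info dictionary. Newer Gym releases return a slightly different tuple from `step`.

```python
import random


class LineWorldEnv:
    """Toy environment with a Gym-like reset/step interface (not part of Gym)."""

    def __init__(self, goal=3, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right).

        Returns (observation, reward, done, info).
        """
        self.position += 1 if action == 1 else -1
        self.steps += 1
        reached = self.position == self.goal
        reward = 1.0 if reached else 0.0
        done = reached or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the total reward the agent collected."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward


rng = random.Random(0)


def random_agent(observation):
    """Ignore the observation and pick a random direction."""
    return rng.choice([0, 1])


def greedy_agent(observation):
    """Always move right, towards the goal."""
    return 1


print("greedy agent reward:", run_episode(LineWorldEnv(), greedy_agent))
print("random agent reward:", run_episode(LineWorldEnv(), random_agent))
```

The agent here is just a function from observation to action. A learning algorithm would replace it with something that updates itself from the rewards it receives.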

Lessons Learnt While Building an ETL Pipeline for MongoDB & Amazon Redshift Using Apache Airflow

Recently, I was involved in building a custom ETL (Extract-Transform-Load) pipeline using Apache Airflow, which included extracting data from MongoDB collections and loading it into Amazon Redshift tables.

Each ETL pipeline comes with specific business requirements around processing data, which are hard to meet with off-the-shelf ETL solutions. This is why a majority of ETL solutions are built by hand, from scratch. In this blog, I am going to talk about what I learned while building an optimized, efficient, near real-time and fault-tolerant custom ETL solution using Apache Airflow, which involved moving data from MongoDB to Redshift.

Real Time Text Classification Using Kafka and Scikit-learn

Text classification is one of the important tasks in supervised machine learning (ML). Assigning categories to text, which can be tweets, Facebook posts, web pages, library books, media articles, galleries, etc., has many applications, such as spam filtering and sentiment analysis.

In this blog, we build a text classification engine to classify topics in an incoming Twitter stream using Apache Kafka and scikit-learn, a Python-based machine learning library.

Your Complete Guide to Building Stateless Bots Using Rasa Stack

This blog explores the Rasa Stack to create a stateless chatbot. We will look into how the recently released Rasa Core, which provides machine learning based dialogue management, helps maintain the context of conversations efficiently.

We will also build a sample chatbot using Rasa Core.

Chatbots With Google DialogFlow: Build a Fun Reddit Chatbot in 30 Minutes

Google DialogFlow (formerly api.ai) is a platform for building use-case specific, engaging voice and text-based conversations, powered by AI. In this blog, we will learn about DialogFlow and proceed to build a chatbot that can interact with Reddit.

Machine Learning for your Infrastructure: Anomaly Detection with Elastic + X-Pack

We need a practical and scalable approach to understand the cause-effect relationship between data sources and events across a complex infrastructure of VMs, containers, networks, micro-services, regions, etc. Machine learning is particularly useful for such problems where we need to identify “what changed”, since machine learning algorithms can analyze existing data to learn its patterns, making it easier to recognize the cause. This is known as unsupervised learning, where the algorithm learns from experience and identifies similar patterns when they come along again.

Let's see how you can set up Elastic + X-Pack to enable anomaly detection for your infrastructure and applications.

Surviving & Thriving in the Age of Software Accelerations

Enterprises need to adopt a new approach to software development and digital innovation. At Velotio, we are helping customers modernize and transform their businesses with the approaches and best practices listed in this blog. We talk in detail about how to achieve agility, cloud native development, DevOps maturity, micro-services adoption, and digital transformation, and how to build intelligent applications using data science in a secure environment.

A Quick Guide to Building a Serverless Chatbot With Amazon Lex

Amazon Lex is an AWS service for building conversational interfaces into any application using voice and text. Amazon Lex provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, enabling you to build applications with highly engaging user experiences and lifelike conversational interactions.

This blog is a detailed step-by-step tutorial for developing smart chatbots with serverless functions (AWS Lambda).

Building an Intelligent Chatbot Using Botkit and Rasa NLU

Bots are the flavour of the season. Every day, we hear about a new bot being launched, catering to domains like travel, social, legal, support, sales, etc. Facebook Messenger alone had more than 11,000 bots when I last checked and has probably added thousands more as I write this article.

The first generation of bots was dumb, since they could understand only a limited set of queries based on keywords in the conversation. But the commoditisation of NLP (Natural Language Processing) and machine learning by services like Wit.ai, API.ai, Luis.ai, Amazon Lex, IBM Watson, etc. has resulted in the growth of intelligent bots like donotpay and chatShopper.

Explanatory vs. Predictive Models in Machine Learning

My view of Data Analysis is that there is a continuum between explanatory models on one side and predictive models on the other. The decisions you make during the modeling process depend on your goal. Take Customer Churn as an example. You can ask yourself why customers are leaving, or you can ask yourself which customers are leaving. The first question has as its primary goal to explain churn, while the second has as its primary goal to predict churn. These are two fundamentally different questions, and this has implications for the decisions you make along the way. The predictive side of Data Analysis is closely related to terms like Data Mining and Machine Learning.

Bots: Disruption or Bubble?

Bots are the new black! The entire tech industry seems to be buzzing with “bot” fever. My co-founders and I often see a “bot” company and discuss its business model. We should consider that there are many types of “bots”: chat bots, voice bots, AI assistants, robotic process automation (RPA) bots, conversational agents within apps or websites, etc.

Over the last year, we have been building chat and voice based bots, which has given me some interesting insights. I hope to lay down my thoughts on bots in some detail and with some structure.