Data Science

A Quick Introduction to Data Analysis With Pandas

A Quick Introduction to Data Analysis With Pandas

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Pandas aims to integrate the functionality of NumPy and matplotlib to give you a convenient tool for data analytics and visualization. It does more than just integration — it makes the usage far more better.

In this blog, I’ll give you a list of useful pandas snippets that can be reused over and over again. These will definitely save you some time that you might need skimming through the comprehensive Pandas docs.

Real Time Analytics for IoT Data using Mosquitto, AWS Kinesis and InfluxDB

Real Time Analytics for IoT Data using Mosquitto, AWS Kinesis and InfluxDB

Internet of things (IoT) is maturing rapidly and it is finding application across various industries. Every common device that we use is turning into the category of smart devices. Smart devices are basically IoT devices. These devices captures various parameters in and around their environment leading to generation of a huge amount of data. This data needs to be collected, processed, stored and analyzed in order to get actionable insights from them. To do so, we need to build data pipeline.

In this blog we will be building a similar pipeline using Mosquitto, Kinesis, InfluxDB and Grafana. We will discuss all these individual components of the pipeline and the steps to build it.

Explanatory vs. Predictive Models in Machine Learning

Explanatory vs. Predictive Models in Machine Learning

My vision on Data Analysis is that there is continuum between explanatory models on one side and predictive models on the other side. The decisions you make during the modeling process depend on your goal. Let’s take Customer Churn as an example, you can ask yourself why are customers leaving? Or you can ask yourself which customers are leaving? The first question has as its primary goal to explain churn, while the second question has as its primary goal to predict churn. These are two fundamentally different questions and this has implications for the decisions you take along the way. The predictive side of Data Analysis is closely related to terms like Data Mining and Machine Learning.