Big Data

A Beginner's Guide to Edge Computing

A recent architectural trend is changing where data is stored and where computation happens. Edge computing is one such approach: either the data or the compute is decentralized and moved to the nodes nearest the user, which can be a smartphone or a local regional server.

In this blog, we will delve into what edge computing really is, its various types, and how it is implemented and managed in the real world.

Lessons learnt while building an ETL pipeline for MongoDB & Amazon Redshift using Apache Airflow

Recently, I was involved in building a custom ETL (Extract-Transform-Load) pipeline with Apache Airflow that extracted data from MongoDB collections and loaded it into Amazon Redshift tables.

Each ETL pipeline comes with specific business requirements for processing data, and these are hard to meet with off-the-shelf ETL solutions. This is why many ETL solutions are custom-built from scratch. In this blog, I am going to talk about what I learned while building an optimized, efficient, near-real-time, and fault-tolerant custom ETL solution with Apache Airflow that moves data from MongoDB to Redshift.