Redshift

Lessons learnt while building an ETL pipeline for MongoDB & Amazon Redshift using Apache Airflow

Lessons learnt while building an ETL pipeline for MongoDB & Amazon Redshift using Apache Airflow

Recently, the author was involved in building a custom ETL(Extract-Transform-Load) pipeline using Apache Airflow which included extracting data from MongoDB collections and putting it into Amazon Redshift tables. 

Each ETL pipeline comes with a specific business requirement around processing data which is hard to be achieved using off-the-shelf ETL solutions. This is why a majority of ETL solutions are built manually, from scratch. In this blog, I am going to talk about my learnings around building an optimized, efficient, near real-time and fault tolerant custom ETL solution using Apache Airflow which involved moving data from MongoDB to Redshift.