How I Built A Cascading Data Pipeline Based on AWS (Part 1)
Today I'm going to share my experience building a data engineering project that I take great pride in. You will learn why I chose the tools and AWS components I did, and how I designed the architecture.
Disclaimer: This article is inspired by my experience with an unnamed organization. Certain commercially sensitive details have been intentionally replaced with fictional data and code, or omitted entirely, to maintain confidentiality and privacy. The full extent of the actual commercial interests involved therefore remains undisclosed.
Prerequisites
- Knowledge of Python
- Understanding of AWS components such as DynamoDB, Lambda, SQS, and CloudWatch
- Comfortable working with YAML and the AWS SAM CLI (see the sketch below)
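If you haven't used SAM before, here is a minimal sketch of what a SAM template looks like. The function name, handler, runtime, and schedule below are placeholders I made up for illustration, not this project's actual configuration:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Minimal SAM sketch; all names and values are placeholders

Resources:
  SalesSyncFunction:                 # hypothetical Lambda function
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler           # assumed module and function name
      Runtime: python3.9
      CodeUri: src/
      Events:
        HourlySync:
          Type: Schedule             # scheduled trigger (CloudWatch Events)
          Properties:
            Schedule: rate(1 hour)   # assumed sync interval
```

With a template like this in place, `sam build` followed by `sam deploy --guided` is all it takes to get the stack running.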
Background
Let's say you are a data engineer who needs to keep the data in a warehouse constantly up to date. For example, you are responsible for syncing the sales records of Dunder Mifflin Paper Co. on a regular basis. (I understand this is not a realistic scenario, but have fun with it!)
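To make that concrete, here is a minimal sketch of what writing one such sales record might look like, assuming a DynamoDB table named `DunderMifflinSales` and an invented record schema (both hypothetical, for illustration only):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("DunderMifflinSales")  # assumed table name

# A made-up sales record; the real schema would come from the source system.
sample_record = {
    "order_id": "DM-10045",  # partition key (assumed)
    "branch": "Scranton",
    "product": "A4 paper, 500 sheets",
    "quantity": 12,
    "updated_at": "2023-04-01T09:30:00Z",
}

# put_item overwrites any existing item with the same key, so re-running
# the sync with fresh data simply replaces the stale copy in the warehouse.
table.put_item(Item=sample_record)
```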