Image by author
There are many courses and resources available on machine learning and data science, but very few on data engineering. This raises some questions. Is it a difficult field? Does it pay poorly? Is it considered less exciting than other tech roles? In reality, many companies are actively seeking data engineering talent and offering substantial salaries, sometimes exceeding $200,000. Data engineers play a crucial role as data platform architects, designing and building the foundational systems that enable data scientists and machine learning experts to work effectively.
To address this gap in the industry, DataTalksClub has introduced a free, transformative bootcamp, "Data Engineering Zoomcamp". The course is designed to equip beginners and professionals looking to change careers with essential skills and practical experience in data engineering.
This is a 6-week training camp where you will learn through multiple courses, reading materials, workshops, and projects. At the end of each module, you will be assigned homework to practice what you have learned.
- Week 1: Introduction to GCP, Docker, Postgres, Terraform and environment configuration.
- Week 2: Workflow orchestration with Mage.
- Week 3: Data warehousing and machine learning with BigQuery.
- Week 4: Analytics engineering with dbt, Google Data Studio, and Metabase.
- Week 5: Batch processing with Spark.
- Week 6: Streaming with Kafka.
Image from DataTalksClub/data-engineering-zoomcamp
The curriculum contains 6 modules, 2 workshops, and a project that cover everything needed to become a professional data engineer.
Module 1: Master containerization and infrastructure as code
In this module, you will learn about Docker and Postgres, starting with the basics and moving through detailed tutorials on creating data pipelines, running Postgres with Docker, and more.
The module also covers essential tools such as pgAdmin and Docker Compose, includes an SQL refresher, and offers optional content on Docker networking and a walkthrough of the Windows Subsystem for Linux. Finally, it introduces GCP and Terraform, giving you a comprehensive understanding of containerization and infrastructure as code, both essential in modern cloud-based environments.
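To give a flavor of the kind of pipeline you build in this module, here is a minimal sketch that loads a CSV file into a Postgres instance running in Docker. The file name, table name, and credentials are illustrative assumptions, not taken from the course materials.

```python
# Minimal sketch: load a CSV into Postgres (for example, one started with
# `docker run -e POSTGRES_PASSWORD=secret -p 5432:5432 postgres:16`).
# File name, table name, and credentials are illustrative assumptions.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://postgres:secret@localhost:5432/postgres")

# Read the source file in chunks so large files don't exhaust memory.
for chunk in pd.read_csv("trips.csv", chunksize=100_000):
    chunk.to_sql("trips", engine, if_exists="append", index=False)

print("Load complete")
```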
Module 2: Workflow Orchestration Techniques
The module offers an in-depth exploration of Mage, an innovative open-source hybrid framework for data transformation and integration. It starts with the basics of workflow orchestration and continues with hands-on exercises in Mage, including configuring it with Docker and building ETL pipelines from an API to Postgres and Google Cloud Storage (GCS), and from there to BigQuery.
The module's combination of videos, resources and practical tasks ensures a comprehensive learning experience, equipping students with the skills to manage sophisticated data workflows using Mage.
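To give a feel for Mage's building blocks, here is a minimal sketch of a data-loader block in the decorator style Mage uses; the endpoint URL is a hypothetical stand-in, and the template Mage generates for you may differ slightly.

```python
# Sketch of a Mage data-loader block. Mage generates similar templates
# for you; the endpoint URL here is an illustrative assumption.
import io
import pandas as pd
import requests

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_api(*args, **kwargs):
    """Fetch a CSV from an API and return it as a DataFrame."""
    url = "https://example.com/data.csv"  # hypothetical endpoint
    response = requests.get(url)
    response.raise_for_status()
    return pd.read_csv(io.StringIO(response.text))
```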
Workshop 1: Data ingestion strategies
In the first workshop, you will master building efficient data ingestion pipelines. The workshop focuses on essential skills such as extracting data from APIs and files, normalizing and loading data, and incremental loading techniques. After completing it, you will be able to build data pipelines like a senior data engineer.
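The snippet below sketches one of the core ideas of the workshop: streaming records from a paginated API with a Python generator so the whole dataset never has to sit in memory. The endpoint and its `page` parameter are hypothetical.

```python
# Sketch of incremental, memory-friendly ingestion from a paginated API.
# The endpoint and its `page` parameter are hypothetical.
import requests


def fetch_records(base_url: str):
    """Yield records one page at a time instead of loading everything."""
    page = 1
    while True:
        resp = requests.get(base_url, params={"page": page})
        resp.raise_for_status()
        rows = resp.json()
        if not rows:  # an empty page signals the end of the data
            return
        yield from rows
        page += 1


for record in fetch_records("https://example.com/api/trips"):
    ...  # normalize and load each record into the destination
```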
Module 3: Data Storage
The module is an in-depth exploration of data storage and analytics, focusing on data warehousing with BigQuery. It covers key concepts such as partitioning and clustering and dives into BigQuery best practices. It then progresses to advanced topics, particularly integrating machine learning (ML) with BigQuery, highlighting the use of SQL for ML and providing resources on hyperparameter tuning, feature preprocessing, and model deployment.
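To make partitioning and clustering concrete, here is a hedged sketch that creates a partitioned, clustered BigQuery table with the Python client; the dataset, table, and column names are assumptions for illustration.

```python
# Sketch: create a partitioned and clustered table in BigQuery.
# Dataset, table, and column names are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP credentials/project

sql = """
CREATE OR REPLACE TABLE my_dataset.trips_partitioned
PARTITION BY DATE(pickup_datetime)
CLUSTER BY vendor_id AS
SELECT * FROM my_dataset.trips_raw
"""
client.query(sql).result()  # .result() waits for the job to finish
```

Partitioning by date and clustering by a frequently filtered column lets BigQuery scan far less data, which is the cost-control practice this module emphasizes.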
Module 4: Analytical Engineering
The analytics engineering module focuses on building a project using dbt (data build tool) on top of an existing data warehouse, either BigQuery or PostgreSQL.
The module covers setting up dbt both locally and in the cloud, and introduces analytics engineering concepts, ETL vs. ELT, and data modeling. It also covers advanced dbt features such as incremental models, tags, hooks, and snapshots.
Finally, the module presents techniques for visualizing the transformed data with tools such as Google Data Studio and Metabase, and provides resources for troubleshooting and efficient data loading.
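The course's dbt models are written in SQL; to keep every example here in a single language, the sketch below instead uses dbt's Python model support (available on adapters such as BigQuery, where Python models run on Spark). The model and upstream names are hypothetical.

```python
# models/fact_trips.py -- sketch of a dbt Python model (dbt models are
# more commonly written in SQL). Model and upstream names are hypothetical.
def model(dbt, session):
    dbt.config(materialized="table")
    trips = dbt.ref("stg_trips")  # upstream staging model
    # On the BigQuery adapter, Python models run on Spark, so `trips`
    # is a Spark DataFrame here.
    return trips.where("fare_amount > 0")
```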
Module 5: Batch Processing Proficiency
This module covers batch processing with Apache Spark, starting with an introduction to batch processing and to Spark itself, along with installation instructions for Windows, Linux, and macOS.
It then explores Spark SQL and DataFrames, covering data preparation, SQL operations, and Spark internals, and concludes by running Spark in the cloud and integrating Spark with BigQuery.
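Here is a small, hedged PySpark sketch in the spirit of this module: read a file into a DataFrame, register it as a temporary view, and query it with Spark SQL. The file and column names are illustrative.

```python
# Sketch: batch processing with PySpark. File and column names are
# illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-demo").getOrCreate()

df = spark.read.parquet("trips.parquet")
df.createOrReplaceTempView("trips")

daily = spark.sql("""
    SELECT to_date(pickup_datetime) AS day, COUNT(*) AS trips
    FROM trips
    GROUP BY 1
    ORDER BY 1
""")
daily.show()
```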
Module 6: The art of streaming data with Kafka
The module begins with an introduction to stream processing concepts, followed by an in-depth exploration of Kafka, including its fundamentals, integration with Confluent Cloud, and practical applications involving producers and consumers.
The module also covers Kafka configuration and Kafka Streams, addressing topics such as stream joins, testing, windowing, and the use of ksqlDB and Kafka Connect. Additionally, it extends to Python and JVM environments, introducing Faust for Python stream processing, PySpark Structured Streaming, and Scala examples for Kafka Streams.
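The sketch below shows the producer/consumer pattern at the heart of this module, using the kafka-python client against a local broker; the topic name, broker address, and message fields are assumptions.

```python
# Sketch: a JSON producer and consumer with kafka-python.
# Topic name, broker address, and message fields are illustrative assumptions.
import json
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("rides", {"ride_id": 1, "distance_km": 3.2})
producer.flush()  # make sure the message actually leaves the client

consumer = KafkaConsumer(
    "rides",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # {'ride_id': 1, 'distance_km': 3.2}
    break
```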
Workshop 2: Stream processing with SQL
You'll learn how to process and manage streaming data with RisingWave, which provides a cost-effective solution with a PostgreSQL-style experience to power your stream processing applications.
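Because RisingWave speaks the Postgres wire protocol, you can drive it from any Postgres client. Below is a hedged sketch using psycopg2 to define a continuously maintained materialized view; the connection details and table and view names are illustrative.

```python
# Sketch: querying RisingWave through its Postgres-compatible interface.
# Connection details and table/view names are illustrative assumptions.
import psycopg2

conn = psycopg2.connect(host="localhost", port=4566, user="root", dbname="dev")
conn.autocommit = True

with conn.cursor() as cur:
    # In RisingWave, materialized views are updated incrementally as
    # new events stream in, so this aggregate stays fresh on its own.
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS trips_per_vendor AS
        SELECT vendor_id, COUNT(*) AS trips
        FROM rides
        GROUP BY vendor_id
    """)
    cur.execute("SELECT * FROM trips_per_vendor")
    print(cur.fetchall())
```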
Project: Real-world data engineering application
The goal of this project is to apply everything you have learned in this course to build an end-to-end data pipeline. You will create a dashboard consisting of two tiles by:
- selecting a dataset,
- building a pipeline that processes the data and stores it in a data lake,
- building a pipeline that moves the processed data from the data lake into a data warehouse,
- transforming the data in the warehouse and preparing it for the dashboard, and
- building a dashboard to present the data visually.
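As a hedged sketch of the lake-to-warehouse step of such a project, the snippet below uploads a Parquet file to GCS and loads it into BigQuery; the bucket, dataset, table, and file names are assumptions.

```python
# Sketch: move a file from a data lake (GCS) into a warehouse (BigQuery).
# Bucket, dataset, table, and file names are illustrative assumptions.
from google.cloud import bigquery, storage

# 1) Upload the processed file to the data lake.
storage.Client().bucket("my-lake-bucket").blob(
    "trips/trips.parquet"
).upload_from_filename("trips.parquet")

# 2) Load it from the lake into the warehouse.
client = bigquery.Client()
job = client.load_table_from_uri(
    "gs://my-lake-bucket/trips/trips.parquet",
    "my_dataset.trips",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET
    ),
)
job.result()  # wait for the load job to finish
```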
2024 Cohort Details
- Registration: Sign up now
- Start date: January 15, 2024, at 17:00 CET
- Self-paced learning with guided support
- Cohort folder with tasks and deadlines
- Interactive Slack community for peer learning
Prerequisites
- Basic coding and command-line skills
- A foundation in SQL
- Python (beneficial but not required)
Expert instructors leading your journey
- Ankush Khanna
- Victoria Perez Mola
- Alexey Grigorev
- Matt Palmer
- Luis Oliveira
- Michael Shoemaker
Join our 2024 cohort and start learning with an amazing data engineering community. With expert-led training, hands-on experience, and a curriculum tailored to industry needs, this bootcamp not only equips you with the necessary skills but also positions you at the forefront of a lucrative and in-demand career path. Sign up today and transform your aspirations into reality!
Abid Ali Awan (@1abidaliawan) is a certified professional data scientist who loves building machine learning models. Currently, he focuses on content creation and writing technical blogs on data science and machine learning technologies. Abid has a Master's degree in Technology Management and a Bachelor's degree in Telecommunications Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.