Cloud-Native Data Engineering Virtual Internship
In this advanced virtual internship, students design and build scalable, fault-tolerant data pipelines and analytics solutions on cloud platforms. They gain expertise in Apache Spark, Apache Kafka, and Snowflake, and learn to architect cloud-native data engineering systems that handle large-scale data processing and real-time analytics. By the end of the internship, students will have a portfolio of projects demonstrating their ability to build robust, scalable data solutions in the cloud.
Tasks & Milestones
Cloud Platform Comparison
Advanced: Analyze the features, services, and pricing models of AWS, Azure, and GCP, and recommend the best platform for a given data engineering use case.
Designing Highly Available Architectures
Advanced: Design a highly available and fault-tolerant cloud architecture for a data processing and analytics solution.
Batch Data Pipeline with Spark
Advanced: Design and implement a batch data pipeline using Apache Spark to process and analyze large datasets.
Real-time Data Pipeline with Kafka
Advanced: Design and implement a real-time data pipeline using Apache Kafka to process and analyze streaming data.
Snowflake Data Warehouse Design
Advanced: Design a Snowflake-based data warehouse solution to support a company's data analytics requirements.
Integrating Snowflake with Data Pipelines
Advanced: Integrate a Snowflake-based data warehouse with data pipelines built using Spark and Kafka.
Deploying Data Solutions with Terraform
Advanced: Use Terraform to deploy a cloud-based data engineering solution on AWS, Azure, or GCP.
Managing Data Solutions on Kubernetes
Advanced: Deploy and manage a cloud-based data engineering solution on a Kubernetes cluster.
Prerequisites
- Experience with data engineering concepts
- Proficiency in Python or Scala
- Familiarity with cloud computing platforms
Certificate
Certificate of Completion
Earn a certificate upon successful completion