Backend Advanced Premium

Streaming Data Pipelines with Apache Spark Virtual Internship

In this advanced virtual internship, students will learn to build real-time data processing pipelines with Apache Spark Structured Streaming. They will gain hands-on experience ingesting data from streaming sources such as Apache Kafka and AWS Kinesis, building scalable and fault-tolerant pipelines, and implementing advanced streaming analytics such as windowing, sessionization, and real-time machine learning inference. On completion, students will be able to design and implement robust streaming data solutions for real-world applications.

weeks

6 tasks


Track price: $49.00

Track Overview

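This track consists of six hands-on tasks, totaling an estimated 69 hours, that progress from setting up a Spark Structured Streaming development environment to real-time machine learning inference on streaming data.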

Tasks & Milestones

Set up a Spark Structured Streaming Development Environment

Advanced

In this task, students will set up a development environment for working with Apache Spark Structured Streaming, including installing the necessary software and configuring their development tools.

8 hours

Ingest and Process Streaming Data from Apache Kafka

Advanced

In this task, students will learn how to ingest data from an Apache Kafka cluster and perform basic processing using Spark Structured Streaming.
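A minimal PySpark sketch of what Kafka ingestion with basic processing can look like. The broker address, topic name, and JSON payload layout are hypothetical placeholders, not values from the course. Running it requires the Kafka connector on the classpath, for example by launching with `spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_<scala version>:<spark version>`. Spark is imported inside the function so the helpers can be loaded without a Spark installation.

```python
# Hypothetical connection settings for a local development broker.
BOOTSTRAP_SERVERS = "localhost:9092"
TOPIC = "events"

# Hypothetical layout of the JSON payload in each record's value, as a DDL string.
EVENT_SCHEMA_DDL = "user_id STRING, amount DOUBLE, event_time TIMESTAMP"


def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Build the options passed to spark.readStream.format("kafka")."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def build_parsed_stream(spark, options, schema_ddl):
    """Read raw Kafka records and parse the JSON value into typed columns."""
    # Imported here so this module can be loaded without Spark installed.
    from pyspark.sql import functions as F

    raw = spark.readStream.format("kafka").options(**options).load()
    # Kafka delivers the payload as binary; cast it to string before parsing JSON.
    return raw.select(
        F.from_json(F.col("value").cast("string"), schema_ddl).alias("event")
    ).select("event.*")


def start_console_query(spark, checkpoint_dir="/tmp/kafka-ingest-checkpoint"):
    """Basic processing: keep positive amounts and print each micro-batch."""
    events = build_parsed_stream(
        spark, kafka_source_options(BOOTSTRAP_SERVERS, TOPIC), EVENT_SCHEMA_DDL
    )
    return (
        events.filter("amount > 0")
        .writeStream.format("console")
        .outputMode("append")
        # The checkpoint directory lets the query resume from its last offsets.
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

In a driver script, `query = start_console_query(SparkSession.builder.getOrCreate())` followed by `query.awaitTermination()` would keep the query running until it is stopped.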

12 hours

Ingest and Process Streaming Data from AWS Kinesis

Advanced

In this task, students will learn how to ingest data from an AWS Kinesis stream and perform basic processing using Spark Structured Streaming.

12 hours

Implement Windowing Operations on Streaming Data

Advanced

In this task, students will learn how to use windowing operations to perform time-based analysis on streaming data.
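To make the tumbling/sliding distinction concrete, here is a plain-Python sketch of the window assignment that Spark's `window()` function performs. It runs without a cluster, so each case can be checked by hand. It assumes integer timestamps (e.g. seconds) and windows aligned to time zero, which is Spark's default when no `startTime` offset is given. In PySpark the comparable aggregation would be roughly `df.withWatermark("event_time", "1 minute").groupBy(F.window("event_time", "10 seconds", "5 seconds")).count()`, where the column name and durations are illustrative.

```python
from collections import Counter


def window_starts(ts, duration, slide=None):
    """Return the start of every [start, start + duration) window containing ts.

    Windows are aligned to time 0. A tumbling window is the case slide == duration.
    """
    slide = duration if slide is None else slide
    # The latest window that can contain ts starts at the last slide boundary <= ts.
    start = ts - (ts % slide)
    starts = []
    # Walk backwards while the window [start, start + duration) still covers ts.
    while start > ts - duration:
        starts.append(start)
        start -= slide
    return sorted(starts)


def windowed_counts(timestamps, duration, slide=None):
    """Count events per (window_start, window_end), like groupBy(window(...)).count()."""
    counts = Counter()
    for ts in timestamps:
        for start in window_starts(ts, duration, slide):
            counts[(start, start + duration)] += 1
    return dict(sorted(counts.items()))
```

With events at 100, 104, 107, and 112, 10-unit tumbling windows yield `{(100, 110): 3, (110, 120): 1}`. Sliding those windows every 5 units places each event in two overlapping windows, giving `{(95, 105): 2, (100, 110): 3, (105, 115): 2, (110, 120): 1}`.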

10 hours

Implement Advanced Aggregations on Streaming Data

Advanced

In this task, students will learn how to perform advanced aggregations, such as sessionization and anomaly detection, on streaming data using Spark Structured Streaming.
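Spark 3.2 and later provide `session_window(timeColumn, gapDuration)` for gap-based sessions. The plain-Python sketch below illustrates the idea on a finite list of events, reporting each session as `[first event, last event + gap)`. In a real streaming query, Spark keeps this per-user state across micro-batches. The `(user_id, timestamp)` event shape, the strict "less than `gap` apart" merge rule, and the z-score outlier test are assumptions chosen for illustration. They are one simple way to approach sessionization and anomaly detection, not the course's prescribed method.

```python
from collections import defaultdict
from statistics import mean, pstdev


def sessionize(events, gap):
    """Group (user_id, ts) events into per-user sessions.

    Consecutive events strictly less than `gap` apart share a session.
    Returns (user_id, session_start, session_end, event_count) tuples,
    where session_end = last event + gap.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = []
    for user in sorted(by_user):
        times = sorted(by_user[user])
        start = last = times[0]
        count = 1
        for ts in times[1:]:
            if ts - last < gap:
                # Still inside the inactivity gap: extend the current session.
                last = ts
                count += 1
            else:
                # Gap exceeded: close the session and open a new one.
                sessions.append((user, start, last + gap, count))
                start = last = ts
                count = 1
        sessions.append((user, start, last + gap, count))
    return sessions


def zscore_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` population std devs from the mean.

    In a streaming job the mean and deviation would typically come from
    historical windows rather than from the batch being scored.
    """
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

For example, `sessionize([("a", 0), ("a", 20), ("a", 100), ("b", 5), ("a", 55)], 30)` splits user `a` into three sessions because the 35- and 45-unit gaps exceed the 30-unit limit.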

12 hours

Implement Real-time Inference with Streaming Data and Machine Learning

Advanced

In this task, students will learn how to integrate a pre-trained machine learning model into a Spark Structured Streaming pipeline for real-time inference on streaming data.
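A common pattern for real-time inference is to load the model once and reuse it for every micro-batch instead of reloading it per record. The sketch below shows that pattern in plain Python. `LogisticModel` is a hypothetical stand-in for a pre-trained model, and a batch is a list of dicts rather than a DataFrame. The `(batch, batch_id)` callback signature mirrors the function that `DataStreamWriter.foreachBatch` passes to user code. In Spark itself, the model could also be applied through a pandas UDF.

```python
import math


class LogisticModel:
    """Hypothetical stand-in for a pre-trained binary classifier.

    In a real pipeline the weights would come from a model artifact trained
    offline; here they are passed in directly.
    """

    def __init__(self, weights, bias, threshold=0.5):
        self.weights = list(weights)
        self.bias = bias
        self.threshold = threshold

    def predict_proba(self, features):
        # Logistic function over a linear score.
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, features):
        return int(self.predict_proba(features) >= self.threshold)


def make_batch_scorer(model, sink):
    """Build a per-micro-batch callback that reuses one already-loaded model."""

    def score_batch(batch, batch_id):
        for record in batch:
            sink.append({
                "batch_id": batch_id,
                "id": record["id"],
                "score": model.predict_proba(record["features"]),
                "label": model.predict(record["features"]),
            })

    return score_batch
```

Calling `make_batch_scorer(LogisticModel([1.0, -1.0], 0.0), results)` and then passing it a batch with feature vectors `[2.0, 0.0]` and `[0.0, 2.0]` appends one scored record per input to `results`, labeled 1 and 0 respectively.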

15 hours

Prerequisites

• Proficiency in Python or Scala
• Experience with distributed systems and data processing frameworks
• Familiarity with relational databases and NoSQL data stores

Certificate

Certificate of Completion

Earn a certificate upon successful completion