Intro to big data with Apache Spark 2.0


Course overview

  • Want to be a Data Scientist?
  • Should make Data Analysis for increasing your profit?
  • Want to know how to deal with Big Data?
  • Need to apply Machine Learning algorithms but don’t know the right tools?

In this course, we cover the key principles of parallel, distributed and scalable machine learning. After completing all the lessons and projects, you will be able to process large data sets; clean, transform and analyze structured and unstructured data; and build predictive models and evaluate them.

Syllabus

WEEK 1. Introduction to Apache Spark 2.0

  • Lab 0. Virtual Machine installation

WEEK 2. Working with RDDs and key/value pair RDDs

  • Lab 1. Basic operations with RDDs

WEEK 3. Spark SQL and Spark Streaming

  • Lab 2. Log files analysis with Apache Spark 2.0

WEEK 4. Machine Learning with MLlib

WEEK 5. Advanced Machine Learning

  • Lab 3. Predictive modeling with MLlib

WEEK 6. GraphX in Apache Spark 2.0

  • Lab 4. Introduction to Recommendation systems

WEEK 7. Big data: Use cases

WEEK 8. Final Project

Do you want to join Data Science School?