Intro to data science with Python


Course overview

This course covers the use of the Python programming language in Data Science. Why Python? First, it is a versatile language used in many different applications. Second, many tools built specifically for Data Science have appeared in recent years, so analyzing data with Python has never been easier.

Machine learning offers a way to look into the future. It is the art of predicting the consequences of a phenomenon or action from the factors that cause it, drawing on the known results of previous events. Mastery of machine learning methods is the highest level of skill in Data Science.

The course introduces a range of model-based and algorithmic machine learning methods, including regression, several classification algorithms, decision trees, Naive Bayes, random forests, and k-means clustering. It covers the complete process of building prediction functions: data collection, feature creation and extraction, algorithm training and testing, evaluation, and improvement. You will work on problems such as text (spam/ham) classification, house price regression, and predicting whether a passenger survived the Titanic disaster.
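To make the workflow concrete, here is a minimal sketch of a spam/ham classifier that follows those steps: collect data, extract features, train, test, and evaluate. It is a plain-Python multinomial Naive Bayes with Laplace smoothing, not the scikit-learn implementation used later in the course. The messages and the names `TRAIN`, `TEST`, `tokenize`, `train`, and `predict` are made up for illustration.

```python
import math
from collections import Counter

# Data collection: toy, made-up messages (not a course dataset).
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("limited offer win cash", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("please send me the report", "ham"),
    ("see you at the meeting tomorrow", "ham"),
]
TEST = [
    ("claim your free cash prize", "spam"),
    ("send the report before the meeting", "ham"),
]


def tokenize(text):
    """Feature extraction: lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Training: count words per class and messages per class."""
    word_counts = {}          # label -> Counter of word frequencies
    class_counts = Counter()  # label -> number of messages
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label, n_messages in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_messages / total_messages)  # class prior
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[label][word]
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Testing and evaluation: accuracy on messages not used for training.
model = train(TRAIN)
accuracy = sum(predict(model, text) == label for text, label in TEST) / len(TEST)
print(f"accuracy: {accuracy:.2f}")
```

A real project would use far more data and a held-out validation set; the point here is only the shape of the pipeline.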

What you'll learn

You may have worked with MS Excel and know how much it makes possible. There is a wonderful Python library that is more powerful and faster than Excel: pandas. We will introduce you to this tool and show how to process large and small datasets stored in various formats; how to transform, quantize, clean, filter, and aggregate them; and how to visualize data and the results of operations on it.
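As a small taste of what this looks like, the sketch below cleans, transforms, filters, and aggregates a tiny table with pandas. The data and the column names (`region`, `units`, `price`, `revenue`) are invented for illustration; only standard pandas calls are used.

```python
import pandas as pd

# Made-up sample data; the column names are illustrative assumptions.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East", "South", "North"],
        "units": [10, 4, None, 7, 6, 3],
        "price": [2.5, 3.0, 2.5, 4.0, 3.0, 2.5],
    }
)

# Clean: drop rows where the unit count is missing.
clean = sales.dropna(subset=["units"])

# Transform: add a derived revenue column.
clean = clean.assign(revenue=clean["units"] * clean["price"])

# Filter: keep only rows with revenue of at least 10.
large = clean[clean["revenue"] >= 10]

# Aggregate: total revenue per region.
revenue_by_region = large.groupby("region")["revenue"].sum()
print(revenue_by_region)
```

The same four steps would take several manual operations in a spreadsheet; here each is one line that can be rerun on a new file.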

Here are the main things you will learn in this course:

  • the basics of the Python programming language, including its data structures, conditional statements, loops, generators, working with files and modules, OOP, etc.;
  • the features of relational and so-called NoSQL databases (mainly Neo4j, MongoDB, and Cassandra), their structure, and how to save, extract, and process data in each of them;
  • how to crawl Web pages, access Web APIs, and interact with databases using Python.
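For the database part, a minimal sketch of saving, extracting, and processing data in a relational database from Python might look like this. It uses the standard-library `sqlite3` module with an in-memory database as a stand-in; the `houses` table and its rows are hypothetical, and the course may use a different database engine.

```python
import sqlite3

# An in-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Save: create a hypothetical table and insert made-up rows.
conn.execute(
    "CREATE TABLE houses (id INTEGER PRIMARY KEY, district TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO houses (district, price) VALUES (?, ?)",
    [("center", 120000.0), ("center", 100000.0), ("suburb", 80000.0)],
)
conn.commit()

# Extract and process: average price per district, computed in SQL.
rows = conn.execute(
    "SELECT district, AVG(price) FROM houses "
    "GROUP BY district ORDER BY district"
).fetchall()
conn.close()

for district, avg_price in rows:
    print(district, avg_price)
```

The `?` placeholders let the driver handle quoting, which is safer than building SQL strings by hand.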

Syllabus

WEEK 1. A crash course in Python

WEEK 2. Basic intro to pandas

WEEK 3. Basic intro to visualization with Python

WEEK 4. Intro to Machine Learning with scikit-learn

WEEK 5. Advanced topics in Machine Learning with scikit-learn

WEEK 6. SQL with Python. Relational databases

WEEK 7. Web Scraping with Python

WEEK 8. Web API with Python. MongoDB. Cassandra

WEEK 9. The final project on https://www.kaggle.com/

Do you want to join Data Science School?