Spark is a new and very popular Big Data processing engine. Spark MLlib is a de facto standard for machine learning on Big Data.

This course is intended for data scientists and software engineers. It balances theory and practice: for each machine learning concept, we first discuss its foundations, applicability, and limitations, then explain its implementation and use, along with specific use cases. The course is about 50% lecture and 50% lab work.


  • familiarity with programming in at least one language
  • ability to navigate the Linux command line
  • basic knowledge of command-line Linux editors (vi / nano)

Course Outline: Please download PDF

Who should attend:

Data Scientists and Software Engineers
