Big Data Frameworks
This course examines current and emerging Big Data frameworks with a focus on Data Science applications. The course starts with an introduction to MapReduce-based systems and then focuses on Spark and the Berkeley Data Analytics Stack (BDAS) architecture. The course covers traditional MapReduce processes, streaming operation, machine learning, and SQL integration. The course consists of lectures and assignments.
The course has an IRCnet channel #tkt-bdf.
Assignments are given by Ella Peltonen, Eemil Lagerspetz, and Mohammad Hoque.
Completing the course
The course consists of lectures and course assignments. The assignments are based on the Spark Big Data framework and the Scala programming language.
Instead of the first-week exercise session, we have a Spark coding tutorial on Friday 13.3. at 10-12. Please bring your laptop if you have one. You can install the latest Spark version beforehand.
The Scala Spark Tutorial 13.03.2015 slides are available here: http://is.gd/bigdatascala
The first exercise set is now out: link. The deadline is strictly 19.3. at 2pm; return your answers via Moodle. The first exercises will be discussed on Friday 20.3.
The second exercise set is available there. The deadline is 26.3. at 2pm; please return your answers via Moodle. These exercises were discussed on Friday 27.3., together with a Q&A for exercise set three. Some hints are included in the exercise set. Extended deadline: 2.4. at 2pm. The maximum number of points is 5 if you use this opportunity. You can pick the 5 exercises you are sure of, or do all 6 if you're not sure about one of them.
The third exercise set is now published. The deadline is 9.4. at 2pm; please return your answers via Moodle. These exercises will be discussed on Friday 10.4., after Easter. Because of the Easter break, we will not have an exercise session on 3.4. Extended deadline: 16.4. at 2pm. The maximum number of points is 5 if you use this opportunity. Please return the entire solution set, including the exercises you are happy with from the first round.
On Friday 17.4. there is a Q&A session instead of the exercise session. Prepare your questions beforehand.
The fourth (and last) exercise set is published. The deadline is 23.4. at 2pm; return your answers via Moodle as always. These exercises will be discussed on Friday 24.4. Note that there will be no extension for this last exercise set.
Tentative lecture outline
7.4. Easter break
21.4. Two industry presentations (Nokia and F-Secure) on Big Data and Spark