00003-12 E StudentStudy | Advanced Data Science

Course offering details

Instructors: Dr. Florian Bader; Prof. Dr. Michael Scharkow; Marcel Alexander Schliebs; David Johannes Zimmermann

Event type: Seminar

Org-unit: Sociology, Politics & Economics

Displayed in timetable as: StudentStudy E

Hours per week: 6

Credits: 12.0

Location: Campus der Zeppelin Universität

Language of instruction: English

Min. | Max. participants: 10 | 30

Priority scheme: Standard prioritization

Course content:
The vision of the class is to create an understanding of Data Science and applied analytics that enables students to apply it to their own projects in academia and beyond. It shall open up the world of data analysis and allow students to take their first steps as a data scientist, quantitative researcher, or more computer-literate citizen, and thereby prepare them for the challenges of the 21st century, be it in research, consulting, finance, or just for analyzing their private geodata or running data. The objective of the class is to give students an overview of the most important tools from the wide toolchain of Data Science, ranging from Data Management and reproducibility tools through advanced visualization to Big Data Analytics with web scraping and an introduction to machine learning for scientists.
Over the course of the semester, the following topics will be covered:

1. Introduction to Data Science with R
(a) Data Science Basics: train vs. test sample, data science workflow, supervised vs. unsupervised vs. reinforcement learning/algorithms
(b) R Basics: Code, Functions, Libraries, Data Formats (xml, json, base64)

2. Advanced Data Analytics and the tidyverse (dplyr, tidyr, stringr + regex, forcats, purrr, lubridate)

3. Advanced Data Visualization + Maps (ggplot2 and ggmap)

4. Shipping Code: Package development and documentation (roxygen2 & testthat)

5. Automated Data Reporting (LaTeX, RMarkdown)

6. Reproducible Data Science (git, reproducible code, LaTeX, structure, docker?)

7. Interactivity and Apps/Dashboards (shiny, htmlwidgets, leaflet)

8. Automated Webscraping for Big Data Analytics

9. Making sense of the "Big Data"
(a) Analysing Text the tidy-way: tidytext
(b) Network analysis/Spatial analysis in R (tidygraph + ggraph)

10. Databases and High-Performance Computing
(a) Databases and R-Interfaces + Job Automation (Shells, Cronjobs)
(b) High-Performance Computing (Parallel computing and Rcpp)

11. Artificial Intelligence and Machine Learning: Neural Networks with Keras, Decision Trees, and SVM

The goal for this class is to mirror our scientific values: collaborative, transparent, open, and reproducible. The content was created in a collaborative draft process and will be open and online as it is created, both for the participants to learn as efficiently as possible and for future generations of students who want to get involved in data science.
This class enables students from all backgrounds (including CME, CCM, PAIR, and SPE) to become better researchers and prepares them for modern, quantitative research in all fields. It can thus be viewed as an advanced course that builds on the contents of a wide variety of courses such as Applied Statistics with R (PAIR and SPE), Econometrics and Quantitative Methods (CME), or the "Empiriepraktikum" (Module Political Sociology). In addition, it is a complementary offer for students taking Advanced Methods classes, which often focus on the theoretical perspectives behind a certain methodology and their application in a limited, textbook-scenario setting. Here, students can apply the methods learned there to large datasets drawn from real-world problems and use them to analyze those problems.
Finally, the course opens a window onto the wide toolset of data science and provides students with sources of inspiration for applying these tools to their own research projects, not only in the term paper but also in Humboldt Projects and Bachelor and Master Theses in which they might pursue a quantitative research question that requires advanced statistical methods or large quantities of data (including data from alternative sources). Furthermore, the class teaches students valuable skills that prepare them for the job markets of the future, be it in academia, business analytics, finance, or consultancy.
The format of this StudentStudy could further be the starting point for developing the syllabi of a potential "Computational Social Science" cluster in the university's research and teaching strategy, which is currently under debate. If it proves successful during the first semester, it could be further institutionalized and potentially extended to an online MOOC/OpenEd course serving as an attention-generating ZU flagship offer.
Language of Instruction: English

Lecturers:
Prof. Michael Scharkow, ZU
Prof. Michael Scharkow holds the Chair of Digital Communication Science at Zeppelin University. In his research and teaching, he combines his interest in statistical methods and quantitative computing to promote more open, transparent, and reproducible research practices.
Dr. Florian Bader, ZU
Dr. Florian Bader is a Post-Doc at the Chair of Political Science. In his research, he specializes in Computational Social Science, e.g. using Big Data methods such as quantitative analyses of regional media content to predict violent attacks on refugee shelters.
Chris Hartgerink, Tilburg University
Chris Hartgerink is a PhD candidate at the University of Tilburg, currently finishing his dissertation on detecting possible data fabrication in the social sciences. In his research, he has developed and applied statistical methods that, on the basis of information reported in scientific papers, can give a first indication of data fabrication. He is also a strong advocate of Open Research practices and has previously taught workshops on reproducibility and transparency using R tools.
David Zimmermann, University of Witten/Herdecke
David Zimmermann, PhD student in Finance at the University of Witten/Herdecke, researches the effects automated trading has on the financial market. His main interests lie in computational simulations, reproducible research, and a deeper understanding of how processes work in detail. He has taught bachelor and master classes and has held several workshops in finance, R, and Data Science.

Prerequisites:
Students should have a basic understanding of statistics and basic experience using R or equivalent statistical software. We will hold an introductory session on Data Science and R, in which we will also introduce the necessary libraries in depth; specific library knowledge is therefore not needed. If you have never used R before or feel that you are lacking skills, we are more than happy to provide you with appropriate material to freshen up during the summer. More concretely, we will provide you with a self-designed online pre-course containing an introduction to R, as well as DataCamp premium accounts.

Further information about the exams:
The grading is meant to capture the students' in-depth understanding of the class contents. We therefore advise the following examinations:

  • 80% Term Paper
  • 20% Presentation

Appointments
No.  Date                From   To     Room          Instructors
1    Mon, 3. Sep. 2018   10:00  16:00  Fab 3 | 2.09  David Johannes Zimmermann
2    Tue, 4. Sep. 2018   10:00  16:00  Fab 3 | 2.09  David Johannes Zimmermann
3    Wed, 5. Sep. 2018   10:00  16:00  Fab 3 | 2.09  Prof. Dr. Michael Scharkow
4    Thu, 6. Sep. 2018   10:00  16:00  Fab 3 | 2.12  David Johannes Zimmermann
5    Fri, 7. Sep. 2018   10:00  16:00  Fab 3 | 2.12  David Johannes Zimmermann
6    Wed, 19. Sep. 2018  13:30  19:00  Fab 3 | 2.09  Prof. Dr. Michael Scharkow
7    Wed, 26. Sep. 2018  14:00  19:00  Fab 3 | 2.09  Prof. Dr. Michael Scharkow
8    Wed, 10. Oct. 2018  13:30  19:00  Fab 3 | 2.08  Dr. Florian Bader
9    Wed, 17. Oct. 2018  13:30  19:00  Fab 3 | 2.08  Dr. Florian Bader
10   Fri, 23. Nov. 2018  13:30  19:00  Fab 3 | 2.11  Dr. Florian Bader; Prof. Dr. Michael Scharkow; Marcel Alexander Schliebs; David Johannes Zimmermann
11   Sat, 24. Nov. 2018  10:00  16:00  Fab 3 | 2.11  David Johannes Zimmermann
Course-specific exams
No.  Description        Date      Instructors  Compulsory pass
1    Midterm + Endterm  Time tbd               Yes