Amazing course, outstanding instructor! His lectures have provided me with tons of transferable skills and knowledge that I will be able to apply in a wide variety of related areas. — participant from Singapore
Many, if not most, of the major debates in modern social sciences revolve around questions that can be addressed with data. At the same time, the amount of available data and the number of publicly-available open-source tools for cleaning, transforming, analyzing, and visualizing data have increased exponentially since the turn of the millennium. With a few clicks, people can compare word frequencies in books over time or construct elaborate size-weighted word clouds – tasks that would have taken scholars weeks if not months of effort in the past.
This course introduces participants to those tools and the principles behind their use in the context of applications in the social sciences. It marries the substance of theory to the methodologies of data visualization and exploratory data analysis. The course is designed to stand alone, but it can also serve as a gateway to more advanced data analysis classes.
This two-week, 40-hour course runs Monday-Friday, 9:00 am-1:00 pm, June 19-30, 2017.
Many, if not most, of the major debates in the modern social sciences revolve around questions that can be addressed with data. The sources of voting behavior, the correlates of war, the determinants of development, and issues related to political economy, psychology, institutions, and conflict are all amenable to data-based analysis.
At the same time, the amount of available data and the number of publicly-available open-source tools for cleaning, transforming, analyzing, and visualizing these data have increased exponentially over the last decade. This course introduces participants to these tools. It provides participants with a practical introduction to data visualization, exploratory data analysis, and inference, with an emphasis on social science applications. It combines social science theory with the various methodologies of data visualization and exploratory data analysis.
The course begins with an introduction to existing data visualization tools, followed by a more in-depth introduction to the statistical software and programming language R, which will be used for the bulk of the visualizations in the course. We will then cover identification and causal inference, the estimation of uncertainty, and basic hypothesis tests in the context of data visualization. Good design will be discussed and emphasized throughout the course. The course also emphasizes hands-on learning by letting participants work on a project in their own area of interest that focuses on finding and analyzing data, assessing the structure of the data, and working through the most appropriate, succinct, and informative summaries and visualizations.
The format of the course is unusual in that the lectures are available online and can be accessed via iTunes U on laptops, tablets, or even smartphones. This allows participants to watch them at whatever time and speed is convenient for them, and to rewind or slow down the recording to revisit difficult sections. Participants are expected to watch these lectures and complete homework assignments prior to class so that class time can be used to review the lectures, discuss the homework assignments, work on exercises, and hold one-on-one meetings that address participants' individual projects and other data visualization questions and needs.
There are no prerequisites for this course.
Participants are expected to bring a WiFi-enabled laptop computer. Access to data, temporary licenses for the course software, and installation support will be provided by the Methods School.
Yau, Nathan. 2011. Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. Indianapolis, IN: Wiley Publishing.
Mittal, Hrishi V. 2011. R Graphs Cookbook. Birmingham: Packt Publishing.
Teetor, Paul. 2011. R Cookbook. Sebastopol, CA: O'Reilly Media.
Short, Tom. 2004. R Reference Card.
Tufte, Edward R. 2001. The Visual Display of Quantitative Information. 2nd edition. Cheshire, CT: Graphics Press.