Quantitative Text Analysis II

Professor Sagarzazu is a great teacher, enabling us to really understand and apply complex concepts and methods. He also took the time to talk and help me with my own research. — participant from Canada

This course covers advanced techniques and methods of quantitative text analysis that allow participants to systematically extract information from text. It combines theoretical sessions with practical exercises in which participants immediately apply analytical techniques to text and practice their newly acquired methods skills. The course starts with a brief review of dictionary-based approaches to scaling, topic coding, and sentiment analysis, then moves on to more sophisticated techniques for the quantitative analysis of texts. It discusses the differences between supervised and unsupervised mechanisms of text analysis and explores some of the mathematical and statistical background of these techniques. By the end of the course, participants will also have learned how to incorporate and use the results of applied quantitative text analysis in further statistical analyses.

This course is the second part of a two-course sequence. Participants should be familiar with the material covered in the introductory Quantitative Text Analysis I or have prior experience with text analysis.


Dates

This one-week, 20-hour course runs Monday-Friday, 9:00 am-1:00 pm, June 26-30, 2017.


Instructor

Iñaki Sagarzazu, Texas Tech University


Detailed Description

Building on the first course in the two-course text analysis sequence (cf. Quantitative Text Analysis I), this course covers advanced techniques of quantitative text analysis. It gives participants the skills to apply these methods immediately in order to systematically extract and analyze information from text.

The course starts with a brief review of introductory techniques, such as dictionary-based approaches to scaling, topic coding, and sentiment analysis, as well as the basic theoretical concepts of quantitative text analysis. After this initial review, the course moves on to more sophisticated techniques for the quantitative analysis of text, combining theoretical sessions with practical exercises.

The course discusses the differences between supervised and unsupervised mechanisms of text analysis and explores some of the mathematical and statistical background of these techniques. It covers one unsupervised scaling technique and two topic coding techniques – one supervised and one unsupervised – that are state-of-the-art in the social science literature on text analysis.

The final session looks at how to incorporate the results of applied quantitative text analysis into further statistical analyses for interpretation and inference. It also considers how to make the best use of text analysis in various research designs and in the participants’ own research projects.

Depending on the participants' research projects and interests, the course covers 'spidering' and data 'scraping' and offers practical solutions to problems that arise when acquiring, pre-processing, and storing large numbers of texts, e.g., from government or NGO websites. It also addresses issues arising from non-English-language materials, such as different scripts and word segmentation.


Prerequisites

We strongly encourage participants to combine this course with the introductory Quantitative Text Analysis I. Alternatively, participants should have prior experience with text analysis and some familiarity with the statistical software R.


Requirements

Participants are expected to bring a WiFi-enabled laptop computer. Access to data, temporary licenses for the course software, and installation support will be provided by the Methods School.


Core Readings

Slapin, Jonathan, and Sven-Oliver Proksch. 2008. A Scaling Model for Estimating Time-Series Party Positions from Texts. American Journal of Political Science 52: 705-722.

Hjorth, Frederik, Robert Klemmensen, Sara Hobolt, Martin Ejnar Hansen, and Peter Kurrild-Klitgaard. 2015. Computers, Coders, and Voters: Comparing Automated Methods for Estimating Party Positions. Research and Politics 2: 1-9.

Pardos-Prado, Sergi, and Iñaki Sagarzazu. 2016. The Political Conditioning of Subjective Economic Evaluations: The Role of Party Discourse. British Journal of Political Science 46: 799-823.

Grimmer, Justin. 2010. A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed Agendas in Senate Press Releases. Political Analysis 18: 1-35.

Sagarzazu, Iñaki, and Heike Klüver. 2015. Coalition Governments and Party Competition: Political Communication Strategies of Coalition Parties. Political Science Research and Methods.

Grimmer, Justin, and Brandon Stewart. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis 21: 267-297.


Suggested Readings

Krippendorff, Klaus H. 2013. Content Analysis: An Introduction to Its Methodology. 3rd edition. Thousand Oaks, CA: Sage Publications.

Neuendorf, Kimberly A. 2002. The Content Analysis Guidebook. Thousand Oaks, CA: Sage Publications.

Klüver, Heike. 2009. Measuring Interest Group Influence Using Quantitative Text Analysis. European Union Politics 10: 535-549.

Laver, Michael, and John Garry. 2000. Estimating Policy Positions from Political Texts. American Journal of Political Science 44: 619-634.

Hu, Minqing, and Bing Liu. 2004. Mining and Summarizing Customer Reviews. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 168-177.

Laver, Michael, Kenneth Benoit, and John Garry. 2003. Extracting Policy Positions from Political Texts Using Words as Data. American Political Science Review 97: 311-331.

