# Regression Analysis

Prof. Pollins welcomes questions and makes sure everyone understands, explaining concepts in two, three ways if needed. I learned so much more than I expected. — ICPSR Summer Program student

This course focuses on single-equation regression techniques and provides participants with the tools they need to quantitatively test their theories. It prepares them for advanced work in statistical modeling. Following an introduction to matrix algebra, the ordinary least squares (OLS) estimator, and a review of its Gaussian properties, the course relaxes individual OLS assumptions. It explores the generalized least squares (GLS) estimator, and participants learn to diagnose and treat various regression pathologies, using the 'best practice' procedures of modern social science research. The course finishes with a look at nonlinear regression models, estimated via maximum likelihood. While the course covers the theoretical and mathematical underpinnings of regression, the emphasis is on learning by doing, hands-on exercises, and applying the 'Swiss army knife' of statistics.

This course requires only a minimal background in statistics and mathematics and serves as a gateway to more advanced data analysis classes.

## Dates

This course was offered in 2017.

## Instructor

Brian M. Pollins, Ohio State University

## Detailed Description

This course focuses on single-equation regression techniques and is designed to provide participants with a thorough foundation in regression analysis. It provides them with the tools needed for the quantitative testing of theories and prepares them for advanced work in statistical modeling, such as structural equation modeling, advanced maximum likelihood estimation, or Bayesian analysis.

Following a brief review of correlation and bivariate regression and an introduction to matrix algebra, the course substantially enhances participants' understanding of regression by using matrix algebra first to (re)derive the OLS estimator and then to introduce the GLS estimator, which allows us to relax each of the simplifying assumptions of OLS. Participants learn to diagnose and treat problems such as collinearity, non-uniform error variance (heteroskedasticity), autocorrelation, and measurement error, with the help of the 'best practice' procedures of modern social science research. The course finishes with a look at nonlinear regression models, estimated via maximum likelihood.
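As a rough illustration of the two estimators named above, here is a sketch in standard textbook notation. The symbols are assumptions for this illustration and are not taken from the course materials: $y$ is the outcome vector, $X$ the matrix of regressors, $\varepsilon$ the error vector, and $\Omega$ a known positive definite matrix describing the error covariance structure.

```latex
% Linear model in matrix form
y = X\beta + \varepsilon

% OLS, typically derived assuming Var(\varepsilon \mid X) = \sigma^2 I
\hat{\beta}_{\mathrm{OLS}} = (X^{\top} X)^{-1} X^{\top} y

% GLS, relaxing that assumption to Var(\varepsilon \mid X) = \sigma^2 \Omega
\hat{\beta}_{\mathrm{GLS}} = (X^{\top} \Omega^{-1} X)^{-1} X^{\top} \Omega^{-1} y
```

Setting $\Omega = I$ in the GLS expression recovers the OLS estimator, which is one way to see GLS as a generalization that accommodates heteroskedastic or autocorrelated errors.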

Because the course is taught using matrix algebra, participants can readily move beyond regression: advanced statistical methods rely on matrix algebra, and statistical software implements them in matrix form. However, while the course provides a thorough mathematical foundation, the aim is not to 'drown' participants in equations and theory. Instead, participants learn by doing, through hands-on exercises that train their intuition for the flexible and versatile tool that is regression analysis.

Exercises involve the use of the statistical software and programming language R. Participants favoring other statistical software packages (e.g., SPSS or Stata) are free to use them, but the instructor may be unable to assist with specific problems.

## Prerequisites

Participants should be familiar with basic algebra and have a grasp of introductory statistics, including descriptive statistics, hypothesis testing, and bivariate regression. Prior experience with R or knowledge of matrix algebra is neither assumed nor required.

## Requirements

Participants are expected to bring a WiFi-enabled laptop computer. Access to data, temporary licenses for the course software, and installation support will be provided by the Methods School.