What data should you use?

Jeffrey Leek
Johns Hopkins Bloomberg School of Public Health

A successful predictor

Polling data

Weighting the data
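
The slide does not say how the polls are weighted, so the following is only a minimal sketch of one common choice: weighting each poll by its sample size. The function name `weighted_poll_average` and the numbers in `toy_polls` are made up for illustration and are not real polling data.

```python
def weighted_poll_average(polls):
    """Combine poll results into one estimate, weighting by sample size.

    `polls` is a list of (share, sample_size) pairs. Weighting by sample
    size is an assumption made for this sketch, not the slide's method.
    """
    total_weight = sum(size for _, size in polls)
    if total_weight <= 0:
        raise ValueError("polls must have positive total sample size")
    return sum(share * size for share, size in polls) / total_weight


# Toy inputs (hypothetical): candidate share and number of respondents.
toy_polls = [(0.52, 1000), (0.48, 500), (0.50, 1500)]

weighted = weighted_poll_average(toy_polls)
unweighted = sum(share for share, _ in toy_polls) / len(toy_polls)
print(f"weighted: {weighted:.4f}  unweighted: {unweighted:.4f}")
```

Larger polls pull the combined estimate toward their own result, so the weighted and unweighted averages differ whenever the big and small polls disagree.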

Key idea

To predict X use data related to X

Key idea

To predict player performance use data about player performance

Key idea

To predict movie preferences use data about movie preferences

Key idea

To predict hospitalizations use data about hospitalizations

Not a hard rule

Looser connection = harder prediction
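
One way to see this point is a small simulation, sketched here under simplifying assumptions: the predictor `x` and the outcome `y` are standard normal with correlation `rho`, and we predict `y` from `x` with a least-squares line. The function name `simulate_rmse` and all parameter values are hypothetical choices for illustration.

```python
import random


def simulate_rmse(rho, n=5000, seed=0):
    """In-sample RMSE of a least-squares line predicting y from x,
    where x and y are simulated with correlation approximately rho."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    # y shares a fraction rho of its signal with x; the rest is noise.
    ys = [rho * x + (1 - rho ** 2) ** 0.5 * e for x, e in zip(xs, noise)]

    # Fit y = intercept + slope * x by ordinary least squares.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    squared_errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared_errors) / n) ** 0.5


# Tight, moderate, and loose connections between predictor and outcome.
for rho in (0.9, 0.5, 0.1):
    print(f"rho = {rho}: RMSE = {simulate_rmse(rho):.3f}")
```

In this setup the error is roughly `sqrt(1 - rho**2)`, so it grows toward the spread of `y` itself as the connection loosens: weakly related data predicts little better than guessing the average.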

Data properties matter

Using unrelated data is the most common mistake