Getting and Cleaning Data
- Subsetting example walkthrough
- Apples to Oranges Data Organisation Challenge
- dplyr introductory tutorial and R Markdown document: A 39-minute video tutorial that covers the five basic dplyr “verbs” and a dozen other dplyr functions. dplyr is an update to the plyr package, useful for subsetting, sorting, summarizing, and merging data using a more intuitive syntax than plyr or base R.
- dplyr “going deeper” tutorial and R Markdown document: A 37-minute video tutorial that covers the new functionality in dplyr versions 0.3 and 0.4.
- Downloading files general advice
- Codebook sample
- Second Codebook sample
- Query string (and other fields-within-fields) unrolling
- Pre-processing Excel files before loading them into R
- Codebook template that can be used in the Getting and Cleaning Data project
- “Real world” example - reading American Community Survey 2000 PUMS Data: Demonstrates how to extract records of a given type from a data file containing multiple record types, and how to use an Excel-based code book to specify arguments for reading a fixed-width file.
- 18 Months of CTA advice
- Common Problems: Quiz 1 - Missing Java Runtime: Explains how to solve the problem of a missing Java Runtime for the question that requires students to process a Microsoft Excel spreadsheet.
- Strategy for Reading Files & APIs / Quiz 2
- Common Problems: Quiz 2 - sqldf() driver fails to connect
- Tutorial: Downloading Files: Illustrates various ways of downloading files, including binary and text files.
- Creating data frames from XML data
Comprehensive Notes
- Complete notes for Getting and Cleaning Data