Materials and setup

Laptop users: You should have R installed; if not:

  1. Open a web browser, go to http://cran.r-project.org, and download and install R

  2. It is also helpful to install RStudio (download from http://rstudio.com)

  3. In R, type install.packages("tidyverse") to install a suite of useful packages including ggplot2

Everyone: Download workshop materials:

  1. Download materials from http://tutorials.iq.harvard.edu/R/Rgraphics.zip

  2. Extract the zip file containing the materials to your desktop

Workshop Overview

Class Structure and Organization:

  • Ask questions at any time. Really!
  • Collaboration is encouraged
  • This is your class! Special requests are encouraged

This is an intermediate R course:

  • Assumes working knowledge of R
  • Relatively fast-paced
  • Focus is on ggplot2 graphics; other packages will not be covered

Starting At The End

My goal: by the end of the workshop you will be able to reproduce this graphic from the Economist:

[Figure: graphic from the Economist]

Why ggplot2?

Advantages of ggplot2

  • consistent underlying grammar of graphics (Wilkinson, 2005)
  • plot specification at a high level of abstraction
  • very flexible
  • theme system for polishing plot appearance
  • mature and complete graphics system
  • many users, active mailing list

That said, there are some things you cannot (or should not) do with ggplot2:

  • 3-dimensional graphics (see the rgl package)
  • Graph-theory type graphs (nodes/edges layout; see the igraph package)
  • Interactive graphics (see the ggvis package)

What Is The Grammar Of Graphics?

The basic idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want. Building blocks of a graph include:

  • data
  • aesthetic mapping
  • geometric object
  • statistical transformations
  • scales
  • coordinate system
  • position adjustments
  • faceting
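
To make the list concrete, here is a minimal sketch of how each building block can appear as a separate piece of a ggplot2 call. It uses R's built-in mtcars data rather than the workshop data, and the particular plot is only an illustration, not one of the workshop's examples.

```r
library(ggplot2)

# Each line contributes one building block; mtcars ships with R.
p <- ggplot(data = mtcars,                          # data
            mapping = aes(x = wt, y = mpg)) +       # aesthetic mapping
  geom_point(position = "jitter") +                 # geometric object + position adjustment
  stat_smooth(method = "lm") +                      # statistical transformation
  scale_x_continuous(name = "Weight (1000 lbs)") +  # scale
  coord_cartesian() +                               # coordinate system
  facet_wrap(~ cyl)                                 # faceting

print(p)
```

Because the pieces are specified independently, any one of them (for example the geom or the faceting) can be swapped out without touching the rest.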

Setup: install the tidyverse package

The ggplot2 package is included in a popular collection of packages called “the tidyverse”. Take a moment to ensure that it is installed, and that we have attached the ggplot2 package.
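
A minimal sketch of this setup step. The "install only if missing" check and the CRAN mirror URL are conveniences added here, not part of the original instructions.

```r
# Install the tidyverse only if it is not already available.
# (Assumption: an internet connection and the cloud.r-project.org mirror.)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse", repos = "https://cloud.r-project.org")
}

# Attaching the tidyverse also attaches ggplot2.
library(tidyverse)
```

Attaching the tidyverse prints a startup message listing the attached packages and any masked functions.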

## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.6
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Example Data: Housing prices

Let’s look at housing prices.
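
A sketch of how the data might be read and previewed. The file path is an assumption about where the CSV sits inside the extracted workshop materials; replace it with the actual location on your machine.

```r
library(readr)

# Hypothetical path: adjust to wherever the housing CSV was extracted.
housing_path <- "dataSets/landdata-states.csv"

if (file.exists(housing_path)) {
  housing <- read_csv(housing_path)  # read the CSV into a tibble
  print(head(housing[1:5]))          # first six rows, first five columns
} else {
  message("File not found; check housing_path.")
}
```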

## Parsed with column specification:
## cols(
##   State = col_character(),
##   region = col_character(),
##   Date = col_double(),
##   Home.Value = col_integer(),
##   Structure.Cost = col_integer(),
##   Land.Value = col_integer(),
##   Land.Share..Pct. = col_double(),
##   Home.Price.Index = col_double(),
##   Land.Price.Index = col_double(),
##   Year = col_integer(),
##   Qrtr = col_integer()
## )
## # A tibble: 6 x 5
##   State region  Date Home.Value Structure.Cost
##   <chr> <chr>  <dbl>      <int>          <int>
## 1 AK    West   2010.     224952         160599
## 2 AK    West   2010.     225511         160252
## 3 AK    West   2010.     225820         163791
## 4 AK    West   2010      224994         161787
## 5 AK    West   2008      234590         155400
## 6 AK    West   2008.     233714         157458

ggplot2 VS Base Graphics

Compared to base graphics, ggplot2

  • is more verbose for simple / canned graphics
  • is less verbose for complex / custom graphics
  • does not have methods for different object classes (data should always be in a data.frame)
  • uses a different system for adding plot elements

ggplot2 VS Base for simple graphs

Base graphics histogram example:
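
A minimal sketch of the base graphics version. To keep it self-contained it uses a stand-in data frame holding only the six Home.Value entries shown in the data preview; in the workshop the full housing data would be used instead.

```r
# Stand-in data: the six Home.Value entries from the preview above.
housing_preview <- data.frame(
  Home.Value = c(224952, 225511, 225820, 224994, 234590, 233714)
)

# Base graphics: one function call on a numeric vector.
h <- hist(housing_preview$Home.Value)
```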

ggplot2 histogram example:
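
A minimal sketch of the ggplot2 version, again using a stand-in data frame with only the six Home.Value entries from the data preview (an assumption made to keep the example self-contained).

```r
library(ggplot2)

# Stand-in data: the six Home.Value entries from the preview above.
housing_preview <- data.frame(
  Home.Value = c(224952, 225511, 225820, 224994, 234590, 233714)
)

# ggplot2: data and aesthetic mapping first, then a histogram layer.
p <- ggplot(housing_preview, aes(x = Home.Value)) +
  geom_histogram()

print(p)
```

Since no bin count is given, ggplot2 falls back to its default and prints a message suggesting a better `binwidth`.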

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.