Instructions

Questions for you to answer are as quoted blocks of text. Put your code used to address these questions and interpretation below each block.

Getting the data in

Download the data from https://www.dropbox.com/s/ekujui9p0rjtz4r/King2012general-ecanvass.txt?dl=0. It is a plain text file of data, about 60 MB in size. Save it somewhere on your computer, and read the file into R. You will want to use the cache=TRUE chunk option for this (and potentially other chunks).

This file is tab separated (a TSV), so we’ll use the read_tsv function in readr without any fancy options:

library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
king_raw <- read_tsv("King2012general-ecanvass.txt")

Inspecting the data

Describe the data in its current state. How many rows are there? What variables on the data? What kinds of values do they take (don’t list them all if there are many)? Are the column types sensible?

There are 527576 rows and 9 columns, as follows:

str(king_raw)
## Classes 'tbl_df', 'tbl' and 'data.frame':    527576 obs. of  9 variables:
##  $ Precinct    : chr  "ADAIR" "ADAIR" "ADAIR" "ADAIR" ...
##  $ Race        : chr  "Attorney General  partisan office" "Attorney General  partisan office" "Attorney General  partisan office" "Attorney General  partisan office" ...
##  $ LEG         : int  45 45 45 45 45 45 45 5 5 5 ...
##  $ CC          : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ CG          : int  1 1 1 1 1 1 1 8 8 8 ...
##  $ CounterGroup: chr  "Total" "Total" "Total" "Total" ...
##  $ Party       : chr  "Dem" "Rep" "NP" "NP" ...
##  $ CounterType : chr  "Bob Ferguson" "Reagan Dunn" "Registered Voters" "Times Blank Voted" ...
##  $ SumOfCount  : int  367 418 856 24 809 0 0 283 296 733 ...

The columns types for each of these look pretty reasonable:

Precinct

Precinct has a precinct identifier, which is sometimes a city or neighborhood name, sometimes some text followed by an ID number:

king_raw %>%
    select(Precinct) %>%
    distinct(Precinct) %>%
    head(10)
## Source: local data frame [10 x 1]
## 
##         Precinct
##            (chr)
## 1          ADAIR
## 2        ALDARRA
## 3  ALDER SPRINGS
## 4      ALDERWOOD
## 5    ALG 30-0013
## 6    ALG 30-0014
## 7    ALG 30-3141
## 8         ALPINE
## 9      AMES LAKE
## 10    ANGEL CITY
king_raw %>%
    select(Precinct) %>%
    distinct(Precinct) %>%
    tail(10)
## Source: local data frame [10 x 1]
## 
##       Precinct
##          (chr)
## 1  WOD 45-3192
## 2  WOD 45-3193
## 3  WOD 45-3270
## 4  WOD 45-3530
## 5         WOLF
## 6     WOODSIDE
## 7      WYNOCHE
## 8       WYNONA
## 9  YPT 48-1233
## 10 YPT 48-1234
precinct_count <- king_raw %>%
    select(Precinct) %>%
    distinct(Precinct) %>%
    tally() %>%
    as.numeric()

There are 2499 distinct values appearing the Precinct column. Sounds about right for King County?

Race

Race appears to contain the specific races, some of which are positions candidates are running for, some of which are local propositions, and some are miscellaneous Tim Eyman ballot bloating garbage (“Advisory Vote of the People”):

king_raw %>%
    select(Race) %>%
    distinct(Race) %>%
    head(10)
## Source: local data frame [10 x 1]
## 
##                                                                                                        Race
##                                                                                                       (chr)
## 1                                                                         Attorney General  partisan office
## 2                  Auburn School District No. 408 Proposition No. 1 General Obligation Bonds - $110,000,000
## 3                                  City of Black Diamond Proposition No. 1 Change in the Plan of Government
## 4                                     City of Kent Proposition No. 1 Levy to Support Kent Parks and Streets
## 5                 City of Kirkland Proposition No. 1 Levy for City Street Maintenance and Pedestrian Safety
## 6           City of Kirkland Proposition No. 2 Levy for City Parks Maintenance, Restoration and Enhancement
## 7  City of Mercer Island Proposition No. 1 Nine Year Levy Lid Lift for City Fire Station and Fire Apparatus
## 8                                                 City of Normandy Park Proposition No. 1 Property Tax Rate
## 9                       City of Sammamish Proposition No. 1 Advisory Vote on Multi-Purpose Community Center
## 10          City of Seattle Proposition No. 1 General Obligation Bonds - $290,000,000 (Alaskan Way Seawall)
king_raw %>%
    select(Race) %>%
    distinct(Race) %>%
    tail(10)
## Source: local data frame [10 x 1]
## 
##                                                                                             Race
##                                                                                            (chr)
## 1                     Advisory Vote of the People Advisory Vote No. 2 Substitute House Bill 2590
## 2                                       Superintendent of Public Instruction  nonpartisan office
## 3                                        Superior Court Judge Position No. 42 nonpartisan office
## 4                      United States Representative Congressional District No. 1 partisan office
## 5  United States Representative Congressional District No. 1 partisan office, 1 month short term
## 6                      United States Representative Congressional District No. 7 partisan office
## 7                      United States Representative Congressional District No. 8 partisan office
## 8                      United States Representative Congressional District No. 9 partisan office
## 9                                                                             congressional Dist
## 10                                                        United States Senator  partisan office

LEG, CC, CG

LEG, CC, and CG appear to be the numbers for legislative district, King County Council district, and Congressional district. We could argue that LEG, CC, and CG should be converted to character because their numerical values are irrelevant, but we’re not planning on using this information, so nothing bad will happen by being lazy and not changing it from integer to character. I could say col_types = "cccccccci" when using read_tsv, though, to get all the column types perfect.

We note these values are missing on nearly 200 rows:

king_raw %>%
    select(LEG, CC, CG) %>%
    summary()
##       LEG              CC              CG       
##  Min.   : 1.00   Min.   :1.000   Min.   :1.000  
##  1st Qu.:32.00   1st Qu.:3.000   1st Qu.:7.000  
##  Median :37.00   Median :5.000   Median :7.000  
##  Mean   :34.86   Mean   :4.835   Mean   :6.955  
##  3rd Qu.:45.00   3rd Qu.:7.000   3rd Qu.:9.000  
##  Max.   :48.00   Max.   :9.000   Max.   :9.000  
##  NA's   :188     NA's   :188     NA's   :188

Let’s look at a sample:

king_raw %>%
    filter(is.na(LEG)) %>%
    head(10)
## Source: local data frame [10 x 9]
## 
##            Precinct                                          Race   LEG
##               (chr)                                         (chr) (int)
## 1  ELECTIONS OFFICE             Attorney General  partisan office    NA
## 2  ELECTIONS OFFICE             Attorney General  partisan office    NA
## 3  ELECTIONS OFFICE             Attorney General  partisan office    NA
## 4  ELECTIONS OFFICE             Attorney General  partisan office    NA
## 5  ELECTIONS OFFICE             Attorney General  partisan office    NA
## 6  ELECTIONS OFFICE             Attorney General  partisan office    NA
## 7  ELECTIONS OFFICE             Attorney General  partisan office    NA
## 8  ELECTIONS OFFICE Commissioner of Public Lands  partisan office    NA
## 9  ELECTIONS OFFICE Commissioner of Public Lands  partisan office    NA
## 10 ELECTIONS OFFICE Commissioner of Public Lands  partisan office    NA
##       CC    CG CounterGroup Party       CounterType SumOfCount
##    (int) (int)        (chr) (chr)             (chr)      (int)
## 1     NA    NA        Total   Dem      Bob Ferguson         11
## 2     NA    NA        Total   Rep       Reagan Dunn          3
## 3     NA    NA        Total    NP Registered Voters         43
## 4     NA    NA        Total    NP Times Blank Voted          2
## 5     NA    NA        Total    NP     Times Counted         16
## 6     NA    NA        Total    NP  Times Over Voted          0
## 7     NA    NA        Total    NP          Write-in          0
## 8     NA    NA        Total   Rep      Clint Didier          3
## 9     NA    NA        Total   Dem Peter J. Goldmark         11
## 10    NA    NA        Total    NP Registered Voters         43

Interesting. It looks like there’s a “precinct” called “ELECTIONS OFFICE” which has some counts listed, but sure doesn’t sound like a real precinct. We could think about dropping rows for this “precinct” since it seems like might be a mistake, though it won’t make a big difference.

CounterGroup

CounterGroup only has one value (Total) and is completely, utterly useless. A waste of a column!

king_raw %>%
    select(CounterGroup) %>%
    distinct(CounterGroup)
## Source: local data frame [1 x 1]
## 
##   CounterGroup
##          (chr)
## 1        Total

Party

Party contains values for the political parties involved in each race:

king_raw %>%
    group_by(Party) %>%
    tally() %>%
    arrange(desc(n))
## Source: local data frame [17 x 2]
## 
##    Party      n
##    (chr)  (int)
## 1     NP 446582
## 2    Dem  29382
## 3    Rep  28405
## 4    Dcr   2501
## 5    CPN   2499
## 6    DPN   2499
## 7    GPN   2499
## 8    JPN   2499
## 9    LPN   2499
## 10   RPN   2499
## 11   SPN   2499
## 12   SWN   2499
## 13   RHC    238
## 14    SA    207
## 15   PAR    168
## 16    ID     53
## 17   NOP     48

We see quite a few different values here. Dem and Rep stand out as being Democrats and Republicans, and NP is so common that it must mean “no party” or “non-partisan”. Most of the other values have exactly the same number of values as there are distinct precincts, so these are probably Presidential or Senate candidates from third parties if every precinct is voting for them.

CounterType

CounterType appears to contain a mix of candidate names or position votes (e.g. “Bob Ferguson”, “Approved”, “No”) and overall summaries for the particular race in the precinct (e.g. “Registered Voters”, “Times Blank Voted”, “Times Counted”).

I would say this column has a lot of the info we want, and that it is currently structured “too long” since all of these are different rows corresponding to the same precinct for the same race.

king_raw %>%
    select(CounterType) %>%
    distinct() %>%
    head(20)
## Source: local data frame [20 x 1]
## 
##          CounterType
##                (chr)
## 1       Bob Ferguson
## 2        Reagan Dunn
## 3  Registered Voters
## 4  Times Blank Voted
## 5      Times Counted
## 6   Times Over Voted
## 7           Write-in
## 8           Approved
## 9           Rejected
## 10                No
## 11               Yes
## 12      Clint Didier
## 13 Peter J. Goldmark
## 14     Ronald E. Cox
## 15  Marlin Appelwick
## 16    Johanna Bender
## 17        Maintained
## 18          Repealed
## 19        Jay Inslee
## 20       Rob McKenna

SumOfCount

SumOfCount appears to be just counts of votes (or registered voters) associated with CounterType. This is the numeric information we want to use, and there are no missing values:

king_raw %>%
    select(SumOfCount) %>%
    summary()
##    SumOfCount    
##  Min.   :   0.0  
##  1st Qu.:   2.0  
##  Median : 124.0  
##  Mean   : 186.8  
##  3rd Qu.: 310.0  
##  Max.   :1218.0

The quantities of interest

We are interested in turnout rates for each of these races in each precinct. We will measure turnout as times votes were counted (including for a candidate, blank, write-in, or “over vote”) out of registered voters.

We are also interested in differences between precincts in Seattle and precincts elsewhere in King County. Again, these data are not documented, so you will have to figure out how to do this.

Finally, we will want to look at precinct-level support for the Democratic candidates in King County in 2012 for the following contests:

  • President (and Vice-President)
  • Governor
  • Lieutenant Governor

We will measure support as the percentage of votes in a precinct for the Democratic candidate out of all votes for candidates or write-ins. Do not include blank votes or “over votes” (where the voter indicated multiple choices) in the overall vote count for the denominator.

Use dplyr, tidyr, or any other tools you like to get the data to one row per precinct with the following columns (at minimum):

  • Precinct identifier
  • Indicator for whether the precinct is in Seattle or not
  • Precinct size in terms of registered voters
  • Turnout rate
  • Percentage Democratic support for President
  • Percentage Democratic support for Governor
  • Percentage Democratic support for Lieutenant Governor

Filtering down the data

For what we want to do, there are a lot of rows that are not useful. We only want ones pertaining to races for President, Governor, and Lieutenant Governor. So let’s trim everything down. How do these things show up in the data? Eyeballing time!

# info on the distinct races
races <- king_raw %>%
    select(Race) %>%
    distinct(Race) %>%
    arrange(Race)
# print it out as a character vector
as.character(races$Race)
##  [1] "Advisory Vote of the People Advisory Vote No. 1 Engrossed Senate Bill 6635"                                              
##  [2] "Advisory Vote of the People Advisory Vote No. 2 Substitute House Bill 2590"                                              
##  [3] "Attorney General  partisan office"                                                                                       
##  [4] "Auburn School District No. 408 Proposition No. 1 General Obligation Bonds - $110,000,000"                                
##  [5] "City of Black Diamond Proposition No. 1 Change in the Plan of Government"                                                
##  [6] "City of Kent Proposition No. 1 Levy to Support Kent Parks and Streets"                                                   
##  [7] "City of Kirkland Proposition No. 1 Levy for City Street Maintenance and Pedestrian Safety"                               
##  [8] "City of Kirkland Proposition No. 2 Levy for City Parks Maintenance, Restoration and Enhancement"                         
##  [9] "City of Mercer Island Proposition No. 1 Nine Year Levy Lid Lift for City Fire Station and Fire Apparatus"                
## [10] "City of Normandy Park Proposition No. 1 Property Tax Rate"                                                               
## [11] "City of Sammamish Proposition No. 1 Advisory Vote on Multi-Purpose Community Center"                                     
## [12] "City of Seattle Proposition No. 1 General Obligation Bonds - $290,000,000 (Alaskan Way Seawall)"                         
## [13] "City of Shoreline Proposition No. 1 Acquisition and Local Control of Seattle Water Services in Shoreline"                
## [14] "City of Snoqualmie Proposition No. 1 Public Safety Operations, Streets and Parks Maintenance Levy"                       
## [15] "Commissioner of Public Lands  partisan office"                                                                           
## [16] "congressional Dist"                                                                                                      
## [17] "Court of Appeals, Division No. 1, District No. 1 Judge Position No. 4 nonpartisan office"                                
## [18] "Court of Appeals, Division No. 1, District No. 1 Judge Position No. 7 nonpartisan office"                                
## [19] "District Court West Electoral District Judge Position No. 1 nonpartisan office, unexpired 2-year term"                   
## [20] "Federal Way School District No. 210 Proposition No. 1 Capital Projects Levy"                                             
## [21] "Governor  partisan office"                                                                                               
## [22] "Insurance Commissioner  partisan office"                                                                                 
## [23] "King County Fire Protection District No. 20 Proposition No. 1 Levy of General Tax for Maintenance and Operations"        
## [24] "King County Fire Protection District No. 45 Proposition No. 1 Levy of General Tax for Maintenance and Operations"        
## [25] "King County Proposition No. 1 Regular Property Tax Levy for Automated Fingerprint Identification System (AFIS) Services" 
## [26] "Lieutenant Governor  partisan office"                                                                                    
## [27] "Passed by the Legislature and Ordered Referred by Petition Referendum Measure No. 74"                                    
## [28] "President and Vice President of the United States  partisan office"                                                      
## [29] "Proposed by Initiative Petition Initiative Measure No. 1185"                                                             
## [30] "Proposed by Initiative Petition Initiative Measure No. 1240"                                                             
## [31] "Proposed North Highline Area  \"Y\" Annexation Area Proposition No. 1 Proposed Annexation to the City of Burien"         
## [32] "Proposed to the People by the Legislature Amendment to the State Constituion Senate Joint Resolution No. 8223"           
## [33] "Proposed to the People by the Legislature Amendment to the State Constitution Engrossed Senate Joint Resolution No. 8221"
## [34] "Proposed West Hill Annexation Area Proposition No. 1 Proposed Annexation to the City of Renton"                          
## [35] "Propsed to the Legislature and Referred to the People Initiative Measure No. 502"                                        
## [36] "Secretary of State  partisan office"                                                                                     
## [37] "Sheriff  nonpartisan office, unexpired 1-year term"                                                                      
## [38] "Si View Metropolitan Park District Proposition No. 1 One-Year Operations and Maintenance Levy"                           
## [39] "State Auditor  partisan office"                                                                                          
## [40] "State Representative Legislative Dist No. 1 - Position 1 partisan office"                                                
## [41] "State Representative Legislative Dist No. 1 - Position 2 partisan office"                                                
## [42] "State Representative Legislative Dist No. 11 - Position 1 partisan office"                                               
## [43] "State Representative Legislative Dist No. 11 - Position 2 partisan office"                                               
## [44] "State Representative Legislative Dist No. 30 - Position 1 partisan office"                                               
## [45] "State Representative Legislative Dist No. 30 - Position 2 partisan office"                                               
## [46] "State Representative Legislative Dist No. 31 - Position 1 partisan office"                                               
## [47] "State Representative Legislative Dist No. 31 - Position 2 partisan office"                                               
## [48] "State Representative Legislative Dist No. 32 - Position 1 partisan office"                                               
## [49] "State Representative Legislative Dist No. 32 - Position 2 partisan office"                                               
## [50] "State Representative Legislative Dist No. 33 - Position 1 partisan office"                                               
## [51] "State Representative Legislative Dist No. 33 - Position 2 partisan office"                                               
## [52] "State Representative Legislative Dist No. 34 - Position 1 partisan office"                                               
## [53] "State Representative Legislative Dist No. 34 - Position 2 partisan office"                                               
## [54] "State Representative Legislative Dist No. 36 - Position 1 partisan office"                                               
## [55] "State Representative Legislative Dist No. 36 - Position 2 partisan office"                                               
## [56] "State Representative Legislative Dist No. 37 - Position 1 partisan office"                                               
## [57] "State Representative Legislative Dist No. 37 - Position 2 partisan office"                                               
## [58] "State Representative Legislative Dist No. 39 - Position 1 partisan office"                                               
## [59] "State Representative Legislative Dist No. 39 - Position 2 partisan office"                                               
## [60] "State Representative Legislative Dist No. 41 - Position 1 partisan office"                                               
## [61] "State Representative Legislative Dist No. 41 - Position 2 partisan office"                                               
## [62] "State Representative Legislative Dist No. 43 - Position 1 partisan office"                                               
## [63] "State Representative Legislative Dist No. 43 - Position 2 partisan office"                                               
## [64] "State Representative Legislative Dist No. 45 - Position 1 partisan office"                                               
## [65] "State Representative Legislative Dist No. 45 - Position 2 partisan office"                                               
## [66] "State Representative Legislative Dist No. 46 - Position 1 partisan office, short and full term"                          
## [67] "State Representative Legislative Dist No. 46 - Position 2 partisan office"                                               
## [68] "State Representative Legislative Dist No. 47 - Position 1 partisan office"                                               
## [69] "State Representative Legislative Dist No. 47 - Position 2 partisan office"                                               
## [70] "State Representative Legislative Dist No. 48 - Position 1 partisan office"                                               
## [71] "State Representative Legislative Dist No. 48 - Position 2 partisan office"                                               
## [72] "State Representative Legislative Dist No. 5 - Position 1 partisan office"                                                
## [73] "State Representative Legislative Dist No. 5 - Position 2 partisan office"                                                
## [74] "State Senator Legislative Dist No. 1 partisan office"                                                                    
## [75] "State Senator Legislative Dist No. 11 partisan office"                                                                   
## [76] "State Senator Legislative Dist No. 39 partisan office"                                                                   
## [77] "State Senator Legislative Dist No. 41 partisan office"                                                                   
## [78] "State Senator Legislative Dist No. 46 partisan office, unexpired 2-year term"                                            
## [79] "State Senator Legislative Dist No. 5 partisan office"                                                                    
## [80] "State Supreme Court Justice Position No. 2 nonpartisan office"                                                           
## [81] "State Supreme Court Justice Position No. 8 nonpartisan office, short and full term"                                      
## [82] "State Supreme Court Justice Position No. 9 nonpartisan office"                                                           
## [83] "State Treasurer  partisan office"                                                                                        
## [84] "Superintendent of Public Instruction  nonpartisan office"                                                                
## [85] "Superior Court Judge Position No. 42 nonpartisan office"                                                                 
## [86] "United States Representative Congressional District No. 1 partisan office"                                               
## [87] "United States Representative Congressional District No. 1 partisan office, 1 month short term"                           
## [88] "United States Representative Congressional District No. 7 partisan office"                                               
## [89] "United States Representative Congressional District No. 8 partisan office"                                               
## [90] "United States Representative Congressional District No. 9 partisan office"                                               
## [91] "United States Senator  partisan office"

Governor partisan office, Lieutenant Governor partisan office, and President and Vice President of the United States partisan office are the ones we want. Note there are double spaces in here before “partisan office”! These are in positions 21, 26, and 28 of my sorted races output, respectively, so I will make a character vector holding those values specifically for easier subsetting. Never type more than you have to!

# make a character vector of relevant races
(rel_races <- races$Race[c(21, 26, 28)])
## [1] "Governor  partisan office"                                         
## [2] "Lieutenant Governor  partisan office"                              
## [3] "President and Vice President of the United States  partisan office"
# subset the data to relevant races
king_rel_races <- king_raw %>%
    filter(Race %in% rel_races)

Seattle precincts

How can I figure out which precincts are in Seattle? I’m going to make a dataset with the whole list and eyeball it to see if anything jumps out.

precincts <- king_rel_races %>%
    select(Precinct) %>%
    distinct(Precinct) %>%
    arrange(Precinct)

Scrolling to the “S” section, it looks like Seattle precincts all start with SEA followed by a space and a precinct number. Looking at a map on the King County website and zooming in enough to see the precinct numbers confirms it. Precincts near but not in Seattle like in Shoreline to the north or Tukwila to the south have a different naming system. Thus, I am confident that identifying Seattle precincts as those whose first four characters are SEA will work to flag those. Just the three characters SEA on its own won’t – there are precincts called SEALTH, SEAN, and SEAVIEW we don’t want to flag as in Seattle.

One way to proceed is to use the substr function (seen in Week 4 when checking if the second letter of some first names was “a”) to pull out the first four characters of Precinct and check if they are equal to SEA.

king_flag <- king_rel_races %>%
    mutate(Location = ifelse(substr(Precinct, start = 1, stop = 4) == "SEA ",
                             "Seattle",
                             "Not Seattle"))

An alternative way is to use the separate function in tidyr (which we used to take a character representation of song length and split it into minutes and seconds). We could split these precincts at the first space and then check if the stuff in the first part of the split says SEA:

king_flag_alt <- king_rel_races %>%
    separate(Precinct, into = c("part1", "part2"), sep = " ") %>%
    mutate(Location = ifelse(part1 == "SEA",
                             "Seattle",
                             "Not Seattle"))
## Warning: Too few values at 7101 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
## 11, 12, 13, 14, 22, 23, 24, 25, 26, 27, ...

Note that we get a warning message when doing this with separate because some precincts only have one word in them, so there is no space to split on. This is fine. (We’ll see more ways to match text patterns in Week 8.)

Sanity check: do we get the same answer either way? Let’s sum how many times the approaches disagree.

sum(king_flag$Location != king_flag_alt$Location)
## [1] 0

Both ways give the same answers, so we’re all good!

Registered voters and turnout rates

We want to calculate turnout rates as total votes (including normal votes, blank votes, over votes, write-ins) for the Presidential race divided by registered voters.

First, I see there is a value in CounterType called “Times Counted”. It would be nice if this was the numerator we were after. I’m going to check this by summing SumOfCount up within each precinct and race of interest over all the rows besides where CounterType is “Registered Voters” or “Times Counted”. Then I’ll compare these to the “Times Counted” rows. We’ll use joins to do this:

# sum over rows besides "Registered Voters" or "Times Counted"
# within each precinct and race
times_counted_manual <- king_flag %>%
    select(Precinct, Race, CounterType, SumOfCount) %>%
    filter(CounterType != "Registered Voters" & CounterType != "Times Counted") %>%
    group_by(Precinct, Race) %>%
    summarize(votes_added_up = sum(SumOfCount))
head(times_counted_manual)
## Source: local data frame [6 x 3]
## Groups: Precinct [2]
## 
##   Precinct
##      (chr)
## 1    ADAIR
## 2    ADAIR
## 3    ADAIR
## 4  ALDARRA
## 5  ALDARRA
## 6  ALDARRA
##                                                                 Race
##                                                                (chr)
## 1                                          Governor  partisan office
## 2                               Lieutenant Governor  partisan office
## 3 President and Vice President of the United States  partisan office
## 4                                          Governor  partisan office
## 5                               Lieutenant Governor  partisan office
## 6 President and Vice President of the United States  partisan office
##   votes_added_up
##            (int)
## 1            809
## 2            809
## 3            809
## 4            625
## 5            625
## 6            625
# now just grab the "Times Counted" rows and merge
times_counted_compare <- king_flag %>%
    select(Precinct, Race, CounterType, SumOfCount) %>%
    filter(CounterType == "Times Counted") %>%
    # rename the column on filtered data for clarity
    rename(times_counted_value = SumOfCount) %>%
    left_join(times_counted_manual,
              by = c("Precinct", "Race")) %>%
    # compute differences
    mutate(diff = times_counted_value - votes_added_up)

summary(times_counted_compare$diff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0       0       0       0

They’re always the same! That means “Times Counted” is including every possible kind of vote for each race, such as blanks, write-ins, accidental over-votes, or your usual ones. Now we can make a data frame that has registered voters and turnout rates (for the Presidential race) for each precinct:

turnout_rates <- king_flag %>%
    # filter to just the presidential election
    filter(Race == rel_races[3]) %>%
    # filter to just registered voters or times counted
    filter(CounterType %in% c("Registered Voters", "Times Counted")) %>%
    # just the columns we want
    select(Precinct, Location, CounterType, SumOfCount) %>%
    # use spread to put the two counts on the same row for each precinct
    spread(key = CounterType, value = SumOfCount) %>%
    # use new columns to compute turnout rate
    mutate(Turnout = `Times Counted` / `Registered Voters`)
head(turnout_rates)
## Source: local data frame [6 x 5]
## 
##        Precinct    Location Registered Voters Times Counted   Turnout
##           (chr)       (chr)             (int)         (int)     (dbl)
## 1         ADAIR Not Seattle               856           809 0.9450935
## 2       ALDARRA Not Seattle               733           625 0.8526603
## 3 ALDER SPRINGS Not Seattle               545           469 0.8605505
## 4     ALDERWOOD Not Seattle               474           414 0.8734177
## 5   ALG 30-0013 Not Seattle               467           334 0.7152034
## 6   ALG 30-0014 Not Seattle               483           353 0.7308489

Democratic support rates

You are asked to measure support as the percentage of votes in a precinct for the Democratic candidate out of all votes for candidates or write-ins, but this time not to include blank votes or “over votes” (where the voter indicated multiple choices) in the overall vote count for the denominator.

A good approach here is to compute the denominator, and then merge on the Democratic vote count and divide.

Computing candidate votes

We want one row per precinct per race with the total number of votes for a person or write-in. I observe that for the races of interest, the proper candidates all have rows where Party is not NP:

king_flag %>%
    select(Race, Party, CounterType) %>%
    distinct()
## Source: local data frame [27 x 3]
## 
##                                    Race Party       CounterType
##                                   (chr) (chr)             (chr)
## 1             Governor  partisan office   Dem        Jay Inslee
## 2             Governor  partisan office    NP Registered Voters
## 3             Governor  partisan office   Rep       Rob McKenna
## 4             Governor  partisan office    NP Times Blank Voted
## 5             Governor  partisan office    NP     Times Counted
## 6             Governor  partisan office    NP  Times Over Voted
## 7             Governor  partisan office    NP          Write-in
## 8  Lieutenant Governor  partisan office   Rep   Bill Finkbeiner
## 9  Lieutenant Governor  partisan office   Dcr         Brad Owen
## 10 Lieutenant Governor  partisan office    NP Registered Voters
## ..                                  ...   ...               ...

I could keep those rows or rows for “Write-in” and that would be perfect.

candidate_vote_rows <- king_flag %>%
    # keep just not NP rows, or write-in rows
    filter(Party != "NP" | CounterType == "Write-in") %>%
    select(Precinct, Location, Race, Party, CounterType, SumOfCount)

# sum over all votes for candidates within a precinct and race
total_candidate_votes <- candidate_vote_rows %>%
    group_by(Precinct, Location, Race) %>%
    summarize(total_candidate_votes  = sum(SumOfCount))

Let’s look at how to pull up the Democrat rows specifically:

candidate_vote_rows %>%
    select(Race, Party, CounterType) %>%
    distinct()
## Source: local data frame [15 x 3]
## 
##                                                                  Race
##                                                                 (chr)
## 1                                           Governor  partisan office
## 2                                           Governor  partisan office
## 3                                           Governor  partisan office
## 4                                Lieutenant Governor  partisan office
## 5                                Lieutenant Governor  partisan office
## 6                                Lieutenant Governor  partisan office
## 7  President and Vice President of the United States  partisan office
## 8  President and Vice President of the United States  partisan office
## 9  President and Vice President of the United States  partisan office
## 10 President and Vice President of the United States  partisan office
## 11 President and Vice President of the United States  partisan office
## 12 President and Vice President of the United States  partisan office
## 13 President and Vice President of the United States  partisan office
## 14 President and Vice President of the United States  partisan office
## 15 President and Vice President of the United States  partisan office
##    Party                                    CounterType
##    (chr)                                          (chr)
## 1    Dem                                     Jay Inslee
## 2    Rep                                    Rob McKenna
## 3     NP                                       Write-in
## 4    Rep                                Bill Finkbeiner
## 5    Dcr                                      Brad Owen
## 6     NP                                       Write-in
## 7    DPN                     Barack Obama and Joe Biden
## 8    LPN                 Gary Johnson and James P. Gray
## 9    SWN                James Harris and Alyson Kennedy
## 10   GPN                   Jill Stein and Cheri Honkala
## 11   RPN                      Mitt Romney and Paul Ryan
## 12   SPN                   Peta Lindsey and Yari Osorio
## 13   JPN Ross C. (Rocky) Anderson and Luis J. Rodriquez
## 14   CPN               Virgil Goode and James N. Clymer
## 15    NP                                       Write-in

Interesting. For Governor, the Democrat candidate has Party of "Dem". For Lieutenant Governor, it’s "Dcr", and for President, it’s "DPN". Life is a rich tapestry! I’ll count as Democratic votes anything that is "Dem", "Dcr", or "DPN".

# subset to votes for Democrat candidate
democratic_vote_rows <- candidate_vote_rows %>%
    filter(Party %in% c("Dem", "Dcr", "DPN")) %>%
    select(Precinct, Location, Race, SumOfCount) %>%
    # rename the count to be informative
    rename(dem_votes = SumOfCount)

Now we can merge by precinct and race and do the math:

democrat_vote_rates <- democratic_vote_rows %>%
    left_join(total_candidate_votes,
              by = c("Precinct", "Location", "Race")) %>%
    mutate(`Democrat support` = dem_votes / total_candidate_votes) %>%
    select(Precinct, Location, Race, `Democrat support`)
head(democrat_vote_rates)
## Source: local data frame [6 x 4]
## 
##        Precinct    Location                      Race Democrat support
##           (chr)       (chr)                     (chr)            (dbl)
## 1         ADAIR Not Seattle Governor  partisan office        0.4134496
## 2       ALDARRA Not Seattle Governor  partisan office        0.4377049
## 3 ALDER SPRINGS Not Seattle Governor  partisan office        0.4251627
## 4     ALDERWOOD Not Seattle Governor  partisan office        0.4137931
## 5   ALG 30-0013 Not Seattle Governor  partisan office        0.5382263
## 6   ALG 30-0014 Not Seattle Governor  partisan office        0.5714286

Combining it all

We have registered voters and turnout in turnout_rates, and Democratic candidate support rates in democrat_vote_rates. Now we merge using left_join:

precinct_data <- turnout_rates %>%
    left_join(democrat_vote_rates,
              by = c("Precinct", "Location"))
head(precinct_data)
## Source: local data frame [6 x 7]
## 
##   Precinct    Location Registered Voters Times Counted   Turnout
##      (chr)       (chr)             (int)         (int)     (dbl)
## 1    ADAIR Not Seattle               856           809 0.9450935
## 2    ADAIR Not Seattle               856           809 0.9450935
## 3    ADAIR Not Seattle               856           809 0.9450935
## 4  ALDARRA Not Seattle               733           625 0.8526603
## 5  ALDARRA Not Seattle               733           625 0.8526603
## 6  ALDARRA Not Seattle               733           625 0.8526603
##                                                                 Race
##                                                                (chr)
## 1                                          Governor  partisan office
## 2                               Lieutenant Governor  partisan office
## 3 President and Vice President of the United States  partisan office
## 4                                          Governor  partisan office
## 5                               Lieutenant Governor  partisan office
## 6 President and Vice President of the United States  partisan office
##   Democrat support
##              (dbl)
## 1        0.4134496
## 2        0.5031686
## 3        0.5155666
## 4        0.4377049
## 5        0.4595070
## 6        0.5129032

We can make this wide using spread and clean up the names a bit:

wide_precinct_data <- precinct_data %>%
    spread(key = Race, value = `Democrat support`) %>%
    rename(Governor = `Governor  partisan office`,
           `Lt. Governor` = `Lieutenant Governor  partisan office`,
           President = `President and Vice President of the United States  partisan office`)
wide_precinct_data
## Source: local data frame [2,499 x 8]
## 
##         Precinct    Location Registered Voters Times Counted   Turnout
##            (chr)       (chr)             (int)         (int)     (dbl)
## 1          ADAIR Not Seattle               856           809 0.9450935
## 2        ALDARRA Not Seattle               733           625 0.8526603
## 3  ALDER SPRINGS Not Seattle               545           469 0.8605505
## 4      ALDERWOOD Not Seattle               474           414 0.8734177
## 5    ALG 30-0013 Not Seattle               467           334 0.7152034
## 6    ALG 30-0014 Not Seattle               483           353 0.7308489
## 7    ALG 30-3141 Not Seattle               518           359 0.6930502
## 8         ALPINE Not Seattle               578           459 0.7941176
## 9      AMES LAKE Not Seattle               823           725 0.8809235
## 10    ANGEL CITY Not Seattle               697           540 0.7747489
## ..           ...         ...               ...           ...       ...
##     Governor Lt. Governor President
##        (dbl)        (dbl)     (dbl)
## 1  0.4134496    0.5031686 0.5155666
## 2  0.4377049    0.4595070 0.5129032
## 3  0.4251627    0.4580499 0.5118280
## 4  0.4137931    0.4201031 0.5254237
## 5  0.5382263    0.6398714 0.6186186
## 6  0.5714286    0.6109422 0.6534091
## 7  0.5085227    0.5589124 0.5502793
## 8  0.4618834    0.4903382 0.5350877
## 9  0.4892086    0.5276923 0.5819444
## 10 0.7655039    0.7718941 0.8142589
## ..       ...          ...       ...

Graphing the results

Turnout

Make a scatterplot where the horizontal axis is number of registered voters in the precinct, and the vertical axis is turnout rate. Color the precincts in Seattle one color, and use a different color for other precincts. Do you observe anything?

ggplot(data = wide_precinct_data,
       aes(x = `Registered Voters`, y = 100*Turnout,
           color = Location, group = Location)) +
    geom_point(alpha = 0.4, size = 1) +
    geom_smooth() +
    scale_y_continuous(breaks=seq(0, 100, 10)) +
    scale_color_manual(values = c("orange", "navyblue")) +
    ggtitle("Turnout rates by precinct in King County, 2012") +
    ylab("Turnout\n(% of registered voters voting in Presidential race)") +
    theme_bw()
## Warning: Removed 15 rows containing non-finite values (stat_smooth).
## Warning: Removed 15 rows containing missing values (geom_point).

I’m getting warnings when plotting because of precincts with zero registered voters whose turnout rate is NaN (coming from division by zero), which isn’t a big deal. I’ve also superimposed a smooth trend line for each location to see the average relationship between registered voters and turnout rates.

You can see a couple of weird points for turnouts in excess of 100% for very, very small precincts. (Sound the alarms and call the voter fraud police!) We can also see that Seattle precincts are just bigger, with almost all above 250 registered voters, while precincts can be quite a bit smaller elsewhere in King County.

It looks like within Seattle, there is a slight negative relationship between the number of registered voters in a precinct and the proportion of whom actually vote in the Presidential race. The trend is instead flat-to-slightly-increasing for precincts outside of Seattle. However, for precincts of the same size (in terms of registered voters), Seattle precincts actually had slightly higher turnout rates on average than non-Seattle precincts.

The overall level of turnout seems pretty impressive, with many not-too-small precincts having rates of 80% or higher. However, keep in mind this is just calculated out of registered voters. We would need to use Census data and do something much more sophisticated if we wanted to account for all eligible voters residing in King County who are not registered.

Democratic support

Now let’s visualize the Democratic support rates for the three races within each precinct for sufficently large precincts. Limit the data to precincts with at least 500 registered voters. Make a line plot where the horizontal axis indicates precincts, and the vertical axis shows the Democratic support rates. There should be three lines in different colors (one for each race of interest).

Do not label the precincts on the horizontal axis (you will probably have to search to figure out how). You should, however, arrange them on the axis in order from smallest to largest in terms of support for the Democratic candidate for president — that is, the line plotting percentage support for Obama should be smoothly increasing from left to right. (Hint: you will probably want to add a new column to the data giving the order to plot these in.) The order of the lines in the legend should follow the order of the lines at the right edge of the plot.

To do this, we need to use the “wide” version of the data (one row per precinct), and make a new variable called Order giving the order to plot these in based on Democratic support for the Presidential race. We’ll sort this data and then make the column to get it right.

# subset the data to big precincts
big_precincts <- wide_precinct_data %>%
    filter(`Registered Voters` >= 500)

# reorder data in terms of support for Obama
big_precincts <- big_precincts %>%
    arrange(President) %>%
    # now make a column for the precinct plotting order
    mutate(Order = row_number())

# alternative solution to make plotting order
# using base R on the sorted data:
# big_precincts$Order <- 1:nrow(big_precincts)

Then we can reshape back from “wide” to “tidy” form using gather so that we have one variable giving the race and can plot a separate line for each.

tidy_big_precinct_data <- big_precincts %>%
    gather(key = Race, value = `Democrat support`,
           # rotate down the columns for each race
           Governor, `Lt. Governor`, `President`)

Next, I’m going to set the order of the lines in the legend. To do this, I need to pull out the rows for precinct that will be plotted on the right edge of the graph, which is the precinct with the heaviest Obama support among big precincts, aka the one with the maximum value of Order. Then, I’ll find the Democrat support rates for the three offices in that precinct, and take those in descending order to get the order of the lines for the legend. I use the factor function like we saw in Week 5 to change the order.

# find the order of the lines using precinct at the right of the graph:
order_of_lines <- big_precincts %>%
    # keep the most Obama supporting precinct
    filter(Order == max(Order)) %>%
    select(Precinct) %>%
    # merge it on to tidy_big_precinct_data data to get %s for each race
    left_join(tidy_big_precinct_data %>%
                  select(Precinct, Race, `Democrat support`),
              by = "Precinct") %>%
    # order the races from most dem support to least
    arrange(desc(`Democrat support`))
order_of_lines
## Source: local data frame [3 x 3]
## 
##      Precinct         Race Democrat support
##         (chr)        (chr)            (dbl)
## 1 SEA 43-1855    President        0.9464286
## 2 SEA 43-1855     Governor        0.9341865
## 3 SEA 43-1855 Lt. Governor        0.5574713
# now take tidy_big_precinct_data and relevel Race:
tidy_big_precinct_data <- tidy_big_precinct_data %>%
    # relevel these in the order set above
    mutate(Race = factor(Race,
                         levels = order_of_lines$Race))
head(tidy_big_precinct_data)
## Source: local data frame [6 x 8]
## 
##      Precinct    Location Registered Voters Times Counted   Turnout Order
##         (chr)       (chr)             (int)         (int)     (dbl) (int)
## 1      VEAZIE Not Seattle               631           518 0.8209192     1
## 2      NATHAN Not Seattle               510           424 0.8313725     2
## 3 KEN 47-2477 Not Seattle               537           472 0.8789572     3
## 4 RATTLESNAKE Not Seattle               608           506 0.8322368     4
## 5    ATKINSON Not Seattle               568           486 0.8556338     5
## 6      FARLEY Not Seattle               744           612 0.8225806     6
##       Race Democrat support
##     (fctr)            (dbl)
## 1 Governor        0.3123772
## 2 Governor        0.3149038
## 3 Governor        0.2857143
## 4 Governor        0.3076923
## 5 Governor        0.3501048
## 6 Governor        0.3601340

Finally we plot, suppressing labels for each precinct/order on the horizontal axis since it is not informative:

ggplot(data = tidy_big_precinct_data,
       aes(x = Order, y = `Democrat support`*100,
           group = Race, color = Race)) +
    geom_line(alpha = 0.5) +
    ggtitle("Democratic support in three races\nKing County, 2012, large precincts only") +
    scale_x_continuous(breaks = NULL, # no x-axis labels
                       name = "Precinct (ordered by Obama support)") +
    scale_y_continuous(breaks = seq(30, 100, 10),
                       name = "Percent of votes within precinct for Democratic candidate") +
    scale_color_manual(values = c("black", "red", "blue")) +
    theme_bw()

This kind of plot is a way to represent three dimensional information (Democrat support rates for each of the three positions per precinct) in two dimensions, by putting the observations in order on the horizontal axis and showing the three values of interest on the same scale on the vertical axis. Here’s a version using points instead of lines if you find the line version hard to interpret:

ggplot(data = tidy_big_precinct_data,
       aes(x = Order, y = `Democrat support`*100,
           group = Race, color = Race)) +
    geom_point(alpha = 0.35, size = 0.5) +
    ggtitle("Democratic support in three races\nKing County, 2012, large precincts only") +
    scale_x_continuous(breaks = NULL, # no x-axis labels
                       name = "Precinct (ordered by Obama support)") +
    scale_y_continuous(breaks = seq(30, 100, 10),
                       name = "Percent of votes within precinct for Democratic candidate") +
    scale_color_manual(values = c("black", "red", "blue")) +
    theme_bw()

From either chart, we see that support for Obama was usually higher than support for the Democratic candidates in the other two races within each precinct, as the President line typically lies above the Governor and Lt. Governor lines. Particularly in precincts where support for Obama was below 70% or so, we see considerably lower support for Inslee in the gubernatorial race by about 5-10%.

We also see that in precincts where support for Obama was above 70%, there is a plateau and even a slight drop in average support for the Democratic candidate for Lieutenant Governor (Brad Owen), and lots of variation. Perhaps this is related to The Stranger‘s and various progressive groups’ endorsement of Owen’s Republican opponent in 2012, which could be a factor resonating with voters in large liberal precincts. We’d need more data to figure this out, but the graph shows that many voters in the most Obama-supporting big precincts could not have been voting a straight Democratic ticket, and Owen faced a low ceiling on the level of support from liberal precincts.

If you are concerned that this is an artifact of limiting only to large precincts, you can repeat this analysis without filtering based on registered voters and obtain a similar result with more noise.