Homework 5.2: Wrangling and hacker stats with Darwin’s finches (70 pts)


Peter and Rosemary Grant of Princeton University have visited the island of Daphne Major on the Galápagos every year for over forty years, taking a careful inventory of the finches there. The Grants published a wonderful book, 40 Years of Evolution: Darwin’s Finches on Daphne Major Island. They generously made their data publicly available on the Dryad data repository.

We will investigate their measurements of beak depth (the distance, top to bottom, of a closed beak) and beak length (base to tip on the top) of Darwin’s finches. The image below defines the beak length and beak depth.

[Image: finch beak diagram defining beak length and beak depth]

We will look at data from two species, Geospiza fortis and Geospiza scandens. The Grants provided data on the finches of Daphne for the years 1973, 1975, 1987, 1991, and 2012. I have included the data in the files grant_1973.csv, grant_1975.csv, grant_1987.csv, grant_1991.csv, and grant_2012.csv. They are in almost exactly the same format as in the Dryad repository; I have only deleted blank entries at the end of the files.

a) The data sets come in separate files, but we would like to merge them all into one data frame. The problem is that they have different header names, and only the 1973 file has a year entry (called yearband). This is common with real data, which are often a bit messy and require some wrangling.

  1. First, change the name of the yearband column of the 1973 data to year. Also, make sure the year format is four digits, not two!
  2. Next, add a year column to the other four data frames. You want tidy data, so each row in the data frame should have an entry for the year.
  3. Change the column names so that all the data frames have the same column names. I would choose column names

    ['band', 'species', 'beak length (mm)', 'beak depth (mm)', 'year']

  4. Concatenate the data frames into a single data frame.
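The four steps above can be sketched with pandas as follows. This is only a rough illustration, not a prescribed solution: it uses two small in-memory data frames in place of the CSV files, and the raw header names other than yearband, the two-digit integer year format, and all numerical values are assumptions made up for the example. With the real files you would load each one (for instance with `pd.read_csv`) and inspect its actual headers first.

```python
import pandas as pd

# Toy stand-ins for two of the five files. Apart from 'yearband', the raw
# header names and every value below are hypothetical placeholders.
df_1973 = pd.DataFrame(
    {
        "band": [1, 2],
        "species": ["fortis", "scandens"],
        "yearband": [73, 73],  # assumed to be stored as a two-digit integer
        "beak length": [10.0, 14.0],
        "beak depth": [9.0, 9.5],
    }
)
df_1975 = pd.DataFrame(
    {
        "band": [3, 4],
        "species": ["fortis", "scandens"],
        "Beak length, mm": [10.5, 14.2],
        "Beak depth, mm": [9.2, 9.4],
    }
)

# Column names suggested in the assignment
target_cols = ["band", "species", "beak length (mm)", "beak depth (mm)", "year"]

# Step 1: rename yearband to year and convert to a four-digit year
df_1973 = df_1973.rename(columns={"yearband": "year"})
df_1973["year"] = df_1973["year"] + 1900

# Step 2: add a year column to a data frame that lacks one
df_1975["year"] = 1975

# Step 3: map each file's own headers onto the common column names
df_1973 = df_1973.rename(
    columns={"beak length": "beak length (mm)", "beak depth": "beak depth (mm)"}
)
df_1975 = df_1975.rename(
    columns={"Beak length, mm": "beak length (mm)", "Beak depth, mm": "beak depth (mm)"}
)

# Step 4: concatenate, selecting columns in a consistent order
df = pd.concat([df_1973[target_cols], df_1975[target_cols]], ignore_index=True)
```

Selecting `target_cols` before concatenating makes a misnamed column fail loudly with a `KeyError` instead of silently producing a column of missing values.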

b) After the wrangling you have done in part (a), your task now is to devise a measure for the shape of a beak. That is, invent some scalar measure that combines both the length and depth of the beak and justify your choice. Compare this measure between species and through time. (This is very open-ended. It is up to you to define the measure, make relevant plots, and compute confidence intervals.)
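As one possible starting point, the sketch below uses the ratio of beak length to beak depth as a shape measure and computes a bootstrap confidence interval for its mean in each species and year. The choice of ratio is just one candidate, the helper `bootstrap_ci_mean` is a hypothetical name, and the data frame is filled with synthetic random numbers, not the Grants' measurements; only the column names follow part (a).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3252)


def bootstrap_ci_mean(values, n_reps=2000, ci=95.0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    reps = np.empty(n_reps)
    for i in range(n_reps):
        # Resample with replacement, keeping the original sample size
        sample = rng.choice(values, size=len(values), replace=True)
        reps[i] = sample.mean()
    tail = (100.0 - ci) / 2.0
    return np.percentile(reps, [tail, 100.0 - tail])


# Synthetic stand-in for the tidy data frame from part (a); values are made up
n = 40
df = pd.DataFrame(
    {
        "species": ["fortis"] * n + ["scandens"] * n,
        "year": [1975] * (2 * n),
        "beak length (mm)": np.concatenate(
            [rng.normal(10.5, 0.7, n), rng.normal(14.0, 0.7, n)]
        ),
        "beak depth (mm)": np.concatenate(
            [rng.normal(9.0, 0.6, n), rng.normal(9.0, 0.6, n)]
        ),
    }
)

# Candidate shape measure: dimensionless length-to-depth ratio
df["length/depth"] = df["beak length (mm)"] / df["beak depth (mm)"]

# Mean and bootstrap confidence interval for each species and year
results = {}
for (species, year), group in df.groupby(["species", "year"]):
    low, high = bootstrap_ci_mean(group["length/depth"].to_numpy())
    results[(species, int(year))] = (group["length/depth"].mean(), low, high)
```

A ratio has the advantage of being independent of overall beak size, but other scalar measures are equally defensible as long as the choice is justified.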