Consult the general homework guidelines.
Due before class Monday 2014-09-29.
The goal is to manipulate and explore a dataset with the dplyr package, complemented by visualizations made with ggplot2. In particular, we keep our focus on working with a data.frame (but note: dplyr can work with other data storage formats seamlessly).
Remember the sampler concept. Your homework should serve as your own personal cheatsheet in the future for ways to manipulate a dataset and produce companion figures. Give yourself the cheatsheet you deserve!
Work with the Gapminder excerpt. If you really, really want to, you can explore a different dataset but get permission from Jenny.
Pick at least three of the tasks below and attack each with a table and figure.
dplyr should be your main data manipulation tool
ggplot2 should be your main visualization tool
Make observations about what your tables/figures show and about the process.
If you want to do something comparable but different, e.g. swap one quantitative variable for another, be my guest! If you are feeling inspired and curious, then we’re doing this right. Go for it.
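As a rough illustration of the table-plus-figure pattern, here is a minimal sketch. It assumes a toy data.frame named toy_gap (a hypothetical stand-in whose column names mirror the Gapminder excerpt, with invented values) and the illustrative task "mean life expectancy by continent". The real tasks and data will differ.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the Gapminder excerpt. The column names match the
# real data, but these values are invented placeholders, not real figures.
toy_gap <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe", "Africa", "Africa"),
  country   = c("A1", "A2", "E1", "E2", "F1", "F2"),
  lifeExp   = c(60, 70, 75, 79, 50, 54)
)

# Table: one row per continent, holding the mean life expectancy.
life_by_continent <- toy_gap %>%
  group_by(continent) %>%
  summarise(mean_lifeExp = mean(lifeExp))

print(life_by_continent)

# Companion figure: raw values as dots, the table's means as crosses, so the
# table and the figure can be checked against each other.
p <- ggplot(toy_gap, aes(x = continent, y = lifeExp)) +
  geom_point() +
  geom_point(data = life_by_continent, aes(y = mean_lifeExp),
             shape = 4, size = 4)
print(p)
```

In an R Markdown report, life_by_continent could then be handed to knitr::kable() for a nicer-looking table.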
Relax about the following things:
knitr::kable() function. Assuming my_df is a data.frame, here’s an R chunk that should print it as a decent-looking table:
```{r results = 'asis'}
knitr::kable(my_df)
```
For each table, make sure to include a relevant figure.
Your figure does not have to depict every last number from the data aggregation result. Use your judgement. It just needs to complement the table, add context, and allow for some sanity checking both ways.
Notice which figures are easy/hard to make, which data formats make better inputs for plotting functions vs. for human-friendly tables.
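One way to see the difference between plot-friendly and human-friendly data formats is sketched below. It assumes a hypothetical summary table long_df with invented values, and it uses tidyr::pivot_wider() for the reshaping step, which is an assumption on my part rather than something this assignment requires.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical aggregation result with invented placeholder values.
# "Long" shape: one row per continent-year combination.
long_df <- data.frame(
  continent    = rep(c("Africa", "Asia", "Europe"), each = 2),
  year         = rep(c(1952, 2007), times = 3),
  mean_lifeExp = c(39, 55, 46, 71, 64, 78)
)

# The long shape feeds ggplot2 directly: each aesthetic maps to one column.
p <- ggplot(long_df, aes(x = year, y = mean_lifeExp, colour = continent)) +
  geom_line()

# The "wide" shape (one row per continent, one column per year) is usually
# easier for a human to scan in a printed table.
wide_df <- long_df %>%
  pivot_wider(names_from = year, values_from = mean_lifeExp)

print(wide_df)
```

The same numbers appear in both objects; only the layout changes, which is why a table and its companion figure often start from differently shaped inputs.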
I’ve always wanted to get a two-column layout, but only in certain parts of a document. The idea would be to put a figure and relevant table right next to each other. Anyone who figures that out in our report-making workflow would be a hero. But this is low priority and only for a keener.
You’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc. Give credit to your sources, whether it’s a blog post, a fellow student, an online tutorial, etc.
Follow instructions on How to submit homework
Start using our general rubric, which lists the specifics to evaluate. The form will require you to do so!
Check minus: Didn’t tackle at least 3 tasks. Or didn’t make companion graphs. Or didn’t interpret anything but left it all to the “reader”. Or has more than one technical problem that is relatively easy to fix. Or it’s hard to find the report in this crazy repo.
Check: Hits all the elements. No obvious mistakes. Pleasant to read. No heroic detective work required. Solid.
Check plus: Exceeded the requirements in number of tasks. Or developed novel tasks that were indeed interesting and “worked”. Impressive use of dplyr and/or ggplot2. Impeccable organization of repo and report. You learned something new from reviewing their work and you’re eager to incorporate it into your work.