One of my favorite aspects of GitHub is the ability to inspect a repository’s files in a browser. Certain practices make browsing more rewarding and can postpone the day when you must create a proper website for a project. Perhaps indefinitely.
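Keep files in the plainest, web-friendliest form that is compatible with your main goals. Plain text is the very best. GitHub offers special handling for certain types of files, including Markdown files such as `README.md`, source code such as `.R` files, and delimited data files of the kind you might bring into R via `read.table()`.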
Let’s acknowledge the discomfort some people feel about putting derived products under version control. Specifically, if you’ve got an R Markdown document `foo.Rmd`, it can be `knit()` to produce the intermediate product `foo.md`, which can be converted to the ultimate output `foo.html`. Which of those files are you “allowed” to put under version control? Source-is-real hardliners will say only `foo.Rmd`, but pragmatists know this can be a serious bummer in real life. Just because I can rebuild everything from scratch, it doesn’t mean I want to.
The taboo of keeping derived products under version control originates from compilation of binary executables from source. Software built on a Mac would not work on Windows and so it made sense to keep these binaries out of the holy source code repository. Also, you could assume the people with access to the repository have the full development stack and relish opportunities to use it. None of these arguments really apply to the `foo.Rmd --> foo.md --> foo.html` workflow. We don’t have to blindly follow traditions from the compilation domain!
In fact, looking at the diffs for `foo.md` or `foo-figure-01.png` can be extremely informative in data analytic projects! Sometimes you see changes you did not expect, which can tip you off to changes in the underlying data and/or the packages you depend on.
This is a note about cool things GitHub can do with various file types, if they happen to end up in your repo. I won’t ask you how they got there.
You will quickly discover that GitHub renders Markdown files very nicely. By clicking on `foo.md`, you’ll get a decent preview of `foo.html`. Yay!
Aggressively exploit this handy feature. Make Markdown your default format for narrative text files and use them liberally to embed notes to yourself and others in a repository hosted on GitHub. It’s an easy way to get pseudo-webpages inside a project “for free”. You may never even compile these files to HTML explicitly; in many cases, the HTML preview offered by GitHub is all you ever need.
What does this mean for R Markdown files? Keep intermediate Markdown. Commit both `foo.Rmd` and `foo.md`, even if you choose to `.gitignore` the final `foo.html`. As of September 2014, GitHub renders R Markdown files nicely, like Markdown, and with proper syntax highlighting, which is great. But, of course, the code blocks just sit there un-executed, so my advice about keeping intermediate Markdown still holds. You want YAML frontmatter that looks something like this for `.Rmd`:
---
title: "Something fascinating"
author: "Jenny Bryan"
date: "`r format(Sys.Date())`"
output:
  html_document:
    keep_md: TRUE
---
or like this for `.R`:
#' ---
#' title: "Something fascinating"
#' author: "Jenny Bryan"
#' date: "`r format(Sys.Date())`"
#' output:
#'   html_document:
#'     keep_md: TRUE
#' ---
In RStudio, when editing `.Rmd`, click on the gear next to “Knit HTML” for YAML authoring help.
For a quick, stand-alone document that doesn’t fit neatly into a repository or project (yet), make it a Gist. Example: Hadley Wickham’s advice on what you need to do to become a data scientist. Gists can contain multiple files, so you can still provide the R script or R Markdown source and the resulting Markdown, as I’ve done in this write-up of Twitter-sourced tips for cross-tabulation.
README.md
You probably already know that GitHub renders `README.md` at the top level of your repo as the de facto landing page. This is analogous to what happens when you point a web browser at a directory instead of a specific web page: if there is a file named `index.html`, that’s what the server will show you by default. On GitHub, files named `README.md` play exactly this role for directories in your repo.
Implication: for any logical group of files or mini project-within-your-project, create a sub-directory in your repository. And then create a `README.md` file to annotate these files, collect relevant links, etc. Now when you navigate to the sub-directory on GitHub the nicely rendered `README.md` will simply appear.
Some repositories consist solely of `README.md`. Examples: Jeff Leek’s write-ups on How to share data with a statistician or Developing R packages.
If you’ve got a directory full of web-friendly figures, such as PNGs, you can use code like this to generate a `README.md` for a quick DIY gallery, as Karl Broman has done with his FruitSnacks. I have also used this device to share Keynote slides on GitHub (mea culpa!). Export them as PNG images and throw ’em into a README gallery: slides on file organization and some on file naming.
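Here is a minimal sketch, in base R, of what such a gallery generator might look like. The function name `make_gallery_readme()` and its arguments are my own invention for illustration. It assumes the PNGs sit directly in the target directory and have file names without spaces or parentheses, which would otherwise break the Markdown image syntax.

```r
# Hypothetical helper: write a README.md that displays every PNG in `dir`.
make_gallery_readme <- function(dir = ".", title = "Gallery") {
  # PNG files directly inside `dir` (not recursive), in alphabetical order.
  pngs <- list.files(dir, pattern = "\\.png$", ignore.case = TRUE)
  # One Markdown image per file; the relative path is resolved by GitHub
  # against the directory that holds the README.
  image_lines <- sprintf("![%s](%s)", pngs, pngs)
  lines <- c(paste("#", title), "", image_lines)
  writeLines(lines, file.path(dir, "README.md"))
  invisible(lines)
}
```

Calling `make_gallery_readme("figs", title = "Slides")` on a (hypothetical) `figs/` directory would write `figs/README.md`. Commit it alongside the images and GitHub will show the gallery when you browse to that directory.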
If you have an HTML file in a GitHub repository, simply visiting the file shows the raw HTML. Boo. But if you preface the link with `http://htmlpreview.github.com/?`, you will see properly rendered HTML. Illustration:
This sort of enhanced link might be one of the useful things to put in a `README.md` or other Markdown file in the repo.
Update: you may also want to check out rawgit.com or this Chrome extension.
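If you want to build such preview links for a `README.md` without typing the prefix each time, something like the following sketch would do. Only the prefix comes from the text above. The helper name `htmlpreview_link()` and the placeholder URL are hypothetical.

```r
# Prefix described in the text above.
htmlpreview_prefix <- "http://htmlpreview.github.com/?"

# Hypothetical helper: turn the GitHub URL of an HTML file into a
# Markdown link that goes through the preview service.
htmlpreview_link <- function(file_url, text = "rendered HTML") {
  sprintf("[%s](%s%s)", text, htmlpreview_prefix, file_url)
}

# Placeholder URL; substitute the address of an HTML file in your own repo.
htmlpreview_link("https://github.com/<user>/<repo>/blob/master/foo.html")
```

The result is a single Markdown string that you can paste into any Markdown file in the repo.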
You will notice that GitHub does automatic syntax highlighting for source code. For example, notice the coloring of this R script. The file’s extension is the primary determinant for if/how syntax highlighting will be applied. You can see information on recognized languages, the default extensions and more at github/linguist. You should be doing it anyway, but let this be another reason to follow convention in your use of file extensions.
Note you can click on “Raw” in this context as well, to get just the plain text and nothing but the plain text.
GitHub will nicely render tabular data in the form of `.csv` (comma-separated) and `.tsv` (tab-separated) files. You can read more in the blog post announcing this feature in August 2013 or in this GitHub help page.
Advice: take advantage of this! If something in your repo can be naturally stored as delimited data, by all means, do so. Make the comma or tab your default delimiter and use the file suffixes GitHub is expecting. I have noticed that GitHub is more easily confused than, say, R about things like quoting, so always inspect the GitHub-rendered `.csv` or `.tsv` file in the browser. You may need to do light tidying to get the automagic rendering to work properly. Think of it as yet another way to learn about imperfections in your data.
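One way to sidestep quoting trouble is to write tab-delimited files with no quotes at all. The sketch below uses made-up toy data and a temporary file. In a real project you would write into your repo instead. Turning quoting off is only safe when no field contains a tab or a newline, so treat that as an assumption to check against your own data.

```r
# Made-up data, purely for illustration.
toy <- data.frame(
  name = c("alpha", "beta", "gamma"),
  count = c(1L, 20L, 300L),
  stringsAsFactors = FALSE
)

# Use the suffix GitHub expects for tab-separated data.
tsv_path <- file.path(tempdir(), "toy.tsv")

# Tab delimiter, no quotes around strings, no row names column.
write.table(toy, tsv_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm that R still sees the same table.
roundtrip <- read.delim(tsv_path, stringsAsFactors = FALSE)
```

The round trip is a cheap local check, but it does not replace looking at the rendered file on GitHub.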
Here’s an example of a tab-delimited file on GitHub: lotr_clean.tsv, originally found here (nope, IBM shut down Many Eyes in July 2015).
Note you can click on “Raw” in this context as well, to get just the plain text and nothing but the plain text.
PNG is the “no brainer” format in which to store figures for the web. But many of us like a vector-based format, such as PDF, for general purpose figures. Bottom line: PNGs will drive you less crazy than PDFs on GitHub. To reduce the aggravation around viewing figures in the browser, make sure to have a PNG version in the repo.
Examples:
The browsability of GitHub makes your work accessible to people who care about your content but who don’t (yet) use Git themselves. What if such a person wants all the files? Yes, there is a clickable “Download ZIP” button offered by GitHub. But what if you want a link to include in an email or other document? If you add `/archive/master.zip` to the end of the URL for your repo, you construct a link that will download a ZIP archive of your repository. Click here to try this out on a very small repo:
https://github.com/jennybc/lotr/archive/master.zip
Go look in your downloads folder!
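The same construction as a tiny R function, in case you want to generate such links for several repos. The name `zip_link()` is hypothetical, and the `branch` argument rests on my assumption that branches other than `master` follow the same URL pattern.

```r
# Hypothetical helper: build the ZIP download link for a repo URL.
zip_link <- function(repo_url, branch = "master") {
  # Drop any trailing slash so we don't end up with "//archive".
  repo_url <- sub("/+$", "", repo_url)
  paste0(repo_url, "/archive/", branch, ".zip")
}

zip_link("https://github.com/jennybc/lotr")
#> [1] "https://github.com/jennybc/lotr/archive/master.zip"
```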
To link to another page in your repo, just use a relative link: `[admin](courseAdmin/)` will link to the `courseAdmin/` directory inside the current directory. `[admin](/courseAdmin/)` will link to the top-level `courseAdmin/` directory from anywhere in the repo.
The same idea also works for images. `![](image.png)` will include `image.png` located in the current directory.
They love that!
You can create a link that takes people directly to an editing interface in the browser. Behind the scenes, assuming the clicker is signed into GitHub but is not you, this will create a fork in their account and send you a pull request. When I click the link below, I am able to actually commit directly to `master` for this repo.
CLICK HERE to suggest an edit to this page!
Here’s what that link looks like in the Markdown source:
[CLICK HERE to suggest an edit to this page!](https://github.com/STAT545-UBC/STAT545-UBC.github.io/edit/master/bit006_github-browsability-wins.md)
and here it is with placeholders:
[INVITATION TO EDIT](<URL to your repo>/edit/master/<name of your md file>)
AFAIK, to do that in a slick automatic way across an entire repo/site, you need to be using Jekyll or some other automated system. But you could easily handcode such links on a small scale.
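For the small-scale, hand-rolled route, a sketch like this could generate the Markdown for a handful of files. The helper name `edit_link()` and its arguments are hypothetical. The URL pattern is the one shown in the placeholder version above, with `master` as the assumed default branch.

```r
# Hypothetical helper: Markdown "suggest an edit" link for a file in a repo.
edit_link <- function(repo_url, file, text = "INVITATION TO EDIT",
                      branch = "master") {
  # Drop any trailing slash so we don't end up with "//edit".
  repo_url <- sub("/+$", "", repo_url)
  sprintf("[%s](%s/edit/%s/%s)", text, repo_url, branch, file)
}

# Reproduces the link shown above for this page.
edit_link(
  "https://github.com/STAT545-UBC/STAT545-UBC.github.io",
  "bit006_github-browsability-wins.md",
  text = "CLICK HERE to suggest an edit to this page!"
)
```

Because `sprintf()` is vectorized, passing a character vector of file names returns one link per file.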