In which I do some cheerleading for the R Project for Statistical Computing.
1. You’re almost certain to find it worth the effort
Often, in the endless “should academics learn to code” debate, it’s not clear to newcomers what you can actually use code for once you’ve invested a lot of time in learning it. Copy-and-paste online tutorials don’t tend to make things much clearer. How do you get from “hello world!” to practical applications for your research?
But R is all about analysing and presenting data, and there aren’t many historians who don’t work with data of some kind sooner or later. If you already use spreadsheets, SPSS or databases of some kind, and if you ever present tables or graphs in papers, you’ll almost certainly get something out of learning even a small amount of R (and there will probably be R packages that make it easy to use alongside your current tools). R is flexible: it can be used with conventional tabular statistical data or with linguistic corpora and other textual datasets. You can use it for heavyweight number crunching, text mining, exploratory visualisations at the start of a project and spectacular ones in presentations and publications: all sorts of humanities data uses. (I reallyreallyreally want to find an application for the beautiful viz in this blog post.)
2. You don’t have to do it on the command line
I know some people love command line tools. But a good graphical user interface can make all the difference for newcomers and for those of us who don’t actually look forward to firing up Terminal. After installing R itself, RStudio is the next thing to download (it’s free). It’s a proper workhorse, including a code editor, console, package manager, visualisation tools, previewer and more. If you already use Markdown (and maybe even if you don’t yet), you’ll love RMarkdown and R Notebooks.
(On a different GUI-related topic, see also: GitHub Desktop. You’re welcome.)
3. The Tidyverse
One of my periodic rants is that historians need to understand data and data modelling (even if they don’t think they work with “data”) before they worry about programming code. With R you can learn about both at the same time. The Tidyverse is described as “an opinionated collection of R packages designed for data science” that “share an underlying philosophy”: they are tools for creating and working with Tidy Data, in which each variable is a column, each observation is a row and each value is a cell.
4. Great online learning resources by and for historians
The Programming Historian has several R tutorials from the very basic to more advanced techniques. Currently I think there are four:
R Basics with Tabular Data
Data Wrangling and Management in R
Basic Text Processing in R
Correspondence Analysis for Historical Research with R
Looking beyond these short tutorials, Lincoln Mullen has developed a free online textbook, Computational Historical Methods, which covers “how to identify sources and frame historical questions then answer them through computational methods” using R.
See also Scott Weingart’s list of resources for teaching yourself to code for DH.
5. Sharing, openness and reproducible research
I’ve written my last two conference papers entirely in R. This means that everything is in plain text and I can easily post all the data, code and visualisations I used online. I put them on GitHub, but there are other options, like RPubs, from the people who make RStudio; it’s really easy to publish straight from RStudio to RPubs.