I gave a talk at Monday night’s Winston-Salem R Users Group that covered a lot of the base R package, and also showed a brief demo of how to use packrat for the cases where packages are necessary and code has to be shared across multiple team members and/or environments.
The idea came from a comment at a previous meeting about the dangers of trying to maintain common versions of code within and across teams, not just to avoid surprising errors, but also to ensure reproducibility.
So my recommended approach was:
- Learn as much as you can about the details of the base package. It’s a huge package, and a lot of common needs can be handled simply and effectively.
- When you need a package (and there are certainly useful and necessary packages), use a system like packrat to keep dependencies systematically managed.
Most of the content wouldn’t be a surprise to daily R users, but I did throw in some things that either 1) surprised me when I first learned them, or 2) increased my productivity so much that I think everyone should know them.
For example:
    > all(NULL)
    [1] TRUE
    > any(NULL)
    [1] FALSE
How? If all of anything is TRUE, surely that means any of that same thing is also TRUE…
It turns out making all(NULL) resolve to TRUE is a requirement for making another seemingly intuitive comparison TRUE. From the help file:
That all(logical(0)) is true is a useful convention: it ensures that all(all(x), all(y)) == all(x, y) even if x has length zero.
So it’s an important reminder that all can give surprising results if you don’t also check the length of the object.
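To make the convention concrete, here is a small sketch in base R. The helper all_nonempty is a hypothetical name introduced only for illustration; it is one possible way to add the length check, not something from the talk.

```r
# An empty logical vector behaves like the neutral element of "and" / "or"
empty <- logical(0)
all(empty)   # TRUE: no element is FALSE
any(empty)   # FALSE: no element is TRUE

# The identity quoted from the help file, checked with a zero-length x
x <- logical(0)
y <- c(TRUE, FALSE)
identical(all(all(x), all(y)), all(x, y))   # TRUE

# Hypothetical helper: TRUE only when there is something to test
all_nonempty <- function(v) length(v) > 0 && all(v)
all_nonempty(NULL)            # FALSE
all_nonempty(c(TRUE, TRUE))   # TRUE
```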
We covered some of the functions borrowed from other languages, like Negate, Reduce, Map, Filter, etc. (along with the related-in-my-mind do.call).
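For readers who haven’t met these, a quick sketch of each follows. The function is_even and the sample inputs are made up for illustration; everything called is in base R.

```r
# Hypothetical predicate used for the examples
is_even <- function(n) n %% 2 == 0

# Negate() returns a function giving the logical opposite of is_even
is_odd <- Negate(is_even)

# Filter() keeps the elements for which the predicate is TRUE
evens <- Filter(is_even, 1:10)
odds  <- Filter(is_odd, 1:10)

# Reduce() folds a two-argument function over a vector: ((((1+2)+3)+4)+5)
total <- Reduce(`+`, 1:5)

# Map() applies a function element-wise over several inputs and returns a list
sums <- Map(function(a, b) a + b, 1:3, 4:6)

# do.call() calls a function with its arguments supplied as a list
pasted <- do.call(paste, list("a", "b", sep = "-"))
```

do.call is handy in the same situations as Reduce: when the arguments live in a list whose length you don’t know in advance.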
The useful example I gave for Reduce was a simple merge of an arbitrary-length list of data.frames. The default merge function in R only combines two data.frames per call, so if you have x data.frames, you need (x - 1) merge calls.
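A minimal sketch of that pattern, assuming every data.frame shares a key column. The column name id and the sample data are invented for illustration and are not the data from the talk.

```r
# Three toy data.frames that share the key column "id"
df_list <- list(
  data.frame(id = 1:3, score = c(10, 20, 30)),
  data.frame(id = 1:3, label = c("x", "y", "z")),
  data.frame(id = 1:3, flag  = c(TRUE, FALSE, TRUE))
)

# Reduce() merges the first two, then merges that result with the third,
# and so on, however long the list is
merged <- Reduce(function(left, right) merge(left, right, by = "id"), df_list)

names(merged)
nrow(merged)
```

The same call works unchanged for a list of two data.frames or twenty, which is the point of reaching for Reduce here.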
We also touched on the idea that if, [, and [[ are really just functions, as are operators, and the funny things you can do by overloading + with -.
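A short sketch of what that looks like in practice. The sample objects are made up, and the reassignment of + is wrapped in local() so the mischief stays contained; this is an illustration, not a recommendation.

```r
v   <- c(10, 20, 30)
lst <- list(a = 1, b = "two")

# Backticks let you call the syntax as ordinary functions
`if`(length(v) > 2, "long", "short")   # same as: if (length(v) > 2) "long" else "short"
`[`(v, 2)                              # same as: v[2]
`[[`(lst, "b")                         # same as: lst[["b"]]
`+`(1, 2)                              # same as: 1 + 2

# Because `[` is a function, it can be passed to sapply():
# take the second element of each vector
second <- sapply(list(1:3, 4:6), `[`, 2)

# Rebinding + to - inside a local environment; the global + is untouched
swapped <- local({
  `+` <- `-`
  2 + 2
})
swapped   # 0
2 + 2     # still 4 out here
```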
browser changed my life. The ability to step into a function environment and run code line by line is immeasurably valuable when debugging, so much that I now get really frustrated when debugging in other languages.
This is hard to demonstrate in a blog post – just imagine being able to control the flow of the program before the error occurs, and to test out different variations live in the same environment, instead of being stuck with a vague error report and a crash.
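For anyone who wants to try it, here is a minimal sketch. The function scale_to_unit is a made-up example, and the interactive() guard is my own addition so the script still runs unattended; in a normal session you would just write browser().

```r
# Hypothetical function with a pause point in the middle
scale_to_unit <- function(x) {
  rng <- range(x)
  # In an interactive session, execution stops here with a Browse[1]> prompt.
  # At the prompt you can inspect x and rng, run any expression,
  # type n to run the next line, c to continue, or Q to quit.
  if (interactive()) browser()
  (x - rng[1]) / (rng[2] - rng[1])
}

scale_to_unit(c(2, 4, 6))
```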
Lastly, we covered packrat, which is an extremely useful way to ensure you can replicate identical environments for multiple people collaborating on a project. I won’t bore you with the details here: start with the example walkthrough on the project page.
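As a rough orientation only, the core calls look something like the sketch below. The wrapper record_packages is a hypothetical name of mine, and the project walkthrough remains the authoritative description of the workflow.

```r
# One-time setup, run at the console from the project directory
# (it may restart the R session, so it is left as a comment here):
#   packrat::init(".")

# Hypothetical wrapper for the step you repeat after installing packages.
# Nothing runs until you call it yourself.
record_packages <- function(project_dir = ".") {
  if (!requireNamespace("packrat", quietly = TRUE)) {
    stop("packrat is not installed; try install.packages('packrat')")
  }
  packrat::snapshot(project_dir)  # record the current package versions
  packrat::status(project_dir)    # report differences between library and lockfile
}

# A collaborator who receives the project would typically run:
#   packrat::restore()
```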
I had one piece of trouble when I was playing around: after installing packrat for a project, installing a few new packages and testing the snapshot function, it was hard to get things to work in other projects where I didn’t want to use packrat (yet).
I kept getting an error because it had altered my ~/.Rprofile such that R was always looking for a file that didn’t exist. Annoying, but I solved it by removing that line from the profile.
I enjoyed giving the talk. Next month we’ll have another group member cover image processing and analysis using the EBImage package, a topic that will be mostly new to me.