Base R When Possible, Packrat When Not

I gave a talk at Monday night’s Winston-Salem R Users Group that covered a lot of the base package and included a brief demo of packrat, for the cases where packages are necessary and code has to be shared across multiple team members and/or environments.

The idea came from a comment at a previous meeting about the dangers of trying to maintain common versions of code within and across teams, not just to avoid surprising errors, but also to ensure reproducibility.

So my recommended approach was:

  1. Learn as much as you can about the details of the base package. It’s a huge package, and a lot of common needs can be handled simply and effectively.
  2. When you need a package (and there are certainly useful and necessary packages), use a system like packrat to keep dependencies systematically managed.

Most of the content wouldn’t be a surprise to daily R users, but I did throw in some things that either 1) surprised me when I first learned them, or 2) increased my productivity so much that I think everyone should know them.

For example:

> all(NULL)
[1] TRUE
> any(NULL)
[1] FALSE

How? If all of anything is TRUE, surely that means any of that same thing is also TRUE?

It turns out making all(NULL) resolve to TRUE is a requirement for making another seemingly intuitive comparison TRUE. From the help file:

That all(logical(0)) is true is a useful convention: it ensures that

all(all(x), all(y)) == all(x, y)

even if x has length zero.

So it’s an important reminder that all can give funny results if you don’t also check the length of the object.
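Here is a minimal sketch of that length check. The helper name all_nonempty is my own invention for illustration, not something from base R:

```r
# A comparison on NULL yields logical(0), so all() is vacuously TRUE.
empty <- NULL
vacuous <- all(empty > 0)

# Hypothetical helper: require at least one element before trusting all().
all_nonempty <- function(x) length(x) > 0 && all(x)
guarded <- all_nonempty(empty > 0)

# The identity from the help file, with a zero-length x.
x <- logical(0)
y <- c(TRUE, FALSE)
identity_holds <- all(all(x), all(y)) == all(x, y)
```

With this, vacuous is TRUE while guarded is FALSE, and the identity still holds.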

We covered some of the functions borrowed from other languages, like Negate, Reduce, Map, Filter, etc. (along with the related-in-my-mind do.call).
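For a quick flavor of these, here is a small sketch using only base R. The data and the is_even helper are made up for illustration:

```r
# Illustrative predicate (not from the talk).
is_even <- function(n) n %% 2 == 0

# Negate flips a predicate function.
is_odd <- Negate(is_even)
odds <- Filter(is_odd, 1:10)

# Filter keeps the elements for which the predicate is TRUE.
evens <- Filter(is_even, 1:10)

# Map applies a function across several vectors in parallel and returns a list.
products <- Map(function(a, b) a * b, 1:3, 4:6)

# do.call calls a function with its arguments supplied as a list.
total <- do.call(sum, list(1, 2, 3))
```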

The useful example I gave for Reduce was a simple merge of an arbitrary-length list of data.frames. The default merge function in R only merges two data.frames at a time, so if you have n data.frames, you need (n - 1) merge calls.
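A sketch of that Reduce pattern follows. The three data.frames and the shared id column are invented for illustration, and the slides may have used different data:

```r
# Three toy data.frames sharing an "id" key (made-up data).
sales   <- data.frame(id = 1:3, sales  = c(10, 20, 30))
costs   <- data.frame(id = 1:3, cost   = c(4, 5, 6))
regions <- data.frame(id = 1:3, region = c("N", "S", "E"))

frames <- list(sales, costs, regions)

# Reduce folds the list left to right: merge(merge(sales, costs), regions).
merged <- Reduce(function(x, y) merge(x, y, by = "id"), frames)
```

The same call works unchanged whether the list holds three data.frames or thirty.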

We also touched on the idea that if, [, and [[ are really just functions, as are operators, and on the funny things you can do by overloading + to behave like -.
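A short sketch of what that looks like in practice. The override is wrapped in local() so it does not leak into the rest of the session; the variable names are mine:

```r
x <- c(10, 20, 30)

# Operators and control flow can be called like any other function
# by quoting their names with backticks.
second <- `[`(x, 2)
three  <- `+`(1, 2)
answer <- `if`(TRUE, "yes", "no")

# Because `[[` is a function, it can be passed to sapply to pluck a field.
nested <- list(list(a = 1, b = 2), list(a = 3, b = 4))
a_values <- sapply(nested, `[[`, "a")

# Redefine + as subtraction, but only inside this local scope.
surprise <- local({
  `+` <- function(e1, e2) e1 - e2
  5 + 3
})
```

Here surprise ends up as 2, while + outside the local block still adds.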

browser changed my life. The ability to step into a function environment and run code line by line is immeasurably valuable when debugging, so much so that I now get really frustrated when debugging in other languages.

This is hard to demonstrate in a blog post – just imagine being able to control the flow of the program before the error occurs, and to test out different variations live in the same environment, instead of being stuck with a vague error report and a crash.
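For anyone who wants to try it, here is a minimal sketch. The function safe_ratio and its pause argument are hypothetical; the only real piece is the call to browser():

```r
# Hypothetical function with an optional breakpoint.
# pause defaults to TRUE only in an interactive session.
safe_ratio <- function(num, den, pause = interactive()) {
  if (pause) browser()  # execution stops here and opens the Browse prompt
  result <- num / den
  result
}

# Run without pausing, e.g. from a script.
half <- safe_ratio(1, 2, pause = FALSE)
```

At the Browse prompt you can inspect num and den, type n to run the next line, c to continue, or Q to quit.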

Lastly, we covered packrat, which is an extremely useful way to ensure you can replicate identical environments for multiple people collaborating on a project. I won’t bore you with the details here: start with the example walkthrough on the project page.
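As a rough orientation only, the basic cycle looks something like the sketch below. The wrapper functions and the project_dir argument are my own naming; the packrat calls inside them (init, snapshot, status, restore) are the package's exported functions, but check the walkthrough for current arguments before relying on this:

```r
# Hypothetical wrapper: set up and record a project's private library.
# Nothing runs until the function is called with a real project path.
packrat_setup <- function(project_dir) {
  packrat::init(project = project_dir, enter = FALSE)  # create the private library
  packrat::snapshot(project = project_dir)             # record package versions
  packrat::status(project = project_dir)               # compare library and lockfile
}

# Hypothetical wrapper: what a collaborator runs after getting the project.
packrat_sync <- function(project_dir) {
  packrat::restore(project = project_dir)  # install the recorded versions
}
```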

I had one piece of trouble when I was playing around: after installing packrat for a project, installing a few new packages and testing the snapshot function, it was hard to get things to work in other projects where I didn’t want to use packrat (yet).

I kept getting an error because it had altered my ~/.Rprofile such that R was always looking for a file that didn’t exist – annoying, but I solved it by removing that line from the profile.

I enjoyed giving the talk – next month we’ll have another group member cover image processing and analysis using the EBImage package, a topic that will be mostly new to me.
