I gave a talk at Monday night’s Winston-Salem R Users Group that covered a lot of the base R package. I also gave a brief demo of `packrat`, for when packages are necessary and code has to be shared across multiple team members and/or environments.
The idea came from a comment at a previous meeting about how hard it is to keep common versions of code within and across teams, which matters not just for avoiding surprising errors but also for reproducibility.
So my recommended approach was:
- Learn as much as you can about the details of the base package. It’s a huge package, and a lot of common needs can be handled simply and effectively.
- When you need a package (and there are certainly useful and necessary packages), use a system like `packrat` to keep dependencies systematically managed.
Most of the content wouldn’t be a surprise to daily R users, but I did throw in some things that either 1) surprised me when I first learned them, or 2) increased my productivity so much that I think everyone should know them.
```r
> all(NULL)
[1] TRUE
> any(NULL)
[1] FALSE
```
How? If all of anything is `TRUE`, surely any of that same thing is also `TRUE`?
It turns out that making `all(NULL)` resolve to `TRUE` is required for another seemingly intuitive comparison to evaluate to `TRUE`. From the help file:
> That `all(logical(0))` is true is a useful convention: it ensures that `all(all(x), all(y)) == all(x, y)` even if `x` has length zero.
So it’s an important reminder that `all` can give funny results if you don’t also check the length of the object.
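A quick way to see both points in a session: the identity from the help file holds even when `x` is empty, and a small length guard makes an empty input count as `FALSE` when that is what you actually mean. The helper name `all_nonempty` is hypothetical, just for illustration.

```r
x <- logical(0)
y <- c(TRUE, TRUE)

# If all(logical(0)) were FALSE, the left side would be FALSE here
# while the right side is TRUE; the convention keeps them equal.
all(all(x), all(y)) == all(x, y)   # TRUE

# Hypothetical helper: treat an empty input as "not all TRUE"
all_nonempty <- function(v) length(v) > 0 && all(v)

all(NULL)                    # TRUE
all_nonempty(NULL)           # FALSE
all_nonempty(c(TRUE, TRUE))  # TRUE
```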
We covered some of the functions R borrowed from other languages, like `Reduce` and `Filter`, along with a few others that are related in my mind.
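A minimal sketch of three of these functional-programming helpers on toy data (the inputs are made up for illustration):

```r
nums <- 1:10

# Filter keeps the elements for which the predicate returns TRUE
evens <- Filter(function(n) n %% 2 == 0, nums)   # 2 4 6 8 10

# Map applies a function elementwise over several inputs and returns a list
sums <- Map(`+`, 1:3, 4:6)                       # list of 5, 7, 9

# Reduce folds a two-argument function over a vector, left to right
total <- Reduce(`+`, nums)                       # 55

# accumulate = TRUE keeps every intermediate result (a running sum here)
running <- Reduce(`+`, nums, accumulate = TRUE)
```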
The useful example I gave for `Reduce` was a simple merge of an arbitrary-length list of `data.frame`s. The default `merge` function in R only joins two data frames at a time, so if you have x `data.frame`s, you need x - 1 calls to `merge`; `Reduce` chains those calls for you.
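A sketch of that merge, assuming every data frame shares a key column; the column name `id` and the toy data are hypothetical:

```r
# Three small data frames sharing a hypothetical key column "id"
df_list <- list(
  data.frame(id = 1:3, a = c(10, 20, 30)),
  data.frame(id = 1:3, b = c("x", "y", "z")),
  data.frame(id = 2:3, c = c(TRUE, FALSE))
)

# Reduce evaluates merge(merge(df1, df2), df3): length(df_list) - 1 merges
merged <- Reduce(function(left, right) merge(left, right, by = "id"), df_list)
merged
```

`merge` does an inner join by default, so only ids 2 and 3 survive here; passing `all = TRUE` inside the anonymous function would keep every id instead.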
We also touched on the idea that `[[` and the other subsetting operators are really just functions, as are arithmetic operators like `+`, and on the funny things you can do by overloading them.
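A small sketch of both ideas. The `money` class is hypothetical; it only shows how an S3 method overloads `+`:

```r
x <- list(a = 1, b = 2)

# Subsetting and arithmetic are ordinary function calls under the hood
`[[`(x, "a")         # same as x[["a"]]
`+`(2, 3)            # same as 2 + 3
sapply(1:3, `-`, 1)  # operators can be passed around like any function: 0 1 2

# Overloading "+" with an S3 method for a hypothetical "money" class
money <- function(amount) structure(list(amount = amount), class = "money")
"+.money" <- function(e1, e2) money(e1$amount + e2$amount)
print.money <- function(x, ...) {
  cat(sprintf("$%.2f\n", x$amount))
  invisible(x)
}

total <- money(1.50) + money(2.25)
total  # prints $3.75
```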
`browser` changed my life. The ability to step into a function environment and run code line by line is immeasurably valuable when debugging, so much so that I now get really frustrated when debugging in other languages.
This is hard to demonstrate in a blog post – just imagine being able to control the flow of the program before the error occurs, and to test out different variations live in the same environment, instead of being stuck with a vague error report and a crash.
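Still, a small sketch shows where the call goes. The function is made up; you would call it from an interactive session, e.g. `center_and_log(c(1, 5, 9))`, and get a prompt inside its environment:

```r
# Hypothetical function used only to illustrate browser()
center_and_log <- function(x) {
  centered <- x - mean(x)
  browser()       # interactive sessions pause here: inspect or change
                  # `x` and `centered`, n = next line, c = continue, Q = quit
  log(centered)   # values below the mean give NaN (with a warning)
}

# Ways to get the same prompt without editing the function body:
# debugonce(center_and_log)   # browse on the next call only
# options(error = recover)    # on any error, pick a frame to browse
```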
Lastly, we covered `packrat`, which is an extremely useful way to ensure you can replicate identical environments for multiple people collaborating on a project. I won’t bore you with the details here: start with the example walkthrough on the project page.
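For orientation only, here is a rough sketch of the basic cycle using `packrat`'s `init`, `status`, `snapshot` and `restore` functions. The project path and the wrapper function names are hypothetical, and the calls are wrapped in functions so nothing runs until you call them:

```r
# "~/projects/analysis" is a hypothetical project path

start_project <- function(project = "~/projects/analysis") {
  packrat::init(project)      # set up a private package library and lockfile
}

record_dependencies <- function(project = "~/projects/analysis") {
  # after installing packages inside the project:
  packrat::status(project)    # compare the private library with the lockfile
  packrat::snapshot(project)  # record current package versions in the lockfile
}

reproduce_environment <- function(project = "~/projects/analysis") {
  # on a collaborator's machine, after copying or cloning the project:
  packrat::restore(project)   # install the versions recorded in the lockfile
}
```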
I had one piece of trouble when I was playing around: after installing `packrat` for a project, installing a few new packages and testing the `snapshot` function, it was hard to get things to work in other projects where I didn’t want to use `packrat`. I kept getting an error because it had altered my `~/.Rprofile` so that R was always looking for a file that didn’t exist. Annoying, but I solved it by removing that line from the profile.
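If you hit the same thing, a small base-R helper can show which profile lines mention `packrat` before you edit the file by hand. The function name is hypothetical, and it only searches for the text "packrat"; it doesn't know which specific line caused my error:

```r
# Hypothetical helper: return the lines of an R profile that mention packrat
packrat_lines <- function(profile = "~/.Rprofile") {
  path <- path.expand(profile)
  if (!file.exists(path)) return(character(0))
  grep("packrat", readLines(path, warn = FALSE), value = TRUE, fixed = TRUE)
}

packrat_lines()  # inspect the output, then remove the offending line yourself
```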
I enjoyed giving the talk. Next month we’ll have another group member cover image processing and analysis using the `EBImage` package, a topic that will be mostly new to me.