Google Analytics and CausalImpact

Initial Steps

A digital marketing firm in the United Kingdom contacted us about building a web app to demonstrate differences in Google Analytics metrics before and after their projects.

The idea relied on Google’s CausalImpact library for R (here’s a video presentation by Kay Brodersen, lead author on the paper behind the library).

Since at least some of the application would rely on R, the client was interested in both Shiny and OpenCPU.

We gave them the usual tradeoff: we can build a great prototype in Shiny without too much effort, and later, if the app needs to support more traffic, choose among paying for Shiny Server Pro, building a load-balanced AWS setup on the free version of Shiny Server, or transitioning the project to OpenCPU.

We agreed to build the first version in Shiny with a PostgreSQL database to store model specifications.

Building the Foundation

There was a fair amount of start-up work before we could get to the fun part of estimating impacts:

  1. Simplifying the Google OAuth flow (thanks to Google’s OAuth Playground for providing example HTTP requests/responses).
  2. Diagramming and setting up the database tables with all the information that would be required to save, edit, and resume existing modeling projects.
  3. Lastly, figuring out the request structure for version 4 of the Google Analytics Reporting API (again, thanks to Google for a useful example request generator tool).
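
To give a feel for the request structure mentioned in the last step, here is a minimal sketch in Python. It assumes the API in question is the Analytics Reporting API v4 and its reports:batchGet method. The helper name build_pageviews_request, the placeholder view ID, and the date range are ours for illustration; they are not taken from the project itself, which was built in R.

```python
import json

# Endpoint of the Analytics Reporting API v4 batchGet method.
# The request is sent as an HTTP POST with a JSON body and an
# "Authorization: Bearer <access token>" header from the OAuth flow.
REPORTS_ENDPOINT = "https://analyticsreporting.googleapis.com/v4/reports:batchGet"


def build_pageviews_request(view_id, start_date, end_date):
    """Build a batchGet body asking for daily page views.

    Hypothetical helper: the name and arguments are illustrative only.
    Dates use the YYYY-MM-DD format.
    """
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                # One metric (page views), broken out by one dimension (date),
                # which yields a daily time series.
                "metrics": [{"expression": "ga:pageviews"}],
                "dimensions": [{"name": "ga:date"}],
            }
        ]
    }


# Placeholder view ID and dates, for illustration only.
body = build_pageviews_request("000000", "2016-10-01", "2016-12-25")
payload = json.dumps(body)
```

The daily rows that come back from a request like this are what get turned into the time series the model needs.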

Proving Value

The idea behind this application was to prove to the client’s own clients that their work had made a detectable, cumulative, and important impact.

The CausalImpact library can run with very little input: just a time series object and an event date. It constructs a Bayesian structural time series model on the time series prior to the event, and uses that to forecast data points after the event.

Then a simple comparison of post-event observed data against the forecast provides point-by-point impacts (differences), and the sum of those impacts is the cumulative difference.
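
The arithmetic behind that comparison is easy to sketch. The Python snippet below is not CausalImpact and not our application code: it swaps the Bayesian structural time series model for a plain least-squares trend line, purely so the example stays self-contained. The function names and the page view numbers are made up for illustration.

```python
from statistics import mean


def fit_linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...

    Stand-in for the real forecasting model, used only for illustration.
    Returns (intercept, slope).
    """
    xs = range(len(values))
    x_bar = mean(xs)
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def estimate_impact(series, event_index):
    """Forecast the post-event period from pre-event data, then compare.

    Returns (forecast, pointwise, cumulative), one value per post-event point.
    """
    pre, post = series[:event_index], series[event_index:]
    intercept, slope = fit_linear_trend(pre)
    # Forecast: what the pre-event trend predicts after the event.
    forecast = [intercept + slope * t for t in range(event_index, len(series))]
    # Point-by-point impact: observed minus predicted.
    pointwise = [obs - pred for obs, pred in zip(post, forecast)]
    # Cumulative impact: running sum of the pointwise impacts.
    cumulative, running = [], 0.0
    for diff in pointwise:
        running += diff
        cumulative.append(running)
    return forecast, pointwise, cumulative


# Made-up daily page views; the event happens at index 5.
page_views = [100, 102, 104, 106, 108, 130, 135, 128]
forecast, pointwise, cumulative = estimate_impact(page_views, event_index=5)
print(pointwise)
print(cumulative[-1])
```

The last element of cumulative is the total estimated impact over the post-event period. The real library also reports uncertainty intervals around these numbers, which this sketch leaves out.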

For example, a research question might be “How many page views did we receive in the past month as a result of our website redesign?”

Well, check out the plots for an example from the Thanksgiving to Christmas stretch of 2016.

First we can compare the predicted page views (dashed green line) with the actual page views to see that something really increased page views for a few weeks:

[Figure: freshegg-case-study-predicted, predicted vs. actual page views]

Then we can check the cumulative impacts to see the full scale of the differences between the predicted and observed values:

[Figure: freshegg-case-study-cumulative, cumulative impact on page views]

Something worked, right?

Output like this can be a deal-clincher for clients when using a metric like revenue.

If the client pays $X, and the project returns $2X in cumulative additional revenue, then the project fee is immediately justified.

This use case is client-facing, but the same approach can be just as valuable as an internal tool. Suppose a business wants to track its own social media metrics and determine whether a recent campaign or hashtag actually made a difference in engagement. Or suppose it wants to know whether a recent investment in systems software actually increased production, reduced costs, and improved efficiency measures.

Stop guessing with spreadsheets – use a more robust modeling approach to be sure you’re right.

More to Come

We think there are a lot of similar ideas for applications of CausalImpact that are worth exploring, whether they end up in a Shiny application or not.

If you’re interested in talking to us about how CausalImpact can help drive decision-making at your organization, please send a message and we’ll start a conversation.