Query Cache Hashing in SQL Server

I attended a SQL Server user group about two weeks ago, and the topic was query optimization, presented by Kevin Kline of SentryOne.

I found the subject pretty interesting – there are a lot of subtleties in SQL Server that can really impact how quickly a query returns.

I want to follow up on some of the very subtle cases he demonstrated, but one thing that really caught my eye was his explanation that query plans are cached using a hash of the raw query text.

No problem, right? Actually, the following queries all hash to different keys, meaning that even though they return exactly the same results, each one gets its own entry in the plan cache.

/* All of the following four queries will be cached separately! */

SELECT * FROM table1;
select * from table1;
SELECT *  FROM table1;
select *
  from table1;

So if you want to take advantage of caching, consistent style really matters!

But why is this? Why doesn’t SQL Server normalize case, strip newlines, and collapse runs of whitespace before calculating the hash that becomes the cache key?

Seems easy enough – does anyone know why they don’t?
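As a rough illustration of what I mean – in Python rather than T-SQL, just hashing strings directly (the function names here are mine, not anything from SQL Server’s internals) – a simple normalization pass would make all four variants share one key:

```python
import hashlib
import re

queries = [
    "SELECT * FROM table1;",
    "select * from table1;",
    "SELECT *  FROM table1;",
    "select *\n  from table1;",
]

def raw_key(sql: str) -> str:
    """Hash the query text exactly as written."""
    return hashlib.sha256(sql.encode()).hexdigest()

def normalized_key(sql: str) -> str:
    """Upcase and collapse every run of whitespace before hashing."""
    canonical = re.sub(r"\s+", " ", sql.strip()).upper()
    return hashlib.sha256(canonical.encode()).hexdigest()

print(len({raw_key(q) for q in queries}))         # 4 distinct keys
print(len({normalized_key(q) for q in queries}))  # 1 shared key
```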

Implementing SHA-1 in Python

Earlier this year researchers at Google were able to generate two PDFs with the same SHA-1 digest, and the world became reasonably worried about the security of the hashing algorithm.

So even though I’ll likely never be using SHA-1 in the future (and more importantly, would never use my own implementation in a real-world project), I thought I’d sit down with the spec and see if I could implement it in Python, which I haven’t been using as much as I want to lately.

Thankfully NIST also provides a short example case to check against.
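Before writing any code of my own, Python’s built-in hashlib gives a known-good baseline to test against – NIST’s shortest test vector is the message “abc”:

```python
import hashlib

# NIST's short SHA-1 test case: the three-byte message "abc"
digest = hashlib.sha1(b"abc").hexdigest()
print(digest)  # a9993e364706816aba3e25717850c26c9cd0d89d
```

Any hand-rolled implementation should reproduce that digest exactly.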

So let’s begin!

Continue reading “Implementing SHA-1 in Python”

Copying R Environments

I’ve been working on a codebase that relied on storing a lot of objects in R environments, mainly because of the potential speed improvements with large numbers of objects.

See this article for a pretty good explanation.

After a recent spec change, I needed to start looping around a block of code that was previously using a single environment to store objects. The easiest approach was to create an initial base environment to use at the start of each iteration of the loop, and then create a copy of that environment that would be specific to the loop iteration.

But I got a surprise!

All of the result environments looked like the last iteration.
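The underlying issue is reference semantics: assigning an environment to a new name doesn’t copy it. The same trap exists with Python dicts, which makes for a compact illustration (a hypothetical Python analogue, not the R code from this post):

```python
import copy

base = {"x": 0}

# Assignment shares one object: every iteration mutates the same dict
results = []
for i in range(3):
    env = base
    env["x"] = i
    results.append(env)
print([r["x"] for r in results])  # [2, 2, 2] -- every "copy" looks like the last iteration

# An explicit deep copy gives each iteration its own object
results = []
for i in range(3):
    env = copy.deepcopy(base)
    env["x"] = i
    results.append(env)
print([r["x"] for r in results])  # [0, 1, 2]
```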

First, let’s look at how this works when you start with a base list, make a copy, and modify the copy:

Continue reading “Copying R Environments”

United States Topography

I downloaded a shapefile of North American topographical contours a while ago, and never spent much time with it until recently, when I noticed there are some really fascinating subsets of the United States.

For reference, the file comes from this page, and contains contours at 100-meter resolution. That’s actually not very detailed if you’re looking at a county or a city, but for an area this large with a pretty wide range in elevation, it’s plenty to get the big idea.

Speaking of big idea, here’s the full layer, with the color scale ranging from blue at the low end to red at the highest elevations:

[Figure: overall contour map of the United States]
Might as well be honest – I could look at this map for hours and not get bored.

Okay, big deal – flat in the east and south, mountains in the west, right? Stay with me…

Continue reading “United States Topography”

Presenting Firebase to TriadJS

This past Thursday I had the opportunity to present to the TriadJS meetup group about rapid application development with Firebase.

I don’t do a ton of pure web development, but I like when I get to work on a project where I can explore a new tool, and this presentation allowed me to summarize the pros and cons that I found while working on the social media analysis dashboard that I’ve written about as a case study.

Continue reading “Presenting Firebase to TriadJS”

Shiny Versus What, Exactly?

When the early versions of Shiny were released in 2012, my career changed forever.

I’m not exaggerating. Shiny generalized data analysis – instead of tweaking code and parameters and plots every time a client needed to see a slight variation of existing output, I could build a user interface that would produce the same analysis for ANY inputs.

The researcher could check for themselves, without needing a round trip back to me. We could move faster and more effectively.

Five years later, Shiny has no equal.

Continue reading “Shiny Versus What, Exactly?”