After reading Zed Shaw's uncompromising Programmers Need To Learn Statistics Or I Will Kill Them All, I felt suitably chided and decided to serve my penance by kicking some stuff around with
ab and R. This is probably the kind of thing Shaw was decrying (my statistical skills are terrible), but here's what I did.
Gather some data
First I wanted to gather some data about my server, so I used
ab to benchmark it several times, making 1,000 requests each time with either 5, 10, or 20 concurrent threads. I know this is falling straight into the power-of-ten pitfall. I haven't read the working-out-the-power-of-my-test chapter of the R tutorial (although I have read Joseph Tal's Reading Between The Numbers, so I shouldn't be able to hide for too long).
This command sets
ab going and writes raw data about the requests to a tab-separated file. See
man ab for more info.
ab -n 1000 -c 10 -g journal-n1000-c10.tsv http://aftnn.org/journal/
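The runs at the other concurrency levels follow the same pattern, so they could be scripted in one go (a sketch; my actual runs were invoked by hand, and the filenames simply mirror the one above):

```shell
# Sketch: repeat the 1,000-request benchmark at each concurrency level,
# writing one tab-separated (gnuplot-style) data file per run via -g.
for c in 5 10 20; do
  ab -n 1000 -c "$c" -g "journal-n1000-c$c.tsv" http://aftnn.org/journal/
done
```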
To remove confounding elements, I guess I should do some or all of the following:
- Run the test from many different servers around the world to eliminate network and CPU power variations.
- Run the same test at random times of the day to eliminate variation caused by the daily cycle of traffic to the site.
- Instead of carpet-bombing the server with 1,000 requests, I should introduce a random-length wait between requests; this would (I assume) more accurately model real-world traffic, as well as sneak around any basic cache mechanism in play.
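That last idea could be sketched with curl and a random sleep (hypothetical, not something I actually ran; times.txt and the roughly 0-5 second range are my own choices, and $RANDOM assumes bash or zsh):

```shell
# Hypothetical sketch: one request at a time, with a random pause between
# requests, logging each request's total time to a file.
for i in $(seq 1 1000); do
  curl -s -o /dev/null -w '%{time_total}\n' http://aftnn.org/journal/ >> times.txt
  sleep "$((RANDOM % 5)).$((RANDOM % 10))"   # random wait, 0.0 to 4.9 seconds
done
```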
Plot the data with R
This couldn’t be simpler. The following R commands create a histogram of my request data.
perfdata <- read.delim("~/journal-n1000-c10.tsv")
hist(perfdata$ttime)   # ttime is the total-time-per-request column in ab's -g output
And bingo, R displays this:
An aberration, no doubt, Zed, but there you go.