Ben Godfrey

Archive for January, 2009

Haskell web frameworks reinvent too much

Paul R Brown’s web application perpubplat (personal publishing platform) differs from a lot of the other Haskell web code I’ve seen — it doesn’t try to reinvent every wheel.

HAppS and Turbinado are two examples of Haskell frameworks. Both implement their own web server. The HAppS project is also implementing transactional, ACID-compliant in-memory state, SMTP, IRC, DNS and much more. Turbinado is building a Rails-like stack.

These projects present large untested codebases. Neither project has a list of sites running their software. As a programmer considering using these tools, there’s just too much that could go wrong, too much I will have to understand and fix myself.

perpubplat is not a framework, but it's a concise, easy-to-understand codebase. It also contains a number of familiar-looking patterns that every framework needs: logging and URL mapping/routing, for example. Most importantly, it doesn't try to reinvent the wheel. It uses standard Haskell code where possible and serves pages to a web server via FastCGI.

Only serving responses through FastCGI makes complete sense to me. Apache, Lighty, Nginx and others are all great tools for getting static content out to the world. They are very configurable, offering flexibility for security rules, URL rewriting, etc. Why try to build a product to mimic this existing code when you can piggyback on top of it through FastCGI? Instead, all that programmer time can be used to build everything else required by a Haskell web application.

Wheel-reinvention (also known as Not Invented Here syndrome) makes it harder to use your code and harder to improve upon it, as parts become interdependent. Small pieces, loosely coupled, mean the Haskell framework conversation can evolve more quickly.

Nginx+Django+FastCGI

A lot of people seem to have posts like this, but there were some things that I got stuck on when moving to Nginx from Apache.

location ^~

The ^~ match cannot be used with regular expressions. The pattern that follows it is treated as a literal string prefix.

This will not work as expected:

location ^~ /(foo|bar)/ {
    ...
}

Use the ~ match operator if you want to use an RE. Make sure that your RE matches don't clash with your ^~ string matches. A ^~ match that applies will be preferred in this event, because Nginx then skips the RE checks.
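To make the precedence concrete, here is a rough Python model of how Nginx picks a location. It is a simplified sketch, not Nginx's actual code: it ignores exact (=) and case-insensitive (~*) matches, the Location class and choose_location function are names invented for this illustration, and Python's re module stands in for Nginx's PCRE.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Location:
    modifier: str  # "" (plain prefix), "^~" (prefix, skip REs) or "~" (RE)
    pattern: str


def choose_location(uri: str, locations: List[Location]) -> Optional[Location]:
    """Simplified model of Nginx location selection (hypothetical helper)."""
    # Step 1: find the longest prefix location. The pattern is compared
    # literally, so "/(foo|bar)/" only matches a URI containing parentheses.
    prefixes = [
        loc for loc in locations
        if loc.modifier in ("", "^~") and uri.startswith(loc.pattern)
    ]
    best = max(prefixes, key=lambda loc: len(loc.pattern), default=None)

    # Step 2: a ^~ prefix match stops the search before any RE is tried.
    if best is not None and best.modifier == "^~":
        return best

    # Step 3: otherwise the first RE that matches, in config order, wins.
    for loc in locations:
        if loc.modifier == "~" and re.search(loc.pattern, uri):
            return loc

    # Step 4: fall back to the longest plain prefix, if there was one.
    return best


locations = [
    Location("^~", "/(foo|bar)/"),   # the broken example: a literal string
    Location("^~", "/static/"),
    Location("~", r"^/(foo|bar)/"),  # what was actually intended
    Location("~", r"\.css$"),
]

print(choose_location("/foo/page", locations))
print(choose_location("/static/site.css", locations))
```

In this model /foo/page is only picked up by the ~ location, and /static/site.css goes to the ^~ location even though the \.css$ RE would also match it.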

location + root

If you want to specify a different root within a location block, be mindful that the URI is unchanged.

For example, if you want to publish Django’s admin media, you might write something like this:

location ^~ /admin/media/ {
    root /usr/local/django/django/contrib/admin/media;
}

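When a request comes in, Nginx will concatenate the root and the URI to find the file to serve. With this config, it will try to serve /usr/local/django/django/contrib/admin/media/admin/media/css/base.css. To get the right file, either choose a root for which root plus URI is the real path (here /usr/local/django/django/contrib) or use the alias directive instead of root.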
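The difference is easy to see if you model the path lookup in a few lines of Python. This is only an illustration of the concatenation rule; resolve_with_root and resolve_with_alias are hypothetical helper names, not anything Nginx exposes.

```python
def resolve_with_root(root: str, uri: str) -> str:
    """root: the full request URI is appended to the root directory."""
    return root.rstrip("/") + uri


def resolve_with_alias(location_prefix: str, alias: str, uri: str) -> str:
    """alias: the matched location prefix is replaced by the alias path."""
    return alias + uri[len(location_prefix):]


uri = "/admin/media/css/base.css"

# The config from the post: the location part of the URI is doubled up.
print(resolve_with_root("/usr/local/django/django/contrib/admin/media", uri))

# A root chosen so that root + URI is the real path on disk.
print(resolve_with_root("/usr/local/django/django/contrib", uri))

# The alias directive, which swaps the prefix for the given directory.
print(resolve_with_alias(
    "/admin/media/",
    "/usr/local/django/django/contrib/admin/media/",
    uri,
))
```

The first call reproduces the wrong path above; the other two both end up at /usr/local/django/django/contrib/admin/media/css/base.css.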

SCRIPT_NAME and PATH_INFO

Django uses PATH_INFO to match against urlpatterns. Nginx’s fastcgi_params include doesn’t set that. It does set SCRIPT_NAME. If both PATH_INFO and SCRIPT_NAME are set to $fastcgi_script_name, Django seems to get an empty path for all requests. Just set PATH_INFO!

Request buffering to file

Nginx buffers large requests to file before passing them to an upstream server. There is no option to stop this from happening. If you want to track the progress of request uploads, you will need to use the Upload Progress Module.

Followize — A trimmed down, fast and efficient web app for reading tweets

Followize is a trimmed down, fast and efficient web app for reading tweets. See the latest update from each person you follow, explore replies and timelines easily. It’s kind of to Twitter as Gmail is to email.

[Image: Followize home view]

Features

  • View the latest tweet from each person you follow. Get the 1,000 yard view of what your community is talking about.
  • Clean, minimal, efficient interface.
  • View replied-to tweets. Just click the name “in reply to blah” and the original tweet is loaded ajaxily.
  • View user timelines. The latest tweet is handy for a quick scan, but viewing a twitterer's last 10 tweets is as easy as clicking their name.
  • Links to @replies, #hashtags, $STOCKTWITS.

Followize is particularly handy if you follow a bunch of people. Sometimes the more active users drown out the quieter ones. Followize lets you keep up-to-date with everyone easily.

Building a Twitter client

I'm a keen twitterer. When I read my tweets I find that certain voices shout louder than others, where volume = tweet frequency. Those voices aren't necessarily the ones I care about. I want to know what's going on with my more restrained friends too.

I designed Followize to solve this problem. Like Twitter100, it shows the latest tweet from each friend. The UI is more efficient than Twitter100’s and I have some enhancements planned that I hope will make Followize a very quick and convenient way to keep up with the people you’re following.

Followize uses the Twitter API’s friends method. Until yesterday, the documentation for that method said it would return “up to 100 of the authenticating user’s friends who have most recently updated.” I.e. that the sort order is the created_at time of each friend’s latest status update. Subsequent pages of less-recently-updating friends can be requested as well. Followize is just a nice UI for this data built on Google App Engine.

However, after building the app and using it for a little while, I noticed that the data was not sorted in this way at all. I raised this as an API issue. One of Twitter’s engineers responded that this was a documentation error, rather than a software error, and updated the docs. The correct order is (effectively) the date the user began following a given person. Unfortunately this all but kills my application.

If Twitter is sending the data in the wrong order for my app, I have to load all the data and sort it myself. The first person I followed might be the one who has most recently updated and thus the last record in the results of the friends method call. Pulling a page of 100 friends from Twitter to App Engine takes around 0.8 seconds, and decoding the JSON then takes another 0.15 seconds. Good old Scobleizer follows 21K people, Obama follows 171K! Loading all the required data for Scoble would take 3.3 minutes, plus some time for sorting, committing to cache etc. Twitter rate limits API requests to 100 per 60 minute period. Loading those 21K friends requires 210 API requests, and that's only for one page view. Scoble is likely to reload the page a few minutes later and the whole thing begins again.
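Here is that back-of-the-envelope calculation as a small Python sketch. The constants are the figures quoted above; the load_cost function and its names are made up for this illustration.

```python
import math

PAGE_SIZE = 100            # friends returned per API call
FETCH_SECONDS = 0.8        # measured time to pull one page to App Engine
DECODE_SECONDS = 0.15      # measured time to decode one page of JSON
RATE_LIMIT_PER_HOUR = 100  # Twitter API requests allowed per 60 minutes


def load_cost(friend_count: int):
    """Return (api_requests, total_seconds, rate_limit_windows)."""
    requests = math.ceil(friend_count / PAGE_SIZE)
    seconds = requests * (FETCH_SECONDS + DECODE_SECONDS)
    # Number of 60 minute rate-limit windows the requests must be spread over.
    windows = math.ceil(requests / RATE_LIMIT_PER_HOUR)
    return requests, seconds, windows


for name, friends in [("Scoble", 21_000), ("Obama", 171_000)]:
    requests, seconds, windows = load_cost(friends)
    print(f"{name}: {requests} requests, {seconds / 60:.1f} minutes, "
          f"{windows} rate-limit windows")
```

For Scoble this gives the 210 requests and roughly 3.3 minutes mentioned above, and shows that the requests have to be spread over three rate-limit windows. Obama's 171K friends come to 1,710 requests, about 27 minutes of loading, and 18 windows.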

I’m looking at using Gnip as a workaround, but this is sub-optimal. A rough strategy would be as follows:

  1. A user logs in to Followize for the first time.
  2. A background process loads the complete list of their friends from Twitter’s API.
  3. Followize adds those friends to a Gnip filter of Twitter users followed by Followize users.
  4. Gnip POSTs updates for each user to a Followize API endpoint.
  5. Followize stores users being followed and their latest update in its DB.
  6. When the user requests the page, tweets are loaded from the DB.
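The steps above can be sketched in Python with an in-memory store standing in for the DB. Everything here is hypothetical: the Update and FollowizeStore names are invented, the Twitter and Gnip calls are left out entirely, and timestamps are plain Unix seconds.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Update(NamedTuple):
    author: str
    text: str
    created_at: int  # Unix timestamp of the tweet


class FollowizeStore:
    """In-memory stand-in for the DB in steps 2 to 6 (illustrative only)."""

    def __init__(self) -> None:
        self.friends: Dict[str, Set[str]] = {}  # user -> people they follow
        self.tracked: Set[str] = set()          # what the Gnip filter would hold
        self.latest: Dict[str, Update] = {}     # author -> latest update

    def add_user(self, user: str, friend_names: Iterable[str]) -> None:
        # Steps 1-3: remember the friend list and widen the filter.
        self.friends[user] = set(friend_names)
        self.tracked |= self.friends[user]

    def receive_update(self, update: Update) -> None:
        # Steps 4-5: keep only the newest update for each tracked author.
        if update.author not in self.tracked:
            return
        current = self.latest.get(update.author)
        if current is None or update.created_at > current.created_at:
            self.latest[update.author] = update

    def page_for(self, user: str) -> List[Update]:
        # Step 6: latest tweet per friend, most recently updated first.
        updates = [
            self.latest[name]
            for name in self.friends.get(user, ())
            if name in self.latest
        ]
        return sorted(updates, key=lambda u: u.created_at, reverse=True)


store = FollowizeStore()
store.add_user("ben", ["alice", "bob"])
store.receive_update(Update("alice", "first", 100))
store.receive_update(Update("bob", "hello", 150))
store.receive_update(Update("alice", "second", 200))
store.receive_update(Update("carol", "not followed", 300))
print([u.author for u in store.page_for("ben")])
```

The point of the design is that the sort happens over data already held locally, so a page view never has to wait on the Twitter API.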

The drawbacks to this approach are:

  • Step 2 could still fall foul of Twitter's API rate limit, necessitating a 1 hour wait.
  • The application load doesn’t scale with traffic. Scoble could sign up, I’ll start getting a tonne of tweets coming in from Gnip, but Scoble may never visit Followize again, rendering that traffic useless. I can pull data up to 60 minutes old from Gnip, so I could minimize the processing overhead by pulling tweets every 60 seconds for example.
  • All of these API calls would be too long-running for Google App Engine.
  • The application complexity is dramatically increased and it is now reliant on an additional remote service.

I’d like Twitter to order the data for me, but Twitter’s API as it stands can’t be modified to do all the heavy lifting for every application. Gnip has an interesting model in that they allow you to offload some work, filtering of data, to them. A model in which I could write my own view of Twitter’s data and upload that to be run locally to their DB would be a great solution. Given the wide range of apps using Twitter’s API, I’m hopeful.

Update, Jan 9: HubSpot’s State of the Twittersphere says that only 12% of Twitter users are following more than 100 people. I’d suggest those are not likely the people who will find Followize useful though. In addition, most Twitter users are new to the service and following lists grow with time.

Update, Jan 17: After initial setbacks, I found simply pulling several pages from the Twitter API and caching them with staggered timeouts provides a good enough user experience. Followize lives!
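One way to read "staggered timeouts" is that each page of friends gets a slightly longer cache lifetime than the one before, so the pages never all expire on the same request. The sketch below assumes exactly that; it is not the actual Followize code, and the class name, the TTL values and the in-memory dict (standing in for App Engine's cache) are all assumptions.

```python
import time
from typing import Callable, Dict, List, Tuple


class StaggeredPageCache:
    """Caches API pages with a different lifetime per page (hypothetical)."""

    def __init__(
        self,
        fetch_page: Callable[[int], list],
        base_ttl: float = 60.0,   # lifetime of page 1, in seconds (assumed)
        stagger: float = 30.0,    # extra lifetime per later page (assumed)
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch_page = fetch_page
        self._base_ttl = base_ttl
        self._stagger = stagger
        self._clock = clock
        self._entries: Dict[int, Tuple[float, list]] = {}  # page -> (expiry, data)

    def get_page(self, page: int) -> list:
        now = self._clock()
        entry = self._entries.get(page)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh, no API call needed
        data = self._fetch_page(page)
        ttl = self._base_ttl + self._stagger * (page - 1)
        self._entries[page] = (now + ttl, data)
        return data

    def get_pages(self, count: int) -> List:
        friends: List = []
        for page in range(1, count + 1):
            friends.extend(self.get_page(page))
        return friends


# Usage with a fake fetcher and a fake clock, to show which pages get refetched.
calls: List[int] = []
now = [0.0]


def fake_fetch(page: int) -> list:
    calls.append(page)
    return [f"friend-from-page-{page}"]


cache = StaggeredPageCache(fake_fetch, clock=lambda: now[0])
cache.get_pages(3)   # cold cache: fetches pages 1, 2 and 3
now[0] = 70.0
cache.get_pages(3)   # only page 1 has expired
print(calls)
```

With these numbers a page view after 70 seconds costs one API request instead of three, which keeps both the response time and the rate-limit usage down.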