Ben Godfrey

Moving to HTML5


I’ve converted this site to HTML5. It took about 5 minutes.

The first step was to switch the doctype. The HTML5 doctype:

<!DOCTYPE html>

And that’s it. No more secret incantation.

The 2nd step was to remove the trailing slashes from void tags (<img>, <meta>, <br>, etc). These are optional in HTML5. I’m not using the XML serialisation, so they’re unnecessary.
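Both steps are mechanical enough to script. The following is only a rough sketch, not the method used on this site: the function name to_html5 and the regular expressions are my own assumptions, and a regex will trip over unusual markup (for example, attribute values containing ">" or unquoted values ending in a slash), so treat it as a starting point.

```python
import re

# Void elements as listed in the HTML standard; extend or trim to taste.
VOID_TAGS = ("area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr")

# Matches any doctype declaration, e.g. the long XHTML 1.0 one.
_DOCTYPE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)

# Matches a void tag that ends in "/>", capturing the name and attributes.
_VOID_SLASH = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(VOID_TAGS), re.IGNORECASE)


def to_html5(markup):
    """Swap in the HTML5 doctype and drop trailing slashes on void tags."""
    markup = _DOCTYPE.sub("<!DOCTYPE html>", markup, count=1)
    return _VOID_SLASH.sub(r"<\1\2>", markup)
```

Running to_html5 over a template leaves non-void tags such as <p> untouched and only rewrites tags named in VOID_TAGS.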

Next steps

When I get another 5 minutes, I’d like to use the new document structure tags — <header>, <nav>, <section>, <article> and <footer>. These expose the semantics of the document more clearly (useful for authors). They can also be styled with display:block to keep older browsers happy. Their introduction was based on a large-scale analysis of HTML document structure performed by Google.

Towards testing OpenRasta views (OR 2.0.3, .NET 4, VS 2010)

My employer is a .NET shop. Despite my open source upbringing, I’ve been getting to grips with some of the newer .NET technologies: OpenRasta for exposing objects RESTfully and Fluent NHibernate for simple object-RDBMS mapping. There is some pretty cool stuff coming from the ALT.NET scene.

I’m building a RESTful XML service with OpenRasta. I encountered a couple of problems that weren’t covered by the community documentation so I’ve written them up here. As such this is not a complete introduction. See this question on StackOverflow about getting started with OpenRasta for help on that front.

A quick note about Web.config

The minimal changes to Web.config suggested by the first site tutorial weren’t enough for me in VS 2010. However, borrowing the Web.config from OpenRasta.Demo (included in the OR repo) and hacking it a little bit worked a treat. Here is the complete Web.config for an OR 2.0.3 project hosted on ASP.NET in VS 2010 that’s working for me right now.

A note about NUnit versions

OpenRasta.DI.Unity.Tests.Unit introduces a dependency on nunit.framework.dll version 2.5.1.9189. If you’re using a different version, you can redirect the binding in App.config. See Resolving Dependent .NET Assembly Version Conflicts for more info.

Testing views

Django’s test client makes it easy to test a view.

>>> from django.test.client import Client
>>> c = Client()
>>> response = c.get("/retailer/1")
>>> response.content
'<?xml version="1.0" encoding="utf-8"?>...'

I want to do the same thing to my OpenRasta views.

  1. Set up a lightweight environment with 1 or 2 lines of code.
  2. Pass a URL to a method which returns the view output.
  3. Parse response content and inspect data contained therein.
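Step 3 is the same whichever client fetches the response. Here is a minimal Python sketch of it, in the spirit of the Django example above. The XML shape, the SAMPLE body and the parse_retailer helper are all hypothetical; a real serialiser will emit something different (namespaces, for instance), so adjust the element paths to match.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body, standing in for response.content.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<Retailer>
  <Id>1</Id>
  <Name>Retailer</Name>
  <Products>
    <Product><Id>7</Id></Product>
  </Products>
</Retailer>"""


def parse_retailer(xml_bytes):
    """Pull out the fields a view test would want to assert on."""
    root = ET.fromstring(xml_bytes)
    return {
        "id": int(root.findtext("Id")),
        "name": root.findtext("Name"),
        "product_count": len(root.findall("./Products/Product")),
    }


retailer = parse_retailer(SAMPLE)
```

A test can then compare retailer against the values it expects from the fixture data.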

I would prefer not to have to set up a dev web server or use Selenium or Twill or some other tool to test my view code. Although those are valid integration tests, they’re complicated and slow. My purpose is just to demonstrate that I’ve wired my domain model and OpenRasta together correctly, so I want to test the shortest code path.

OpenRasta provides InMemoryHost which looks like it should solve the problem nicely. OpenBastard, OpenRasta’s suite of regression tests, also provides some tools for solving this problem. Sadly, both are undocumented and work in progress at the time of writing.

After spending some time looking through the source and asking questions on the mailing list, I wasn’t able to create a test environment using either InMemoryHost or OpenBastard. I’ll update this post as and when I make any progress.

However, in the meantime, here is an integration test that passes a request to OR through IIS running on localhost.

[TestFixture]
public class RetailerTests
{
    [Test]
    public void GetRetailer_RetailerExists_RetailerRepresentationReturned()
    {
        var uri = new Uri("http://localhost/retailers/1");
        var webRequest = (HttpWebRequest)WebRequest.Create(uri);
        webRequest.Method = "GET";
        webRequest.ContentType = "application/x-www-form-urlencoded";
        webRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

        Retailer retailer;
        DataContractSerializer dcs = new DataContractSerializer(typeof(Retailer));
        using (var response = webRequest.GetResponse())
        {
            retailer = (Retailer)dcs.ReadObject(response.GetResponseStream());
        }

        // NUnit's Assert.AreEqual takes the expected value first, then the actual one.
        Assert.AreEqual(1, retailer.Id);
        Assert.AreEqual("Retailer", retailer.Name);
        Assert.AreEqual(1, retailer.Products.Count);
    }
}

Based on code posted to the OpenRasta mailing list by David Lawton.

Hack your life with Remember The Milk


At first glance, Remember The Milk (henceforth RTM), is a simple todo list web app. Dig deeper and you’ll find a very flexible, customisable and programmable platform.

Getting Things Done

Like many todo list apps, RTM roughly follows the Getting Things Done (GTD) methodology.

  • The overview allows you to focus on tasks for today, tomorrow or this week.
  • The tasks view allows you to manage tasks, categorise them into lists (inbox, errands, work, etc), tag them, set due dates, set repeating behaviour and add urls, notes and time estimates.
  • The locations view lets you assign tags to locations (useful when you use a mobile app).

Hacks

Quick task add

Adding tasks to RTM is a hack in itself. They have a nice terse syntax for specifying the attributes of a task. Days, times and urls are parsed automatically, e.g. “Buy tea 1pm,” but you can save a lot of time using their Smart Add operators. E.g.:

Prioritise issues Monday #Work !2 *weekly

Means:

Add a task called ‘Prioritise issues’ for Monday, add it to my ‘Work’ list, set the priority to 2 and repeat it every Monday.
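To make the syntax concrete, here is a toy Python parser for the three operators used in that example. It is an illustration only and not RTM’s parser: parse_smart_add is a made-up name, it handles only single-word #, ! and * tokens, and it makes no attempt to recognise dates, so “Monday” stays in the task name.

```python
def parse_smart_add(text):
    """Split a Smart Add style string into a name and a few attributes."""
    name_words = []
    task = {"list": None, "priority": None, "repeat": None}
    for token in text.split():
        if token.startswith("#") and len(token) > 1:
            task["list"] = token[1:]           # #Work  -> list "Work"
        elif token.startswith("!") and token[1:].isdigit():
            task["priority"] = int(token[1:])  # !2     -> priority 2
        elif token.startswith("*") and len(token) > 1:
            task["repeat"] = token[1:]         # *weekly -> repeat "weekly"
        else:
            name_words.append(token)           # everything else is the name
    task["name"] = " ".join(name_words)
    return task


task = parse_smart_add("Prioritise issues Monday #Work !2 *weekly")
```

The real Smart Add also understands dates, multi-word repeats and more operators, which is why the documentation is worth reading.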

See the Smart Add documentation for details and lots of examples.

Keyboard shortcuts

RTM’s keyboard shortcuts save a lot of time when navigating around the application. There are lots of them and they cover the app pretty comprehensively, making it feel more like a desktop app.

iPhone app

Before switching to RTM, I used OmniFocus on 2 Macs and my iPhone. The iPhone app was always very slow to load and sync. I hardly used it. RTM’s iPhone app is simple, fast and useful. It starts fairly quickly (although I wouldn’t complain if they shaved off another second or 2). Critically, once it has started, it’s very quick to add tasks (thanks to Smart Add) and to sync. RTM pushes reminders as app notifications (which are like richer text messages). The number of incomplete tasks (either for today or in total) is shown as an icon badge. The app does just the things you need and does them well.

There is also an Android app and sync tools for Blackberry and Windows Mobile.

Reminders for events with due times

RTM can send you task reminders via email, SMS, Twitter or iPhone app notifications. You can configure RTM to send reminders for all events or just events with due times.

I forget to do things, so I set due times for things and RTM sends me an iPhone notification. I estimate when I’ll be in the shop and set the due time so that RTM reminds me while I’m there. An alternative way to do this would be to set locations for tasks and then check the location list when I arrive.

Repeating tasks

Tasks can be set to repeat and RTM can express quite specific schedules, e.g. “every month on the 4th” or “after 6 months.” See the repeat interval documentation for details. I use it to remind me to do my weekly online grocery shop.

Add to RTM bookmarklet

RTM’s Quick Add Bookmarklet makes setting reminders from your browser extra quick and easy. Unfortunately, it doesn’t save the address of the current page with the new task.

rtm command using mail

I’m never far from a bash shell, so it’s useful to have a command to add things to RTM. This command uses RTM’s email to task list feature.

rtm() { echo '' | mail -s "$*" username+12345x@rmilk.com; }

The Smart Add operators mean I can easily set all the attributes I need on the task from a single command. I haven’t gotten into scripting it yet, but it would be simple to add RTM tasks in that way. There are loads of ways this could be helpful. For example, I could create a cron job to check disk space on my servers and add a high-priority RTM task if a disk becomes more than 80% full.
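As a sketch of the disk space idea, something like the following Python could run from cron. It is untested against RTM itself: the function names and the subject wording are my own, the address is the same placeholder used in the shell function, and it assumes a local mail command is available, just as the shell version does.

```python
import shutil
import subprocess

RTM_ADDRESS = "username+12345x@rmilk.com"  # placeholder, as in the rtm() function
THRESHOLD = 0.80                           # report disks more than 80% full


def disk_task_subject(path, used, total, threshold=THRESHOLD):
    """Return a Smart Add style subject line, or None if the disk is fine."""
    fraction = used / total
    if fraction <= threshold:
        return None
    # "!1" asks for priority 1 via the Smart Add priority operator.
    return "Free up space on %s (%d%% full) !1" % (path, round(fraction * 100))


def check_and_report(path="/"):
    """Check one mount point and email a task to RTM if it is too full."""
    usage = shutil.disk_usage(path)
    subject = disk_task_subject(path, usage.used, usage.total)
    if subject is not None:
        # Same idea as the shell function: empty body, task in the subject.
        subprocess.run(["mail", "-s", subject, RTM_ADDRESS], input=b"", check=True)
    return subject
```

Calling check_and_report() from a small script scheduled in cron would add at most one task per run. A real job would probably also want to avoid adding the same task every time it fires.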

You could also write a version of this command that uses Twitter to upload tasks or goes direct to the API, e.g. using the Python API wrapper.

Searching and smart lists

You can search RTM with a bewildering array of Google-like search operators. Search for tasks tagged blah with tag:blah or see work tasks completed yesterday with completedWithin:"1 day of today" and list:Work. You can save any search as a Smart List, which adds a tab to your tasks view. Tasks added when viewing a smart list inherit the attributes of that list.

Smart lists have a million applications. Here are some ways I’ve used them:

  • Create a master “Work” list and then smart lists that filter by tag to show items for individual projects. View the big picture or focus on a project.
  • Create a list of tasks recently completed. Check it before going along to review meetings.

Even more hacks

There are loads more RTM hacks I haven’t tried yet:

Your life, hacked

RTM supports a myriad of ways to organise your life and work. It’s easy to get started and flexible enough to support many different styles of organisation. Some really powerful features help you work smart. The app is always responsive, it gets out of your way and lets you focus on getting things done.

Mixcloud is a great site for sharing mixes

UK startup Mixcloud has built a great site for sharing DJ mixes. It was fun enough that I typed out the full track-listing for my now-venerable Make It Minimal mix, all 43 tracks!

Make It Minimal by Afternoon on  Mixcloud

How I work

I’m an independent software engineer working on a contract basis. I have a process for projects. It’s a relatively lightweight, loosely agile methodology. It attempts to ensure projects run smoothly by building software in a series of short iterations.

Face-to-face or Skype meeting

After an introduction is made, I organise a face-to-face or Skype meeting. I get an overview of the requirements and the context in which the application will sit, the business goals. Clients often have preconceptions about what I’ll do, how long it will take, etc. I like to learn about the project so I can challenge those preconceptions if necessary. For example, if a client asks me to create a blogging tool, I’ll make sure that none of the off-the-shelf tools will do before proceeding. I’m actually pretty vigorous in recommending alternate approaches. My experience is that a lot of stress and heartache can be avoided.

Because I practise agile software development, I encourage clients to find a minimum set of features which can be implemented in an iteration lasting up to 2 weeks. This is a useful process. It encourages both me and my clients to think about the business goals first. If those goals can be met without software or with off-the-shelf tools, we can make that decision now.

Specification document

After the initial meeting and a follow-up email or 2, I will produce an iteration specification document. I usually get this to the client within a few days to a week depending on how busy I am. I can rush it through if necessary.

The specification will list the set of user stories (how the app will be used, tasks it will support) and the features required (the actual stuff I’ll build). The feature set will take no more than 2 weeks to develop, test and deploy. Additional features are left for subsequent iterations.

If the client is happy, work begins. Otherwise the spec is revised. The spec is my statement of intention: it is my understanding of what the client wants me to build. I ask my clients to check this carefully. If I’ve got something wrong, it’s time-consuming to fix later. It’s often hard to make spec docs exhaustive and unambiguous, and this can lead to disagreements. I try hard to avoid that.

Developing iteratively

The work begins on the first iteration. I will develop the required features and test and deploy the software within the stated time period. During this period communication slows down a bit. Coding requires a really quite extreme level of concentration so I tend to close my email, Skype, etc and put my headphones on.

Only quite minor changes to the spec can be accommodated once an iteration has begun. If changes are frequent, it’s very hard to get up to speed. A car can go much faster on a straight road. The client’s desire to provide new feedback and ideas and the developer’s desire to get the job done are often in competition. The beauty of developing in short iterations is that this competition can be resolved. Ideas can wait a week or 2. In fact, they benefit from more consideration. Meanwhile, the developer can code away undisturbed.

Towards the end of the iteration, the new code is deployed to somewhere where it can be tested by myself, the client and other stakeholders, often a staging site. Bugs are found and fixed. Loose ends are tied up. Finally, the software is put into production. Bug testing and fixing takes 25-50% of the iteration time. Deployment is generally straightforward, but can still take a few hours. For that reason it’s only done once per iteration.

When an iteration is complete, the process begins again. A face-to-face or Skype meeting allows the client and me to assess what’s been achieved so far. Learning from one iteration goes into the next. A new spec is produced and I get back to work.

Clojure: a stateless dynamically-typed Lisp on the JVM

I’m a big fan of programming without state in languages like Haskell and Erlang. Clojure is a modern Lisp, but follows the stateless style very closely, taking ideas from Haskell and ML. Clojure is implemented in Java and runs on the JVM, providing full access to all of Java’s libraries.

Modern

Clojure is a lot less crufty than other Lisp implementations, some of which are now pretty long in the tooth. Maps and vectors and a few other things are given syntax. Macros still work though, so I guess it’s still all s-expressions ([1 2 3] could trivially map to (vector 1 2 3)). Clojure has a terse syntax with lots of helpful naming conventions (e.g. *constant* and predicate?).

Structural sharing

Clojure benefits from state of the art data structure research. Clojure’s key data structures (lists, vectors, maps) are implemented using a technique called structural sharing. If you add an element to the front of an existing list, the new list is simply the new item plus the existing list. The existing list can be stored once. The new item simply points to the head of the existing list. This technique preserves complexity characteristics of operations on data structures (insert, remove, access, etc), minimises memory footprint and provides immutable state! Very cool stuff.
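The list case is easy to sketch in a few lines of Python. This is an illustration of the idea only, not Clojure’s implementation (which, as I understand it, uses more elaborate tree structures for vectors and maps). The names Cons, cons and to_list are my own.

```python
from collections import namedtuple

# An immutable cell: one item plus a reference to the rest of the list.
Cons = namedtuple("Cons", ["head", "tail"])
EMPTY = None


def cons(item, lst):
    """Return a new list with item at the front; lst itself is not copied."""
    return Cons(item, lst)


def to_list(lst):
    """Walk the cells and collect the items into an ordinary Python list."""
    out = []
    while lst is not EMPTY:
        out.append(lst.head)
        lst = lst.tail
    return out


existing = cons(1, cons(2, cons(3, EMPTY)))
extended = cons(0, existing)  # one new cell pointing at the old list
```

Here extended.tail is the very same object as existing, so adding the element cost one cell of memory, and anyone still holding existing sees exactly the list they had before.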

Dynamic typing

Clojure is dynamically-typed like Python or Ruby. This makes it a good entry point from these languages. For programmers looking to program without state, other options are Haskell or Erlang. Erlang is awesome for building highly available, concurrent applications. Its syntax is kind of fun, but not for everyone. Haskell is strongly typed, pure and lazy. This unique mix of features can be very powerful, but can also trip you up. Both languages have a sweet spot; Clojure feels more general purpose. For me, that means great for scripting and writing web apps :-).

I like Django’s minimal template language. There is an implementation in Erlang, but Erlang’s string processing is somewhat weak. Templating and other string processing, like parsing JSON, is a bit long-winded. Haskell doesn’t have any good template languages; the best option is Text.Xhtml, a combinator library, which is not very designer-friendly. Haskell’s powerful type system isn’t very useful for implementing a text template language. It gets in the way more than it helps. It feels like an impedance mismatch. It would be an interesting project to build a template processor compatible with Django in Clojure.

Recommended viewing

For a more complete description watch Clojure creator Rich Hickey describing Clojure and walking through a concurrent application.

Software development advice for startups

You’ve got a great idea! You want to build a web site that saves people time and money and helps them make friends! You’ve written a rough business plan, now you want to find someone to build your site.

Software development is complicated, expensive, error-prone and regularly boring. As a programmer, here’s what I think you should know before we meet for coffee.

1. Avoid developing software

As a general rule, you should avoid doing anything you don’t need to do in a startup or new project. You have plenty of things to worry about. If you can possibly avoid writing software, do so. Need a website? Use WordPress. Intruders.tv and I Can Has Cheezburger are 2 great businesses built on WordPress. Need something like a social network? Use Ning. Prove that you can build the community, then transition to a bespoke platform to build features. Since 2004, an amazing number of high quality, free or low-cost tools have come on to the market. Tools don’t have to be perfect. Maybe half your product is already implemented by someone else’s API. Use that, build the other half. Worry about strategic risks later. You’re prototyping, remember.

2. Software is complex

Software is complex and fragile. Even a simple piece of software is many times more complex than a car engine for example. Even simple web apps built with standard tools don’t tend to share that many common parts. Understanding an application later and modifying it is really a very complex task indeed.

It’s many times easier to change an idea or a document or a diagram than to change software. Try to make changes early and avoid making them late if possible. The agile methodology, building software bit by bit in short sprints, makes it more likely that you’ll change something before it’s been coded, tested and deployed.

3. Don’t make a big long list of features stretching until the end of time and space

Decide what the very core of your offering is, find out if your customers are interested and build the minimum that can solve their problem.

This approach costs less and is less risky than building your perfect dream. When you have the minimum viable product built, solicit feedback from your customers. By making the big list, you’re second guessing what they want. You’re probably wrong.

Ideas happen much more quickly than the code. An idea might take a day to think through and a week or a month to implement. If you make a list of features, those features will be irrelevant by the time the programmers get around to building them. Stay in the moment. Keep the list to what your customers are asking for today.

4. Let your developers (and designers) do their job

Cheaper development teams are great when you have a known problem and a known solution. When you’re launching a new product, you have only one or neither. Communication is critical, but it’s also important to respect the software development process. Remember that writing software is complex, boring and error-prone. This means most of the time, developers have to get their heads down and work, work, work. I find 2 weeks a good length for a block of work because it gives a good mix of regular points for communication, keeping the product connected to reality, and time to get things done.

Be flexible enough to create an environment where your team can get on with delivering a great product, and you can get on with everything else.

Bonus point: If you engage a designer, let them design. You have to be proud of the way your application or site looks, but you are not the customer, the customer is. Experienced designers are good at making things that work well and look good for the larger audience. Try not to make personal judgements, like “I don’t like that grey.” Your customers probably won’t care as long as the site or service looks professional and is easy to use. If you have A/B test data to show that the grey has lower conversion rates, that’s different (Google actually tests different shades of blue).

5. Don’t micromanage

This is kind of the same as the previous point, but it’s vitally important for software. If you’re launching a product and you start involving yourself in the minutiae of development, you will slow the process down hugely. Development takes focus, handling a micromanaging client takes time and destroys that focus, leading to mistakes and delays.

Let small decisions go. Revisit them if they really are wrong. You don’t know if your product will be a hit yet. You need to get something out there as quickly and as cheaply as possible. If it later turns out you need to change the core product (which you almost certainly will), all the time spent fine tuning before launch is time wasted.

6. Software is only part of the puzzle

Finding money, figuring out a product and having it built is not easy, but it is well understood and, given a sensible spec, can be achieved in a finite amount of time.

Most new products don’t fail because the implementation was bad, they fail because the customers never came. Make sure the customer is there before you start. Make sure you know where they will come from, why they will use your product. Make sure you know this inside out. It’s very possible that your idea, though neat, isn’t a workable business. You really want to learn that before investing time and effort in building a product. The New Business Road Test is an excellent book on evaluating opportunities.

7. Rewriting is common

Once a piece of software has been around the block once or twice, the task it’s being used for ends up pretty far from the one it was intended for. Maintaining it becomes increasingly difficult. At this point, it’s time to consider a rewrite. This is a good thing. New tools and practices will have emerged that you can take advantage of. You know more about the business you’re in and you can cut out the features you don’t need and concentrate on making the ones you do much better. Rewriting is good. Plan to rewrite. Get stuff done simply and quickly at first, revisit it later when you know more.

This is a controversial point. Trying to change software is like trying to change a car into a boat, it can be done with time and effort, but building a boat from scratch will be easier and the final boat will keep water out better, go faster, look nicer and generally be more fit for purpose. Data from NASA cited in Facts and Fallacies of Software Engineering suggest that if as little as 30% of an application needs to be changed, it is less time-consuming and expensive to start over.

And finally…

There are many great books about software engineering, but I can personally recommend none more than Robert L Glass’s Facts and Fallacies of Software Engineering. It is a mine of empirical data about what makes software projects succeed and fail. If your company’s business is making software (which it is if you’re a web business), you need to read this book.

There are also a number of great websites about startups, few are finer than Eric Ries’ Startup Lessons Learned. Read it all. Twice. Carefully.

The New Business Road Test is a great tool for stopping yourself from diving in to an ill-advised venture. Reading it may save you a lot of time and money and prevent hair-loss.

Django, Drupal, Webmachine: Different frameworks for different projects

Django is an awesome framework but different projects have different needs. The last 2 projects I’ve been involved with have been using Drupal. Other projects I’m planning call for very RESTful designs. Webmachine, an Erlang framework, is a great fit for these.

I do still very much love Python and Django, perhaps even more now that I’m using PHP day to day. I miss the REPL. I miss first class functions. I miss Django’s very tidy organisation of code.

Drupal is something I’ve studiously avoided for a long time, thinking it to be a Zope-like mire. That’s true to an extent: there are many versions and a lot of code. Drupal apps do have good separation of concerns. The internal organisation of modules and themes is useful, although there’s a little bit too much function name magic going on (“Why isn’t my validator firing? Who knows!”). I’m interested in hooks, although I haven’t needed them yet. The same concept has served Django hackers well.

Leaky abstractions

Django and Drupal are both leaky abstractions. It’s easy to create great big joins with Django’s ORM. Drupal generates a mammoth set of CSS and JS imports per page. Both of these can be addressed with programmer discipline, but sometimes it’s nice to have a thinner level of abstraction to make you think carefully about each requirement and how best to implement it. Webmachine is such an abstraction. What it does provide, however, is system management built on top of Erlang and OTP. People doing scale (a group I’m not a member of) find that writing the initial app is easy, scaling it is hard.

Easy hacking, easy scaling

Newer frameworks seek to make initial implementation easy and scaling easy too. Webmachine is undoubtedly somewhat harder to code for than Django because there are fewer batteries included but, in theory at least, scaling is easier. Only a little of that is due to implementation; most is due to architectural style. Webmachine does get in your way a bit less when hacking RESTfully. Django does things like setting cookies on every hit, making caching harder and increasing load on your app.

I think the current leader for easy hacking, easy scaling is Google App Engine. The core of Django runs happily, the system scales to Google’s infrastructure, and deployment is very simple. GAE has one flaw though: porting to another platform involves work modifying code and extracting data. While the code scales effortlessly, scaling a business around that code seems harder. If your goals don’t chime with Google’s, you’re stuck. Frameworks running on open source software stacks are more trustable and it’s for this reason that EC2 is so much more popular than GAE to date.

Python syntax highlighting with Chili

I’ve enabled syntax highlighting on this site using the very tidy Chili.

The standard distribution of Chili doesn’t include a Python language definition (or Erlang, or Haskell…), so I wrote one.

Download Python recipe for Chili

Use SSH public key authentication with Fabric

Fabric is a very useful Python tool for scripting administration of remote servers. Like Capistrano it allows you to define tasks as a mixture of local and remote operations and then run them for lots of hosts, different groups of hosts, etc.

Increasingly I’m configuring sshd to allow public key authentication only. Using this method makes your server more secure against increasingly common SSH brute force attacks. You can also configure an ssh-agent app to allow password-less logins.

If you want your Fabric tasks to access machines using public key authentication, add something like this to your Fabfile:

from paramiko import RSAKey

config.fab_user = "jhacker"
config.fab_pkey = RSAKey.from_private_key_file("/path/to/keyfile")

Simple, and very useful.