Ben Godfrey

Building a Twitter client

I’m a keen twitterer. When I read my tweets I find that certain voices shout louder than others, where volume is tweet frequency. Those voices aren’t necessarily the ones I care about. I want to know what’s going on with my more restrained friends too.

I designed Followize to solve this problem. Like Twitter100, it shows the latest tweet from each friend. The UI is more efficient than Twitter100’s and I have some enhancements planned that I hope will make Followize a very quick and convenient way to keep up with the people you’re following.

Followize uses the Twitter API’s friends method. Until yesterday, the documentation for that method said it would return “up to 100 of the authenticating user’s friends who have most recently updated.” In other words, the results were sorted by the created_at time of each friend’s latest status update. Subsequent pages of less recently updated friends can be requested as well. Followize is just a nice UI for this data, built on Google App Engine.
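
The ordering the documentation originally promised amounts to a sort on each friend’s latest status. Here is a minimal Python sketch, assuming each friend record is a dict with a nested `status` whose `created_at` uses Twitter’s `"Wed Aug 27 13:08:45 +0000 2008"` timestamp style. The record shape and helper names are assumptions for illustration, and friends who have never tweeted are placed last.

```python
from datetime import datetime, timezone

# Assumed Twitter-style timestamp, e.g. "Wed Aug 27 13:08:45 +0000 2008"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Sentinel so friends with no status sort after everyone else
NEVER = datetime.min.replace(tzinfo=timezone.utc)


def latest_status_time(friend):
    """Return the parsed created_at of a friend's latest status, or NEVER."""
    status = friend.get("status")
    if not status:
        return NEVER
    return datetime.strptime(status["created_at"], TWITTER_TIME_FORMAT)


def sort_by_latest_update(friends):
    """Hypothetical helper: most recently updated friends first."""
    return sorted(friends, key=latest_status_time, reverse=True)


friends = [
    {"screen_name": "quiet", "status": {"created_at": "Mon Jan 05 09:00:00 +0000 2009"}},
    {"screen_name": "loud", "status": {"created_at": "Tue Jan 06 18:30:00 +0000 2009"}},
    {"screen_name": "silent"},
]
print([f["screen_name"] for f in sort_by_latest_update(friends)])
# ['loud', 'quiet', 'silent']
```

The catch is that this sort only gives the right answer when it sees every friend, not just the first page the API returns.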

However, after building the app and using it for a little while, I noticed that the data was not sorted in this way at all. I raised this as an API issue. One of Twitter’s engineers responded that this was a documentation error, rather than a software error, and updated the docs. The correct order is (effectively) the date the user began following a given person. Unfortunately this all but kills my application.

If Twitter sends the data in the wrong order for my app, I have to load all of it and sort it myself: the first person I followed might be the one who has most recently updated, and thus the last record in the results of the friends method call. Pulling a page of 100 friends from Twitter to App Engine takes around 0.8 seconds, and decoding the JSON takes another 0.15 seconds. Good old Scobleizer follows 21K people; Obama follows 171K! Loading all the required data for Scoble would take 210 requests × 0.95 seconds ≈ 3.3 minutes, plus some time for sorting, committing to cache etc. Twitter rate limits API requests to 100 per 60-minute period, so those 210 requests are more than twice the hourly allowance, and that’s for a single page view. Scoble is likely to reload the page a few minutes later and the whole thing begins again.
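
The back-of-the-envelope cost can be reproduced with the figures quoted in this post (0.8 s per page fetch, 0.15 s per JSON decode, 100 friends per page, 100 requests per hour). This sketch also counts how many hourly rate-limit windows a full load would touch; it ignores sorting and caching time, and `full_load_cost` is a hypothetical name.

```python
import math

# Figures quoted in the post
PAGE_SIZE = 100            # friends returned per API call
FETCH_SECONDS = 0.8        # time to pull one page to App Engine
DECODE_SECONDS = 0.15      # time to decode one page of JSON
RATE_LIMIT_PER_HOUR = 100  # API requests allowed per 60-minute window


def full_load_cost(friend_count):
    """Return (requests, seconds, hourly windows) needed to fetch every friend."""
    requests = math.ceil(friend_count / PAGE_SIZE)
    seconds = requests * (FETCH_SECONDS + DECODE_SECONDS)
    windows = math.ceil(requests / RATE_LIMIT_PER_HOUR)
    return requests, seconds, windows


for name, count in [("Scoble", 21_000), ("Obama", 171_000)]:
    requests, seconds, windows = full_load_cost(count)
    print(f"{name}: {requests} requests, {seconds / 60:.1f} min, "
          f"{windows} rate-limit windows")
# Scoble: 210 requests, 3.3 min, 3 rate-limit windows
# Obama: 1710 requests, 27.1 min, 18 rate-limit windows
```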

I’m looking at using Gnip as a workaround, but this is sub-optimal. A rough strategy would be as follows:

  1. A user logs in to Followize for the first time.
  2. A background process loads the complete list of their friends from Twitter’s API.
  3. Followize adds those friends to a Gnip filter of Twitter users followed by Followize users.
  4. Gnip POSTs updates for each user to a Followize API endpoint.
  5. Followize stores users being followed and their latest update in its DB.
  6. When the user requests the page, tweets are loaded from the DB.
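
Steps 4 to 6 boil down to “keep the newest update per followed user, then read a user’s page straight from storage”. The sketch below assumes a simplified payload (a batch of `Update` records) rather than Gnip’s real schema, and uses an in-memory dict in place of Followize’s datastore; `receive_batch` and `render_page` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Update:
    screen_name: str
    created_at: datetime
    text: str


# Stand-in for Followize's DB: screen_name -> latest Update (step 5)
latest_updates = {}


def receive_batch(updates):
    """Step 4: handle one batch POSTed by Gnip (payload shape is assumed)."""
    for update in updates:
        current = latest_updates.get(update.screen_name)
        # Keep only the newest update per user; batches may arrive out of order
        if current is None or update.created_at > current.created_at:
            latest_updates[update.screen_name] = update


def render_page(friend_names):
    """Step 6: a page is a DB lookup sorted newest first, with no Twitter calls."""
    found = [latest_updates[n] for n in friend_names if n in latest_updates]
    return sorted(found, key=lambda u: u.created_at, reverse=True)


def jan(day, hour):
    return datetime(2009, 1, day, hour, tzinfo=timezone.utc)


receive_batch([
    Update("alice", jan(5, 9), "morning"),
    Update("bob", jan(5, 12), "lunch"),
    Update("alice", jan(4, 20), "older, arrived late"),
])
print([(u.screen_name, u.text) for u in render_page(["alice", "bob", "carol"])])
# [('bob', 'lunch'), ('alice', 'morning')]
```

Because only one record per followed user is kept, storage grows with the number of distinct people followed, not with tweet volume.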

The drawbacks to this approach are:

  • Step 2 could still fall foul of Twitter’s API rate limit, necessitating a wait of up to an hour.
  • The application load doesn’t scale with traffic. If Scoble signs up, I’ll start getting a tonne of tweets from Gnip, but he may never visit Followize again, rendering that traffic useless. Gnip lets me pull data up to 60 minutes old, so I could reduce the processing overhead by pulling tweets in batches, every 60 seconds for example.
  • All of these API calls would be too long-running for Google App Engine.
  • The application complexity is dramatically increased and it is now reliant on an additional remote service.

I’d like Twitter to order the data for me, but Twitter’s API as it stands can’t be modified to do all the heavy lifting for every application. Gnip has an interesting model in that they let you offload some work, namely filtering data, to them. A model in which I could write my own view of Twitter’s data and upload it to run alongside their DB would be a great solution. Given the wide range of apps using Twitter’s API, I’m hopeful.

Update, Jan 9: HubSpot’s State of the Twittersphere says that only 12% of Twitter users are following more than 100 people. I’d suggest that the other 88% are not likely to be the people who will find Followize useful, though. In addition, most Twitter users are new to the service, and following lists grow with time.

Update, Jan 17: After initial setbacks, I found that simply pulling several pages from the Twitter API and caching them with staggered timeouts provides a good enough user experience. Followize lives!
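
One way to read “caching them with staggered timeouts” is that each page of friends gets a slightly different expiry, so a reload usually refetches one page instead of all of them. This is a minimal sketch under that assumption: the TTL values, 1-based page numbers and `StaggeredPageCache` class are all hypothetical, and an in-memory dict stands in for a real cache such as App Engine’s memcache.

```python
import time


class StaggeredPageCache:
    """Hypothetical cache: page n expires after base_ttl + (n - 1) * stagger seconds."""

    def __init__(self, fetch_page, base_ttl=60, stagger=30, clock=time.monotonic):
        self.fetch_page = fetch_page  # callable: page number -> list of friends
        self.base_ttl = base_ttl
        self.stagger = stagger
        self.clock = clock
        self._entries = {}  # page number -> (expires_at, friends)

    def get(self, page):
        now = self.clock()
        entry = self._entries.get(page)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no API request
        friends = self.fetch_page(page)
        ttl = self.base_ttl + (page - 1) * self.stagger
        self._entries[page] = (now + ttl, friends)
        return friends

    def get_pages(self, count):
        """Collect the first `count` pages (1-based page numbers assumed)."""
        result = []
        for page in range(1, count + 1):
            result.extend(self.get(page))
        return result


calls = []


def fake_fetch(page):
    calls.append(page)  # record which pages hit the (fake) API
    return [f"friend-{page}-{i}" for i in range(2)]


now = [0.0]
cache = StaggeredPageCache(fake_fetch, base_ttl=60, stagger=30, clock=lambda: now[0])
cache.get_pages(3)  # cold cache: fetches pages 1, 2 and 3
now[0] = 75.0
cache.get_pages(3)  # only page 1 (expired at 60 s) is refetched
print(calls)
# [1, 2, 3, 1]
```

Staggering also keeps the request rate smooth, which matters with only 100 API calls per hour to spend.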

Comments

Martin Kleppmann

If you get visited by a user who follows thousands of people, they still won't want to see the most recent update from all of them; a random selection (not even necessarily the ones with the most recent updates) would probably do. In fact, you could request the next page of their friends from the API only if they click on the next page of friends in Followize. Given the massive amount of data rushing through Twitter, implementing a new sorting order in the API will probably take them quite a bit of engineering effort; at that scale, I reckon that even simple features need to be reflected in the architecture (e.g. they need a cluster which does nothing but receive the 'firehose' XMPP stream and update sorting indexes)...

electromute

Hey Ben, just a quick note: if you haven't already, be sure to hop on our Google Groups list at http://groups.google.com/group/gnip-community?hl=... We are in the process of finalizing our schema, and if you have any thoughts in light of what you are working on, that would be great. Going to digest your post a bit as well :-)

Ben Godfrey http://aftnn.org

Thanks for your thoughts. Disclaimer: I built Followize to work for me, so that's my judgement call. Seeing the latest update from a random set of users would be interesting, perhaps better than the latest. There are some people who update so little, they're still hard to see even in a one-person one-vote system. Unfortunately, this isn't the ordering Twitter provide. They order on friendship create date, so I'll always see updates from the 100 most recently followed people. That's interesting, but not the reason I built Followize. I'm not sure about sorting at Twitter scale; I don't think their scale is actually that big. If Amazon can do it, and Google can, Twitter should be able to as well :-). Fair enough that it may not be a trivial undertaking.

Ben Godfrey http://aftnn.org

I think the Gnip schema looks good for what it does. I haven't really played around with it in depth, so others will have more useful comments. Would love to see JSON though :-).

electromute

Hey Ben, nice blog post. Looks like you've really thought through all of the issues. Here's my take (we would also like to see JSON, hopefully soon).

One thing to consider is that you won't be getting tweets individually from Gnip. We send you the filter data buckets in batches, so it generally won't be a one-to-one correlation.

Step 2, loading the initial Twitter data from the API, will probably have to be mitigated. Do they let you batch the requests, though? It seems like making a certain number of requests per minute is the problem.

Also, I would not assume that folks want to put everyone on this list. As someone who lives by and loves Twitter, I would like to be able to focus in on a subset of users at a time, which I think is Martin's point above. It's easy to assume everyone wants a million features, but what will they actually use, and what is of the most value to your audience? (It sounds like your audience is the power user.)

You said: "The application load doesn't scale with traffic. Scoble could sign up, I'll start getting a tonne of tweets coming in from Gnip, but Scoble may never visit Followize again, rendering that traffic useless. I can pull data up to 60 minutes old from Gnip, so I could minimize the processing overhead by pulling tweets every 60 seconds for example."

True, he may never come back, but don't forget this is not going to be a problem to the degree you are thinking, because not all of the people he follows are going to be tweeting, and you'll get a batch of their tweets when we ping you, which is different from you having to ping us once a minute for content that may or may not be useful.

And to your last bullet point, there is some added complexity, but that's only because you are doing some "way paving". Paving the way for new mashups and new ways of doing things can be tough but so worth it! I'd say give it a try, see what works, and have fun with it!

Ben Godfrey http://aftnn.org

Thanks for your comment, I definitely think this is the Right Thing to do as far as Followize goes.

In the short term I wanted to get the UI out there and get some feedback on whether people find it useful or not. Building a solution on top of the raw Twitter API is not fast, but it was quite easy and it has given me a working prototype.

If Followize is useful to other people, then I will look into moving the backend over to a VPS service and start using Gnip to collect tweets.

electromute

I think that is a very good strategy. Good luck with the initial prototype. It's likely that we'll also have a few more features by the time you are ready to move forward.

Steve Motley http://www.twitter.com/stevemotley

I’m new to Twitter and just recently I have been unable to follow any new people. When I click on follow, the little white hand comes up and then there is this little box with two spinning arrows. Nothing happens. Could someone please advise me on this and what to do?

Thanks

Comments are closed for this post.