
Twitter Story Extractor

05.17.09

I haven’t posted anything recently because I’ve been finishing up a number of projects out here at MIT. I wanted to share this one with everybody: it’s my final project for my New Media Storytelling class at the Media Lab. This is a Twitter visualizer built using Quartz Composer that takes multiple users and plots their tweets in 3D space, with vertical spacing that depends on the relative times of posting. So far, it can be used to help track public conversations between users, but I imagine there are other ways it might be interesting. I’m calling it a story extractor.


As an aspiring science writer/journalist, I am interested in using technology to help me find and understand stories. Twitter is one such tool, and though I was reluctant at first, I have found increasing value in it as I have explored its many varied uses and applications. One thing that strikes me, however, is how messy the experience can be. Once I started following more than about twenty people, the volume of information was overwhelming. There are a number of Twitter clients that help to manage information overload by allowing you to filter the stream in different ways. These are helpful, but I suspect they are just a first pass at whittling the Twitterverse into something more comprehensible.

One problem that I consistently encounter is following the conversations that occur between different Twitter users. Twitter has become like an open public chat room through the use of @replies, public messages directed at a particular user. Often a brief burst of conversation will erupt, with two or more users tossing an idea back and forth directly to one another while posting other thoughts, links, etc. in between that may or may not be related. Conversation thus gets tangled up with all sorts of other information and can be difficult to extract into a more intuitive form using existing tools. I thought this might be an interesting problem to tackle using data visualization software.

I imposed a couple of constraints on myself. One, if I could at all help it, I wanted to build something that would not require a person to actually log in to Twitter. If Twitter is a public forum of communication, I’d like to see what I can do with just the data and access that is publicly available. Two, I wanted the user to only have to enter the Twitter usernames of the people they wished to compare and the program would do the rest.

Reference Points and Inspiration:

Social Collider

This is a web app that tries to visualize a person’s Twitter footprint and their interactions with other users as particle trails, like those seen in particle accelerator collisions. Does not require Twitter login.


Tweetie for Mac

This is the first Twitter client I have seen with built-in support for conversation tracking. Given a particular @reply tweet, it shows you all the previous tweets that were part of that conversation. It’s almost exactly what I set out to do in my initial proposal. Tweetie for Mac (as opposed to the iPhone version) launched almost immediately after I proposed my project, which made me think about how I might do something different. Requires Twitter login.


This is an example Quartz Composition that comes with the standard QC download files. I used it to learn how to get XML data into a Quartz Composition. Requires Twitter login.

What I Did:

Quartz Composer has a number of components that can be joined together via a visual “node and cable” interface. Aside from a brief “Hello World”, I had never used QC before this, though I had been eager for a project that would let me learn it. After proposing the project, I did a number of scratch pieces to familiarize myself with how the various available components function. First, I learned how to get XML and RSS data into QC using the included downloader patches. From there I learned how QC handles data structures and iterators, and created an initial proof-of-concept piece that could take a Twitter user’s RSS feed and visualize it, offsetting tweets based on an arbitrary condition in the content of each tweet (Fig. 1).

Fig. 1
Fig. 1 Tweets offset based on whether they contain an odd or even number of words. If a tweet contains an @username, the username appears in green to its right.
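The proof of concept in Fig. 1 comes down to two steps: pull each item out of a user’s RSS feed, then offset it sideways according to a rule applied to its text. In QC this is done with downloader and structure patches; purely as a text analogue, here is a minimal Python sketch that parses an RSS 2.0 string with the standard library and applies the same odd/even word-count rule. The sample feed, its usernames, and the `offset` parameter are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed standing in for a user's Twitter feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>alice: reading about dark matter today</title></item>
<item><title>alice: @bob have you seen this paper</title></item>
</channel></rss>"""

def tweet_texts(feed_xml):
    """Return the <title> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]

def parity_offset(text, offset=1.0):
    """Shift tweets with an odd word count right, even ones left."""
    return offset if len(text.split()) % 2 else -offset

for text in tweet_texts(SAMPLE_FEED):
    print(f"{parity_offset(text):+.1f}  {text}")
```

Any other test on the text could replace the parity rule; that rule simply mirrors the arbitrary condition used in Fig. 1.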

I then parsed each tweet, looking for the presence of “@”. If @ was present, the program would extract the username and render it in a separate sprite.
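The @ check can be sketched as a regular expression. This assumes usernames are runs of letters, digits, and underscores, and it ignores an “@” glued to a preceding word so that email addresses are not mistaken for replies; both rules are my assumptions, not details taken from the composition.

```python
import re

# Assumption: a username is a run of letters, digits, or underscores after "@",
# and the "@" must not be glued to a preceding word (as in an email address).
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]+)")

def mentions(text):
    """Return the usernames @-mentioned in a tweet, in order of appearance."""
    return MENTION.findall(text)

print(mentions("@bob agreed, and cc @carol_w"))     # ['bob', 'carol_w']
print(mentions("mail me at someone@example.com"))   # []
```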

Convinced that I could both import and satisfactorily play with RSS data in QC, I set out to design an appealing 3D structure. The included 3D navigation and rendering patches were insufficient, so I built my own. The first step was to make stacks of cubes, then position them at arbitrary points around a circle. (Fig. 2)

Fig. 2
Fig. 2 An arbitrary number of cube stacks of arbitrary height, each with a different color.
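Spacing the stacks around a circle in the XZ plane is a little trigonometry. The sketch below assumes even spacing plus a single `rotation` angle for the whole ring, which an auto-rotate control could advance each frame; Fig. 2 used arbitrary positions, so treat the even spacing and the parameter names as my simplification.

```python
import math

def stack_positions(n_stacks, radius, rotation=0.0):
    """Return (x, z) centres for n_stacks spaced evenly around a circle.

    rotation (in radians) spins the whole ring, e.g. for auto-rotate.
    """
    positions = []
    for i in range(n_stacks):
        angle = rotation + 2 * math.pi * i / n_stacks
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions

for x, z in stack_positions(4, radius=2.0):
    print(f"x={x:.2f}  z={z:.2f}")
```

Changing `radius` corresponds to widening or narrowing the ring, and incrementing `rotation` spins it.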

With both the tweet-processing and 3D structure working, I set out to integrate them. (Fig. 3)

Fig. 3
Fig. 3 Colored cubes with Twitter data on them.

In order to extract the kind of conversation information I wanted, I had to find a way to plot all the tweets of all the users on the same time grid. To do this, I took the time of the most recent tweet across all users and subtracted the time of the oldest tweet across all users, which gave the overall time span. Subtracting that oldest time from each tweet’s time, and scaling the result by the span, gives each tweet a normalized time relative to the overall time frame.
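Written out, with t_min and t_max the oldest and newest timestamps across all users, each tweet’s position on the shared timeline is u = (t - t_min) / (t_max - t_min), a number between 0 and 1. A minimal Python version follows, assuming RSS-style RFC 822 date strings (the standard library’s `email.utils.parsedate_to_datetime` parses those); the `feeds` dictionary is invented sample data.

```python
from email.utils import parsedate_to_datetime

def normalized_times(feeds):
    """Map every tweet timestamp onto one shared 0..1 timeline.

    feeds: {username: [RFC 822 date strings]}; returns the same shape with floats.
    """
    stamps = {user: [parsedate_to_datetime(d).timestamp() for d in dates]
              for user, dates in feeds.items()}
    all_times = [t for times in stamps.values() for t in times]
    t_min, t_max = min(all_times), max(all_times)
    span = (t_max - t_min) or 1.0  # avoid dividing by zero if all times match
    return {user: [(t - t_min) / span for t in times]
            for user, times in stamps.items()}

feeds = {
    "alice": ["Sun, 17 May 2009 10:00:00 +0000", "Sun, 17 May 2009 12:00:00 +0000"],
    "bob":   ["Sun, 17 May 2009 11:00:00 +0000"],
}
print(normalized_times(feeds))  # {'alice': [0.0, 1.0], 'bob': [0.5]}
```

Because every user is measured against the same t_min and t_max, a reply always lands above the tweet it answers, whichever stack each one is in.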

I found the predictable, yet still interesting, result that the frequency with which people tweet varies widely, and that comparing a rapid twitterer with a less active one gives timelines that are difficult to compare. I tried to solve this problem with a variable “stretch” control that would smear out the timeline where it was too clumpy, that is, where too many tweets overlapped to see what was going on. This was only partially effective, because many people seem to tweet in rapid bursts of two or three followed by a longer interval. (Fig. 4)

Fig. 4
Fig. 4 The user in red is a much more frequent twitterer than the user in blue, making comparison of individual tweets on the same time grid difficult.
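The post does not spell out how the stretch control works, so here is one hedged possibility rather than a reconstruction: blend each tweet’s true relative time with its rank among all tweets. At `stretch = 0` the timeline is untouched; at `stretch = 1` every tweet gets an evenly spaced slot, which breaks up bursts. The function name and parameter are hypothetical.

```python
def stretched_positions(times, stretch):
    """Blend time-relative and evenly spaced positions on a shared timeline.

    times: normalized times in 0..1 for all tweets from all users.
    stretch: 0 keeps true relative times; 1 gives each tweet an equal slot.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    rank_pos = [0.0] * n
    for rank, i in enumerate(order):
        rank_pos[i] = rank / (n - 1) if n > 1 else 0.0
    # Both terms increase in the same order, so tweets never swap places.
    return [(1 - stretch) * t + stretch * r for t, r in zip(times, rank_pos)]

print(stretched_positions([0.0, 0.02, 0.04, 1.0], stretch=0.5))
```

A blend like this cannot fully separate bursts unless `stretch` is near 1, at which point the real time gaps are lost, which echoes the trade-off described above.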

Despite the difficulties with relative time frames, I was able to use this tool to find an interesting interchange across five tweets, along with non-reply tweets that contributed to my understanding of what was going on. (Fig. 5)

Fig. 5
Fig. 5 An interchange between two science writers, read from bottom to top.

I was thus able to solve the problem I set out to tackle, at least in some cases. With further refinement of the navigation controls and a better way to scale the time-relative vertical positions of tweets (perhaps using some collision avoidance between the sprites), this might become a more generally useful tool.
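One hypothetical form the collision avoidance could take is a greedy minimum-gap pass: sort the sprites by height, then push each one up until it sits at least `min_gap` above the one below it. This is a sketch of the idea, not something the composition does; the function name and `min_gap` parameter are my own.

```python
def enforce_min_gap(heights, min_gap):
    """Nudge sprite heights upward so no two are closer than min_gap.

    Keeps the original vertical order; returns new heights in input order.
    """
    order = sorted(range(len(heights)), key=lambda i: heights[i])
    result = list(heights)
    previous = None
    for i in order:
        if previous is not None and result[i] < previous + min_gap:
            result[i] = previous + min_gap
        previous = result[i]
    return result

print(enforce_min_gap([0.0, 0.1, 0.15, 2.0], min_gap=0.5))  # [0.0, 0.5, 1.0, 2.0]
```

Pushing only upward keeps the oldest tweet anchored and preserves order, at the cost of making a busy stack taller than its time span alone would suggest.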


Here are screen captures, with brief descriptions, of the three main patches in my Quartz Composition.

Fig. 6
Fig. 6 The top-level patch takes input from the mouse and keyboard and passes it to an iterator (QC’s version of a “for” loop) called “Stacks”, the pink patch surrounded by blue.

Fig. 7
Fig. 7 The “Stacks” iterator takes a Twitter username, pings twitter.com/username, pulls the RSS feed address from that page, and passes that to an RSS downloader patch. It passes the feed to another iterator called “cube stacks”, the pink patch on the far left. This level also has a macro that computes the position of each cube stack in the XZ plane, given mouse and keyboard data from the previous layer. The global timeline is also computed at this level.
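Pulling the feed address out of a profile page resembles standard RSS autodiscovery, where a page advertises its feed with a `<link rel="alternate" type="application/rss+xml">` tag. Assuming the profile page did that, a Python sketch using the standard library’s `HTMLParser` might look like this; the class name, page snippet, and feed URL are invented.

```python
from html.parser import HTMLParser

class FeedLinkFinder(HTMLParser):
    """Collect hrefs of <link rel="alternate" type="application/rss+xml"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attribute values may be None
        if (tag == "link"
                and (attributes.get("rel") or "").lower() == "alternate"
                and attributes.get("type") == "application/rss+xml"
                and attributes.get("href")):
            self.feeds.append(attributes["href"])

# Hypothetical profile-page snippet; a real page's markup may differ.
page = ('<html><head><link rel="alternate" type="application/rss+xml" '
        'href="http://example.com/feeds/alice.rss"></head></html>')
finder = FeedLinkFinder()
finder.feed(page)
print(finder.feeds)
```

The first URL found would then be handed to the RSS download step.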

Fig. 8
Fig. 8 The “cube stacks” iterator takes an individual RSS feed and creates up to twenty 3D cubes, each with a different tweet on its front face. This is where the vertical positions are computed.
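A rough sketch of the per-stack work in the “cube stacks” iterator: keep at most twenty of the most recent tweets and turn their shared 0-to-1 times into heights. The `stack_height` and `spread` parameters are my assumptions; `spread` stands in for the up-arrow control, modeled here as a uniform vertical scale so that relative spacing is preserved.

```python
MAX_CUBES = 20  # the "cube stacks" iterator shows at most twenty tweets

def cube_heights(normalized_times, stack_height=10.0, spread=1.0):
    """Turn one user's 0..1 tweet times into cube y-positions, newest first.

    Only the MAX_CUBES most recent tweets are kept; spread scales the whole
    stack vertically, so the ratios between gaps stay the same.
    """
    recent = sorted(normalized_times, reverse=True)[:MAX_CUBES]
    return [t * stack_height * spread for t in recent]

print(cube_heights([0.0, 0.5, 1.0], stack_height=10.0, spread=2.0))  # [20.0, 10.0, 0.0]
```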


From the top-level patch, input the number of users you wish to compare.

In the “Stacks” iterator, input a group of up to nine usernames in the blue boxes.
From the viewer:

Right-click to turn auto-rotate on or off. When auto-rotate is off, shift-scroll rotates.

Regular scroll zooms in or out.

Click and drag to move vertically; swipe to move faster or slower (careful, this is buggy).

Right arrow increases the radius; hold the left arrow and press the right arrow to decrease it.

Up arrow spreads the tweets in the vertical direction. Use this to see the relative spacing between tweets. Hold the down arrow and press the up arrow to shrink it back down.

“t” toggles between time-relative and constant-interval vertical spacing.

“n” toggles non-replies on or off.
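The “n” and “t” toggles amount to a filter and a choice of spacing. In this hypothetical sketch a tweet counts as a reply if it contains “@” (the same presence test described earlier), and constant-interval mode stacks the surviving tweets `gap` units apart, newest on top; the function and parameter names are mine.

```python
def visible_tweets(tweets, show_nonreplies=True, time_relative=True, gap=1.0):
    """Apply the viewer's "n" and "t" toggles to one user's tweets.

    tweets: list of (normalized_time, text) pairs, newest first.
    show_nonreplies: when False, keep only tweets containing "@" ("n").
    time_relative: True places tweets by time; False uses a constant gap ("t").
    Returns (y, text) pairs.
    """
    kept = [(t, text) for t, text in tweets if show_nonreplies or "@" in text]
    if time_relative:
        return kept
    n = len(kept)
    return [((n - 1 - i) * gap, text) for i, (_, text) in enumerate(kept)]

tweets = [(1.0, "@bob yes"), (0.6, "lunch"), (0.2, "@bob hi")]
print(visible_tweets(tweets, show_nonreplies=False, time_relative=False))
```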

Future Exploration:

I think that with some refinement of the user interface and navigation controls, this tool could be an interesting way to aggregate and sort Twitter data, providing one way to make a little more narrative sense of what is going on. One immediate idea is to use it for fiction: an author might write a story using multiple Twitter accounts, paying attention to the time of each tweet (which appears as vertical space) as a way to add dramatic timing.

With more work on tweet-processing, this could be used to make a large web of interacting Twitterers, each new stack spawning the stacks with which it has recently been in conversation. This would get messy quickly, so some new navigation idea might be necessary.
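Spawning stacks from conversations is essentially a breadth-first walk over who has @-replied to whom. In this sketch `mentions_of` is a hypothetical function returning the usernames a user has recently mentioned (for example, found by scanning their feed for @usernames), and `max_users` is an assumed cap to keep the web from getting messy.

```python
from collections import deque

def conversation_web(start, mentions_of, max_users=10):
    """Breadth-first expansion from one user to the users they talk to.

    mentions_of: function username -> iterable of usernames recently mentioned.
    Returns users in the order their stacks would be spawned.
    """
    seen = [start]
    queue = deque([start])
    while queue and len(seen) < max_users:
        user = queue.popleft()
        for other in mentions_of(user):
            if other not in seen and len(seen) < max_users:
                seen.append(other)
                queue.append(other)
    return seen

graph = {"alice": ["bob", "carol"], "bob": ["alice", "dave"], "carol": [], "dave": ["erin"]}
print(conversation_web("alice", lambda u: graph.get(u, [])))
# ['alice', 'bob', 'carol', 'dave', 'erin']
```

Breadth-first order spawns a user’s direct conversation partners before their partners’ partners, so the web grows outward in rings, which could map naturally onto concentric circles of stacks.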

Twitter Search Extension

Out of curiosity, I ran the RSS feeds of different search terms from Twitter Search through the program. This provides a quick visual way to compare the relative frequencies with which people mention the various terms. One interpretation might be that the terms tweeted more frequently are more at the forefront of the Twitter collective consciousness. (Fig. 9)

Blue: British History
Green: Particle Physics
Red: Britney Spears
Yellow: Dinosaurs
Orange: Economy
Fig. 9
Fig. 9 The more frequently a term appears in the Twitterverse, the more tightly clustered the “tweetboxes” become.
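Tight clustering in Fig. 9 is a visual stand-in for tweet rate. To put a number on it, one could compute tweets per hour from each search feed’s timestamps, as in this sketch; it assumes RFC 822 date strings like those in RSS feeds and measures the average rate between the first and last tweet in the feed. The function name is hypothetical.

```python
from email.utils import parsedate_to_datetime

def tweets_per_hour(dates):
    """Estimate how often a search term is tweeted from its feed's timestamps."""
    stamps = sorted(parsedate_to_datetime(d).timestamp() for d in dates)
    if len(stamps) < 2:
        return 0.0
    hours = (stamps[-1] - stamps[0]) / 3600
    # n tweets span n - 1 intervals; identical timestamps mean an unbounded rate.
    return (len(stamps) - 1) / hours if hours else float("inf")

print(tweets_per_hour(["Sun, 17 May 2009 10:00:00 +0000",
                       "Sun, 17 May 2009 10:30:00 +0000",
                       "Sun, 17 May 2009 11:00:00 +0000"]))  # 2.0
```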

If you would like a copy of the .qtz file to play around with, please email me at sequence AT mit DOT edu and I would be happy to share it.
