Tuesday, March 27, 2012

Geographic Visualization of my Twitter Graph

(Source at https://github.com/alpengeist/Twitter-Crawler)

I am experimenting with the Neo4j graph database, and as an exercise I loaded the friend graph from Twitter into the DB, starting with me (@alpengeist_de) and going to a depth of 4 and a width of 40. I then enriched the data with geolocation information from Yahoo Places. This process takes a while, because the Twitter REST API allows only 150 requests per hour for non-registered applications. Behind a corporate proxy it is even worse, because the quota is tracked per IP address and my colleagues consume it as well.
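To make "depth 4 and width 40" concrete, here is a minimal sketch of a breadth-first walk that follows at most `width` friends per user and stops after `depth` levels. It is not the code from the repository: the class name `FriendCrawlSketch` and the `friendsOf` lookup function are assumptions made for illustration, and the real Twitter call (with its rate limit) would sit behind that function.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class FriendCrawlSketch {
    /**
     * Breadth-first walk over a friend graph.
     * friendsOf is a hypothetical lookup (user -> list of friends);
     * in a real crawler it would wrap the rate-limited Twitter request.
     */
    static Set<String> crawl(String start, int depth, int width,
                             Function<String, List<String>> friendsOf) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(start);
        List<String> frontier = Collections.singletonList(start);
        for (int level = 0; level < depth && !frontier.isEmpty(); level++) {
            List<String> next = new ArrayList<>();
            for (String user : frontier) {
                List<String> friends = friendsOf.apply(user);
                // Width limit: only the first `width` friends are followed.
                int limit = Math.min(width, friends.size());
                for (String friend : friends.subList(0, limit)) {
                    if (seen.add(friend)) {
                        next.add(friend); // not seen before, expand on the next level
                    }
                }
            }
            frontier = next;
        }
        return seen;
    }
}
```

With depth 4 and width 40 the upper bound is 40 + 40^2 + 40^3 + 40^4 users, so at 150 requests per hour a full crawl is a long-running affair.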

My Java program caches the friend data in a simple CSV file, so I can start it anytime and fetch another batch of users from Twitter. The geodata is cached in a properties file, the simplest K/V store there is :-)
From these files on disk I can quickly regenerate the Neo4j graph database.
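The properties-file cache can be sketched roughly like this. It is an illustration only, not the code from the repository: the class name `GeoCacheSketch` and the "lat,lon" value format are my assumptions here. It only uses `java.util.Properties`, which takes care of escaping spaces and special characters in the keys.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class GeoCacheSketch {
    private final Path file;
    private final Properties cache = new Properties();

    /** Loads an existing cache file, or starts empty if there is none yet. */
    GeoCacheSketch(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                cache.load(in);
            }
        }
    }

    /** Returns the cached "lat,lon" string, or null if the location is unknown. */
    String lookup(String location) {
        return cache.getProperty(location);
    }

    /** Stores a location and rewrites the whole file (fine for a small cache). */
    void store(String location, double lat, double lon) throws IOException {
        cache.setProperty(location, lat + "," + lon);
        try (OutputStream out = Files.newOutputStream(file)) {
            cache.store(out, "geo cache (assumed format: location=lat,lon)");
        }
    }
}
```

A crawler would call `lookup` first and only ask the geocoding service when it returns null, so a restart does not repeat lookups that were already answered.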

I have collected about 30000 users so far. About 18000 of them have the full data set. For visualization I started with the open source tool Gephi, whose Geo Layouter I have used to produce the images.

I did a node ranking by follower count to fatten the dots a little, and a color partitioning by country. Yahoo Places returns a quality measure, so I filtered on that as well. Et voilà, the continents emerge.

Not everyone on Twitter has entered a real location. Many say they are in "the internet", which is located in Brazil according to Yahoo. It seems like the Internet is enjoying itself in a pleasant climate and dancing samba! Another good one is @artcika, who claims to be "in the middle of the map". Yahoo locates that in Papua New Guinea.
Those errors are not easy to filter out.


Twitter graph laid out with the Gephi Geo Layouter (Mercator projection), partitioned by country
The little spots outside the continents are mostly due to nonsense location data from Twitter for which Yahoo Places was still confident it had delivered something useful. However, there are actually users in Alaska, Hawaii, and Iceland :-)
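For reference, this is how a Mercator projection turns latitude and longitude into flat x/y coordinates. It is the textbook formula on a unit sphere, not Gephi's actual implementation; the class name `MercatorSketch` is made up for this sketch.

```java
class MercatorSketch {
    /** x is simply the longitude in radians. */
    static double x(double lonDeg) {
        return Math.toRadians(lonDeg);
    }

    /** y = ln(tan(pi/4 + phi/2)), with phi the latitude in radians. */
    static double y(double latDeg) {
        double phi = Math.toRadians(latDeg);
        return Math.log(Math.tan(Math.PI / 4 + phi / 2));
    }
}
```

The y value grows faster and faster towards the poles, which is why places like Alaska and Iceland end up stretched and far up on the map.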

I have plans to get familiar with the D3 JavaScript library for visualization, but that will take some effort.

Finally, here is a version with the connections switched on. There is not much information one can draw from this. I just enjoy the emerging color patterns.



