Category Archives: Challenge

Ghost Counties visualization wins DataViz challenge

Just a short note: I am honored and thrilled to announce that my Ghost Counties data visualization has won the Visualizing.org ‘Visualizing the 2010 US Census data visualization challenge’. A big thank you to the jury from Visualizing.org and the Eyeo Festival, and thanks for all the kind messages I received!

Eyeo Data Visualization Challenge: Ghost Counties

About a day before the deadline I submitted my entry for the Eyeo Data Visualization Challenge by Visualizing.org, where the grand prize is a ticket to the brilliant Eyeo Festival.

You can see the final result here: http://www.janwillemtulp.com/eyeo.

The visualization depicts the number of homes and vacant homes for every county in each state. The size of the outer bubble represents the total number of homes, and the size of the inner bubble represents the number of vacant homes. The y-axis shows the population size (on a logarithmic scale) and the x-position of each bubble shows the number of vacant homes relative to the population. Each bubble is also connected with a line to another axis: the population / home ratio. On the top right you can see some exact numbers for this data.
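To make the encoding described above a bit more concrete, here is a minimal Python sketch (not the original Processing code). The function names, the pixel height, the population range and the choice to scale bubble area rather than radius are all assumptions made for illustration.

```python
import math

def bubble_radius(count, scale=0.05):
    # Assumption: bubble *area* is proportional to the count,
    # so the radius grows with the square root of the count.
    return scale * math.sqrt(count)

def encode_county(population, homes, vacant_homes,
                  pop_min=100, pop_max=10_000_000, height=600):
    # Logarithmic y-axis: map log10(population) onto [0, 1], then onto pixels.
    # Pixel 0 is the top of the chart, so larger populations sit higher.
    t = (math.log10(population) - math.log10(pop_min)) / (
        math.log10(pop_max) - math.log10(pop_min))
    return {
        "y": height * (1 - t),
        "vacant_per_person": vacant_homes / population,  # x-position value
        "outer_r": bubble_radius(homes),                 # all homes
        "inner_r": bubble_radius(vacant_homes),          # vacant homes only
        "people_per_home": population / homes,           # value on the second axis
    }

# Hypothetical county, not taken from the Census data.
county = encode_county(population=10_000, homes=4_000, vacant_homes=400)
```

With square-root scaling, a county where 10% of the homes are vacant gets an inner bubble whose area is 10% of the outer bubble.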

This time I built the visualization in Processing, mainly because I expected to work with large datasets from the US Census Bureau and thought I might have to use OpenGL for better performance. In the end I didn’t use OpenGL. Building the visualization in Processing was lots of fun. To get a sense of the data I tried as many as 5 completely different approaches. Here are some of the sketches that eventually led to this visualization (view this selection on my Flickr stream).

The data itself was not very complex, but it was rather big. The biggest challenge was finding a creative way to visualize it without using a map (which would be the obvious choice, since the data is about locations).

Visualizing the World Economic Forum Global Agenda Interlinkage

The World Economic Forum (WEF) and Visualizing.org recently launched a Data Visualization contest in which interactive designers were asked to develop cutting-edge visualizations that will help elucidate the interconnectedness among issues, highlight emerging clusters and catalyze dialogue at the Summit between Councils. The data for this contest was derived from a survey of experts of the 72 Global Agenda Councils of the WEF, who were asked the following three questions:

  • “Please select a maximum of 5 Global Agenda Councils that your Council would benefit from interacting with by order of priority”
  • “Please select a maximum of 3 Industry / Regional Agenda Councils that your Council would benefit from interacting with by order of priority”
  • “Please describe how it interlinks with your Council”

The data

The data was an Excel workbook with 3 sheets (or 3 CSV files) that contained the survey data:

  • A matrix with pre-calculated weighted links between Councils
  • A flat list of all the survey data
  • All the survey data, but in a different structure (this time by respondent Council)

The World Economic Forum Councils

The WEF consists of 3 Agendas:

  • Global Agenda (divided into 3 subgroups: Drivers and Trends, Risks and Opportunities, Policy and Institutional Responses)
  • Industry Agenda
  • Regional Agenda

The Global Agenda has 72 Councils, the Regional Agenda 10 Councils, and the Industry Agenda 14 Councils. All of the Councils are concerned with a specific issue (e.g. human rights or ocean governance). Each Council has 1 or more organizations of various types (government, NGO, business, etc.) and each organization may be located in a different country.

Visualizing the data

Since the purpose of the visualization was to find clusters and show the interconnectedness, the most obvious visualization to start with was a network or graph visualization. I used the Force-Directed layout of Protovis to create a network of all the links of all the Councils. I also used a K-Means clustering algorithm and a Community Detection Algorithm to find clusters, but the graph was too dense to find any sensible clusters. It appeared that almost every Council links to every other Council. So even though the visualization looked impressively complex, you could not get any valuable information from it, and I stopped pursuing this direction.
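One quick way to check how close a network is to "everyone links to everyone" is to compute its density. The sketch below is an illustration, not part of the original analysis. It assumes the links are available as (respondent, linked) pairs, and the Council names are placeholders.

```python
def directed_density(nodes, links):
    """Share of all possible directed links (ignoring self-links) that are present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    present = {(a, b) for a, b in links if a != b}  # de-duplicate repeated pairs
    return len(present) / (n * (n - 1))

# Placeholder Councils and links, not taken from the survey data.
councils = ["A", "B", "C"]
links = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A"), ("B", "C")]
density = directed_density(councils, links)  # 5 of the 6 possible links exist
```

A density close to 1 means there is very little structure left for a clustering algorithm to pick up.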

My next approach was to see whether a radial layout would work. I was inspired by many of the visualizations on www.visualcomplexity.com and by Circos. I started out with just a radial layout of all the Councils, and then played with the visual encoding of some of the dimensions, like line thickness and color. For the image below, I filtered the data down to rank 1 and the Global Agenda only. This left less data, which is easier to work with when prototyping.
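The core of a radial layout is simply spacing the items evenly along a circle. Here is a minimal Python sketch of that idea (the original was built in Protovis). The function name, the parameters and the placeholder Council names are assumptions.

```python
import math

def radial_positions(names, radius=200.0, start=0.0, span=2 * math.pi):
    """Place each name at an evenly spaced angle along an arc around the origin."""
    if not names:
        return {}
    step = span / len(names)  # angular distance between neighbouring items
    return {
        name: (radius * math.cos(start + i * step),
               radius * math.sin(start + i * step))
        for i, name in enumerate(names)
    }

# Placeholder names; four items end up a quarter turn apart.
positions = radial_positions(["A", "B", "C", "D"], radius=100.0)
```

Passing `span=math.pi` instead of the default confines the items to half a circle.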

This was a good start, and it showed enough potential for me to continue working on it. One of the biggest flaws of the image above is that you don’t see who interlinks with whom (which is the respondent Council and which is the linked Council). So next I decided to create two half circles instead of one: one for respondent Councils and one for linked Councils. This turned out to be a good choice. I also worked on a better color palette and on encoding more of the data (for instance, the width of a bar shows the number of links). This is what I ended up with:

Then I added more refinements, like giving each bar a height to indicate stronger links, adding a filter option for the combination of rank and Agenda, and adding the ability to view Council links in isolation. I changed the colors from green and orange to blue and orange to accommodate colorblind viewers. I also kept Edward Tufte’s ‘data-ink ratio’ in mind: remove as much (visual) clutter as possible. Martin Wattenberg once said: “if you start playing with your visualization, you know you’re on the right direction”. And that’s exactly what happened when I added the ability to view Council links in isolation. The final result looks like this:

Challenges

One of my biggest challenges was that I didn’t understand the matrix in the data set; I couldn’t follow the logic behind it. Just when I thought I had figured out that it showed only the strongest links, with apparently uninteresting data omitted, I realized that the values in the matrix were not normalized. So a link of 0.5 between Council A and B in the matrix was not the same as a link of 0.5 between Council C and D. Because I didn’t understand the logic behind the values, I decided not to use the matrix at all and to do my own calculations on link strength.
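The post does not spell out how link strength was calculated, so the following Python sketch is only one possible approach and not the actual method. It assumes each survey response is a (respondent, linked, rank) triple, that a higher priority (lower rank number) should count for more, and that strengths are normalized per respondent Council so that they are comparable across Councils.

```python
from collections import defaultdict

def link_strengths(responses, max_rank=5):
    """Rank-weighted link strength, normalized so each respondent's links sum to 1."""
    raw = defaultdict(float)
    for respondent, linked, rank in responses:
        # Assumed weighting: rank 1 counts max_rank points, rank max_rank counts 1.
        raw[(respondent, linked)] += max_rank + 1 - rank
    totals = defaultdict(float)
    for (respondent, _linked), weight in raw.items():
        totals[respondent] += weight
    return {pair: weight / totals[pair[0]] for pair, weight in raw.items()}

# Placeholder responses, not taken from the survey data.
responses = [("A", "B", 1), ("A", "B", 2), ("A", "C", 5)]
strengths = link_strengths(responses)
```

After normalization, a strength of 0.5 means the same thing for every respondent Council: half of that Council's total weighted votes went to that link.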

The result

I’m very satisfied with the result, and at the same time I see room for improvement. I like the fact that the visualization communicates mainly visually: line thickness, line color, bar width and bar height are the main visual elements that make it easy to spot interesting links or Councils. Also, I haven’t seen circular layouts like this in Protovis yet, so it was fun to try something new.

A suggestion I received from Mike Bostock was to make the selection of the Councils fuzzier. Right now the bars of the Councils can become very thin or small, and selecting them may be somewhat difficult. By using a more fuzzy selection the user experience may improve.

I have considered adding a filter for link strength as a way to reduce the number of links shown. But so far I’m not convinced that this will reveal more clusters.

The visualization omits some data that may be of interest: for instance, organization type is currently ignored, as is country. It may be interesting to see whether finding clusters would be easier if organization type or country (or a combination) were used instead of the respondent Council. Also, there is currently no back-link from the linked Council to the respondent Councils, so you cannot see whether the linked Council wants to interact with the Councils that are linking to it.
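The missing back-link information could be derived from the same link data. As a rough sketch, assuming the links are available as directed (respondent, linked) pairs with placeholder names, mutual and one-way links can be separated like this:

```python
def split_by_reciprocity(links):
    """Separate directed links into mutual pairs and one-way links."""
    link_set = set(links)
    # Report each mutual pair once, ordered alphabetically.
    mutual = {(a, b) for a, b in link_set if (b, a) in link_set and a < b}
    one_way = {(a, b) for a, b in link_set if (b, a) not in link_set}
    return mutual, one_way

# Placeholder links: A and B want to interact with each other, C does not link back to A.
mutual, one_way = split_by_reciprocity([("A", "B"), ("B", "A"), ("A", "C")])
```

Mutual links could then be drawn differently from one-way links, for example with a different line style.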

Finally, I think that in order to find clusters the survey should not give its respondents this many options. I would suggest just 2 ranks for the Global Agenda, and 1 rank for the Industry / Regional Agenda. It appears that giving Councils (or rather, the organizations of each Council) this many options to link to other Councils results in a situation where, at some level, almost every Council links to every other Council. Having fewer ranks to choose from will probably lead to more polarized choices and make it easier to find clusters.

Technology

I used custom Scala code for pre-processing the data, and Protovis for visualizing it. Both technologies are very interesting, and highly recommended!