Tag Archives: Visualization

Space Debris visualization in Scientific American

Some time ago Scientific American commissioned me to create a visualization of space debris. And there is a lot of space debris floating around the earth: think of abandoned rockets or broken satellites. The purpose of the visualization was to illustrate that there is a huge amount of space debris, not necessarily to provide an exact representation of it. In fact, not all the data required to determine the exact position of debris and satellites was available in the dataset I received: right ascension and argument of perigee were missing, so I used random numbers for those. All the other data needed to calculate the orbits was there. To calculate the orbits of satellites and debris I applied Kepler’s Laws of Planetary Motion.
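For readers who want to experiment with the orbit calculation, here is a minimal Python sketch (not the original Processing code). It assumes each object is described by the classical orbital elements, solves Kepler's equation with Newton's method, and fills the two missing angles with random values as described above. The function names and the sample values are hypothetical.

```python
import math
import random


def solve_kepler(mean_anomaly, eccentricity, tolerance=1e-10, max_iter=50):
    """Solve Kepler's equation M = E - e*sin(E) for the eccentric anomaly E."""
    E = mean_anomaly if eccentricity < 0.8 else math.pi
    for _ in range(max_iter):
        f = E - eccentricity * math.sin(E) - mean_anomaly
        f_prime = 1.0 - eccentricity * math.cos(E)
        delta = f / f_prime
        E -= delta
        if abs(delta) < tolerance:
            break
    return E


def orbit_position(a, e, inclination, raan, arg_perigee, mean_anomaly):
    """Return (x, y, z) for an elliptical orbit; all angles are in radians."""
    E = solve_kepler(mean_anomaly, e)
    # Position in the orbital plane, with perigee on the +x axis.
    xp = a * (math.cos(E) - e)
    yp = a * math.sqrt(1.0 - e * e) * math.sin(E)
    # Rotate by the argument of perigee within the orbital plane.
    x1 = math.cos(arg_perigee) * xp - math.sin(arg_perigee) * yp
    y1 = math.sin(arg_perigee) * xp + math.cos(arg_perigee) * yp
    # Tilt the plane by the inclination (rotation about the x axis).
    y2 = math.cos(inclination) * y1
    z2 = math.sin(inclination) * y1
    # Rotate by the right ascension of the ascending node (about the z axis).
    x = math.cos(raan) * x1 - math.sin(raan) * y2
    y = math.sin(raan) * x1 + math.cos(raan) * y2
    return x, y, z2


def random_angle(rng):
    """Stand-in for an angle missing from the dataset (an assumption)."""
    return rng.uniform(0.0, 2.0 * math.pi)


rng = random.Random(42)
# Hypothetical object: a = 7000 km, e = 0.01, inclination = 51.6 degrees.
position = orbit_position(7000.0, 0.01, math.radians(51.6),
                          random_angle(rng), random_angle(rng), random_angle(rng))
```

Because the two missing angles only change the orientation of each orbit and not its size or shape, randomizing them still gives a fair impression of how crowded each altitude is.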

The graphic was created in Processing. With all the orbits calculated and the satellites and debris positioned randomly on those orbits, the next thing was to get the color and positioning right. Positioning was rather easy, since it just meant applying some transformations to the image, which resulted in a nice perspective (circles nearby are larger than the ones farther away). Coloring was the final step. Initially the idea was to color by country (US, USSR, China and Others), but this resulted in an image with colored dots all over. To communicate a more focused message, we decided to show the difference between active satellites (magenta) and space debris (black). As a nice extra, the International Space Station (ISS) is also marked to give an even better sense of the amount of debris.
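The perspective effect mentioned above comes down to dividing by depth. The following Python sketch illustrates that idea only; it is not the Processing API and not the original code, and the camera placement and parameter names are assumptions.

```python
def project(point, camera_distance, focal_length):
    """Project a 3D point onto the screen plane with a simple perspective divide.

    Assumes a camera on the +z axis at camera_distance from the origin, looking
    at the origin. Returns (screen_x, screen_y, scale); scale can be multiplied
    with a base circle radius so that nearby circles are drawn larger.
    """
    x, y, z = point
    depth = camera_distance - z
    if depth <= 0:
        return None  # the point is behind the camera
    scale = focal_length / depth
    return x * scale, y * scale, scale


near = project((1.0, 2.0, 5.0), camera_distance=10.0, focal_length=10.0)
far = project((1.0, 2.0, -5.0), camera_distance=10.0, focal_length=10.0)
```

Here `near` ends up with a larger scale than `far`, which is what makes the closer dots look bigger.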

After completing the project, I played around with the data a little more, just to see if an animated version would have an even greater impact communicating the message. Well, judge for yourself…

Ghost Counties part of exhibition in Foosaner Art Museum, Melbourne FL

I am honored that my Ghost Counties visualization will be part of an exhibition entitled The Art of Networks, which will be on view from March 8 to April 8, 2012 at the Foosaner Art Museum in Melbourne, Florida. The exhibit will open on March 8th as a parallel event to the 3rd Workshop on Complex Networks, CompleNet 2012, hosted by the Department of Computer Sciences at Florida Institute of Technology.

In total there will be 14 recent visualizations of networks representing data from different fields, from social networks and migrations to speech cognition and housing issues in the US. The visualizations will be presented in both static and dynamic media: large print formats on the walls and short movies on two computers inside the gallery space.

If you’re in the neighborhood, be sure to drop by!

Nerds Unite recap

On Wednesday I had the honor of giving a small presentation about data visualization at Nerds Unite in Utrecht (NL). It was a small group of people who are interested in data, open data, government and of course visualization. After a short introduction I showed some of my work, which was well received. After that Eugene Tjoa gave a talk about creating visualizations for the Central Bureau of Statistics (CBS) in the Netherlands.

It was a great evening with many interesting people. Thanks!

Photo by Sebastiaan Terburg

Visualizing Europe

Today was one of those days I won’t forget very soon: today was Visualizing Europe day, and many enthusiasts, practitioners, researchers and users of data visualization gathered in Brussels for an inspiring day of talks and meeting interesting and kind people from the data visualization community. The day was divided into 3 sessions:

  1. the power and potential of data visualization
  2. a vision for Europe
  3. where do we go from here?

The first session showed some of the best work currently being created in data visualization. Santiago Ortiz from Bestiario showed how the visual programming paradigm of Impure can be used to create sophisticated data visualizations in minutes (did I say minutes? seconds!)

Next, Moritz Stefaner showed two of his recent and impressive projects: the Better Life Index, which was recently launched by the OECD, and his earlier, well-known project Notabilia, which visualizes deletion discussions on Wikipedia.

 

Enrico Bertini gave a fantastic talk from a research perspective and explained the different approaches to making a data visualization for the public and for the tiny group of people who are actually solving real world problems with data visualization. A quote that was immediately tweeted numerous times: “data visualization is useless, it is indispensable”. He also highly recommends the book “How Maps Work” (which of course is on my wishlist now!).

Last but not least in this first session was Dave McCandless from Information is Beautiful. Dave showed some of his work, and a remarkable quote was: “I disagree with Moritz, I’m not looking for 1000 stories, I’m looking for 1 story that’s interesting”.

After a short coffee break, session 2 started with Gregor Aisch showing how he creates data visualizations for the OpenSpending project of the Open Knowledge Foundation. He proposed a new approach for data visualization, namely ‘open data visualization’, which is open source + open data + open to community. A fascinating idea I’d like to learn more about.

Assaf Biderman from the MIT Senseable City Lab impressed us with some of the cutting-edge projects they do together with governments and cities, like the Trash Track project, which tracks and visualizes where trash is transported all over the USA after people have emptied their trash bins. Another project that keeps impressing me is the Copenhagen Wheel, an add-on for bicycles that allows bikers to track their own performance and at the same time measures various air quality conditions in the city. This data is collected and visualized to understand more about the city’s air pollution.

Salvatore Iaconesi from Art is Open Source elaborated on how the artistic world uses data and visualization to change paradigms, for example in supermarkets: while you are in the supermarket, data is visualized on your iPhone that shows the geographic origins of the chemical compounds in your products.

Last but not least, Peter Miller from ITO World had to rush through his slides, in which he showed some very compelling and sometimes fine-grained user contributions to the OpenStreetMap project. It’s impressive to see how user contributions can sometimes lead to more accurate maps than non-crowd-sourced maps.

The final session was a discussion between Franco Accordino and Jean-Claude Burgelman from the European Commission and Toby Green from the OECD. The main subject was: what did they take from today’s sessions, and what will they do with it? It was very good to see that the value of data visualization was recognized, and that the EU sees data visualization as one possible and valuable way to create new knowledge, which is very important.

Finally, the day was finished by meeting so many people from the data visualization community. It was amazing to meet so many people whom I’ve been in contact with for quite some time now. Thanks Visualizing.org for organizing this wonderful day, and everybody who has contributed. It was a memorable experience!

Eyeo Data Visualization Challenge: Ghost Counties

About a day before the deadline I submitted my entry for the Eyeo Data Visualization Challenge by Visualizing.org, where the grand prize is a ticket to the brilliant Eyeo Festival.

You can see the final result here: http://www.janwillemtulp.com/eyeo.

The visualization depicts the number of homes and vacant homes for all the counties in each state. The size of the outer bubble represents the total number of homes; the size of the inner bubble represents the number of vacant homes. The y-axis shows the population size (on a logarithmic scale) and the x-axis position of the bubbles shows the number of vacant homes per population. Each bubble is also connected by a line to another axis: the population / home ratio. On the top right you can see some exact numbers for this data.
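To make these encodings concrete, here is a small Python sketch of the mappings described above (the visualization itself was built in Processing). It assumes that bubble area, rather than radius, is proportional to the number of homes; the county figures and pixel ranges are made up for illustration.

```python
import math


def log_scale(value, domain_min, domain_max, range_min, range_max):
    """Map a positive value to a screen coordinate on a logarithmic scale."""
    t = ((math.log10(value) - math.log10(domain_min))
         / (math.log10(domain_max) - math.log10(domain_min)))
    return range_min + t * (range_max - range_min)


def bubble_radius(count, max_count, max_radius):
    """Radius such that the bubble's area is proportional to count (assumption)."""
    return max_radius * math.sqrt(count / max_count)


# Hypothetical county record, not taken from the Census data.
county = {"population": 120000, "homes": 50000, "vacant_homes": 6000}

y = log_scale(county["population"], 100, 10_000_000, 600, 0)      # y-axis position
outer = bubble_radius(county["homes"], 3_500_000, 40)             # all homes
inner = bubble_radius(county["vacant_homes"], 3_500_000, 40)      # vacant homes
vacant_per_person = county["vacant_homes"] / county["population"]  # x-axis value
people_per_home = county["population"] / county["homes"]           # second axis
```

Using the same radius function for both bubbles keeps the inner and outer circles directly comparable.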

This time I built the visualization in Processing, mainly because I expected to work with large datasets from the US Census Bureau and thought I might have to use some OpenGL for better performance. Eventually I didn’t use OpenGL. Building the visualization in Processing was lots of fun. To get a sense of the data I tried as many as 5 completely different approaches. Here are some of the sketches that eventually led to this visualization (view this selection on my Flickr stream).

The data itself was not very complex, but rather big, and the biggest challenge was to find a creative approach to visualize this data, but without using a map (which would be rather obvious since it’s about locations).

SEE#6 conference

This weekend I visited SEE#6 in Wiesbaden, Germany. Although this was the sixth SEE conference, it was my first visit. And I must say, it was overwhelming! Here’s a brief overview of my experience.

Day 1 – conference

The conference was held in the beautiful Lutherkirche, one of the most beautiful and atmospheric conference locations I’ve ever been to. After a friendly welcome from Michael Volkmer of the hosting organization Scholz & Volkmer, prof. dr. Harald Welzer gave the keynote: an inspirational talk about sustainability, the main subject of the conference, in which he gave us his view on how to fight climate change and change human behavior to build a more sustainable society.

After the keynote Carlo Ratti of the MIT Senseable City Lab in Boston showed us some of his recent projects on how ubiquitous computing is entering our society more and more, and how sensors can help cities and citizens to be more aware of the environment and improve sustainability. On this image you see a visualization by the Lab that shows the real-time weather and rain situation in combination with taxi locations: are taxis in the places where they are needed the most?

After a break, the young and talented Alexander Lehman took over. He showed us some of his animated infographics, and how he uses satire to tell a story. He elaborated on his most successful project: “du bist terrorist” where he uses satire in an attempt to make people more aware about the increasing danger of the government collecting all kinds of data about its citizens.

Brendan Dawes gave a very humorous and inspirational talk about how he uses his creativity to do very inspiring projects, both for customers and as personal research projects. He had many great examples of his projects and one of them was a homebuilt digital/wooden weather indicator: not because it really solves a problem, but just because it’s fun and cool:

Wesley Grubbs from Pitch Interactive showed us some of his great projects, in which he has visualized many rows of data, resulting in high resolution images (with long rendering times). One of his most compelling examples was a visualization that gives some insight into how US defense money is spent.

After the beautiful images from Wesley, Joshua Prince-Ramus from REX architects gave his presentation. Before the conference I didn’t really know how architecture would fit in a conference about information visualization, but it turned out that this was one of the talks that impressed me most. At REX they’ve defined a very strong process for doing work for clients, and Joshua talked us through it. He showed the beautiful images he uses in presentations to show his ideas. At REX they’re brilliant at working within constraints, creating flexible buildings and, perhaps most importantly, really understanding their clients, so that they eventually build something clients really understand and agree with. Great talk!

The final talk was by the talented Justin Manor from Sosolimited. Justin inspired us by showing a few of his great installations. One that he gave special attention to was a real-time analysis of political debates, which showed a very large number of different approaches to interpreting, categorizing and analyzing the words and sentences politicians are saying. This visualization is built in Processing. This concluded the first day.

Day 2 – discussion and workshop

The second day was an extra day for data visualization die-hards, to have some good discussions about data visualization. The discussion was led by 3 prominent people from the data visualization community: Moritz Stefaner, Andrew Vande Moere and Benjamin Wiederkehr. The day was basically split into 2 halves: the first part was a discussion, and in the second part some of the presenters of the conference gave some insight into how they do their work.

The discussion took off immediately after the main topics had been presented:

  • how to engage people through visualization?
  • how to use visualization to change people?
  • “facts are useless, stories are everything!” (quote by prof. dr. Harald Welzer made during the keynote on the first day)
  • what is the impact of data.gov shutting down?
  • how can design critique push the community forward?

Although all of these topics were used as a starting point for the discussion, the main discussion revolved around questions like:

  • as a (visualization) designer, should you help the user draw a conclusion?
  • is linear storytelling, like the satire movies from Alexander Lehman, the right way of passing information to the user?
  • can you really be objective if you visualize data?
  • is showing war casualties, for example like the one from Stamen design, a good way to really engage people?

The discussion was really interesting, and people had some very good points. But at the same time I found it somewhat hard to really take a stand in this, because I really think it depends. Anyway, after a short break, some of the presenters from day 1 gave some insight into their work process, things they run into, etc.

  • Wesley Grubbs showed us some more details on how he approaches visualization work for both clients and as personal research projects, like handling large amounts of data.
  • Alexander Lehman showed us how he uses 3ds Max to create his infographic movies, and how the overkill of functionality in 3ds Max actually helps him to be very productive.
  • Moritz Stefaner gave a very short but great introduction to Protovis, and showed how you could cleverly use Google Docs as a real-time data source for your visualization.
  • Justin Manor showed some more of his great installations with water, pneumatics, LED, Processing, Arduino, etc., and various ways of how they were made.

That wraps it up.

The conference was a blast! It was so inspiring, a great location, fantastic speakers, and very good talks! I am looking forward to SEE#7 conference.

If you want to see the full talks of SEE#6, go to http://www.see-conference.org/video-stream/. On the conference website you can also find video recordings of previous talks, which I highly recommend watching!

Day 2 at the O’Reilly Strata Conference

 

After a day of tutorials, the second day at Strata was the first of two conference days, packed with fascinating sessions. The day kicked off with a plenary session with a long list of top speakers in the field of data science: Edd Dumbill of O’Reilly Media, Alistair Croll of Bitcurrent, Hilary Mason of bit.ly, James Powell of Thomson Reuters, Mark Madsen of Third Nature, Werner Vogels of Amazon.com, Zane Adam of Microsoft Corp, Abhishek Mehta of Tresata, Mike Olson of Cloudera, Rod Smith of IBM Emerging Internet Technologies and last but not least Anthony Goldbloom of Kaggle. Various topics were covered in presentations of 10 minutes each, like data without limits, data marketplace, and the mythology of big data. The shortest presentation struck me most: “the $3 Million Heritage Health Prize” presented by Anthony Goldbloom. People are challenged to create a predictive application that uses healthcare data to predict which people are most likely to go to hospital, so that ‘US healthcare becomes healthcare instead of sickcare’. The prize is $3 Million for the one who solves this!

Next up were the individual sessions, and I was very much looking forward to the talk “Telling Great Data Stories Online” by Jock Mackinlay of Tableau. The talk itself was excellent, but for me it was all known stuff; it is highly recommended for those unfamiliar with Visual Analytics or Tableau. Being biased towards visualization related sessions, my next session was “Designing for Infinity” by Dustin Kirk of Neustar. Dustin showed 8 Design Patterns of User Interface Design, like infinite scrolling, which were really good. It reminded me of an updated version of the material in Steve Krug’s book Don’t Make Me Think.

Next up was the best talk of the day: “Small is the New Big: Lessons in Visual Economy”. Kim Rees of Periscopic showed us very good examples of effective information visualizations. I was really blown away by this presentation, mostly because she showed how creatively removing clutter and distractions can make a visualization very effective. The creative interactions that help the user use the visualization were also compelling. Next was Philip Kromer of Infochimps on “Big Data, Lean Startup: Data Science on a Shoestring”. Though I expected Philip to explain the Lean Startup principles evangelized by Eric Ries, the talk was more about Infochimps’ approach to doing business. Some remarkable comments by Philip: “everything we do is for the purpose of programmer joy”, and “Java has many many virtues, but joy is not one of them”. Great presentation and inspiring insights!

My next session was “Visualizing Shared, Distributed Data” by Roman Stanek (GoodData), Pete Warden (OpenHeatMap) and Alon Halevy (Google). After short presentations by each, these three guys had a panel discussion where the audience could ask questions. Their discussion revolved mostly around the fact that all three deal with data that is created and uploaded by users, and how you deal with that: do you clean it, what’s the balance between complex query functionality and ease of use, etc. My final session was “Wolfram Alpha: Answering Questions with the World’s Factual Data” by Joshua Martell. Half the talk was a demonstration of the features of WolframAlpha, and the other half was more or less a high level talk about how WolframAlpha handles user input, how data is stored, how user analytics is performed, and more.

The day ended with a Science Fair where students, researchers and companies were showing new advancements in the field of data science. There were really interesting showcases, like a simulation tool for system dynamics. But again biased towards visualization, the one that struck me most was Impure by Bestiario. Impure is a visual programming language that allows users to easily create their own visualizations, both simple and very advanced. It was also great to see the passion of Bestiario for their own product.

Finally, one of the best things about the conference so far has been meeting people, some of whom I have known only virtually for some time now. I especially enjoyed meeting all the visualization people today. It’s really great to meet so many members of the online visualization community in person.

So again, a fantastic day at Strata, and I am looking forward to tomorrow!

Visualizing the World Economic Forum Global Agenda Interlinkage

The World Economic Forum (WEF) and Visualizing.org have recently issued a Data Visualization contest in which interactive designers were asked to develop cutting-edge visualizations that will help elucidate the interconnectedness among issues, highlight emerging clusters and catalyze dialogue at the Summit between Councils. The data for this contest was derived from a survey of experts of the 72 Global Agenda Councils of the WEF, and they were asked the following three questions:

  • “Please select a maximum of 5 Global Agenda Councils that your Council would benefit from interacting with by order of priority”
  • “Please select a maximum of 3 Industry / Regional Agenda Councils that your Council would benefit from interacting with by order of priority”
  • “Please describe how it interlinks with your Council”

The data

The data was an Excel workbook with 3 sheets (or 3 CSV files) that contained the survey data:

  • A matrix with pre-calculated weighted links between Councils
  • A flat list of all the survey data
  • All the survey data, but in a different structure (this time by respondent Council)

The World Economic Forum Councils

The WEF consists of 3 Agendas:

  • Global Agenda (divided into 3 subgroups: Drivers and Trends, Risks and Opportunities, Policy and Institutional Responses)
  • Industry Agenda
  • Regional Agenda

The Global Agenda has 72 Councils, the Regional Agenda 10 Councils, and the Industry Agenda 14 Councils. All of the Councils are concerned with a specific issue (e.g. human rights or ocean governance). Each Council has 1 or more organizations of various types (government, NGO, business, etc.) and each organization may be located in a different country.

Visualizing the data

Since the purpose of the visualization was to find clusters and show the interconnectedness, the most obvious place to start was a network or graph visualization. I used the Force-Directed layout of Protovis to create a network of all the links of all the Councils. I also used a K-Means clustering algorithm and a Community Detection Algorithm to find clusters, but the graph was too dense to find any sensible clusters. It appeared that almost every Council links to every other Council. So even though the visualization looked impressively complex, you could not get any valuable information from it. So I stopped pursuing this direction.
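A quick way to check whether a network is too dense for clustering is to compute its density: the fraction of all possible directed links that actually occur. The Python sketch below is an illustration only, not the code used for the contest entry, and the tiny example data is hypothetical.

```python
def directed_density(nodes, links):
    """Fraction of possible directed links (ignoring self-links) that are present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    unique_links = {(source, target) for source, target in links if source != target}
    return len(unique_links) / (n * (n - 1))


# Hypothetical mini-network of three Councils.
councils = ["Water Security", "Ocean Governance", "Human Rights"]
links = [
    ("Water Security", "Ocean Governance"),
    ("Ocean Governance", "Water Security"),
    ("Water Security", "Human Rights"),
]
density = directed_density(councils, links)
```

A density close to 1 means almost every Council links to every other Council, in which case a force-directed layout tends to collapse into one big cluster.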

My next approach was to see whether a radial layout would work. I was inspired by many of the visualizations on www.visualcomplexity.com and Circos. First I started out with just a radial layout of all the Councils, and then played with the visual encoding of some of the dimensions, like line thickness and color. For the image below, I filtered the data down to rank 1 and the Global Agenda only. This resulted in less data, which is easier to work with when prototyping.

This was a good start, and showed enough potential for me to continue working on this. One of the biggest flaws of the image above is that you don’t see who interlinks with whom (which is the respondent Council and which is the linked Council). So next I decided to create two half circles instead of one: one for respondent Councils and one for linked Councils. This turned out to be a good choice. I also worked on a better color palette, and more encoding of the data (for instance, width of the bar shows the number of links). This is what I ended up with:

Then I added more refinements, like adding a height to each bar for stronger links, adding a filter option for the combination of rank and Agenda, and also the ability to view Council links in isolation. I changed the colors to blue and orange instead of green and orange, to accommodate colorblind people. I also kept Edward Tufte’s ‘data-ink ratio’ in mind: remove as much (visual) clutter as possible. Martin Wattenberg once said: “if you start playing with your visualization, you know you’re on the right direction”. And that’s exactly what happened when I added the ability to view Council links in isolation. The final result looks like this:

Challenges

One of my biggest challenges was that I didn’t understand the matrix in the data set; I couldn’t follow the logic behind it. At first I thought that not all data was shown, only the strongest links, so that apparently uninteresting data was omitted. Then I came to realize that the data in the matrix was not normalized: a link of 0.5 between Council A and B was not the same as a link of 0.5 between Council C and D. Because I didn’t understand the logic behind the values in the matrix, I decided not to use them, and to do my own calculations on link strength instead.
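The post does not spell out how the link strength was calculated, so the following Python sketch shows just one possible scheme, purely as an assumption: each response is weighted by its rank (rank 1 counts most), and the weights are normalized per respondent Council so that values are comparable across Councils. The function name and sample responses are hypothetical.

```python
from collections import defaultdict


def link_strengths(responses, max_rank=5):
    """Return normalized link strengths keyed by (respondent, linked) pairs.

    responses: iterable of (respondent_council, linked_council, rank) tuples,
    where rank 1 is the highest priority. The rank weighting is an assumption.
    """
    raw = defaultdict(float)
    for respondent, linked, rank in responses:
        raw[(respondent, linked)] += max_rank - rank + 1
    totals = defaultdict(float)
    for (respondent, _linked), weight in raw.items():
        totals[respondent] += weight
    # Normalize so that all outgoing links of a respondent Council sum to 1.
    return {pair: weight / totals[pair[0]] for pair, weight in raw.items()}


sample = [("A", "B", 1), ("A", "C", 5), ("D", "B", 1)]
strengths = link_strengths(sample)
```

With this normalization, a strength of 0.5 means the same thing for every respondent Council: half of that Council's total weighted interest.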

The result

I’m very satisfied with the result, and at the same time I see room for improvements. I like the fact that the visualization communicates mainly visually: line thickness, line color, bar width and bar height are the main visual elements that make it easy to spot interesting links or Councils. Also, I haven’t seen circular layouts like this in Protovis yet, so it was also fun to try something new like this.
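For those curious how a layout with two half circles can be computed, here is a minimal Python sketch. It is not the Protovis code of the actual visualization; it assumes that each Council gets an angular span on its half circle proportional to its number of links (the bar width), and the Council names and counts are invented.

```python
import math


def half_circle_spans(link_counts, start_angle):
    """Divide a half circle (pi radians) among Councils, proportional to link count.

    Returns {council: (angle_from, angle_to)} in radians, in insertion order.
    """
    total = sum(link_counts.values())
    spans = {}
    angle = start_angle
    for council, count in link_counts.items():
        width = math.pi * count / total
        spans[council] = (angle, angle + width)
        angle += width
    return spans


def polar_to_xy(angle, radius, cx=0.0, cy=0.0):
    """Convert polar coordinates to screen coordinates around a center point."""
    return cx + radius * math.cos(angle), cy + radius * math.sin(angle)


# Respondent Councils on the right half, linked Councils on the left half.
respondents = half_circle_spans({"Council A": 1, "Council B": 3}, -math.pi / 2)
linked = half_circle_spans({"Council C": 2, "Council D": 2}, math.pi / 2)

# A link line can then be drawn between the midpoints of two spans.
a_from, a_to = respondents["Council A"]
start_point = polar_to_xy((a_from + a_to) / 2, 200.0)
```

Keeping the two roles on separate halves is what makes the direction of each link readable at a glance.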

A suggestion I received from Mike Bostock was to make the selection of the Councils fuzzier. Right now the bars of the Councils can become very thin or small, and selecting them may be somewhat difficult. By using a more fuzzy selection the user experience may improve.

I have considered adding a filter for link strength as a way to reduce the number of links shown. But so far I’m not yet convinced that this will reveal more clusters.

The visualization omits some data that may be of interest: for instance, organization type is currently ignored, as well as country. It may be interesting to see if finding clusters would be easier if organization type or country (or a combination) were used instead of the respondent Council. Also, there is currently no back-link from the linked Council to the respondent Councils, so you cannot see whether the linked Council wants to interact with the Councils that are linking to it.

Finally, I think that in order to find clusters the survey should not offer this many options to its respondents. I would suggest just 2 ranks for Global Agenda, and 1 rank for Industry / Regional Agenda. It appears that giving Councils (or better yet, the organizations of each Council) this many options to link to other Councils results in a situation where, at some level, almost every Council links to every other Council. Using fewer ranks to choose from will probably reveal a more polarized choice, and will make it easier to find clusters.

Technology

I have used custom Scala code for pre-processing the data, and Protovis for visualizing the data. Both technologies are very interesting, and highly recommended!