Tag Archives: Data

Ghost Counties part of exhibition in Foosaner Art Museum, Melbourne FL

I am honored that my Ghost Counties visualization will be part of an exhibition entitled The Art of Networks, on view from March 8 to April 8, 2012 at the Foosaner Art Museum in Melbourne, Florida. The exhibit will open on March 8th as a parallel event to the 3rd Workshop on Complex Networks, CompleNet 2012, hosted by the Department of Computer Sciences at Florida Institute of Technology.

In total there will be 14 recent network visualizations representing data from different fields, from social networks and migrations to speech cognition and housing issues in the US. The visualizations will be presented in both static and dynamic media: large-format prints on the walls and short movies on two computers inside the gallery space.

If you’re in the neighborhood, be sure to drop by!

Eyeo Data Visualization Challenge: Ghost Counties

About a day before the deadline I submitted my entry for the Eyeo Data Visualization Challenge by Visualizing.org, where the grand prize is a ticket to the brilliant Eyeo Festival.

You can see the final result here: http://www.janwillemtulp.com/eyeo.

The visualization depicts the number of homes and vacant homes for all the counties in each state. The size of the outer bubble represents the total number of homes; the size of the inner bubble represents the number of vacant homes. The y-axis shows the population size (on a logarithmic scale) and the x-position of each bubble shows the number of vacant homes per population. Each bubble is also connected by a line to another axis: the population / home ratio. On the top right you can see some exact numbers for this data.
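To make the encoding concrete, here is a minimal Python sketch of how one county's counts could be mapped to those visual variables. It is not the Processing code behind the visualization: the function name, the axis range (`pop_min`, `pop_max`), the pixel height, and the choice to scale bubble area (rather than radius) with the counts are all assumptions made for illustration.

```python
import math

def bubble_encoding(population, homes, vacant_homes,
                    pop_min=100, pop_max=10_000_000, height=600):
    """Map one county's raw counts to the visual variables described above.

    pop_min, pop_max and height are hypothetical axis settings.
    """
    # x position of the bubble: vacant homes per inhabitant
    vacant_per_capita = vacant_homes / population
    # value on the second axis: inhabitants per home
    people_per_home = population / homes
    # y position: population on a log10 scale, normalised to 0..1
    t = (math.log10(population) - math.log10(pop_min)) / (
        math.log10(pop_max) - math.log10(pop_min))
    # screen y grows downward, so larger counties end up higher on screen
    y = height * (1 - t)
    # assumption: bubble AREA is proportional to the count, so radius ~ sqrt
    outer_radius = math.sqrt(homes / math.pi)
    inner_radius = math.sqrt(vacant_homes / math.pi)
    return {
        "x_value": vacant_per_capita,
        "ratio_axis_value": people_per_home,
        "y": y,
        "outer_radius": outer_radius,
        "inner_radius": inner_radius,
    }

if __name__ == "__main__":
    # made-up county, purely for illustration
    print(bubble_encoding(population=100_000, homes=40_000, vacant_homes=4_000))
```

The logarithmic y-axis is what keeps very small and very large counties readable in the same chart; on a linear axis the small counties would collapse into a single band at the bottom.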

This time I built the visualization in Processing, mainly because I expected to work with large datasets from the US Census Bureau and thought I might have to use some OpenGL for better performance. In the end I didn’t use OpenGL. Building the visualization in Processing was lots of fun. To get a sense of the data I tried as many as 5 completely different approaches. Here are some of the sketches that eventually led to this visualization (view this selection on my Flickr stream).

The data itself was not very complex, but rather big, and the biggest challenge was to find a creative approach to visualize this data, but without using a map (which would be rather obvious since it’s about locations).

SEE#6 conference

This weekend I visited SEE#6 in Wiesbaden, Germany. Although this was the sixth SEE conference, it was my first visit. And I must say, it was overwhelming! Here’s a brief overview of my experience.

Day 1 – conference

The conference was held in the beautiful Lutherkirche, one of the most beautiful and atmospheric conference locations I’ve ever been to. After a friendly welcome from Michael Volkmer of the hosting organization Scholz & Volkmer, prof. dr. Harald Welzer gave the keynote. He gave an inspirational talk about sustainability, the main subject of the conference, and he gave us his view on how to fight climate change and change human behavior to build a more sustainable society.

After the keynote Carlo Ratti of the MIT Senseable City Lab in Boston showed us some of his recent projects on how ubiquitous computing is entering our society more and more, and how sensors can help cities and citizens to be more aware of the environment and improve sustainability. In this image you see a visualization by the Lab that shows the real-time weather and rain situation in combination with taxi locations: are taxis in the places where they are needed most?

After a break, the young and talented Alexander Lehman took over. He showed us some of his animated infographics, and how he uses satire to tell a story. He elaborated on his most successful project: “du bist terrorist” where he uses satire in an attempt to make people more aware about the increasing danger of the government collecting all kinds of data about its citizens.

Brendan Dawes gave a very humorous and inspirational talk about how he uses his creativity to do very inspiring projects, both for customers and as personal research projects. He had many great examples of his projects, and one of them was a homebuilt digital/wooden weather indicator: not because it really solves a problem, but just because it’s fun and cool.

Wesley Grubbs from Pitch Interactive showed us some of his great projects, in which he has visualized many rows of data, resulting in high-resolution images (with long rendering times). One of his most compelling examples was a visualization that gives some insight into how the US Defense is spending its money.

After the beautiful images from Wesley, Joshua Prince-Ramus from REX architects gave his presentation. Before the conference I didn’t really know how architecture would fit into a conference about information visualization, but it turned out that this was one of the talks that impressed me most. At REX they’ve defined a very strong process for doing work for clients, and Joshua talked us through it. He showed the beautiful images he uses in presentations to convey his ideas. At REX they’re brilliant at working within constraints, creating flexible buildings and, perhaps most importantly, really understanding their clients, so that they eventually build something clients really understand and agree with. Great talk!

The final talk was by the talented Justin Manor from Sosolimited. Justin inspired us by showing a few of his great installations. One that he gave special attention to was a real-time analysis of political debates, which showed a very large number of different ways to interpret, categorize and analyze the words and sentences politicians are saying. His visualization is built in Processing. This concluded the first day.

Day 2 – discussion and workshop

The second day was an extra day for data visualization die-hards, to have some good discussions about data visualization. The discussion was led by 3 prominent people from the data visualization community: Moritz Stefaner, Andrew Vande Moere and Benjamin Wiederkehr. The day was basically split into 2 halves: the first part was a discussion, and in the second part some of the presenters of the conference gave some insight into how they do their work.

The discussion took off immediately after the main topics had been presented:

  • how to engage people through visualization?
  • how to use visualization to change people?
  • “facts are useless, stories are everything!” (quote by prof. dr. Harald Welzer made during the keynote on the first day)
  • what is the impact of data.gov shutting down?
  • how can design critique push the community forward?

Although all of these topics were used as a starting point, the discussion mainly revolved around questions like:

  • as a (visualization) designer, should you help the user make his conclusion?
  • is linear storytelling, like the satire movies from Alexander Lehman, the right way of passing information to the user?
  • can you really be objective if you visualize data?
  • is showing war casualties, as in the visualization from Stamen Design, a good way to really engage people?

The discussion was really interesting, and people had some very good points. But at the same time I found it somewhat hard to really take a stand in this, because I really think it depends. Anyway, after a short break, some of the presenters from day 1 gave some insight into their work process, things they run into, etc.

  • Wesley Grubbs showed us some more details on how he approaches visualization work, both for clients and as personal research projects, such as how he handles large amounts of data.
  • Alexander Lehman showed us how he uses 3DS Max to create his infographic movies, and how the overkill on functionality of 3DS Max actually helps him to be very productive.
  • Moritz Stefaner gave a very short but great introduction to Protovis, and showed how you can cleverly use Google Docs as a real-time data source for your visualization.
  • Justin Manor showed some more of his great installations with water, pneumatics, LED, Processing, Arduino, etc., and various ways of how they were made.

That wraps it up.

The conference was a blast! It was so inspiring: a great location, fantastic speakers, and very good talks! I am looking forward to the SEE#7 conference.

If you want to see the full talks of SEE#6, go to http://www.see-conference.org/video-stream/. On the conference website you can also find video recordings of talks from previous editions, which I highly recommend watching!

Day 3 at the O’Reilly Strata Conference

The third day of the Strata Conference was again packed with great sessions. The day started off with numerous keynotes. The first one was by Simon Rogers of The Guardian. Simon is not just a fabulous presenter; his work at The Guardian also offered great examples of how to tell stories with data, and of how The Guardian has enhanced its news stories by sharing data with the public. Next up was an interesting panel discussion with Toby Segaran (Google), Amber Case (Geoloqi) and Bradford Cross (Flightcaster), moderated by Alistair Croll (Bitcurrent). The topic of discussion was Posthumans, Big Data and New Interfaces. After the discussion we had some good presentations by Ed Boyajian (EnterpriseDB) and then Barry Devlin (9sight consulting). Next was a very lively talk by DJ Patil (LinkedIn), who showed very convincingly that success with big data at LinkedIn is only possible with a good team of talented people. Scott Yara (EMC) came next, with another lively talk full of humor on how Your Data Rules The World. The closing keynote was from Carol McCall (Tenzing Health), who presented a serious problem with humor: how big data analytics can be used to improve US healthcare and turn it ‘from sickcare into healthcare’.

As my first session I chose a talk on Data Journalism, Applied Interfaces. Marshall Kirkpatrick (ReadWriteWeb) showed some really useful tools, like NeedleBase, that he uses for discovering stories on the Internet. He was followed by Simon Rogers of The Guardian again, who more or less continued his keynote, showing very compelling examples of how The Guardian uses data to tell stories, and how they use, for instance, Google Fusion Tables to publish much of their data. The last speaker of this session was Jer Thorp, and he absolutely blew me away with a beautiful interface he has created in Processing as an R&D project together with the New York Times. It’s called Cascade, and it shows a visual representation of how Twitter messages cascade across followers and links.

My next session was on ‘Realtime Analytics at Twitter’, where Kevin Weil mainly explained RainBird, a project they use for various counting applications so that realtime analytics can easily be applied. The project will be open-sourced in the near future.

After the break I saw a session on AnySurface: Bringing Agent-based Simulation and Data Visualization to All Surfaces by Stephen Guerin (Santa Fe Complex). He showed how a projector and a table of sand can be used to enhance a data visualization for simulation purposes. As an example he showed us how projecting agent-based models and emergent phenomena in complex system dynamics can help firefighters simulate bottlenecks in escape routes. It was also very cool to see that many of his simulations are built in Processing. Next up was a session by Creve Maples (Event Horizon). I really liked the first part of his talk, because he had a very good story on how we should keep the human brain’s capacity for processing information in mind when designing products and tools. It was really good to hear such a strong emphasis on this. The last part of his talk was mainly about some of the 3D visualizations he has done in the past that were very successful for his company, but it didn’t strike me as much as the first half of his talk.

The session on Data as Art by J.J. Toothman (NASA Ames Research Center) was a good and fun talk with many examples of infographics and visualizations. I had already seen most of them myself, but some were new. It was a great talk with lots of eye-candy. The final talk of the conference I saw was about Predicting the Future: Anticipating the World with Analytics. Three speakers gave their vision on how they do that. Christopher Ahlberg (Recorded Future) showed how his company uses time-related hints (like the mention of the word ‘tomorrow’) in existing content on the Internet to more or less predict the future. Robert McGrew (Palantir Technologies) showed how analyzing many large datasets in combination with human analysis can be used to perform effective fraud and crime prediction. Finally Rion Snow (Twitter) showed that research has proven that analyzing tweets can be used effectively for stock market prediction (3 days ahead!), flu and virus spread prediction, and UK election result prediction (more accurate than exit polls). The predictive power of analyzing the Twitter crowd was really stunning.

This concluded the O’Reilly Strata Conference. The conference was fantastic, the sessions were great, and meeting all these people was probably the best part of all!

Day 2 at the O’Reilly Strata Conference

After a day of tutorials, the second day at Strata was the first of two conference days, packed with fascinating sessions. The day was kicked off with a plenary session with a long list of top speakers in the field of data science: Edd Dumbill of O’Reilly Media, Alistair Croll of Bitcurrent, Hilary Mason of bit.ly, James Powell of Thomson Reuters, Mark Madsen of Third Nature, Werner Vogels of Amazon.com, Zane Adam of Microsoft Corp, Abhishek Mehta of Tresata, Mike Olson of Cloudera, Rod Smith of IBM Emerging Internet Technologies and last but not least Anthony Goldbloom of Kaggle. Various topics were covered in presentations of 10 minutes each, like data without limits, data marketplaces, and the mythology of big data. The shortest presentation struck me most: “the $3 Million Heritage Health Prize” presented by Anthony Goldbloom. People are challenged to create a predictive application that uses healthcare data to predict which people are most likely to go to hospital, so that ‘US healthcare becomes healthcare instead of sickcare’. The prize is $3 Million for the one who solves this!

Next up were the individual sessions, and I was very much looking forward to the talk “Telling Great Data Stories Online” by Jock Mackinlay of Tableau. The talk itself was excellent, but for me it was all familiar material; it is highly recommended for those unfamiliar with Visual Analytics or Tableau. Being biased towards visualization-related sessions, my next session was “Designing for Infinity” by Dustin Kirk of Neustar. Dustin showed 8 Design Patterns of User Interface Design, like infinite scrolling, which were really good. It reminded me of an updated version of the material in Steve Krug’s book Don’t Make Me Think.

Next up was the best talk of the day: “Small is the New Big: Lessons in Visual Economy”. Kim Rees of Periscopic showed us very good examples of effective information visualizations. I was really blown away by this presentation, mostly because she showed how creatively removing clutter and distractions can make a visualization very effective. The creative interactions that help the user work with the visualization were also compelling. Next was Philip Kromer of Infochimps on “Big Data, Lean Startup: Data Science on a Shoestring”. Though I expected Philip to explain the Lean Startup principles evangelized by Eric Ries, the talk was more about Infochimps’ approach to doing business. Some remarkable comments by Philip: “everything we do is for the purpose of programmer joy”, and “Java has many many virtues, but joy is not one of them”. Great presentation and inspiring insights!

My next session was “Visualizing Shared, Distributed Data” by Roman Stanek (GoodData), Pete Warden (OpenHeatMap) and Alon Halevy (Google). After a short presentation from each, these three guys had a panel discussion where the audience could ask questions. Their discussion revolved mostly around the fact that all three deal with data that is created and uploaded by users, and how you deal with that: do you clean it, what’s the balance between complex query functionality and ease of use, etc. My final session was “Wolfram Alpha: Answering Questions with the World’s Factual Data” by Joshua Martell. Half the talk was a demonstration of the features of WolframAlpha, and the other half was more or less a high-level talk about how WolframAlpha handles user input, how data is stored, how user analytics is performed, and more.

The day ended with a Science Fair where students, researchers and companies were showing new advancements in the field of data science. There were really interesting showcases, like a simulation tool for system dynamics. But again, biased towards visualization, the one that struck me most was Impure by Bestiario. Impure is a visual programming language that allows users to easily create their own visualizations, both simple and very advanced. It was also great to see the passion of Bestiario for their own product.

Finally, one of the best things about the conference so far has been meeting people, some of whom I had known only virtually for some time. I especially enjoyed meeting all the visualization people today. It’s really great to meet so many members of the online visualization community in person.

So again, a fantastic day at Strata, and I am looking forward to tomorrow!

Day 1 at O’Reilly Strata Conference: Data Bootcamp

Today was my first day at O’Reilly Strata Conference: a full day of tutorial sessions. The session I picked was the Data Bootcamp by Joseph Adler (LinkedIn), Hilary Mason (bit.ly), Drew Conway (New York University) and Jake Hofman (Yahoo!). The purpose of this bootcamp tutorial was to turn everybody in the room into data scientists by getting our hands dirty with some real hands-on experience.

The tutorial was kicked off with an introduction of the speakers, and a general overview of the various aspects of working with data: getting data, cleaning data, data-intensive applications, and much more. Then Drew gave an interactive introduction to visualizing data using Python and R. The audience had to produce a normal distribution of random numbers in R. Although some people managed to keep up with all the examples, lots of people were struggling because libraries were missing, or simply because everything was going pretty fast, at least for R and Python newbies like myself.
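For those who got stuck on missing libraries, the exercise can be approximated with nothing but the Python standard library. This is a rough sketch and not the bootcamp's own code (which was in R): the function names, the fixed seed and the text-based histogram are my own choices for illustration.

```python
import random
import statistics

def normal_sample(n, mu=0.0, sigma=1.0, seed=42):
    """Draw n random numbers from a normal distribution (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

def text_histogram(values, bins=10, width=40):
    """Return one text line per bin, with a bar proportional to the bin count."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # clamp so the maximum value falls into the last bin
        i = min(int((v - lo) / step), bins - 1)
        counts[i] += 1
    peak = max(counts)
    return [f"{lo + i * step:7.2f} {'#' * round(width * c / peak)}"
            for i, c in enumerate(counts)]

if __name__ == "__main__":
    sample = normal_sample(10_000)
    print("mean :", round(statistics.mean(sample), 3))
    print("stdev:", round(statistics.stdev(sample), 3))
    print("\n".join(text_histogram(sample)))
```

With enough samples the bars form the familiar bell shape, with the mean close to `mu` and the standard deviation close to `sigma`.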

Next Jake gave a great introduction to image processing, and especially how you can cluster images based on similar features, color in our case. We used a K-Means clustering algorithm to cluster similar images based on color, and after that we classified images as either landscapes or head-shots.
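To give an idea of what K-Means does, here is a minimal pure-Python sketch. It assumes each image has already been reduced to a single average (R, G, B) color, which is a simplification of mine; the color values are made up, and the real tutorial code (linked at the end of this post) works on actual image data.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain K-Means: returns (centroids, label per point)."""
    rng = random.Random(seed)
    # start from k randomly chosen points as initial centroids
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its members
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points]
    return centroids, labels

# hypothetical average colors: three bluish images, three greenish images
average_colors = [
    (10, 20, 200), (20, 30, 220), (15, 25, 210),
    (30, 180, 40), (40, 200, 50), (35, 190, 45),
]

if __name__ == "__main__":
    centroids, labels = kmeans(average_colors, k=2)
    print(labels)
```

The algorithm only alternates two steps, assigning points to the nearest centroid and recomputing the centroids, until the groups stabilise. Which cluster gets which number depends on the random start, so only the grouping is meaningful.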

After the break Hilary took over with a great presentation on working with text data. She started with some basic examples of extracting data from webpages using command-line tools like curl and wget, and using Python and the BeautifulSoup Python library. After that we turned to the main example: ‘hacking’ a gmail account and trying to get some valuable information out of it. Hilary showed us how to classify email using probability and statistics, and then Drew took over to show us how to visualize this data and turn it into network diagrams.
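One common way to classify text with probabilities is a naive Bayes classifier. I am using it here as an assumed stand-in for the idea, not as a reproduction of what was shown in the session; the class name, the two labels and the example messages below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Tiny multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of messages
        self.vocabulary = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log prior: how common is this label overall?
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in words:
                # log likelihood; the +1 avoids zero probability for unseen words
                count = self.word_counts[label][w] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# made-up training messages
classifier = NaiveBayes()
classifier.train("meeting agenda for monday project review", "work")
classifier.train("quarterly report attached please review", "work")
classifier.train("dinner with friends on saturday", "personal")
classifier.train("family holiday photos from the weekend", "personal")

if __name__ == "__main__":
    print(classifier.classify("please review the project report"))
    print(classifier.classify("holiday dinner with family"))
```

Working with log probabilities instead of multiplying raw probabilities keeps the numbers from underflowing on longer messages.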

Last but not least Joseph gave a talk about Big Data. This was not an interactive session. Joseph shared some of his knowledge and experience of working with big data at LinkedIn, and explained the basics of Map/Reduce, Hadoop, and why and when to start thinking about big data solutions like Hadoop.
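The basic Map/Reduce idea fits in a few lines of plain Python. The sketch below runs in a single process and only illustrates the three phases (map, shuffle, reduce) with the classic word-count example; it does not use Hadoop or its API, and the function names are my own.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["big data", "big ideas"]))
```

Because every document is mapped independently and every key is reduced independently, both phases can be spread over many machines, which is the point at which a framework like Hadoop starts to pay off.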

Overall it was an interesting day, also because I met really great people. It was especially great to meet Naomi (@nbrgraphs), Kim (@krees), Jerome (@jcukier) and Daniel (@danielgm). For me the Data Bootcamp was above all an inspirational tutorial with lots of ideas to try out on my own. For some people the tempo was a little too high, especially for those who had never programmed in R or Python before. And becoming a Data Scientist in just 1 day may be an illusion anyway. At least the tutorial gave me a good head start, lots of inspiration, and great lessons on how the presenters approach working with data. So for me, this was a great and successful first day, and I’m looking forward to the next two days!

The source code and slides of the Data Bootcamp are available online at: https://github.com/drewconway/strata_bootcamp