Category Archives: Data Visualization

Tutorial: Line interpolations in D3

In this tutorial we’re going to explore line interpolations in D3.

First we start with 2 scales that we will use to convert values to x- and y-coordinates on the screen:

var x = d3.scale.linear().domain([0,10]).range([0,400]),
y = d3.scale.linear().domain([0,1]).range([0,50]),
groupHeight = 60,
topMargin = 100
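To make the role of these scales concrete, here is a minimal plain-JavaScript sketch of what a linear scale does. The linearScale helper and the xSketch and ySketch names are hypothetical stand-ins written for illustration, not D3's implementation, and the helper only handles two-element numeric domains and ranges.

```javascript
// Hypothetical stand-in for d3.scale.linear(): maps a value from
// [d0, d1] (the domain) onto [r0, r1] (the range) by linear interpolation.
function linearScale(domain, range) {
  var d0 = domain[0], d1 = domain[1];
  var r0 = range[0], r1 = range[1];
  return function (value) {
    var t = (value - d0) / (d1 - d0); // position within the domain, 0..1
    return r0 + t * (r1 - r0);        // same position within the range
  };
}

var xSketch = linearScale([0, 10], [0, 400]);
var ySketch = linearScale([0, 1], [0, 50]);

console.log(xSketch(5));   // 200
console.log(ySketch(0.5)); // 25
```

So an index of 5 lands halfway across the 400-pixel range, and a data value of 0.5 lands halfway down the 50-pixel range.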

Next we generate some random data:

var data = []
d3.range(10).forEach(function(d) { data.push(Math.random()) })

We’re also creating an array which contains all the possible interpolations D3 supports. We’ll see the effects of every interpolation in a moment:

var interpolations = [
	"linear",
	"step-before",
	"step-after",
	"basis",
	"basis-closed",
	"cardinal",
	"cardinal-closed"]

In SVG there is a difference between a line and a path. A line is a straight segment for which you define only the start and end positions, whereas with a path you draw the outline of any arbitrary shape by specifying a series of connected lines, arcs, and curves. You do this through the d attribute of the path. Every path must begin with a moveto command: a capital M followed by an x- and y-coordinate, separated by commas or whitespace. This command sets the current location of the “pen” that’s drawing the outline. It is followed by one or more lineto commands, each a capital L followed by x- and y-coordinates, again separated by commas or whitespace. You can see more of this specification here.
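The moveto and lineto commands described above can be sketched in a few lines of plain JavaScript. The toPathData helper is a hypothetical name used only for illustration; it produces the kind of string that ends up in a path's d attribute when straight segments connect the points.

```javascript
// Builds the value of a path's d attribute from an array of [x, y] pairs:
// one moveto (M) for the first point, then a lineto (L) for each later point.
function toPathData(points) {
  return points
    .map(function (p, i) {
      return (i === 0 ? "M" : "L") + p[0] + "," + p[1];
    })
    .join(" ");
}

console.log(toPathData([[0, 0], [40, 25], [80, 10]]));
// M0,0 L40,25 L80,10
```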

In our example we’re not actually creating an SVG line, but an SVG path, so we need to set the d attribute of the path. Luckily D3 has a helper function that takes the work out of building this string: d3.svg.line. For this helper function you can set:

  • an accessor function for obtaining x values
  • an accessor function for obtaining y values
  • an interpolation type, which defaults to linear
  • a tension value which affects the cardinal interpolations only
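To give a feel for what the interpolation type changes, the sketch below builds path strings for the step-before and step-after modes by hand. The stepBefore and stepAfter helpers are hypothetical illustrations of the idea (change y first, or change x first), not D3's own code. They rely on the SVG H and V path commands, which draw a horizontal or vertical line to the given x or y.

```javascript
// points is an array of [x, y] pairs in screen coordinates.
// step-before: move vertically to the new y, then horizontally to the new x.
function stepBefore(points) {
  var path = "M" + points[0][0] + "," + points[0][1];
  for (var i = 1; i < points.length; i++) {
    path += "V" + points[i][1] + "H" + points[i][0];
  }
  return path;
}

// step-after: move horizontally to the new x, then vertically to the new y.
function stepAfter(points) {
  var path = "M" + points[0][0] + "," + points[0][1];
  for (var i = 1; i < points.length; i++) {
    path += "H" + points[i][0] + "V" + points[i][1];
  }
  return path;
}

var pts = [[0, 10], [40, 30], [80, 20]];
console.log(stepBefore(pts)); // M0,10V30H40V20H80
console.log(stepAfter(pts));  // M0,10H40V30H80V20
```

Both strings visit the same three points; only the order of the horizontal and vertical moves differs.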

In our example, we want to show the different kinds of interpolations for the same data, so we create a function that takes the name of an interpolation as an argument, and then returns the d3.svg.line function as a result. This is the code that does that (you can uncomment the tension property to see its effect):

function getLine(interpolation) {
    return d3.svg.line().x(function(d,i) {
        return x(i)
    }).y(function(d) {
        return y(d)
    }).interpolate(interpolation)
//.tension(0)
}

Note the following: the function for x has 2 arguments: d and i. The d is the current item in the dataset (which we will provide later), and i is the index of the current item in the dataset. Also note that we’re using the x-scale to convert i to an x-coordinate, and the y-scale to convert the data value to a y-coordinate.

Now we initialize the visualization:

var vis = d3.select("body")
	.append("svg:svg")
	.attr("class", "vis")
	.attr("width", window.innerWidth)
	.attr("height", window.innerHeight)

Next up is creating a group for each of the lines we want to show:

var lg = vis.selectAll(".lineGroup")
	.data(interpolations)
	.enter().append("svg:g")
	.attr("class", "lineGroup")
	.attr("transform", function(d,i) {
	return "translate(100," + (topMargin + i * groupHeight) + ")"
}).each(drawLine)

We set the interpolations array as data for this selection, so that an svg:g element is added to the visualization for each of the interpolations. The svg:g element can be used to group other elements together, so that if you apply a transformation to the group, for instance, it will be applied to all of its members. Note that we add the class lineGroup so that our selection can match these elements. Next we set the transform attribute, using the index to position each group according to its position in the interpolations array. For each of the groups, we want to draw a line. We do that by calling the drawLine function in the .each(drawLine) statement. The drawLine function itself looks like this:

function drawLine(p,j) {
	d3.select(this)
		.selectAll(".lineGroup")
		.data(data)
		.enter().append("svg:path")
		.attr("d", getLine(p)(data))
		.attr("fill", "none")
		.attr("stroke", "steelblue")
		.attr("stroke-width", 3)
		//.attr("stroke-dasharray", "15 5")
}

The drawLine function has two parameters: p and j, where p is the parent data item (the current interpolation name) and j is the parent index. First we select the current element with d3.select(this), and next we select all the .lineGroup elements inside it. We bind the line data to that selection and append a path for each entering item. The d attribute calls the getLine function with the current interpolation name as its argument. The result is a d3.svg.line function that uses the interpolation we just provided. We then pass the line data to this function so that D3 calculates the string used by the d attribute of the svg:path element. Note that because data holds ten values and the group contains no .lineGroup elements yet, the enter selection appends ten identical, overlapping paths per group; binding [data] instead would append a single path. Finally we set some basic properties. The last, commented-out line sets stroke-dasharray, one of the stroke properties you can set, where 15 is the dash length and 5 is the gap length. Just play around with those properties to see what else is possible.

This concludes this tutorial. All the lines you see are using the same data, but they use different interpolations.

Tutorial: Conway’s Game of Life in D3


See the final version here.

This is an example of Conway’s Game of Life, built in D3. According to Wikipedia:

The Game of Life, also known simply as Life, is a cellular automaton devised by the British mathematician John Horton Conway in 1970.[1] The “game” is a zero-player game, meaning that its evolution is determined by its initial state, requiring no further input. One interacts with the Game of Life by creating an initial configuration and observing how it evolves.

There are only 4 rules in the Game of Life:

The universe of the Game of Life is an infinite two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, live or dead. Every cell interacts with its eight neighbours, which are the cells that are horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

  1. Any live cell with fewer than two live neighbours dies, as if caused by under-population.
  2. Any live cell with two or three live neighbours lives on to the next generation.
  3. Any live cell with more than three live neighbours dies, as if by overcrowding.
  4. Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.

    The initial pattern constitutes the seed of the system. The first generation is created by applying the above rules simultaneously to every cell in the seed—births and deaths occur simultaneously, and the discrete moment at which this happens is sometimes called a tick (in other words, each generation is a pure function of the preceding one). The rules continue to be applied repeatedly to create further generations.
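The four rules quoted above reduce to a small function of two inputs: whether the cell is alive, and how many of its eight neighbours are alive. The sketch below is one way to write that down; the nextCellState name is hypothetical and chosen only for this illustration.

```javascript
// Returns the next state of one cell given its current state and the
// number of live cells among its eight neighbours.
function nextCellState(alive, liveNeighbours) {
  if (alive) {
    // Rules 1-3: survive with two or three neighbours, otherwise die.
    return liveNeighbours === 2 || liveNeighbours === 3;
  }
  // Rule 4: a dead cell with exactly three live neighbours comes alive.
  return liveNeighbours === 3;
}

console.log(nextCellState(true, 1));  // false (under-population)
console.log(nextCellState(true, 2));  // true
console.log(nextCellState(true, 4));  // false (overcrowding)
console.log(nextCellState(false, 3)); // true (reproduction)
```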

    I’m sure there are multiple ways of implementing Conway’s Game of Life, but this is just one of them. We start by declaring a few variables:

    			var ccx = 120, // cell count x
    				ccy = 30, // cell count y
    				cw = 5, // cellWidth
    				ch = 5,  // cellHeight
    				del = 100, // delay
    				xs = d3.scale.linear().domain([0,ccx]).range([0,ccx * cw]),
    				ys = d3.scale.linear().domain([0,ccy]).range([0,ccy * ch]),
    				states = new Array()
    

    The states variable will be used to hold the states of each cell: true for on and false for off. Next up, we’re going to fill the states array with data:

    d3.range(ccx).forEach(function(x) {
    	states[x] = new Array()
    	d3.range(ccy).forEach(function(y) {
    		states[x][y] = Math.random() > .8 ? true : false
    	})
    })
    

    This is a good example of the d3.range([start], stop, [step]) function, which returns a range of numbers. We’re using only the stop argument, so in our case the first call to range() generates an array that starts at 0 (the default start), uses the default step of 1, and stops just before 120, so it runs from 0 to 119. As you can see we’re building a 2-dimensional array here, so that we can easily access each state, for example states[0][0] to access the first state. We randomly set the state to either true or false, with roughly a 20% chance that a cell starts out alive.
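The behaviour of the range function with only a stop argument can be mimicked in plain JavaScript. The rangeSketch helper below is a hypothetical stand-in for illustration, not D3's code, and it assumes a positive step. The point to notice is that the stop value itself is never included.

```javascript
// Hypothetical stand-in for d3.range(start, stop, step) for positive steps.
// With one argument, that argument is treated as stop and start defaults to 0.
function rangeSketch(start, stop, step) {
  if (stop === undefined) {
    stop = start;
    start = 0;
  }
  if (step === undefined) step = 1;
  var result = [];
  for (var v = start; v < stop; v += step) {
    result.push(v);
  }
  return result;
}

console.log(rangeSketch(5));          // [ 0, 1, 2, 3, 4 ]
console.log(rangeSketch(2, 10, 3));   // [ 2, 5, 8 ]
console.log(rangeSketch(120).length); // 120
```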

    I have created the toGrid() function so that the 2-dimensional array is turned into an array of Objects, so that D3 can easily bind all the values to an SVG element (binding to the 2-dimensional array directly should also be possible, which I leave as an exercise for you at the moment…):

    			function toGrid(states) {
    				var g = []
    				for (var x = 0; x < ccx; x++) {
    					for (var y = 0; y < ccy; y++) {
    						g.push({"x": x, "y": y, "state": states[x][y]})
    					}
    				}
    				return g
    			}
    

    Now we initialize the visualization:

    			var vis = d3.select("body")
    				.append("svg:svg")
    				.attr("class", "vis")
    				.attr("width", window.innerWidth)
    				.attr("height", window.innerHeight)
    

    After that we create the initial state of the grid:

    vis.selectAll("rect")
    	.data(function() { return toGrid(states) })
      .enter().append("svg:rect")
    	.attr("stroke", "none")
    	.attr("fill", function(d) { return d.state ? "green" : "white" })
    	.attr("x", function(d) { return xs(d.x) })
    	.attr("y", function(d) { return ys(d.y) })
    	.attr("width", cw)
    	.attr("height", ch)
    

    Note that the data that is bound to svg:rect elements is the result from the toGrid() function. Also, there are multiple ways to show the on and off state of a cell, for instance using the visibility property. I chose to use the fill property in this case. It is either colored green or white (which is the background color) based on the state. This is all that’s needed to create the initial grid. Now comes the fun part: creating the new generations:

    			function createNewGeneration() {
    				var nextGen = new Array()
    
    				for (var x = 0; x < ccx; x++) {
    					nextGen[x] = new Array()
    					for (var y = 0; y < ccy; y++) {
    						var ti = y - 1 < 0 ? ccy - 1 : y - 1 // top index
    						var ri = x + 1 == ccx ? 0 : x + 1 // right index
    						var bi = y + 1 == ccy ? 0 : y + 1 // bottom index
    						var li = x - 1 < 0 ? ccx - 1 : x - 1 // left index
    
    						var thisState = states[x][y]
    						var liveNeighbours = 0
    						liveNeighbours += states[li][ti] ? 1 : 0
    						liveNeighbours += states[x][ti] ? 1 : 0
    						liveNeighbours += states[ri][ti] ? 1 : 0
    						liveNeighbours += states[li][y] ? 1 : 0
    						liveNeighbours += states[ri][y] ? 1 : 0
    						liveNeighbours += states[li][bi] ? 1 : 0
    						liveNeighbours += states[x][bi] ? 1 : 0
    						liveNeighbours += states[ri][bi] ? 1 : 0
    
    						var newState = false
    
    						if (thisState) {
    							newState = liveNeighbours == 2 || liveNeighbours == 3 ? true : false
    						} else {
    							newState = liveNeighbours == 3 ? true : false
    						}
    
    						nextGen[x][y] = newState
    					}
    				}
    
    				return nextGen
    			}
    

    This function implements the Game of Life rules mentioned earlier. We’re building a new 2-dimensional array here that will eventually replace the value of the states variable. We determine the top, right, bottom and left index to use for each cell. If one of the index numbers would fall out of range (equal to the cell count, or smaller than 0), then the index of the opposite side is used. For example, if we are currently at cell (0,5) (which means x = 0 and y = 5), then calculating li (left index) gives -1. This of course does not exist, so we use ccx - 1 instead, which is 119. This way all the cells have a top, right, bottom and left neighbour to work with. Next the number of liveNeighbours is calculated by summing up the number of true states of the 8 neighbour cells. Finally the new state for this cell is calculated by applying the Game of Life rules. The new state is stored in the temporary array, which is returned as the result of the function.
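The wrap-around described above can also be expressed with the modulo operator instead of four separate conditionals. This is an alternative sketch, not the code used in this tutorial; the wrap and countLiveNeighbours names and the small 5x5 test grid are assumptions made for the example.

```javascript
// Wraps an index onto a grid of the given size, so -1 becomes size - 1
// and size becomes 0 (the grid behaves like a torus).
function wrap(index, size) {
  return (index + size) % size;
}

// Counts the live cells among the eight neighbours of (x, y).
function countLiveNeighbours(grid, x, y) {
  var cols = grid.length, rows = grid[0].length, count = 0;
  for (var dx = -1; dx <= 1; dx++) {
    for (var dy = -1; dy <= 1; dy++) {
      if (dx === 0 && dy === 0) continue; // skip the cell itself
      if (grid[wrap(x + dx, cols)][wrap(y + dy, rows)]) count++;
    }
  }
  return count;
}

// A 5x5 grid, indexed grid[x][y], with three live cells at x = 2, y = 1..3.
var grid = [];
for (var gx = 0; gx < 5; gx++) {
  grid[gx] = [];
  for (var gy = 0; gy < 5; gy++) {
    grid[gx][gy] = gx === 2 && gy >= 1 && gy <= 3;
  }
}

console.log(wrap(-1, 120));                   // 119
console.log(countLiveNeighbours(grid, 2, 2)); // 2
console.log(countLiveNeighbours(grid, 1, 2)); // 3
```

The cell at (1,2) is dead but has exactly three live neighbours, so it would come alive in the next generation.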

    The last part we need to do is to create new generations repeatedly and animate the grid accordingly:

    			function animate() {
    				states = createNewGeneration()
    				d3.selectAll("rect")
    					.data(toGrid(states))
    				  .transition()
    					.attr("fill", function(d) { return d.state ? "green" : "white" })
    					.delay(del)
    					.duration(0)
    			}
    
    			setInterval(animate, del)
    

    This is done with the setInterval() JavaScript function. We call the animate function every 100 milliseconds (the value of the del variable). The animate function itself is pretty straightforward. The value of the states variable is replaced with a new generation. Then all the rect elements are selected and the new generation is bound to these rectangles (note that the 2-dimensional array is again converted with the toGrid() function). After that we define the transition() we want to apply, and all we do is modify the fill property of each cell. Setting the delay to the del value as well seems to work quite well. I guess this gives the browser enough time to calculate the new generation. We explicitly set the duration to 0 to override the default.

    That’s all there is to it. I thought it would be more complex to build the Game of Life in D3, but it appears to be fairly straightforward. This code of course does require some calculation power from your browser, so just play around with the delay or grid size to find a setting that works for you. Also, you can play with various other attributes to create interesting effects. For instance, using this to create the grid gives a nice blurry effect:

    			vis.selectAll("rect")
    				.data(function() { return toGrid(states) })
    			  .enter().append("svg:rect")
    				.attr("stroke", "none")
    				.attr("fill", function(d) { return d.state ? "green" : "white" })
    				.attr("fill-opacity", .3)
    				.attr("x", function(d) { return xs(d.x) })
    				.attr("y", function(d) { return ys(d.y) })
    				.attr("width", function() { return 2 * cw })
    				.attr("height", function() { return 2 * ch })
    

    Enjoy!

    Tutorial: Introduction to D3

    Check out the final result here!

    D3 is a brand new visualization framework created by Mike Bostock. It is the successor to the successful visualization framework Protovis. There are a few differences and similarities with Protovis. One of the most important differences is that in D3 you work more directly with SVG, which gives you much greater flexibility than Protovis. Also, the performance of D3 is much better than that of Protovis, especially with animation, because in D3 only the properties that are changing are updated, instead of re-rendering the entire visualization. And once you dive into D3, it’s really easy to pick up, so let’s get started!

    The example we’re going to work on is really simple and will show you some of the basic concepts of D3. Basically we just plot hidden circles randomly on the screen, and then transition them to a portion of the screen. Then we add some interaction to it so that the circles will move once you move your mouse over them.

    The first thing you need to do is to make a reference to D3 from your HTML page. You can download D3, or make a link to a stable version on GitHub. The reference I used for this tutorial is: https://github.com/mbostock/d3/raw/v1.8.2/d3.js

    To get started we generate some data that we want to bind the circles to. All we do here is just generate a bunch of x and y values and put them in an array:

    var data = []
    for (var i = 0; i < 1000; i++) {
        data.push({"x": Math.random(), "y": Math.random()})
    }
    

    Now we need to add an SVG element to the body of the page. This is how you do that:

    var h = 1000
    var vis = d3.select("body")
        .append("svg:svg")
        .attr("width", screen.width)
        .attr("height", screen.height)
    

    First the body is selected using d3.select("body"), which is similar to the selections you would make with jQuery selectors. The h variable is the height I want to use. I refer to it later in the code. Next we’re going to plot the invisible circles:

    var x = d3.scale.linear().domain([0,1]).range([screen.width / 2 - 400,screen.width / 2 + 400]),
    y = d3.scale.linear().domain([0,1]).range([0,h]),
    r = d3.scale.linear().domain([0,1]).range([5,10]),
    c = d3.scale.linear().domain([0,1]).range(["hsl(250, 50%, 50%)", "hsl(350, 100%, 50%)"]).interpolate(d3.interpolateHsl)
    
    vis.selectAll("circle")
    	.data(data)
    	.enter().append("svg:circle")
    	.attr("cx", function(d) { return x(d.x) })
    	.attr("cy", function(d) { return y(d.y) })
    	.attr("stroke", "none")
    	.attr("fill", function() { return c(Math.random()) })
    	.attr("fill-opacity", .5)
    	.attr("visibility", "hidden")
    	.attr("r", function() { return r(Math.random()) })
    

    First we create some d3.scale variables. Since we’re working with random data, we need to convert the output of Math.random() to screen positions, radii and colors. We use linear scales, and you see that the domain is [0,1] for all the scales. That’s because Math.random() yields a number between 0 and 1. The range tells us to which range the domain should be converted. Each of these variables is actually a function, so you can use r, for example, to convert a number in the domain to a number in the range: r(.5) will result in 7.5. The c variable will turn any value between 0 and 1 into a color.
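To illustrate how a value between 0 and 1 becomes a radius or a color, here is a simplified plain-JavaScript sketch. The lerp, hslScaleSketch, rSketch and cSketch names are hypothetical, and the color helper is only an approximation of the idea: it interpolates hue, saturation and lightness separately and returns a CSS hsl() string, whereas D3 itself may return the color in a different string format.

```javascript
// Linearly interpolates between two numbers for t in [0, 1].
function lerp(a, b, t) {
  return a + (b - a) * t;
}

// Hypothetical simplification of an HSL color scale: interpolate hue,
// saturation and lightness separately and return a CSS hsl() string.
function hslScaleSketch(from, to) {
  return function (t) {
    var h = lerp(from.h, to.h, t);
    var s = lerp(from.s, to.s, t);
    var l = lerp(from.l, to.l, t);
    return "hsl(" + h + ", " + s + "%, " + l + "%)";
  };
}

var rSketch = function (t) { return lerp(5, 10, t); };
var cSketch = hslScaleSketch({ h: 250, s: 50, l: 50 }, { h: 350, s: 100, l: 50 });

console.log(rSketch(0.5)); // 7.5
console.log(cSketch(0.5)); // hsl(300, 75%, 50%)
```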

    Next we select all the circle elements, which is an empty collection at this time. We bind our data to this collection with the .data(data) statement, and then use the .enter().append("svg:circle") statement so that each element in our data array will be bound to a new svg:circle element. Here we set various properties: cx for the x-position of the circle (note that in SVG position 0,0 is the top left corner). The cy is used for the y-position of the circle, the r is used for the radius of the circle. The other properties speak for themselves. Note that a difference with Protovis is that you cannot use the shorthand way for using a function: function(d) x(d.x), but in D3 you have to write out the return keyword and the curly braces: function(d) { return x(d.x) }.

    Now, when you view this page, you’ll see nothing, because we’ve set the .attr("visibility", "hidden"). If you want to see the results so far, just remove this line. Next we want to move all those hidden circles to their new random position, so that together they will form the colorful bar. This is the code that does just that:

    var y2 = d3.scale.linear().domain([0,1]).range([h/2 - 20, h/2 + 20])
    var del = d3.scale.linear().domain([0,1]).range([0,1])
    
    d3.selectAll("circle").transition()
    	.attr("cx", function() { return x(Math.random()) })
    	.attr("cy", function() { return y2(Math.random()) })
    	.attr("visibility", "visible")
    	.delay(function(d,i) { return i * del(Math.random()) })
    	.duration(1000)
    	.ease("elastic", 10, .45)
    

    First we need 2 extra scales to convert random numbers: y2, which converts a random number to a new, narrower vertical range, and del, which is used for the delay of the transition. We now select all the circle elements and start a transition(). We set a few properties of the transition itself (delay, duration and ease), as well as the properties of the selected circle elements that we want to animate. The cx, visibility and cy properties of the circle will all be animated. All you have to do is provide the end state for the properties you want to animate. The animation will last 1000 milliseconds, there will be some delay for each circle before starting the transition, and we apply an elastic easing function to create the nice elastic bouncing effect. Go see what you’ve got so far! Fun, isn’t it?
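The per-circle delay is worth a closer look: because the index is multiplied by a random number between 0 and 1, circles later in the data array can wait longer before they start moving. The sketch below works through that arithmetic in plain JavaScript; delayFor and durationMs are hypothetical names, and the random number is passed in explicitly so the results are repeatable.

```javascript
// Delay for the circle at index i, given a random number rnd in [0, 1).
// Mirrors i * del(Math.random()), where del maps [0, 1] onto [0, 1].
function delayFor(i, rnd) {
  return i * rnd;
}

var durationMs = 1000;

console.log(delayFor(0, 0.9));   // 0 (the first circle never waits)
console.log(delayFor(999, 0.5)); // 499.5

// Upper bound on when the last of 1000 circles finishes its transition:
console.log(delayFor(999, 1) + durationMs); // 1999
```

So with 1000 circles, the whole effect plays out in just under two seconds.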

    The last part will be a little extension of the piece of code we used to add the circles initially, so change your code so that it looks like this:

    vis.selectAll("circle")
    	.data(data)
    	.enter().append("svg:circle")
    	.attr("cx", function(d) { return x(d.x) })
    	.attr("cy", function(d) { return y(d.y) })
    	.attr("stroke", "none")
    	.attr("fill", function() { return c(Math.random()) })
    	.attr("fill-opacity", .5)
    	.attr("visibility", "hidden")
    	.attr("r", function() { return r(Math.random()) })
    	.on("mouseover", function() {
    		d3.select(this).transition()
    		.attr("cy", function() { return y2(Math.random()) })
    		.delay(0)
    		.duration(2000)
    		.ease("elastic", 10, .3)
    	})
    

    A mouseover event has been added to each of the circles. And basically we’re doing something similar again: just add a transition, set some transition properties and the properties of the selected elements. In this case d3.select(this) selects the current circle so that you apply the transition to the current selected circle.

    That’s it!

    Visualizing the World Economic Forum Global Agenda Interlinkage

    The World Economic Forum (WEF) and Visualizing.org have recently issued a Data Visualization contest in which interactive designers were asked to develop cutting-edge visualizations that will help elucidate the interconnectedness among issues, highlight emerging clusters and catalyze dialogue at the Summit between Councils. The data for this contest was derived from a survey of experts of the 72 Global Agenda Councils of the WEF, and they were asked the following three questions:

    • “Please select a maximum of 5 Global Agenda Councils that your Council would benefit from interacting with by order of priority”
    • “Please select a maximum of 3 Industry / Regional Agenda Councils that your Council would benefit from interacting with by order of priority”
    • “Please describe how it interlinks with your Council”

    The data

    The data was an Excel workbook with 3 sheets (or 3 CSV files) that contained the survey data:

    • A matrix with pre-calculated weighted links between Councils
    • A flat list of all the survey data
    • All the survey data, but in a different structure (this time by respondent Council)

    The World Economic Forum Councils

    The WEF consists of 3 Agendas:

    • Global Agenda (divided into 3 subgroups: Drivers and Trends, Risks and Opportunities, Policy and Institutional Responses)
    • Industry Agenda
    • Regional Agenda

    The Global Agenda has 72 Councils, the Regional Agenda 10 Councils, and the Industry Agenda 14 Councils. All of the Councils are concerned with a specific issue (e.g. human rights or ocean governance). Each Council has 1 or more organizations of various types (government, NGO, business, etc.) and each organization may be located in a different country.

    Visualizing the data

    Since the purpose of the visualization was to find clusters and show the interconnectedness, the most obvious place to start was a network or graph visualization. I used the Force-Directed layout of Protovis to create a network of all the links of all the Councils. I also used a K-Means clustering algorithm and a Community Detection Algorithm to find clusters, but the graph was too dense to find any sensible clusters. It appeared that almost every Council links to every other Council. So even though the visualization looked impressively complex, you could not get any valuable information from it. So I stopped pursuing this direction.

    My next approach was to see whether a radial layout would work. I was inspired by many of the visualizations on www.visualcomplexity.com and Circos. First I started out with just a radial layout of all the Councils, and then played with the visual encoding of some of the dimensions, like line thickness and color. For the image below, I filtered the data down to rank 1 and Global Agenda only. This resulted in less data, which is easier to work with when prototyping.

    This was a good start, and showed enough potential for me to continue working on it. One of the biggest flaws of the image above is that you don’t see who interlinks with whom (which is the respondent Council and which is the linked Council). So next I decided to create two half circles instead of one: one for respondent Councils and one for linked Councils. This appeared to be a good choice. I also worked on a better color palette, and more encoding of the data (for instance, the width of the bar shows the number of links). This is what I ended up with:

    Then I added more refinements, like adding a height to each bar for stronger links, adding a filter option for the combination of rank and Agenda, and also the ability to view Council links in isolation. I changed the colors to blue and orange instead of green and orange, to accommodate colorblind viewers. I also kept Edward Tufte’s ‘data-ink ratio’ in mind: remove as much (visual) clutter as possible. Martin Wattenberg once said: “if you start playing with your visualization, you know you’re on the right direction”. And that’s exactly what happened when I added the ability to view Council links in isolation. The final result looks like this:

    Challenges

    One of my biggest challenges was that I didn’t understand the matrix in the data set; I couldn’t work out the logic behind it. At first I thought I had figured out that it showed only the strongest links, so that apparently uninteresting data was omitted. Then I came to realize that the data in the matrix was not normalized. So, a link between Council A and B of 0.5 in the matrix was not the same as a link between Council C and D of 0.5. And because I didn’t understand the logic behind the values in the matrix, I decided not to use them, but to do my own calculations on link strength.

    The result

    I’m very satisfied with the result, and at the same time I see room for improvements. I like the fact that the visualization communicates mainly visually: line thickness, line color, bar width and bar height are the main visual elements that make it easy to spot interesting links or Councils. Also, I haven’t seen circular layouts like this in Protovis yet, so it was also fun to try something new like this.

    A suggestion I received from Mike Bostock was to make the selection of the Councils fuzzier. Right now the bars of the Councils can become very thin or small, and selecting them may be somewhat difficult. By using a more fuzzy selection the user experience may improve.

    I have considered adding a filter for link strength as a way to reduce the number of links shown. But so far I’m not yet convinced that this will reveal more clusters.

    The visualization omits some data that may be of interest: for instance, organization type is currently ignored, as well as country. It may be interesting to see if finding clusters would be easier if organization type or country (or a combination) would be used instead of a respondent Council.  Also, there is currently no back-link from the linked Council to the respondent Councils. So, you cannot see if the linked Council wants to interact with the Councils that are linking to this Council.

    Finally, I think that in order to find clusters the survey should not have this many options for its respondents. I would suggest just 2 ranks for Global Agenda, and 1 rank for Industry / Regional Agenda. It appears that giving Councils (or better yet, the organizations of each Council) this many options to link to other Councils results in a situation where, at some level, almost every Council links to every other Council. Using fewer ranks to choose from will probably reveal a more polarized choice, and will make it easier to find clusters.

    Technology

    I have used custom Scala code for pre-processing the data, and Protovis for visualizing the data. Both technologies are very interesting, and highly recommended!
