Graphs 'n stuff: simple scatterplot with d3
Recently I’ve been learning how to use d3.js (D3), often touted as an incredibly powerful tool for creating graphics. I thought it’d be helpful to any other novices out there to write up my D3 examples as I do them. I come from a statistical background and while I have plenty of experience coding in R, JavaScript is a whole new challenge.
I’ll be writing up short explanations on any particular aspect that confuses me with a particular visualisation. Hopefully this can save someone else a few minutes of their day.
Let’s get started!
Below is a simple scatter plot. This particular graph plots the runtime of a random forest as a function of the number of observations in the sample. Today, however, it’s just a simple dataset!
The scatterplot is very basic. It has an x-axis and a y-axis with a label on each, black dots to represent the data, and an exceedingly basic tooltip: you can mouse over each data point to see its x and y values.
The dataset contains 50 observations and is located in a file called “obs_timings.csv”. Whilst we’ll be putting “number of rows” as our label on the graph, here I’ve just labelled the x-values as "size" to save my tired hands.
An excerpt of the dataset looks like this:
size,time
100,1.17
200,1.27
300,1.42
400,1.67
500,1.84
600,2
700,2.26
800,2.51
900,2.76
1000,3.01
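A CSV parser hands every field over as a string, which is why the accessor in the listing puts a unary plus in front of each field. The sketch below shows that conversion on a few rows from the excerpt. Note that parseCsv and toNumbers are hypothetical helper names made up for illustration, and the hand-rolled parser assumes no quoted fields; in the real page d3.csv does the parsing.

```javascript
// Hypothetical stand-in for CSV parsing (assumes no quoted fields or embedded commas).
function parseCsv(text) {
  var lines = text.trim().split("\n");
  var header = lines[0].split(",");
  return lines.slice(1).map(function (line) {
    var cells = line.split(",");
    var row = {};
    header.forEach(function (name, i) { row[name] = cells[i]; });
    return row;
  });
}

// Hypothetical accessor: every field arrives as a string,
// and unary plus coerces it to a number.
function toNumbers(row) {
  return { size: +row.size, time: +row.time };
}

var sample = "size,time\n100,1.17\n200,1.27\n600,2";
var rawRows = parseCsv(sample);       // fields are still strings here
var rows = rawRows.map(toNumbers);    // fields are numbers here

console.log(typeof rawRows[0].size, typeof rows[0].size);
```

Without the conversion, the values would stay strings, and any later arithmetic or comparison on them could behave in surprising ways.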
The code is listed below:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
.axis path,
.axis line
{
fill: none;
stroke: black;
shape-rendering: crispEdges;
}
.axis text
{
font-family: sans-serif;
font-size: 11px;
}
div.tooltip
{
position: absolute;
text-align: center;
width: 70px;
height: 14px;
padding: 2px;
font: 12px sans-serif;
background: lightsteelblue;
border: 0px;
border-radius: 8px;
pointer-events: none;
}
</style>
</head>
<body>
<script src="http://d3js.org/d3.v3.min.js"> </script>
<script>
var max_time = 22
var margin = {top: 20, right: 20, bottom: 30, left: 40},
width = 960 - margin.left - margin.right,
height = 500 - margin.top - margin.bottom;
//Create a svg element to store the graph in
var svg = d3.select("body").append("svg")
.attr("width" , width + margin.left + margin.right)
.attr("height" , height + margin.top + margin.bottom)
.append("g")
.attr("transform", "translate(" + margin.left + "," + margin.top + ")");
//Set up scales that we can use to draw the axes
var x = d3.scale.linear()
.domain([0,5000])
.range([0,width]);
var y = d3.scale.linear()
.domain([0,max_time + 1])
.range([height,0]);
//Setting up the axes
var xAxis = d3.svg.axis()
.scale(x)
.orient("bottom")
.ticks(5);
var yAxis = d3.svg.axis()
.scale(y)
.ticks(5)
.orient("left");
//Add the axis to our svg element
// x-axis
svg.append("g")
.attr("class", "x axis label")
.attr("transform", "translate(" + 0 + "," + height + ")")
.call(xAxis)
.append("text")
.text("number of rows")
.attr("x", width)
.attr("y", -6)
.style("text-anchor", "end");
// y-axis
svg.append("g")
.attr("class", "y axis label")
.call(yAxis)
.append("text")
.text("time")
.attr("transform", "rotate(-90)")
.attr("y",6)
.attr("dy", ".71em")
.style("text-anchor","end")
//For the mouseover bubbles
var tooltip = d3.select("body")
.append("div")
.attr("class","tooltip")
.attr("style","display:none")
var datas;
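d3.csv("obs_timings.csv", function(d)
{
//Convert each field from a string to a number.
//The opening brace must sit on the same line as "return";
//otherwise automatic semicolon insertion ends the statement early.
return {
time : +d.time,
size : +d.size
};
},
function(error, data)
{
//d3 v3 passes the error (if any) first, then the parsed rows
if (error) throw error;
datas = data;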
svg.selectAll("circle")
.data(datas)
.enter()
.append("circle")
.attr("cx", function(d) {return x(d.size); })
.attr("cy", function(d) {return y(d.time); })
.attr("r", 5)
.on("mouseover", function(d)
{
tooltip.transition()
.duration(100)
.style("opacity", .9);
tooltip.html("(" + d3.round(d.size) + "," + d3.round(d.time,2) + ")") //what to display on mouseover
.style("left", (d3.event.pageX + 5) + "px")
.style("top", (d3.event.pageY - 28) + "px")
.style("display", "block")
})
.on("mouseout", function(d)
{
tooltip.transition()
.duration(500)
.style("opacity",0);
});
});
</script>
</body>
</html>
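The scales in the listing do the work of turning data values into pixel positions. As a rough illustration of what a linear scale computes, here is a minimal sketch in plain JavaScript. The linearScale function is a hypothetical stand-in written for this post, not D3's implementation, and it ignores extras such as clamping and tick generation.

```javascript
// Hypothetical stand-in for a linear scale: maps a domain value onto a pixel range.
function linearScale(domain, range) {
  return function (value) {
    var t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

// Same inner dimensions as the listing: 960x500 minus the margins.
var width = 960 - 40 - 20;   // 900
var height = 500 - 20 - 30;  // 450

var x = linearScale([0, 5000], [0, width]);
// The range is reversed because SVG's y coordinate grows downwards.
var y = linearScale([0, 23], [height, 0]);

console.log(x(2500)); // 450, halfway along the x-axis
console.log(y(0));    // 450, the bottom of the plot
console.log(y(23));   // 0, the top of the plot
```

The reversed range on the y scale is the detail worth remembering: it is what makes larger times appear higher up the page.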
A few things that tripped me up:
- I found the d3.csv function really confusing at first. According to the d3.js documentation, its signature is d3.csv(url[, accessor][, callback]). Since I didn’t clearly understand what either an accessor function or a callback function does, it was no wonder that I ran into problems with scope. It turns out that the code for generating the graph should be written inside the d3.csv call, as a callback function. If I try to access the data outside that function, my code throws errors and behaves strangely. I found that I could execute my commands one at a time in the JavaScript console and everything would work perfectly, but when I ran them as a single script, it wouldn’t work. The reason for this? d3.csv loads the file asynchronously: the request is sent off and the rest of the script keeps running without waiting for the result. That’s where callback functions come into play: they get invoked once the data is loaded. Otherwise, you’re trying to work with data that hasn’t been loaded and parsed, and that’s not going to end romantically.
- SVG’s x and y attributes are absolute coordinates, while dx and dy are relative coordinates, with respect to the specified x and y. This appears often in code that looks somewhat like
group.append("text")
.text("My text")
.attr("x",function (d) {return d;})
.attr("y",200)
.attr("dy","0.35em")
.attr("transform", function(d, i) {return "rotate("+45*i+","+d+",200)";});
When dy is used together with em (scalable font size units), as in .attr("dy","0.35em"), it shifts the text relative to the specified y coordinate. Because em units scale with the font, the size of the shift depends on the size of the text. Here, 0.35em has the effect of roughly vertically centering the text on y, by moving it down by about half the height of the letters. This has many uses, and a common one is axis labels.
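Coming back to the d3.csv point above, the ordering problem is easier to see in a small experiment. This is a minimal sketch in which loadRows is a hypothetical loader that uses setTimeout to stand in for the asynchronous file request; the error-first callback shape mirrors the one d3 v3 uses, but nothing here calls D3 itself.

```javascript
// Hypothetical loader: stands in for an asynchronous request such as d3.csv.
// setTimeout defers the callback until the current script has finished running.
function loadRows(callback) {
  setTimeout(function () {
    callback(null, [{ size: 100, time: 1.17 }, { size: 200, time: 1.27 }]);
  }, 0);
}

var log = [];
var datas; // stays undefined until the callback runs

loadRows(function (error, rows) {
  if (error) throw error;
  datas = rows;
  log.push("callback: " + datas.length + " rows");
  console.log(log.join(" -> "));
});

// This line runs first, before any data has arrived.
log.push("after call: " + typeof datas);
```

The line after the loadRows call is logged first, with datas still undefined, and the callback entry only shows up afterwards. That is the same reason typing commands one by one into the console worked for me: by the time I typed the next line, the data had long since arrived.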