A Bright, Shiny Service: Sparklines

June 22, 2005

I really was working on the bookmark web service—really, I was!—but I got distracted. What grabbed my attention was Sparklines.

What are Sparklines?

Sparklines, as defined by Edward Tufte, are intense, simple, word-sized graphics. They are small graphics embedded within a context of words or numbers. They are described in a chapter, which he's published on the Web, of his soon-to-be-released book Beautiful Evidence. That article has been up for a while, but looking at the number of new links to it per month as recorded by Technorati [sparkline image, labeled 0 and 6], you can see that interest is ramping up. That little graph showing the links per month is a sparkline.

The BitWorking Sparkline Generator is my contribution to the Web 2.0: it's a web service, web application, and the source code to both. It's also the subject of this article.

Drawing Sparklines in Python

Sparklines are useful for presenting a large volume of information in a small space in a context-sensitive manner. I have test routines that I run regularly, and the volume of output can be tremendous. I stumbled onto sparklines and found that they are a great way to ease the information overload. I started drawing sparklines in Python using the Python Imaging Library. It is easy to get started with a basic template and then modify the code. Here are some examples:

import Image, ImageDraw
import StringIO

def plot_sparkline(results):
  """Returns a sparkline image as PNG data.

     The source data is a list of values between
     0 and 100. Values greater than 50 are drawn
     in red, otherwise they are drawn in gray."""

  im = Image.new("RGB", (len(results)*2, 15), 'white')
  draw = ImageDraw.Draw(im)
  # One vertical tick, four pixels tall, in every second pixel column.
  for (r, i) in zip(results, range(0, len(results)*2, 2)):
      color = (r > 50) and "red" or "gray"
      draw.line((i, im.size[1]-r/10-4, i, (im.size[1]-r/10)), fill=color)
  del draw

  f = StringIO.StringIO()
  im.save(f, "PNG")
  return f.getvalue()

This will produce a sparkline that looks like this:

This is just one kind of plot; it's not that much work to create a different kind of sparkline, for example, one of the continuous plots:

The code for the image above is the following:

def plot_sparkline2(results):
   """Returns a continuous-line sparkline as PNG data, with a
      red marker on the last data point."""
   im = Image.new("RGB", (len(results)+2, 20), 'white')
   draw = ImageDraw.Draw(im)
   coords = zip(range(len(results)), [15 - y/10 for y in results])
   draw.line(coords, fill="#888888")
   end = coords[-1]
   draw.rectangle([end[0]-1, end[1]-1, end[0]+1, end[1]+1], fill="#FF0000")
   del draw

   f = StringIO.StringIO()
   im.save(f, "PNG")
   return f.getvalue()

A note about the limitations of what we can do. We aren't going to reproduce Galileo's drawings of the moons of Jupiter, nor are we going to get the resolution that can be achieved on paper.

On the other hand, we can exploit the advantages of the platform of our choice. We can put information or raw data into the "title" attribute of the image, which causes that data to be displayed when the mouse hovers over the image. Here is how it looks in Firefox:

A popup showing the raw data that informs a sparkline.
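As a rough sketch of the title-attribute idea, here is one way the markup could be generated. It is written for Python 3, and the helper name sparkline_img_tag and the alt text are assumptions made for illustration; they are not part of the sparkline code in this article.

```python
from html import escape

def sparkline_img_tag(src, raw_data):
    """Return an <img> element whose title attribute lists the raw data."""
    title = ", ".join(str(v) for v in raw_data)
    # Escape both attribute values so stray quotes or ampersands
    # cannot break the markup.
    return '<img src="%s" title="%s" alt="sparkline"/>' % (
        escape(src, quote=True), escape(title, quote=True))

tag = sparkline_img_tag("spark.cgi?d=10,20,30,40", [10, 20, 30, 40])
print(tag)
# <img src="spark.cgi?d=10,20,30,40" title="10, 20, 30, 40" alt="sparkline"/>
```

Browsers that show tooltips for the title attribute will then display the numbers when the pointer rests on the image.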

We can also make the sparkline clickable, either making the entire image a link, or using an image map to make parts of the sparkline lead to further resources.

Spreading the Joy via Web Services

Now hacking Python is fun, but it's not for everyone. Let's open this up for everyone to use. First, we'll start by creating a web service. What else did you expect me to do?

For simple sparklines we can use query parameters to pass the data into a CGI application that draws the sparkline. Let's start by reviewing the four questions we ask when building any web service:

What are the resources? The resources are sparklines. To specify how each sparkline will appear, we can pass in the data via query parameters. Our sparkline code takes only one parameter, a list of data with values between 0 and 100. That data can be passed in by a query parameter d whose value is a comma-separated list of values between 0 and 100. For example: http://bitworking.org/projects/sparklines/spark.cgi?d=10,20,30,40.
What are their representations? The representations can be in an image format: PNG, GIF, JPEG or maybe even SVG.
What methods do those resources support? GET
What errors could be generated? 4XX if the parameters passed in don't correspond to data that can be graphed.

Now remember our follow-up questions about GETs. Is our use of GETs both safe and idempotent? Retrieving an image is certainly safe, and doing so multiple times still returns the same image, so we are using GET correctly.

Here is a first pass at an implementation of our web service. Note that this is not the service I deployed; it is a much simpler version used here just for exposition:

#!/usr/bin/env python
import cgi
import cgitb
import sys
import os

cgitb.enable()

import Image, ImageDraw
import StringIO
import urllib


def plot_sparkline(f, results):
  """Writes a sparkline image as PNG data to the file object f.

     The source data is a list of values between
     0 and 100. Values greater than 50 are drawn
     in red, otherwise they are drawn in gray."""
  im = Image.new("RGB", (len(results)*2, 15), 'white')
  draw = ImageDraw.Draw(im)
  for (r, i) in zip(results, range(0, len(results)*2, 2)):
     color = (r > 50) and "red" or "gray"
     draw.line((i, im.size[1]-r/10-4, i, (im.size[1]-r/10)), fill=color)
  del draw

  im.save(f, "PNG")

def plot_error(f):
  """Writes a big red X as PNG data to the file object f."""
  im = Image.new("RGB", (40, 15), 'white')
  draw = ImageDraw.Draw(im)
  draw.line((0, 0) + im.size, fill="red")
  draw.line((0, im.size[1], im.size[0], 0), fill="red")
  del draw
  im.save(f, "PNG")

def error(status="Status: 400 Bad Request"):
   print "Content-type: image/png"
   print status
   print ""
   plot_error(sys.stdout)
   sys.exit()

def cgi_param(form, name, default):
   return form.has_key(name) and form[name].value or default

# Only GET and HEAD are allowed; FieldStorage would happily parse a POST too.
if os.environ['REQUEST_METHOD'] not in ['GET', 'HEAD']:
   error("Status: 405 Method Not Allowed")
form = cgi.FieldStorage()
raw_data = cgi_param(form, 'd', '')
if not raw_data:
   error()
try:
   data = [int(d) for d in raw_data.split(",") if d]
except ValueError:
   # Non-numeric values cannot be graphed.
   error()
if not data or min(data) < 0 or max(data) > 100:
   error()

print "Content-type: image/png"
print "Status: 200 Ok"
print ""
plot_sparkline(sys.stdout, data)

There are a few noteworthy points:

Errors: If some of the parameters are missing or incorrect, we return an error response of the same type as a successful response; that is, we return a big red X as a PNG to indicate that there was an error. That's because our service will be used to serve up images that will most likely appear in web pages via the <img/> element. This way, if an error occurs, there will be visible feedback in the form of a large red X.
Methods: Note that we manually restrict our handling of HTTP methods to just GET and HEAD. If we don't do this, then our web service will also respond to POST requests. That's because our query parameter parsing library is a little too helpful and will handle POSTed data in a manner indistinguishable from GET query parameters. In this case it's not really that damaging, but imagine if the tables had been turned and we had created a service that should only respond to POST. Unless we check the incoming method ourselves, our service would gleefully accept both GET and POST requests and treat them as the same. That can lead to ugly problems, particularly if we settled on using POST because the action taken wasn't idempotent or safe.

Full Web Service Description

Here is a full description of the web service as it is deployed today:

            http://bitworking.org/projects/sparklines/spark.cgi

Common Parameters
Parameter	Description
d	The data for the plot. All data values must be between 0 and 100.
height	The height of the image in pixels.
type	"discrete" - One vertical bar per data point. "smooth" - all the points plotted as a continuous line.

If the type is "smooth" then the following parameters apply:

"Smooth" Parameters
Parameter	Description
min-m	If set to 'true', then place a special marker at the smallest value in the data set.
max-m	If set to 'true', then place a special marker at the largest value in the data set.
last-m	If set to 'true', then place a special marker at the last value in the data set.
min-color	The color of the marker placed at the smallest value in the data set.
max-color	The color of the marker placed at the largest value in the data set.
last-color	The color of the marker placed at the last value in the data set.
step	The points are to be plotted every n'th pixel.

If the type is "discrete" then the following parameters apply:

"Discrete" Parameters
Parameter	Description
upper	Data values ≥ `upper` will be plotted in the `above-color`, otherwise data points will be plotted in the `below-color`.
above-color	The color for data points ≥ `upper`.
below-color	The color for data points < `upper`.

Here are some example sparklines and their URIs to get you started.

Examples
Sparkline	URI
	`http://bitworking.org/projects/sparklines/spark.cgi?type=smooth&d=10,20,30,90,80,70&step=4`
	`http://bitworking.org/projects/sparklines/spark.cgi?type=smooth&d=10,20,30,90,80,70&step=4&min-m=true&max-m=true`
	`http://bitworking.org/projects/sparklines/spark.cgi?type=smooth&d=10,20,30,90,80,70`
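URIs like these are easy to assemble in code. The following is a minimal sketch in Python 3; the function name sparkline_uri and its convention of mapping underscores in keyword names to the hyphens used by the service are assumptions for illustration only.

```python
from urllib.parse import urlencode

BASE = "http://bitworking.org/projects/sparklines/spark.cgi"

def sparkline_uri(data, type="smooth", **params):
    """Build a sparkline URI from data values and optional parameters."""
    if not data or min(data) < 0 or max(data) > 100:
        raise ValueError("data values must be between 0 and 100")
    query = [("type", type), ("d", ",".join(str(v) for v in data))]
    for name, value in params.items():
        # The service expects 'true', not Python's 'True'.
        if isinstance(value, bool):
            value = "true" if value else "false"
        # Keyword arguments cannot contain hyphens, so min_m becomes min-m.
        query.append((name.replace("_", "-"), value))
    # Keep the commas readable instead of percent-encoding them.
    return BASE + "?" + urlencode(query, safe=",")

print(sparkline_uri([10, 20, 30, 90, 80, 70], step=4))
# http://bitworking.org/projects/sparklines/spark.cgi?type=smooth&d=10,20,30,90,80,70&step=4
```

Passing min_m=True and max_m=True as well yields the second URI in the table above.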

A Web Application for Sparklines

Not everybody is a web services hacker either, so let's put together a web application that will allow anyone to create a sparkline interactively. Let's build our web application using JavaScript to create a smoother application. We're not going to go whole hog on this little app (no sliding tiles à la Google Maps); we'll just use JavaScript to reduce the number of round trips to the server.

The first step is to build our form, which has all the form controls that we need to specify a sparkline. The first problem we run into is that the parameters are different if we are making a discrete sparkline as opposed to a smooth sparkline. We only want to show the parameters that are relevant. That can be accomplished by tweaking the CSS of the page on the fly via JavaScript. We'll enclose each section of type specific controls in a <div/> and when that type of sparkline is selected we'll show that div by setting its display property. Similarly, we'll hide the divs of the controls that are not relevant.

The JavaScript for this is heavily table-driven; there are only 38 lines of non-table code. Here are the tables:

// All the controls for the sparkline graphing, mapped
// to the events we use to track if they have changed,
// and the function to call when that event occurs.
var controls = {
     'type_s': ['onclick', create_swapper('type_s')],
     'type_d': ['onclick', create_swapper('type_d')],
     'd': ['onchange', controlChanged],
     'height': ['onchange', controlChanged],
     'min': ['onclick', controlChanged],
     'max': ['onclick', controlChanged],
     'last': ['onclick', controlChanged],
     'step': ['onchange', controlChanged],
     'upper': ['onchange', controlChanged],
     'above-color': ['onchange', controlChanged],
     'below-color': ['onchange', controlChanged],
     'min-color': ['onchange', controlChanged],
     'max-color': ['onchange', controlChanged],
     'last-color': ['onchange', controlChanged]
};

// Each type of curve takes a different set of parameters
var parameters_per_type = {
   "smooth" : ['d', 'height', 'min', 'max', 'last',
        'min-color', 'max-color', 'last-color', 'step'],
   "discrete" : ['d', 'height', 'upper', 'above-color', 'below-color']
};

// Different controls have different ways of
// having their values accessed
var parameters_accessor = {
  'd': 'value',
  'height': 'value',
  'min': 'checked',
  'max': 'checked',
  'last': 'checked',
  'step': 'value',
  'upper': 'value',
  'above-color': 'value',
  'below-color': 'value',
  'min-color': 'value',
  'max-color': 'value',
  'last-color': 'value'
};

// Associates the type of sparkline with the div that
// contains the controls specific to it.
var shape_specific_divs = {
  'type_s': 'smooth_specific',
  'type_d': 'discrete_specific'
};

The "controls" table lists all the controls in the form and maps each control id to the event it fires when it changes and to the function to call when that event occurs. This table makes it easy to hook up all the form control events to the right callback function. We can just loop over each entry in this table, find the specified control, and attach the listed function to the control's event.

The "parameters_per_type" table lists which controls apply to each type of sparkline. This table makes it easy to construct the URI of the sparkline.

The "parameters_accessor" table lists all the controls and maps their id to the name of the property that you use to access the value of the control. Yes, believe it or not, HTML has some inconsistencies; luckily we can again use a table driven design to hide those problems.

The "shape_specific_divs" table maps the type of sparkline, either smooth or discrete, to the divs that contain the controls that are specific to each type of sparkline. This is the table we use when we hide and show controls based on the type of sparkline the user wants to create.

Now for the code:

function controlChanged() {
   var type = "discrete";
   for (var shape in shape_specific_divs) {
     if (document.getElementById(shape).checked) {
        type = document.getElementById(shape).value;
     }
   }
   var output_uri = 'spark.cgi?type=' + type;
   var parameters = parameters_per_type[type];
   for (var i=0; i<parameters.length; i++) {
     // Read each control's value through its accessor property.
     output_uri = output_uri + "&"
           + parameters[i] + "="
           + document.getElementById(parameters[i])
                 [parameters_accessor[parameters[i]]];
   }
   document.getElementById('output_uri').value =
      'http://bitworking.org/projects/sparklines/' + output_uri;
   document.getElementById('output_img').src = output_uri;

   return true;
}

function create_swapper(choice) {
   return function swap_specific() {
     // Show the div for the chosen type and hide the others.
     for (var type in shape_specific_divs) {
        var s = document.getElementById(shape_specific_divs[type]);
        if (type == choice) {
          s.style.display = 'block';
        } else {
          s.style.display = 'none';
        }
     }
     controlChanged();
   }
}

function setup() {
   // Attach each control's handler as listed in the controls table.
   for (var id in controls) {
     document.getElementById(id)[controls[id][0]] = controls[id][1];
   }
   controlChanged();
}

The setup function is the first function called when the page is loaded. It hooks up all the control events to the correct callback function. I told you the controls table would make this function easy.

The controlChanged function gets called every time a control is updated. The function scoops up all the values in the controls and builds the URI of the new sparkline. It then updates the DOM of the page to use the new URI.

You may have noticed that I've not really been telling the whole story. Almost every control event calls controlChanged, but there are two exceptions. The exceptions are the "type_s" and "type_d" controls. These are the radio buttons that are used to select between discrete and smooth sparkline types. When those radio buttons change, not only do we want to update the sparkline image, but we also need to swap out the divs that contain the type-specific controls. To do that we could have created two functions, one for "type_s" and another for "type_d," but their code would have been too similar.

In each case the function would have just looped through all the divs for type-specific controls, displayed the div we needed, and hidden all the rest. We avoid writing two functions by creating a function, create_swapper, that returns functions. We pass the id of the chosen radio button into create_swapper, and in turn it returns a function that, when called, will display the div associated with that choice, hide the rest of the shape_specific_divs, and then call controlChanged to update the sparkline image. The thing returned by create_swapper is not actually just a function, since it also keeps around the value of choice. That difference changes the return value from merely a function to a closure: a function bundled with the variables it captured from the scope in which it was created.
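The same closure pattern can be tried outside the browser. Here is a small Python 3 sketch of it; the display dictionary is an assumption that stands in for the page's CSS so that the example runs on its own, and the call to controlChanged is left out.

```python
# Same mapping as the shape_specific_divs table in the JavaScript.
shape_specific_divs = {
    "type_s": "smooth_specific",
    "type_d": "discrete_specific",
}

def create_swapper(choice, display):
    """Return a function that shows the div for `choice` and hides the rest.

    `display` is a plain dict standing in for the page's CSS."""
    def swap_specific():
        # `choice` and `display` are remembered from the enclosing call.
        for kind, div in shape_specific_divs.items():
            display[div] = "block" if kind == choice else "none"
    return swap_specific

display = {}
show_smooth = create_swapper("type_s", display)
show_discrete = create_swapper("type_d", display)

show_smooth()
print(display)   # {'smooth_specific': 'block', 'discrete_specific': 'none'}
show_discrete()
print(display)   # {'smooth_specific': 'none', 'discrete_specific': 'block'}
```

Each returned function carries its own value of choice, which is why one factory can serve both radio buttons.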

Optimizing

Our application would be even faster if we could cut down on the number of GETs we did for each image. How can HTTP help us optimize?

ETags

The ETag: and If-None-Match: headers are used to change a regular GET into a conditional GET. The idea is that when you do the first GET on a resource, an entity tag is returned in the ETag: header. That entity tag is then sent in an If-None-Match: header on each following GET to the same resource. If the resource is unchanged, the server can detect this by looking at the entity tag; it then skips sending the entity and returns a response of 304 (Not Modified).

Client                          Server
   |  ----------GET----------->   |
   |                              |
   |  <-- Response + ETag -----   |

If we then do a subsequent GET and the resource has changed then we will receive a full response:

Client                          Server
   |  --GET+If-None-Match----->   |
   |                              |
   |  <------ Response --------   |

The speed increase comes when we do a subsequent GET and the resource has not changed. In that case we get a 304 Not Modified response from the server, which contains no response entity.

Client                          Server
   |  ----GET+If-None-Match--->   |
   |                              |
   |  <-----304 Not Modified---   |

That means that if our image is unchanged, the response body is empty. A 100% reduction in size: now that's what I call a good performance increase.

To implement conditional GET here we need to add two items to our implementation. The first is the generation of the ETag: header. For that we need a good algorithm for generating an entity tag, the value of the ETag: header. We need a value that will be the same for images that are the same, and different for images that are different. Since all the information that defines an image is in the query parameters, we should begin by trying to generate a value from that. In Python:


print 'ETag: "%d"' % hash(os.environ['QUERY_STRING'])

But there is one more thing that could cause the image to change for the very same query string. That would be if we upgrade our CGI script and modify how the images are constructed. So to be perfectly safe we should also include the version of our CGI application in the hash:


print 'ETag: "%d"' % hash(os.environ['QUERY_STRING'] + __version__)

The ok function can be modified to return the ETag header:

def ok():
   print "Content-type: image/png"
   print "Status: 200 Ok"
   # Entity tags are quoted strings.
   print 'ETag: "%d"' % hash(os.environ['QUERY_STRING'] + __version__)
   print ""

The second addition will be the check for a match if the If-None-Match: header is included in the request.

if_none_match = os.environ.get('HTTP_IF_NONE_MATCH', '')
# Compare against the same quoted form that ok() sends.
etag = '"%d"' % hash(os.environ.get('QUERY_STRING', '') + __version__)
if if_none_match == etag:
   not_modified()

And we introduce the not_modified() function, which just issues a 304 Not Modified and exits:


def not_modified():
   print "Status: 304 Not Modified"
   print ""
   sys.exit()
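One caveat for anyone trying this on a current interpreter: in Python 3 the built-in hash() of a string is salted per process by default, so a CGI script, which starts a new process for each request, would never produce a matching tag. The sketch below swaps in a hashlib digest instead, which is stable across processes. It is an illustration under assumptions, not the deployed code: VERSION, entity_tag and respond are hypothetical names, and only a single entity tag in If-None-Match is handled.

```python
import hashlib

VERSION = "1.0"  # hypothetical stand-in for the script's __version__

def entity_tag(query_string, version=VERSION):
    """Build a quoted entity tag from the query string and script version."""
    digest = hashlib.sha256((query_string + version).encode("utf-8")).hexdigest()
    return '"%s"' % digest[:16]

def respond(environ):
    """Return (status, headers) for a request described by CGI-style variables."""
    etag = entity_tag(environ.get("QUERY_STRING", ""))
    if environ.get("HTTP_IF_NONE_MATCH", "") == etag:
        # Same query and same script version: no need to replot.
        return "304 Not Modified", {"ETag": etag}
    return "200 OK", {"ETag": etag, "Content-Type": "image/png"}

first = respond({"QUERY_STRING": "type=smooth&d=10,20,30"})
second = respond({"QUERY_STRING": "type=smooth&d=10,20,30",
                  "HTTP_IF_NONE_MATCH": first[1]["ETag"]})
print(first[0])   # 200 OK
print(second[0])  # 304 Not Modified
```

Changing either the query string or VERSION changes the tag, so a stale cached image is never reported as current.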

So does this really save us anything? Here is an excerpt from my log file showing the frequently requested images used in the web application: the first GET returns the full image; the second GET is conditional and gets a response with no entity body and a status code of 304 Not Modified since the image hasn't changed since we last requested it.

68.221.46.94 - - [12/Jun/2005:22:50:24 -0400]
"GET /projects/sparklines/spark.cg...tep=3
HTTP/1.1" 200 452 "-" "curl/7.11.1
(i686-pc-cygwin) libcurl/7.11.1 OpenSSL/0.9.7g zlib/1.2.2"

68.221.46.94 - - [12/Jun/2005:22:50:52 -0400]
"GET /projects/sparklines/spark.cg...tep=3
HTTP/1.1" 304 - "-" "curl/7.11.1
(i686-pc-cygwin) libcurl/7.11.1 OpenSSL/0.9.7g zlib/1.2.2"

That first response is 452 bytes long; the count includes just the image, not the headers. The next time the same request comes along, the conditional GET returns a 304 Not Modified and the response body is 0 bytes long; that's what the dash means. Not only is this saving us bandwidth, it is also saving us computation time, since we avoid replotting the sparkline.

gzip

We might also see some performance improvements by implementing gzip compression. HTTP allows the client to indicate that it will accept a gzip'd response body by sending an Accept-Encoding: header with the value of "gzip." If the server supports gzip encoding it can then compress the response body and return the Content-Encoding: header with a value of "gzip" to indicate that the body has been compressed.
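The negotiation can be sketched in a few lines of Python 3. This is a simplified illustration: maybe_gzip is a hypothetical helper, and it ignores the quality values (such as "gzip;q=0") that a full implementation of Accept-Encoding would honor.

```python
import gzip

def maybe_gzip(body, accept_encoding):
    """Compress body if the client's Accept-Encoding value lists gzip.

    Returns the (possibly compressed) body and any extra response headers."""
    encodings = [e.split(";")[0].strip().lower()
                 for e in accept_encoding.split(",")]
    if "gzip" in encodings:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}

body = b"10,20,30,90,80,70," * 200
compressed, headers = maybe_gzip(body, "gzip, deflate")
print(headers)                      # {'Content-Encoding': 'gzip'}
print(len(compressed) < len(body))  # True
```

Note that PNG data is already deflate-compressed, so the gain for the sparkline images themselves is likely to be small; text responses benefit far more.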

Rip, Mix and Burn

Now having this shiny, new web service and web application is fun, but the real power of Web 2.0 comes from combining services in new ways. Let's combine our new sparkline web service with data from Technorati. At the beginning of this article I showed a sparkline that displayed the links per month for a URI based on data from Technorati. This is just a matter of combining the two services: taking the output of a Technorati search and pumping that data into the sparkline web service. Here is the code for that service:

import urllib
import libxml2
import time

LICENSE_KEY = "insert your technorati API key here"

def cosmos(uri, start=0):
   """Get a list of struct_time's for the creation time of a
      link to the given URI."""
   args = {'url':uri, 'type':'link', 'start':start, 'format':'xml',
      'key':LICENSE_KEY, 'limit':'100'}
   url = "http://api.technorati.com/cosmos?" + urllib.urlencode(args)
   doc = libxml2.parseDoc(urllib.urlopen(url).read())
   return [time.strptime(e.content.split(" ")[0], '%Y-%m-%d')
      for e in doc.xpathEval("//tapi/document/item/linkcreated")]

def diff_dates(oldest, newest):
   """Number of calendar months between two struct_time's."""
   return (newest[0] - oldest[0]) * 12  + newest[1] - oldest[1]

today = time.localtime()
URI = 'http://bitworking.org'
alldates = dates = cosmos(URI)

# We can only get 100 items at a time, keep looping until we
# get less than 100 items to ensure that we get them all.
start_from = 101
while len(dates) == 100:
   dates = cosmos(URI, start=start_from)
   alldates.extend(dates)
   start_from += 100

# This assumes the last entry in the list is the oldest link.
links_per_month = [0] * (diff_dates(alldates[-1], today) + 1)
for l in alldates:
   links_per_month[diff_dates(l, today)] += 1

# Since we indexed by counting the difference in time between
# today and the time of the link creation, we have our list in
# reverse order.
links_per_month.reverse()

max_links_per_month = max(links_per_month)
points = ",".join([str(int(float(d)/max_links_per_month * 100))
   for d in links_per_month])
points_unscaled = ",".join([str(d) for d in links_per_month])

# Build the image URI on its own so that no stray whitespace ends up
# in it, then escape the ampersands for use in an HTML attribute.
src = ("http://bitworking.org/projects/sparklines/spark.cgi?type=smooth"
       "&d=%s&height=15&min-m=true&max-m=true"
       "&min-color=red&max-color=blue&step=2") % points
print """<html>
  <body>
   <div>
    <p>
     <img src="%s" title="%s"/>
              <span style="color:red">%d</span>
              <span style="color:blue">%d</span>
    </p>
   </div>
  </body>
</html>
""" % (src.replace("&", "&amp;"), points_unscaled,
       min(links_per_month), max_links_per_month)

The "cosmos" function uses the Technorati API to find the date of creation for all the links to our target URI. Once we have the XML representation, we use libxml2 to pick out the "linkcreated" element from each item. We then parse each timestamp and return the complete list of times. Note that since we only care about the month, we don't bother parsing the time, only the date.

Since the Technorati API limits all such query results to 100, we need to loop and keep getting the next 100 results until we have all the results. Once we have all the dates, it's only a matter of creating a bin for each month and incrementing a bin each time we find a link that was created in that month. After that we use the sparkline web service to plot the results. For an example of what this script produces, here are the links per month for bitworking.org: [sparkline image, labeled 2 and 43]. Note that if you are using a capable browser, you can hover your mouse over the sparkline and get a little pop-up window that shows the raw data used to generate the plot.
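The binning and scaling steps can be tried without a Technorati key. The sketch below is Python 3 and uses a handful of made-up dates purely as sample input; the function names links_per_month and scale are assumptions, while diff_dates mirrors the function in the script above.

```python
import time

def diff_dates(oldest, newest):
    """Number of calendar months between two struct_time values."""
    return (newest[0] - oldest[0]) * 12 + newest[1] - oldest[1]

def links_per_month(dates, today):
    """Count links per calendar month, oldest month first."""
    bins = [0] * (diff_dates(min(dates), today) + 1)
    for d in dates:
        bins[diff_dates(d, today)] += 1   # index 0 is the current month
    bins.reverse()                        # put the oldest month first
    return bins

def scale(bins):
    """Rescale the counts to the 0..100 range the service expects."""
    top = max(bins)
    return [int(float(b) / top * 100) for b in bins]

# Made-up sample data, for illustration only.
parse = lambda s: time.strptime(s, "%Y-%m-%d")
dates = [parse(s) for s in ("2005-03-10", "2005-05-01",
                            "2005-05-20", "2005-06-02")]
today = parse("2005-06-22")

counts = links_per_month(dates, today)
print(counts)         # [1, 0, 2, 1]
print(scale(counts))  # [50, 0, 100, 50]
```

The unscaled counts are what end up in the title attribute, and the scaled ones in the d parameter.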

Lessons Learned

This was a fun project, and in the course of it I learned quite a few important lessons.

JavaScript Is Nice: With support for programming constructs like closures, JavaScript surprised me with its expressiveness and compactness.
Web Service First: If possible, build the web service first and then the web application. This helps in several ways. First, it allows you to better leverage the work you did on the web service by utilizing it in the web application. Second, it forces you to be a consumer of your web service, and that will give you ideas on how to make it better. Third, if you've built a web service and can't find a way to use it in your web application, then maybe you need to go back to the drawing board.
Leverage GET: By using GET to retrieve the sparklines we can use ETags and If-None-Match headers to reduce the bandwidth our web service uses. In addition those changes make our web application that much quicker.
Look Ma, No XML: While most of the web services we talk about have XML somewhere in them, it's good to have a reminder once in a while that XML isn't a prerequisite for a RESTful web service.