A Bright, Shiny Service: Sparklines
by Joe Gregorio
A Web Application for Sparklines
Not everybody is a web services hacker either, so let's put together a web application that will allow anyone to create a sparkline interactively. We'll build it using JavaScript to make the application smoother. We're not going to go whole hog on this little app (no sliding tiles à la Google Maps); we'll just use JavaScript to reduce the number of round trips to the server.
The first step is to build our form, which has all the form controls that we need to specify a sparkline. The first problem we run into is that the parameters are different if we are making a discrete sparkline as opposed to a smooth sparkline. We only want to show the parameters that are relevant. That can be accomplished by tweaking the CSS of the page on the fly via JavaScript. We'll enclose each section of type-specific controls in a <div>, and when that type of sparkline is selected we'll show that div by setting its display property. Similarly, we'll hide the divs of the controls that are not relevant.
The JavaScript for this is heavily table driven; there are only 38 lines of non-table code. Here are the tables:
// All the controls for the sparkline graphing, mapped
// to the events we use to track if they have changed,
// and the function to call when that event occurs.
var controls = {
    'type_s': ['onclick', create_swapper('type_s')],
    'type_d': ['onclick', create_swapper('type_d')],
    'd': ['onchange', controlChanged],
    'height': ['onchange', controlChanged],
    'min': ['onclick', controlChanged],
    'max': ['onclick', controlChanged],
    'last': ['onclick', controlChanged],
    'step': ['onchange', controlChanged],
    'upper': ['onchange', controlChanged],
    'above-color': ['onchange', controlChanged],
    'below-color': ['onchange', controlChanged],
    'min-color': ['onchange', controlChanged],
    'max-color': ['onchange', controlChanged],
    // No trailing comma after the last entry: some browsers reject it.
    'last-color': ['onchange', controlChanged]
};
// Each type of curve takes a different set of parameters
var parameters_per_type = {
    "smooth": ['d', 'height', 'min', 'max', 'last',
               'min-color', 'max-color', 'last-color', 'step'],
    "discrete": ['d', 'height', 'upper', 'above-color', 'below-color']
};
// Different controls have different ways of
// having their values accessed
var parameters_accessor = {
    'd': 'value',
    'height': 'value',
    'min': 'checked',
    'max': 'checked',
    'last': 'checked',
    'step': 'value',
    'upper': 'value',
    'above-color': 'value',
    'below-color': 'value',
    'min-color': 'value',
    'max-color': 'value',
    'last-color': 'value'
};
// Associates the type of sparkline with the div that
// contains the controls specific to it.
var shape_specific_divs = {
    'type_s': 'smooth_specific',
    'type_d': 'discrete_specific'
};
The "controls" table lists all the controls in the form and maps each control id to the event it fires when it changes and to the function to call when that event occurs. This table makes it easy to hook up all the form control events to the right callback function. We can just loop over each entry in this table, find the specified control, and hook the listed function to the control's event.
The "parameters_per_type" table lists, for each type of sparkline, the controls that apply to it. This table makes it easy to construct the URI of the sparkline.
The "parameters_accessor" table lists all the controls and maps their id to the name of the property that you use to access the value of the control. Yes, believe it or not, HTML has some inconsistencies; luckily we can again use a table-driven design to hide those problems.
The "shape_specific_divs" table maps the type of sparkline, either smooth or discrete, to the divs that contain the controls that are specific to each type of sparkline. This is the table we use when we hide and show controls based on the type of sparkline the user wants to create.
Now for the code:
function controlChanged() {
    // Default to "discrete" unless a checked radio button says otherwise.
    var type = "discrete";
    for (var shape in shape_specific_divs) {
        if (document.getElementById(shape).checked) {
            type = document.getElementById(shape).value;
        }
    }
    // Append only the parameters that apply to the selected type.
    var output_uri = 'spark.cgi?type=' + type;
    var parameters = parameters_per_type[type];
    for (var i = 0; i < parameters.length; i++) {
        output_uri = output_uri + "&"
            + parameters[i] + "="
            + document.getElementById(parameters[i])
                  [parameters_accessor[parameters[i]]];
    }
    document.getElementById('output_uri').value =
        'http://bitworking.org/projects/sparklines/' + output_uri;
    document.getElementById('output_img').src = output_uri;
    return true;
}
function create_swapper(choice) {
    return function swap_specific() {
        // Show the div for the chosen type and hide all the others.
        for (var type in shape_specific_divs) {
            var s = document.getElementById(shape_specific_divs[type]);
            if (type == choice) {
                s.style.display = 'block';
            } else {
                s.style.display = 'none';
            }
        }
        controlChanged();
    };
}
function setup() {
    // Attach each control's handler to the event named in the table.
    for (var id in controls) {
        document.getElementById(id)[controls[id][0]] = controls[id][1];
    }
    controlChanged();
}
The setup function is the first function called when the
page is loaded. It hooks up all the control events to the correct
callback function. I told you the controls table would make this
function easy.
The controlChanged function is the
function that gets called every time a control gets
updated. The function scoops up all the values in the
controls and builds the URI of the new sparkline. It then
updates the DOM of the page to use the new URI.
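The same table-driven idea is not specific to JavaScript. As a rough sketch only, here is how the URI construction might look in Python 3, for example when driving the service from a script instead of a form. The `form_values` dict is a hypothetical stand-in for the form controls, and `build_sparkline_uri` is a name made up for this illustration; neither is part of the original application.

```python
from urllib.parse import urlencode

# Mirrors the JavaScript parameters_per_type table above.
PARAMETERS_PER_TYPE = {
    "smooth": ["d", "height", "min", "max", "last",
               "min-color", "max-color", "last-color", "step"],
    "discrete": ["d", "height", "upper", "above-color", "below-color"],
}


def build_sparkline_uri(sparkline_type, form_values, base="spark.cgi"):
    """Build a query URI using only the parameters relevant to the type.

    form_values is a hypothetical stand-in for the form controls: a dict
    mapping each control id to its current value.
    """
    pairs = [("type", sparkline_type)]
    for name in PARAMETERS_PER_TYPE[sparkline_type]:
        pairs.append((name, form_values[name]))
    # safe="," keeps the comma-separated data points readable in the URI.
    return base + "?" + urlencode(pairs, safe=",")
```

Values for controls that do not apply to the chosen type are simply ignored, which is the job the table does in the JavaScript version as well.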
You may have noticed that I've not really been telling the
whole story. Almost every control event calls
controlChanged, but there are two exceptions.
The exceptions are the "type_s" and "type_d"
controls. These are the radio buttons that are used to
select between discrete and smooth sparkline types. When
those radio buttons change, not only do we want to update
the sparkline image, but we also need to swap out the divs that
contain the type-specific controls. To do that we could
have created two functions, one for "type_s" and another
for "type_d," but their code would have been too
similar.
In each case the function would have just
looped through all the divs for type-specific controls,
displayed the div we needed, and hidden all the rest. We
avoid writing two functions by creating a function
create_swapper that returns functions. We pass
the name of the div we want displayed into
create_swapper and in turn it returns a
function that, when called, will display that div, hide all
the rest of the shape_specific_divs, and then call
controlChanged to update the sparkline image.
The thing returned by create_swapper is not actually just a function, since it also keeps around the value of choice. That difference changes the return value from merely a function to a closure. If closures are new to you, you may also find it helpful to learn about continuations, a related concept.
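To make the closure idea concrete outside the browser, here is a minimal sketch of the same pattern in Python 3. It is an analogue, not the article's code: the `display` dict is an assumption that stands in for the display property of each div in the DOM.

```python
# Mirrors the JavaScript shape_specific_divs table.
SHAPE_SPECIFIC_DIVS = {
    "type_s": "smooth_specific",
    "type_d": "discrete_specific",
}


def create_swapper(choice, display):
    """Return a function that shows the div for `choice` and hides the rest.

    `display` is a hypothetical stand-in for the DOM: a dict mapping a div
    name to its CSS display value.
    """
    def swap_specific():
        # `choice` and `display` are remembered from the enclosing call;
        # that remembered environment is what makes this a closure.
        for shape, div in SHAPE_SPECIFIC_DIVS.items():
            display[div] = "block" if shape == choice else "none"
    return swap_specific


display = {"smooth_specific": "none", "discrete_specific": "none"}
show_smooth = create_swapper("type_s", display)
show_discrete = create_swapper("type_d", display)
show_smooth()
```

Each call to create_swapper produces a separate function with its own remembered value of `choice`, so `show_smooth` and `show_discrete` share the code but behave differently.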
Optimizing
Our application would be even faster if we could cut down on the number of GETs we did for each image. How can HTTP help us optimize?
- ETags
- The ETag: and If-None-Match: headers are used to change a regular GET into a conditional GET. The idea is that when you do the first GET on a resource, an entity tag is returned in the ETag: header. That entity tag is then sent in an If-None-Match: header on each following GET to the same resource. If the resource is unchanged, the server can detect this by comparing entity tags; it then skips sending the entity and returns a response of 304 (Not Modified).
    Client                      Server
       | ----------GET-----------> |
       |                           |
       | <-- Response + ETag ----- |
If we then do a subsequent GET and the resource has changed then we will receive a full response:
    Client                      Server
       | --GET+If-None-Match-----> |
       |                           |
       | <------ Response -------- |
The speed increase comes when we do a subsequent GET and the resource has not changed. In that case we get a 304 Not Modified response from the server, which contains no response entity.
    Client                      Server
       | ----GET+If-None-Match---> |
       |                           |
       | <-----304 Not Modified--- |
That means if our image is unchanged, then the response body is empty. Now 100% reduction in size, that's what I call a good performance increase.
To implement conditional GET here we need to add two items to our implementation. The first is the generation of the ETag: header. For that we need a good algorithm for generating an entity tag, the value of the ETag: header. We need a value that will be the same for images that are the same, and different for images that are different. Since all the information that defines an image is in the query parameters, we should begin by trying to generate a value from that. In Python:
    print 'ETag: "%d"' % hash(os.environ['QUERY_STRING'])
But there is one more thing that could cause the image to change for the very same query string. That would be if we upgrade our CGI script and modify how the images are constructed. So to be perfectly safe we should also include the version of our CGI application in the hash:
    print 'ETag: "%d"' % hash(os.environ['QUERY_STRING'] + __version__)
The ok function can be modified to return the ETag: header:
    def ok():
        print "Content-type: image/gif"
        print "Status: 200 Ok"
        # The quotation marks are part of the entity tag.
        print 'ETag: "%d"' % hash(os.environ['QUERY_STRING'] + __version__)
        print ""
The second addition will be the check for a match if the If-None-Match: header is included in the request. The comparison uses the quoted form of the entity tag, exactly as it was sent in the ETag: header:
    if_none_match = os.environ.get('HTTP_IF_NONE_MATCH', '')
    etag = '"%d"' % hash(os.environ.get('QUERY_STRING', '') + __version__)
    if if_none_match and if_none_match == etag:
        not_modified()
And we introduce the not_modified() function, which just issues a 304 Not Modified and exits:
    def not_modified():
        print "Status: 304 Not Modified"
        print ""
        sys.exit()
So does this really save us anything? Here is an excerpt from my log file showing the frequently requested images used in the web application: the first GET returns the full image; the second GET is conditional and gets a response with no entity body and a status code of 304 Not Modified since the image hasn't changed since we last requested it.
    68.221.46.94 - - [12/Jun/2005:22:50:24 -0400] "GET /projects/sparklines/spark.cg...tep=3 HTTP/1.1" 200 452 "-" "curl/7.11.1 (i686-pc-cygwin) libcurl/7.11.1 OpenSSL/0.9.7g zlib/1.2.2"
    68.221.46.94 - - [12/Jun/2005:22:50:52 -0400] "GET /projects/sparklines/spark.cg...tep=3 HTTP/1.1" 304 - "-" "curl/7.11.1 (i686-pc-cygwin) libcurl/7.11.1 OpenSSL/0.9.7g zlib/1.2.2"
That first response is 452 bytes long, a figure that includes just the image, not the headers. The next time the same request comes along, the conditional GET returns a 304 Not Modified and the response body is 0 bytes long, which is what the dash means. Not only is this saving us bandwidth, it is also saving us computation time, since we avoid replotting the sparkline.
- gzip
- We might also see some performance improvements by implementing gzip compression. HTTP allows the client to indicate that it will accept a gzip'd response body by sending an Accept-Encoding: header with the value "gzip". If the server supports gzip encoding, it can then compress the response body and return the Content-Encoding: header with a value of "gzip" to indicate that the body has been compressed.
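The two optimizations above can be sketched together as one small response function. This is an illustration in Python 3 under stated assumptions, not the service's actual code: `respond`, `make_etag`, `VERSION`, and the `render` callable are all hypothetical names. It uses hashlib instead of the built-in hash(), because Python 3 randomizes string hashes per process by default and so would not give a stable tag across CGI invocations. The header checks are deliberately simplified (no lists or "*" in If-None-Match:, no q-values in Accept-Encoding:).

```python
import gzip
import hashlib

VERSION = "1.0"  # hypothetical version string, standing in for __version__


def make_etag(query_string, version=VERSION):
    """Derive a quoted entity tag from the query string and script version."""
    digest = hashlib.sha1((query_string + version).encode("utf-8")).hexdigest()
    return '"%s"' % digest


def respond(query_string, request_headers, render):
    """Return (status, headers, body) for a GET on the sparkline resource.

    render is a hypothetical callable that takes the query string and
    returns the image bytes; it is only called for a full response.
    """
    etag = make_etag(query_string)
    # Conditional GET: a matching tag means the client's copy is current.
    if request_headers.get("If-None-Match") == etag:
        return "304 Not Modified", {"ETag": etag}, b""

    body = render(query_string)
    headers = {"ETag": etag, "Vary": "Accept-Encoding"}
    # Simplified negotiation: compress only if the client mentions gzip.
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return "200 OK", headers, body
```

Note that the 304 path never calls `render`, which is where the saved computation comes from. Image formats are usually compressed already, so the gain from gzip on the image itself may be small.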
Rip, Mix and Burn
Now having this shiny, new web service and web application is fun, but the real power of Web 2.0 comes from combining services in new ways. Let's combine our new sparkline web service with data from Technorati. At the beginning of this article I showed a sparkline that displayed the links per month for a URI based on data from Technorati. This is just a matter of combining the two services: taking the output of a Technorati search and pumping that data into the sparkline web service. Here is the code for that service:
import urllib
import libxml2
import time

LICENSE_KEY = "insert your technorati API key here"

def cosmos(uri, start=0):
    """Get a list of struct_time's for the creation time of a
    link to the given URI."""
    args = {'url':uri, 'type':'link', 'start':start, 'format':'xml',
            'key':LICENSE_KEY, 'limit':'100'}
    url = "http://api.technorati.com/cosmos?" + urllib.urlencode(args)
    doc = libxml2.parseDoc(urllib.urlopen(url).read())
    return [time.strptime(e.content.split(" ")[0], '%Y-%m-%d')
            for e in doc.xpathEval("//tapi/document/item/linkcreated")]

def diff_dates(oldest, newest):
    """Return the number of calendar months between two struct_time's."""
    return (newest[0] - oldest[0]) * 12 + newest[1] - oldest[1]

today = time.localtime()
URI = 'http://bitworking.org'
alldates = dates = cosmos(URI)
# We can only get 100 items at a time, keep looping until we
# get less than 100 items to ensure that we get them all.
start_from = 101
while len(dates) == 100:
    dates = cosmos(URI, start=start_from)
    alldates.extend(dates)
    start_from += 100

# The last item returned is taken to be the oldest link.
links_per_month = [0] * (diff_dates(alldates[-1], today) + 1)
for l in alldates:
    links_per_month[diff_dates(l, today)] += 1
# Since we indexed by counting the difference in time between
# today and the time of the link creation, we have our list in
# reverse order.
links_per_month.reverse()
max_links_per_month = max(links_per_month)
# Scale each bin to the range 0-100 for the sparkline service.
points = ",".join([str(int(float(d)/max_links_per_month * 100))
                   for d in links_per_month])
points_unscaled = ",".join([str(d) for d in links_per_month])
print """<html>
<body>
<div>
<p>
<img src="http://bitworking.org/projects/sparklines/spark.cgi?type=smooth&\
d=%s&height=15&min-m=true&max-m=true\
&min-color=red&max-color=blue&step=2" title="%s"/>
<span style="color:red">%d</span>
<span style="color:blue">%d</span>
</p>
</div>
</body>
</html>
""" % (points, points_unscaled, min(links_per_month), max_links_per_month)
The "cosmos" function uses the Technorati API to find the date of creation for all the links to our target URI. Once we have the XML representation we then use libxml2 to pick out the "linkcreated" element from each item. We then parse each time stamp and return the complete list of times. Note that since we only care about the month we don't bother parsing in the time, but only the date.
Since the Technorati API limits all such query results to 100, we need to loop and keep getting the next 100 results until we have all the results.
Once we have all the dates, it's only a matter of creating a bin for each month and incrementing a bin each time we find a link that was created in that month. After that we use the sparkline web service to plot the results. For an example of what this script produces, here are the links per month for bitworking.org: [sparkline image, followed by its minimum and maximum values, 2 and 43]. Note that if you are using a capable browser, you can hover your mouse over the sparkline and get a little pop-up window that shows the raw data used to generate the plot.
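The binning and scaling steps can be tried in isolation, without the Technorati API. The following is a minimal Python 3 sketch under assumptions: it uses datetime.date values rather than struct_time, indexes bins from the oldest month so that no reversal is needed, and the function names are made up for this illustration. It assumes no link date is later than `today`.

```python
import datetime


def months_between(older, newer):
    """Number of calendar months from `older` to `newer`."""
    return (newer.year - older.year) * 12 + newer.month - older.month


def links_per_month(link_dates, today):
    """Count links per calendar month, oldest month first."""
    if not link_dates:
        return []
    oldest = min(link_dates)
    # One bin per month from the oldest link up to and including today.
    bins = [0] * (months_between(oldest, today) + 1)
    for d in link_dates:
        bins[months_between(oldest, d)] += 1
    return bins


def scale_to_percent(counts):
    """Scale counts to integers in 0..100 relative to the largest bin."""
    peak = max(counts) if counts else 0
    if peak == 0:
        return [0 for _ in counts]
    return [int(c / peak * 100) for c in counts]


sample = [datetime.date(2005, 3, 10), datetime.date(2005, 3, 20),
          datetime.date(2005, 5, 1)]
counts = links_per_month(sample, datetime.date(2005, 6, 12))
points = ",".join(str(p) for p in scale_to_percent(counts))
```

With the sample dates above, `counts` has one bin for each month from March through June 2005, and `points` is the comma-separated string that would go in the d parameter of the sparkline URI.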
Lessons Learned
This was a fun project, and in the course of it I learned quite a few important lessons.
- JavaScript Is Nice
- With support for programming constructs like closures, JavaScript surprised me with its expressiveness and compactness.
- Web Service First
- If possible, build the web service first and then the web application. This helps in several ways. First, it allows you to better leverage the work you did on the web service by using it in the web application. Second, it forces you to be a consumer of your own web service, which will give you ideas on how to make it better. Third, if you've built a web service and can't find a way to use it in your web application, then maybe you need to go back to the drawing board.
- Leverage GET
- By using GET to retrieve the sparklines we can use ETags and If-None-Match headers to reduce the bandwidth our web service uses. In addition those changes make our web application that much quicker.
- Look Ma, No XML
- While most of the web services we talk about have XML somewhere in them, it's good to have a reminder once in a while that XML isn't a prerequisite for a RESTful web service.