Validation of Sparkline Computer Code
How can we collect, check, validate, and certify the major computer implementations of sparklines?
Checks and validations should include (1) robust functioning of the code, particularly in the face of wild data or in the face of corrupting interactions with other aspects of computing, (2) closeness of output to the sparkline concept (word-like, high resolution, integrated with text and numbers, contextual labeling, and the availability of “boxing” for each sparkline, as in the euro example in the sparkline chapter, with begin/end and high/low dots and numbers), (3) ease of use by innocent users trying to implement the code, (4) avoiding malicious code, and (5) demonstration, by examples, of the output. Maybe we should develop a set of test data that each implementation must handle.
How do we validate the validators? What does Linux do? We’ll need real names of those involved, working email addresses, and possibly even some credentials. We don’t want to get too bureaucratic about this but we need to maintain rigorous standards.
This project gains strength, I hope, from its open-source character: publicly available code, public reviews, many Kindly Contributors. At this board, the result is that we would provide for various sparkline coding schemes a public validation/certification.
This project needs some discussion or at least an example validation/certification. All told, there are probably somewhere between 25 and 50 necessary codings for various products and various languages (LaTeX, Word, Excel, Flash, Illustrator, the major statistics packages, scientific data analysis, Microsoft, Apple, Linux and so on). Or is there a way to build this in at the OS level?
What do our Kindly Contributors think? Some discussion is needed before we undertake this useful project, which ultimately should make high-quality sparklines available to everyone regardless of computer system or application. This should accelerate the use of sparklines and also avoid having endless messed-up proprietary codes that fail to maintain the integrity of the sparkline design.
If this works, one could imagine a few other good data designs worthy of the same process.
If the goal is to add functionality to existing programs, then it seems to me that the page offered here should be a menu of downloads that plug into existing programs, open source or not. I’m thinking of the way Adobe and Documents to Go embed their own toolbars in Microsoft Office products. Literally, a user should be able to read a menu of plug-ins, click “Sparklines for Office 2003” and that starts the download and then the install inserts the toolbar. I, however, don’t know what it takes to get to that level from having the basic source code for a single implementation. Can anyone elucidate the tasks involved in developing such a product? The skills required for those tasks would probably be a starting point for credentials.
That, of course, is a client-side solution. Is there a server-side solution? I think the data would have to be on the machine making the sparkline. True?
Before the construction of an open source project on Sparklines, it would likely be worth cataloging an initial set of Sparklines deemed viable for screen display. Importantly, a standard XML data structure, defined with a proper DTD and documentation, should be created based on those viable examples. Several types may be needed to accommodate the set. A ubiquitous standard would allow language-agnostic interpretation of the data sets.
When data sources are no longer esoteric, then we can begin to dissect deployment method alternatives.
I wrote about some ideas a few weeks ago and provided some XAML to show how Sparklines might look.
http://www.primordia.com/blog/archives/2004/12/xaml_sparklines.html
To be honest, the ideas presented there were merely a different way of looking at the issue which was raised by others. Credits and links to these persons are in the entry.
The whole exciting thing (for me anyway) is the final markup snippet suggested in the blog entry, which makes sparklines literally as easy as text.
I think we should follow the lead of Agile Software Development’s Test Driven Development. For example, when someone wants to port the Framework for Integrated Test (FIT) to a new programming language, they need to demonstrate that their FIT implementation passes the published acceptance tests before being blessed.
If we publish a set of very small, very simple test cases, we could judge each implementation by its ability to pass these tests. The tests could be gathered into suites according to area of interest and would represent a set of executable specifications and a very precise form of documentation. Furthermore, we could demand that the implementation software be accompanied by the testing software so that we could independently verify compliance in our environment, i.e., follow the spirit of The Scientific Method.
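To make the idea of executable specifications concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name to_pixel_rows, the convention that row 0 is the top of the box, and the four test cases are hypothetical and not part of any published suite. The point is only that a very small table of inputs and expected outputs can be run against any candidate implementation.

```python
# Hypothetical implementation under test: maps data values to integer pixel
# rows in a box `height` pixels tall (row 0 is the top row).
def to_pixel_rows(values, height):
    lo, hi = min(values), max(values)
    if hi == lo:
        # Flat series: draw along the middle row.
        return [(height - 1) // 2] * len(values)
    span = hi - lo
    return [round((hi - v) / span * (height - 1)) for v in values]


# Hypothetical acceptance cases: (name, values, height, expected rows).
ACCEPTANCE_CASES = [
    ("rising line", [0, 5, 10], 11, [10, 5, 0]),
    ("flat series", [3, 3, 3], 11, [5, 5, 5]),
    ("single point", [7], 11, [5]),
    ("negatives", [-2, 0, 2], 5, [4, 2, 0]),
]


def run_acceptance(render):
    """Return the names of the acceptance cases that `render` fails."""
    failures = []
    for name, values, height, expected in ACCEPTANCE_CASES:
        if render(values, height) != expected:
            failures.append(name)
    return failures
```

Calling run_acceptance(to_pixel_rows) returns an empty list when every case passes; an implementation in another language would need only the same table of cases and its own small runner.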
I recently proposed a similar tack for innovators that publish computational simulation models and algorithms in the article CFD: A Castle in the Sand?
“we should follow the lead of Agile Software Development’s Test Driven Development. For example, when someone wants to port the Framework for Integrated Test (FIT) to a new programming language, they need to demonstrate that their FIT implementation passes the published acceptance tests before being blessed.”
“Furthermore, we could demand that the implementation software be accompanied by the testing software so that we could independently verify compliance in our environment”
How about trying to make it easy for Kindly Contributors to understand what a good sparkline implementation has to do, rather than forcing a development methodology upon them? Sparkline implementors are generally volunteer Kindly Contributors with a wide variety of backgrounds working in a cornucopia of languages and applications, many of which do not support faddish development methodologies.
A good start would be clear requirements and some standard data sets for validation. A beauty of sparklines is that they enable at-a-glance comparison across small multiples. Let contributors provide sparklines that their implementations generate from standard validation test data sets as evidence of their conformance. And those interested in sparklines can see, by visual inspection, if an implementation passes muster.
Speaking as someone who implemented sparklines for Photoshop, what would make my life easier is a standard representation for data sets and the sparkline’s appearance (XML would be fine), standard data sets for testing and validation, and clear requirements on the characteristics that sparklines should have. Implementation decisions I made that would have benefited from rules or guidelines mostly involved scaling, sizing and plotting. For example, if the sparkline is going to be clipped by a boundary, what to do? Is auto-scaling the x and/or y axis a good idea, or not? What’s a good default size for a sparkline — equivalent to 12 points of text, or something else? Curve fitting/smoothing or not? Where should the y intercept be? Should it be shown as a line? It seems that this is where the best “bang for the buck” would be in terms of rigor.
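As an illustration of the scaling and clipping questions raised above, the following Python sketch shows one possible policy, not a recommendation: auto-scale the y axis to the data by default, or accept a fixed range and clamp anything outside it to the boundary while reporting which points were clipped. The function name scale_y and its parameters are hypothetical.

```python
def scale_y(values, height, y_range=None):
    """Map values to vertical offsets in [0, height], measured up from the baseline.

    With y_range=None the axis is auto-scaled to the data. Otherwise a fixed
    (low, high) range is used and values outside it are clamped to the
    boundary. Returns (offsets, clipped_indices).
    """
    low, high = y_range if y_range is not None else (min(values), max(values))
    if high == low:
        # Degenerate range: place every point at mid-height.
        return [height / 2.0] * len(values), []
    offsets, clipped = [], []
    for i, v in enumerate(values):
        if v < low or v > high:
            clipped.append(i)
        v = min(max(v, low), high)  # clamp to the plotting range
        offsets.append((v - low) / (high - low) * height)
    return offsets, clipped
```

Returning the clipped indices leaves the decision about how to show a clipped point (a marker, a break in the line, nothing at all) to the renderer, which is exactly the kind of choice a guideline could settle.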
If both the implementation and the test data sets are available, anyone with an interest can verify compliance by generating the test sparklines from the test data set. I know testing is not a proof of correctness, but it’s a good balance between ease and rigor for this type of code development.
Mathew
I vote they should be scalable, that is, vector-based, like fonts.
Although vector is a great deployment method, a program might produce a sufficient dpi raster graphic in pdf or bmp format that would be acceptable for offline viewing. The choice of vector may be more for the software user.
I am currently hunting for information on GXL, GraphML, and some other formats. If anyone can find anything, let me know. We may learn something from previous attempts along the trail. So far it looks like GraphML fell off the grid a short while ago, although many programs seem to support it. GXL is a graph exchange language http://www.gupro.de/GXL/dtd/gxl-1.0.html , used as a serialised layer between different programs and systems which must share graph data. GXL, it seems, is frequently misused to transport just about anything though. The input-output rules of GXL struck me as interesting, only because the specification allows for the standardization of output data as well as input. A system that can display a Sparkline in a proprietary format should at least be allowed to output the data in a public ML format as well. This output ability may be a large factor in qualifying any software, as a differentiator between a true Sparkline tool and a mere view-level Sparkline deployment method.
I’ll keep hunting for the GraphML specifications, but it may be a cold lead.
I agree with Mathew Lodge. What is needed are clear specifications of what a
sparkline actually is. This is before we even start talking about XML and other
development definitions.
To get the ball rolling this is what I have in mind –
You would then go on to state, for example, that the YHeight is normally equal
to the distance between the ascenders and descenders of the embedding font and
that the XWidth should be 12 times the YHeight.
Obviously the actual terms used will come from consensus and would require
other definitions such as colour terms for each of the above, line widths, dot
diameters etc.
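The sizing rule suggested above (YHeight taken from the ascender-to-descender distance of the embedding font, XWidth 12 times YHeight) could be written down as follows. This is only a sketch in Python: the class and function names are hypothetical, and it assumes the font's ascent and descent are both given as positive distances from the baseline, in points.

```python
from dataclasses import dataclass


@dataclass
class SparklineBox:
    y_height: float  # vertical extent in points (YHeight above)
    x_width: float   # horizontal extent in points (XWidth above)


def box_for_font(ascent, descent, width_ratio=12.0):
    """Size a sparkline box from font metrics.

    `ascent` and `descent` are positive distances from the baseline, in
    points. `width_ratio` defaults to the 12:1 proportion suggested above.
    """
    y_height = ascent + descent
    return SparklineBox(y_height=y_height, x_width=width_ratio * y_height)
```

For example, a font with an ascent of 9 points and a descent of 3 points would give a box 12 points tall and 144 points wide; the ratio is left as a parameter because the right proportion is still open to discussion.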
Here’s a sample schema (W3C XSD) for the components that Andrew Nicholls proposed. It’s a first draft and I believe that we should agree on which items are optional or not. This schema below can be used to validate an XML document which would be consumed by the rendering engine, be it SVG, XAML, GXL, Flash or whatever your heart desires.
Sean
See what W3C is doing at
http://w3.org/Graphics/
Spectacular on-the-fly generation of sparkline-like data graphics from the National Center
for Biotechnology Information: the NCBI Map Viewer. Check out this representation of the
Drosophila genetic map, a segment of which is linked. The coolest bit is the
drawing on the left of the chromosome’s banding patterns. Clicking the chromosome lets
you zoom in/out. Very, very cool.
Very good; these people aren’t fooling around.
One aspect of sparklines I haven’t seen discussed is the problem of embedding them
within the text stream in the context of the web. It’s been suggested, for example, that the
height of a sparkline should be limited to the distance between the ascender and
descender of the surrounding font. Unfortunately, people with poor eyesight and those
who use non-IE web browsers routinely scale their font size, potentially creating some
ugly intra-line spacing problems.
It’s my thinking that an existing and truly open standard for generation of sparklines should be used. I like the previous posts about definition of the “bones” of what a sparkline actually is.
SVG provides everything one needs to generate the actual graphics. A simplified XML grammar can be used in conjunction with XSLT to provide an easy input method that could be transformed into just about any graphic file format imaginable on just about any platform.
This problem has largely been solved with the exception of the easy-input language, which may or may not be necessary. The validation tools for SVG, XSLT and XML are quite mature and are easily and freely attained. In fact, this is exactly what this tool-chain was developed for.
So, I would propose designing a simple input XML grammar (SparkML?) that could easily be converted to SVG and then output to anything. Potential uses include the conversion to an SVG-Font Character (UNICODE-based), embedded as an EPS in an InDesign document, exported as a PDF, generate a Flash movie…endless possibilities with easy-to-use and generally available tools. Why reinvent the tool chain?
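To show roughly what such a conversion involves, here is a small sketch. It uses Python's standard xml.etree.ElementTree module rather than XSLT, purely to keep the example self-contained, and the input grammar (a sparkline element with width and height attributes and point children carrying a value attribute) is invented for illustration; no SparkML definition exists in this thread.

```python
import xml.etree.ElementTree as ET


def sparkml_to_svg(sparkml_text):
    """Convert a hypothetical SparkML document into an SVG polyline."""
    root = ET.fromstring(sparkml_text)
    width = float(root.get("width"))
    height = float(root.get("height"))
    values = [float(p.get("value")) for p in root.findall("point")]

    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # a flat series is drawn along the bottom edge
    step = width / (len(values) - 1) if len(values) > 1 else 0.0

    coords = []
    for i, v in enumerate(values):
        x = i * step
        y = height - (v - lo) / span * height  # SVG y grows downward
        coords.append(f"{x:g},{y:g}")

    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": f"{width:g}",
        "height": f"{height:g}",
    })
    ET.SubElement(svg, "polyline", {
        "points": " ".join(coords),
        "fill": "none",
        "stroke": "black",
        "stroke-width": "0.5",
    })
    return ET.tostring(svg, encoding="unicode")
```

Because the output is vector SVG, it can be handed on to whatever converter a given platform already has, which is the argument for not reinventing the tool chain.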
This solution is completely platform independent and based on open source and open standards (unlike XAML, which is Microsoft based – no vendetta here, just an observation).
This is my 2 cents. Your mileage may vary.
–Aaron
I’m a bit surprised that no one bit on Bil Kleb’s suggestion above that a standard suite of test data and results would be of value. Perhaps his reference to a particular kind of development approach swamped the real contribution.
I understood ET to be asking how we could validate various sparkline implementations, and Bil to be suggesting that a suite of standard tests would be of value. I’d suggest that a suite of tests would rather nicely address most of ET’s five concerns about checks and validation.
I’m sure that anyone building sparkline code would in fact test it as carefully as they knew how: we’re all concerned professionals here. I, for one, would value having some standard tests and standard output to compare with. If the tests were in any rational format, XML, comma separated, doesn’t really matter, they’d be easy enough to translate into a program that I’d write.
I’d suggest that a common style “Report on a Sparkline Implementation” might include a page showing what the program does to all the standing tests. We could see at a glance that the contributed implementation actually works.
Hi all,
I am the architect/designer of a Java performance management product that has tried to put into practice the advice found in Mr. Tufte’s books and website postings. We recently released a new version of the product where we attempted to incorporate some of the early postings of the forthcoming book, Beautiful Evidence.
The Sparklines thread caught my attention and we did try to incorporate them into the product especially for time series analysis and distributions. After factoring in interactivity, which is in general not focused on in the books, we were forced to add additional pixels to help with correlation across different visible metrics and allow for a horizontal tolerance in the matching of mouse points to measurement time intervals.
I hope to revisit our sparkline-like graphics and see whether we can achieve a much better implementation both in terms of visualization and code design (better outlier handling). I am very interested in this proposed standards work.
If you are interested in seeing some of our work that tries its best to adhere to the promoted information visualization design patterns, please visit the web links below. I hope you find the call stack table cell visualizations as well as our table thumbnails interesting applications of Mr Edward Tufte’s principles.
Kind regards,
William
The links for screenshots are:
A very loose interpretation of sparklines:
http://www.jinspired.com/products/jdbinsight/downloads/new-in-3.0.html
Time series analysis:
http://www.jinspired.com/products/jdbinsight/downloads/new-in-2.5.html
Table thumbnails and embedded bar charts:
http://www.jinspired.com/products/jdbinsight/whatsnew.html
Regards,
William
Problems with several of the implementations of sparklines scattered over all our sparkline threads:
Need to be more thoughtful about aspect ratios (too many very flat sparklines in particular). See page 16 of the chapter:
https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&topic_id=1&topic=
Sparklines are word-like and typographic. Keep the size and line weights consistent with the local typography. Surround sparklines with type; for example, box-in the sparklines as shown on page 8 of the chapter:
https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&topic_id=1&topic=
It’s very hard to generate a comprehensive test suite for something like this with so many different implementations, but tests should focus on the following:
– Boundary conditions on maxima and minima, both of the sparkline and the average-range-highlight
– Correctness in plotting the line
– Aspect ratio calculation: perhaps we could work to find a formula for calculating this? This should take into consideration the number and range of datapoints with respect to the local typeface size
Along with some unit tests that could be attached to implementations, there is no substitute for careful analysis of the code and comparisons of output to a reference implementation. Developing a series of pathological data sets which strain the resolution, aspect ratio, and variance of a sparkline would probably be the best approach. Current implementations could be tested, authors notified, and results posted on a webpage.
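As a starting point for the pathological data sets mentioned above, here is a sketch in Python. The particular series and their names are my own guesses at what would strain resolution, aspect ratio, and variance; they are not an agreed suite.

```python
import math


def pathological_datasets(n=50):
    """Return a dict of named series meant to stress a sparkline renderer."""
    return {
        # Zero variance: the y range collapses to a single value.
        "constant": [1.0] * n,
        # Only one observation: there is no line to draw.
        "single_point": [42.0],
        # One huge outlier that flattens everything else when auto-scaled.
        "one_spike": [0.0] * (n - 1) + [1e9],
        # Alternating extremes: maximal oscillation for the given width.
        "sawtooth": [float(i % 2) for i in range(n)],
        # Tiny variation on a large offset: tests numerical resolution.
        "tiny_variance": [1e6 + 1e-3 * i for i in range(n)],
        # Gaps in the data (every seventh value, starting with the first).
        "missing_values": [float(i) if i % 7 else math.nan for i in range(n)],
    }
```

Each implementation could render every series at a few typeface sizes, and the results could be posted side by side with the reference implementation's output.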
I’m willing to help out with this.