baseline for amount scale
October 1, 2001 | John Holm
What do you consider in choosing a baseline figure for the vertical amount scale of a graph? In The Visual Display of Quantitative Information (second edition), pages 68 and 74-75, I noticed that you chose nonzero baselines.
In general, in a time-series, use a baseline that shows the data, not the zero point. If the zero point reasonably occurs in plotting the data, fine. But don’t spend a lot of empty vertical space trying to reach down to the zero point at the cost of hiding what is going on in the data line itself. (The book How to Lie With Statistics is wrong on this point.)
For examples, all over the place, of absent zero points in time-series, take a look at any major scientific research publication. The scientists want to show their data, not zero.
The urge to contextualize the data is a good one, but context does not come from empty vertical space reaching down to zero, a number which does not even occur in a good many data sets. Instead, for context, show more data horizontally!
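This rule of thumb can be sketched in code. The helper below is my own illustration, not from the post: it picks y-axis limits from the data’s own range, with a small pad, instead of forcing the axis down to zero.

```python
def data_baseline(values, pad=0.05):
    """Choose y-axis limits from the data's own range, lightly padded,
    rather than forcing the axis down to an often-irrelevant zero."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or abs(hi) or 1.0  # guard against a flat series
    return lo - pad * span, hi + pad * span

# A series hovering near 99 gets limits near the data, not near 0.
readings = [98.2, 98.6, 99.1, 98.4, 98.9]
ymin, ymax = data_baseline(readings)
```

With limits just outside the 98.2 to 99.1 range of the data, the line fills the panel; a zero baseline would compress the same variation into a few pixels.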
Sometimes using a zero baseline makes no sense at all. For example, a graph of the variations in a patient’s temperature over time is useful only if the baseline is set slightly below the normal temperature of 97.3 degrees F, in order to readily reveal slight changes and the trend.
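A quick way to see why: compare the fraction of the axis that a small swing occupies under each choice of baseline. The function and numbers below are my own illustration of the point.

```python
def visual_resolution(data_span, ymin, ymax):
    """Fraction of the vertical axis occupied by the data's variation."""
    return data_span / (ymax - ymin)

# A patient's temperature swings 2 degrees F.
swing = 2.0
r_zero  = visual_resolution(swing, 0.0, 101.0)   # zero baseline
r_tight = visual_resolution(swing, 97.3, 101.0)  # baseline just below normal
```

Under the zero baseline the swing occupies about 2% of the axis; with the baseline at 97.3 it occupies over half, so slight changes and the trend become readable.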
The New York Times regularly publishes graphs depicting newsworthy changes in the stock price of selected publicly traded companies. In one regular feature in its Financial Section, stock-price-change graphs for a dozen or so companies are shown in a single-panel, small-multiples format, but each graph had, until recently, been constructed with varying baselines and y-axis scales, so the extent of price variation was not clearly revealed.
The practice of showing many graphs with different scales in juxtaposition has always been vexing to me, since my eye tends to be drawn to notice and unconsciously compare the magnitude of the price change depicted in the trend line of each graph without adjusting for variations in the y-axis scale. If, on the other hand, I try to consciously think through the significance of the depicted change from graph to graph by mentally adjusting for the observable differences on the y-axis, I find I am working far too hard, and the supposed value of the visual information goes negative.
Fortunately, the NYT recently reconsidered its designs and now chooses baselines for its cluster of multiples so that the magnitude of the change depicted from graph to graph is proportional. In other words, a $1 change in a $10-per-share stock is shown as twice as large as a $1 change in a $20-per-share stock.
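One way to read that rule: give each multiple a y-axis span proportional to its share price, so equal percentage moves occupy equal vertical space. The sketch below is my own guess at the arithmetic, not the Times’ actual method; k is an assumed scale constant.

```python
def visual_size(dollar_change, share_price, k=0.5):
    """Fraction of the axis a dollar change occupies when each graph's
    y-axis span is k * share_price (span proportional to price)."""
    axis_span = k * share_price
    return dollar_change / axis_span

cheap = visual_size(1.0, 10.0)  # $1 move on a $10 stock
dear  = visual_size(1.0, 20.0)  # $1 move on a $20 stock
```

Here `cheap` comes out exactly twice `dear`, matching the description in the post, and the result does not depend on the choice of k.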
Economists usually show comparisons of change in long economic time series by using log scales, with all the data lines shown on a single graph, to ensure that proportional change among the various series is properly revealed. However, general-interest audiences are not comfortable with that method.
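The property log scales provide is that equal ratios map to equal vertical distances, wherever they occur on the scale. A small check of this, as my own illustration:

```python
import math

def log_distance(a, b):
    """Vertical distance between two values plotted on a log10 scale."""
    return math.log10(b) - math.log10(a)

# A doubling covers the same distance whether it runs 10 -> 20
# or 1000 -> 2000, which is what makes proportional change comparable
# across series of very different magnitudes.
d_small = log_distance(10, 20)
d_large = log_distance(1000, 2000)
```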
I think that the general answer is, as ET stated, to select a baseline and scale that accurately highlights the information you need to convey. The value of the baseline isn’t nearly so important as the information conveyed in the rest of the plot. You might do well to remove axis ticks and labels when initially creating your figures, then add them back in at the end of the process.
Below are some slightly more specific examples from my own work.
For control charts of in-control processes, typical published graphs use a baseline and scale that place the lower control limit about one-fifth of the way up the y-axis and the upper control limit about one-fifth of the way down from the top, so the control band occupies the middle three-fifths of the scale.
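That placement rule pins down the axis limits from the control limits alone. In the sketch below (the function name is mine), the band between the limits is three-fifths of the full scale, so the margin on each side is one-fifth of the total.

```python
def control_chart_limits(lcl, ucl):
    """Y-axis limits that put the lower control limit one-fifth of the
    way up the scale and the upper control limit one-fifth down from
    the top, so the control band spans the middle three-fifths."""
    band = ucl - lcl
    total = band * 5.0 / 3.0  # the band is 3/5 of the full scale
    margin = total / 5.0
    return lcl - margin, ucl + margin

ymin, ymax = control_chart_limits(9.0, 12.0)  # e.g. LCL = 9, UCL = 12
```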
For the kind of data I often work with, I find it convenient to set vertical scales to the total expected (or acceptable) measurable range, even though that is typically rather larger than the range of the current data. This way I can see the variation of the current data set within the context of the total range that it might vary over. For instance, there may be a “floor” value of 10 and a “ceiling” (or “cut-off”) value of 15, and the current data set might actually vary from 12 to 13. I can see what the variation looks like and where it falls within the overall limits. I would set the baseline to the “floor” value.

There are some obvious limitations to this approach. Data that varies by less than about 10% of the total range will look artificially flat, for instance, though such a case may be ideal for a small multiple with one plot scaled to the data’s range and the other plot scaled to the total range. This is rather similar to Loren Needles’ example of body temperature.
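The floor/ceiling scheme, plus the 10% flatness caveat, can be written down directly. The names and the threshold default below are my own, not from the post:

```python
def total_range_limits(floor, ceiling, values, flat_fraction=0.10):
    """Scale the y-axis to the total acceptable range so the current
    data is seen in context, and flag data whose spread is under
    about 10% of that range, which will look artificially flat."""
    spread = max(values) - min(values)
    looks_flat = spread < flat_fraction * (ceiling - floor)
    return (floor, ceiling), looks_flat

# The post's example: floor 10, ceiling 15, data varying from 12 to 13.
limits, flat = total_range_limits(10.0, 15.0, [12.0, 12.4, 12.7, 13.0])
```

Here the data spans 20% of the range, so no flag is raised; a set varying only from 12.4 to 12.6 would trip it and suggest the paired small multiple, with one plot scaled to the data and one to the total range.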
In other cases, I try to ensure a baseline and scale that highlights patterns in the data, in a manner similar to the example in Visual Display of Quantitative Information of sunspot activity (if I remember correctly) scaled to highlight the sinusoidal nature of the variation, or to Loren Needles’ example from the New York Times. Depending on the audience and medium, I might have to back the baseline way off from the data, or set it to the minimum data point’s y-value.