baseline for amount scale

October 1, 2001  |  John Holm
4 Comment(s)

What do you consider in choosing a baseline figure for the vertical amount scale of a graph? In The Visual Display of Quantitative Information (second edition), pages 68 and 74-75, I noticed that you chose nonzero baselines.

Comments
  • Edward Tufte says:

    In general, in a time-series, use a baseline that shows the data, not the zero point. If the zero point reasonably occurs in plotting the data, fine. But don’t spend a lot of empty vertical space trying to reach down to the zero point at the cost of hiding what is going on in the data line itself. (The book, How to Lie With Statistics, is wrong on this point.)

    For examples, all over the place, of absent zero points in time-series, take a look at any major scientific research publication. The scientists want to show their data, not zero.

    The urge to contextualize the data is a good one, but context does not come from empty vertical space reaching down to zero, a number which does not even occur in a good many data sets. Instead, for context, show more data horizontally!
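    The rule above — frame the data, and let zero in only if it falls there naturally — can be sketched as a small helper. This is a hypothetical illustration, not anything from the book:

    ```python
    def data_ylimits(values, pad=0.05):
        """Y-axis limits that frame the data rather than force a zero baseline.

        Pads the observed range by `pad` (a fraction of the range) on each
        side; zero appears on the axis only if it lies inside that padded range.
        """
        lo, hi = min(values), max(values)
        span = hi - lo
        return lo - pad * span, hi + pad * span

    # A series hovering around 140: the limits hug the data instead of
    # stretching empty space down to zero.
    series = [138.2, 141.0, 139.5, 142.3, 140.1]
    lo, hi = data_ylimits(series)
    ```

    With these limits the data line fills the panel; forcing the axis down to zero would compress the same variation into a thin band at the top.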

  • Loren R. Needles says:

    Sometimes using a zero baseline makes no sense at all. For
    example, a graph of the variations in a patient’s temperature over
    time is useful only if the baseline is set slightly below the
    normal temperature of 97.3 degrees F, in order to readily reveal
    slight changes and the trend.

  • Loren R. Needles says:

    The New York Times regularly publishes graphs depicting
    newsworthy changes in the stock price of selected
    publicly traded companies. In one regular feature in its Financial
    Section, stock-price-change graphs for a dozen or so companies
    are shown in a single-panel, small-multiples format, but until
    recently each graph was constructed with varying baselines and
    y-axis scales, so the extent of price variation was not clearly revealed.

    The practice of showing many graphs with different scales in
    juxtaposition has always vexed me, since my eye tends to be
    drawn to notice and unconsciously compare the magnitude
    of price change depicted in the trend line of each graph without
    adjusting for variations in the y-axis scale. If, on the other hand,
    I try to consciously think through the significance of the depicted
    change from graph to graph by mentally adjusting for the
    observable differences on the y-axis, I find I am working far too
    hard and the supposed value of the visual information goes
    negative.

    Fortunately, the NYT recently reconsidered its designs and now
    chooses baselines for its cluster of multiples so that the
    magnitude of the change depicted from graph to graph is
    proportional. In other words, a $1 change in a $10-per-share
    stock is shown as twice as great as a $1 change in a
    $20-per-share stock.

    Economists usually show comparisons of change in long
    economic time series by using log scales, with all the data lines
    shown on a single graph, to ensure that proportional change
    among the various time series is properly revealed. However,
    general-interest audiences are not comfortable with that method.
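    A quick numerical check of that proportionality, using only Python’s standard math library: on a log scale, vertical distance equals log(new/old), so equal percentage changes cover equal distances regardless of price level, and a $1 move on a $10 stock reads as roughly twice a $1 move on a $20 stock.

    ```python
    import math

    # Vertical distance on a log scale is log(new / old).
    rise_10_to_11 = math.log(11 / 10)   # +10% on a $10 stock ($1 move)
    rise_20_to_22 = math.log(22 / 20)   # +10% on a $20 stock ($2 move)
    rise_20_to_21 = math.log(21 / 20)   # +5%  on a $20 stock ($1 move)
    ```

    The first two distances are identical (same percentage change), and the $1-on-$10 move is about 1.95 times the $1-on-$20 move — the proportional display the NYT redesign aims for.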

  • Tom Hopper says:

    I think that the general answer is, as ET stated, to select a baseline and scale that accurately highlights the information you need to convey. The value of the baseline isn’t nearly so important as the information conveyed in the rest of the plot. You might do well to remove axis ticks and labels when initially creating your figures, then add them back in at the end of the process.
    Below are some slightly more specific examples from my own work.

    For control charts of in-control processes, typical published graphs have a baseline that places the lower control limit about one-fifth of the total scale up from the baseline, and a scale that places the upper control limit about one-fifth of the total scale from the top of the y-axis.

    For the kind of data I often work with, I find it convenient to set vertical scales to the total expected (or acceptable) measurable range, even though that is typically rather larger than the range of the current data. This way I can see the variation of the current data set within the context of the total range that it might vary over. For instance, there may be a “floor” value of 10 and a “ceiling” (or “cut-off”) value of 15, and the current data set might actually vary from 12 to 13. I can see what the variation looks like and where it falls within the overall limits. I would set the baseline to the “floor” value.

    There are some obvious limitations to this approach. Data that vary by less than about 10% of the total range will look artificially flat, for instance, though such a case may be ideal for a small multiple, with one plot scaled to the data’s range and the other plot scaled to the total range. This is rather similar to Loren Needles’ example of body temperature.

    In other cases, I try to ensure a baseline and scale that highlight patterns in the data, in a manner similar to the example in The Visual Display of Quantitative Information of sunspot activity (if I remember correctly) scaled to highlight the sinusoidal nature of the variation, or to Loren Needles’ example from the New York Times. Depending on the audience and medium, I might have to back the baseline way off from the data, or set it to the minimum data point’s y-value.
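    The floor/ceiling scaling described in that comment can be sketched as a hypothetical helper (the name `range_ylimits` and the fallback behavior are assumptions for illustration, not anything from the thread):

    ```python
    def range_ylimits(values, floor, ceiling):
        """Fix the y-axis to the total expected range (floor..ceiling),
        so today's variation is seen in the context of where it could fall.

        Falls back to the data's own range if any point escapes the
        expected limits, so out-of-spec data is never clipped.
        """
        lo, hi = min(values), max(values)
        if lo < floor or hi > ceiling:
            return lo, hi          # data out of spec: show it anyway
        return floor, ceiling

    # Current data vary only between about 12 and 13, inside a 10..15 range,
    # matching the floor/ceiling example above.
    lo, hi = range_ylimits([12.1, 12.8, 12.4, 13.0], floor=10, ceiling=15)
    ```

    As noted above, when the data occupy only a small slice of the fixed range, a small multiple pairing this view with a data-scaled view recovers the detail.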
