How do design choices affect perception of data visualizations?
[Joint work with Prof. Maneesh Agrawala at UC Berkeley's Visualization Lab and Steve Rubin.]
A picture is worth a thousand words. As data becomes cheap and ubiquitous, so do data visualizations. I love looking at a beautiful data visualization, but how do design choices affect the viewer's perception of the data? Anyone who has used a charting program (e.g., Excel) quickly discovers that adjusting the axes can hide a real trend or highlight a spurious pattern. A NYT Upshot piece highlighted a related phenomenon: seeing trends in noise. As a first step, we examined the perception of statistical patterns in scatterplots. Future work may examine other, more complex data visualizations.
Scatterplots graphically depict every point in a data set and allow viewers to perceive summary statistics (e.g., sample size N, mean μ, standard deviation σ). However, poor design can make it difficult for viewers to extract such statistics accurately. We designed a series of crowdsourced graphical perception experiments examining how well people perceive summary statistics in 1D scatterplots of Gaussians as we vary three data parameters (N, μ, and σ) and three graphic design parameters (transparency, dot size, and frame size). Our approach combines experimental methodology from psychophysics with crowdsourcing to build a mapping that describes how changing these six parameters affects discrimination thresholds and estimation bias in the perception of N, μ, and σ. We use this mapping to propose a set of perceptual design guidelines for creating 1D scatterplots that better convey summary statistics. Below we show the Just Noticeable Difference (JND), the smallest change a viewer can reliably discriminate, on three different discrimination tasks as a function of dot size and frame size.
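To make the idea of a JND concrete, here is a minimal Python sketch of one standard psychophysics way to estimate it: a 2-down/1-up adaptive staircase run against a simulated viewer. This is an illustration under stated assumptions, not necessarily the procedure used in our experiments. The names `simulated_observer` and `staircase_jnd`, the Weibull-shaped psychometric function, the step factor, and the reversal counts are all hypothetical choices made for this sketch.

```python
import math
import random
from typing import Callable


def simulated_observer(threshold: float, rng: random.Random) -> Callable[[float], bool]:
    """Hypothetical two-alternative viewer (an assumption, not real data).

    Returns True (a correct answer) with a Weibull-shaped probability that
    rises from 0.5 (pure guessing) toward 1.0 as the change `delta` grows.
    """
    def respond(delta: float) -> bool:
        p_correct = 0.5 + 0.5 * (1.0 - math.exp(-(delta / threshold) ** 2))
        return rng.random() < p_correct
    return respond


def staircase_jnd(respond: Callable[[float], bool], start: float,
                  factor: float = 1.25, n_reversals: int = 16,
                  n_discard: int = 4, max_trials: int = 2000) -> float:
    """2-down/1-up staircase: shrink delta after two consecutive correct
    answers, grow it after any error. Returns the geometric mean of the
    deltas at reversal points, ignoring the first `n_discard` reversals."""
    delta = start
    correct_streak = 0
    last_direction = 0  # -1 = last step went down, +1 = up, 0 = no step yet
    reversals = []
    for _ in range(max_trials):
        if respond(delta):
            correct_streak += 1
            if correct_streak < 2:
                continue  # one correct answer alone does not change delta
            correct_streak = 0
            direction = -1
        else:
            correct_streak = 0
            direction = +1
        # A reversal is a change in step direction; record where it happened.
        if last_direction != 0 and direction != last_direction:
            reversals.append(delta)
            if len(reversals) >= n_reversals:
                break
        last_direction = direction
        delta = delta / factor if direction < 0 else delta * factor
    kept = reversals[n_discard:]
    if not kept:
        raise RuntimeError("staircase did not produce enough reversals")
    return math.exp(sum(math.log(d) for d in kept) / len(kept))


if __name__ == "__main__":
    rng = random.Random(0)
    observer = simulated_observer(threshold=10.0, rng=rng)
    print(f"estimated JND: {staircase_jnd(observer, start=40.0):.2f}")
```

Because the 2-down/1-up rule steps down only after two consecutive correct answers, it tends to settle near the change the viewer detects about 70.7% of the time; for this simulated viewer that is roughly 0.73 × `threshold`.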
One application of this research is to take a data set and a set of design constraints and adjust the remaining design parameters to optimize perception. For example, given a target frame size, our mapping determines the dot size that optimizes perception of summary statistics, as shown below.
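A minimal Python sketch of that selection step follows. It assumes the measured JNDs are available as a lookup table keyed by (frame size, dot size), with one JND per discrimination task. The function name `best_dot_size`, the table layout, the nearest-measured-frame fallback (no interpolation), and the objective are hypothetical choices for this sketch. The objective minimizes the worst JND across tasks, each taken relative to the best dot size for that task. The criterion behind the figure may differ.

```python
from typing import Dict, Tuple

# Hypothetical layout: jnd_table[(frame_size, dot_size)] = {task_name: jnd, ...}
JNDTable = Dict[Tuple[float, float], Dict[str, float]]


def best_dot_size(jnd_table: JNDTable, frame_size: float) -> float:
    """Pick the measured dot size with the smallest worst-case relative JND
    at the measured frame size closest to `frame_size`.

    Assumes every row lists the same tasks and all JNDs are positive.
    """
    frames = {f for f, _ in jnd_table}
    if not frames:
        raise ValueError("empty JND table")
    nearest = min(frames, key=lambda f: abs(f - frame_size))
    rows = {d: jnds for (f, d), jnds in jnd_table.items() if f == nearest}
    tasks = list(next(iter(rows.values())).keys())

    # Smallest JND per task at this frame size, used to put tasks on a common
    # relative scale (1.0 = as good as any measured dot size for that task).
    best = {t: min(jnds[t] for jnds in rows.values()) for t in tasks}

    def worst_relative(d: float) -> float:
        return max(rows[d][t] / best[t] for t in tasks)

    return min(rows, key=worst_relative)
```

A worst-case objective keeps any one statistic from being sacrificed to improve another. A weighted sum of relative JNDs would be a reasonable alternative if some statistics matter more than others for a given chart.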