Published on:

Let’s take a look at a plot from a survey conducted by DigitalWarRoom, its “2015 Ediscovery IQ Meter.” On page 12 of the report, which was published in July 2015, there is a plot that looks quite similar to the plot below. (The reproduction does not have the tiny tick marks on the horizontal axis, placed at the ends of the axis and between the vertical bars, nor does it match the green color gradient of the bars.) Nevertheless, we can draw from it a few lessons in graphical presentation.

Rplot01 Digital green
First, if you label bars with values, such as the four percentages on top of the four bars, you don’t gain anything from horizontal grid lines. In truth, you clutter the plot. Even odder, the vertical y-axis has no values, so the reader can’t even calibrate the lines to values!

Second, although the plot above does not show how the original has each bar with the same gradient of darker green at the bottom gradually changing hue to a lighter gree

Published on:

Bearing in mind the benefits of more reproducible research regarding legal management, a piece in the Economist, July 25, 2015 at page 8, makes a good point. That short article explains how pharmaceutical companies have not been publishing results from clinical trials regarding their drug research that are negative or inconclusive. Without the full results, no one can accurately and comprehensively assess the efficacy of a new drug.

 

Stated differently, someone is not practicing reproducible research if they cherry pick only the results that show their drug in clinical trials has met with success. It would be akin to a surveyor who asks law firms or law departments a set of questions but then publicizes only the data that puts the surveyor’s views or products or services in a favorable light. In contrast, research that is done with integrity discloses contradictory findings, unexplained findings, as well as favorable findings. Reproducible research implies full disclosure.

Published on:

Surveyors sometimes weight their data to make the findings more representative of some other set of information. This point comes through in an article in the New York Times, July 23, 2015 at 83 regarding political polls. Pollsters may get too few responses from some demographic slice, such as farmers, and want to correct for that imbalance when they present conclusions respecting the entire population. The polling company weights the few farmer respondents more heavily to make up for the imbalance and represent the locations of residents more in line with reality.

How does this transformation of data apply in surveys for the legal industry? Let’s assume that we know roughly how many companies in the United States there are that have revenue over $100 million by each major industry. Let’s also assume that a benchmark survey of law departments has gathered compensation data regarding the lawyers in the responding law departments.

If the participants in the law department survey materially under-represent some industry (that is, the proportions in each industry don’t match the proportions that we know to be true), it is not hard to adjust the compensation data. One way would be to replicate the respondents in under-represented industries enough times to make up the difference. This is what happens when a surveyor weights survey data to present more proportional data.
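To make the adjustment concrete, here is a minimal sketch in Python. It assumes a simple form of weighting in which each respondent's weight is the industry's known share of companies divided by that industry's share of the survey sample. The industry names, shares, and compensation figures are invented for illustration and do not come from any actual survey.

```python
from collections import Counter


def industry_weights(sample_industries, population_shares):
    """Weight per industry = known population share / share in the sample."""
    counts = Counter(sample_industries)
    n = len(sample_industries)
    return {
        industry: population_shares[industry] / (count / n)
        for industry, count in counts.items()
    }


def weighted_mean(values, industries, weights):
    """Average of values, with each respondent weighted by industry."""
    total_weight = sum(weights[i] for i in industries)
    weighted_sum = sum(v * weights[i] for v, i in zip(values, industries))
    return weighted_sum / total_weight


# Hypothetical survey: three technology departments, one energy department.
industries = ["technology", "technology", "technology", "energy"]
compensation = [200_000, 220_000, 240_000, 300_000]

# Hypothetical "known" proportions of companies in each industry.
population_shares = {"technology": 0.5, "energy": 0.5}

weights = industry_weights(industries, population_shares)
unweighted = sum(compensation) / len(compensation)
weighted = weighted_mean(compensation, industries, weights)

print(f"Unweighted average: {unweighted:,.0f}")
print(f"Weighted average:   {weighted:,.0f}")
```

In this made-up case the lone energy respondent gets a weight of 2 while each technology respondent gets a weight of about 0.67, which has the same effect as replicating the energy respondent until the two industries are evenly represented.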

Published on:

A precept of reproducible research, such as survey results that allow readers to understand the methodology and credibility of the findings, is to make generous use of “N = some number”. That conventional shorthand for “how many are we talking about” shows up in almost every reproducible-research graphic. Whether in the title of a plot, the text that relates to it, on the plot itself or in a footnote, a reader should always be quickly able to learn how many respondents answered each question or how many documents were reviewed or how many law departments had a given benchmark, or whatever pertains to the topic of the plot.

 

The larger the N, the more reliable the averages or medians that result from the data. For example, if the “average base compensation of general counsel” rose 2% from one year to the next, it makes a huge difference whether that change applies to N = 8 [general counsel] or N = 80.  Changes in small numbers of observations have much less credibility than changes in large numbers.
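One rough way to see why N matters is the standard error of a mean, which shrinks with the square root of N. The sketch below assumes a simple random sample of independent respondents and an invented standard deviation of 6 percentage points in individual compensation changes. Both are assumptions for illustration, not figures from any survey.

```python
import math


def standard_error(sample_sd, n):
    """Standard error of a sample mean: standard deviation / sqrt(N)."""
    if n < 1:
        raise ValueError("N must be at least 1")
    return sample_sd / math.sqrt(n)


# Hypothetical spread of individual year-over-year changes, in percentage points.
assumed_sd = 6.0

for n in (8, 80):
    se = standard_error(assumed_sd, n)
    print(f"N = {n}: standard error of the average change is about {se:.2f} points")
```

Under these assumptions, a 2% average rise with N = 8 is smaller than one standard error (about 2.1 points), so it could easily be noise. With N = 80 the standard error drops to about 0.7 points, and the same 2% rise is far more credible.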

Published on:

We can take one more look at the seminal Winston & Strawn plot, now streamlined and improved as discussed previously. A few graphical design choices deserve comment. We emphasize, however, that graphical design choices are many, which means the permutations and combinations of them are even more numerous. Experience (and some research on how humans perceive and interpret graphs) suggests quite a few well-accepted guidelines, such as simplicity and clarity, but graphical visualization remains in the subjective domain of what feels appropriate to the designer. We could analogize to writing style.

A convention in plotting is that the so-called factors run along the x-axis at bottom and the quantitative values run up the y-axis on the left. With such long axis labels, however, that choice has no appeal here. If we shorten the labels and rotate them, it is possible, as seen in the plot below.

Another choice would have eschewed bars in favor of points.

Published on:

Returning once again to the same plot from the Winston & Strawn survey report, but shifting from criticism, we should praise several aspects of the original plot.

Screenshot (6)_snip Winston pg19
The somewhat-narrow width of the bars makes a more appealing impression than when bars are thick and therefore tightly packed shoulder to shoulder. Compare the version below where thick bars put more ink on the plot, but offer no more insights or clarity.

Rplot08nojunk
Similarly, the spacing between the bars helps a reader take in the message of the plot, and better than very narrow lines. The version above takes away that spacing although it adds around each box a frame colored black to clarify individual bars. This is not an improvement!

Published on:

We revisit the same Winston & Strawn plot, which appears as it did in the most recent post, in its improved reincarnation. Now, let’s take up four more observations.

First, the thick black line on the vertical y-axis adds nothing. It is an example of what is referred to as “chart junk”, an element of a plot that adds no useful information but clutters up the plot and makes it that much harder to grasp.

Second, neither axis has a label to explain what the axis represents. Labels are generally a good thing so that a plot can stand on its own without explanations in the report text.

Published on:

Another aspect of the plot that has been discussed previously [Click here for the latest post in this series] should be called out.

Whoever prepared the plot chose to color differently each bar of the three risks most often selected. The blue bar represents “geographic locations in which the company operates”, a sort-of red bar represents another risk, and a yellow bar represents the third. In addition to those color distinctions, the plot also embeds the labels of those three risks in black boxes with white font. Shown below is the plot as it originally appeared.

Screenshot (6)_snip Winston pg19
Neither of these graphical techniques adds value to the plot or, indeed, makes sense. They make readers work more to figure them out. Are the choices of colors significant, as in red-yellow-green means something? Is there a linkage between the coloring and the boxing? What does either or both tell us that the length of the bar and the label at the end don’t?

Published on:

We return to the same survey plot and our topic of effective visualization of survey results. To see the previous post that explains the source data and the purpose of this series, click here. The version shown below incorporates the changes recommended previously regarding redundant data and serves as the starting point for the improvements discussed here. Let’s focus on the typography.

Winstonpg19noredundantdata2
A font comes from a font family, such as the familiar Helvetica, Courier or Times Roman. The face of a font could be normal, italic, bold, upper case, or other formats. Third, with any family and face, the size of a letter, number or symbol can be small, medium, large or some specified size. There are other ways to characterize type (such as kerning and left or right alignment), but we will limit ourselves here to the three of family, face and size. We will use the term “typeset” to summarize family, face, and size.

The font on the left-hand, y-axis labels is different from the font on the x axis along the bottom, and both of those fonts differ from the bulky numbers at the ends of the columns. Additionally, on the original plot, but not shown here, there are black rectangles around three of the labels, which also have white coloring instead of black, so we could say that there are four different typesets employed in this one plot.

Published on:

In this series of blog posts, we will use a survey by the U.S. law firm Winston & Strawn to learn about survey methodology. In 2013 the firm produced a 33-page report based on the survey results entitled “The Winston & Strawn International Business Risk Survey 2013”.  To download a PDF of the report, click here.

The plot in the image below comes from page 18 of the report. The survey had asked respondents the question stated in the header, given them eight choices, and this plot presents the results as a graph. Here we will focus on one aspect of that plot: how effectively it presents the sum of the number of times respondents selected each of the risk choices.

Screenshot (6)_snip Winston pg19
Notice that the plot identifies the number of companies selecting a risk by three methods. One is the horizontal x-axis that ranges from zero on the left to 80 on the right. For example, “Rogue employees” is just to the left of the 50 marker on the x-axis, so a reader could estimate that 47-49 respondents chose it, judging from that bar’s end point, where it reaches on the x-axis, and the figure from the y-axis.