Andy Kraftsow wrote a piece for Inside Counsel (February 21, 2014) in which he explained how the mathematics of the Poisson distribution can dramatically reduce the number of documents that must be reviewed in discovery to understand what they say about the issues.
Most of the piece explains the iterative process of requesting documents and categorizing them by keywords and phrases into what Kraftsow calls an “organizational schema.”
He then highlights the advantage of a Poisson calculation: “Assume that the organizational schema consists of 50 categories and that each category has been populated with 2,000 documents. Do you need to read all 100,000 documents to understand what the collection says about each of the 50 issues? Poisson says ‘no.’ You need only read 15,000.”
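Kraftsow explains: “To be 95 percent certain you have seen all of the relevant language that appears in more than 1 percent of the documents in the category (a ‘rare event’), you need only read 300 documents in that category. In other words, by reading 300 randomly selected documents from each category, you are 95 percent certain to see the relevant language that appears in all but 50 (1 percent) of the 2,000 documents in each category.” One percent of 2,000 documents is actually 20, not 50, so the 95 percent figure covers language that appears in more than about 20 documents per category. The 300 comes from the Poisson model: if a phrase appears in 1 percent of documents, a random sample of 300 is expected to contain it 3 times, and the probability of seeing it zero times is e^(-3), or about 5 percent. Thus, 300 documents times 50 categories equals 15,000 documents, or 15 percent of the 100,000-document collection.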
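The arithmetic behind Kraftsow's numbers can be sketched in a few lines of Python. This is a minimal illustration, not code from his article: the function names (`poisson_sample_size`, `miss_probability`, `sample_category`) and the integer document IDs are hypothetical, and the model assumes each sampled document independently contains a given phrase at a fixed rate.

```python
import math
import random


def poisson_sample_size(rate: float, confidence: float) -> int:
    """Smallest n with exp(-n * rate) <= 1 - confidence (Poisson approximation).

    A phrase present in a fraction `rate` of documents is expected n * rate
    times in n random documents; the Poisson chance of zero hits is exp(-n * rate).
    """
    return math.ceil(-math.log(1 - confidence) / rate)


def miss_probability(n: int, rate: float) -> float:
    """Poisson-approximate chance that n random documents never show the phrase."""
    return math.exp(-n * rate)


def exact_miss_probability(n: int, rate: float) -> float:
    """Binomial (independent draws) chance of zero hits, for comparison."""
    return (1 - rate) ** n


def sample_category(doc_ids: list, n: int, seed=None) -> list:
    """Draw n distinct documents at random from one category (hypothetical IDs)."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, min(n, len(doc_ids)))


if __name__ == "__main__":
    categories, docs_per_category = 50, 2_000
    n = poisson_sample_size(rate=0.01, confidence=0.95)
    total_reviewed = n * categories
    share = total_reviewed / (categories * docs_per_category)
    print(n, total_reviewed, f"{share:.0%}")  # 300 15000 15%
    print(round(miss_probability(n, 0.01), 4))        # Poisson: about 0.0498
    print(round(exact_miss_probability(n, 0.01), 4))  # binomial: about 0.049
    picks = sample_category(list(range(docs_per_category)), n, seed=1)
    print(len(picks))  # 300
```

Because reviewers draw 300 distinct documents from a finite pool of 2,000 (sampling without replacement), the true chance of missing a phrase is somewhat lower than either figure above, so 300 per category is a conservative choice under these assumptions.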
As analysts tackle ever-larger data collections that matter to managers of lawyers, Poisson-based calculations will help them decide how much of the data they need to analyze to reach a high degree of confidence in their conclusions.