Each year for some time now, your law department has defended more than a dozen employment discrimination lawsuits. Your case management system has recorded the independent variables for each of those lawsuits: type of discrimination alleged, court, plaintiff’s counsel, your counsel, duration of the case, characteristics of the employee, and the responsible law department litigator. You wish you could figure out how the costs of such cases (called the dependent variable because it depends on the independent variables) vary according to these attributes.
To the rescue rides the Lone Data Arranger, the statistical tool of multiple regression. Regression analysis can show, to take an illustrative example, that duration of the case predicts the eventual legal fees plus settlement or judgment better than any of the other variables and, even more, that duration explains about 40 percent of the cost outcome. Perhaps the number of years the plaintiff’s attorney has been out of law school predicts cost outcomes second best, at about 25 percent. And so on. The statistical tool can provide confidence levels, too, so that you can describe how likely these conclusions are to hold true. [For more on statistics, see my posts of April 9, 2005 on representativeness in surveys; May 15, 2005 on Monte Carlo simulations; and July 25, 2005 on power laws.]
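For readers who want to see the mechanics, here is a minimal sketch of a multiple regression in Python. Everything in it is an assumption made for illustration: the 60 cases are randomly generated, not real matters; the two predictors (case duration in months and the plaintiff's attorney's years out of law school) stand in for the longer list of variables above; and the function names `fit_ols` and `r_squared` are my own. It fits ordinary least squares by solving the normal equations and reports the overall share of cost variation the model explains. Splitting that share among individual variables, or attaching confidence levels, takes further steps that a statistics package would handle and that are not shown here.

```python
import random

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(rows, y, coefs):
    """Share of the variation in y that the fitted model accounts for."""
    mean_y = sum(y) / len(y)
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: 60 made-up cases, costs in thousands of dollars.
# The "true" relationship (20 + 5*duration + 2*years + noise) is invented.
rng = random.Random(0)
cases = [(rng.uniform(6, 36), rng.uniform(1, 30)) for _ in range(60)]
costs = [20 + 5 * months + 2 * years + rng.gauss(0, 10)
         for months, years in cases]

coefs = fit_ols(cases, costs)
r2 = r_squared(cases, costs, coefs)

print("intercept:", round(coefs[0], 2))
print("cost per extra month of duration:", round(coefs[1], 2))
print("cost per extra year of plaintiff counsel experience:", round(coefs[2], 2))
print("share of cost variation explained (R squared):", round(r2, 3))
```

Each coefficient estimates how much cost changes when that variable rises by one unit while the other is held fixed, which is what lets regression separate the influence of duration from that of opposing counsel's experience. With real data you would swap the generated `cases` and `costs` for an export from your case management system.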