Raise your hand if you’ve read the TREC 2008 Legal Track study released in March. TREC is a great project to help legal professionals understand the accuracy and process of document review. To improve document review and realize the value of this ongoing study, however, TREC organizers and the legal profession need more dialogue.

Background and Articles about TREC Study. The TREC study “focuses on evaluation of search technology for discovery of electronically stored information in litigation and regulatory settings.” Two recent articles help explain the TREC study and its importance. TREC 2008 Stresses Human Element in EDD by Jason Krause at law.com (1 May 2009) does an excellent job explaining the TREC study. Krause also wrote about TREC in In Search of the Perfect Search in the April 2009 ABA Journal.

The Study is a Hard Read. After its release, Overview of the TREC 2008 Legal Track generated little immediate discussion on blogs or Twitter. The limited coverage is not surprising; I found reading it a tough slog. At the risk of being immodest, I have a good background to understand it: lawyer, 20+ years of experience with full-text retrieval, three years of college majors-sequence math, and four years of hands-on econometrics experience. So if I have trouble understanding the report, that does not augur well for other legal professionals. It’s a tough read for three reasons: (1) it is long, (2) it is academic, and (3) it leans heavily on the passive voice, which makes it hard to tell who did what. So I was glad to see Krause’s articles.

TREC “Interactive Task” Focused on Replicating Decisions of Experienced Litigator. The part of the study I found most interesting focuses on how best to automate or replicate the responsiveness decisions an experienced litigator would make. This was the TREC Interactive Task, in contrast to the Ad Hoc and Relevance Feedback tasks. The study “gold standard” was the document designation a “Topic Authority” would make. “Gold standard” is a term of art meaning the best-established or best-accepted solution. “Topic Authority” means an experienced litigator who best understands the facts and issues of the case and who, time permitting, would personally review all the documents. Topic Authority appears to be a variant of the more generic “domain expert” or “subject matter expert.”

Potential Issues with “Gold Standards”. The TREC study assumed that the Topic Authority’s judgment is the gold standard. More on that in a minute. Contrast that with the apparent assumption many litigators make. In a March 2007 blog post, The Gold Standard for E-Discovery Document Review, I wrote:

“Many lawyers appear honestly to believe that human review is accurate, the ‘gold standard’ for document review. ‘Honestly held’ and ‘right’ can diverge. I, for one, have never seen data to support the commonly accepted ‘gold standard.’”

My concern then was that too many lawyers seem ready to dismiss software designations of documents in favor of lawyer designations, even if the lawyer has no case-specific know-how. I might accept the Topic Authority’s judgment as the gold standard, but I don’t accept just any lawyer’s judgment as such (especially not a contract lawyer new to the case who might have had only an hour or two of training about the matter).

Even the Topic Authority as gold standard raises issues for me. The accuracy and reproducibility of Topic Authority review have not been established. That said, it is not clear what standard would be better. On the one hand, this standard acknowledges that we must rely on the subjective opinion of an expert. On the other hand, reasonable experts can and do disagree. In contrast to doctors conducting clinical trials, lawyers cannot establish an objective and reliably reproducible standard. This conundrum is perhaps best summed up by the TREC Topic Authority, Maura Grossman, quoted in Krause’s law.com article:

“The decision process that determines what is responsive, and therefore what must be produced, is inherently a task involving human judgment. The concept of responsiveness in e-discovery is not an objective standard, it is ultimately a judgment found in the senior attorney’s head.”

TREC Compared How Closely Different Approaches Came to Replicating Human Judgment. The study examined how closely different teams (where a team could be human, software, or both) came to replicating the Topic Authority’s judgment. Each team had an opportunity to interact with the Topic Authority to refine or tune its approach. After all teams finished the review, researchers compared document designations and, where results differed, set up an “appeal process” to decide on the correct designation.
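
To make the comparison concrete, here is a minimal sketch (in Python, and not the official TREC scoring code) of how a team’s responsiveness designations can be scored against a gold-standard set using standard retrieval measures: precision, recall, and F1. The document IDs and designations below are hypothetical.

```python
# Minimal sketch: score a team's designations against a gold standard.
# Document IDs and designations are hypothetical, for illustration only.

gold = {   # gold-standard (Topic Authority) designations: True = responsive
    "doc-001": True, "doc-002": False, "doc-003": True,
    "doc-004": False, "doc-005": True,
}
team = {   # one team's designations for the same documents
    "doc-001": True, "doc-002": True, "doc-003": False,
    "doc-004": False, "doc-005": True,
}

tp = sum(1 for d, r in team.items() if r and gold[d])        # responsive, correctly designated
fp = sum(1 for d, r in team.items() if r and not gold[d])    # designated responsive in error
fn = sum(1 for d, r in team.items() if not r and gold[d])    # responsive documents missed

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```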

I think it would be helpful to consider explicitly how Topic Authority judgment variability might influence thinking about e-discovery issues. We have no data on whether similarly situated Topic Authorities would agree on most document designations. (My doctor friends tell me that in clinical trials, this is the inter-rater variability problem.) For example, suppose a future study employed two Topic Authorities per topic and found a 20% variance in document designations between them. We need parameters around the variance among Topic Authorities; otherwise, comparisons to software approaches seem suspect. More specifically, if software approach A compares well to Topic Authority A and software approach B compares well to Topic Authority B, but A and B vary by 20%, what does that mean? If the variance were under 5%, I think everyone would agree “so what.” If the variance exceeded 25%, I think many would find that troubling.
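
For readers who want to see how such a variance figure might be measured, here is a hypothetical sketch comparing two Topic Authorities’ designations on the same documents, using raw agreement and Cohen’s kappa (a standard inter-rater statistic). The data are invented, and this is not a comparison the TREC study itself reports.

```python
# Hypothetical sketch: quantify disagreement between two Topic Authorities
# reviewing the same ten documents (True = responsive). Invented data.

ta_a = [True, True, False, False, True, False, True, False, True, True]   # Topic Authority A
ta_b = [True, False, False, False, True, False, True, True, True, True]   # Topic Authority B

n = len(ta_a)
agreement = sum(a == b for a, b in zip(ta_a, ta_b)) / n   # observed agreement rate

# Chance agreement for Cohen's kappa: probability both say "responsive"
# plus probability both say "not responsive", if each judged independently.
p_a_yes = sum(ta_a) / n
p_b_yes = sum(ta_b) / n
p_chance = p_a_yes * p_b_yes + (1 - p_a_yes) * (1 - p_b_yes)

kappa = (agreement - p_chance) / (1 - p_chance)

print(f"agreement={agreement:.0%} (variance={1 - agreement:.0%}), kappa={kappa:.2f}")
```

With these invented designations the two reviewers disagree on two of ten documents, i.e., a 20% variance, which is the scale of disagreement the example above contemplates.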

Conclusion: Need More Dialogue to Make TREC Useful to Legal Professionals. The potential variation among Topic Authorities notwithstanding, TREC offers what may be the best available approach to evaluating the reliability of alternate approaches to document designation. The study as it stands, a rather academic work, may not provide enough guidance for real-world lawyers, e-discovery professionals, and, perhaps most importantly, judges. That said, I think TREC could be instrumental to the future of EDD. Having a disinterested but highly knowledgeable third party evaluate alternative approaches on scientific principles (meaning valid statistics and reproducible results) is the only way to establish reliable standards.

To achieve traction in the real world of EDD, the profession needs to discuss the TREC theory and findings more widely. Organizations such as EDRM and The Sedona Conference, which are already deep into e-discovery, have the potential to help digest, explain, and disseminate the findings in a way that makes the work more relevant to everyday practitioners. A dialogue with practitioners would not only inform litigators and EDD professionals how best to use the study results but might also influence study design to make it more useful to the legal profession.

[Note: I want to express my appreciation to Nicolas Economou, Sandra Song, and Dan Brassil of H5, who helped me parse out some of the finer points of the study.]