Electronic Data Discovery for Everybody


By Craig Ball, for Law Technology News

Electronic data discovery is just for big budget cases involving big companies, handled by big firms. Right. And suffrage is just for white, male landowners. Some Neanderthal notions take longer than others to get shown the door, and it’s time to explode the myth that e-discovery is just for the country club set.

Today, evidence means electronic evidence; so, like the courts themselves, access to evidence can’t be just for the privileged. Everyone gets to play.

If you think big firms succeed at EDD because they know more than you do, think again. Marketing hype aside, big firm litigators don’t know much more about EDD than solo practitioners. Corporate clients hire pricey vendors with loads of computing power to index, search, de-duplicate, convert and manage terabytes of data. Big law firms deploy sophisticated in-house or hosted review platforms that let armies of associates and contract lawyers plow through vast plains of data — viewing, tagging, searching, sorting and redacting with a few keystrokes. The big boys simply have better toys.

A hurdle for everyone else is the unavailability and high cost of specialized software to process and review electronic evidence. A Mercedes and a Mazda both get you where you need to go, but the EDD industry has no Mazdas on the lot. So let’s explore affordable, off-the-shelf ways to get you to your destination.


First, let’s set sensible expectations: Vast, varied productions of electronically stored information cannot be efficiently or affordably managed and reviewed with software from Best Buy. If you’re grappling with millions of files and messages, you’ll need to turn to some pretty pricey power tools.

It’s all about workflow. When every action is extrapolated across millions of messages and documents, seconds saved add up to big productivity gains. Tools designed for ESI review save considerable time over cobbled-together methods. But few cases involve millions of files. Most entail review of material collected from a handful of custodians in familiar Microsoft Office formats, Outlook e-mail, Word documents, Excel spreadsheets, and PowerPoint presentations.

Volume is a challenge in these cases, too. But with a mix of low-cost tools, and careful attention to process, you can realize defensible e-discovery on the cheap.


Paper filled the void for a time, but lately the cracks are starting to show. Lawyers who might once have printed out discoverable electronic documents are coming to appreciate that printing evidence isn’t just more expensive and slower, it puts clients at an informational disadvantage.

When you print an electronic document, you lose three things: money, time and metadata. Money and time are obvious, but the impact of lost metadata is often missed. When you move ESI to paper or paper-like formats like .tiff images, you cede much of your ability to search and authenticate information, along with the ability to quickly and reliably exclude irrelevant data. Losing metadata isn’t about missing the chance to mine embedded information for smoking guns. Losing metadata is like losing all the colors, folders, staples, dates and page numbers that help paper records make sense.


I polled a group of leading EDD lawyers and forensic technologists to see what tools and techniques they thought suited to the following hypothetical:

Your school chum, Edna, runs a small firm and wants your advice. A client is sending two DVDs containing ESI collected in a construction dispute: Outlook .pst files for six people, and a mixed bag of Word documents, Excel spreadsheets, PowerPoints, Adobe PDFs, and scanned paper records sans OCR. Maybe a little video, some photographs, and a smattering of voicemail in .wav formats. “Nothing too hinky,” she promises.

Edna is confident it will comprise less than 50,000 files, but it could grow to 100,000 items before the case concludes in a year or two. She’s determined to conduct an in-house, paperless privilege and responsiveness review, sharing the task with a tech-savvy associate and legal assistant. All have late-model, big screen Windows desktop PCs with Office Professional 2007 and Adobe Acrobat 9, and the network file server has ample available storage space.

Edna doesn’t own CT Summation or Concordance. She’s willing to spend up to $1,000 for new software and hardware, but not a penny more. She’s open to an online Software as a Service (SaaS) option, but the review has to be completed using just the hardware and software she currently owns, supplemented only by the $1,000 in new purchases. Her team will supply as much brute force as necessary. She’s too proud to accept a loan of systems or software, and you can’t change her mind or budget. How should Edna proceed?


The review method employed should:

  1. Preserve relevant metadata.
  2. Incorporate de-duplication, as feasible.
  3. Support robust search of Outlook mail and productivity formats.
  4. Allow for efficient workflow.
  5. Enable rudimentary redaction.
  6. Run well on most late-model personal computers.
  7. Require no more than $1,000 in new software or hardware, though it’s fine to use fully-functional “free trial” software so long as the data stays accessible for the two- to three-year life of the case.

Sadly, there’s not that much new for those on shoestring budgets. Developers remain steadfastly uninterested in 85% of the potential market for desktop discovery tools. One possible bright spot was the emergence of hosted options. No one was sure the job could be begun, let alone completed, using SaaS on so tight a budget, but there was enough mention of SaaS to make it seem like a possibility.


Though the range of proposals was thin, the thought behind them was first-rate. All respondents recognized the peril of using the various Microsoft programs to review ESI. Outlook’s search capabilities are limited, especially with respect to attachments. If Edna expects to reliably search inside of every message, attachment and container file, she would need more than Outlook alone.

Notable by absence were any suggestions to use Google’s free desktop indexing and search tool. Though a painful interface for EDD, Google Desktop installed on a dedicated, “clean” machine would be capable of reading and searching Outlook e-mail, Word documents, Excel spreadsheets, PowerPoint presentations, PDF files, Zip archives, and even text within music, video and image files. It wouldn’t be pretty, and Edna would have to scrupulously guard against cross-contamination of the evidence with other data, but Google Desktop might get much of the job done without spending a penny.

Quin Gregor, of Georgia’s Strategic Data Retention, was first to respond with an endorsement of my two favorite affordable workhorses, the ubiquitous dtSearch indexing and search tool ($199 at www.dtsearch.com); and Aid4Mail ($69.95 at www.fookes.com), a robust utility for opening, filtering and converting common e-mail container files and message formats.

Gregor described a bankruptcy case where a microscopic budget necessitated finding a low-end option. He reports that dtSearch and Aid4Mail saved the day.

Ron Chichester, a Texas-based attorney and forensic examiner, pointed to the many open source Linux tools available without cost. These command line tools are capable of indexing, Bayesian analysis and much of the heavy lifting of the tools used by EDD vendors. But Chichester acknowledged that Edna and her staff would need a lot of Linux expertise to integrate the open source offerings.

Bottom line: The price is right, but the complexity unacceptable.


Ralph Losey, a partner at Florida’s Akerman Senterfitt and an active blogger/author/speaker on the EDD circuit, suggested using an online review tool such as Catalyst. He tried to dance around the budget barrier by pointing out that the cost could be passed on to the client. Losey argued that hosting would save enough lawyer time to pay for itself. No doubt he’s right, but passing on the costs isn’t permitted in the Edna Challenge. Even in a real-world situation, unless the savings are considerable, Edna’s likely to keep the work and the revenue in house.

Another Floridian, veteran forensic examiner Dave Kleiman, suggested that Edna blow her budget on alcohol and amphetamines because she has a lot of toil ahead of her. Party on, Dave!


Our northern neighbor, Dominic Jaar of Ledjit Consulting, in Quebec, took a similar doleful tack.

Jaar thought that SaaS might be a possibility but added that Edna should use her grand to take an e-discovery course because she needs to learn enough to “stay far from the case.” Else, he offered, she could go forward and apply the funds to coffee and increased malpractice coverage. Ouch!

John Simek of Sensei Enterprises in Virginia prudently suggested that Edna use part of her budget to buy an hour of a consultant’s time to help her get started. Simek predicted that a SaaS approach would be priced out of reach, but he was another who thought salvation lay with dtSearch. He recognized that Adobe Acrobat could handle both redaction and light-duty OCR. As for the images, video and sounds, Edna’s in the same boat, rich or poor. She’s just going to have to view or listen to them, one by one.

Jerry Hatchett with Evidence Technology in Houston suggested LitScope, a SaaS offering from LitSoft. Jerry projected a cost of around $40/GB/month, which would burn through Edna’s budget in about three months.

Following up, I discovered that LitScope can’t ingest the native file formats Edna needs to review, unless accompanied by load files containing the text and metadata of the documents and messages. The cost to pre-process the data to load it would eat up Edna’s budget before she looked at a single page. That, and a standard $200 minimum on monthly billings coupled with a six-month minimum commitment, made this SaaS option a non-starter.

Attractive pricing, to be sure, but not low enough for Edna’s shallow pockets.

The meager budget forced New York City’s George Rudoy, director of global practice technology and information services at Shearman & Sterling, to suggest using Outlook 2007 as the e-mail review tool, adding the caveat that metadata may change.

Unlike earlier versions, Outlook 2007 claims to extend its text search capabilities to attachments. Unfortunately, it works poorly in practice, meaning Edna and her staff will need to examine each attachment instead of ruling any out by search.

Rudoy also urged Edna to buy licenses for Quick View Plus — a universal file viewer utility — and hire an Access guru to design a simple database to track files and hyperlink to each review.


Michelle Mahoney of Mallesons Stephen Jaques in Melbourne shared several promising approaches. She suggested Karen’s Power Tools (a $30 suite of applications) to inventory and hash files, and Microsoft Access to de-duplicate by hash values.

Mahoney also favored hyperlinking from Access for review, working through the collection progressively, ordering them by file type and filename.

She envisions adding fields to the database for relevant and privilege designations and a checkbox for exceptional files that can’t be opened and require further work.
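A tracking database along the lines Mahoney describes might look something like the sketch below. It uses Python's built-in sqlite3 module purely as a stand-in for Microsoft Access, which is a substitution made for illustration. The table name, column names and sample paths are hypothetical, and the hash values are assumed to come from whatever inventory tool Edna uses.

```python
import sqlite3

# Hypothetical schema: one row per file in the collection.
SCHEMA = """
CREATE TABLE files (
    id         INTEGER PRIMARY KEY,
    path       TEXT NOT NULL,     -- full path, usable as a hyperlink target
    file_type  TEXT NOT NULL,     -- extension, e.g. 'docx'
    file_name  TEXT NOT NULL,
    file_hash  TEXT NOT NULL,     -- hash value from the inventory step
    relevant   INTEGER,           -- NULL = not yet reviewed, 1 = yes, 0 = no
    privileged INTEGER,           -- NULL = not yet reviewed, 1 = yes, 0 = no
    needs_work INTEGER DEFAULT 0  -- 1 = could not be opened, needs follow-up
)
"""


def open_tracker():
    """Create an in-memory tracking database (a file path would work too)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_file(conn, path, file_type, file_name, file_hash):
    """Register one file from the inventory."""
    conn.execute(
        "INSERT INTO files (path, file_type, file_name, file_hash) "
        "VALUES (?, ?, ?, ?)",
        (path, file_type, file_name, file_hash),
    )


def review_queue(conn):
    """Return one representative path per distinct hash,
    ordered by file type and then filename."""
    rows = conn.execute(
        """
        SELECT path FROM files
        WHERE id IN (SELECT MIN(id) FROM files GROUP BY file_hash)
        ORDER BY file_type, file_name
        """
    )
    return [row[0] for row in rows]
```

The duplicate rows stay in the table rather than being deleted, so every file in the collection remains accounted for even though only one copy per hash is queued for review.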

For the e-mail files, Mahoney also turns to Outlook as a review tool, proposing that folders be created for dragging and dropping items into Relevant Non Privileged, Relevant Privileged, and Non Relevant groups. She echoes warnings about metadata modification and gives her thumbs up to Aid4Mail.

Finally, she offers more kudos for dtSearch as the low cost tool of choice for keyword searching. DtSearch allows Edna to run keywords across files, including e-mails and attachments, and offers the option to copy them, with or without original path, into a folder. Messages emerge in the generic .msg mail format, and Edna can either produce them in that format (with embedded attachments) or use Aid4Mail to copy them into an Outlook .pst file.

Tom O’Connor, director of the Gulf Coast Legal Technology Center in New Orleans, observed that he often gets requests like Edna’s from his clients in Louisiana and Mississippi. He weighed in with a mention of Adobe Acrobat, noting that it might be feasible to print everything to Acrobat and use Acrobat’s annotation and redaction features.

As mentioned, Acrobat also offers rudimentary OCR capabilities to help deal with the scanned paper documents in the collection, and it can convert modest volumes of e-mail to PDFs directly from Outlook. O’Connor cautions that using familiar off-the-shelf tools can be cumbersome, but it may be preferable to trying to master new software under pressure.

Ohio-based consultant Brett Burney had very concrete ideas for Edna. She could try to find a SaaS system to host the data — and suggested Lexbe, NextPoint or Trial Solutions as candidates. Burney was most familiar with Lexbe and knew of small law firms that had successfully and inexpensively used its services.

Brett guessed Edna’s budget might allow her to upload everything to Lexbe, review it quickly, and then take everything down before the hosting costs ate up her budget. He reported that Lexbe will accept about any file format; users upload data or send it to Lexbe to load. Cost: $99 per month for two users and 1 GB of storage. Because Edna needs to host more than 1 GB of data, Burney projected that her outlay should be close to $200 a month. “Edna and her crew can upload everything with the tools they have, get it reviewed pronto (i.e., less than a month), and then take everything down, paying only for what they use,” he said.

For the Outlook e-mail, Burney thought Edna should turn to Adobe Acrobat and convert the .pst container files to PDF Portfolios. Alternatively, Brett suggested Edna use the free Trident Lite tool from Wave Software to get a “snapshot” of the .pst files and then convert relevant messages to PDF or upload them to a hosting provider.

California-based Lisa Habbeshaw of FTI Consulting pointed to Intella by Vound Software as an all-in-one answer to Edna’s needs. It offers an efficient indexing engine, user-friendly interface, and innovative visual analysis capability sure to make quick work of Edna’s review effort. Lisa was unsure if the program could be had for under $1,000, but noted that Vound offers a free, fully-functional demo that might fill the bill for Edna’s immediate needs. It’s certainly a splendid new entry to the do-it-yourself market.


It’s hard to add much to so many fine ideas. Collectively, dtSearch, Adobe Acrobat, and Aid4Mail deliver the essential capabilities to unbundle, index, search, OCR and redact the conventional file formats and modest data volumes Edna faces. Her challenge will be cobbling together tools not designed for EDD so as to achieve an acceptable workflow and defensible tracking methodology. It won’t be easy.

For example, while dtSearch is best of class in its price range, it doesn’t afford Edna any reasonable way to tag or annotate documents as she reviews them. Accordingly, Edna will be obliged to move each document to a folder as she makes her assessments respecting privilege and responsiveness. That effort will get very old, very fast.

While Adobe Acrobat supports conversion of e-mail into PDFs, the process is painfully slow and cumbersome. Moreover, the conversion capabilities break down above 10,000 messages. That sounds like a lot, but it’s likely insufficient for the aggregate collections of six custodians.

Further, Edna may encounter an opponent smart enough to demand the more versatile electronic formats for e-mail (i.e., .pst, .msg or .eml). What’s Edna going to do if she finds herself locked into a review employing document images?

Whatever tools she employs, Edna will need to be meticulous in shepherding the individual messages and documents through the process. To that end, I’d offer this advice:

  1. First, make a working copy of the data and secure the source dataset against any usage or alteration. Processing ESI poses risks of data loss or alteration. If errors occur, you must be able to return to uncorrupted ESI. For each major processing threshold, set aside a copy of the data for safekeeping, and carefully document when the copy was generated and what work had been done to that point (e.g., the status of deduplication, filtering and redaction).
  2. From the working copy, hash the files and generate an inventory of all files and their metadata. The processes you employ must account for the disposition of every file in the source collection or extracted from those files (i.e., message attachments and contents of compressed archives). Your accounting must extend from inception of processing to production. By hashing the constituents of the collection as it grows, you gain a means to uniquely identify files as well as a way to deduplicate and track identical files across custodians and sources.

    Karen’s Hasher is useful, but the best free tool for the task is AccessData’s FTK Imager. It not only hashes files, it also exports Excel-compatible comma delimited listings of filenames, file paths, file sizes and modified, accessed and created dates. Moreover, it supports loading the collected files into a container called a Custom Content Image that protects the data from metadata corruption.

  3. Devise a logical division scheme for the components of the collection; e.g., by machine, custodian, business unit or otherwise. Be careful not to aggregate files in a manner that risks files from one source overwriting identically named files from other sources.
  4. Expand files that hold messages and other files. Here, you should identify e-mail container files (such as Outlook .pst files) and archives (e.g., .zip files) that must be opened or decompressed to make their constituents amenable to search. Most indexing tools can directly access text within compressed formats. For example, dtSearch can extract text from Zip files and other archives. E-mail client applications such as Outlook typically permit export of individual messages and attachments. Better still, use an inexpensive utility, such as Aid4Mail or Trident Lite.

    DtSearch can process e-mail accessible through an Outlook profile, or it can use an included command line utility to convert Outlook .pst container files to individual message (.msg) files for indexing. Neither approach works well, or easily, compared to Aid4Mail.

  5. A feature common to premium e-discovery tools but hard to match with off-the-shelf software is de-duplication. You can use hash values to identify identical files, but the challenge is to keep track of all de-duplicated content and reliably apply tagging for privilege and responsiveness to all deduplicated iterations. Most off-the-shelf utilities simply eliminate duplicates and so aren’t suited to e-discovery. This is where it’s a good investment to secure help from an expert in Microsoft Excel or Access because those applications can be programmed to support deduplication tracking and tagging.

    When employing deduplication, keep in mind that files with matching hash values can have different filenames and dates. The hash identicality of two files speaks to the contents of the files, not the names assigned to the files by the operating system or to information, like modified, accessed and created dates, stored outside the files.

  6. Above all, don’t process and review ESI in a vacuum. Be certain that you understand the other side’s expectations in terms of the scope of the effort, approach to search, and — critically — the forms of production they seek. You may not agree on much, but you may be pleasantly surprised to learn that some of the perils of a low budget EDD effort (e.g., altered metadata, limited search capabilities, native production formats) don’t concern the other side.

Further, you may reach accord on limiting the scope of review in terms of time intervals, custodians and types of data under scrutiny. Why look at all e-mail if the other side is content with your searching just communications between Don and Betty during the third week of January 2009?
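For readers comfortable with a little scripting, the hashing, inventory and duplicate-tracking steps in the advice above might be sketched as follows. This is a minimal illustration in Python's standard library, not a substitute for a purpose-built tool such as FTK Imager. The function names, the choice of SHA-256 and the CSV column names are all assumptions made for the example, and it should only ever be pointed at the working copy, never at the source dataset.

```python
import csv
import hashlib
import os
from collections import defaultdict


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """Walk the working copy and return one record per file."""
    records = []
    for folder, _subdirs, names in os.walk(root):
        for name in sorted(names):
            path = os.path.join(folder, name)
            info = os.stat(path)  # capture metadata before reading contents
            records.append({
                "path": os.path.relpath(path, root),
                "size": info.st_size,
                "modified": info.st_mtime,
                "sha256": hash_file(path),
            })
    return records


def write_inventory(records, csv_path):
    """Save the inventory as an Excel-compatible CSV listing."""
    fields = ["path", "size", "modified", "sha256"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)


def group_by_hash(records):
    """Map each hash to every path holding identical content."""
    groups = defaultdict(list)
    for record in records:
        groups[record["sha256"]].append(record["path"])
    return dict(groups)


def tag_all_copies(groups, tags, digest, tag):
    """Apply one review designation to every path sharing a hash."""
    for path in groups[digest]:
        tags[path] = tag
```

Nothing is deleted here. Duplicates are grouped rather than eliminated, so a designation made on one copy can be carried to the others while every file stays on the inventory. Keeping each custodian's files in a separate subfolder of the working copy means identically named files show up as distinct paths.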

Finally, Edna will want answers to two common questions that should concern anyone taking the do-it-yourself route in e-discovery:

1. What if I change metadata?

Certain system metadata values — e.g., last access times and creation dates — are prone to alteration when processed using tools not designed for EDD. Such changes are rarely a problem if you adhere to three rules:

1. Preserve an unaltered copy of whatever you’re about to process.
2. Understand what metadata were altered.
3. Disclose the changes to the requesting party.

By keeping a copy of the data at each step, you can recover true metadata values if particular values prove significant. Then, disclosing what metadata values were changed eliminates any suggestion that you pulled a fast one. Many requesting parties have little regard for system metadata values; but they don’t want to be surprised by relying on inaccurate information.
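One hedged way to learn which metadata values were altered is to record them before processing and compare afterward. The sketch below assumes Python's standard library and hypothetical function names. It tracks only file size, last-access time and last-modified time, because creation dates are reported differently from one operating system to the next and are left out here.

```python
import os


def snapshot_metadata(root):
    """Record size, access time and modified time for every file under root."""
    snapshot = {}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            info = os.stat(path)
            snapshot[os.path.relpath(path, root)] = {
                "size": info.st_size,
                "accessed_ns": info.st_atime_ns,
                "modified_ns": info.st_mtime_ns,
            }
    return snapshot


def metadata_changes(before, after):
    """List (path, field, old, new) for each value that differs.

    Files present before but absent afterward are reported as 'missing'.
    """
    changes = []
    for path, old in sorted(before.items()):
        new = after.get(path)
        if new is None:
            changes.append((path, "missing", old, None))
            continue
        for field in old:
            if old[field] != new[field]:
                changes.append((path, field, old[field], new[field]))
    return changes
```

Taking a snapshot just before a processing step and another just after yields a list of altered values that can be disclosed to the requesting party, with the preserved copy available if the true values turn out to matter.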

2. Can I use my own e-mail account for review?

You wouldn’t commingle client funds with your own money, so why commingle e-mail that’s evidence in a case with your own mail? That said, when ESI is evidence and the budget leaves no alternative, you may be forced to use your own e-mail tools for small-scale review efforts. If so, remember that you can create alternate user accounts within Windows to avoid commingling client data with your own.

Better still, undertake the review using a machine with a clean install of the operating system. Very tech-savvy counsel can employ virtual environments (e.g., VMWare products) to the same end.

If using an e-mail client for review, it may be sufficient to categorize messages and attachments by simply dragging them to folders representing review categories; for example:

  1. Attorney-client privilege: entire item.
  2. Work product privilege: entire item.
  3. A-C Privilege: needs redaction.
  4. W-P privilege: needs redaction.
  5. Other privilege.
  6. Responsive.
  7. Non-responsive.

Once categorized, the contents of the various folders can be exported for further processing or for production, if in a suitable format.


The vast majority of cases filed, developed and tried in the U.S. are not multimillion dollar dustups between big companies. The evidence in modest cases is digital, too. Solo and small firm counsel like Edna need affordable, user-friendly tools designed for desktop e-discovery — tools that preserve metadata, offer efficient workflow and ably handle the common file formats that account for nearly all of the ESI seen in day-to-day litigation. Using the tools and techniques described by my thoughtful colleagues, Edna will get the job done on time and under budget. The pieces are there, though the integration falls short.

So, how about it, EDD industry? Can you divert your gaze from the golden calf long enough to see the future and recall the past? Sam Walton became the richest man of his era by selling to more for less. There’s a fast growing need — and a huge emerging market. The real Edna Challenge is waiting for the visionaries who will meet the need and serve the market.

Austin’s Craig Ball is a trial lawyer and computer forensics/EDD special master.
E-mail: craig@ball.net.
