I got an email the other day from Justin, our business development rep in Chicago. He was writing about a case we had been helping with for maybe a year now. It was an interesting matter but nothing out of the ordinary. I believe they have about 150,000 documents on the site.
Justin wrote to tell me that our client would be adding data to the site, also nothing out of the ordinary. But then he surprised me. “They expect to be adding another 28 million pages to the site,” he reported. That did get my attention.
“Did you mean 28,000 pages?” I wrote back, mostly just to kid him. “Perhaps that was a typo,” I continued.
“Nope,” he answered. “That was not a typo. Our partner says to expect another 28 million pages on the site.” I had to laugh when I thought about that volume. We have come a long way in this industry when 28 million pages isn’t all that unusual.
When Discovery Had No ‘E’ in Front
We started Catalyst in 2000, more than 10 years ago. At the time, there was no “e” in front of discovery. Digital mostly meant scanned images of the paper originals. Even the email that existed was printed out to paper and then scanned. I remember former partners of mine who would get a CD in production. The first thing they asked for was to print it out. Often we would stamp the pages and scan them back into the system.
I smile when I think about those days. Back then a “big case” had 30,000 documents in it. (Of course, some were bigger than that, but plenty were smaller, as well.) When we were getting started, I took pride in the fact that we had a dedicated storage device called a Network Appliance filer. It was an industrial-strength set of hard drives striped using RAID 5. I confess to bragging about it because it really seemed special to me.
Most important (and this is what makes me smile), the device had a whole 600 gigabytes of usable space on it. We thought we might never need another one. Ever. Indeed, I used to look forward to the day when we had a million pages on our system. That seemed like a huge reach to me. The thought of reaching the million-document mark seemed even more formidable. You need a lot of cases at 30,000 documents each to reach a million: 34 of them, to be exact. That seemed like a lot of cases to me back then.
A Million Documents Is Now Routine
Fast forward to 2011, when you rarely see the word discovery without an “e” in front of it. We buy SANs these days rather than network-attached storage, and they hold a lot more data. I believe our standard buying unit is now 48 terabytes, which, as most know, is about 48,000 of those gigabytes we originally thought about. We now have a number of these devices and just bought another.
Volumes have risen incredibly. Several years ago, we automated the loading process just to keep up with the demand. Clients and partners send us data directly via a specialized FTP system for automated processing and loading. These days, loading a million documents in a day is not unusual. Indeed, checking as I write this from a porch on a sparkling afternoon in Nashville, Tenn., I see 162 separate automated loading tickets in our system. It just boggles my mind to think of that much data.
Taking it further, it is not just the number of documents being sent to us that has grown dramatically but also the size of cases. Four or five years ago, our average case size was about 16 gigabytes of documents. If you figured 5,000 to 10,000 documents per gigabyte (a number some people throw around but we believe is high), that would come to between 80,000 and 160,000 documents per case. That seemed big enough to me.
Today, after removing a number of really large outlier cases, our average has jumped to about 120 gigabytes a case. Using the same base figures as before, that suggests cases have grown substantially, to somewhere between 600,000 and 1.2 million documents per case. That is just amazing when you think about it.
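The per-case figures above are simple multiplication of data volume by a documents-per-gigabyte rate. Here is a minimal Python sketch, assuming the same 5,000 to 10,000 documents-per-gigabyte rule of thumb (which, as noted, we believe runs high). The function `estimate_documents` is a hypothetical helper for illustration, not a Catalyst tool.

```python
# Rough document-count estimates from data volume, assuming the
# 5,000-10,000 documents-per-gigabyte rule of thumb (likely high).
DOCS_PER_GB_LOW = 5_000
DOCS_PER_GB_HIGH = 10_000


def estimate_documents(gigabytes: float) -> tuple[int, int]:
    """Return (low, high) document-count estimates for a volume in gigabytes."""
    return (
        round(gigabytes * DOCS_PER_GB_LOW),
        round(gigabytes * DOCS_PER_GB_HIGH),
    )


for label, gb in [("Average case, 4-5 years ago", 16), ("Average case today", 120)]:
    low, high = estimate_documents(gb)
    print(f"{label}: {gb} GB -> {low:,} to {high:,} documents")
```

Because the rate is only a rule of thumb, the output is best read as an order-of-magnitude range rather than a count.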
And speaking of outliers, we have clients with 20 terabytes of litigation documents on our system (and one prospect talking about 60 terabytes of case data). I have wondered how many total documents that represents, but I am not even going to pull out my calculator to figure that one. We see cases with as many as 8 million documents in them under review. It just boggles the mind.
So that is what I was thinking about when I got Justin’s email. My point isn’t to say that Catalyst is big or that our numbers are anything special. Rather, like many of you, I remember what litigation was like in the 1990s, a time when we thought paper discovery was getting out of hand. In those days, we moved from Redweld folders to filing cabinets to war rooms and even warehouses. But we never contemplated having even 100,000 documents to review, let alone millions. It just didn’t happen, at least not in my practice.
The volume of digital data is growing at an explosive rate, as we all know. Some claim the world is creating more than 988 exabytes of new content a year. That comes to 988 billion gigabytes. (It goes exabyte to petabyte to terabyte to gigabyte, with each step a factor of 1,000.)
The great bulk of this data consists of video and audio files, but written data keeps expanding, too. And a lot of that stuff will be discoverable, which is what drives this industry.
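The exabyte-to-gigabyte conversion above, like the 48-terabyte figure earlier, follows from decimal storage units, where each step up is a factor of 1,000. A minimal Python sketch, assuming decimal (SI) units; `to_gigabytes` and `GB_PER_UNIT` are hypothetical names used only for illustration.

```python
# Decimal (SI) storage units expressed in gigabytes:
# each step up the ladder multiplies by 1,000.
GB_PER_UNIT = {
    "gigabyte": 1,
    "terabyte": 1_000,
    "petabyte": 1_000_000,
    "exabyte": 1_000_000_000,
}


def to_gigabytes(amount: float, unit: str) -> float:
    """Convert an amount in the given decimal unit to gigabytes."""
    return amount * GB_PER_UNIT[unit]


print(f"{to_gigabytes(988, 'exabyte'):,.0f} GB")  # 988,000,000,000 GB
print(f"{to_gigabytes(48, 'terabyte'):,.0f} GB")  # 48,000 GB
```

Some operating systems and tools report binary units instead (1 TiB = 1,024 GiB), so figures from different sources may not line up exactly.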
The e-discovery market is all grown up now, with trade shows, industry magazines, experts, and analysts. My, how you’ve grown! My, how we all have grown!
About the Author
John Tredennick is a nationally known trial lawyer and longtime litigation partner at Holland & Hart. John founded Catalyst in 2000 and is responsible for its overall direction, voice, and vision.
John is the former chair of the ABA’s Law Practice Management Section. For many years, he was editor-in-chief of the ABA’s Law Practice Management magazine, a monthly publication focusing on legal technology and law office management. More recently, he founded and edited Law Practice Today, a monthly ABA webzine that focuses on legal technology and management.