The world is getting smaller. For large corporations, it is virtually certain that their operations span multiple countries. But it is no longer just large corporations that operate globally. These days, even small and mid-sized businesses are likely to have international components.
When a business is international, then any legal matters involving that business are also likely to be international in scope. In the context of litigation or a government investigation, that means the matter is likely to involve documents in more than one language. Often, such cases will involve collections of documents in a number of different languages – or even single documents containing multiple languages.
In other words, multi-language documents are a fact of e-discovery life these days. For e-discovery professionals, processing and reviewing multi-language collections raises a number of issues. In this post, I want to talk about one: review workflow.
Any successful multi-language review begins with computerized language identification. While most platforms support language identification, they vary greatly in efficiency. Language identification uses built-in dictionaries to identify the primary (and sometimes secondary) language present in a document. This information can be used to route documents to the appropriate reviewer or to identify which documents need to be sent out for translation before they can be reviewed.
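To make the dictionary idea concrete, here is a minimal sketch in Python. It assumes a tiny hand-built stopword list per language and a hypothetical function name, `identify_languages`; real review platforms use far larger dictionaries and their own methods, so treat this as an illustration rather than how any particular product works.

```python
import re
from collections import Counter

# Assumption: tiny illustrative dictionaries, not a real platform's word lists.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "for"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "para"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "mit", "für"},
}

def identify_languages(text):
    """Return (primary, secondary) language codes based on dictionary hits."""
    # Split into lowercase runs of letters, ignoring digits and punctuation.
    words = re.findall(r"[^\W\d_]+", text.lower())
    hits = Counter()
    for word in words:
        for lang, vocab in STOPWORDS.items():
            if word in vocab:
                hits[lang] += 1
    # Rank languages by number of dictionary hits, most hits first.
    ranked = [lang for lang, _ in hits.most_common()]
    primary = ranked[0] if ranked else None
    secondary = ranked[1] if len(ranked) > 1 else None
    return primary, secondary

mixed = "El contrato de la empresa y los documentos para el cliente. The file is attached."
print(identify_languages(mixed))
```

A document with no dictionary hits at all returns `(None, None)`, which corresponds to the "unknown language" case discussed later in the post.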
In the case of Chinese/Japanese/Korean (CJK) documents, language identification is less precise than when dealing with a Western-language character set. Frequently, document headers, email formatting, and email signatures contain CJK text while the substantive portion of the record is in a Western language. This minimal amount of CJK text can cause the entire document to be coded as CJK. To navigate this problem, use a search tool that can both tokenize and count Western versus CJK words within a given document. These counts can help establish a baseline for determining the true language of a document.
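The counting approach can be sketched in a few lines of Python. This is only an approximation under stated assumptions: it counts individual CJK characters rather than true CJK words (proper CJK word segmentation needs a dedicated tokenizer), it uses a simplified set of Unicode ranges, and the function names and the 0.5 threshold are hypothetical choices for illustration.

```python
import re

# Assumption: simplified ranges covering Han ideographs, Hiragana/Katakana,
# and Hangul syllables. Real tools use more complete script tables.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")
# Runs of basic and extended Latin letters count as one Western word each.
WESTERN_WORD = re.compile(r"[A-Za-z\u00c0-\u024f]+")

def count_scripts(text):
    """Return (western_word_count, cjk_character_count) for a document."""
    return len(WESTERN_WORD.findall(text)), len(CJK_CHAR.findall(text))

def dominant_script(text, cjk_threshold=0.5):
    """Label a document 'cjk', 'western', or 'unknown' from the two counts."""
    western, cjk = count_scripts(text)
    total = western + cjk
    if total == 0:
        return "unknown"  # no countable text at all
    # The threshold is a hypothetical baseline; tune it on a sample of documents.
    return "cjk" if cjk / total >= cjk_threshold else "western"

email = ("Please review the attached contract before Friday "
         "and send your comments to the team. 田中")
print(count_scripts(email), dominant_script(email))
```

Because one side is counted in characters and the other in words, a short English body with a long CJK signature can still tip toward CJK. That is why the threshold should be set from a sample of your own collection rather than taken as given.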
Language-Specific Batching
With the limited number of foreign language reviewers and the often high cost associated with obtaining their services, it is important to have a clear system for assigning foreign language documents.
While most foreign language reviewers can review both their native language and English, you don’t want them wasting their time reviewing a document that 90% of your other reviewers can read. English documents, documents with an unknown language, and documents with no text should go to your English reviewers. Only documents containing a non-English language should go to your foreign language reviewers.
Keep in mind, though, that when you review by document family, if any document in the family contains a foreign language, the entire family should go to your foreign language reviewer.
No language identification is perfect. Inevitably, reviewers are going to come across documents they can’t read. That’s why it is important to build flexibility into your workflow when setting up your review.
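The batching rules above can be sketched as a small routing function. The record fields (`family_id`, `language`), the queue names, and the function name `route_families` are all hypothetical, and sending a family that contains several foreign languages to a combined queue is my own assumption; the right handling for that case depends on your reviewer pool.

```python
from collections import defaultdict

# English, unidentified language, or no extracted text all go to English reviewers.
ENGLISH_QUEUE_LANGS = {"en", "unknown", None}

def route_families(documents):
    """Group documents by family and assign each family to one review queue."""
    families = defaultdict(list)
    for doc in documents:
        families[doc["family_id"]].append(doc)

    queues = defaultdict(list)
    for family_id, members in families.items():
        # Collect every non-English language found anywhere in the family.
        foreign = sorted({d["language"] for d in members
                          if d["language"] not in ENGLISH_QUEUE_LANGS})
        # One foreign-language member sends the whole family to that queue.
        queue = "+".join(foreign) if foreign else "english"
        queues[queue].append(family_id)
    return dict(queues)

docs = [
    {"id": 1, "family_id": "A", "language": "en"},       # English email
    {"id": 2, "family_id": "A", "language": "ja"},       # Japanese attachment
    {"id": 3, "family_id": "B", "language": None},       # no text
    {"id": 4, "family_id": "C", "language": "unknown"},  # unidentified
]
print(route_families(docs))
```

Here family A goes to the Japanese queue as a whole, even though its parent email is in English, while families B and C stay with the English reviewers.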
About the Author
Ron Tienzo, Catalyst Consulting
With a degree from Sturm College of Law at the University of Denver and extensive training in a range of legal software applications and programming languages, Ron provides litigation consulting to corporate law departments and law firms. Prior to joining Catalyst, Ron was the software specialist at a full-service Los Angeles law firm. There, he was the firm’s lead person on the use of technology in complex litigation matters. In addition, he was the firm’s principal advisor on all software matters and was responsible for software training for all professionals and staff.