02 May 2014
Greatest Hits: Improving Precision in Keyword Search
Sometimes a keyword needs an additional element to increase precision. For example, a keyword like “meeting minutes” might return the targeted meeting minutes along with irrelevant ones. Adding a subject matter “anchor” (a keyword element thought to be highly correlated with the presence of relevant subject matter) can increase precision: “meeting minutes” could become “marketing board AND meeting minutes” or “marketing board w/10 meeting minutes.”
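To make the effect of an anchor concrete, here is a minimal Python sketch that measures precision on a small hand-labeled sample, with and without the anchor. The documents, the relevance labels, and the helper names (`tokens`, `phrase_positions`, `within`, `precision`) are all hypothetical, and the `w/10` logic only approximates how a real search tool counts the distance between phrases.

```python
import re

def tokens(text):
    """Lowercase word tokens; a stand-in for a real search engine's tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase_positions(toks, phrase):
    """Start indexes at which the phrase (a list of tokens) occurs."""
    n = len(phrase)
    return [i for i in range(len(toks) - n + 1) if toks[i:i + n] == phrase]

def within(toks, left, right, distance):
    """True if phrase `left` starts within `distance` words of phrase `right`."""
    return any(abs(i - j) <= distance
               for i in phrase_positions(toks, left)
               for j in phrase_positions(toks, right))

def precision(hits, relevant):
    """Fraction of hit documents that are actually relevant."""
    return len(hits & relevant) / len(hits) if hits else 0.0

# Hypothetical sample with hand-assigned relevance labels.
docs = {
    1: "Marketing board meeting minutes for the spring campaign",
    2: "Meeting minutes of the facilities committee on parking",
    3: "The marketing board approved the meeting minutes from April",
    4: "Lunch menu for Friday",
}
relevant = {1, 3}

mm = ["meeting", "minutes"]
mb = ["marketing", "board"]

# "meeting minutes" alone versus "marketing board w/10 meeting minutes".
plain_hits = {d for d, t in docs.items() if phrase_positions(tokens(t), mm)}
anchored_hits = {d for d, t in docs.items() if within(tokens(t), mb, mm, 10)}

print(f"plain:    {precision(plain_hits, relevant):.2f}")     # 0.67
print(f"anchored: {precision(anchored_hits, relevant):.2f}")  # 1.00
```

In this toy sample the anchor removes the facilities committee document without losing either relevant one. On real data, an anchor can also drop relevant documents, so it is worth checking what the anchored keyword no longer returns.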
Use Exclusion
If you can trace imprecision to a particular element, you can add exceptions to an existing keyword by using “NOT.” For example, “meeting minutes” could become “meeting minutes NOT executive committee meeting minutes.” This approach can be more time-consuming, however, as it often requires creating exceptions that are document-specific. Also, if a sample of a document population is being used to validate keywords, document-specific exceptions are unlikely to address other irrelevant documents that are present in the population but fall outside the sample.
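The limitation of sample-driven exceptions can be illustrated with a short Python sketch. The documents and the `hits` helper are hypothetical, and plain substring matching stands in for a real search engine.

```python
def hits(docs, include, exclude=()):
    """IDs of documents containing `include` but none of the `exclude` phrases."""
    found = set()
    for doc_id, text in docs.items():
        lowered = text.lower()
        if include in lowered and not any(p in lowered for p in exclude):
            found.add(doc_id)
    return found

# Hypothetical validation sample: document 2 is the irrelevant hit we noticed.
sample = {
    1: "Marketing board meeting minutes, March",
    2: "Executive committee meeting minutes, March",
}
# Hypothetical documents outside the sample: document 4 is also irrelevant.
rest_of_population = {
    3: "Marketing board meeting minutes, April",
    4: "Safety council meeting minutes, April",
}

# "meeting minutes NOT executive committee meeting minutes"
exclusion = ["executive committee meeting minutes"]

sample_hits = hits(sample, "meeting minutes", exclusion)
population_hits = hits(rest_of_population, "meeting minutes", exclusion)

print(sample_hits)      # {1}
print(population_hits)  # contains 3 and 4
```

The exception cleans up the sample, but the safety council minutes still come through because nothing in the sample pointed to them.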
Revisit the Use of Wildcards
Wildcards, when used carefully, can safely broaden a keyword's coverage by capturing variations of a concept. However, wildcards can also match unintended words, so the results need scrutiny to see whether a revision makes sense. For example, if the original intent of the keyword “sting*” was to return discussions about stinging insects, you may not want documents that contain the word “stingy.” Replacing the wildcard operator with a more limited set of keyword variants (“sting OR stings OR stinger OR stinging”) or using an exception to exclude unwanted hits (“sting* NOT stingy”) can help to boost precision.
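As a rough illustration, the three formulations of the “sting” keyword can be compared using regular expressions in Python. The documents are hypothetical, and the regular expressions only approximate how a search tool would interpret a trailing wildcard.

```python
import re

# Hypothetical documents; only document 3 is about something other than insects or plants.
text_by_doc = {
    1: "The wasp can sting repeatedly",
    2: "A bee stinger stays in the skin",
    3: "He was stingy with the budget",
    4: "Stinging nettles line the path",
}

def search(pattern):
    """IDs of documents whose text matches the regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return {doc_id for doc_id, text in text_by_doc.items() if rx.search(text)}

# sting*  : any word beginning with "sting"
wildcard_hits = search(r"\bsting\w*")

# sting OR stings OR stinger OR stinging : explicit variants only
variant_hits = search(r"\b(?:sting|stings|stinger|stinging)\b")

# sting* NOT stingy : drop any document that contains "stingy"
excluded_hits = wildcard_hits - search(r"\bstingy\b")

print(sorted(wildcard_hits))  # [1, 2, 3, 4]
print(sorted(variant_hits))   # [1, 2, 4]
print(sorted(excluded_hits))  # [1, 2, 4]
```

Both revisions drop the “stingy” document here. Note that the NOT version works at the document level, so a document that mentions both stinging insects and a stingy budget would be excluded as well.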
Use Appropriate Proximity Operators
If a keyword includes a proximity operator, investigate whether reducing the proximity distance might increase precision. For example, the keyword “customer* w/50 marketing” might be too broad, and could be replaced with the keyword “customer* w/25 marketing,” especially if you observe that the two keyword elements (“customer” and “marketing”) tend to be closer together in relevant documents than they are in non-relevant documents.
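One way to check that observation is to measure how far apart the two elements actually are in reviewed documents. The Python sketch below does this for two hypothetical texts; `min_distance` and `hits_within` are illustrative helpers, and counting the gap in word positions is an assumption about how a given tool defines `w/N`.

```python
import re

def min_distance(text, prefix, word):
    """Smallest word gap between a token starting with `prefix` and `word`, or None."""
    toks = re.findall(r"[a-z0-9']+", text.lower())
    left = [i for i, t in enumerate(toks) if t.startswith(prefix)]
    right = [i for i, t in enumerate(toks) if t == word]
    gaps = [abs(i - j) for i in left for j in right]
    return min(gaps) if gaps else None

def hits_within(text, prefix, word, n):
    """Approximate `prefix* w/n word`."""
    gap = min_distance(text, prefix, word)
    return gap is not None and gap <= n

# Hypothetical reviewed documents.
relevant_text = "Customers responded well to the marketing campaign"
irrelevant_text = "Customer " + "filler " * 30 + "appears far from marketing"

print(min_distance(relevant_text, "customer", "marketing"))    # 5
print(min_distance(irrelevant_text, "customer", "marketing"))  # 34

# customer* w/50 marketing returns both; customer* w/25 marketing keeps only the first.
wide = [hits_within(t, "customer", "marketing", 50) for t in (relevant_text, irrelevant_text)]
narrow = [hits_within(t, "customer", "marketing", 25) for t in (relevant_text, irrelevant_text)]
print(wide, narrow)
```

If the distances in relevant documents cluster well below those in non-relevant ones, a tighter operator is a reasonable revision to test.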
Scrutinize Metadata
Some search syntaxes let you draw on metadata fields within keywords. When reviewing keyword hits, observe whether relevant documents tend to fall within a certain date range or tend to have a certain file extension. The related metadata fields can then be incorporated to refine keywords. For example, if a keyword seems to work only on documents with the .DOC file extension, add a metadata element to the keyword to limit its hits to documents with this file extension. If a keyword seems to work in most documents, but also hits a number of irrelevant .XLS spreadsheets, add an exception to the keyword to “NOT out” .XLS file extensions from the keyword hits.
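The following Python sketch shows the idea of combining a keyword with metadata conditions. The `Doc` record, its field names, and the `search` parameters (`only_ext`, `not_ext`, `date_range`) are hypothetical; real review platforms expose metadata through their own field names and syntax.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Doc:
    doc_id: int
    text: str
    extension: str
    created: date

def search(docs, keyword, only_ext=None, not_ext=None, date_range=None):
    """IDs of documents containing `keyword` that also pass the metadata filters."""
    result = set()
    for d in docs:
        if keyword not in d.text.lower():
            continue
        ext = d.extension.lower()
        if only_ext and ext != only_ext.lower():
            continue  # limit hits to one file extension
        if not_ext and ext == not_ext.lower():
            continue  # "NOT out" a file extension
        if date_range and not (date_range[0] <= d.created <= date_range[1]):
            continue  # limit hits to a date range
        result.add(d.doc_id)
    return result

# Hypothetical document population.
docs = [
    Doc(1, "Quarterly pricing memo", ".doc", date(2013, 3, 1)),
    Doc(2, "Pricing table export", ".xls", date(2013, 4, 1)),
    Doc(3, "Pricing memo draft", ".doc", date(2011, 1, 15)),
    Doc(4, "Holiday schedule", ".doc", date(2013, 5, 1)),
]

all_hits = search(docs, "pricing")
doc_only = search(docs, "pricing", only_ext=".DOC")
no_xls = search(docs, "pricing", not_ext=".XLS")
in_2013 = search(docs, "pricing", date_range=(date(2013, 1, 1), date(2013, 12, 31)))

print(sorted(all_hits), sorted(doc_only), sorted(no_xls), sorted(in_2013))
```

Limiting to one extension and excluding another give the same result in this small example, but they behave differently once other file types appear: the first drops everything that is not .DOC, while the second drops only .XLS.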
This article was originally posted on H5 Blog: True North
© Copyright H5 2014. All rights reserved.