Crafting good keywords in EnCase and using conditions to refine results
I was contacted today by an examiner asking about some search hits that contained a very common keyword and how to quickly and easily refine them down to a relevant subset. This is pretty basic EnCase training stuff, but I figured I would put together a quick post that explains some things to think about when crafting keywords, and then the basic mechanics of creating a filter that can help you narrow down your hits. For this article, I am only going to discuss conducting a 'live' keyword search across an image, not using the index feature.
Filters and conditions are probably among the most underutilized features in EnCase, and they can certainly help in situations like this, where you have already run a search and only later realized the keyword was not the best choice. The best way to avoid this is to think ahead and use a good keyword that eliminates a lot of the noise for you and will hopefully leave you with just relevant hits.
Let's first talk about some ideas for crafting your keyword(s) that will save you time later. For the purposes of this article, let's assume I want to search for the ANSI keyword "soft". As you can imagine, this keyword is likely to return thousands and thousands of search hits that are not relevant and create a lot of noise that I would have to filter through to get to the good stuff.
Admittedly, most people could predict that a keyword of 'soft' would result in a lot of hits (software, Microsoft, etc.). But sometimes it's not apparent until after you have started the search and realize that your unique keyword is not so unique after all. This is where we will leverage a condition to help, but let's get back to choosing a good keyword.
In the past, I have seen a lot of examiners create a keyword such as " soft ". Yes, that's a space in front of and a space after the keyword itself. While this may find some of the relevant hits, it will most definitely miss things you would want to see. I do not recommend you ever use a keyword like this except in very rare cases. While this keyword would certainly exclude hits such as "Microsoft" and "software", you are also going to miss "soft." (followed by punctuation) or "soft" at the beginning of a line.
Using a test image, a search of " soft " (space before and after the keyword) results in 167 search hits being found.
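To make the failure mode concrete, here is a minimal Python sketch (not EnCase code) that runs the space-padded keyword against a small buffer. The sample text is hypothetical and has nothing to do with the test image above; it only shows which occurrences a literal " soft " search can and cannot see.

```python
# Hypothetical sample data, not taken from the test image in this article.
sample = b"soft drinks\nIt felt soft. Microsoft software\nvery soft and warm\n"

def count_literal(data: bytes, keyword: bytes) -> int:
    """Count non-overlapping occurrences of a literal keyword, like a plain (non-GREP) search."""
    return data.count(keyword)

padded_hits = count_literal(sample, b" soft ")
plain_hits = count_literal(sample, b"soft")

print(padded_hits)  # 1: only "very soft and warm" has a space on both sides
print(plain_hits)   # 5: includes "Microsoft" and "software"
```

The padded keyword drops "Microsoft" and "software" as intended, but it also drops "soft" at the start of the buffer and "soft." before the period.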
Another approach is to use the "Whole Word" option in EnCase:
Using the same test image, a search using this option resulted in 6,036 search hits.
Using the same test image, a search of 'soft' (regardless of whether it appears inside other words) resulted in 2,894,317 search hits:
Using a GREP statement of '[^a-z]soft[^a-z]' on the same test image resulted in 6,272 search hits:
So what does the GREP search include that the "Whole Word" option does not? The GREP expression catches every occurrence of 'soft' that is not immediately preceded or followed by an alphabetic character (ASCII dec 65-90 or dec 97-122). Every other character is acceptable (ASCII dec 0-64, dec 91-96 and dec 123-255).
The "Whole Word" option does not treat every non-alpha character as a word boundary: it accepts the common punctuation characters but not high-ASCII characters. You can see an example here of what the GREP term found but the "Whole Word" option missed:
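The GREP term translates almost directly into a Python regular expression over raw bytes, which makes it easy to see which neighbouring bytes qualify. This is a sketch, not EnCase's GREP engine. It assumes the search is not case-sensitive (the ASCII ranges above cover both cases), so both letter ranges are listed explicitly; the sample bytes are made up, with 0xE9 standing in for a high-ASCII neighbour.

```python
import re

# Python equivalent of the EnCase GREP term '[^a-z]soft[^a-z]'.
# Assumption: case-insensitive search, so both A-Z and a-z are excluded as neighbours.
# In a bytes pattern, [^A-Za-z] matches dec 0-64, 91-96 and 123-255.
GREP_SOFT = re.compile(rb"[^A-Za-z]soft[^A-Za-z]", re.IGNORECASE)

# Hypothetical sample: punctuation, a high-ASCII byte, a NUL, upper case and digits.
sample = b"Microsoft software.\nIt is soft.\n\xe9soft\x00 (SOFT) 1soft2"

hits = [m.group() for m in GREP_SOFT.finditer(sample)]
print(hits)
```

"Microsoft" and "software" are rejected because a letter touches the keyword, while the high-ASCII, NUL, bracket and digit neighbours are all accepted.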
The other thing to notice is that with the "Whole Word" option, the preceding and following characters are not part of the hit, whereas with the GREP term they are. This changes the total hit length (six bytes instead of four for 'soft') and the starting offset (one byte earlier), which matters if you are running a script that needs to know the offset where the keyword itself begins.
As a reminder, GREP searches generally take longer to run than non-GREP keywords. The exception is the "Whole Word" option: it is treated much like a GREP search, because several different characters may validly precede or follow the keyword and each one has to be checked, so it also takes longer.
So what if you have already run your keyword search, have lots of false hits, and want to narrow them down to the good ones? This is where a condition really comes into play. Remember that in EnCase v6, the filter and condition pane is specific to the display tab you are currently viewing (entries, search hits, keywords, etc.). The first thing is to switch to the search hits tab. Then select the "conditions" tab in the lower-left pane, right-click and choose "new".
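If a script consumes GREP hits but needs the offset of the keyword itself, it has to correct for the boundary bytes. A minimal Python sketch of that correction, independent of EnCase's own scripting API (the function name is hypothetical), assuming the single-byte ANSI term used throughout this article:

```python
import re

# Same pattern as the GREP term '[^a-z]soft[^a-z]' (assumed case-insensitive).
GREP_SOFT = re.compile(rb"[^A-Za-z]soft[^A-Za-z]", re.IGNORECASE)

def keyword_offsets(data: bytes) -> list[tuple[int, int, int]]:
    """Return (grep_hit_start, grep_hit_length, keyword_start) for each GREP hit.

    The GREP hit includes one boundary byte on each side, so the keyword itself
    starts one byte after the hit and the hit is two bytes longer than the keyword.
    """
    results = []
    for m in GREP_SOFT.finditer(data):
        hit_start = m.start()
        hit_len = m.end() - m.start()
        results.append((hit_start, hit_len, hit_start + 1))
    return results

print(keyword_offsets(b"It is soft."))  # [(5, 6, 6)]
```

A "Whole Word" hit on the same bytes would instead start at offset 6 with a length of 4, so a script written for one kind of hit will be off by one byte on the other.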
Name your condition, then right-click on "Main" and choose "new":
Once the "term" window appears, choose the "preview" property, select MATCHES from the operator options, and then enter in the value field all the common keywords appearing in your results that you want to exclude. In my example, I want to exclude "microsoft", "software", etc. Click OK and you are back at the initial condition window, which looks like this:
Right-click on the condition and choose "NOT":
The condition should now say "if NOT Preview matches.......":
Click "OK" to complete the condition and then double-click on it to run it:
Depending on the number of hits you are trying to filter, applying this condition could take a while.
The other downside to this approach is that a search hit may be valid and consist only of 'soft', while the word 'microsoft' appears a few words before or after it. This condition would hide that hit, because we are asking it to hide anything (if NOT preview matches "microsoft" or "software") where the words 'microsoft' or 'software' appear anywhere in the preview field (128 bytes before the hit and 128 bytes after the hit, including the hit). The only way to remove hits on common words that are not of interest, without risking excluding a valid hit because one of those words appears elsewhere in the preview field, is to use an actual filter (EnScript code) that checks each hit and excludes only those where the word containing the hit matches a list of excluded words.
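As a rough mental model (not EnCase's implementation), the condition behaves like the Python sketch below. It assumes MATCHES acts like a case-insensitive pattern search over the preview text and that the preview is the hit plus up to 128 bytes on either side; both are simplifications, and the helper names and sample data are hypothetical.

```python
import re

PREVIEW_RADIUS = 128  # bytes before and after the hit, as described for the preview field
EXCLUDE = re.compile(rb"microsoft|software", re.IGNORECASE)

def preview(data: bytes, hit_start: int, hit_end: int) -> bytes:
    """Approximate the preview field: up to 128 bytes either side of the hit, including the hit."""
    return data[max(0, hit_start - PREVIEW_RADIUS):hit_end + PREVIEW_RADIUS]

def keep_hit(data: bytes, hit_start: int, hit_end: int) -> bool:
    """Mimic 'if NOT Preview matches ...': keep the hit only if no excluded word is in its preview."""
    return EXCLUDE.search(preview(data, hit_start, hit_end)) is None

# Hypothetical data: one noisy hit, then 200 bytes of padding, then a relevant hit.
data = b"Download the software here." + b"." * 200 + b"That pillow is soft."
hits = [(m.start(), m.end()) for m in re.finditer(rb"soft", data, re.IGNORECASE)]
kept = [h for h in hits if keep_hit(data, *h)]
print(kept)  # [(242, 246)]
```

The "software" hit is hidden and the distant "soft" survives only because the padding pushes "software" outside its 128-byte window.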
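The per-hit idea can be sketched in Python (this is not EnScript, and the exclusion list and helper names are hypothetical): grow each hit outward over adjacent ASCII letters to recover the whole word it sits in, and drop the hit only when that word is on the exclusion list.

```python
import re

EXCLUDED_WORDS = {b"microsoft", b"software"}  # hypothetical exclusion list, lower case

def word_at(data: bytes, start: int, end: int) -> bytes:
    """Grow the hit outwards over adjacent ASCII letters to recover the word containing it."""
    while start > 0 and data[start - 1:start].isalpha():
        start -= 1
    while end < len(data) and data[end:end + 1].isalpha():
        end += 1
    return data[start:end]

def keep_hit(data: bytes, start: int, end: int) -> bool:
    """Keep the hit unless the word it belongs to is on the exclusion list."""
    return word_at(data, start, end).lower() not in EXCLUDED_WORDS

data = b"Microsoft makes a soft drink? No, software."
hits = [(m.start(), m.end()) for m in re.finditer(rb"soft", data, re.IGNORECASE)]
kept = [h for h in hits if keep_hit(data, *h)]
print(kept)  # [(18, 22)]
```

In this sample the preview-based condition would hide all three hits, because "Microsoft" and "software" both sit within 128 bytes of the valid one. Only listed words are dropped, so a hit inside an unlisted word such as "softer" would still be kept for review.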
Here is a link to some brief documentation that I wrote regarding writing your own filters.
The underlying message of this article is that picking your keyword(s) carefully from the beginning may be more important than you think and can save you a lot of time later, so try to think them through up front ;)