Saturday, April 18, 2009

Filter to remove duplicates for export

A reader emailed me about needing a solution to remove duplicates before exporting some files. The scenario: a keyword search was run and thousands of files were found that were responsive to the keywords. The reader tagged the files and then found that some of them were duplicates, even though they were named differently or located in different places on the evidence. To reduce the number of files that needed to be exported, he needed a way to remove the duplicate files.

EnCase comes with a standard filter named "remove duplicates by hash". This filter does exactly what he needed, except that it runs against all files; he only wanted to remove duplicates from the selected files. With one quick addition, the following filter removes duplicate files, based on hash value, from the SELECTED files only. So if you have 100 selected files, some of which share the same hash value, running this filter will leave only the unique selected files.

You can create a new filter and paste the following code:

--------cut here------------

class MainClass {
  NameListClass HashList;     // hash values already seen
  bool UserCancel;

  MainClass() :
    HashList()
  {
    if (SystemClass::CANCEL ==
        SystemClass::Message(SystemClass::ICONINFORMATION |
        SystemClass::MBOKCANCEL, "Unique Files By Hash",
        "Note:\nFiles must be hashed prior to running this filter."))
      UserCancel = true;
  }

  bool Main(EntryClass entry) {
    if (UserCancel)
      return false;

    if (entry.IsSelected()) {
      HashClass hash = entry.HashValue();
      if (!HashList.Find(hash))
        new NameListClass(HashList, hash);  // first time this hash is seen - keep the file
      else
        return false;                       // duplicate hash - filter the file out

      return true;
    }
    else
      return false;
  }
}
------------------ cut here----------------
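For readers who don't work in EnScript, the keep-the-first-occurrence logic the filter uses can be sketched in ordinary Python. This is only an illustration of the deduplication idea, not part of EnCase; the file list, `md5` hashing, and helper name are all hypothetical stand-ins for hashed evidence files:

```python
import hashlib

def unique_by_hash(files):
    """Keep only the first file seen for each content hash,
    mirroring the filter's behavior on selected files."""
    seen = set()
    unique = []
    for name, data in files:            # (name, content) pairs stand in for evidence files
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:          # first occurrence of this hash - keep it
            seen.add(digest)
            unique.append(name)
    return unique                       # duplicates are silently dropped

files = [
    ("report.doc", b"hello"),
    ("copy_of_report.doc", b"hello"),   # same content, different name
    ("budget.xls", b"world"),
]
print(unique_by_hash(files))            # → ['report.doc', 'budget.xls']
```

As in the EnScript filter, files with identical content but different names or paths collapse to a single entry, because only the hash of the content is compared.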

2 comments:

Brian Larsen Friday, 01 May, 2009  

Lance,

Here is a similar filter, with better performance, using a suggestion by Shawn McCreight on the GSI forum. It simply uses a HashClass array rather than a NameListClass array, since hash comparisons are orders of magnitude faster than string comparisons. It is also easily modified to dedupe the entire case.

typedef HashClass[] HashArray;

class MainClass {
  HashArray   HashList;   // hash values already seen
  SearchClass Search;

  MainClass() :
    HashList(),
    Search()
  {
  }

  bool Main(EntryClass entry) {
    if (entry.IsSelected()) {
      HashClass hash = Search.ComputeHash(entry);
      if (HashList.Find(hash) == -1) {  // not seen before - keep the file
        HashList.Add(hash);
        return true;
      }
      else
        Console.WriteLine("Duplicate File:\t" + entry.FullPath() + "\t" + hash);
    }
    return false;
  }
}

Thanks for the great resources available on your blog.

Brian Larsen

Anonymous Wednesday, 23 September, 2009  

Many thanks, this just cut the number of review files in half.

John


Computer Forensics, Malware Analysis & Digital Investigations
