Showing posts with label Duplicates. Show all posts

Saturday, April 18, 2009

Filter to remove duplicates for export

A reader emailed me about needing a way to remove duplicates before exporting some files. The scenario was that a keyword search had been run and thousands of files were found that were responsive to the keywords. The reader tagged the files and then found that some of them were duplicates, even though they had different names or were located in different places on the evidence. So to reduce the number of files that needed to be exported, he needed a way to remove the duplicate files.

EnCase comes with a standard filter named "remove duplicates by hash". This filter does exactly what he needed, except that it runs against all files; he only wanted to remove duplicates from the selected files. By adding one quick check, the following filter removes duplicate files, based on hash value, from the SELECTED files only. So if you have 100 selected files, some of which share the same hash value, running this filter leaves only the unique selected files.

You can create a new filter and paste the following code:

--------cut here------------

class MainClass {
  NameListClass HashList;
  bool          UserCancel;

  MainClass() :
    HashList()
  {
    if (SystemClass::CANCEL ==
        SystemClass::Message(SystemClass::ICONINFORMATION |
          SystemClass::MBOKCANCEL, "Unique Files By Hash",
          "Note:\nFiles must be hashed prior to running this filter."))
      UserCancel = true;
  }

  bool Main(EntryClass entry) {
    // Abort if the user clicked Cancel in the startup dialog.
    if (UserCancel)
      return false;

    // Only consider selected (blue-checked) entries.
    if (entry.IsSelected()) {
      HashClass hash = entry.HashValue();
      // Keep the first entry seen with a given hash; filter out repeats.
      if (!HashList.Find(hash)) {
        new NameListClass(HashList, hash);
        return true;
      }
      return false;
    }
    return false;
  }
}
------------------ cut here----------------

Friday, May 2, 2008

Find duplicate files

This post is not really "forensic" related, but it may be of use to others...

If you are like me, you have several hard drives lying around that have all sorts of "stuff" on them: things acquired over time and saved with the thought that the information, programs or files may come in handy someday. Then, after filling a drive up, I end up upgrading to a larger one, but I don't want to delete anything in case I need it! After reviewing several of these drives, I found I had lots of duplicate files scattered throughout the drives in various folders.

My solution was to write an EnScript so that I could preview all my drives at once; the EnScript hashes all the files and lists all the duplicates for me. I can then decide whether I want to manually delete the dupes or not. It also lists how much space is wasted by having all the dupes.
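The logic of that EnScript, hash every file, group files by hash, and report the duplicate groups and wasted space, can be sketched outside of EnCase as well. Here is a minimal Python illustration of the same idea (the function name and the choice of MD5 are my own; this is not the EnScript itself):

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Group files under `root` by MD5 hash and report duplicate sets.

    Returns (duplicates, wasted_bytes): `duplicates` maps a hash value
    to the list of paths sharing it (only hashes seen on 2+ files), and
    `wasted_bytes` is the space that could be reclaimed by keeping one
    copy of each duplicate set.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.md5()
            with open(path, "rb") as f:
                # Hash in chunks so large files don't exhaust memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(path)

    duplicates = {k: v for k, v in by_hash.items() if len(v) > 1}
    # Each extra copy beyond the first wastes one file's worth of space.
    wasted = sum(
        os.path.getsize(paths[0]) * (len(paths) - 1)
        for paths in duplicates.values()
    )
    return duplicates, wasted
```

Running this over a drive gives you the same decision point as the EnScript: a list of duplicate groups and a total of reclaimable space.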

Most of my drives are formatted using NTFS. The NT file system has a feature called hard links. This basically allows you to have multiple directory entries for the same file, while only taking up the space of one file, because all the directory entries can point to the same MFT record and the same data. For example, you can have a file named "lance mueller.doc" in a folder named "e:\documents\", and a file named "john smith.doc" in a folder named "e:\old stuff". There are two separate directory entries, and the names don't even have to match, but they can point to the same exact MFT record number, and therefore to the same exact data, reducing the amount of space wasted by duplicates. Opening one of the files and editing it affects both directory entries.
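You can see this behavior from a script. The sketch below (a Python illustration, using `os.link`, which creates NTFS hard links on Windows and works the same way on POSIX filesystems; the file names echo the example above) creates a second directory entry for a file and shows that both names share one set of data:

```python
import os

def demo_hardlink(directory):
    """Create a file, hard-link a second name to it, and show that
    both directory entries point to the same data."""
    original = os.path.join(directory, "lance mueller.doc")
    alias = os.path.join(directory, "john smith.doc")
    with open(original, "w") as f:
        f.write("draft 1")
    os.link(original, alias)  # second directory entry, same data

    same = os.path.samefile(original, alias)   # same underlying record?
    links = os.stat(original).st_nlink         # link count is now 2
    with open(alias, "a") as f:                # edit through the second name...
        f.write(" + edits")
    with open(original) as f:                  # ...the first name sees the edit
        text = f.read()
    return same, links, text
```

Both names report the same file, the link count is 2, and an edit made through one name is visible through the other, which is exactly why a hard link can stand in for a duplicate.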

I took the EnScript listed below one step further: the extended version locates an original file, and then, for each duplicate, deletes the dupe and creates a hard link to the original. This leaves the directory entry in place but reduces the amount of space being wasted by pointing all the dupes to the same data as the original. I am not making that version of the EnScript available yet, but I am posting the EnScript that will list all the dupes in the console pane of EnCase, and then you can decide what to do with them.
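The dedupe-by-hardlink step described above can also be sketched in a few lines of Python (again, an illustration of the idea rather than the unreleased EnScript; the function name is my own, and all paths must be on the same volume for hard links to work):

```python
import os

def link_duplicates(duplicate_paths):
    """Given paths already known to contain identical data, keep the
    first as the original, delete each remaining copy, and replace it
    with a hard link to the original. Every directory entry stays in
    place, but the data is stored only once. Returns bytes reclaimed.
    """
    original = duplicate_paths[0]
    reclaimed = 0
    for dupe in duplicate_paths[1:]:
        if os.path.samefile(original, dupe):
            continue  # already hard-linked to the original
        size = os.path.getsize(dupe)
        os.remove(dupe)          # drop the duplicate's data...
        os.link(original, dupe)  # ...and relink its name to the original
        reclaimed += size
    return reclaimed
```

Feeding this the duplicate groups from a hash-based scan would reclaim the wasted space while keeping every file name browsable where it was.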


Download Here

Computer Forensics, Malware Analysis & Digital Investigations
