Friday, May 2, 2008

Find duplicate files

This post is not really "forensic" related, but it may be useful to others...

If you are like me, you have several hard drives lying around that have all sorts of "stuff" on them: things you have acquired over time and saved with the thought that the information, programs or files may come in handy someday. Then, after I fill a drive up, I end up upgrading to a larger drive, but I don't want to delete anything in case I need it! After reviewing several of these drives, I found I had lots of duplicate files scattered throughout the drives in various folders.

My solution was to write an EnScript so that I could preview all my drives at once. The EnScript hashes all the files and lists all the duplicates for me. I can then decide whether I want to manually delete the dupes. It also reports how much space is wasted by the dupes.
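
For readers without EnCase, the same idea can be sketched in Python. This is not the EnScript itself, just a minimal illustration of the approach described above: hash every file, group files by hash, and add up the space taken by the extra copies. The function names and the choice of SHA-256 are my own assumptions for the example.

```python
import hashlib
import os
from collections import defaultdict


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Map each hash to the sorted list of paths sharing it (2+ paths only)."""
    by_hash = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symbolic links and anything that is not a regular file.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                by_hash[hash_file(path)].append(path)
    return {h: sorted(p) for h, p in by_hash.items() if len(p) > 1}


def wasted_bytes(duplicates):
    """Space used by every copy beyond the first in each duplicate group."""
    return sum(
        os.path.getsize(paths[0]) * (len(paths) - 1)
        for paths in duplicates.values()
    )
```

Calling `find_duplicates` with a list of folder or drive roots returns the duplicate groups, and `wasted_bytes` totals the space they take. Note that this simple version counts files that are already hard-linked together as duplicates, so the wasted-space figure can be overstated on a drive that has already been cleaned up that way.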

Most of my drives are formatted using NTFS. The NT file system has a feature called hard links. This basically allows you to have multiple directory entries for the same file (on the same volume), but it only takes up the space of one file. This is because all the directory entries can point to the same MFT record and the same data. For example, you can have a file named "lance mueller.doc" in a folder named "e:\documents\", and then have a file named "john smith.doc" in a folder named "e:\old stuff\". There are two separate directory entries and the names don't even have to match, but they can point to the same exact MFT record number, and therefore to the same exact data, reducing the amount of space wasted by duplicates. Opening one of the files and editing it affects both directory entries.
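
Here is a small Python sketch of that behavior, using the same file names as the example above. The `demo_hard_link` function and the folder layout are made up for illustration. `os.link` creates a hard link on NTFS as well as on most Unix file systems, as long as both names live on the same volume.

```python
import os


def demo_hard_link(base_folder):
    """Create a file, hard-link it under another name, edit via the link."""
    documents = os.path.join(base_folder, "documents")
    old_stuff = os.path.join(base_folder, "old stuff")
    os.makedirs(documents)
    os.makedirs(old_stuff)

    original = os.path.join(documents, "lance mueller.doc")
    alias = os.path.join(old_stuff, "john smith.doc")

    with open(original, "w") as handle:
        handle.write("first draft")

    # Second directory entry pointing at the same underlying file record.
    os.link(original, alias)

    # Append through the second name; the data is shared, so the first
    # name sees the change as well.
    with open(alias, "a") as handle:
        handle.write(" + edit")

    with open(original) as handle:
        content = handle.read()

    return {
        "same_file": os.path.samefile(original, alias),
        "link_count": os.stat(original).st_nlink,
        "content_via_original": content,
    }
```

Running it against an empty scratch folder should report that both names refer to the same file, with a link count of 2, and that the text appended through "john smith.doc" shows up when reading "lance mueller.doc".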

I took the EnScript below one step further: that version locates an original file, then deletes each duplicate and creates a hard link to the original in its place. This leaves the directory entry in place but reduces the wasted space by pointing all the dupes to the same data as the original. I am not making that version of the EnScript available yet, but I am posting the EnScript that lists all the dupes in the console pane of EnCase, and then you can decide what to do with them.
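
To give a rough idea of what replacing a dupe with a hard link involves, here is a hedged Python sketch. It is not the unreleased EnScript, and the function name and temporary file suffix are assumptions for the example. It also differs slightly from the delete-then-link order described above: it creates the link under a temporary name first and then swaps it over the duplicate, so the duplicate's name never disappears if something fails halfway. Both files must be on the same volume.

```python
import filecmp
import os


def replace_with_hard_link(original, duplicate):
    """Replace `duplicate` with a hard link to `original`.

    Returns False if the two names already refer to the same file,
    True after a successful replacement.
    """
    if os.path.samefile(original, duplicate):
        return False  # already hard-linked, nothing to do

    # Byte-by-byte comparison as a safety check before discarding data.
    if not filecmp.cmp(original, duplicate, shallow=False):
        raise ValueError("files differ, refusing to replace")

    temp_name = duplicate + ".hardlink-tmp"  # hypothetical scratch name
    os.link(original, temp_name)             # new entry for the original data
    os.replace(temp_name, duplicate)         # swap it over the duplicate
    return True
```

Try something like this on copies of your data first. Once the duplicate is replaced, editing either name in place changes the data seen through both, which may not be what you want for files you intend to modify separately.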


Download Here

7 comments:

Anonymous Saturday, 03 May, 2008  

Find Duplicate Files
I had the same problem a few days ago, and I found Duplicate Finder to solve it.

Duplicate Finder will look for duplicate files and list them with details, then I can choose which one to delete.

You can download the trial here: http://www.ashisoft.com

Anonymous Wednesday, 07 May, 2008  

Great work! I also have the same issues with drive space so this is fantastic.

Robert

Anonymous Monday, 12 May, 2008  

Hey You!!!

So you have a blog too? Kinda geeky...mine is wacko. Check out my blog when you get a chance...btw, my blog name is "Random Chick" but you know me as Dana... hee hee. Don't tell anyone my real name or I'll get super pissed at you!

Anonymous Monday, 16 June, 2008  

Hard links on NTFS don't maintain their own MAC timestamps, do they?

For exporting files to NTFS, remember to use the \\?\ path prefix to get around MAX_PATH limitations. Unfortunately, if you want other software to deal with the exported files, you might be SOL. A *lot* of them can't deal with long paths.

Anonymous Saturday, 11 October, 2008  

Try Directory Report
http://www.file-utilities.com/downloads/wdir.zip

It can find duplicates based on the same name, size, CRC and/or comparing byte by byte

and it can replace duplicates with hard links

Anonymous Sunday, 08 February, 2009  

I find Duplicate Finder is the best tool to deal with duplicate files. It's fast at searching for duplicates and has more options to deal with them.

http://www.duplicate-finder.net

Anonymous Sunday, 15 February, 2009  

Try this free tool
http://www.mindgems.com/products/Fast-Duplicate-File-Finder/Fast-Duplicate-File-Finder-About.htm
It is extremely fast and has an internal preview too.

Computer Forensics, Malware Analysis & Digital Investigations