Find duplicate files
This post is not really "forensic" related, but it may be of use to others...
If you are like me, you have several hard drives lying around with all sorts of "stuff" on them: things you have acquired over time and saved with the thought that the information, programs or files may come in handy someday. Then, after I fill a drive up, I end up upgrading to a larger one, but I don't want to delete anything in case I need it! After reviewing several of these drives, I found I had lots of duplicate files scattered throughout them in various folders.
My solution was to write an EnScript so that I can preview all my drives at once; the EnScript then hashes all the files and lists all the duplicates for me. I can then decide whether I want to manually delete the dupes or not. It also reports how much space is wasted by the dupes.
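For anyone who wants to try the same idea outside EnCase, here is a rough Python sketch of the approach described above. It is not the EnScript itself. The function names, the choice of SHA-256, and the trick of grouping by file size before hashing are all assumptions made for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded whole."""
    h = hashlib.sha256()  # assumption: the post does not say which hash it uses
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return {(size, digest): [paths]} for every group of 2+ identical files."""
    # Pass 1: group by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # skip symbolic links and special files
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, file_digest(path))].append(path)

    return {key: sorted(paths) for key, paths in by_hash.items() if len(paths) > 1}


def wasted_bytes(duplicates):
    """Space used by every copy beyond the first one in each group."""
    return sum(size * (len(paths) - 1) for (size, _digest), paths in duplicates.items())
```

Calling `find_duplicates` with a list of drive roots or folders and passing the result to `wasted_bytes` gives the same two pieces of information: the duplicate groups and the space they waste. One limitation of this sketch is that names which are already hard links to the same file will also be listed as duplicates, even though they waste no space.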
Most of my drives are formatted using NTFS. The NT file system has a feature called hard links. This basically allows you to have multiple directory entries for the same file while only taking up the space of one file, because all the directory entries point to the same MFT record and the same data. For example, you can have a file named "lance mueller.doc" in a folder named "e:\documents\", and then have a file named "john smith.doc" in a folder named "e:\old stuff". There are two separate directory entries and the names don't even have to match, but they can point to the exact same MFT record number, and therefore to the exact same data, reducing the amount of space wasted by duplicates. Opening one of the files and editing it affects both directory entries.
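To see this behavior for yourself, here is a small Python sketch that recreates the example above in a scratch folder. The function name `demo_hard_link` is made up for illustration. `os.link` creates the hard link, and both names must live on the same volume.

```python
import os


def demo_hard_link(base_dir):
    """Create one file with two names and show that an edit is seen by both."""
    docs = os.path.join(base_dir, "documents")
    old = os.path.join(base_dir, "old stuff")
    os.makedirs(docs)
    os.makedirs(old)

    original = os.path.join(docs, "lance mueller.doc")
    alias = os.path.join(old, "john smith.doc")

    with open(original, "w") as f:
        f.write("first draft\n")

    # Second directory entry pointing at the same file record and data.
    os.link(original, alias)

    # Append through the second name...
    with open(alias, "a") as f:
        f.write("edited via the second name\n")

    # ...and read the result back through the first name.
    with open(original) as f:
        content = f.read()

    same_file = os.path.samefile(original, alias)
    link_count = os.stat(original).st_nlink
    return same_file, link_count, content
```

Running it against an empty folder should report that both names are the same file, that the link count is 2, and that the text appended through "john smith.doc" is visible through "lance mueller.doc".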
I took the EnScript posted below one step further: that version locates an original file and then, for each duplicate, deletes the dupe and creates a hard link to the original in its place. This leaves the directory entry in place but reduces the amount of wasted space, because all the dupes now point to the same data as the original. I am not making that version of the EnScript available yet, but I am posting the EnScript that lists all the dupes in the console pane of EnCase, so you can decide what to do with them.
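Since the hard-linking version is not posted, here is a hedged Python sketch of what such a replacement step could look like. It is not the author's EnScript. The function name and the temporary file suffix are hypothetical. It also differs from the description above in one respect: rather than deleting the dupe first, it creates the link under a temporary name and then swaps it into place, so a failure partway through does not lose the file. It compares the two files byte by byte before touching anything.

```python
import filecmp
import os


def replace_with_hard_link(original, duplicate):
    """Replace `duplicate` with a hard link to `original`.

    Returns False if the two names are already the same file, True otherwise.
    Raises ValueError if the contents differ, and OSError if the link cannot
    be created (for example, when the files are on different volumes).
    """
    if os.path.samefile(original, duplicate):
        return False  # already linked, nothing to reclaim

    # Do not trust a hash match alone: compare the actual bytes.
    if not filecmp.cmp(original, duplicate, shallow=False):
        raise ValueError("files differ: %r and %r" % (original, duplicate))

    # Hypothetical temporary name placed next to the duplicate.
    temp_name = duplicate + ".hardlink-tmp"
    os.link(original, temp_name)

    # Swap the new link over the duplicate's directory entry.
    os.replace(temp_name, duplicate)
    return True
```

Keep in mind that after this step an edit made in place through either name changes the data for both, so it is only appropriate for files you want to stay identical.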
Download Here
7 comments:
Find Duplicate Files
I had the same problem a few days ago... and I found Duplicate Finder to solve it.
Duplicate Finder looks for duplicate files and lists them with details, then I can choose which ones to delete.
You can download the trial here: http://www.ashisoft.com
Great work! I also have the same issues with drive space so this is fantastic.
Robert
Hey You!!!
So you have a blog too? Kinda geeky...mine is wacko. Check out my blog when you get a chance...btw, my blog name is "Random Chick" but you know me as Dana... hee hee. Don't tell anyone my real name or I'll get super pissed at you!
Hard links on NTFS don't maintain their own MAC timestamps, do they?
For exporting files to NTFS, remember to use the \\?\ path prefix to get around MAX_PATH limitations. Unfortunately, if you want other software to deal with the exported files, you might be SOL. A *lot* of them can't deal with long paths.
Try Directory Report
http://www.file-utilities.com/downloads/wdir.zip
It can find duplicates based on the same name, size, and CRC, and/or by comparing byte by byte,
and it can replace duplicates with hard links.
I find Duplicate Finder is the best tool to deal with duplicate files. It's fast at searching for duplicates and has more options for dealing with them.
http://www.duplicate-finder.net
Try this free tool
http://www.mindgems.com/products/Fast-Duplicate-File-Finder/Fast-Duplicate-File-Finder-About.htm
It is extremely fast and has an internal preview too.