The tool
The tool itself is pretty simple. It runs on Linux, Unix and Mac OS X, and it’s designed to find duplicate files within a set of directories.
Installing
On Linux/Unix, your chosen distro’s package manager should have a copy of it; just install it as you usually would (as root or via sudo). For example, on Debian/Ubuntu, Gentoo and Arch respectively:
apt-get install fdupes
emerge fdupes
pacman -S fdupes
On Mac OS X you need MacPorts installed, and then you can just go ahead and install it.
sudo port install fdupes
Usage
Imagine the scenario: you have daily backups of a file system, compressed and archived in the same format with the same parameters. Sometimes the file system changes every day and sometimes it doesn’t change for weeks on end. The backups are taken whether or not the system has changed. You have a year of backups and you’re sure that about 70% of them are complete duplicates. You want to archive these backups, but the archive media won’t take the current size of 200GB. It will take the 30% (60GB) that you believe to be truly unique backups, but how do you remove the duplicates?
The easiest way is to run fdupes with the delete and recursive options. It will then ask you which copy to keep in each set of duplicates.
fdupes --delete --recurse /path/to/backups/
You can even run it without prompts, so that it keeps the first file in each set of duplicates and deletes all the rest automatically.
fdupes --delete --noprompt --recurse /path/to/backups/
Just remember that automatic deleting can cause data loss, and you can’t guarantee which copy will be kept and which will be deleted. Experiment first.
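One way to experiment safely is to build a throwaway directory with a known duplicate and point fdupes at that instead of your real backups. The following is only a rough sketch: the directory layout and file names (day1, day2, day3, backup.tar) are made up for illustration, and the fdupes calls are skipped if the tool isn’t installed. Without --delete, fdupes only lists the duplicate sets it finds, and --summarize prints a short summary instead of the full list.

```shell
# Build a scratch directory with one known duplicate (hypothetical names).
scratch="$(mktemp -d)"
mkdir -p "$scratch/day1" "$scratch/day2" "$scratch/day3"
printf 'unchanged backup\n' > "$scratch/day1/backup.tar"
cp "$scratch/day1/backup.tar" "$scratch/day2/backup.tar"   # exact duplicate of day1
printf 'changed backup\n' > "$scratch/day3/backup.tar"     # unique content

# Only run fdupes if it is actually installed.
if command -v fdupes >/dev/null 2>&1; then
    # List duplicate sets; nothing is deleted because --delete is not given.
    fdupes --recurse "$scratch"
    # Print a summary of the duplicates found instead of listing them.
    fdupes --recurse --summarize "$scratch"
fi
```

Once the listing looks the way you expect, you can try the --delete variants on the scratch directory and check which copies survive before going anywhere near the real backups.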
Regards,
Robert.