Finding duplicate files or the wonderful tool called fdupes

The tool

The tool itself is pretty simple. It works on Linux, Unix and Mac OS X, and it’s designed to find duplicate files within a set of directories.

Installing

With Linux/Unix, your chosen distro’s package manager should have a copy of it, so just install it as you usually would. For example, with apt-get, emerge or pacman:
apt-get install fdupes
emerge fdupes
pacman -S fdupes

With Mac OS X you need MacPorts installed, and then you can just go ahead and install it.
sudo port install fdupes

Usage

Imagine this scenario: you have daily backups of a file system, compressed and archived in the same format with the same parameters. Sometimes the file system changes every day and sometimes it doesn’t change for weeks on end, but the backups are taken whether or not anything has changed. You have a year of backups and you’re sure that about 70% of them are complete duplicates. You want to archive these backups, but the archive media won’t take the current size of 200GB. It will take the 30% (60GB) that you believe to be truly unique backups, but how do you remove the duplicates?

The easiest way is to run fdupes with the delete and recursive options.
fdupes --delete --recurse /path/to/backups/

You can even run it so that it keeps the first file in each set of duplicates and deletes all the rest automatically, without prompting.
fdupes --delete --noprompt --recurse /path/to/backups/

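Just remember that automatic deleting can cause data loss, and you can’t guarantee which copy will be deleted. Experiment first: running fdupes without the delete option only lists the duplicates it finds, so you can check the results before removing anything.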
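
If you’d like a second opinion on what counts as a duplicate before anything gets deleted, here is a minimal sketch that compares files by checksum using standard tools. It is not how fdupes works internally and it is not part of fdupes. The function name list_dupes is made up for this example, and it assumes GNU sha256sum is available and that file names don’t contain newlines.

```shell
# Hypothetical helper (not part of fdupes): print every file under a
# directory whose SHA-256 checksum matches that of at least one other file.
# Assumes GNU coreutils sha256sum; file names must not contain newlines.
list_dupes() {
    find "$1" -type f -exec sha256sum {} + |
        sort |
        awk '{
            hash = $1
            if (hash == prev_hash) {
                # Print the first member of a group once, then each repeat.
                if (!printed) { print prev_line; printed = 1 }
                print
            } else {
                printed = 0
            }
            prev_hash = hash
            prev_line = $0
        }'
}
```

Calling list_dupes /path/to/backups/ prints one line per duplicated file, with the checksum first, so files with identical content end up next to each other. Nothing is deleted, which makes it a safe way to compare against what fdupes reports.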

Regards,
Robert.
