Finding duplicate files and saving space in Unix filesystems

trimtrees.pl is a useful script for saving space on Unix filesystems. It works by examining all the files under a list of directories and replacing each duplicate file with a hard link to the first copy it finds. This has the advantage that the file still appears at both locations in the filesystem but only takes up space on the disk once. It could lead to problems later if you modify one of the files and aren’t expecting the other to change, but for saving space on my static backup files it’s ideal.
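To give a feel for the mechanism, here is a minimal Perl sketch of the same idea. It is not the real trimtrees.pl: it fingerprints files with an MD5 digest (an assumption on my part about how duplicates are detected) and it assumes all the directories sit on one filesystem, since hard links cannot cross filesystems.

#!/usr/bin/perl
# Minimal sketch of the idea only -- not the real trimtrees.pl.
# Walk the directories given on the command line, fingerprint every regular
# file with an MD5 digest (chosen here purely for brevity), and replace each
# later duplicate with a hard link to the first copy seen. Everything is
# assumed to live on one filesystem, because hard links cannot cross
# filesystems.
use strict;
use warnings;
use File::Find;
use Digest::MD5;

die "usage: $0 DIR [DIR ...]\n" unless @ARGV;

my %first_seen;    # content digest -> path of the first copy seen

find({
    no_chdir => 1,
    wanted   => sub {
        my $path = $File::Find::name;
        return if -l $path;        # skip symlinks
        return unless -f $path;    # regular files only

        open my $fh, '<', $path or do { warn "open $path: $!\n"; return };
        binmode $fh;
        my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;

        if (my $orig = $first_seen{$digest}) {
            # Duplicate content: drop this copy, then hard link it to the original.
            if (unlink $path) {
                link $orig, $path
                    or warn "could not link $path to $orig: $!\n";
            } else {
                warn "could not unlink $path: $!\n";
            }
        } else {
            $first_seen{$digest} = $path;
        }
    },
}, @ARGV);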

Below is an example of saving space. One of my backup drives, which contains multiple snapshots of my work, became 100% full. trimtrees.pl needs a list of directories to trawl through, so here I used * to provide the list. (I could also have listed the directories explicitly, e.g. “BACKUP-JAN15 BACKUP-MAR15” etc.)


prompt> perl trimtrees.pl *
tlds[6]cur[35]uniq[789_669]fils[3_613_975]spcused[528_432_303_246]saved[653_207_312_012]
DONE

I’m really happy this freed up a whole lot of space.


prompt> df -h
Filesystem Size Used Avail Use%
/dev/sdb1 1.4T 494G 812G 38%
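
As a quick sanity check that two paths really do share a single copy on disk, you can compare their inode numbers (the first column of ls -li) and link counts; the file names below are just placeholders, not actual paths from my backups.

prompt> ls -li BACKUP-JAN15/report.doc BACKUP-MAR15/report.doc

If the deduplication worked, both paths show the same inode number and a link count greater than one.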

Next question
What I would like to know next is how compatible this hard linking is with rsync. I suspect it isn’t very compatible, since linking everything to the first copy presumably leaves each duplicate path showing the timestamp of the oldest copy of that unique file (I haven’t checked this).
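
On the timestamp point: hard-linked paths all share one inode, so they also share one modification time, which means rsync’s default size-and-mtime check may indeed see some paths as changed after deduplication. rsync does have a --hard-links (-H) option that preserves hard links between files that are both in the transfer; without it, each path is copied as an independent file and the space saving is lost on the destination. A follow-up experiment might look something like the line below (the destination path is just a placeholder).

prompt> rsync -aH BACKUP-* /mnt/otherdisk/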
