Finding duplicate files and saving space in Unix filesystems

trimtrees.pl is a useful script for saving space on Unix filesystems. It scans all the files in a list of directories and replaces each duplicate with a hard link to the first copy of that file. Both paths still appear in the filesystem, but the data occupies space on the disk only once. This could cause problems later: because the two names now point to the same file, modifying one also changes the other, which you may not expect. For saving space on my static backup files, though, it's ideal.
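trimtrees.pl itself is a Perl script and its source isn't reproduced here. As a rough illustration of the idea only, the Python sketch below walks each directory, treats files as duplicates when they share a device, size and SHA-256 digest, and replaces every later copy with a hard link to the first one. The function name `trim_trees`, the `.trimtmp` temporary suffix and the returned counters are hypothetical choices of this sketch, not trimtrees.pl's. It also ignores ownership and permissions, so a surviving file keeps the metadata of its first copy.

```python
import hashlib
import os
import stat
import sys


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def trim_trees(dirs):
    """Hard-link duplicate regular files under `dirs` to the first copy seen.

    Hypothetical sketch, not trimtrees.pl. Returns
    (unique_files, files_seen, bytes_saved).
    """
    first_copy = {}  # (device, size, digest) -> path of the first copy seen
    files_seen = 0
    bytes_saved = 0
    for top in dirs:
        for root, subdirs, names in os.walk(top):
            subdirs.sort()  # walk in a stable order so "first copy" is predictable
            for name in sorted(names):
                path = os.path.join(root, name)
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks, sockets, devices, etc.
                files_seen += 1
                # Including st_dev keeps links within one filesystem
                # (os.link cannot cross devices).
                key = (st.st_dev, st.st_size, file_digest(path))
                original = first_copy.setdefault(key, path)
                if original == path:
                    continue  # this is the first copy
                if os.stat(original).st_ino == st.st_ino:
                    continue  # already hard-linked to the first copy
                # Link under a temporary name, then atomically rename it over
                # the duplicate so the path never disappears.
                tmp = path + ".trimtmp"
                os.link(original, tmp)
                os.replace(tmp, path)
                if st.st_nlink == 1:
                    bytes_saved += st.st_size  # the duplicate's data is now freed
    return len(first_copy), files_seen, bytes_saved


if __name__ == "__main__":
    # Usage: python trim_trees_sketch.py DIR [DIR ...]
    print(trim_trees(sys.argv[1:]))
```

Hashing every file keeps the sketch simple but is slow on large backups; a faster variant would only hash files whose sizes collide. Since it replaces files in place, try it on a copy of your data first.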

Below is an example of saving space on one of my backup drives, which holds multiple snapshots of my work and had become 100% full. trimtrees.pl needs a list of directories to trawl through, so here I used the shell glob * to provide that list. (I could also have listed the directories explicitly, e.g. “BACKUP-JAN15 BACKUP-MAR15” etc.)


prompt> perl trimtrees.pl *
tlds[6]cur[35]uniq[789_669]fils[3_613_975]spcused[528_432_303_246]saved[653_207_312_012]
DONE

I’m really happy: according to the saved counter, this freed up roughly 653 GB.


prompt> df -h
Filesystem Size Used Avail Use%
/dev/sdb1 1.4T 494G 812G 38%

Next question
What I would like to know next is how compatible this hard linking is with rsync. I suspect it’s not very compatible, since it probably changes the timestamp to the date of the oldest copy of each unique file (I didn’t check this).
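For what it's worth, all hard links to a file share a single inode, and the modification time is stored in the inode, so the linked copies can only ever have one timestamp between them; which one it is depends on the copy trimtrees.pl kept. The short Python sketch below sets the time through one name and reads it back through the other; the `hard_link_demo` helper and the file names are made up for illustration. On the rsync side, rsync only recreates hard links at the destination when run with -H (--hard-links); without it, each name is transferred as a separate file.

```python
import os
import tempfile


def shares_inode(path_a, path_b):
    """True if both paths are hard links to the same inode."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def hard_link_demo():
    """Hypothetical demo: hard-linked names share one mtime and one link count."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "BACKUP-JAN15.dat")
        second = os.path.join(d, "BACKUP-MAR15.dat")
        with open(first, "wb") as f:
            f.write(b"snapshot data")
        os.link(first, second)  # second is now another name for the same inode

        # Set the modification time through the first name only...
        os.utime(first, (1_000_000_000, 1_000_000_000))
        # ...and read it back through the second name: mtime lives in the inode.
        return shares_inode(first, second), os.stat(second).st_mtime, os.stat(first).st_nlink


if __name__ == "__main__":
    print(hard_link_demo())  # (True, 1000000000.0, 2)
```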


Fabrication of Magnesium diboride superconductor

Bhadesha123 has posted a nice video on YouTube discussing the fabrication of magnesium diboride superconductors.

This looks like an interesting metallurgical case study in developing a processing route. The video discusses the advantages of swaging and drawing, as well as the importance of preventing grain growth, both to allow processing and to control the material’s properties.

The video was provided by Professor Bartek Glowacki of the University of Cambridge, who filmed, directed and edited the videos.