dupmerge overview
=================

Dupmerge reads a list of files from standard input (e.g., as produced by
"find . -print") and looks for identical files. When it finds two or more
identical files, all but one are unlinked to reclaim the disk space and
recreated as hard links to the remaining copy.
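
The merge step described above can be sketched as follows. This is a minimal
Python illustration under stated assumptions, not dupmerge's actual code: the
name merge_duplicates and the dry_run flag are hypothetical, and all paths are
assumed to lie on one filesystem that supports hard links.

```python
import filecmp
import os
from collections import defaultdict


def merge_duplicates(paths, dry_run=True):
    """Replace byte-identical regular files with hard links to one copy.

    Returns a list of (kept, replaced) pairs. With dry_run=True nothing
    on disk is changed. (Hypothetical helper, for illustration only.)
    """
    by_size = defaultdict(list)
    for path in paths:
        # Only regular files are considered; symlinks are skipped.
        if os.path.isfile(path) and not os.path.islink(path):
            by_size[os.path.getsize(path)].append(path)

    merged = []
    for candidates in by_size.values():
        kept = []  # one representative per distinct content
        for path in candidates:
            for original in kept:
                if os.path.samefile(original, path):
                    break  # already the same inode, nothing to reclaim
                if filecmp.cmp(original, path, shallow=False):
                    if not dry_run:
                        os.unlink(path)          # drop the duplicate data
                        os.link(original, path)  # recreate it as a hard link
                    merged.append((original, path))
                    break
            else:
                kept.append(path)
    return merged
```

Files are grouped by size first, so the byte-by-byte comparison only runs on
files that could be identical at all.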

Remarks:
dupmerge should be used only for backups or archives, where duplicate
files are not needed; it should never be used without nodo mode on /home,
/tmp, /var and most other directories.
The normal mode, hard linking of identical files, causes no problems in
backups or archives and can also be used on CDs/DVDs. On filesystems without
hard links, e.g. FAT (FAT12, FAT16, FAT32, VFAT ...), it cannot work.
The sparse mode never causes problems (on filesystems which support sparse
files).
The deletion mode can cause trouble, e.g. with ebooks or HTML documents that
use the same picture more than once. Therefore the deletion mode should only
be used with files which are not associated with each other, e.g. audio or
video files. The deletion mode works on all (writable) filesystems.
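
Whether the normal mode can work on a given filesystem can be probed before
running it. The following Python sketch is only an illustration; the helper
name supports_hard_links is hypothetical and not part of dupmerge. It tries
to create a hard link inside the target directory and cleans up afterwards.

```python
import os
import tempfile


def supports_hard_links(directory):
    """Return True if a hard link can be created inside `directory`."""
    fd, source = tempfile.mkstemp(dir=directory)
    os.close(fd)
    target = source + ".link"
    try:
        os.link(source, target)
    except OSError:
        # Expected on filesystems without hard links, such as FAT.
        return False
    else:
        os.remove(target)
        return True
    finally:
        # The probe file is removed in every case.
        os.remove(source)
```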

Normal mode: Saves approx. 20 % space.

Sparse mode: Saves approx. 0.2 % space.

Deletion mode: Deletes approx. 10 % of the files.

All similar programs look problematic: highlnk and FSlint use md5sum and
are therefore vulnerable to MD5 collisions. With hashing they are O(n),
but not safe.
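
A digest can still be used safely if a match is treated only as a hint and
confirmed byte by byte. The Python sketch below illustrates this idea; it is
not taken from any of the programs named above, the function names are
hypothetical, and SHA-256 is used here in place of MD5 (any digest works the
same way for this purpose).

```python
import filecmp
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(first, second):
    """Treat a digest match only as a hint; confirm it byte by byte."""
    if file_digest(first) != file_digest(second):
        return False  # different digests prove the contents differ
    # Equal digests may be a collision, so compare the actual bytes.
    return filecmp.cmp(first, second, shallow=False)
```

The final comparison costs one extra read of each candidate pair, but a
collision can then no longer cause two different files to be merged.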

FSlint
======
FSlint also has another bad feature: "Non stripped binaries". Generally it's
a good idea to strip. I also do it with my programs. But when the binaries
in /bin, /usr/bin etc. are stripped, the system gets corrupted, so that e.g.
no network is available and many other things do not work. The reason seems
to be that strip is not bug-free.
The same holds for prelinking (which is not in FSlint).
A feature of FSlint that is not useful is "redundant whitespace", because
whether whitespace is redundant depends on the circumstances.
The feature "Empty directories" only makes sense in archives, e.g. for FTP
servers.
Bug: Bad filenames containing newline(s) are not shown when the path leading
to the file is itself a bad name. Example:
ftp.lugcamp.de/chaas/1/b%FCcher/b%FCcher/books/Unix2/upt/ch09_14??tm
Here the two ? characters stand for newlines.
=> Use Detox or a similar program.
Another bug: Directories that are in use are reported as empty.
Example:
ftp.lugcamp.de/chaas/1/b%FCcher/b%FCcher/books/java/awt/examples/chap15/
=> Use e.g.
find ./ -type d -empty
to find empty directories
and, if you are sure you don't need them, use
find ./ -type d -empty -print0 | xargs -0 rmdir
to delete them.
And there is another problem: When several thousand files with bad names are
found, it does not make sense to rename them manually.
The only realistic approach seems to be to list them in a nodo mode and to
modify them automatically in another mode.
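
Such a two-mode approach could look like the following Python sketch. It is
an illustration only and belongs neither to FSlint nor to Detox: the function
names are hypothetical, and a "bad name" is assumed here to mean a name that
contains control characters such as newlines.

```python
import os


def is_bad_name(name):
    """Assumed definition: the name contains a control character."""
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in name)


def clean_name(name):
    """Replace every control character with an underscore."""
    return "".join("_" if ord(ch) < 32 or ord(ch) == 127 else ch
                   for ch in name)


def fix_bad_names(root, nodo=True):
    """Return (old, new) renames under root; perform them unless nodo."""
    renames = []
    # Walk bottom-up so entries are handled before their parent directory
    # is renamed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if not is_bad_name(name):
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, clean_name(name))
            if os.path.lexists(new):
                continue  # never overwrite an existing entry
            renames.append((old, new))
            if not nodo:
                os.rename(old, new)
    return renames
```

Calling fix_bad_names(root) only lists the planned renames, so the list can
be reviewed first; fix_bad_names(root, nodo=False) then applies them.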

RF, 2005-4-24
