Tool to automatically update symlinks when moving files

Tue Jul 7 15:27:49 EDT 2009

[on-list reply to messages sent off-list; with author's permission]

On Tue, Jul 7, 2009 at 7:29 AM, <VirginSnow at vfemail.net> wrote:
>> Using absolute symlinks may be appropriate, depending on what you're
>> trying to do.
>
> Do you mean "absolute symbolic links" or "hard links"?

  The former.  Symlinks (symbolic links) are very different beasts
from hard links.

  Here's the full story.  Some of this is likely to be review; if you
just want the punch line, skip to the next quoted message portion.

INODES AND HARD LINKS

  In the standard Unix filesystem model, the fundamental building
block is the inode.  Each inode has a number, unique within that
filesystem.  File data (the "contents" of the file) is associated with
an inode, as are permissions, datestamps, and other metadata.

  (Traditionally, inodes were fixed-size records in a fixed-size
table, allocated when the filesystem was made.  Some newer filesystems
allocate inodes dynamically.  Some filesystems don't really have
inodes at all; to use them in a Unix-like system, the filesystem
driver has to synthesize inode numbers.)

  What we see as "file names" are just entries in a directory which
reference the inode.  Each such reference is a "hard link".  Most of
the time, a "data file" has just the one hard link.  But you can have
multiple directory entries linked to a single inode.  Those are
typically created with the ln(1) command (without the "-s" switch).
Since hard links are to inodes, they cannot cross filesystem
boundaries.
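
  To make that concrete, here is a minimal Python sketch, assuming a
Unix-like system and a filesystem that supports hard links.  The
helper name hard_link_demo and the file names are made up for
illustration.  It creates a second directory entry for an existing
file and checks that both names lead to the same inode and the same
data.

```python
import os
import tempfile

def hard_link_demo():
    """Return (same_inode, text_read_via_second_name) for two hard links."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first")      # hypothetical file names
        second = os.path.join(d, "second")
        with open(first, "w") as f:
            f.write("original\n")
        os.link(first, second)                # like: ln first second
        # Append through one name; the data belongs to the inode,
        # so it is visible through the other name as well.
        with open(first, "a") as f:
            f.write("appended\n")
        a, b = os.stat(first), os.stat(second)
        same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
        with open(second) as f:
            return same_inode, f.read()

if __name__ == "__main__":
    print(hard_link_demo())
```

  os.link() raises OSError if the two names would land on different
filesystems, which is the "cannot cross filesystem boundaries" rule
in action.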

  Each inode has a reference count, which is the number of hard links
to the inode.  When a hard link is created, the count is incremented;
when a hard link is removed, the count is decremented.  When that
count drops to zero, the filesystem driver deallocates the storage
from the inode and marks it as free.  This is why the system call to
"delete a file" is named unlink(2).
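
  The link count can be watched from user space.  The following is a
rough Python sketch (the function name link_counts is hypothetical)
that records st_nlink before and after adding and removing a hard
link, and confirms the data survives as long as one link remains.

```python
import os
import tempfile

def link_counts():
    """Return ([nlink at each step], data read after the first name is gone)."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        with open(a, "w") as f:
            f.write("data\n")
        counts = [os.stat(a).st_nlink]        # one directory entry so far
        os.link(a, b)
        counts.append(os.stat(a).st_nlink)    # count after adding a second link
        os.unlink(a)                          # remove one link, not "the file"
        counts.append(os.stat(b).st_nlink)    # count as seen via the survivor
        with open(b) as f:
            survived = f.read()
        return counts, survived

if __name__ == "__main__":
    print(link_counts())
```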

  Traditionally, directories are themselves stored "in" inodes. When
the names of other directories appear in a directory, that's just a
hard link to the directory in question.  That includes "." and "..".
The "." entry in a directory is just a hard link to itself.  The ".."
entry is a link to the parent directory for each directory, except the
root directory, where ".." is another link to itself.  This assumption
is built into Unix; as you've discovered, many Unix tools depend on
this to navigate the filesystem properly.
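
  A quick way to check the "." and ".." claims is to compare device
and inode numbers.  This is only a sketch in Python; same_inode is a
helper invented here (the standard library's os.path.samefile does
much the same comparison).

```python
import os
import tempfile

def same_inode(p, q):
    """True if both paths resolve to the same inode on the same device."""
    a, b = os.stat(p), os.stat(q)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        child = os.path.join(parent, "child")
        os.mkdir(child)
        # "." names the directory itself
        print(same_inode(child, os.path.join(child, ".")))
        # ".." names the parent
        print(same_inode(parent, os.path.join(child, "..")))
    # At the root, ".." is just another name for the root
    print(same_inode("/", "/.."))
```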

  Hard links target inodes, and all hard links are created equal.  As
far as the filesystem is concerned, there is no "original file" -- one
hard link to the same inode is as good as another.  Thus, renaming
"other files" which also happen to link to the same inode doesn't
matter -- all you're really doing is changing the text in a directory
entry somewhere.

SYMLINKS

  Symlinks are totally different.  A symlink is a small filesystem
object of its own, with its own inode, whose contents are a path name.
(Many filesystems store a short target directly in the symlink's
inode; a long target takes a data block.)  Symbolic links reference
the name of another filesystem entity -- another directory entry.
Since they store a name, not an inode number, they can cross
filesystem boundaries.  Unlike hard links, symlinks are "second class
citizens" in the filesystem.  Each symlink has a clear "source" and
"target".  That target is the "real" "file"; the symlink is just a
reference.  If you move or rename the target, the symlink will now
point at something that doesn't exist.
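
  The dangling-link behavior is easy to reproduce.  Here is a small
Python sketch, with a made-up function name (dangling_demo) and
made-up file names, that renames a symlink's target and then looks at
what the symlink still says.

```python
import os
import tempfile

def dangling_demo():
    """Return (resolves_before, resolves_after, still_a_symlink, text_unchanged)."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target")
        link = os.path.join(d, "link")
        with open(target, "w") as f:
            f.write("x")
        os.symlink(target, link)              # like: ln -s target link
        before = os.path.exists(link)         # exists() follows the symlink
        os.rename(target, os.path.join(d, "renamed"))
        after = os.path.exists(link)          # nothing at the stored name now
        still_a_link = os.path.islink(link)   # the symlink itself is untouched
        unchanged = os.readlink(link) == target
        return before, after, still_a_link, unchanged

if __name__ == "__main__":
    print(dangling_demo())
```

  Nothing updates the symlink when the target moves; it keeps the old
name as plain text.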

  Symlinks can be relative or absolute.  Absolute symlinks specify the
target by including the full path, all the way from the root.  You can
identify them because they start with a leading slash (/).  Relative
symlinks are relative to the directory that contains them, and do not
have a leading slash.  They can reference "upwards" in the directory
tree, though, by using "..".

  For example, suppose in directory "/bin", we have two symlinks, as
below.  "foo" is absolute; "bar" is relative:

	foo -> /etc/passwd
	bar -> ../etc/passwd

  Relative symlinks are useful because if a host's filesystem is
mounted beneath another filesystem, they keep pointing to the same
files.  In the above example, "bar" will always point to that host's
password file, even if that host's root is mounted on another system.
Conversely, absolute symlinks are useful because they keep pointing to
the same files, even if the symlinks themselves are moved.  If I move
"foo" to "/usr/local/bin/", it will still point to the same system
password file, while "bar" would point to a presumably-non-existent
"/usr/local/etc/passwd".  An absolute symlink can also be used
deliberately on a shared filesystem to have a file which always
references the host examining it, since the path is resolved from that
host's root.
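
  The "foo" and "bar" case can be tried safely in a scratch directory.
The Python sketch below is an approximation: a temporary directory
stands in for "/", so the "absolute" link points at the full path of
the scratch copy, not the real /etc/passwd.  The function name
moved_links is hypothetical.

```python
import os
import tempfile

def moved_links():
    """Return ((foo_ok, bar_ok) before the move, (foo_ok, bar_ok) after)."""
    with tempfile.TemporaryDirectory() as root:   # stand-in for "/"
        etc = os.path.join(root, "etc")
        old_bin = os.path.join(root, "bin")
        new_bin = os.path.join(root, "usr", "local", "bin")
        for path in (etc, old_bin, new_bin):
            os.makedirs(path)
        passwd = os.path.join(etc, "passwd")
        with open(passwd, "w") as f:
            f.write("placeholder\n")
        # foo is absolute, bar is relative to the directory holding it
        os.symlink(passwd, os.path.join(old_bin, "foo"))
        os.symlink(os.path.join("..", "etc", "passwd"),
                   os.path.join(old_bin, "bar"))

        def resolves(directory):
            return tuple(os.path.exists(os.path.join(directory, n))
                         for n in ("foo", "bar"))

        before = resolves(old_bin)
        for name in ("foo", "bar"):
            # rename() moves the symlink itself, not its target
            os.rename(os.path.join(old_bin, name),
                      os.path.join(new_bin, name))
        return before, resolves(new_bin)

if __name__ == "__main__":
    print(moved_links())
```

  After the move, "bar" is looked up relative to usr/local/bin, so it
lands on usr/local/etc/passwd, which was never created.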

On Tue, Jul 7, 2009 at 1:34 PM, <VirginSnow at vfemail.net> wrote:
> I think I know where you were going... fixing-up absolute symlinks
> would be easier than fixing-up relative ones, right?

  If you're trying to maintain various partial forks of a directory
branch, or something like that, then absolute symlinks should be
easier to maintain.  For example, if you've got:

	/master/bin/baz

and "baz" is always going to be there, you could create endless
derivative forks:

	/fork1/bin/ding -> /master/bin/baz
	/fork2/blah/bin/dong -> /master/bin/baz

You can move, copy, and rename ding and dong all you want, as long as
the master stays untouched.

  Of course, maybe you're doing something else, in which case, this
isn't likely to help you.

  Maybe if we knew what you were doing, it would help.  There might be
better ways to do what you're doing in particular.

  But in general, yes, what you're asking about is problematic, and a
utility to move something while updating symlinks would be useful.
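
  For what it's worth, a very rough Python sketch of such a utility
follows.  This is not an existing tool; every name in it (_canonical,
find_links_to, move_and_relink, search_root) is invented for
illustration.  It assumes the thing being moved is not itself a
symlink, that source and destination are on the same filesystem, and
that all the interesting symlinks live under one directory tree you
are willing to scan.  It only catches symlinks that name the moved
item directly, not ones that pass through it to reach something
underneath.

```python
import os

def _canonical(path):
    """Resolve symlinks in the parent directory, but not the final name."""
    parent, name = os.path.split(path)
    return os.path.join(os.path.realpath(parent or os.curdir), name)

def find_links_to(src, search_root):
    """List (link_path, target_was_absolute) for symlinks naming src."""
    wanted = _canonical(src)
    found = []
    # os.walk does not descend into symlinked directories by default;
    # such symlinks show up in dirnames, so check both lists.
    for dirpath, dirnames, filenames in os.walk(search_root):
        for name in dirnames + filenames:
            link = os.path.join(dirpath, name)
            if os.path.islink(link):
                target = os.readlink(link)
                # join() discards dirpath when target is absolute
                if _canonical(os.path.join(dirpath, target)) == wanted:
                    found.append((link, os.path.isabs(target)))
    return found

def move_and_relink(src, dst, search_root):
    """Rename src to dst, then rewrite the symlinks that pointed at src."""
    links = find_links_to(src, search_root)   # collect before moving
    os.rename(src, dst)                        # same-filesystem move only
    new_abs = _canonical(dst)
    for link, was_absolute in links:
        if was_absolute:
            new_target = new_abs
        else:
            link_dir = os.path.realpath(os.path.dirname(link))
            new_target = os.path.relpath(new_abs, link_dir)
        # Not atomic: the link briefly does not exist between these calls.
        os.unlink(link)
        os.symlink(new_target, link)
    return [link for link, _ in links]
```

  Usage would be something like move_and_relink("data/file",
"archive/file", "."), with the paths being whatever applies to your
tree.  The hard part in practice is the scan: symlinks can live
anywhere, including on other filesystems, so there is no cheap way to
find all of them.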

-- Ben