3

Study case:

An automatic backup system for all family members, over OpenVPN.

A lot of files (especially photos) are identical between family members.

So, with a script I replace identical files with hard links.
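
For example, the hardlink tool from util-linux can do this kind of replacement (the backup path below is just a placeholder):

# replace duplicate files under the backup tree with hard links
hardlink -v /srv/backups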

Then a problem arises: if a user changes their file, the file changes for all users. Deleting a file is not a problem, nor is renaming it; only changing the file contents is.

So what I want is: when a user changes a file that is a hard link, the hard link is broken and that user gets their own copy of the original file with the changes applied.

Is this possible with any filesystem, or with some hack or feature?

Chameleon
  • Also looking for something similar. Ideally a filesystem overlay that prevents changing inodes that have more than one link. – x'ES Feb 19 '22 at 01:12

3 Answers

7

You're looking for the reflink feature, which was introduced in 2009. It only works with certain filesystems – currently Btrfs, XFS, and the upcoming Bcachefs. (ZFS is still working on it.)

Use --reflink=auto to create a CoW copy when possible (this is already the default as of coreutils 9.0), or --reflink, short for --reflink=always, if you want to make sure it'll never fall back to doing a full copy:

cp --reflink OLDFILE NEWFILE

The new file will have a different inode, but will initially share all data extents with the original (which can be compared using filefrag -v FILE or xfs_io -rc "fiemap -v" FILE).
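
For example, a quick way to verify that a reflinked copy shares its extents with the original (file names are placeholders):

cp --reflink=always photo.jpg photo-copy.jpg
filefrag -v photo.jpg photo-copy.jpg    # matching physical_offset values mean the data extents are shared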


An alternative is filesystem deduplication, which is supported by Btrfs and ZFS among others, and allows merging identical blocks underneath existing files. In ZFS this happens synchronously ("online", i.e. as soon as the file is written), while in Btrfs it's done as a batch job (i.e. "offline", using tools such as Bees or duperemove). Unfortunately, online deduplication in ZFS has a significant impact on resource usage. If you use Btrfs, however, you can just run duperemove -rd against the folders once in a while.
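
For example, a batch deduplication pass over the per-user backup trees might look like this (the paths are placeholders):

# hash files recursively and ask the kernel to deduplicate identical extents
duperemove -rd /srv/backups/alice /srv/backups/bob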

Finally, whether you use reflinks or dedupe, you'll also want to use backup tools that themselves perform deduplication (it is not enough to use a hardlink-aware backup tool, as reflinks don't look like hardlinks). For example, the archive formats used by Restic and Borg are content-addressed (much like Git), so identical blocks will automatically be stored only once per repository, even if they occur in separate files.
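
For example, with restic every family member can back up into the same repository, and blocks that are identical across their files are stored only once (the repository and source paths are placeholders):

restic -r /srv/restic-repo init    # creates the repository; you'll be prompted to set a password
restic -r /srv/restic-repo backup /home/alice/photos
restic -r /srv/restic-repo backup /home/bob/photos    # blocks already present in the repository are not stored again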


The OCFS2 cluster filesystem on Linux also has "reflinks" at least in name, but doesn't support the standard reflink creation API, so they have to be created using an OCFS2-specific tool.

On Windows, ReFS supports reflinks under the name "block cloning" (though it doesn't seem to come with a built-in CLI tool); NTFS does not. Finally on macOS, cp -c will create reflinks (CoW copies) as long as you're using APFS.

u1686_grawity
1

Another possibility is to set up a shared directory with the sticky bit set.

On a Linux system, the /tmp directory has the permissions drwxrwxrwt (1777 in numeric terms), which ensures that anybody can write anything in there; but once they do, those files belong to them and can't be modified or deleted by other users, so you maintain a concept of ownership of the files.

So, you can create a directory with these same permissions as a kind of group directory.
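
For example (the path is a placeholder):

# world-writable directory with the sticky bit, i.e. the same 1777 mode as /tmp
mkdir /srv/family
chmod 1777 /srv/family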

This doesn't work exactly in the way you describe above, but it does ensure that users can put whatever files they like in there, either into a new directory which they create themselves or directly into the root of it, and no other user can delete or modify files belonging to someone else. The sticky bit on the directory is what achieves that second part: normally, if you give a directory world-writable permissions, people can delete each other's files; with the sticky bit, they cannot delete or modify files that don't belong to them.

For any common files that you want anyone to be able to read, you just put them in there yourself and leave them owned by you.

Note: unlike in your question, other people won't be able to modify or delete these files; they would just need to be told that they can make their own copy wherever they like and modify that instead.

The downside to this is that they can't transparently delete or modify whatever they like if it doesn't belong to them.

But the upside to this solution is that everything they put in there is shared with everyone else, who can all open it and read it.

thomasrutter
0

With a bit of planning you can achieve what you want with overlayfs.

You would take all the files common to all instances and put them into the normally read-only lower directory.

Then, use overlayfs to mount a separate upper directory over the top of it for each independently modifiable copy you want. The upper directory can be empty to begin with, meaning that each person just sees what's in the lower directory, unmodified.
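
A minimal sketch of such a mount, assuming one shared lower directory and per-user upper and work directories (all paths are placeholders; workdir must be an empty directory on the same filesystem as upperdir):

mount -t overlay overlay \
    -o lowerdir=/srv/common,upperdir=/srv/alice/upper,workdir=/srv/alice/work \
    /srv/alice/merged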

When an existing file is modified in any mounted overlay, the file is copied up to the upper directory, which then overlays the original for that instance only. This is completely transparent. The same goes for deletions and newly created files, which only affect the upper directory currently in use. The user who made the changes will see them in their instance, but they won't affect the lower directory or other people's instances.

Over time, as different people add different things to their own instances, the instances will diverge more and more; but if you ever want to consolidate things, you can periodically go through, work out what should be the same for all users, and move it into the normally read-only lower directory.

The only issue I can foresee with this setup (which is the same for both solutions) is that if a user wants to share a file with the other users, anything they add won't be shared and will only be visible to them. If you want that, there's another possibility.

thomasrutter