ZFS replication strategies with encryption
This is an idea proposed in 2025 as a good starter project, and is currently being worked on by Becky Terefe-Zenebe. It is co-supervised with Mark Elvers.
We use ZFS in much of our Planetary Computing infrastructure because it makes remote replication easy. Its performance characteristics when used as a local filesystem are therefore particularly interesting. Some questions that we need to answer about our use of ZFS are:
- We intend to keep encrypted remote backups in several locations, but only a few of those hosts should hold the keys; the rest should receive raw ZFS send streams.
- Does encryption add a significant overhead when used locally?
- Which replicates faster: a regular send where the source and target are both encrypted, or a raw send?
- We would typically have a snapshot schedule, such as hourly snapshots with a retention of 48 hours, daily snapshots with a retention of 14 days, and weekly snapshots with a retention of 8 weeks. As these snapshots build up over time, is there a performance degradation?
- Should we minimise the number of snapshots held locally, as this would allow faster purging of deleted files?
- How does ZFS send/receive compare to a peer-to-peer backup solution like Borg Backup, given that Borg gives a free choice of source and target file system and also supports encryption?
- ZFS should have the advantage of knowing which blocks have changed between two backups, but this tracking may add overhead to day-to-day use.
- On the other hand, ZFS replicas can be brought online much more quickly, whereas Borg backups must first be restored into a usable filesystem.
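To make the raw versus regular send question concrete, here is a minimal Python sketch that only builds the replication pipeline as a string; it does not run anything. The helper name `replication_command` and the dataset, snapshot and host names are hypothetical. It assumes OpenZFS, where `zfs send -w` produces a raw stream that keeps blocks encrypted in transit and at the target, so the receiving host does not need the key.

```python
import shlex


def replication_command(dataset, snapshot, target_host, target_dataset,
                        raw, previous=None):
    """Build a 'zfs send | ssh ... zfs receive' pipeline as a shell string.

    All names are illustrative; nothing is executed here.
    """
    send = ["zfs", "send"]
    if raw:
        # -w (raw): blocks are sent exactly as stored on disk, still encrypted.
        send.append("-w")
    if previous:
        # -i: incremental stream from an earlier snapshot of the same dataset.
        send += ["-i", f"{dataset}@{previous}"]
    send.append(f"{dataset}@{snapshot}")
    # -u: do not mount the received filesystem on the target host.
    recv = ["ssh", target_host, "zfs", "receive", "-u", target_dataset]
    return f"{shlex.join(send)} | {shlex.join(recv)}"


if __name__ == "__main__":
    # Key-less backup host: raw, incremental send.
    print(replication_command("tank/data", "hourly-02", "backup1",
                              "backup/data", raw=True, previous="hourly-01"))
    # Key-holding host: regular send, decrypted at the source.
    print(replication_command("tank/data", "hourly-02", "backup2",
                              "backup/data", raw=False))
```

Timing the two variants on the same snapshots would be one way to approach the replication-speed question, since the regular send decrypts at the source while the raw send does not.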
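The snapshot schedule mentioned above (hourly for 48 hours, daily for 14 days, weekly for 8 weeks) can be sketched as a retention function. This is one possible interpretation, not how any particular snapshot tool behaves: it assumes the oldest snapshot in each hour, day, or ISO week is the one retained, and the names `TIERS`, `bucket_key` and `snapshots_to_keep` are hypothetical.

```python
from datetime import datetime, timedelta

# Retention tiers from the schedule in the text: (tier name, retention window).
TIERS = [
    ("hourly", timedelta(hours=48)),
    ("daily", timedelta(days=14)),
    ("weekly", timedelta(weeks=8)),
]


def bucket_key(tier, ts):
    """Return the bucket (hour, day, or ISO week) a timestamp falls into."""
    if tier == "hourly":
        return ts.strftime("%Y-%m-%d %H")
    if tier == "daily":
        return ts.strftime("%Y-%m-%d")
    iso = ts.isocalendar()
    return (iso[0], iso[1])  # ISO year and ISO week number


def snapshots_to_keep(timestamps, now):
    """Keep the oldest snapshot per bucket, for each tier, inside its window."""
    keep = set()
    for tier, window in TIERS:
        seen = set()
        for ts in sorted(timestamps):
            if now - ts > window:
                continue  # too old for this tier
            key = bucket_key(tier, ts)
            if key not in seen:
                seen.add(key)
                keep.add(ts)
    return keep


if __name__ == "__main__":
    now = datetime(2025, 6, 30, 12, 0)
    # One snapshot per hour for the last 100 days.
    snaps = [now - timedelta(hours=h) for h in range(24 * 100)]
    kept = snapshots_to_keep(snaps, now)
    print(f"{len(kept)} of {len(snaps)} snapshots retained")
```

Everything not returned by `snapshots_to_keep` would be a candidate for `zfs destroy`, so the size of the retained set gives a rough idea of how many snapshots accumulate locally under this schedule.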