I am searching for a backup/snapshot solution, and I have come to the conclusion that `git` may be the best solution for me (although it was not intended for that :) ).
I will use it for hourly backups on a notebook that is often on battery power. The backup shall protect against a failure of the hard drive, but also against files being deleted or changed accidentally by the user or by an application (backup operations will be done by root, with `chown`/`chgrp` to root and `chmod -R go-w`). Overhead is not a relevant issue, at least to a large extent. RAID is not appropriate, as it permanently increases the power consumption and does not protect against deletion/changes either. LVM snapshots are too unreliable (e.g., when their size exceeds the allocated storage) and do not protect against drive failure.
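To make the hardening step concrete, here is a minimal sketch of what I have in mind, assuming a hypothetical mount point `/mnt/backup/.git` for the backup store and that it runs as root:

```python
import os
import stat

# Hypothetical mount point of the root-owned backup store.
BACKUP_GIT_DIR = "/mnt/backup/.git"

def harden(root: str) -> None:
    """chown everything to root:root and clear the write bits for group/other,
    so neither the user nor an application can modify the backup."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            os.chown(path, 0, 0)  # root:root (uid 0, gid 0); requires root
            mode = os.stat(path).st_mode
            os.chmod(path, mode & ~(stat.S_IWGRP | stat.S_IWOTH))  # go-w

if __name__ == "__main__":
    harden(BACKUP_GIT_DIR)
```

Clearing only the group/other write bits keeps the data readable for restores while blocking accidental modification by non-root processes.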
Most solutions I used before, e.g., rsnapshot, are based upon `rsync`: it always (in my case, hourly) creates chunks and hashes of both origin and target (all stored data!) to identify the changes to back up. This consumes too much power when I am on battery, and it does so hourly over all data, even if there are no (or only minor) changes.
This is what led me to `git`: a commit creates a backup, including snapshots that can be restored, and it does not always chunk and hash both origin and target. It does the hashing + compressing + creation of its structures once, when creating a "backup" (= commit), and only for things that have changed; re-hashing unchanged data is not necessary, as that data simply remains stored.
Nevertheless, git seems to take more time than I expected (I tested with a 700 MB file), because in the end it still does SHA-1 hashing and, additionally, compression. But I think a precise comparison is not needed, because I expect the stored files to need real storage (GBs rather than MBs, maybe more over time), while the changes per hour will usually be in the KB or MB range (seldom more). Therefore, git will not hash/compress much each hour, and it will not touch the remaining GBs of unchanged data (rsync would). Thus, under the assumption that a lot of data is stored (all of which rsync would always chunk and hash) while there are only minor changes per hour (or none at all), git should on average be less power-consuming than rsync-based solutions. Also, git is a well-proven and reliable technology.
My test: I ran `git init` in the folder I want to back up, then I moved the content of `.git` to the second drive and mounted that location back onto the `.git` folder. I then simulated a loss of all data on the drive except the `.git` folder (which now lives on the other drive). I could still do a `git clone` of the "empty" folder (because it still contained the root-owned `.git`), and the clone again contained all the data. The backup process will be automated with a Python script; a sketch follows below.
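Here is a minimal sketch of what that script might look like. The repository path and the commit-message format are my assumptions for illustration, not a finished implementation; it stages everything and commits only if something actually changed:

```python
#!/usr/bin/env python3
"""Hourly git-based backup: stage all changes and commit them as a snapshot.

Intended to be run as root, e.g., from an hourly cron entry or systemd timer.
REPO_DIR is a hypothetical path; adjust it to the folder being backed up.
"""
import subprocess
from datetime import datetime

# Hypothetical: the folder whose .git is mounted from the second drive.
REPO_DIR = "/home/user/data"

def git(*args: str) -> subprocess.CompletedProcess:
    """Run a git command inside the backup folder; fail loudly on errors."""
    return subprocess.run(["git", "-C", REPO_DIR, *args],
                          capture_output=True, text=True, check=True)

def backup() -> None:
    git("add", "-A")  # stage new, modified, and deleted files
    if git("status", "--porcelain").stdout.strip():  # anything to commit?
        git("commit", "-m", f"hourly backup {datetime.now():%Y-%m-%d %H:%M}")

if __name__ == "__main__":
    backup()
```

Restoring would then be the `git clone` from my test above, and skipping the commit when nothing changed keeps the idle hours cheap.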
However, git is not intended for this, so I would like to know what you think of the approach. Have I missed something? Does the trade-off against rsync make sense (given my assumptions: a lot of stored data, few hourly changes)? Let me know if something is unclear; I worry about errors in my reasoning.
I am aware that I will need to delete old commits from time to time, but this is fine.
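In case it matters for your answers: the way I would probably do that pruning (one possible approach, not necessarily the best one) is to re-root the branch on a fresh orphan commit and garbage-collect the old objects. Note that this keeps only the current snapshot; keeping the last N commits instead would need something like `git filter-repo`:

```python
import subprocess

REPO_DIR = "/home/user/data"  # hypothetical, same folder as in the backup script

def git(*args: str) -> None:
    subprocess.run(["git", "-C", REPO_DIR, *args], check=True)

def truncate_history() -> None:
    git("checkout", "--orphan", "fresh")              # new branch with no parents, same tree
    git("commit", "-m", "history truncated")          # single commit holding the current state
    git("branch", "-M", "fresh", "master")            # make it the new master
    git("reflog", "expire", "--expire=now", "--all")  # drop reflog references to old commits
    git("gc", "--prune=now", "--aggressive")          # actually delete the unreachable objects
```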