Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!brl-adm!rutgers!ames!cit-vax!mangler
From: mangler@cit-vax.Caltech.Edu (System Mangler)
Newsgroups: comp.unix.wizards
Subject: Re: Backup of a live filesystem revisited
Message-ID: <1392@cit-vax.Caltech.Edu>
Date: Sat, 20-Dec-86 14:56:08 EST
Article-I.D.: cit-vax.1392
Posted: Sat Dec 20 14:56:08 1986
Date-Received: Sat, 20-Dec-86 22:09:09 EST
References: <4760002@hpirs.HP> <1226@ho95e.UUCP>
Organization: California Institute of Technology
Lines: 55
Summary: All live backup methods have problems

In article <1226@ho95e.UUCP>, wcs@ho95e.UUCP (#Bill.Stewart) writes:
> File-system based programs can work on live systems as long as the
> individual files are not changing.  They are slow but flexible, and
> do incremental dumps well.
>
> Disk-based backup programs are normally much faster, but are unsafe
> on live file systems;

I claim that both types are unsafe, for the SAME reasons.

In both cases, a file's inode is read (either by read, or by stat),
and based on that information the rest of the file is read.  Reading
the inode is an atomic operation, because the inode is completely
contained in one disk sector, so the inode will always be internally
consistent.  However, after the inode is read, the information it
points to may be freed by a creat(), and scribbled upon, before the
backup program reads it.  The program will get either garbage or EOF,
but in either case it has to write SOMETHING on the tape, now that it
has committed itself by writing out a header saying that the next
st_size bytes are the contents of the file.

That's one kind of corruption, and probably not that bad.  It doesn't
matter that you got garbage; the file was being zapped anyway, and it
will appear on the next backup tape.  The important thing is not to
bomb on it.

Another kind is when the file is removed or renamed between the time
it's selected for backup and the time it actually gets read.  This is
simple to handle; just skip that file.

The insidious case, though, is when subdirectories get moved out of a
directory that hasn't been backed up yet, and into one that has
already been done or was being skipped.  That subtree won't be
restored at all, and it won't be on a subsequent incremental tape
either, because the files didn't change.  Filesystem-based backup
programs won't even know that they missed something; disk-based
programs will at least have a way to know that something happened,
because they will come up with all these orphaned inodes.
Presumably, these should get linked into lost+found.  (I haven't
looked to see what *actually* happens.)  Dump has the additional
advantage that all the directories are read very early, so the window
of vulnerability is smaller.

Sure, I've gotten bad dumps.  In large part I think this happened
because the system mangler before me changed dump to wait for a tape
mount between pass II and pass III, and at that time tape mounts
often took hours - creating a very large window of vulnerability.

> Disk-based backup programs are normally much faster,

Making it feasible to keep one's backups more up-to-date.

Don Speck   speck@vlsi.caltech.edu   {seismo,rutgers,ames}!cit-vax!speck
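
To make the first race concrete, here is a minimal sketch of a
file-at-a-time backup loop of the kind the article describes.  It is
an illustration only, not code from dump, tar, or cpio: write_header()
and write_block() are hypothetical stand-ins for the tape I/O, and the
zero-padding is just one way a program might honor the st_size it has
already committed to.

/*
 * Sketch of the stat-then-read window.  Between the stat() and the
 * read() loop, another process may creat() over the file or free its
 * blocks, so the reader sees garbage or an early EOF; the header has
 * already promised st_size bytes, so something must be written anyway.
 */
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

extern void write_header(const char *name, off_t size);  /* stand-in */
extern void write_block(const char *buf, size_t len);    /* stand-in */

void backup_file(const char *name)
{
    struct stat sb;
    char buf[8192];
    off_t left;
    int fd;

    if (stat(name, &sb) < 0)            /* snapshot of the inode      */
        return;                         /* already gone: just skip it */

    write_header(name, sb.st_size);     /* committed to st_size bytes */

    fd = open(name, O_RDONLY);          /* may now be a different     */
                                        /* inode, or freed blocks     */
    left = sb.st_size;
    while (left > 0) {
        ssize_t n = (fd >= 0) ? read(fd, buf, sizeof buf) : 0;
        if (n <= 0) {                   /* early EOF: pad with zeroes */
            memset(buf, 0, sizeof buf);
            n = (left < (off_t)sizeof buf) ? left : (off_t)sizeof buf;
        }
        if (n > left)                   /* file grew: clamp to header */
            n = left;
        write_block(buf, (size_t)n);    /* must write SOMETHING       */
        left -= n;
    }
    if (fd >= 0)
        close(fd);
}

The point of the sketch is the gap between the stat() and the read()
loop: once the header is on the tape, the program has to produce
st_size bytes whether or not the file still exists, which is exactly
the "write SOMETHING" situation described above.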