Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!std-unix
From: ka@gatech.UUCP@opus.uucp (Kenneth Almquist)
Newsgroups: comp.std.unix
Subject: Re: tar vs. cpio
Message-ID: <8440@ut-sally.UUCP>
Date: Sat, 4-Jul-87 20:49:11 EDT
Article-I.D.: ut-sally.8440
Posted: Sat Jul 4 20:49:11 1987
Date-Received: Sat, 11-Jul-87 04:09:11 EDT
Sender: std-unix@ut-sally.UUCP
Reply-To: ka@gatech.UUCP@opus.uucp (Kenneth Almquist)
Lines: 73
Approved: jsq@sally.utexas.edu (Moderator, John Quarterman)

From: ka@gatech.UUCP@opus.uucp (Kenneth Almquist)

OK, here is a simple, backward-compatible fix to the tar format. When tar encounters a file name that is a link to a file it has previously dumped, it should first write out a header for the file indicating that it is a link to the previously dumped file. It should then write out another header for the file, this time without linkflag set, and follow this header with the contents of the file. This way, if the first link to a file is not dumped, its contents will be available later when subsequent links are dumped.

This is backward compatible because an old version of tar would make the link when it read the first header, and then dump the contents of the file when it read the second header. Dumping the contents of the file again does no harm because it does not change the contents of the file. Of course, new implementations of tar might want to recognize this situation and avoid dumping the contents of the file, but only for reasons of efficiency.

I noted Marc Mengel's suggestion that tar write out the contents of a file when the last link to the file is encountered, rather than the first. This would be nice, but I don't see how it could be done in a way that is backward compatible with the current tar format.

I also read Michael Gersten's article suggesting that tar could rewind raw magnetic tapes by closing them and opening them again.
This proposal doesn't address how cpio could be made to use the tar format: cpio reads from its standard input, which it has no way of closing and reopening. It also ignores the case where tar is reading from a pipe because the tape drive is not on the same machine that tar is running on. So I feel that the change to the tar format described above is necessary.

The remaining problem with the tar format is the limit on file name length. If memory serves, cpio originally limited file names to 127 characters; this was recognized as inadequate and the limit was increased to 255 characters. The current maximum file name length in tar is 99 characters. However, the maximum file name length supported by tar can be increased while still allowing files whose names are no more than 99 characters long to be read by existing implementations.

I will suggest one possibility here. Increase the size of the linkname field to 200 characters. Since this field is at the end of the header structure, this will not alter the location of any of the other fields. Place a 100-character name extension field after the linkname field. If the file name field does not contain a nul terminator, the remainder of the file name is assumed to be in the file name extension field. This scheme allows file names of up to 199 characters to be represented, which comes close to the 255-character limit of the current cpio implementation. It leaves 55 bytes of the header free for future expansion.

These changes to the tar format would make it possible to write a program that used the tar format but otherwise behaved exactly like cpio, except for a slight decrease in the maximum file name length.

I still don't like it, mind you. I receive a lot more programs over the net than I do via tape, and here tar fails miserably because its header contains nul characters, which news and mail programs cannot handle.
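
The arithmetic behind the name extension proposal can be checked with a short Python sketch. It lays out the existing header fields (name, mode, uid, gid, size, mtime, chksum, linkflag, linkname) back to back, widens linkname to 200 bytes, and appends a 100-byte extension field. The names `name_ext`, `split_name`, and `join_name` are hypothetical, and the assumption that the extension field keeps its own nul terminator is ours, chosen to match the 199-character limit given above. This is a sketch of the layout only, not a complete header encoder.

```python
# Sketch of the proposed tar header layout (not a full header encoder).
# Field sizes in bytes, in header order, for the existing tar header.
OLD_FIELDS = [("name", 100), ("mode", 8), ("uid", 8), ("gid", 8),
              ("size", 12), ("mtime", 12), ("chksum", 8),
              ("linkflag", 1), ("linkname", 100)]
# Proposed layout: linkname grows to 200 bytes and a 100-byte name
# extension field follows it ("name_ext" is a hypothetical name).
NEW_FIELDS = OLD_FIELDS[:-1] + [("linkname", 200), ("name_ext", 100)]
BLOCK = 512  # each tar header occupies one 512-byte block

# Assumption: name_ext keeps a nul terminator, giving 100 + 99 = 199.
MAX_NAME = 199


def offsets(fields):
    """Return {field: (start, end)} for fields laid out back to back."""
    table, pos = {}, 0
    for field, size in fields:
        table[field] = (pos, pos + size)
        pos += size
    return table


def split_name(name: bytes) -> tuple:
    """Split a file name across the name and name_ext fields.

    Names of up to 99 bytes fit in "name" with a nul terminator, so
    existing tar implementations can still read them unchanged.
    """
    if len(name) > MAX_NAME:
        raise ValueError("file name longer than %d bytes" % MAX_NAME)
    # ljust pads with nul bytes; a name of 100+ bytes fills "name" exactly,
    # leaving no terminator there, which signals that name_ext is in use.
    return name[:100].ljust(100, b"\0"), name[100:].ljust(100, b"\0")


def join_name(name_field: bytes, ext_field: bytes) -> bytes:
    """Reassemble a name: if "name" has no nul, continue into name_ext."""
    if b"\0" in name_field:
        return name_field.split(b"\0", 1)[0]
    return name_field + ext_field.split(b"\0", 1)[0]


if __name__ == "__main__":
    new = offsets(NEW_FIELDS)
    print(new["linkname"], new["name_ext"])   # (157, 357) (357, 457)
    print(BLOCK - new["name_ext"][1])          # 55 bytes left free
```

The existing fields end at byte 257, so the widened linkname (157 to 357) and the extension field (357 to 457) leave 512 - 457 = 55 bytes, as stated. The padding that `split_name` produces also shows why the header is full of nul bytes, the property that keeps it from passing through news and mail.
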
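
Returning to the hard-link fix proposed at the start of this article, the following Python sketch models an archive as a list of header entries rather than real 512-byte blocks. `Entry`, `plan_archive`, and `extract` are hypothetical names; `extract` merely simulates an old tar restoring selected members, with a copied value standing in for a real hard link.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    """One archive member: a header, optionally followed by file data."""
    name: str
    linkflag: bool            # True: header says "name is a link to linkname"
    linkname: Optional[str]   # link target, when linkflag is set
    data: Optional[bytes]     # contents written after the header, if any


def plan_archive(files: List[Tuple[str, int, bytes]]) -> List[Entry]:
    """Order archive entries for (name, inode, contents) triples.

    Files sharing an inode number are hard links to the same file.
    """
    first_name: Dict[int, str] = {}   # inode -> first name dumped for it
    entries: List[Entry] = []
    for name, inode, contents in files:
        if inode in first_name:
            # 1. Link header, as existing versions of tar already write.
            entries.append(Entry(name, True, first_name[inode], None))
            # 2. Second header without linkflag, followed by the contents,
            #    so this name is usable even if the first link is absent.
            entries.append(Entry(name, False, None, contents))
        else:
            first_name[inode] = name
            entries.append(Entry(name, False, None, contents))
    return entries


def extract(entries: List[Entry], wanted=None) -> Dict[str, bytes]:
    """Simulate an old tar restoring members (all, or those in `wanted`).

    A link whose target was not restored is skipped here (a real tar would
    fail to make that link); copying the target's value stands in for
    making a real hard link.
    """
    fs: Dict[str, bytes] = {}
    for e in entries:
        if wanted is not None and e.name not in wanted:
            continue
        if e.linkflag:
            if e.linkname in fs:
                fs[e.name] = fs[e.linkname]
        else:
            # Rewrites identical data if the link was already made.
            fs[e.name] = e.data
    return fs


if __name__ == "__main__":
    archive = plan_archive([("a", 1, b"hello"), ("b", 1, b"hello")])
    print(extract(archive, wanted={"b"}))   # {'b': b'hello'}
```

Extracting only `b` from the sample archive still recovers its contents, which is the case the second header is meant to cover. A newer tar could notice that it has just made the link and skip the redundant copy, purely for efficiency.
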
It is hard to get excited over a standard that fails to handle the most common case (or, more accurately, what is the most common case for me). But I agree with Henry Spencer's statement that the role of standards committees should be to standardize existing practice, with at most minor changes. So we should either forget about developing a standard now, or standardize on the most widely available format (which is tar) after fixing its major problems. I could go for either approach.

Kenneth Almquist

P.S. A lot of nonsense has appeared in this group about the supposed deficiencies of cpio, which I won't rebut since I don't support using the cpio format as a standard. Just please take it all with a grain of salt.

[ I'd rather have details than innuendo, thanks. -mod ]

Volume-Number: Volume 11