Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!std-unix
From: ka@gatech.UUCP@opus.uucp (Kenneth Almquist)
Newsgroups: comp.std.unix
Subject: Re: tar vs. cpio
Message-ID: <8440@ut-sally.UUCP>
Date: Sat, 4-Jul-87 20:49:11 EDT
Article-I.D.: ut-sally.8440
Posted: Sat Jul  4 20:49:11 1987
Date-Received: Sat, 11-Jul-87 04:09:11 EDT
Sender: std-unix@ut-sally.UUCP
Reply-To: ka@gatech.UUCP@opus.uucp (Kenneth Almquist)
Lines: 73
Approved: jsq@sally.utexas.edu (Moderator, John Quarterman)

From: ka@gatech.UUCP@opus.uucp (Kenneth Almquist)

OK, here is a simple, backward compatible fix to the tar format.  When
tar encounters a file name which is a link to a file that it previously
dumped, it should first write out a header for the file indicating that
it is a link to a previously dumped file.  It should then write out
another header for the file, this time without linkflag set, and follow
this header with the contents of the file.  This way, if the first link
to a file is not dumped, its contents will be available later when
subsequent links are dumped.
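
To make the writing side concrete, here is a minimal sketch in Python.  It
is a model of the entry sequence only, not real tar code: the function name
plan_entries, the tuple layout, and the use of an inode number to detect
links are all assumptions made for illustration.

```python
def plan_entries(files):
    """Decide which archive entries to write for a list of files.

    files: iterable of (name, inode, data) in the order they are dumped.
    Returns a list of entries (hypothetical representation):
      ("file", name, data)    ordinary header followed by the contents
      ("link", name, target)  header with linkflag set, no contents
    """
    first_name = {}  # inode -> name under which it was first dumped
    entries = []
    for name, inode, data in files:
        if inode in first_name:
            # Proposed change: write the link header as before, then a
            # second, ordinary header for the same name plus the contents.
            entries.append(("link", name, first_name[inode]))
            entries.append(("file", name, data))
        else:
            first_name[inode] = name
            entries.append(("file", name, data))
    return entries


entries = plan_entries([
    ("a/orig", 7, b"hello\n"),
    ("b/alias", 7, b"hello\n"),   # same inode as a/orig, so it is a link
    ("c/other", 9, b"bye\n"),
])
for kind, name, _ in entries:
    print(kind, name)
```

The second name for inode 7 produces two entries, a link header followed by
a full copy, while files seen for the first time are written as they are now.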

This is backward compatible because an old version of tar would make the
link when it read the first header, and then dump the contents of the
file when it read the second header.  Dumping the contents of the file
does no harm because it will not modify the contents of the file.  Of
course, new implementations of tar might want to recognize this situation
and avoid dumping the contents of the file, but only for reasons of
efficiency.
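
The compatibility argument can be checked against a toy model of an old
extractor.  This sketch assumes the old tar opens an existing name with
creat(), so the file is truncated and rewritten in place rather than
unlinked first; the in-memory "file system" and the name old_extract are
inventions for the example.

```python
def old_extract(entries, wanted=None):
    """Model an unmodified tar reading the proposed entry sequence.

    entries: list of ("file", name, data) or ("link", name, target).
    wanted:  optional set of names to extract; None means everything.
    Returns (names, inodes): names maps path -> inode number, and
    inodes maps inode number -> contents.
    """
    names, inodes = {}, {}
    next_ino = 0
    for kind, name, payload in entries:
        if wanted is not None and name not in wanted:
            continue
        if kind == "link":
            # link() only succeeds if the target was extracted earlier;
            # otherwise the old tar complains and carries on.
            if payload in names:
                names[name] = names[payload]
        elif name in names:
            # Assumed creat() behaviour: rewrite in place, link preserved.
            inodes[names[name]] = payload
        else:
            names[name] = next_ino
            inodes[next_ino] = payload
            next_ino += 1
    return names, inodes


entries = [
    ("file", "a/orig", b"hello\n"),
    ("link", "b/alias", "a/orig"),
    ("file", "b/alias", b"hello\n"),
]

full_names, full_inodes = old_extract(entries)
part_names, part_inodes = old_extract(entries, wanted={"b/alias"})
print(full_names["a/orig"] == full_names["b/alias"])
print(part_inodes[part_names["b/alias"]])
```

In the full extraction both names end up on one inode; when only b/alias is
requested, the link fails but the second header still supplies the contents.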

I noted Marc Mengel's suggestion that tar write out the contents of a file
when the last link to a file is encountered, rather than the first.  This
would be nice, but I don't see how it could be done in a way that is
backward compatible with the current tar format.  I also read Michael
Gersten's article suggesting that tar could rewind raw magnetic tapes by
closing them and opening them again.  This proposal doesn't deal with
the question of how cpio could be made to use the tar format, since cpio
reads from its standard input, which it has no way of closing and opening
again, and it also ignores the case where tar is reading from a pipe
because the tape drive is not on the same machine that tar is running on.
So I feel that the above change to the tar format is necessary.

The remaining problem with the tar format is the limit on the file name
size.  If memory serves, cpio originally limited file names to 127 char-
acters, and this was recognized as inadequate and increased to 255 char-
acters.  The current maximum file name in tar is 99 characters.

However, the maximum file name supported by tar can be increased while
still allowing files whose names are not more than 99 characters long to
be read by existing implementations.  I will suggest one possibility here.
Increase the size of the linkname field to 200 characters.  Since this
field is at the end of the header structure, this will not alter the
location of any of the other fields.  Place a 100 character name
extension field after the linkname field.  If the file name field does not
contain a nul terminator, the remainder of the file name is assumed to
be in the file name extension field.  This scheme allows file names of
up to 199 characters to be represented, which comes close to the 255
character limit of the current cpio implementation.  It leaves 55 bytes
of the header free for future expansion.
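
The arithmetic above can be checked with a short sketch.  The field widths
are those of the existing header with linkname widened to 200; the field
name name_ext and the helper functions are hypothetical, and names are
assumed to be plain ASCII.

```python
BLOCK = 512  # size of a tar header block in bytes

# (field, width) in header order; name_ext is the proposed new field.
FIELDS = [
    ("name", 100), ("mode", 8), ("uid", 8), ("gid", 8),
    ("size", 12), ("mtime", 12), ("chksum", 8), ("linkflag", 1),
    ("linkname", 200), ("name_ext", 100),
]


def offsets(fields):
    """Return ({field: (offset, width)}, total bytes used)."""
    table, off = {}, 0
    for field, width in fields:
        table[field] = (off, width)
        off += width
    return table, off


def split_name(path):
    """Split a path into (name field, name extension field)."""
    raw = path.encode("ascii")
    if len(raw) > 199:
        raise ValueError("file name longer than 199 characters")
    if len(raw) < 100:
        # Short names are stored exactly as they are today.
        return raw.ljust(100, b"\0"), b"\0" * 100
    # No nul in the name field signals that the extension is in use.
    return raw[:100], raw[100:].ljust(100, b"\0")


def join_name(name_field, ext_field):
    """Rebuild the path from the two header fields."""
    if b"\0" in name_field:
        return name_field.split(b"\0", 1)[0].decode("ascii")
    return (name_field + ext_field.split(b"\0", 1)[0]).decode("ascii")


table, used = offsets(FIELDS)
print("bytes used:", used, "bytes free:", BLOCK - used)
print("linkname at offset", table["linkname"][0])
long_name = "d/" * 60 + "file"          # 124 characters
print(join_name(*split_name(long_name)) == long_name)
```

A name of 99 characters or fewer produces a name field with a nul
terminator, which is all an existing implementation looks at.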

These changes to the tar format would make it possible to write a program
which used the tar format, but otherwise behaved exactly like cpio except
for a slight decrease in the maximum file name length.

I still don't like it, mind you.  I receive a lot more programs over the
net than I do via tape, and here tar fails miserably because it has nul
characters in the header which news and mail programs cannot handle.  It
is hard to get excited over a standard that fails to handle the most
common case (or more accurately, what is the most common case for me).
But I agree with Henry Spencer's statement that the role of standards
committees should be to standardize existing practice, with at most minor
changes.  So we should either forget about developing a standard now, or
standardize on the most widely available format (which is tar) after fixing
the major problems with it.  I could go for either approach.
				Kenneth Almquist

P.S.  A lot of nonsense has appeared in this group about the supposed
      deficiencies of cpio, which I won't rebut since I don't support
      using the cpio format as a standard.  Just please take it all with
      a grain of salt.

[ I'd rather have details than innuendo, thanks.  -mod ]

Volume-Number: Volume 11, Nuopw