Path: utzoo!attcan!uunet!fernwood!decwrl!wam.UMD.EDU!djm
From: djm@wam.UMD.EDU
Newsgroups: comp.sources.d
Subject: Re: An idea for safer and portable unshar-ing
Message-ID: <8910020054.AA08811@cscwam.UMD.EDU>
Date: 2 Oct 89 00:54:08 GMT
References: <1989Sep30.171114.12550@chance.UUCP>
Organization: University of Maryland
Lines: 64

In article <1989Sep30.171114.12550@chance.UUCP> john@chance.UUCP (John R. MacMillan) writes:
>In order to make it easier for unshar programs to work without
>using /bin/sh, perhaps we should agree (hah!) upon some keyword
>directives that shar programs would include as comments. Eg.

This suggestion seems to be moving in the direction of making archives
that plain old /bin/sh can't unpack at all.  Perhaps it's not a bad
idea.  An easier to parse, more standardized pure-ASCII archiving
format than a shell archive would certainly be more appropriate for
Amiga, MS-DOS, VMS, etc. postings, and would allow the packing and
unpacking programs more versatility, security and control on Unix
systems as well.

Right now there is a profusion of shar programs that generate all kinds
of codes to split up the included files, using sed, cat, wc, etc. and
starting some or all lines with 'X' or '|' or '\tx' or who knows what
else.  Secure unshar programs written in C have to simulate all of that,
and as is becoming clear in this discussion, interpreting all of those
formats requires implementing a substantial subset of the /bin/sh
syntax, which is far more work than unpacking an ASCII archive should
take.  Of course, only one unshar program really need exist,
as long as it is comprehensive and portable.  Perhaps Rich Salz's new
release will satisfy everyone.

I would like to see a replacement for shell archives that would have a
simple to parse format similar to the one John suggested.  It would have
a header section for the whole archive, giving information like:

# PARTS total number of parts
# PART number of this part
# CREATED date of creation (ctime format would do, I guess)
# CONTAINS names of the files it contains

There would be another header section for each file extracted, with
information like:

# FILE file name
	or
# DIRECTORY directory name
# OFFSET starting offset of this part, to allow continuation of long files
# BYTES file length
# CHECKSUM checksum for original file
# MODIFIED last modification date of file
# ENCODING encoding method: ASCII, atob, others?
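
To give a feel for how little parsing such headers would need, here is
a rough sketch in Python.  The keywords are the ones listed above;
everything else (the function names, the single space after the '#',
and treating the rest of the line as the value) is my assumption for
illustration, not part of any agreed format.

```python
# Keywords come from the proposal; the parsing rules below are assumed.
KEYWORDS = {
    "PARTS", "PART", "CREATED", "CONTAINS",      # per-archive header
    "FILE", "DIRECTORY", "OFFSET", "BYTES",      # per-file header
    "CHECKSUM", "MODIFIED", "ENCODING",
}


def parse_directive(line):
    """Return (keyword, value) for a '# KEYWORD value' line, else None."""
    if not line.startswith("# "):
        return None
    keyword, _, value = line[2:].rstrip("\n").partition(" ")
    if keyword not in KEYWORDS:
        return None  # an ordinary comment, not a directive
    return keyword, value.strip()


def parse_header(lines):
    """Collect consecutive directive lines into a dict.

    Stops at the first line that is not a recognized directive.
    """
    fields = {}
    for line in lines:
        parsed = parse_directive(line)
        if parsed is None:
            break
        fields[parsed[0]] = parsed[1]
    return fields
```

A real unpacker would also have to decide what to do with unknown
keywords and with file names containing odd characters; this sketch
just treats anything it does not recognize as the end of the header.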

Comments could have the same format as shell comments.
The ASCII encoding could be accomplished by adding an extra '#' at the
start of all lines in enclosed files that start with a '#', and then
changing an initial '##' back to '#' when unpacking.
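
The '#' quoting rule just described can be sketched in a few lines of
Python.  The function names are made up for illustration, and I am
assuming lines are separated by plain newlines.

```python
def escape_line(line):
    """Packing: prefix an extra '#' to any line that starts with '#'."""
    return "#" + line if line.startswith("#") else line


def unescape_line(line):
    """Unpacking: turn an initial '##' back into '#'."""
    return line[1:] if line.startswith("##") else line


def encode_body(text):
    """Apply escape_line to every newline-separated line of a file body."""
    return "\n".join(escape_line(l) for l in text.split("\n"))


def decode_body(text):
    """Reverse encode_body."""
    return "\n".join(unescape_line(l) for l in text.split("\n"))
```

Since every packed line that began with '#' now begins with '##', a
line in the archive with a single leading '#' can only be a header
line or a comment, which is what keeps the unpacker simple.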

I haven't decided whether this format should require the presence of
external programs to do part of the work, like atob, compress, and sum.

>The tough part would be getting people to make their shar
>programs generate it.

I think the harder part would be getting the programs that generate the
suggested format into the hands of everyone who wants to distribute
source code, and the programs that decode it into the hands of everyone
who wants to use programs distributed in that format.  In addition to
tar, cpio, uu*code, [ab]to[ba], compress, arc, zip, and perhaps unshar,
people would need to have another archiver/unarchiver.  Tower of Babel!
-- 
David J. MacKenzie