Path: utzoo!attcan!uunet!fernwood!decwrl!wam.UMD.EDU!djm
From: djm@wam.UMD.EDU
Newsgroups: comp.sources.d
Subject: Re: An idea for safer and portable unshar-ing
Message-ID: <8910020054.AA08811@cscwam.UMD.EDU>
Date: 2 Oct 89 00:54:08 GMT
References: <1989Sep30.171114.12550@chance.UUCP>
Organization: University of Maryland
Lines: 64

In article <1989Sep30.171114.12550@chance.UUCP> john@chance.UUCP (John R. MacMillan) writes:
>In order to make it easier for unshar programs to work without
>using /bin/sh, perhaps we should agree (hah!) upon some keyword
>directives that shar programs would include as comments.  Eg.

This suggestion seems to be moving in the direction of making
archives that plain old /bin/sh can't unpack at all.  Perhaps that's
not a bad idea.  An easier-to-parse, more standardized pure-ASCII
archiving format than a shell archive would certainly be more
appropriate for Amiga, MS-DOS, VMS, etc. postings, and would allow
the packing and unpacking programs more versatility, security and
control on Unix systems as well.

Right now there is a profusion of shar programs that generate all
kinds of code to split up the included files, using sed, cat, wc,
etc. and starting some or all lines with 'X' or '|' or '\tx' or who
knows what else.  Secure unshar programs written in C have to
simulate all of that and, as is becoming clear in this discussion,
interpreting all of those formats requires implementing a
substantial subset of the /bin/sh syntax, which is far more work
than unpacking an ASCII archive ought to take.  Of course, only one
unshar program really needs to exist, as long as it is comprehensive
and portable.  Perhaps Rich Salz's new release will satisfy everyone.

I would like to see a replacement for shell archives that would have
a simple-to-parse format similar to the one John suggested.
It would have a header section for the whole archive, giving
information like:

# PARTS      total number of parts
# PART       number of this part
# CREATED    date of creation (ctime format would do, I guess)
# CONTAINS   names of the files it contains

There would be another header section for each file extracted, with
information like:

# FILE       file name
or
# DIRECTORY  directory name
# OFFSET     starting offset of this part, to allow continuation
             of long files
# BYTES      file length
# CHECKSUM   checksum for original file
# MODIFIED   last modification date of file
# ENCODING   encoding method: ASCII, atob, others?

Comments could have the same format as shell comments.  The ASCII
encoding could be accomplished by adding an extra '#' at the start
of every line of an enclosed file that begins with '#', and then
changing an initial '##' back to '#' when unpacking.  I haven't
decided whether this format should require the presence of external
programs, like atob, compress, and sum, to do part of the work.

>The tough part would be getting people to make their shar
>programs generate it.

I think the harder part would be getting the programs that generate
the suggested format into the hands of everyone who wants to
distribute source code, and the programs that decode it into the
hands of everyone who wants to use programs distributed in that
format.  In addition to tar, cpio, uu*code, [ab]to[ba], compress,
arc, zip, and perhaps unshar, people would need to have yet another
archiver/unarchiver.  Tower of Babel!
--
David J. MacKenzie
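
For illustration only, the '#'-doubling ASCII encoding and the
"# KEYWORD value" header lines described in the post could be
sketched roughly as below.  This is a guess at one possible reading,
not a specification: the function names are hypothetical, Python is
used purely for brevity, and the assumption that a keyword is
separated from its value by whitespace is not stated in the post.

```python
def stuff_line(line: str) -> str:
    """Encode one line of file content for the archive body.

    A line that starts with '#' gets one extra '#' in front, so that
    content can never be mistaken for a header or comment line.
    """
    return "#" + line if line.startswith("#") else line


def unstuff_line(line: str) -> str:
    """Decode one body line: turn an initial '##' back into '#'."""
    return line[1:] if line.startswith("##") else line


def is_directive(line: str) -> bool:
    """True for an archive line with exactly one leading '#'.

    After stuffing, such a line can only be a header or a comment.
    """
    return line.startswith("#") and not line.startswith("##")


def parse_directive(line: str) -> tuple:
    """Split '# KEYWORD value' into (keyword, value).

    Assumption: whitespace separates keyword and value; the value
    may be empty.
    """
    parts = line[1:].strip().split(None, 1)
    if not parts:
        return ("", "")
    keyword = parts[0]
    value = parts[1] if len(parts) > 1 else ""
    return (keyword, value)


if __name__ == "__main__":
    original = ["#include <stdio.h>", "int main(void) { return 0; }"]
    encoded = [stuff_line(text) for text in original]
    decoded = [unstuff_line(text) for text in encoded]
    print(encoded)
    print(decoded == original)
    print(parse_directive("# FILE hello.c"))
```

One consequence of this reading is that an unpacker never has to
guess: a single leading '#' always marks a header or comment, and
anything else, including '##', is file content.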