Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!lll-lcc!ptsfa!amdahl!kim
From: kim@amdahl.UUCP (Kim DeVaughn)
Newsgroups: comp.sys.amiga
Subject: Re: Flight Simulator II (and uuencode mess-ups) [long]
Message-ID: <5090@amdahl.UUCP>
Date: Sun, 11-Jan-87 07:22:47 EST
Article-I.D.: amdahl.5090
Posted: Sun Jan 11 07:22:47 1987
Date-Received: Sun, 11-Jan-87 22:38:05 EST
References: <2280@well.UUCP> <340@oliveb.UUCP> <8232@topaz.RUTGERS.EDU> <2880@j.cc.purdue.edu>
Organization: Amdahl Corporation,  Sunnyvale, CA 94086
Lines: 138

In article <2880@j.cc.purdue.edu>, doc@j.cc.purdue.edu (Craig Norborg) writes:
> In article <5017@amdahl.UUCP> kim@amdahl.UUCP (Kim DeVaughn) writes:
> > Yes.  ARC would save about 18K in this case, plus the two postings
> > required for the above two files (after uuencoding the executable)
> > could have been reduced to one posting, and still been under the 64K
> > "limit".
>     There is one problem I think you all are missing.  For reliable 
> transfer of binary files across USENET (read both mail and news), you
> MUST convert them to readable characters of some format or another.
> Sending binary files (which ANY arc'd file is), is not reliable, so
> you would still have to uuencode the arc'd file.

No Craig, I think you misunderstood my earlier posting.  I am very
well aware of the necessity to ship binary around the net in a "readable
character" form ... typically uuencoded.  This is why I was taking the
uu format to task, and suggesting some form of improvement that would
provide error detection, or possibly even error correction.

The two test files mentioned previously were a 34K binary, and a 20K text
file.  For transmission purposes, the binary would need to be uuencoded.
It would then "mass" 48K.  The document file needs no such encoding, and
would go at 20K.  Thus there's 68K to send, and since it's over the 64K
limit (which causes some braindamaged mailers to truncate the file), it
*should* be posted in two parts.

The arc'd file was stated to be 36K, which will uuencode to 49620 bytes,
or approximately 50K.  Ergo ... 68K - 50K is 18K (and 50K is less than
the 64K limit, so it could all go at once).

In this particular case, the size of the binary (34K) and the text file
(20K) together is 54K, which coincidentally is also 18K larger than the
arc'd file (36K), but this is not what I was referring to.
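For anyone who wants to check the arithmetic, here's a little C sketch
of where a number like 49620 comes from.  It is *not* the real uuencode;
the 24-byte begin/end allowance is just a guess, and "36K" is taken
loosely to mean about 36000 bytes.  Classic uuencode turns every 45
input bytes into a 62-byte output line (length character, 60 encoded
characters, newline), so the expansion runs just under 38% before the
header and trailer lines.

/* Rough uuencode output-size estimate ... a sketch, not uuencode itself */
#include <stdio.h>

long uusize(long n)
{
    long lines = n / 45;            /* full 45-byte input lines          */
    long rest  = n % 45;            /* input bytes on the short line     */
    long bytes = lines * 62;        /* each full line costs 62 bytes     */

    if (rest > 0)                   /* length char + encoding + newline  */
        bytes += 2 + ((rest + 2) / 3) * 4;

    return bytes + 24;              /* "begin ...", "`", "end" (approx)  */
}

int main(void)
{
    printf("36000-byte arc -> roughly %ld bytes uuencoded\n", uusize(36000L));
    printf("34K binary     -> roughly %ld bytes uuencoded\n", uusize(34L * 1024));
    return 0;
}

Run it and you get numbers within a few bytes of the 49620 and 48K
figures quoted above; the leftover difference is just the exact length
of the "begin" line.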


>                                                   Sure this may save
> some time, but not that much!

In this particular test case, arcing the files and then uuencoding them
saves 25% ... not an insignificant amount.  On the other hand, the amount
of compaction varies widely depending on the kind of file(s) one is
dealing with at any given time.  Typically, binary files compact the least,
and source code the most (due to the large quantities of "white space"
found in most program source).  Text/document files usually fall in between.

The amount of compaction is also very dependent upon the compression
algorithm chosen.  As I pointed out with the Juggler's movie.data file,
Compress 4.0 (which uses 12- or 16-bit Lempel-Ziv encoding, depending on
the size of machine one runs it on) could only squeeze it by 1005 *bytes*
(about a third of one percent)!  Pack(1) (which uses Huffman encoding) did
far better in this case, getting about a 17% reduction.  This surprised me,
because Compress usually does a lot better than Pack.

This is one of the nice things about ARC ... it will choose the "best"
algorithm to use on each *individual* file it processes at arc-time:  NONE,
Run Length Encoding, Huffman (similar to "pack" or "squeeze"), or 12-bit
Lempel-Ziv (similar to "compress").  Thus, within any given .arc file, any or
all of the four formats may be present.
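To make that concrete, here's a toy sketch of the selection step.  It is
NOT ARC's actual code, and the sizes are made-up numbers; the only point
is that ARC keeps whichever method produced the smallest result for each
file, falling back to "no compression" when nothing helps.

/* Sketch of per-file method selection, in the spirit of ARC */
#include <stdio.h>

struct method {
    char *name;
    long  size;         /* compressed size this method would produce */
};

int main(void)
{
    /* the sizes below are invented, purely for illustration */
    struct method m[] = {
        { "none (stored)",        36000L },
        { "run-length encoding",  35200L },
        { "Huffman (pack/squeeze)", 29800L },
        { "12-bit Lempel-Ziv",    24500L },
    };
    int i, best = 0;

    for (i = 1; i < 4; i++)          /* keep the smallest result */
        if (m[i].size < m[best].size)
            best = i;

    printf("store with: %s (%ld bytes)\n", m[best].name, m[best].size);
    return 0;
}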


>                                And then, it would probably be more
> preferable to use compress instead of arc, since alot more people have
> access to compress, that do not have access to arc.

I'm not so sure we should use either, after having read Mike Meyer's
recent posting about the possibility that running some form of
compression prior to uuencoding can actually result in larger files
being transmitted from site to site.  (Presumably this is because many
news links compress their batches in transit, and uuencoded compressed
data leaves the batch compressor almost nothing left to squeeze.)  I'd
like to see any more information that anyone has on this.

[ Mike:  I'd like to give your Amiga "tar" program a try ... it would be
         nice to be able to handle directories!  Any chance you'll be
         posting it? ]

ARC is readily available from Fish Disk #40 for the Amiga version, and
Compress 4.0 is on Fish Disk #6.  Also, someone (sorry, I don't remember
who offhand) is currently porting Compress 4.0 to use Manx.

There are some other possible problems in using either ARC or Compress.
Compress is usually compiled in 12-bit mode for use on micros due to
memory size limitations.  On UNIX(R) boxes like VAXen (and Amdahl's :-) ),
Compress is compiled for 16-bit compression mode.  I do not believe you
can uncompress a 16-bit compressed file with a 12-bit compiled program.
So one may run into some problems using compress depending on where one
does the compress and uncompress.  Also, Compress is not part of the
standard System V release package to my knowledge, so a significant number
of sites may *not* have it.
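If you're not sure how a given .Z file was made, the code width is
recorded right in the header, so a quick check is easy.  The sketch
below assumes the usual compress header layout (two magic bytes, 0x1f
and 0x9d, followed by a flags byte whose low five bits give the maximum
code width); I haven't verified this against every compress floating
around, so treat it as illustrative.

/* Report the code width a .Z file was compressed with */
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *fp;
    int   magic1, magic2, flags;

    if (argc != 2 || (fp = fopen(argv[1], "rb")) == NULL) {
        fprintf(stderr, "usage: zbits file.Z\n");
        return 1;
    }
    magic1 = getc(fp);
    magic2 = getc(fp);
    flags  = getc(fp);
    fclose(fp);

    if (magic1 != 0x1f || magic2 != 0x9d || flags == EOF) {
        fprintf(stderr, "not a compress'd file\n");
        return 1;
    }
    printf("compressed with %d-bit codes%s\n", flags & 0x1f,
           (flags & 0x80) ? " (block mode)" : "");
    return 0;
}

A compress built for 12 bits simply has no table space for codes wider
than that, which is why a 16-bit file made on a VAX chokes a micro
build.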

ARC, on the other hand, is only starting to become available on UNIX machines.
There have been a couple of postings to net/mod sources, and quite a few bug
reports and hacks to get the posted code up and working on various systems.
Hopefully, a "clean" version will emerge sometime soon ... hard for me to
trust ARC right now on other than an MS-DOS machine (or an Amiga?).  BTW,
the ARC on Fish Disk #40 still has a few bugs with wild-cards and using RAM:.
It also requires the use of MS-DOS style file-names, which is why I'm not
currently using it.


Getting back to the root of the problem that started this discussion ...
as Mike pointed out, the solution, really, is to fix the news s/w and
mailers.  All of them.  All versions.  Everywhere.  This may occur in
1997, but not in 1987!

Barring that, and after some additional thought, I think a reasonable
solution is to enhance uuencode/uudecode to provide error detection, with
isolation to the bad line(s).  Wayne Hamilton has suggested an approach
that would be backward compatible with the existing versions of the uu
programs (thanks, Wayne!).

Correction will still require a repost/remail, but since the bad line(s)
will be identified, only they will require reposting.  Hopefully, this
will meet with the approval of the backbone sites, etc.  Of course this
won't help much on a really badly mutilated posting or file that's truncated
(the checksumming will be in a block at the end of the file, following the
current uu's "end" line).  Hopefully these cases will be in the minority.
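Just to give the flavor of it, here is one way such a trailing block
might look.  This is purely my own sketch, not Wayne's actual scheme:
the "checksum-block" keyword and the one-checksum-per-line layout are
inventions for illustration only, and a real version would pack several
checksums onto each output line to keep the overhead down (which is
where the estimate below comes from, rather than the fat format shown
here).

/* Append a per-line checksum block after the uuencoded file on stdin.
 * Old uudecodes stop at "end" and never see the extra lines, which is
 * what makes the scheme backward compatible.
 */
#include <stdio.h>

#define MAXLINES 2000                 /* plenty for a posting under 64K */

int main(void)
{
    char     line[256];
    unsigned sum[MAXLINES];
    long     n = 0, i;
    char    *p;

    /* copy the posting through untouched, remembering a simple
       additive checksum of every line along the way */
    while (n < MAXLINES && fgets(line, sizeof(line), stdin) != NULL) {
        fputs(line, stdout);
        sum[n] = 0;
        for (p = line; *p != '\0' && *p != '\n'; p++)
            sum[n] += (unsigned char)*p;
        n++;
    }

    /* the new block ... everything below follows the old "end" line */
    printf("checksum-block %ld\n", n);
    for (i = 0; i < n; i++)
        printf("%ld %u\n", i + 1, sum[i] & 0xffffu);
    return 0;
}

A matching checker on the receiving end would recompute the sums,
compare them against the trailing block, and print just the line numbers
that disagree ... those are the only lines that would need remailing.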

Currently, I estimate the cost will be an increase in file size of about
5%, which seems reasonable to me, so I plan to go ahead with this.  I've
no idea of the cost of the additional processing required as yet.  Any
comments will be appreciated.

/kim


> BTW: Arc (uuencoded and probably split) will be one of my next postings...
                                                            ^^^^
Glad to see that you finally got your news s/w back up.  Have you actually
made any postings yet ... none have arrived here.



-- 
UUCP:  {sun,decwrl,hplabs,pyramid,ihnp4,seismo,oliveb,cbosgd}!amdahl!kim
DDD:   408-746-8462
USPS:  Amdahl Corp.  M/S 249,  1250 E. Arques Av,  Sunnyvale, CA 94086
CIS:   76535,25

[  Any thoughts or opinions which may or may not have been expressed  ]
[  herein are my own.  They are not necessarily those of my employer. ]