Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!amdahl!kim
From: kim@amdahl.UUCP (Kim DeVaughn)
Newsgroups: comp.sys.amiga
Subject: Re: Flight Simulator II (and uuencode mess-ups) [long]
Message-ID: <5017@amdahl.UUCP>
Date: Wed, 7-Jan-87 22:07:11 EST
Article-I.D.: amdahl.5017
Posted: Wed Jan  7 22:07:11 1987
Date-Received: Thu, 8-Jan-87 02:36:35 EST
References: <2280@well.UUCP> <340@oliveb.UUCP> <8232@topaz.RUTGERS.EDU>
Organization: Amdahl Corporation,  Sunnyvale, CA 94086
Lines: 115

In article <8232@topaz.RUTGERS.EDU>, lachac@topaz.RUTGERS.EDU (Gerard Lachac) writes:
> In article <340@oliveb.UUCP> ses@oliveb.UUCP (Dean Brunette) writes:
> >About the recent Juggler demo postings arriving at most sites corrupted:
> >Can't the files be ARCed, and since ARC has a checksum utility, it would point
> >out corrupted files on their creation?
> 
> >file -> ARC -> uuencode -> mail
> 
> 	Hear, hear!!  I vote for this!
> 
> 	Definitely shortens the posting, for example, I recently ARCed a
> backup copy of Matt's new shell he posted last week for backup purposes.
> 	
> 		Executable  	~34k
> 		Docs	    	~20k
> 	
> 		Arced file of 	
> 		both		~36k

Yes.  ARC would save about 18K in this case, plus the two postings
required for the above two files (after uuencoding the executable)
could have been reduced to one posting, and still been under the 64K
"limit".

I'm not sure it would have helped the Juggler's movie.data files very
much though ... on the entire movie.data file in binary form, all
295610 bytes of it, "compress" (v4.0) only reduced it by 0.34%, down to
294605 bytes.  Wheeee ... we saved a whole 1005 bytes!  (This was using
16-bit compression on an Amdahl 5860).  Strangely enough, pack(1)
managed to squish it down to 244062 bytes, or a 17.4% reduction.  This
is one of the few times that I've seen "pack" beat out "compress".

How well ARC would do on movie.data would depend on which algorithm
it picked as "best" (should it pick the compress-style L-Z algorithm,
it would likely be using 12-bit compression, and the results would be
worse than above).  I'm assuming here that the arcing is done on
an Amiga, as I have yet to see a version of ARC that *always* works
correctly on *all* UNIX(R) machines.

There is another problem (for the movie.data file) as well ... it is
BIG.  I'm not sure you *can* arc it on a 512K machine.  Anyone want
to give it a try?  (I'll see what ARC does with it on a 1.5 Meg machine.)
I suppose one could split the binary movie.data file into several
pieces, and arc each one, but I wouldn't recommend going to such lengths
just to save 10% or so ... much easier to split the uuencoded file,
as Andy did.

With more conventional (and smaller) binary files, I do recommend using
ARC ... the only problem is that not everyone has it (but if you
don't, you should ... it's on Fish Disk #40, and probably a lot of Amiga
BBS's as well).

As for the checksumming that ARC does ... unless you can do your arcing
and dearcing on your host system (good luck!), you'll still have to
download the file(s) to the Amiga before you'll be able to find out if the
file is corrupted or not.

The problem (as far as errors that occur during net propagation, anyway)
is with the uu-twins ... uuencode/uudecode.  Binary data and executables
*must* be encoded in some way that will keep the various mailers and news
programs happy.  Because they are widely available, and very simple to
implement/port, the uu's are what usually gets used.  But the uu's do
NOTHING about error checking.
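
For anyone who hasn't looked inside them, the guts of the uu's are tiny:
every 3 raw bytes become 4 printable characters in the range 0x20-0x60,
and each line starts with a character giving its raw byte count.  Something
along these lines (a from-memory sketch in C, *not* the actual sources; some
versions map a zero group to '`' so no encoded line ends in a bare space):

#include <stdio.h>

/* Map a 6-bit value to a printable character; a zero group becomes '`'
 * (0x60) here so that no line ends in a trailing blank. */
#define ENC(c)  (((c) & 077) ? (((c) & 077) + ' ') : '`')

/* Encode up to 45 raw bytes as one uuencoded line on stdout. */
static void encode_line(const unsigned char *p, int n)
{
    int i;

    putchar(ENC(n));                 /* line begins with its raw byte count */
    for (i = 0; i < n; i += 3) {
        int c1 = p[i];
        int c2 = (i + 1 < n) ? p[i + 1] : 0;
        int c3 = (i + 2 < n) ? p[i + 2] : 0;

        putchar(ENC(c1 >> 2));                        /* top 6 bits of byte 1    */
        putchar(ENC(((c1 << 4) | (c2 >> 4)) & 077));  /* rest of 1, top of 2     */
        putchar(ENC(((c2 << 2) | (c3 >> 6)) & 077));  /* rest of 2, top of 3     */
        putchar(ENC(c3 & 077));                       /* bottom 6 bits of byte 3 */
    }
    putchar('\n');
}

Every output character is plain printable ASCII, which is why the mailers
leave it alone ... but notice there isn't a redundant bit anywhere in it.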

I've been considering adding a rudimentary form of error checking to the
uu programs ... something like what is done with the Intel Hex or Motorola
S-rec formats that are used to download binary files to PROM burners.
Simply stated, these formats add a checksum to each line of the encoded
file.  This would at least tell you if the file was corrupted at uudecode
time (usually).
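
Concretely, I'm thinking of one extra encoded character per line, carrying
the sum of that line's raw data bytes mod 64 ... vaguely in the spirit of
the Intel Hex record checksum.  (A rough, untested sketch; the placement
and the exact formula here are my own choices, not any existing format.)

/* Hypothetical per-line checksum: sum the raw (pre-encoding) bytes and
 * keep the low 6 bits, so the result encodes like any other group via
 * the same ENC() idea used above. */
static int line_checksum(const unsigned char *p, int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
        sum += p[i];
    return sum & 077;
}

/* Encode side:  emit ENC(line_checksum(p, n)) just before the newline.
 * Decode side:  recompute the sum over the decoded bytes and compare it
 * with that final character; a mismatch flags the line as corrupted.   */

The cost is one character on a roughly 62-character line ... well under
2% growth, which seems cheap enough.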

I haven't done anything yet, because I'm not sure this is "enough".  Given
that binaries will continue to be posted to the net, knowing that one has
received a corrupted file will usually result in a request for a reposting,
or an email copy.  This would increase the volume of traffic on the net,
and we all know how well received that would be!

Seems to me that it would be better to have some error CORRECTION
capability than merely error DETECTION.  Of course, the better the detection
and/or correction capability, the more bits are required (and the bigger
the file).
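
For instance (just a sketch of one cheap scheme, not something any existing
program does as far as I know): after every 8 data lines, the encoder could
emit a "parity line" holding the byte-wise XOR of the raw bytes of those 8
lines (short lines zero-padded).  If per-line checksums later show that
exactly one line in a group is bad or missing, the decoder can rebuild it
... at a cost of one extra line per eight, or about 12% more to post.

#include <string.h>

#define LINE_BYTES 45   /* raw bytes per full uuencoded line     */
#define GROUP       8   /* data lines covered by one parity line */

/* XOR one line's raw bytes into the running parity buffer. */
static void xor_into(unsigned char parity[LINE_BYTES],
                     const unsigned char *data)
{
    int i;

    for (i = 0; i < LINE_BYTES; i++)
        parity[i] ^= data[i];
}

/* Rebuild the single bad/missing line of a group: start from the parity
 * line and XOR in each of the lines known to be good. */
static void rebuild(unsigned char bad[LINE_BYTES],
                    const unsigned char parity[LINE_BYTES],
                    unsigned char good[][LINE_BYTES], int ngood)
{
    int i;

    memcpy(bad, parity, LINE_BYTES);
    for (i = 0; i < ngood; i++)
        xor_into(bad, good[i]);
}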

How much is enough?  Is the most common problem a single-bit hit, or is
it a "run" of characters?  Or is it (shudder) a truncated file (which
nothing short of a reposting is going to cure)?  If it's the latter, then
maybe simple error detection *is* the most cost-effective improvement.
Does anybody know the frequency of the various failure modes in this type
of network?  Anybody have any data?

Then of course there is the logistic problem of getting *any* kind of
improved encode/decode program distributed to everybody, and *then* getting
everybody to use it ... "people resist changes", someone once said :-)!

I am willing to do some work on such an improvement, but I'd really
like to see some discussion before I run out and invent Yet Another
Protocol.  My gut feel is that a simple checksum on a per-line basis,
and an end-of-file indicator (checksummed, of course) are probably the
best compromise, as my experience has been that *most* of the time,
a file either makes it OK, has a "hunk" of a line missing, or gets
truncated.  But what do you think?
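
To be concrete about the end-of-file part: the idea is one final record
carrying the total byte count (and protected by the same per-line checksum
as everything else), so a truncated posting shows up at uudecode time
instead of after a download to the Amiga.  Sketch only ... putting a byte
count on the "end" line is my own addition, not what uuencode writes today.

#include <stdio.h>

/* Hypothetical trailer record, "end <nbytes>".  A missing trailer or a
 * byte-count mismatch means the posting was truncated along the way.  */
static int check_trailer(const char *line, long bytes_written)
{
    long claimed;

    if (line == NULL || sscanf(line, "end %ld", &claimed) != 1)
        return -1;                      /* never saw a trailer at all */
    if (bytes_written != claimed) {
        fprintf(stderr, "uudecode: got %ld of %ld bytes ... truncated?\n",
                bytes_written, claimed);
        return -1;
    }
    return 0;
}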

/kim

P.S.  A last comment on the Juggler files ... had they been arc'd before
      being uuencoded, I doubt that I would have been able to get him
      (her ?) juggling again.

-- 
UUCP:  {sun,decwrl,hplabs,pyramid,ihnp4,seismo,oliveb,cbosgd}!amdahl!kim
DDD:   408-746-8462
USPS:  Amdahl Corp.  M/S 249,  1250 E. Arques Av,  Sunnyvale, CA 94086
CIS:   76535,25

[  Any thoughts or opinions which may or may not have been expressed  ]
[  herein are my own.  They are not necessarily those of my employer. ]