Xref: utzoo comp.binaries.ibm.pc.d:142 comp.sources.d:2045
Path: utzoo!mnetor!uunet!husc6!cmcl2!nrl-cmf!mailrus!tut.cis.ohio-state.edu!bloom-beacon!mit-eddie!killer!csccat!loci
From: loci@csccat.UUCP (Chuck Brunow)
Newsgroups: comp.binaries.ibm.pc.d,comp.sources.d
Subject: Re: Standard for file transmission
Message-ID: <563@csccat.UUCP>
Date: 6 May 88 23:43:14 GMT
References: <299@cullsj.UUCP> <2096@epimass.EPI.COM>
Organization: Computer Support Corporation, Dallas, Texas
Lines: 52
Keywords: compression, archive, UUCP

In article <2096@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
>In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>>
>>   I stand corrected.  Since Lempel-Ziv was DESIGNED for text compression, and
>>the authors do not mention its use for binaries, I never considered using it.
>>I tried it on an executable under UNIX and obtained a good reduction, for 
>>reasons which are not apparent.  I'm sure that there are cases where this does

	This is actually only partially true. The first "compress" to
	appear on the net (several years ago) only worked on text files
	and dumped core on binary files. The reason you get good
	compression on binary files is probably that they haven't been
	stripped of the relocation info. Strip them first and I doubt
	that the compression will be as good (otherwise, throw your
	optimizer into the bit bucket). Typical (large) text compression
	is about 67%, whereas binaries are closer to 20%. (I use 16-bit
	compress.)
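
	To make the relocation-info point concrete, here's a rough,
	untested sketch (just an illustration I threw together, nothing
	taken from compress itself) that reports how much of a file is
	zero bytes or sits in runs of a repeated byte. Try it on an
	a.out before and after strip(1) and the difference in
	redundancy should show up.

/* zerocount.c -- rough sketch: how much of a file is zero bytes, and how
 * much sits in runs of a single repeated byte?  Unstripped executables
 * carry relocation and symbol tables full of such runs, which is one
 * reason they compress better than stripped ones.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *fp;
    int c, prev = -1;
    long total = 0, zeros = 0, in_runs = 0, runlen = 0;

    if (argc != 2 || (fp = fopen(argv[1], "rb")) == NULL) {
        fprintf(stderr, "usage: zerocount file\n");
        return 1;
    }
    while ((c = getc(fp)) != EOF) {
        total++;
        if (c == 0)
            zeros++;
        if (c == prev) {
            runlen++;
        } else {
            if (runlen >= 4)            /* count bytes inside runs of 4+ */
                in_runs += runlen;
            runlen = 1;
            prev = c;
        }
    }
    if (runlen >= 4)                    /* flush the final run */
        in_runs += runlen;
    fclose(fp);
    printf("%ld bytes, %ld zero (%.0f%%), %ld in runs of 4+ (%.0f%%)\n",
           total, zeros, total ? 100.0 * zeros / total : 0.0,
           in_runs, total ? 100.0 * in_runs / total : 0.0);
    return 0;
}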

>
>A Unix file is just a stream of bytes, and so is an MS-DOS file
>except that it has extra attributes as well.  Compress replaces byte
>strings with codes whose lengths are between 9 and 16 bits.  It will
>work well on any file in which some byte sequences are more common
>than others.  An executable file consists of instructions, which, for
>almost all processors, are integral numbers of bytes, and some are
>much more common than others.  So compress works fine, and will give
>good compression for just about any executable file.  There are

	This is doubtful. There's a good description of the workings
	of LZW in the GIF docs (recently posted). Bytes aren't the
	key feature here, but rather repeated byte sequences, which
	should be rare in an optimized executable (on Unix at least).
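
	For anyone who hasn't seen the GIF docs, here's a bare-bones
	sketch of the LZW loop. It prints codes as decimal numbers,
	one per line, instead of packing them into 9-16 bit fields the
	way compress does, and the linear dictionary search is only
	there to keep it short. The output only shrinks when the input
	has repeated byte sequences for the dictionary to grow on.

/* toylzw.c -- minimal LZW encoder sketch.  Dictionary entries are
 * (prefix_code, appended_byte) pairs; codes 0-255 are the implicit
 * single-byte roots and new codes start at 256, as in the GIF variant.
 */
#include <stdio.h>

#define MAXCODES 4096                   /* 12-bit dictionary limit */

static int dict_prefix[MAXCODES];
static int dict_byte[MAXCODES];
static int next_code = 256;

/* return the code for the string (prefix + byte), or -1 if absent */
static int find(int prefix, int byte)
{
    int i;
    for (i = 256; i < next_code; i++)
        if (dict_prefix[i] == prefix && dict_byte[i] == byte)
            return i;
    return -1;
}

int main(void)
{
    int c, code, prefix = -1;
    long in = 0, out = 0;

    while ((c = getchar()) != EOF) {
        in++;
        if (prefix < 0) {               /* first byte starts the match */
            prefix = c;
            continue;
        }
        code = find(prefix, c);
        if (code >= 0) {
            prefix = code;              /* longest match grows by one byte */
        } else {
            printf("%d\n", prefix);     /* emit code for the current string */
            out++;
            if (next_code < MAXCODES) { /* remember string + new byte */
                dict_prefix[next_code] = prefix;
                dict_byte[next_code] = c;
                next_code++;
            }
            prefix = c;
        }
    }
    if (prefix >= 0) {                  /* flush the last match */
        printf("%d\n", prefix);
        out++;
    }
    fprintf(stderr, "%ld bytes in, %ld codes out\n", in, out);
    return 0;
}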

>several types of graphics files: bitmaps are HIGHLY compressible;

	If they have lots of blank space, or other repeated sequences.
	Otherwise, they can be very similar to executables: 10-20%.
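
	To see why the blank space matters, a toy run-length pass (a
	sketch only, not tied to any particular bitmap format) is
	enough: each run becomes a (count, byte) pair, so blank rows
	collapse and a busy image can actually grow.

/* rlepack.c -- toy run-length pass: each run of a byte becomes a
 * (count, byte) pair, with the count capped at 255.  A bitmap full of
 * blank rows collapses dramatically; a busy one may even get bigger.
 */
#include <stdio.h>

int main(void)
{
    int c, prev, count;
    long in, out = 0;

    prev = getchar();
    if (prev == EOF)
        return 0;
    in = 1;
    count = 1;
    while ((c = getchar()) != EOF) {
        in++;
        if (c == prev && count < 255) {
            count++;                    /* extend the current run */
        } else {
            putchar(count);             /* emit (count, byte) pair */
            putchar(prev);
            out += 2;
            prev = c;
            count = 1;
        }
    }
    putchar(count);                     /* flush the final run */
    putchar(prev);
    out += 2;
    fprintf(stderr, "%ld bytes in, %ld bytes out\n", in, out);
    return 0;
}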

>other types of files act like a program for an imaginary computer and
>consist of byte codes, some much more common than others.  These
>compress well also.

	You must mean Huffman coding. These comments are true in that
	case, but not for LZW.
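
	For what it's worth, the gain available to a per-byte coder
	like Huffman can be estimated from the byte histogram alone.
	Here's a sketch (again just an illustration, not production
	code) that builds Huffman code lengths for stdin and prints
	the average bits per byte; it says nothing about the
	repeated-sequence redundancy that LZW goes after.

/* hufflen.c -- sketch: build Huffman code lengths for the byte histogram
 * of stdin and report the average code length in bits per byte.  Uses a
 * slow two-minimum scan instead of a heap to stay short.
 */
#include <stdio.h>

#define NSYM  256
#define NNODE (2 * NSYM)

int main(void)
{
    long   weight[NNODE];
    int    parent[NNODE], alive[NNODE];
    int    c, i, n, a, b, next;
    long   total = 0;
    double bits = 0.0;

    for (i = 0; i < NNODE; i++) {
        weight[i] = 0;
        parent[i] = -1;
        alive[i] = 0;
    }
    while ((c = getchar()) != EOF) {
        weight[c]++;
        total++;
    }
    if (total == 0)
        return 0;

    n = 0;
    for (i = 0; i < NSYM; i++)          /* leaves: bytes that occur */
        if (weight[i] > 0) {
            alive[i] = 1;
            n++;
        }

    next = NSYM;
    while (n > 1) {                     /* merge the two lightest nodes */
        a = b = -1;
        for (i = 0; i < next; i++) {
            if (!alive[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) {
                b = a;
                a = i;
            } else if (b < 0 || weight[i] < weight[b]) {
                b = i;
            }
        }
        weight[next] = weight[a] + weight[b];
        alive[next] = 1;
        alive[a] = alive[b] = 0;
        parent[a] = parent[b] = next;
        next++;
        n--;
    }

    for (i = 0; i < NSYM; i++) {        /* code length = depth in the tree */
        int depth = 0, j = i;
        if (weight[i] == 0)
            continue;
        while (parent[j] >= 0) {
            depth++;
            j = parent[j];
        }
        if (depth == 0)                 /* degenerate one-symbol input */
            depth = 1;
        bits += (double)weight[i] * depth;
    }
    printf("Huffman average: %.2f bits/byte\n", bits / (double)total);
    return 0;
}
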
>
>There are only three types of files I've ever given to compress that
>haven't been reduced in size as a result: random binary data,
>floating point binary data, and files that have already been
>compressed.
>
	The point being that there is little redundancy.
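
	A quick way to see that is to look at the byte histogram. This
	little sketch prints the zero-order entropy of stdin in bits
	per byte; on random, floating point, or already-compressed
	data it comes out very close to 8, i.e. nothing left for a
	byte-oriented coder to remove (link it with -lm).

/* bytentropy.c -- estimate the zero-order entropy of stdin in bits per
 * byte.  This is roughly the best any per-symbol coder can do; LZW can
 * beat it when the file has repeated sequences, and neither helps when
 * the histogram is flat.
 */
#include <stdio.h>
#include <math.h>

int main(void)
{
    long count[256];
    long total = 0;
    double bits = 0.0, p;
    int c, i;

    for (i = 0; i < 256; i++)
        count[i] = 0;
    while ((c = getchar()) != EOF) {
        count[c]++;
        total++;
    }
    if (total == 0)
        return 0;
    for (i = 0; i < 256; i++) {
        if (count[i] == 0)
            continue;
        p = (double)count[i] / (double)total;
        bits -= p * log(p) / log(2.0);  /* -sum p * log2(p) */
    }
    printf("%.2f bits/byte (8.00 means no per-byte redundancy)\n", bits);
    return 0;
}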