Xref: utzoo comp.binaries.ibm.pc.d:142 comp.sources.d:2045
Path: utzoo!mnetor!uunet!husc6!cmcl2!nrl-cmf!mailrus!tut.cis.ohio-state.edu!bloom-beacon!mit-eddie!killer!csccat!loci
From: loci@csccat.UUCP (Chuck Brunow)
Newsgroups: comp.binaries.ibm.pc.d,comp.sources.d
Subject: Re: Standard for file transmission
Message-ID: <563@csccat.UUCP>
Date: 6 May 88 23:43:14 GMT
References: <299@cullsj.UUCP> <2096@epimass.EPI.COM>
Organization: Computer Support Corporation, Dallas, Texas
Lines: 52
Keywords: compression, archive, UUCP

In article <2096@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
>In article <299@cullsj.UUCP> jeff@cullsj.UUCP (Jeffrey C. Fried) writes:
>>
>>  I stand corrected.  Since Lempel-Ziv was DESIGNED for text compression, and
>>the authors do not mention its use for binaries, I never considered using it.
>>I tried it on an executable under UNIX and obtained a good reduction, for
>>reasons which are not apparent.  I'm sure that there are cases where this does

This is actually partially true. The first "compress" to appear on the net
(several years ago) only worked on text files and dumped core on binary
files. The reason you get good compression on binary files is probably that
they haven't been stripped of their relocation info. Strip them first and I
doubt the compression will be as good (otherwise, throw your optimizer into
the bit bucket). Typical compression for large text files is about 67%,
whereas binaries are closer to 20%. (I use 16-bit compress.)

>
>A Unix file is just a stream of bytes, and so is an MS-DOS file
>except that it has extra attributes as well.  Compress replaces byte
>strings with codes whose lengths are between 9 and 16 bits.  It will
>work well on any file in which some byte sequences are more common
>than others.  An executable file consists of instructions, which, for
>almost all processors, are integral numbers of bytes, and some are
>much more common than others.  So compress works fine, and will give
>good compression for just about any executable file.  There are

This is doubtful. There's a good description of the workings of LZW in the
GIF docs (recently posted). The key feature isn't individual byte
frequencies but repeated byte sequences, which should be rare in an
optimized executable (on Unix at least).

>several types of graphics files: bitmaps are HIGHLY compressible;

Only if they have lots of blank space or other repeated sequences.
Otherwise they can be very similar to executables: 10-20%.

>other types of files act like a program for an imaginary computer and
>consist of byte codes, some much more common than others.  These
>compress well also.

You must mean Huffman coding. Those comments are true in that case, but
not for LZW.

>
>There are only three types of files I've ever given to compress that
>haven't been reduced in size as a result: random binary data,
>floating point binary data, and files that have already been
>compressed.
>

The point being that such files contain little redundancy.

>--
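
As an illustration of the "sequences, not bytes" point above, here is a
minimal LZW-style encoder sketch in C. It is NOT the compress(1) source:
real compress packs variable-width 9- to 16-bit codes, uses a hash table,
and resets its string table when it fills. This toy just prints one decimal
code per longest-known sequence, so you can see how few codes a redundant
file turns into. The table size and sequence limit are arbitrary choices
made for the sketch.

/* Toy LZW-style encoder: learns byte sequences and emits one code per
 * longest sequence already in the table.  Illustrative only. */
#include <stdio.h>
#include <string.h>

#define MAXCODES 4096   /* 12-bit table for this sketch */
#define MAXLEN     64   /* longest stored sequence in this sketch */

static unsigned char dict[MAXCODES][MAXLEN];
static int dictlen[MAXCODES];
static int ncodes;

/* return the code for sequence s of length len, or -1 if not in the table */
static int find(const unsigned char *s, int len)
{
    int i;
    for (i = 0; i < ncodes; i++)
        if (dictlen[i] == len && memcmp(dict[i], s, len) == 0)
            return i;
    return -1;
}

/* remember a new sequence if there is room */
static void add(const unsigned char *s, int len)
{
    if (ncodes < MAXCODES && len <= MAXLEN) {
        memcpy(dict[ncodes], s, len);
        dictlen[ncodes] = len;
        ncodes++;
    }
}

int main(void)
{
    unsigned char cur[MAXLEN];
    int curlen = 0, c, codes_out = 0, bytes_in = 0;

    /* codes 0-255 stand for the 256 single bytes */
    for (ncodes = 0; ncodes < 256; ncodes++) {
        dict[ncodes][0] = (unsigned char) ncodes;
        dictlen[ncodes] = 1;
    }

    while ((c = getchar()) != EOF) {
        bytes_in++;
        cur[curlen] = (unsigned char) c;
        if (curlen + 1 < MAXLEN && find(cur, curlen + 1) >= 0) {
            curlen++;                          /* sequence still known: extend it */
        } else {
            printf("%d ", find(cur, curlen));  /* emit one code for the known prefix */
            codes_out++;
            add(cur, curlen + 1);              /* learn the longer sequence */
            cur[0] = (unsigned char) c;        /* restart from the unmatched byte */
            curlen = 1;
        }
    }
    if (curlen > 0) {
        printf("%d\n", find(cur, curlen));
        codes_out++;
    }
    fprintf(stderr, "%d bytes in, %d codes out\n", bytes_in, codes_out);
    return 0;
}

Run it over a large text file and then over a stripped executable and
compare the "bytes in, codes out" summaries on stderr: the fewer codes per
input byte, the more repeated sequences the table found, which is a rough
stand-in for how well compress would do on the same input.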