compression [message #373801] Thu, 20 September 2018 15:52
Originally posted by: Anthony Adverse

Does anyone know, historically, how we ended up with NuFX/SHK's LZW-based compression rather than the Huffman-style compression which is so prevalent on the PeeCee?


A
Re: compression [message #373807 is a reply to message #373801] Thu, 20 September 2018 19:04
Originally posted by: John Brooks

Huffman is bit-based and slow on the 6502 & 65816 compared to byte-based LZ (dictionary) compression algorithms.

Modern LZ4 is a hybrid codec which is byte-based, but packs two 4-bit nibbles per byte to improve efficiency. However, LZ4 is still a dictionary codec and not an entropy codec like Huffman, arithmetic coding, or ANS.
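A minimal sketch of what that byte orientation looks like in practice: decoding one LZ4-style sequence needs only byte loads, nibble masks, and memory copies, with no per-bit shifting. This is illustrative C, not JB's code and not a complete LZ4 decoder (no bounds checks and no handling of the literal-only final sequence).

#include <stddef.h>
#include <stdint.h>

/* A nibble value of 15 means "more length bytes follow". */
static size_t read_len(const uint8_t **src, size_t len)
{
    if (len == 15) {
        uint8_t b;
        do {
            b = *(*src)++;
            len += b;
        } while (b == 255);
    }
    return len;
}

/* Decode one sequence: token, literals, 2-byte offset, match copy. */
size_t lz4_sequence(const uint8_t **src, uint8_t *dst, size_t dpos)
{
    uint8_t token = *(*src)++;

    size_t lit = read_len(src, token >> 4);          /* high nibble: literal count */
    for (size_t i = 0; i < lit; i++)                 /* copy literals byte by byte */
        dst[dpos++] = *(*src)++;

    uint16_t offset = (*src)[0] | ((*src)[1] << 8);  /* little-endian match offset */
    *src += 2;

    size_t mlen = read_len(src, token & 0x0F) + 4;   /* low nibble: match length - 4 */
    for (size_t i = 0; i < mlen; i++)                /* copy match from earlier output */
        dst[dpos] = dst[dpos - offset], dpos++;

    return dpos;
}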

-JB
@JBrooksBSI
Re: compression [message #373879 is a reply to message #373807] Sat, 22 September 2018 01:15
Originally posted by: fadden

On Thursday, September 20, 2018 at 4:04:58 PM UTC-7, John Brooks wrote:
> Huffman is bit-based and slow on the 6502 & 65816 compared to byte-based LZ (dictionary) compression algorithms.

The LZW used in ShrinkIt uses a variable-width code (9-12 bits), so it's doing its share of shifting. The LZSS codec in HardPressed was slower when compressing than ShrinkIt because of the dictionary string matching, but much faster to decompress because it's byte-oriented. The lack of a barrel shift was a constant source of annoyance for me. :-)
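For comparison, here is roughly what pulling one of those variable-width codes out of a packed stream looks like; the multi-bit shift is trivial in C but expensive on a 6502, which shifts one bit per instruction. A hypothetical helper for illustration (LSB-first packing assumed), not the actual ShrinkIt or NufxLib routine.

#include <stdint.h>

typedef struct {
    const uint8_t *buf;    /* packed LZW code stream */
    uint32_t       bitpos; /* absolute bit position  */
} BitReader;

uint16_t next_code(BitReader *br, int width)   /* width = 9..12 bits */
{
    uint32_t byte = br->bitpos >> 3;
    int      skip = br->bitpos & 7;

    /* Gather three bytes, then shift and mask out the code. */
    uint32_t chunk = (uint32_t)br->buf[byte]
                   | (uint32_t)br->buf[byte + 1] << 8
                   | (uint32_t)br->buf[byte + 2] << 16;

    br->bitpos += (uint32_t)width;
    return (uint16_t)((chunk >> skip) & ((1u << width) - 1));
}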

The predecessors to ShrinkIt, notably BLU (Binary II Library Utility) and DDD, used a Huffman scheme. ShrinkIt's LZW compressed things more tightly, which was a significant advantage with the modem speeds of the day (9600bps? 14.4K?)

Perhaps relevant: https://fadden.com/apple2/hdc/index.html
Re: compression [message #373885 is a reply to message #373879] Sat, 22 September 2018 03:24
Originally posted by: Anthony Adverse

I always found it a little odd... Huffman appears to manage higher compression than LZW, although as always it depends on the size and type of file being compressed. But for interest's sake, back in the dark ages I tried feeding .shk files into ARJ on the PC. It could still manage to compress them significantly. Which always made me think that the LZW in use was fairly inefficient. Maybe it's ShrinkIt rather than just LZW per se... I'm not sure.
Re: compression [message #373886 is a reply to message #373885] Sat, 22 September 2018 04:23
Originally posted by: Tom Porter

Are there any binaries that are ready-made and compiled... all I have been finding is source code... the only real binaries I have found are hard-coded hi-res image compressors and decompressors... they are unsuitable for things like Double Lo-Res or general data.
Re: compression [message #373889 is a reply to message #373879] Sat, 22 September 2018 06:17
Originally posted by: John Brooks

On Friday, September 21, 2018 at 10:15:05 PM UTC-7, fadden wrote:
> On Thursday, September 20, 2018 at 4:04:58 PM UTC-7, John Brooks wrote:
>> Huffman is bit-based and slow on the 6502 & 65816 compared to byte-based LZ (dictionary) compression algorithms.
>
> The LZW used in ShrinkIt uses a variable-width code (9-12 bits), so it's doing its share of shifting. The LZSS codec in HardPressed was slower when compressing than ShrinkIt because of the dictionary string matching, but much faster to decompress because it's byte-oriented. The lack of a barrel shift was a constant source of annoyance for me. :-)
>
> The predecessors to ShrinkIt, notably BLU (Binary II Library Utility) and DDD, used a Huffman scheme. ShrinkIt's LZW compressed things more tightly, which was a significant advantage with the modem speeds of the day (9600bps? 14.4K?)
>
> Perhaps relevant: https://fadden.com/apple2/hdc/index.html

Nice writeup of all those compression formats, Andy.

Here are a few other important codecs:

aPLib - a dictionary codec with bit-shifting. Used in qkumba's single-file games.
https://groups.google.com/forum/#!searchin/comp.sys.apple2/aplib%7Csort:date/comp.sys.apple2/h8GcbRm1bic/M5d9NsCW9FkJ
http://pferrie.host22.com/misc/appleii.htm

LZ4 - faster than aPLib but less compression.
https://www.brutaldeluxe.fr/products/crossdevtools/lz4/index.html
https://github.com/lz4/lz4
http://pferrie.host22.com/misc/appleii.htm

ANS/FSE - byte-based entropy coder which largely replaces Huffman & arithmetic coding. Used in zstd & Apple's lzfse:
https://github.com/facebook/zstd
https://github.com/lzfse/lzfse

Other good codecs which are not that practical for AppleII platforms:
Deflate (zip)
LZMA
Brotli
PAQ

-JB
@JBrooksBSI
Re: compression [message #373910 is a reply to message #373885] Sat, 22 September 2018 14:22
Originally posted by: fadden

On Saturday, September 22, 2018 at 12:24:35 AM UTC-7, Anthony Adverse wrote:
> I always found it a little odd... Huffman appears to manage higher compression than LZW although as always it depends on the size and type of file being compressed. But for interests sake back in the dark ages, I tried feeding .shk files into ARJ on the pc. It could still manage to compress them significantly. Which always made me think, that the LZW in use was fairly inefficient. Maybe its shrinkit rather than just LZW per se... I'm not sure..


It depends on what you're compressing. Adaptive Huffman does extremely well on base64-encoded files, but string-matching algorithms will beat it easily on English text. For Apple II disk images -- the thing ShrinkIt was originally designed to operate on -- LZW wins consistently. (CiderPress can create DDD images if you want to experiment.)

ShrinkIt has a bunch of stuff in the headers, and 12-bit LZW restarts its dictionary fairly often. It doesn't really get near the entropy of the source. I tried an experiment with an old spam mailbox file just for fun:

Original: 49012729
SHK: 37585096
SHK+ARJ: 32558597
ARJ: 25596612
bzip2: 23475441
bzip2+ARJ: nope

ncompress, which uses 16-bit LZW, only reduced the file to 37280057. ARJ took that down to 31047576. So I'd blame LZW.

(LZ78 algorithms like LZW will theoretically reach the entropy of the input once you've fed enough input, assuming you have infinite memory and a great deal of patience. LZ77 algorithms like LZH don't reach the same theoretic limit, but get close a whole lot faster.)
Re: compression [message #373911 is a reply to message #373910] Sat, 22 September 2018 14:50
Originally posted by: John Brooks

On Saturday, September 22, 2018 at 11:22:40 AM UTC-7, fadden wrote:
> On Saturday, September 22, 2018 at 12:24:35 AM UTC-7, Anthony Adverse wrote:
>> I always found it a little odd... Huffman appears to manage higher compression than LZW although as always it depends on the size and type of file being compressed. But for interests sake back in the dark ages, I tried feeding .shk files into ARJ on the pc. It could still manage to compress them significantly. Which always made me think, that the LZW in use was fairly inefficient. Maybe its shrinkit rather than just LZW per se... I'm not sure.
>
>
> It depends on what you're compressing. Adaptive Huffman does extremely well on base64-encoded files, but string-matching algorithms will beat it easily on English text. For Apple II disk images -- the thing ShrinkIt was originally designed to operate on -- LZW wins consistently. (CiderPress can create DDD images if you want to experiment.)
>
> ShrinkIt has a bunch of stuff in the headers, and 12-bit LZW restarts its dictionary fairly often. It doesn't really get near the entropy of the source. I tried an experiment with an old spam mailbox file just for fun:
>
> Original: 49012729
> SHK: 37585096
> SHK+ARJ: 32558597
> ARJ: 25596612
> bzip2: 23475441
> bzip2+ARJ: nope
>
> ncompress, which uses 16-bit LZW, only reduced the file to 37280057. ARJ took that down to 31047576. So I'd blame LZW.
>
> (LZ78 algorithms like LZW will theoretically reach the entropy of the input once you've fed enough input, assuming you have infinite memory and a great deal of patience. LZ77 algorithms like LZH don't reach the same theoretic limit, but get close a whole lot faster.)


Good info/comparison of leading codecs:

Pareto frontier:
http://richg42.blogspot.com/2015/11/the-lossless-decompression-pareto.html

Hutter prize:
http://mattmahoney.net/dc/text.html

-JB
@JBrooksBSI
Re: compression [message #373934 is a reply to message #373885] Sun, 23 September 2018 00:58
Originally posted by: kegs

In article <fedec7f2-739b-4454-b149-88ba689df162@googlegroups.com>,
Anthony Adverse <the.ertceps@gmail.com> wrote:
> I always found it a little odd... Huffman appears to manage higher
> compression than LZW although as always it depends on the size and type
> of file being compressed. But for interests sake back in the dark ages,
> I tried feeding .shk files into ARJ on the pc. It could still manage to
> compress them significantly. Which always made me think, that the LZW in
> use was fairly inefficient. Maybe its shrinkit rather than just LZW per
> se... I'm not sure.
>

If ARJ uses Huffman, it's as a minor additional step. I believe ARJ is
in the LZ77 family. This is much more advanced than a practical Apple II
compression program could use. ARJ is not open source, and reverse-
engineering these things is very tedious, so I'm not exactly sure.

The Apple II can only practically run algorithms where the program plus
the data structures it uses is less than 32KB. That's just not that much
memory; PCs (and the IIgs) can run much more complex algorithms. I think
12-bit LZW is a pretty good fit.
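A back-of-envelope sketch of why the 12-bit choice fits that budget; the table layout here is an assumption for illustration, not ShrinkIt's actual data structure.

#include <stdint.h>
#include <stdio.h>

#define LZW_CODES (1 << 12)             /* 4096 possible codes            */

static uint16_t prefix_code[LZW_CODES]; /* 8KB: code of the parent string */
static uint8_t  append_char[LZW_CODES]; /* 4KB: byte appended to it       */

int main(void)
{
    /* ~12KB of tables; with the decoder itself and I/O buffers this stays
     * inside a 32KB footprint.  Jumping to 16-bit codes would need 64K
     * entries (~192KB in this layout), far beyond an 8-bit Apple II. */
    printf("12-bit dictionary tables: %zu bytes\n",
           sizeof(prefix_code) + sizeof(append_char));
    return 0;
}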

The reason LZW compressed data can be compressed even further comes down to the
patterns in the upper bits. The first 256 symbols are encoded as 9-bits, but
most of those extra bits are 0 (assuming not much compression at first).
Looking at the output as bytes, 9 output bytes will contain exactly 8 symbols,
so there's a pattern to be extracted at the byte level. Even if LZW is
compressing the data in an OK way, there are patterns in these upper bits (and
lower bits) of the symbols. The next 512 output symbols are 10 bits wide, and
again, most of those upper bits are not very "random" looking, and now the
pattern repeats every 5 bytes. So an algorithm taking more time and memory
can find this redundant info and squeeze a little more out of it.
My guess is better compression algorithms can make up 10-20% of the difference
when operating on LZW data. So, if LZW compressed 100KB to 50KB, and gzip/etc.
can compress it to 25KB directly, then gzip on LZW should give 10-20% of
25KB of saving off the 50KB, which is a final size of around 40-45KB.
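A tiny demonstration of the byte-level pattern described above: pack 9-bit codes whose high bit is 0 (as most early literal codes are) and the output repeats with a 9-byte period, exactly the kind of redundancy a byte-oriented matcher like ARJ's can pick up. The bit order here is an assumption for illustration, not ShrinkIt's exact packing.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t  out[64] = {0};
    uint32_t bit = 0;

    for (int i = 0; i < 32; i++) {        /* 32 nine-bit codes = 36 bytes */
        uint16_t code = 0x41 + (i % 4);   /* literal codes, high bit always 0 */
        for (int b = 0; b < 9; b++, bit++)
            if (code & (1 << b))
                out[bit >> 3] |= (uint8_t)(1 << (bit & 7));
    }

    /* 8 codes * 9 bits = 72 bits = 9 bytes, so every printed row of 9
     * bytes is identical -- redundancy left on the table by LZW itself. */
    for (int i = 0; i < 36; i++)
        printf("%02x%s", out[i], (i % 9 == 8) ? "\n" : " ");
    return 0;
}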

The compression algorithm in ShrinkIt was designed to meet the following goals:

1. Be able to compress the cracked version of Hitchhiker's Guide to the Galaxy
so the compressed file could fit on a regular ProDOS 5.25" disk.
2. Be fast enough that on compressing, the disk drive would stay on
(so it had to compress at more than 4K bytes/second).
3. Run on any Apple II.

For #2, it didn't meet the goal on a normal Apple II, but does on a IIgs
in Fast (2.8MHz) mode, running the 6502 code.

Straight 12-bit LZW didn't achieve #1, so RLE was used as a pre-pass to get
a little extra. Let's look at a 4KB track which is all 0's. LZW's maximum
compression is roughly n symbols to encode n*(n+1)/2 bytes, so n=90 symbols for
4KB. At 9 bits for each symbol, that's 101 bytes. RLE can compress 4096 bytes
to 16*3=48 bytes. And then LZW on top of RLE gets a little more, about 18
symbols of 9 bits each = 21 bytes total. And, some tracks of HGTTG didn't
compress at all, since that data was already compressed and so not a good fit
for LZW, and actually got longer after LZW. So the algorithm just passes that
data through raw and uncompressed. In the end, HGTTG compressed fits on a
normal ProDOS 5.25" disk, but it needed all of these tricks to get there.
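The 16*3 = 48 arithmetic implies three bytes per run; here is a minimal sketch of such an RLE prepass under that assumption. The 0xDB escape byte and the count-minus-one convention are assumptions for illustration, not necessarily the exact NuFX/ShrinkIt record format.

#include <stddef.h>
#include <stdint.h>

/* Run-length encode one 4096-byte track; returns the output length. */
size_t rle_track(const uint8_t *in, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < 4096; ) {
        uint8_t val = in[i];
        size_t  run = 1;
        while (run < 256 && i + run < 4096 && in[i + run] == val)
            run++;                              /* cap runs at 256 bytes */

        if (run >= 4 || val == 0xDB) {          /* escape, value, count-1 */
            out[o++] = 0xDB;
            out[o++] = val;
            out[o++] = (uint8_t)(run - 1);
            i += run;
        } else {                                /* short run: copy through */
            out[o++] = val;
            i++;
        }
    }
    return o;   /* all-zero track: 16 runs of 256 -> 16 * 3 = 48 bytes */
}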

The previous common compressor used for disks was Dalton's Disk Disintegrator,
which used a clever Huffman trick. This is off the top of my head, and it's
30 years old, so the details are likely slightly wrong. DDD had a fixed
Huffman tree for 32 bytes (or maybe 16), such that it could use 2-8 bits to
encode those. Then all other bytes would take 9 bits each. It had to figure
out which were the most common bytes on a track, and then use its fixed-tree
to encode those to be smaller, and all other bytes get a little longer.
Basically, the most common byte would get the 2-bit encoding, the next most
common byte would get a 3-bit encoding, etc. Huffman's big problem is in
sending the tree to the decompressor, and DDD got it down to just these 32
bytes. But its compression wasn't that great.

Kent
Re: compression [message #373953 is a reply to message #373934] Sun, 23 September 2018 11:49
Originally posted by: fadden

On Saturday, September 22, 2018 at 9:58:45 PM UTC-7, Kent Dickey wrote:
[...]

Kent!

(For those who may have forgotten, Kent Dickey developed the LZW compression code used in ShrinkIt. He also provided the C implementation used in NuLib.)


> The previous common compressor used for disks was Dalton's Disk Disintegrator,
> which used a clever Huffman trick. [...] DDD had a fixed
> Huffman tree for 32 bytes, (or maybe 16), such that it could use 2-8 bits to
> encode those.

Oddly enough, it's 20 bytes. There's a fairly literal translation of the assembly code here:

https://github.com/fadden/ciderpress/blob/master/diskimg/DDD.cpp#L399

> Huffman's big problem is in
> sending the tree to the decompressor, and DDD got it down to just these 32
> bytes. But it's compression wasn't that great.

IIRC, gzip uses a clever scheme where they output the characters in order, followed by the width of the tree at each level. The alternative is adaptive Huffman, but the overhead of maintaining the model slows everything down.
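The core of that gzip scheme is canonical Huffman: the decoder only needs each symbol's code length to rebuild the exact codes. A simplified sketch of that reconstruction (essentially the algorithm in RFC 1951; real gzip also compresses the length list itself):

#include <stdint.h>

#define MAX_BITS 15

/* Given each symbol's code length in len[], fill code[] with canonical codes. */
void canonical_codes(const uint8_t *len, uint16_t *code, int nsyms)
{
    int      bl_count[MAX_BITS + 1]  = {0};
    uint16_t next_code[MAX_BITS + 1] = {0};

    for (int s = 0; s < nsyms; s++)          /* count symbols at each length */
        bl_count[len[s]]++;
    bl_count[0] = 0;

    uint16_t c = 0;                          /* smallest code of each length */
    for (int bits = 1; bits <= MAX_BITS; bits++) {
        c = (uint16_t)((c + bl_count[bits - 1]) << 1);
        next_code[bits] = c;
    }

    for (int s = 0; s < nsyms; s++)          /* assign codes in symbol order */
        if (len[s] != 0)
            code[s] = next_code[len[s]]++;
}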

DDD Deluxe did a slightly better job of compressing things, but I never did dig into that one.
Re: compression [message #373965 is a reply to message #373953] Sun, 23 September 2018 15:28
Michael J. Mahon
fadden <thefadden@gmail.com> wrote:
> On Saturday, September 22, 2018 at 9:58:45 PM UTC-7, Kent Dickey wrote:
> [...]
>
> Kent!
>
> (For those who may have forgotten, Kent Dickey developed the LZW
> compression code used in ShrinkIt. He also provided the C implementation used in NuLib.)

Not to mention KEGS (oops, I mentioned it ;-).

--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com
Re: compression [message #373970 is a reply to message #373965] Sun, 23 September 2018 19:24
David Schmidt
On 9/23/2018 3:28 PM, Michael J. Mahon wrote:
> fadden <thefadden@gmail.com> wrote:
>> On Saturday, September 22, 2018 at 9:58:45 PM UTC-7, Kent Dickey wrote:
>> [...]
>>
>> Kent!
>>
>> (For those who may have forgotten, Kent Dickey developed the LZW
>> compression code used in ShrinkIt. He also provided the C implementation used in NuLib.)
>
> Not to mention KEGS (oops, I mentioned it ;-).

Which forms the foundation and just about all of the guts for pretty
much every GS emulator out there besides Bernie. Thanks again, Kent.
Re: compression [message #373972 is a reply to message #373970] Sun, 23 September 2018 22:26
Originally posted by: kegs

In article <po97ae$9e2$1@dont-email.me>,
David Schmidt <schmidtd@my-deja.com> wrote:
> On 9/23/2018 3:28 PM, Michael J. Mahon wrote:
>> fadden <thefadden@gmail.com> wrote:
>>> On Saturday, September 22, 2018 at 9:58:45 PM UTC-7, Kent Dickey wrote:
>>> [...]
>>>
>>> Kent!
>>>
>>> (For those who may have forgotten, Kent Dickey developed the LZW
>>> compression code used in ShrinkIt. He also provided the C
> implementation used in NuLib.)
>>
>> Not to mention KEGS (oops, I mentioned it ;-).
>
> Which forms the foundation and just about all of the guts for pretty
> much every GS emulator out there besides Bernie. Thanks again, Kent.

I'm looking into fixing up KEGS so it builds on Mac OS X again (that is, not a
Carbon app any more). I work pretty slowly, so don't expect anything soon.
My first step is to delete a lot of useless code--there's a lot in KEGS to make
it fast on 1995-era machines, which is all pointless now, and it complicates
my new simpler interface between *driver.c and video.c. And Linux has gone
and removed /dev/dsp, so I have to figure that out, too.

Kent
Re: compression [message #373981 is a reply to message #373972] Mon, 24 September 2018 09:46
David Schmenk
On Sunday, 23 September 2018 19:29:36 UTC-7, Kent Dickey wrote:
>
> I'm looking into fixing up KEGS so it builds on Mac OS X again (that is, not a
> Carbon app any more). I work pretty slowly, so don't expect anything soon.
> My first step is delete a lot of useless code--there's a lot in KEGS to make
> it fast on 1995 era machines which is all pointless now, and it complicates
> my new simpler interface between *driver.c and video.c. And Linux has gone
> and removed /dev/dsp, so I have to figure that out, too.
>
> Kent

Don't worry, whatever you figure out will get changed in 6 months anyway.

Thanks for all the work you've done for our community!

Dave...
Re: compression [message #374014 is a reply to message #373934] Mon, 24 September 2018 20:50
Originally posted by: kegs

In article <T5mdnaK_L8TigDrGnZ2dnUU7-I3NnZ2d@giganews.com>,
Kent Dickey <kegs@provalid.com> wrote:
> The compression algorithm in ShrinkIt was designed to meet the following goals:
>
> 1. Be able to compress the cracked version of Hitchhiker's Guide to the Galaxy
> so the compressed file could fit on a regular ProDOS 5.25" disk.

I know this totally doesn't matter, but it's bugging me that I said the
wrong game--the game was Bureaucracy.

Kent
Re: compression [message #374015 is a reply to message #374014] Mon, 24 September 2018 21:17
Steve Nickolas
On Mon, 24 Sep 2018, Kent Dickey wrote:

> In article <T5mdnaK_L8TigDrGnZ2dnUU7-I3NnZ2d@giganews.com>,
> Kent Dickey <kegs@provalid.com> wrote:
>> The compression algorithm in ShrinkIt was designed to meet the following goals:
>>
>> 1. Be able to compress the cracked version of Hitchhiker's Guide to the Galaxy
>> so the compressed file could fit on a regular ProDOS 5.25" disk.
>
> I know this totally doesn't matter, but it's bugging me that I said the
> wrong game--the game was Bureaucracy.
>
> Kent

Right author, wrong game, wrong version of the Z-machine... xD

-uso.