Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/5/84; site randvax.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxt!houxm!mtuxo!mtunh!mtung!mtunf!ariel!vax135!cornell!uw-beaver!tektronix!hplabs!sdcrdcf!randvax!jim
From: jim@randvax.UUCP (Jim Gillogly)
Newsgroups: net.crypt
Subject: Re: Encryption using compression
Message-ID: <2586@randvax.UUCP>
Date: Sun, 7-Jul-85 20:36:57 EDT
Article-I.D.: randvax.2586
Posted: Sun Jul  7 20:36:57 1985
Date-Received: Sat, 13-Jul-85 10:14:41 EDT
References: <5992@duke.UUCP>
Distribution: net
Organization: Banzai Institute
Lines: 72

In article <5992@duke.UUCP> bet@ecsvax.UUCP (Bennett E. Todd III) writes:
>                             Wouldn't a simpleminded substitution or
>transposition algorithm be beefed up to the point of requiring search
>of the key space by applying a good compression program to the
>plaintext first?

The algorithm would certainly be strengthened (although probably not to
the point of requiring search of the entire key space).  However, you need
to be a little careful.  Some compression programs (e.g. "pack", a
Huffman-coding program) will take a first pass through the data to assign
the variable-length codes, put out the decoding tree at the beginning of
the output file, then follow with the compressed data.  To attack a simple
substitution of this (or perhaps even the result of XORing a shift-register
or other "random number" stream across it), the cryptanalyst can work out
the layout of the decoding tree from the program itself, and compare that
known structure with the start of the ciphertext.
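
As a rough sketch of why a predictable header hurts, here is a toy version
in Python.  Everything in it is an assumption made for illustration: the
header bytes are invented (they are not pack's real format), and the
keystream is just Python's random module standing in for a shift register.
With a real Huffman packer only the layout of the tree is predictable, so
the analyst would get partial rather than complete known plaintext.

```python
import random

# Hypothetical stand-in for the predictable start of a compressed file
# (magic bytes plus decoding-tree layout). Invented, not pack's real format.
KNOWN_HEADER = b"HUFFTREE\x00\x01\x02\x03"

def keystream(key: int, n: int) -> bytes:
    """Toy key-controlled 'random number' stream (not secure)."""
    rng = random.Random(key)
    return bytes(rng.randrange(256) for _ in range(n))

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: int, compressed: bytes) -> bytes:
    """Run the keystream across the compressed data."""
    return xor_bytes(compressed, keystream(key, len(compressed)))

secret_key = 1985
compressed = KNOWN_HEADER + b"...payload bits..."
ciphertext = encrypt(secret_key, compressed)

# The analyst knows only the header, yet recovers the first
# len(KNOWN_HEADER) bytes of keystream without ever seeing the key.
recovered = xor_bytes(ciphertext[:len(KNOWN_HEADER)], KNOWN_HEADER)
print(recovered == keystream(secret_key, len(KNOWN_HEADER)))
```

Whether those recovered keystream bytes are enough to predict the rest
depends entirely on the generator; for a short linear shift register they
may well be.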

>                If I were to use a simple substitution cypher with an
>arbitrary permutation of bytes on the output of a good compression
>program how could the resulting file be attacked?

I'm not sure what you mean by an "arbitrary" permutation of the bytes.
If you mean a fixed permutation like the initial and final (bitwise)
permutations of the DES, there wouldn't be any additional security in
the permutation, since we assume it's known to the cryptanalyst.  If you
mean picking a permutation based on some key-controlled "random number"
stream, that will certainly add strength.  Let's take the simple sub first.

Simple substitution alone might be crackable given some assumptions about
the underlying text.  For example, if we assume English written in ASCII
we can try a number of decoding trees based on standard English, including
branches for situations where there are close choices.  I believe there are
a lot fewer likely decoding trees than possible keys.  Spaces, for example,
would have a short bit string and would happen frequently, so when we
start getting close on the high frequency letters and digraphs we'd start
getting reasonable-looking output.  There will be more choices for low-freq
letters and digraphs, but they will also show up in the text less often to
mess us up; and when they do, they're likely to have similar-length encoding
strings.
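
To make the point about short codes for frequent symbols concrete, here is
a small Python sketch that builds Huffman code lengths for a scrap of
English.  The sample sentence and the function name huffman_code_lengths
are my own inventions for illustration; a real attack would start from
frequency tables for the kind of text expected.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text: str) -> dict:
    """Return {symbol: code length in bits} for a Huffman code built on text."""
    freq = Counter(text)
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

sample = ("the quick brown fox jumps over the lazy dog and then "
          "the cryptanalyst reads the rest of the message at leisure")
lengths = huffman_code_lengths(sample)

# List the five symbols with the shortest codes.
for sym, bits in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0]))[:5]:
    print(repr(sym), bits)
```

On this sample the space is the most frequent symbol, so no other symbol
gets a shorter code.  Similar English text gives similar trees, which is
why the number of plausible trees is small compared with the key space.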

A transposition of the kind you describe would probably be attackable with
a chosen-plaintext attack, or maybe even known-plaintext.  The chosen-
plaintext approach is the nastiest possible attack for the cryptanalyst to
make, since it assumes that not only does he know the correspondence between
plaintext and ciphertext, but he can also control what plaintext is to
be enciphered.  This is sometimes the case in a database application, for
example:  I send an invoice to somebody who will be putting it into a
database, and I include in my address (perhaps) some information whose
encryption will tell me what I need to know.  In _The Codebreakers_ somewhere
David Kahn tells about a code system that was giving trouble ... the
cryptanalysts produced a memo that included some words that were in doubt,
leaked it to the target agents, and then read the encryption as it got
sent verbatim to home base.  All this is to point out that chosen-plaintext
is not out of the question as an attack.

In any case, if one is trying to figure out how a transposition works and
has the luxury of chosen-plaintext, one can put zeroes everywhere except
in one location, and see where the nonzero element goes; repeat with it in
every possible location and you've unwound the transposition for that
block.  If all blocks use the same transposition, you're done.  If not,
knowing how the transposition is produced may give some insight into the
random number stream, which may be broken with as few numbers as were used
to produce this particular transposition.
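
The single-nonzero probe is easy to sketch in Python.  The block size, the
make_cipher oracle and the way the key picks the transposition are all
assumptions made up for this example; the sketch covers only the easy case
where every block uses the same transposition.

```python
import random

BLOCK = 8  # block size in bytes; an assumption for illustration

def make_cipher(key: int):
    """Hypothetical oracle: one fixed key-derived byte transposition per block."""
    perm = list(range(BLOCK))
    random.Random(key).shuffle(perm)

    def encrypt_block(block: bytes) -> bytes:
        # Output position perm[i] receives input byte i.
        out = bytearray(BLOCK)
        for i, b in enumerate(block):
            out[perm[i]] = b
        return bytes(out)

    return encrypt_block

def recover_transposition(oracle):
    """Probe with all-zero blocks holding a single nonzero byte."""
    mapping = []
    for i in range(BLOCK):
        probe = bytearray(BLOCK)
        probe[i] = 1
        # Wherever the 1 turns up is where input position i is sent.
        mapping.append(oracle(bytes(probe)).index(1))
    return mapping

def invert(mapping, cipher_block: bytes) -> bytes:
    """Undo the transposition using the recovered mapping."""
    return bytes(cipher_block[mapping[i]] for i in range(BLOCK))

oracle = make_cipher(key=2586)
mapping = recover_transposition(oracle)
print(invert(mapping, oracle(b"ATTACK!!")))
```

BLOCK chosen plaintexts are enough here, and the key itself is never
recovered or needed.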

So compression alone won't be enough to turn a weak system into an
"unbreakable" (modulo exhaustive key search) one.  Note that the DES is
not known (by me, anyway) to be subject to any of these attacks (including
the most powerful chosen-plaintext attack).
-- 
	Jim Gillogly
	{decvax, vortex}!randvax!jim
	jim@rand-unix.arpa