Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!dnwiebe
From: dnwiebe@cis.ohio-state.edu (Dan N Wiebe)
Newsgroups: comp.dsp
Subject: Re: sound data compression
Message-ID: <62572@tut.cis.ohio-state.edu>
Date: 25 Sep 89 19:21:32 GMT
Sender: dnwiebe@tut.cis.ohio-state.edu
Lines: 38


	The technique you speak of is properly called "Differential Pulse
Code Modulation" (at least it is in Tanenbaum's networking book).  The problem
is not compounded error, at least not with audio (a signal that varies
between -3V and 3V will sound exactly the same if you give it a positive bias
of one volt so that it varies between -2V and 4V, provided no component limits
are exceeded).  The problem is that when you are dealing with either sharp
spikes or high frequencies, the deltas can't keep up.  As a
simple example, consider the case of a 16-bit environment where you have an
instantaneous square transition from 0 to 65535.  If you're sending, say,
eight-bit deltas, for a 50% compression ratio, you can only go up by at most
255 per sample (127 if the deltas are signed), and it'll take you a few
hundred sample times to climb clear to 65535, rather than just one if you
use 16-bit absolute values.  This is admittedly an extreme
example, and real-world distortion from this effect is much less
severe; enough less, apparently, that it's considered a viable compression
method...
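	To make the slope-overload effect concrete, here is a minimal sketch
(mine, not from any particular codec) of DPCM with signed eight-bit deltas
clamped to -128..127:

```python
# Hypothetical DPCM sketch: 16-bit samples, signed 8-bit deltas.

def dpcm_encode(samples, prev=0):
    deltas = []
    for s in samples:
        d = max(-128, min(127, s - prev))  # clamp the delta to 8 bits
        deltas.append(d)
        prev += d  # track what the decoder will reconstruct
    return deltas

def dpcm_decode(deltas, prev=0):
    out = []
    for d in deltas:
        prev += d
        out.append(prev)
    return out

# A square step from 0 to 65535: the decoder can climb only 127 per
# sample, so it takes over 500 sample times to reach the true value.
decoded = dpcm_decode(dpcm_encode([65535] * 520))
```

With unsigned 8-bit deltas the maximum step is 255 instead, but the
decoder still needs a couple hundred samples to catch up to the edge.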
	It seems to me that a viable alternative might be to have a protocol
where you used a seven-bit delta, with a high bit of zero, in most cases, and
if you needed an absolute number, you could send two bytes, where the high
bit of the first one was 1 and the rest of the bits were the absolute value
shifted right one bit.  Of course, this means an irregular data-transfer
rate, which means buffering, plus some intermediate sample processing
(simple enough, though, to be done quickly in hardware), and maybe that's
too much trouble to go to for a mere 50.5% (or whatever) compression ratio;
I don't know.  Depending on the sample width, the specifics could be changed
to increase the compression ratio.
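	Here is a sketch of that two-level protocol as I read it (the exact
bit layout is my assumption): a byte with high bit 0 carries a seven-bit
signed delta, and a byte with high bit 1 starts a two-byte code whose
remaining fifteen bits are the absolute value shifted right one bit.

```python
# Hypothetical encoding of the 7-bit-delta / 2-byte-absolute scheme.

def encode(samples, prev=0):
    out = bytearray()
    for s in samples:
        d = s - prev
        if -64 <= d <= 63:
            out.append(d & 0x7F)          # 7-bit two's-complement delta
            prev = s
        else:
            v = s >> 1                    # absolute value, LSB dropped
            out.append(0x80 | (v >> 8))   # high bit 1 + top 7 bits
            out.append(v & 0xFF)          # low 8 bits
            prev = v << 1                 # what the decoder will hold
    return bytes(out)

def decode(data, prev=0):
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b & 0x80:                      # two-byte absolute value
            v = ((b & 0x7F) << 8) | data[i + 1]
            prev = v << 1
            i += 2
        else:                             # one-byte delta
            d = b - 128 if b & 0x40 else b  # sign-extend 7 bits
            prev += d
            i += 1
        out.append(prev)
    return out
```

Note that the absolute form loses the bottom bit of the sample, a
half-LSB error that the next delta can make up for.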
	As a matter of fact, this could be viewed as a form of adaptive
compression, and possibly could be extended to more than two levels (say, a
two-bit delta, a six-bit delta, a ten-bit delta, or a fourteen-bit delta,
whichever was the smallest you could get away with for the current delta,
plus a two-bit tag saying which--that would give you a 16-bit sample space with
four-bit, eight-bit, twelve-bit, or sixteen-bit samples, where most would
be four- and eight-bit and only a few would be sixteen-bit, resulting
possibly in an average compression ratio of between 50% and 75%--I don't know
enough to do the statistics).  The actual average compression ratio could be
easily worked out if I had access to a 'typical' piece of digitized sound...
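	The bookkeeping for that adaptive scheme is simple enough to sketch
(my assumptions: a two-bit tag plus a two-, six-, ten-, or fourteen-bit
signed delta, so each coded sample is 4, 8, 12, or 16 bits):

```python
# Hypothetical cost model for the four-level adaptive delta scheme.

WIDTHS = (2, 6, 10, 14)

def smallest_width(delta):
    """Smallest of the four delta widths that can hold a signed delta."""
    for w in WIDTHS:
        if -(1 << (w - 1)) <= delta < (1 << (w - 1)):
            return w
    raise ValueError("delta too large even for 14 bits")

def coded_bits(samples, prev=0):
    """Total bits used: 2-bit tag plus the chosen delta width per sample."""
    total = 0
    for s in samples:
        total += 2 + smallest_width(s - prev)
        prev = s
    return total

# A gentle ramp codes almost entirely in 4-bit samples, versus 16 raw.
ramp = list(range(100))
ratio = coded_bits(ramp) / (16 * len(ramp))
```

One caveat: a full-scale step still overloads even the fourteen-bit
delta, so a true absolute-value escape (as in the two-level scheme
above) would still be wanted; and the real average ratio depends
entirely on the delta statistics of actual digitized sound.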
	I'm probably reinventing the wheel here; has anybody heard any of this
before?