Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!dnwiebe
From: dnwiebe@cis.ohio-state.edu (Dan N Wiebe)
Newsgroups: comp.dsp
Subject: Re: sound data compression
Message-ID: <62572@tut.cis.ohio-state.edu>
Date: 25 Sep 89 19:21:32 GMT
Sender: dnwiebe@tut.cis.ohio-state.edu
Lines: 38

The technique you speak of is properly called "Differential Pulse Code Modulation" (at least it is in Tanenbaum's networking book).  The problem is not compounded error, at least not with audio: a signal that varies between -3V and 3V will sound exactly the same if you give it a positive bias of one volt so that it varies between -2V and 4V, provided no component limits are exceeded.  The problem is that when you are dealing with either sharp spikes or high frequencies, the compression can't keep up.  As a simple example, consider a 16-bit environment with an instantaneous square transition from 0 to 65535.  If you're sending, say, eight-bit deltas, for a 50% compression ratio, you can only go up by at most 255 at a time, and it'll take you about 256 sample times to get clear to 65535, rather than just one if you use 16-bit absolute values.  This is admittedly an extreme example, and real-world distortion from this effect is much less severe; enough less, apparently, that it's considered a viable compression method...

It seems to me that a viable alternative might be a protocol where, in most cases, you used a seven-bit delta with a high bit of zero; when you needed an absolute number, you could send two bytes, where the high bit of the first one was 1 and the remaining fifteen bits were the absolute value shifted right one bit.  Of course, this means an irregular data-transfer rate, which means buffering, and some intermediate sample processing (which, though, is simple enough to be done quickly by hardware), and maybe that's too much trouble to go to for a mere 50.5% (or whatever) compression ratio; I don't know.
Depending on the sample width, the specifics could be changed to increase the compression ratio.  As a matter of fact, this could be viewed as a form of adaptive compression, and it could possibly be extended to more than two levels: say, a two-bit delta, a six-bit delta, a ten-bit delta, or a fourteen-bit delta, whichever was the smallest you could get away with based on the highest frequency component.  With a two-bit code selecting the level, that would give you a 16-bit sample space with four-bit, eight-bit, twelve-bit, or sixteen-bit codewords, where most would be four- and eight-bit and only a few would be sixteen-bit, possibly resulting in an average compression ratio of between 50% and 75% (I don't know enough to do the statistics).  The actual average compression ratio could easily be worked out if I had access to a 'typical' piece of digitized sound...  I'm probably reinventing the wheel here; has anybody heard any of this before?