Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site decwrl.UUCP
Path: utzoo!linus!philabs!cmcl2!seismo!harvard!talcott!panda!genrad!decvax!ucbvax!decwrl!dec-rhea!dec-bergil!lauck
From: lauck@bergil.DEC
Newsgroups: net.audio
Subject: Correct Double Blind Testing
Message-ID: <3521@decwrl.UUCP>
Date: Thu, 8-Aug-85 14:56:26 EDT
Article-I.D.: decwrl.3521
Posted: Thu Aug 8 14:56:26 1985
Date-Received: Mon, 12-Aug-85 03:09:11 EDT
Sender: daemon@decwrl.UUCP
Organization: Digital Equipment Corporation
Lines: 33

<>

When performing double-blind tests of audio components, one critical factor is often overlooked: the SAME MUSIC must be played when testing. The common practice of synchronizing the sources and matching levels is not good enough to evaluate subtle differences.

Consider switching between two notes of a piece. The listener hears one note on the first component and the next note on the second, so the comparison is never like for like. The two notes may be played on different instruments. The two notes may be on the same instrument but at different pitches. Even when the pitch is the same, the attack, amplitude, etc. may be different. What good then is all the fancy level matching to 0.05 dB?

A while back I compared two CD players with one of these synchronized listening tests. It was very frustrating. I kept trying to tell whether the Sony reproduced the violins better than the Nak reproduced the violas. The result was predictable: no statistical significance.

I had previously compared the players by repeated playing of the same musical selections on each (not blind, BUT level matched to 0.05 dB). In these tests my wife and I both preferred the Nak. (I guess we're audio snobs.)

A proper scientific test would have involved double-blind playing of identical material. With the equipment and program material available, this would have meant hours and hours of testing.

Does anyone have any opinions, or better, scientific evidence, on the choice of program material to maximize success (discrimination) of double-blind testing?
For example, I'd like to know what the optimum length of test selections is. Short selections have the obvious advantage that bigger statistical samples are practical. Can they be too short to perceive holistic effects, like the soundstaging of complex orchestral material?

Tony Lauck
...decvax!decwrl!rhea!bergil!lauck
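
To put a rough number on the sample-size point, here is a minimal sketch of how many correct answers a listener would need before a result counts as statistically significant. It is not from the original test. It assumes each trial is a forced choice in which the listener guesses which player is on, so pure guessing is right half the time, and it assumes a conventional 5% significance level. The function names p_value and min_correct are made up for this illustration.

```python
from math import comb


def p_value(correct, trials):
    """One-sided binomial p-value: the chance of getting at least
    `correct` answers right out of `trials` by guessing alone
    (assumed chance level: 0.5 per trial)."""
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways / 2 ** trials


def min_correct(trials, alpha=0.05):
    """Smallest number of correct answers whose p-value is at or
    below `alpha` (assumed threshold), or None if no score is."""
    for k in range(trials + 1):
        if p_value(k, trials) <= alpha:
            return k
    return None


if __name__ == "__main__":
    for n in (10, 16, 20):
        k = min_correct(n)
        print(f"{n} trials: need {k} correct (p = {p_value(k, n):.4f})")
```

Under these assumptions, 10 trials need 9 correct, while 20 trials need 15. The required fraction of correct answers falls as the number of trials grows, which is why short selections that allow many trials make small differences easier to detect, provided the selections are not too short to hear the difference at all.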