Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site decwrl.UUCP
Path: utzoo!linus!philabs!cmcl2!seismo!harvard!talcott!panda!genrad!decvax!ucbvax!decwrl!dec-rhea!dec-bergil!lauck
From: lauck@bergil.DEC
Newsgroups: net.audio
Subject: Correct Double Blind Testing
Message-ID: <3521@decwrl.UUCP>
Date: Thu, 8-Aug-85 14:56:26 EDT
Article-I.D.: decwrl.3521
Posted: Thu Aug  8 14:56:26 1985
Date-Received: Mon, 12-Aug-85 03:09:11 EDT
Sender: daemon@decwrl.UUCP
Organization: Digital Equipment Corporation
Lines: 33


When performing double-blind tests of audio components, one critical factor 
is often overlooked:  the SAME MUSIC must be played when testing.  The
common practice of synchronizing the sources and matching levels is not good 
enough to evaluate subtle differences.  Consider switching between two notes
of a piece.  The two notes may be played on different instruments.  The two
notes may be on the same instrument but at different pitches.  Even when the 
pitch is the same, the attack, amplitude, etc. may be different.  What good 
then is all the fancy level matching to 0.05 dB?

A while back I compared two CD players with one of these synchronized 
listening tests.  It was very frustrating.  I kept trying to tell whether 
the Sony reproduced the violins better than the Nak reproduced the violas.  
The result was predictable: no statistical significance.  I had previously 
compared the players by repeated playing of the same musical selections on 
each (not blind, BUT level matched to 0.05 dB).  In these tests my wife and I 
both preferred the Nak.  (I guess we're audio snobs.)

A proper scientific test would have involved double-blind playing of 
identical material.  With the equipment and program material available, this
would have meant hours and hours of testing.  

Does anyone have any opinions, or better scientific evidence, on the choice 
of program material to maximize success (discrimination) of double-blind 
testing?  For example, I'd like to know what is the optimum length of test 
selections.  Short selections have the obvious advantage that bigger 
statistical samples are practical.  Can they be too short to perceive holistic 
effects, like the soundstaging of complex orchestral material?
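
The trade-off between selection length and sample size can be made concrete.  
Below is a minimal Python sketch, assuming each trial is an independent 
forced choice with a 50% chance of a correct guess, and assuming a 
conventional 0.05 significance threshold.  Neither assumption comes from the 
tests described above, and the function names are only illustrative.

```python
from math import comb


def p_value_at_least(correct, trials, p_guess=0.5):
    """One-sided probability of scoring `correct` or more by pure guessing."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def min_correct_for_significance(trials, alpha=0.05):
    """Smallest score whose guessing probability is at most `alpha`.

    Returns None when even a perfect score cannot reach `alpha`.
    """
    for correct in range(trials + 1):
        if p_value_at_least(correct, trials) <= alpha:
            return correct
    return None


for n in (4, 10, 16):
    print(n, "trials ->", min_correct_for_significance(n), "correct needed")
```

Under these assumptions, 4 trials can never reach significance, 10 trials 
need at least 9 correct, and 16 trials need at least 12.  Shorter selections 
that allow more trials therefore lower the fraction of correct answers needed, 
provided they are still long enough for the difference to be audible.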


                  Tony Lauck
                       ...decvax!decwrl!rhea!bergil!lauck