From: utzoo!decvax!ucbvax!C70:editor-people
Newsgroups: fa.editor-p
Title: Re: Voice driven editing.
Article-I.D.: ucb.1267
Posted: Fri Jun 4 00:33:36 1982
Received: Sat Jun 5 01:11:36 1982

>From gaines@RAND-UNIX Fri Jun 4 00:30:54 1982

Henry,

I've been waiting to see if you would get any response to your request
for information about voice-driven editing. So far, I've seen no
replies, but if you have any that weren't circulated to the list,
please forward them to me. I have been interested in the subject for
some time, but have done nothing other than think some about the
problems. I have not heard of much that is happening, either. My
impression is that the speech recognition people have not yet realized
that this is a prime application area for them, and are not sensitive
to its advantages.

There are important elements present in voice-driven editing that are
not present in other speech recognition situations. The user has a
second input device (the keyboard) available, and it is a feedback
situation. The user can correct the errors of the speech recognizer.

There are two important consequences which make this a good task for
the study of speech recognition. One is that a continuous learning
approach can be taken to speech recognition, since feedback on errors
will always be available. Most speech recognition situations provide
only an initial learning period. The second is that the task can be
divided between the keyboard and the speech recognizer, so that speech
input need not be used for everything until it has advanced far
enough. We might, for example, use voice to control the cursor and for
commands, while continuing to type most words, at least until word
recognition gets much better than it is now.

Another avenue to be explored is stylized speech. The hardest problem
area in speech recognition, as I understand it, is to recognize
continuous speech. If there is even the slightest pause between words,
recognition is much easier.
While in many applications restrictions on the speaker would be
unacceptable, they might be acceptable to many users when entering
text, since there would still be a substantial efficiency gain over
other forms of text entry. Also, we could devise sounds quite
different from English for commands (a la Victor Borge!). The speaker
can become trained, as well as the speech recognizer.

I discussed this recently with Bea Oshika at SDC, who has been active
in speech recognition for many years. She pointed out some cognitive
problems with mixed mode input. People, she claims, don't do well at
talking and carrying out manual tasks at the same time. But I suspect
that training in voice + keyboard input to an editor could produce a
more efficient result for most people. At least it is an interesting
question to investigate.

Stock Gaines