Peter Cochrane's Hard Drive 1999

Speech to text? Words fail me

This column was to be the first dictated by me directly into my laptop computer using a speech-to-text program. It was also going to herald a change in my life, whereby I would no longer spend hours typing; instead, I would simply talk to my computer. Of course, I expected to have to make occasional keyboard corrections and commands; after all, no technology is ever perfect.

The primary reason for my desire to change my operating mode was a series of small accidents that resulted in a minor carpal tunnel injury to my left hand. This necessitated some minor surgery, and a couple of weeks of digital incapacitation.

So, a month before the operation, I looked through catalogues, visited computer stores, watched numerous demonstrations, and read the advertising. Like a dummy, I believed what was on the box and purchased a speech-to-text program. After five hours of continuous training (of the computer, not me), this expensive package was capable of creating complete gibberish. A quick search on the Net found many alternatives, one of which was a tenth of the price. Having purchased this cheaper package, I found it equally qualified in the gibberish department.

Why is it taking speech-to-text such a long time to become a practical reality? And how come some people seem to be able to get these programs to work . . . or do they? With military systems, banking and telephone operations, it is now possible to embark on an adventure of human-machine voice interaction. Even cars and television sets can be reliably controlled and commanded by speech. So why can't my laptop understand what I say?

Well, there seem to be a number of key problems with all generalised speech-to-text technologies. First, the acoustic environment, noise and echoes in the room play a critical part in disrupting performance. It also seems to be vital to get the microphone precisely positioned and the computer set up just so.
It is also imperative to be clear in your diction and dictation, leaving adequate spaces between words, which means adopting the monotonic regularity of a robot. It also pays not to have a cold or to be thirsty. Even worse, the problems associated with our wide vocabulary and use of several words for one meaning or different contexts seem to kill developers' efforts to create a generalised environment.

What chance is there of a breakthrough soon? As far as I can estimate, we will need computers at least an order of magnitude more powerful than those we use today, and software improvements of a similar order, to get products that work. Our technology also needs to be far more intelligent with respect to the context of our verbal inputs. I need to dictate at a conversational speed for speech input to become a practical proposition.

I have seen many demonstrations by people who have clearly trained their machines, and presumably know the idiosyncrasies of their systems well. On public platforms, they realise incredible performance and stand up to all manner of challenges. Unfortunately, for me and Joe Public in the home and office, they seem to be something of a nightmare.

So for the next couple of weeks, I am reduced to typing with one hand. But worse, it looks as though I am condemned to typing for at least another four to five years.

Peter Cochrane holds the Collier Chair for the Public Understanding of Science & Technology at the University of Bristol. His home page is: