Technology Primer
History of Computer Speech Recognition Solutions
The first speech recognizer appeared in 1952 and was, essentially, a device for recognising single spoken digits. Another early device, demonstrated in 1961, was the IBM Shoebox: an IBM machine that could perform simple arithmetic and recognise 16 spoken words, including the digits 0 through 9.
In 1982, Kurzweil Applied Intelligence and Dragon Systems released speech recognition products which, by 1985, had a vocabulary of 1,000 words, provided they were uttered one word at a time. Two years later, in 1987, the lexicon reached 20,000 words, entering the realm of human vocabularies, which range from 10,000 to 150,000 words. But recognition accuracy was poor: only around 10% in 1993, and still below 50% two years later in 1995.
In 1997, Dragon Systems released "Naturally Speaking", which recognised normal, continuous human speech.
IBM was also in the field, with its VoiceType and ViaVoice product range. VoiceType came first and was replaced in 1997, when ViaVoice was introduced to the general public. Two years later, in 1999, IBM released a free-of-charge version of ViaVoice.
IBM eventually exited the market: in 2003 it awarded ScanSoft, which owned the competing Dragon NaturallySpeaking, exclusive global distribution rights to the ViaVoice desktop products for Windows and Mac OS X. ScanSoft subsequently merged with Nuance in May 2005.
Another player in the speech recognition arena was Philips Speech Recognition Systems of Vienna, Austria, with its SpeechMagic solution. SpeechMagic was used mainly in the healthcare and legal sectors, and supported over 30 recognition languages and over 150 industry-specific vocabularies.
On October 1, 2008 Nuance announced that it had acquired Philips Speech Recognition Systems.
Today, at over 99% accuracy, Nuance Dragon NaturallySpeaking can truly claim to be at the pinnacle of continuous speech recognition technology.
Why did it take so long?
For the past thirty years, speech recognition research has been characterized by the steady accumulation of small incremental improvements. There has also been a continual shift of focus to more difficult tasks, driven both by progress in speech recognition performance and by the availability of faster computers.
Commercial and academic research also continue to focus on increasingly difficult problems. One key area is improving the robustness of speech recognition performance: not just robustness against noise, but robustness against any condition that causes a major degradation in performance.
Universal acceptance was also hindered by the fact that speech recognition solutions were wrongly sold as a way to eliminate transcription entirely, rather than as a way to make the transcription process more efficient.
Fundamentally, narrative dictation is highly interpretive and often requires judgment that a human can provide but an automated system, as yet, cannot. People can filter out noise, allowing us to understand each other even in very noisy environments, and we have the benefit of the associated body language and gestures to help us understand what the other person is saying. A computer, by contrast, must rely on noise-cancellation technology for the primary task of filtering out background noise, cannot observe body language or gestures, and does not actually understand what words mean: all it is doing is performing statistical calculations and predictions on sound. When people talk, they often hesitate, mumble, slur their words, or leave words out altogether, yet we are still able to understand each other. We use our experience and common sense to decide, for example, whether someone said "I scream" or "ice cream". Because speech recognition systems don't understand what words mean, they can't use common sense the way people do.
Speech recognition systems generally recognise words by the company they keep: which words tend to appear next to which other words. They learn this from the way you speak to them, so you need to help them understand how you speak and the context of the words you use. They then calculate how frequently words and phrases occur together and can offer you suggestions when they make mistakes.
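The word-frequency idea above can be sketched as a tiny bigram model: count how often each word follows each other word, then suggest the most frequent successor. This is a toy illustration with made-up training text, not Nuance's actual engine, which combines such language-model scores with acoustic scores from the audio itself.

```python
# Toy bigram model: learn which word most often follows another.
from collections import Counter, defaultdict

# Hypothetical sample of a user's dictation style.
training_text = (
    "please send the report today "
    "please send the invoice today "
    "please file the report now"
)

# Count how often each word is followed by each other word.
following = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    following[current][nxt] += 1

def suggest(word):
    """Return the word most frequently seen after `word`, or None."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(suggest("send"))  # "the" is seen twice after "send"
print(suggest("the"))   # "report" (twice) beats "invoice" (once)
```

The more of your dictation such a model sees, the better its statistics match your speaking style, which is why training and correction improve accuracy over time.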
Anyone who used the older speech recognition systems can probably remember the long, long time taken to train and adapt the software, only to have to go through the same exhaustive process whenever your location changed or the background noise changed. Good news: you can now train Dragon to adapt to your particular way of speaking in as little as 4 minutes!
A new, exciting key area of research is focused on an opportunity rather than a problem. This research attempts to take advantage of the fact that in many applications there is a large quantity of speech data available, up to millions of hours. It is too expensive to have humans transcribe such large quantities of speech, so the research focus is on developing new methods of machine learning that can effectively utilize large quantities of unlabeled data.
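One simple way to use unlabeled data, sketched below, is self-training: fit a model on the few labelled examples, let it pseudo-label the unlabeled points it is confident about, then refit. This is a minimal illustration with a toy nearest-centroid classifier on invented one-dimensional data; it is an assumption-laden sketch of the general idea, not the specific methods used in speech research.

```python
# Self-training sketch: a nearest-centroid classifier on 1-D points.
labelled = {0.0: "a", 1.0: "a", 9.0: "b", 10.0: "b"}  # toy labelled data
unlabeled = [0.5, 1.5, 8.5, 9.5, 5.0]                 # toy unlabeled data

def centroids(data):
    """Mean position of each class in the labelled set."""
    sums, counts = {}, {}
    for x, y in data.items():
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

cents = centroids(labelled)
for x in unlabeled:
    dists = {y: abs(x - c) for y, c in cents.items()}
    best = min(dists, key=dists.get)
    margin = min(d for y, d in dists.items() if y != best) - dists[best]
    if margin > 2.0:          # only trust confident predictions
        labelled[x] = best    # fold pseudo-labelled point back in
cents = centroids(labelled)   # refit on the enlarged "labelled" set
print(cents)
```

The ambiguous point (5.0, equidistant from both classes) is left out, which is the key trick: cheap unlabeled data improves the model without a human ever transcribing it.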
Digital Dictation Systems
The actual voice data can be delivered to Dragon NaturallySpeaking in two ways:
- Dictate directly into the computer via a microphone or headset, and interact with Dragon in real time as it converts voice to text on your screen and executes voice computer-control commands, at your discretion.
- Pre-record the dictation onto a digital recorder for later processing.
Digital voice recorders are in a totally different league to traditional tape dictation machines. There are no re-used tapes to degrade the original crisp, clear quality of the sound, so listening to digital dictation is as near to listening directly to the person speaking as is possible. This is why digital dictation and speech recognition go hand in hand. Also:
- The user can instantly rewind or fast forward to any point within the dictation file to review or edit.
- The random access ability of digital audio allows a user to insert audio at any point without overwriting the audio that follows.
- The file you dictate can then be transferred instantly over the LAN to your secretary, or by email to anybody, anywhere, with a computer. No need for swapping or wiping tapes. You can even dictate while out of the office and send the work back for transcription instantly.
- From your computer, you can track transcription progress and re-order / re-prioritise jobs as required.
- Work can be prioritised electronically, so that urgent matters jump the queue or transfer to another secretary.
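The random-access insertion point above is worth a moment's illustration. With digital audio stored as a sequence of samples, new dictation can be spliced in at any cursor position while everything after it is preserved, whereas recording mid-tape erases what follows. The sample values below are purely hypothetical.

```python
# Splicing new audio into a digital recording without overwriting.
original = [10, 20, 30, 40, 50]  # existing dictation (toy sample values)
insert_at = 2                    # cursor position chosen by the user
new_audio = [99, 98]             # freshly dictated samples

# Everything after the cursor is kept, just shifted later in time.
spliced = original[:insert_at] + new_audio + original[insert_at:]
print(spliced)  # [10, 20, 99, 98, 30, 40, 50]
```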
We provide full training to help you operate the devices effectively, along with additional Olympus accessories such as transcription kits that streamline your transcription workflow and reduce document production times.
Latest News
Statistics from Medical Practices and their Managers show that Nuance Voice Recognition delivers results!
Central user administration for Dragon Group products
Next-generation speech engine with Deep Learning: faster and more accurate than ever.
Software Support for DS and AS-4000 ceasing on 30th April 2012
Improved accuracy and faster performance with an intuitive new interface: Naturally Speaking 11 has arrived!
Latest version of the medical desktop, real-time speech recognition software was released today.