SPEECH RECOGNITION by Jaime Fatás Cabeza

Speech recognition (SR) is a computer application that allows a computer to recognize sound waves and synthesize them as an audio stream, transcribe them as written text, or process them as recognizable commands, thus allowing control of the computer by voice. Usually, users dictate to a computer through a microphone. The application then samples and digitizes the sound waves created by the voice. It examines phonemes in the appropriate language in the context of the phonemes around them, assesses the possibilities through a complex statistical model, and compares the outcome to a vast library of known words, phrases, and sentences. The software is adaptive: it records the user’s speech characteristics and then factors them into the speech recognition algorithms, allowing the application to improve future performance. This minimizes the need for user intervention and training and allows automatic incorporation of corrections and editing. SR software can be used in almost all areas of computer use, including capturing thoughts and ideas, creating and editing documents, launching and navigating applications and files, navigating the internet, and playing games.
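The steps described above (digitizing the sound, then weighing candidate words in context) can be illustrated with a toy sketch. It is not how any commercial product is implemented: the sample rate, bit depth, phoneme symbols, lexicon, and probabilities below are all assumptions made up for illustration, and real systems use far larger statistical models.

```python
import math

def sample_and_quantize(freq_hz, duration_s, sample_rate=16000, bits=16):
    """Sample a pure tone and round each sample to a signed integer level.

    The 16 kHz rate and 16-bit depth are illustrative assumptions.
    """
    max_level = 2 ** (bits - 1) - 1
    n_samples = round(duration_s * sample_rate)
    return [
        round(max_level * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]

# Hypothetical lexicon: one phoneme sequence can match several words.
LEXICON = {
    ("T", "UW"): ["two", "too", "to"],
    ("R", "AY", "T"): ["write", "right"],
}

# Hypothetical probabilities of a word given the previous word.
BIGRAMS = {
    ("want", "to"): 0.80, ("want", "two"): 0.15, ("want", "too"): 0.05,
    ("to", "write"): 0.70, ("to", "right"): 0.30,
}

def pick_word(phonemes, previous_word):
    """Return the candidate word that is most probable after previous_word."""
    candidates = LEXICON.get(tuple(phonemes), [])
    if not candidates:
        return None
    return max(candidates, key=lambda w: BIGRAMS.get((previous_word, w), 0.0))

def transcribe(phoneme_groups, start_word):
    """Greedily choose one word per phoneme group, left to right."""
    words = []
    previous = start_word
    for group in phoneme_groups:
        word = pick_word(group, previous) or "<unknown>"
        words.append(word)
        previous = word
    return words

if __name__ == "__main__":
    samples = sample_and_quantize(440, 0.01)
    print(len(samples), "samples")
    print(transcribe([["T", "UW"], ["R", "AY", "T"]], start_word="want"))
```

The point of the sketch is the role of context: the sounds for "to", "two", and "too" are identical, so the choice among them depends on the surrounding words rather than on the audio alone.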

SR has come of age. It has been integrated in HTML5, featured in speech-to-text and text-to-speech features in mobile platforms like Android and iPhone, and released commercially in applications like Dragon NaturallySpeaking. It is a reliable, commonplace application that is relatively easy to use. SR tools can significantly boost productivity and can be easily integrated with computer-assisted translation software (like SDL TRADOS) and transcription equipment. SR is an indispensable tool for people with hand or visual disabilities. The application can be customized and trained to recognize almost any command, term, or acronym. Some companies offer SR programs in several languages paired with English, as well as professional editions for the medical and legal fields that include the related terminology. These programs also include automatic formatting functions for standard environments. Some come with their own editors or offer macros to develop customized functionality in non-standard environments.

Technical requirements and work environment

SR applications are technically demanding. Since the application needs to “hear” the speaker with clarity, a quiet environment and a high-quality microphone work best. SR applications used for dictation rely on continuous speech. They work best when the user speaks with clear enunciation and in long, uninterrupted segments, and they require an initial training period in which the user gives the program a sample of his or her speech (this is sometimes dubbed “voice recognition”). The computer must be equipped with a sound card able to process seamlessly the acoustic input provided by the microphone, with powerful microprocessors that can handle the massive statistical and database processing required, and with substantial hard disk storage.

Uses for interpreters and translators

SR allows translations to be dictated instead of typed into a word processing application. This can mean a substantial increase in processing speed and output volume, and therefore higher revenue. Transcripts can be prepared by shadowing the audio and dictating it to the computer, or by playing the recording stored in a digital dictation machine through a transcription device that integrates voice recognition and converts the audio data into text. SR can be used with phone and videoconferencing systems, for internet and application navigation, and for general document preparation, including database creation, glossary compilation, invoicing, and correspondence. It can also be very useful as a study tool for improving pronunciation, practicing shadowing, and preparing materials for study, teaching, and research.

Shortcomings

Commercial SR applications allow users to work in only one language at a time. Switching back and forth between languages requires unloading one language and loading another, which is time-consuming and impractical. Nevertheless, there is already experimental software that can provide instantaneous two-way spoken communication by integrating speech-to-speech and machine translation technology, which will undoubtedly soon be incorporated into commercial applications. SR is also taxing on the voice, a matter of concern for interpreters. Finally, to determine whether SR can be used efficiently for a given project, it is crucial to assess the interaction, compatibility, and formatting possibilities of the tools and applications to be used. Sometimes it may be better to type the text out.

Conclusion

SR is already an indispensable tool for sophisticated legal interpreters and translators who want to boost productivity and incorporate state-of-the-art technology into their practice. With rapid advancement in their ability to interpret and translate language, virtual interlingual assistants will soon become standard features in the realm of legal interpretation and translation. The field already boasts impressive advances in the recognition of variation in accent and speech patterns, as well as in the unlimited expansion of commands and customization options; multilingual automatic document recognition, classification, analysis, and translation; handwriting recognition and translation; real-time speech-to-speech translation systems; and lip-reading by computers. SR is a truly revolutionary approach to communication, a superb tool with an unlimited number of possibilities that can greatly expand and improve professional performance, provided that it is programmed, integrated, and used efficiently. It is a new frontier in intra- and interlingual communication that someday may allow us to grasp the meaning behind the words, perhaps the most direct line to true artificial intelligence. It would be in your interest to invest in and get acquainted with these new technologies.

BIO> Jaime Fatás Cabeza (Profesor Superior de Música, MMA) is Assistant Professor of the Practice of Translation and Interpretation at the University of Arizona in Tucson and director of the Undergraduate Program in Translation and Interpretation in the Department of Spanish and Portuguese and the Department of MAS. He is a faculty member at the UA National Center for Interpretation, Testing, Research, and Policy. Jaime is accredited as a Federal Court Certified judicial interpreter and translator, as a medical and conference interpreter, and as a translator (Eng. to Spa.) by the American Translators Association.