Speech synthesizers

From HCE Wiki - The Human Cognitive Enhancement Wiki
Revision as of 14:49, 18 February 2016 by Haustein (Talk | contribs) (added pictures)

Speech synthesis is the generation of artificial speech by mechanical means or by a computer algorithm. It is used wherever information that is usually available as text has to be communicated acoustically. Today it is found in text-to-speech applications (screen readers, assistance for the visually impaired) and in virtual assistants (GPS navigation, mobile assistants such as Microsoft Cortana or Apple Siri). It is often paired with voice recognition.

These applications require speech that is intelligible and natural-sounding. Today's speech synthesis systems achieve a great deal of naturalness compared to a real human voice, yet they are still perceived as non-human because minor audible glitches remain in the output utterances. It may well be that modern speech synthesis has reached the point in the uncanny valley where artificial speech is so close to human speech that its remaining imperfections make listeners find it unnatural.

Historical overview

Schematics of von Kempelen's pneumatic speech synthesizer

The history of speech synthesis dates back to the 18th century, when the Hungarian civil servant and inventor Wolfgang von Kempelen built a machine from pipes, elbows, and assorted parts of musical instruments. With the third iteration he achieved a sufficient imitation of the human vocal tract. In 1791 he published a comprehensive description of the design in his book The Mechanism of Human Speech, with a Description of a Speaking Machine.

Voder developed by Bell Laboratories

The design was picked up by Sir Charles Wheatstone, a Victorian-era English inventor, who improved on Kempelen's design. The renewed interest in phonetics research and Wheatstone's work inspired Alexander Graham Bell to do his own research into the matter and eventually arrive at the idea of the telephone. In the 1930s, Bell Laboratories created the Voice Operating Demonstrator, or Voder for short, a human speech synthesizer. The Voder was a model of the human vocal tract, basically an early version of a formant synthesizer, and required a trained operator who had to create utterances manually using a console with fifteen touch-sensitive keys and a pedal.

Digital speech synthesizers began to emerge in the 1980s, following DECtalk, a text-to-speech synthesizer that grew out of research at MIT. This synthesizer was notably used by the physicist Stephen Hawking. At the time of this revision (2016), his website stated that he still used it.

In the early days of modern speech synthesis, two main approaches appeared: articulatory synthesis, which tries to model the entire human vocal tract, and formant synthesis, which aims to create the sounds that speech is made of from scratch. Both methods, however, are gradually being replaced by concatenative synthesis. This form of synthesis uses a large set (a speech corpus) of high-quality pre-recorded audio samples, which can be assembled to form a new utterance.
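
The idea of assembling an utterance from pre-recorded samples can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular synthesizer works: the corpus here is hypothetical and holds plain sine tones as stand-ins for recorded speech units, and the unit labels, the sample rate, and the names `tone`, `CORPUS`, and `concatenate` are all assumptions made for the example. Real systems select among many candidate recordings per unit and smooth the joints far more carefully.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def tone(freq_hz, duration_ms):
    """Stand-in for a recorded unit: a sine tone as a list of floats."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical "speech corpus": unit label -> pre-recorded samples.
CORPUS = {
    "HH": tone(200, 50),
    "EH": tone(300, 100),
    "L": tone(250, 50),
    "OW": tone(350, 100),
}


def concatenate(units, corpus, overlap=40):
    """Join corpus units in order, cross-fading `overlap` samples per joint."""
    out = []
    for label in units:
        unit = corpus[label]  # raises KeyError if the corpus lacks the unit
        if not out:
            out.extend(unit)
            continue
        k = min(overlap, len(out), len(unit))
        start = len(out) - k
        # Linear cross-fade: the tail of `out` fades out as `unit` fades in.
        for i in range(k):
            w = (i + 1) / (k + 1)
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out


utterance = concatenate(["HH", "EH", "L", "OW"], CORPUS)
```

The cross-fade is there because simply butting two recordings together tends to produce an audible click at the joint, which is one source of the glitches that make synthetic speech sound non-human.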

Main characteristics

Speech synthesizers create synthetic speech in various ways. Some mimic how the human vocal tract works and how air passes through it, some generate and shape sounds to create the basic building blocks of speech, and some assemble new utterances from a large database of pre-recorded audio samples.
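
As a rough illustration of creating a speech sound from scratch, the sketch below passes a pulse train (a crude stand-in for the vibrating vocal folds) through a chain of second-order resonators, one per formant. It is a simplified example under stated assumptions, not the design of any synthesizer mentioned in this article: the formant frequencies and bandwidths are approximate textbook-style values for an "ah"-like vowel, and the names `resonator`, `pulse_train`, and `synthesize_vowel` are made up for the example.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)


def resonator(signal, freq_hz, bandwidth_hz):
    """Second-order digital resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / SAMPLE_RATE
    c = -math.exp(-2 * math.pi * bandwidth_hz * t)
    b = 2 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1 - b - c  # gives the filter unit gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def pulse_train(pitch_hz, duration_ms):
    """Voicing source: one unit impulse per pitch period, zeros elsewhere."""
    n = SAMPLE_RATE * duration_ms // 1000
    period = SAMPLE_RATE // pitch_hz
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(formants, pitch_hz=100, duration_ms=300):
    """Filter the source through one resonator per (frequency, bandwidth) pair."""
    signal = pulse_train(pitch_hz, duration_ms)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]  # normalize to the range [-1, 1]


# Approximate, illustrative formant values (Hz) for an "ah"-like vowel.
vowel = synthesize_vowel([(730, 90), (1090, 110), (2440, 170)])
```

Changing the formant list changes the vowel quality while the pitch stays the same, which is the property that lets this kind of synthesizer build different speech sounds from one sound source.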

Purpose

The purpose of speech synthesis is to model, research, and create synthetic speech for applications where communicating information via text is undesirable or cumbersome. It is used to give a voice to virtual assistants and in text-to-speech, most notably as an aid for visually impaired people and for those who have lost their own voice.

Company & People

Important Dates

Enhancement/Therapy/Treatment

Ethical & Health Issues

Public & Media Impact and Presentation

Public Policy

Related Technologies, Projects or Scientific Research

References