Difference between revisions of "Speech synthesizers"

From HCE Wiki - The Human Cognitive Enhancement Wiki

Revision as of 13:18, 18 February 2016

Speech synthesis is the generation of artificial speech by mechanical means or by a computer algorithm. It is used whenever information has to be communicated acoustically. Today it is found in text-to-speech applications (screen reading, assistance for the visually impaired), in virtual assistants (GPS navigation, mobile assistants such as Microsoft Cortana or Apple Siri), and in any other situation where information that is usually available as text has to be conveyed by sound. It is often paired with speech recognition.

These applications require speech that is intelligible and natural-sounding. Today's speech synthesis systems achieve a great deal of naturalness compared to a real human voice. Yet they are still perceived as non-human because minor audible glitches remain in the synthesized utterances. It may well be that modern speech synthesis has reached the point in the Uncanny Valley where artificial speech is so close to perfect that humans find it unnatural.

Historical overview

The history of speech synthesis dates back to the 18th century, when the Hungarian civil servant and inventor Wolfgang von Kempelen built a machine from pipes, elbows and assorted parts of musical instruments. With the third iteration he achieved a sufficient imitation of the human vocal tract. In 1791 he published a comprehensive description of the design in his book The Mechanism of Human Speech, with a Description of a Speaking Machine.

The design was picked up by Sir Charles Wheatstone, a Victorian-era English inventor, who improved on Kempelen's design. The newly sparked interest in phonetics research, together with Wheatstone's work, inspired Alexander Graham Bell to conduct his own research into the matter and eventually arrive at the idea of the telephone. In the 1930s Bell Laboratories created the Voice Operating Demonstrator, or Voder for short, a human speech synthesizer. The Voder was a model of the human vocal tract, essentially an early version of a formant synthesizer, and required a trained operator who manually created utterances using a console with fifteen touch-sensitive keys and a pedal.

In the early days of modern speech synthesis, two main approaches appeared: articulatory synthesis, which tries to model the entire human vocal tract, and formant synthesis, which aims to create the sounds that speech is made of from scratch. However, both methods are gradually being replaced by concatenation synthesis. This form of synthesis uses a speech corpus, a large set of high-quality pre-recorded audio samples. These samples can be assembled to form a new utterance.
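
The idea of assembling an utterance from a corpus can be illustrated with a minimal Python sketch. It is not taken from any real synthesizer: the corpus, the unit labels, the sample rate and the function names are all hypothetical, and short sine tones stand in for recorded speech samples. The sketch joins the units in order and applies a brief linear crossfade at each seam, which is one simple way to soften the audible glitches that occur where samples meet.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for this sketch)


def tone(freq_hz, duration_s):
    """Stand-in for a recorded unit: a sine tone as a list of floats."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical corpus: unit label -> pre-recorded samples.
CORPUS = {
    "HH": tone(220.0, 0.05),
    "EH": tone(330.0, 0.05),
    "L": tone(440.0, 0.05),
    "OW": tone(550.0, 0.05),
}


def concatenate(units, corpus, overlap=80):
    """Join corpus units in order, crossfading `overlap` samples at each seam."""
    output = []
    for label in units:
        unit = corpus[label]  # raises KeyError if the corpus lacks the unit
        if not output:
            output.extend(unit)
            continue
        k = min(overlap, len(output), len(unit))
        for i in range(k):
            w = (i + 1) / (k + 1)  # fade-in weight for the incoming unit
            j = len(output) - k + i
            output[j] = (1 - w) * output[j] + w * unit[i]
        output.extend(unit[k:])
    return output


# Assemble a new "utterance" from four corpus units.
utterance = concatenate(["HH", "EH", "L", "OW"], CORPUS)
```

A real concatenative system would additionally have to choose, among many recorded candidates for each unit, the ones that fit their neighbours best; the sketch skips that selection step and keeps exactly one sample per label.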

Main characteristics

Purpose

Company & People

Important Dates

Enhancement/Therapy/Treatment

Ethical & Health Issues

Public & Media Impact and Presentation

Public Policy

Related Technologies, Projects or Scientific Research

References