Speech-to-speech translation using deep learning
Date
2017-03-17
Abstract
Current state-of-the-art speech-to-speech translation systems rely heavily on a
text representation for the translation. By transcoding speech to text, we lose important
information about the characteristics of the voice, such as emotion, pitch,
and accent. This thesis examines the possibility of using an LSTM neural network
model to translate speech to speech without the need for a text representation, that
is, by translating the raw audio data directly in order to preserve the characteristics
of the voice that are otherwise lost in the text transcoding step of the
translation process. As part of this research, we create a data set of phrases suitable
for speech-to-speech translation tasks. The thesis results in a proof-of-concept system
whose underlying deep neural network needs to be scaled up in order to work better.
Keywords
Neural Networks, Deep Learning, LSTM, RNN, Speech-to-speech translation