Speech-to-speech translation using deep learning
Abstract
Current state-of-the-art speech-to-speech translation systems rely heavily on an
intermediate text representation. Transcoding speech to text discards important
information about the characteristics of the voice, such as emotion, pitch,
and accent. This thesis examines the possibility of using an LSTM neural network
model to translate speech to speech without a text representation, that is, by
translating the raw audio data directly in order to preserve the voice characteristics
that are otherwise lost in the text transcoding step of the translation process.
As part of this research, we create a data set of phrases suitable for
speech-to-speech translation tasks. The thesis results in a proof-of-concept system
whose underlying deep neural network needs to be scaled up in order to perform better.
Examination level
Student essay
Date
2017-03-17
Author
Bredmar, Fredrik
Keywords
Neural Networks
Deep Learning
LSTM
RNN
Speech-to-speech translation
Language
eng