dc.contributor.author | Bredmar, Fredrik | |
dc.date.accessioned | 2017-03-17T13:06:45Z | |
dc.date.available | 2017-03-17T13:06:45Z | |
dc.date.issued | 2017-03-17 | |
dc.identifier.uri | http://hdl.handle.net/2077/51978 | |
dc.description.abstract | Current state-of-the-art speech-to-speech translation systems rely heavily on a
text representation for the translation. By transcoding speech to text, we lose important
information about the characteristics of the voice, such as emotion, pitch
and accent. This thesis examines the possibility of using an LSTM neural network
model to translate speech to speech without the need for a text representation,
that is, by translating the raw audio data directly in order to preserve the characteristics
of the voice that are otherwise lost in the text-transcoding part of the
translation process. As part of this research, we create a data set of phrases suitable
for speech-to-speech translation tasks. The thesis results in a proof-of-concept system
in which the underlying deep neural network needs to be scaled up in order to perform better. | sv |
dc.language.iso | eng | sv |
dc.subject | Neural Networks | sv |
dc.subject | Deep Learning | sv |
dc.subject | LSTM | sv |
dc.subject | RNN | sv |
dc.subject | Speech-to-speech translation | sv |
dc.title | Speech-to-speech translation using deep learning | sv |
dc.type | text | |
dc.setspec.uppsok | Technology | |
dc.type.uppsok | H2 | |
dc.contributor.department | Göteborgs universitet/Institutionen för data- och informationsteknik | swe |
dc.contributor.department | University of Gothenburg/Department of Computer Science and Engineering | eng |
dc.type.degree | Student essay | |