LSTM Revolutionizes Speech and Language Processing
Language of the talk title:
English
Original abstract:
Recently, Long Short-Term Memory (LSTM) recurrent neural networks have emerged as the best-performing techniques in speech and language processing. LSTM has facilitated recent benchmark records in language identification (Google), speech recognition (Baidu), text-to-speech synthesis (Microsoft), translation (Google), sentence summarization (Facebook), image caption generation (Google), and video-to-text description. LSTM has been used in Google's Android speech recognizer since 2012, in Google Voice transcription since 2015, and in Google's Allo since 2016. On 13 June 2016, Apple announced that LSTM is used to improve the QuickType function in iOS 10.
LSTM, by its constant error flow, avoids vanishing gradients and hence facilitates uniform credit assignment, i.e. all input signals obtain a similar error signal regardless of their position in the sequence. Uniform credit assignment has enabled LSTM networks to excel in speech and language tasks: when a sentence is analyzed, the first word can be as important as the last word. I will describe successful deep network architectures that also allow uniform credit assignment, such as ResNets and Highway Networks, and their essential properties. Finally, I will describe projects in which we use LSTM to analyze patents and fashion blogs and to enable attention for self-driving cars.
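
The contrast between vanishing gradients and constant error flow can be illustrated with a small numerical sketch. It is not taken from the talk: it assumes a scalar tanh recurrence for the plain RNN and, for the LSTM, only the memory cell with a self-connection of weight 1 and gates held fully open (no forget gate). All function names and the values of the weight and inputs are hypothetical choices made for illustration.

```python
import math


def rnn_local_derivatives(w, inputs):
    """Per-step derivatives d h_t / d h_{t-1} of a scalar tanh RNN
    h_t = tanh(w * h_{t-1} + x_t). Illustrative model, not a full RNN."""
    h = 0.0
    derivs = []
    for x in inputs:
        h = math.tanh(w * h + x)
        derivs.append(w * (1.0 - h * h))
    return derivs


def cec_local_derivatives(inputs):
    """Per-step derivatives d c_t / d c_{t-1} of a simplified memory cell
    c_t = c_{t-1} + x_t (assumption: gates fully open, no forget gate)."""
    return [1.0 for _ in inputs]


def error_reaching_each_step(local_derivs):
    """Magnitude of d state_T / d state_t for every step t, obtained by
    multiplying the local derivatives of all later steps."""
    factors = []
    g = 1.0
    for d in reversed(local_derivs):
        factors.append(abs(g))
        g *= d
    factors.reverse()
    return factors


inputs = [0.1] * 20  # hypothetical 20-step input sequence
rnn = error_reaching_each_step(rnn_local_derivatives(0.5, inputs))
cec = error_reaching_each_step(cec_local_derivatives(inputs))
print("tanh RNN, first vs last step:", rnn[0], rnn[-1])
print("memory cell, first vs last step:", cec[0], cec[-1])
```

In this toy setting the error reaching the first step of the tanh recurrence is several orders of magnitude smaller than the error at the last step, while the simplified memory cell passes the same error to every step, which is the sense of "uniform credit assignment" used above.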