xLSTM: An Architecture Much Faster Than Transformers
Language of the talk title:
English
Original abstract:
The currently most successful deep learning architecture is the Transformer. However, its computational complexity scales quadratically with the sequence length, its memory requirements grow linearly with the sequence length, and it is based only on pairwise associations of sequence elements. In contrast, recurrent neural networks such as LSTMs have a computational complexity that is linear in the sequence length, a memory of constant size, and they associate a sequence element with a representation of all previous sequence elements. We asked ourselves: how far can we get in language modeling if we scale LSTMs to billions of parameters, using the latest techniques of Transformer architectures while mitigating the known limitations of LSTMs? This question is answered by xLSTM, which extends the LSTM with exponential gating, a matrix memory with a covariance update rule, and full parallelizability like Transformers. xLSTM compares favorably to Transformers and state-space models in terms of both performance and scaling laws. Most importantly, our Triton kernels make xLSTM faster than FlashAttention and Mamba, in both training and inference. Due to its speed and its low, constant memory footprint, xLSTM is well suited for embedded and edge applications.
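To make the described components concrete, here is a minimal NumPy sketch of a matrix memory with a covariance (outer-product) update and stabilized exponential gating, in the spirit of the abstract. The function name, variable names, and the log-space stabilizer are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical sketch: exponentially gated matrix memory with a covariance
# (outer-product) update rule; constant-size state independent of sequence length.
import numpy as np

d = 8  # head dimension (illustrative)
rng = np.random.default_rng(0)

# Constant-size recurrent state: matrix memory C, normalizer n, gate stabilizer m
C = np.zeros((d, d))
n = np.zeros(d)
m = -np.inf

def mlstm_step(C, n, m, q, k, v, i_pre, f_pre):
    """One recurrent step: exponentially gated covariance update and readout (assumed form)."""
    # Stabilized exponential gating (log-space trick to avoid overflow)
    m_new = max(f_pre + m, i_pre)
    i_gate = np.exp(i_pre - m_new)        # input gate
    f_gate = np.exp(f_pre + m - m_new)    # forget gate
    # Covariance update: store value v keyed by k as an outer product
    C_new = f_gate * C + i_gate * np.outer(v, k)
    n_new = f_gate * n + i_gate * k
    # Query the matrix memory; normalize by the aggregated keys
    h = C_new @ q / max(abs(n_new @ q), 1.0)
    return C_new, n_new, m_new, h

for t in range(5):
    q, k, v = rng.standard_normal((3, d))
    i_pre, f_pre = rng.standard_normal(2)
    C, n, m, h = mlstm_step(C, n, m, q, k, v, i_pre, f_pre)
print(h.shape)  # (8,) -- the state stays d x d regardless of sequence length
```

The constant-size state (C, n, m) illustrates why memory cost does not grow with sequence length, in contrast to the Transformer's key-value cache.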