Publication details

Citation:	Vihang Patil, Markus Hofmarcher, Marius-Constantin Dinu, Matthias Dorfer, Patrick M. Blies, Johannes Brandstetter, Jose Arjona Medina, Sepp Hochreiter, "Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution", in: Proceedings of the 39th International Conference on Machine Learning, 2022
Original title:	Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
Language of title:	English
Original book title:	Proceedings of the 39th International Conference on Machine Learning
Original abstract:	Reinforcement Learning algorithms require a large number of samples to solve complex tasks with sparse and delayed rewards. Complex tasks can often be hierarchically decomposed into sub-tasks. A step in the Q-function can be associated with solving a sub-task, where the expectation of the return increases. RUDDER has been introduced to identify these steps and then redistribute reward to them, thus immediately giving reward if sub-tasks are solved. Since the problem of delayed rewards is mitigated, learning is considerably sped up. However, for complex tasks, current exploration strategies as deployed in RUDDER struggle with discovering episodes with high rewards. Therefore, we assume that episodes with high rewards are given as demonstrations and do not have to be discovered by exploration. Typically the number of demonstrations is small and RUDDER's LSTM model as a deep learning method does not learn well. Hence, we introduce Align-RUDDER, which is RUDDER with two major modifications. First, Align-RUDDER assumes that episodes with high rewards are given as demonstrations, replacing RUDDER's safe exploration and lessons replay buffer. Second, we replace RUDDER's LSTM model by a profile model that is obtained from multiple sequence alignment of demonstrations. Profile models can be constructed from as few as two demonstrations as known from bioinformatics. Align-RUDDER inherits the concept of reward redistribution, which considerably reduces the delay of rewards, thus speeding up learning. Align-RUDDER outperforms competitors on complex artificial tasks with delayed reward and few demonstrations. On the MineCraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently.
Language of abstract:	English
Year of publication:	2022
Number of pages:	42
URL for further information:	https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwiI5sO98eD_AhWYraQKHa55C1EQFnoECAwQAQ&url=https%3A%2F%2Fproceedings.mlr.press%2Fv162%2Fpatil22a%2Fpatil22a.pdf&usg=AOvVaw1jPlOBfGN1YZjm5uUWApJY&opi=89978449
Scope:	international
Publication type:	Article / paper in conference proceedings (peer-reviewed)
Authors:	Vihang Patil, Markus Hofmarcher, Marius-Constantin Dinu, Matthias Dorfer, Patrick M. Blies, Johannes Brandstetter, Jose Arjona Medina, Sepp Hochreiter
Research units:	LIT Artificial Intelligence Lab; Institute for Machine Learning; Institute of Signal Processing

Fields of science:	Biomathematics (ÖSTAT:101004); Numerical mathematics (ÖSTAT:101014); Operations research (ÖSTAT:101015); Optimisation (ÖSTAT:101016); Game theory (ÖSTAT:101017); Statistics (ÖSTAT:101018); Stochastics (ÖSTAT:101019); Probability theory (ÖSTAT:101024); Time series analysis (ÖSTAT:101026); Dynamical systems (ÖSTAT:101027); Mathematical modelling (ÖSTAT:101028); Mathematical statistics (ÖSTAT:101029); Approximation theory (ÖSTAT:101031); Computer sciences (ÖSTAT:102); Artificial intelligence (ÖSTAT:102001); Image processing (ÖSTAT:102003); Bioinformatics (ÖSTAT:102004); Human-computer interaction (ÖSTAT:102013); Artificial neural networks (ÖSTAT:102018); Machine learning (ÖSTAT:102019); Computational intelligence (ÖSTAT:102032); Data mining (ÖSTAT:102033); Statistical physics (ÖSTAT:103029); Bioinformatics (ÖSTAT:106005); Biostatistics (ÖSTAT:106007); Embedded systems (ÖSTAT:202017); Robotics (ÖSTAT:202035); Sensor systems (ÖSTAT:202036); Signal processing (ÖSTAT:202037); Computer-aided diagnosis and therapy (ÖSTAT:305901); Medical informatics (ÖSTAT:305905); Medical statistics (ÖSTAT:305907)

Research projects:	JKU LIT SAL eSPML Lab (start year: 2020)
