Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces

@inproceedings{Warnell2017DeepTI,
  title={Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces},
  author={Garrett Warnell and Nicholas R. Waytowich and Vernon J. Lawhern and Peter Stone},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2018},
  url={https://api.semanticscholar.org/CorpusID:4130751}
}
An extension of the TAMER framework that leverages the representational power of deep neural networks to learn complex tasks from a human trainer in a short amount of time; its success is demonstrated by using it, together with just 15 minutes of human-provided feedback, to train an agent that outperforms human players on the Atari game Bowling.
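The core idea — a deep network trained online to predict scalar human feedback, with the agent acting greedily on that prediction — can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation: the pre-trained state encoder, credit-assignment window, and importance-weighted replay of the paper are simplified, and all names and dimensions are placeholders.

# Minimal sketch (not the authors' code) of the Deep TAMER idea: a deep network
# H_hat(s, a) is regressed onto scalar human feedback, and the agent acts
# greedily with respect to the predicted human reward.
import random
import torch
import torch.nn as nn

class HumanRewardNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),   # one predicted human-reward value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

h_hat = HumanRewardNet(state_dim=8, n_actions=4)   # dimensions are placeholders
opt = torch.optim.Adam(h_hat.parameters(), lr=1e-3)
feedback_buffer = []                                # (state, action, human_feedback) tuples

def record_feedback(state, action, feedback):
    # Store a human feedback signal attributed to a recent (state, action) pair.
    feedback_buffer.append((state, action, feedback))

def update(batch_size: int = 16):
    # One SGD step regressing H_hat onto stored human feedback.
    if not feedback_buffer:
        return
    batch = random.sample(feedback_buffer, min(batch_size, len(feedback_buffer)))
    states = torch.stack([s for s, _, _ in batch])
    actions = torch.tensor([a for _, a, _ in batch])
    targets = torch.tensor([f for _, _, f in batch], dtype=torch.float32)
    preds = h_hat(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(preds, targets)
    opt.zero_grad(); loss.backward(); opt.step()

def act(state):
    # Act greedily with respect to the predicted human reward.
    with torch.no_grad():
        return int(h_hat(state.unsqueeze(0)).argmax(dim=1))

In use, record_feedback would be called whenever the trainer gives a reward or punishment signal, while update and act run inside the environment loop.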


Continuous Control for High-Dimensional State Spaces: An Interactive Learning Approach

Experimental results validate the efficiency of the D-COACH framework on three different problems and show that its enhanced version considerably reduces the human training effort, making it feasible to learn policies within periods of time in which a DRL agent does not reach any improvement.

Deep Reinforcement Learning from Policy-Dependent Human Feedback

The effectiveness of the Deep COACH algorithm is demonstrated in the rich 3D world of Minecraft with an agent that learns to complete tasks by mapping from raw pixels to actions using only real-time human feedback in 10-15 minutes of interaction.

FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using Human Feedback

This paper integrates feedback signals supplied by a human operator with deep reinforcement learning algorithms in high-dimensional state spaces, using an ensemble of neural networks with a shared architecture to represent model uncertainty and the network's confidence in its output.
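A minimal sketch of that ensemble idea, assuming a simple bootstrap-style setup in PyTorch: several heads with the same architecture predict feedback, and their disagreement serves as the uncertainty/confidence estimate. Names and sizes are illustrative, not the paper's implementation.

# Minimal sketch of an ensemble with a shared architecture whose disagreement
# (standard deviation) acts as a confidence estimate.
import torch
import torch.nn as nn

def make_head(state_dim: int, n_actions: int) -> nn.Module:
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

ensemble = [make_head(state_dim=8, n_actions=4) for _ in range(5)]

def predict_with_confidence(state: torch.Tensor):
    # Return the ensemble mean prediction and a per-action uncertainty estimate.
    with torch.no_grad():
        outs = torch.stack([head(state) for head in ensemble])  # (n_heads, n_actions)
    return outs.mean(dim=0), outs.std(dim=0)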

Hierarchical learning from human preferences and curiosity

A novel hierarchical reinforcement learning method that introduces non-expert human preferences at the high level and curiosity to speed up the convergence of sub-policies toward sub-goals, drastically reducing the amount of human effort required compared with standard imitation learning approaches.

DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

This work demonstrates a real-world human-in-the-loop RL application where a camera automatically recognizes a user's facial expressions as feedback to the agent while the agent explores a maze and proposes an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards.
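One way to combine the two signals at action-selection time can be sketched as below; the single weighting coefficient used here is an illustrative simplification, not the paper's exact rule, and the networks are placeholders.

# Minimal sketch of acting on a weighted combination of an environment-reward
# Q-function and a learned human-feedback model, in the spirit of using both
# human feedback and distant rewards.
import torch

def select_action(q_net, h_net, state: torch.Tensor, human_weight: float) -> int:
    # Greedy action w.r.t. a weighted sum of Q-values and predicted human reward.
    with torch.no_grad():
        combined = q_net(state) + human_weight * h_net(state)
    return int(combined.argmax())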

Goal-driven active learning

This work proposes a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process and introduces the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space.

Using LLMs for Augmenting Hierarchical Agents with Common Sense Priors

This paper exploits the planning capabilities of LLMs while using RL to provide learning from the environment, resulting in a hierarchical agent that uses LLMs to solve long-horizon tasks; agents trained with this approach outperform baseline methods and, once trained, do not need access to LLMs during deployment.

GUIDE: Real-Time Human-Shaped Agents

GUIDE, a framework for real-time human-guided reinforcement learning, is introduced; it enables continuous human feedback and grounds that feedback into dense rewards to accelerate policy learning, reducing the need for human input while allowing continual training.

Shared Autonomy via Deep Reinforcement Learning

This paper uses human-in-the-loop reinforcement learning with neural network function approximation to learn an end-to-end mapping from environmental observation and user input to agent action, with task reward as the only form of supervision.

PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training

This work presents an off-policy, interactive RL algorithm that capitalizes on the strengths of both feedback and off-policy learning, and is able to utilize real-time human feedback to effectively prevent reward exploitation and learn new behaviors that are difficult to specify with standard reward functions.
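The relabeling idea behind the approach can be sketched as follows, assuming a replay buffer of dict-like transitions and a learned reward model; both are placeholders for illustration, not the paper's code.

# Minimal sketch of relabeling experience: stored transitions keep their states
# and actions, but their rewards are recomputed whenever the learned
# (feedback-based) reward model changes, so off-policy learning stays consistent.
import torch

def relabel_replay_buffer(buffer, reward_model):
    # Overwrite stored rewards with the current reward model's predictions.
    with torch.no_grad():
        for transition in buffer:
            transition["reward"] = float(reward_model(transition["state"], transition["action"]))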
...

Learning from Demonstrations for Real World Reinforcement Learning

This paper presents an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages this data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.
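The large-margin supervised loss that DQfD adds on top of the usual TD loss for demonstration transitions can be sketched roughly as below; the margin value and tensor shapes are illustrative assumptions.

# Minimal sketch of the large-margin classification loss applied to
# demonstration transitions: the Q-value of the demonstrated action is pushed
# above all other actions by at least `margin`.
import torch
import torch.nn.functional as F

def margin_loss(q_values: torch.Tensor, demo_actions: torch.Tensor, margin: float = 0.8) -> torch.Tensor:
    n_actions = q_values.shape[1]
    # Add `margin` to every action except the demonstrated one, then take the max.
    margins = margin * (1.0 - F.one_hot(demo_actions, n_actions).float())
    best = (q_values + margins).max(dim=1).values
    demo_q = q_values.gather(1, demo_actions.unsqueeze(1)).squeeze(1)
    return (best - demo_q).mean()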

Deep Reinforcement Learning from Human Preferences

This work explores goals defined in terms of (non-expert) human preferences between pairs of trajectory segments in order to effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion.
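A minimal sketch of the underlying reward-learning step, assuming a Bradley-Terry-style preference model over trajectory segments; the network, shapes, and hyperparameters are placeholders, not the paper's code.

# Minimal sketch of learning a reward model from pairwise preferences: segment
# returns under the learned reward feed a softmax preference probability,
# trained with cross-entropy against the human label.
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))  # r(s) per step
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def preference_loss(segment_a: torch.Tensor, segment_b: torch.Tensor, label: float) -> torch.Tensor:
    # label = 1.0 if the human preferred segment_a, 0.0 if segment_b.
    return_a = reward_model(segment_a).sum()    # segments are (T, state_dim) tensors
    return_b = reward_model(segment_b).sum()
    logits = torch.stack([return_a, return_b])
    target = torch.tensor([label, 1.0 - label])
    return -(target * torch.log_softmax(logits, dim=0)).sum()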

Interactively shaping agents via human reinforcement: the TAMER framework

Results from two domains demonstrate that lay users can train TAMER agents without defining an environmental reward function (as in an MDP) and indicate that human training within the TAMER framework can reduce sample complexity over autonomous learning algorithms.

Human-level control through deep reinforcement learning

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Reinforcement learning from simultaneous human and MDP reward

A novel algorithm is introduced that shares the same spirit as TAMER+RL but learns simultaneously from both reward sources, enabling the human feedback to come at any time during the reinforcement learning process.

A Large-Scale Study of Agents Learning from Human Reward

The results show for the first time that an agent using TAMER can successfully learn to play Infinite Mario, a challenging reinforcement-learning benchmark problem based on the popular video game, given feedback from both adult and child trainers.

Maximum Entropy Deep Inverse Reinforcement Learning

It is shown that the Maximum Entropy paradigm for IRL lends itself naturally to the efficient training of deep architectures, and the approach achieves performance commensurate with the state of the art on existing benchmarks while exceeding it on an alternative benchmark based on highly varying reward structures.

A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans

This work aims to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer's target policy.

Deep Reinforcement Learning with Double Q-Learning

This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
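The adaptation amounts to decoupling action selection from action evaluation in the bootstrap target; a minimal sketch, with illustrative shapes:

# Minimal sketch of the Double DQN target: the online network selects the
# argmax action and the target network evaluates it, reducing overestimation.
import torch

def double_dqn_target(online_net, target_net, next_states, rewards, dones, gamma=0.99):
    with torch.no_grad():
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # select with online net
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)  # evaluate with target net
        return rewards + gamma * (1.0 - dones) * next_q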

Asynchronous Methods for Deep Reinforcement Learning

A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
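A minimal sketch of the advantage actor-critic loss that each worker optimizes; the asynchronous gradient sharing across workers is omitted, and the network outputs and shapes are illustrative assumptions.

# Minimal sketch of the per-worker advantage actor-critic objective: a policy
# gradient term with an advantage baseline, a value regression term, and an
# entropy bonus that encourages exploration.
import torch

def a2c_loss(policy_logits, values, actions, returns, entropy_coef=0.01, value_coef=0.5):
    log_probs = torch.log_softmax(policy_logits, dim=1)
    probs = log_probs.exp()
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    advantages = returns - values.detach()
    policy_loss = -(chosen * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    entropy = -(probs * log_probs).sum(dim=1).mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy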