Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces

@inproceedings{Warnell2017DeepTI,
  title={Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces},
  author={Garrett Warnell and Nicholas R. Waytowich and Vernon J. Lawhern and Peter Stone},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2018},
  url={https://api.semanticscholar.org/CorpusID:4130751}
}
An extension of the TAMER framework that leverages the representational power of deep neural networks to learn complex tasks from a human trainer in a short amount of time; its success is demonstrated by using it, together with just 15 minutes of human-provided feedback, to train an agent that outperforms human players on the Atari game Bowling.
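The core idea — a deep network trained online to predict scalar human feedback, with the agent acting greedily on that prediction — can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation: the pre-trained state encoder, credit-assignment window, and importance-weighted replay of the paper are simplified, and all names and dimensions are placeholders.

# Minimal sketch (not the authors' code) of the Deep TAMER idea: a deep network
# H_hat(s, a) is regressed onto scalar human feedback, and the agent acts
# greedily with respect to the predicted human reward.
import random
import torch
import torch.nn as nn

class HumanRewardNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),   # one predicted human-reward value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

h_hat = HumanRewardNet(state_dim=8, n_actions=4)   # dimensions are placeholders
opt = torch.optim.Adam(h_hat.parameters(), lr=1e-3)
feedback_buffer = []                                # (state, action, human_feedback) tuples

def record_feedback(state, action, feedback):
    # Store a human feedback signal attributed to a recent (state, action) pair.
    feedback_buffer.append((state, action, feedback))

def update(batch_size: int = 16):
    # One SGD step regressing H_hat onto stored human feedback.
    if not feedback_buffer:
        return
    batch = random.sample(feedback_buffer, min(batch_size, len(feedback_buffer)))
    states = torch.stack([s for s, _, _ in batch])
    actions = torch.tensor([a for _, a, _ in batch])
    targets = torch.tensor([f for _, _, f in batch], dtype=torch.float32)
    preds = h_hat(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(preds, targets)
    opt.zero_grad(); loss.backward(); opt.step()

def act(state):
    # Act greedily with respect to the predicted human reward.
    with torch.no_grad():
        return int(h_hat(state.unsqueeze(0)).argmax(dim=1))

In use, record_feedback would be called whenever the trainer gives a reward or punishment signal, while update and act run inside the environment loop.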


Continuous Control for High-Dimensional State Spaces: An Interactive Learning Approach

Experimental results validate the efficiency of the D-COACH framework on three different problems and show that its enhanced version considerably reduces the human training effort, making it feasible to learn policies within periods of time in which a DRL agent does not reach any improvement.

Deep Reinforcement Learning from Policy-Dependent Human Feedback

The effectiveness of the Deep COACH algorithm is demonstrated in the rich 3D world of Minecraft with an agent that learns to complete tasks by mapping from raw pixels to actions using only real-time human feedback in 10-15 minutes of interaction.

FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using Human Feedback

This paper integrates feedback signals supplied by a human operator with deep reinforcement learning algorithms in high-dimensional state spaces, using an ensemble of neural networks with a shared architecture to represent model uncertainty and the network's confidence in its output.
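A minimal sketch of that ensemble idea, assuming a simple bootstrap-style setup in PyTorch: several heads with the same architecture predict feedback, and their disagreement serves as the uncertainty/confidence estimate. Names and sizes are illustrative, not the paper's implementation.

# Minimal sketch of an ensemble with a shared architecture whose disagreement
# (standard deviation) acts as a confidence estimate.
import torch
import torch.nn as nn

def make_head(state_dim: int, n_actions: int) -> nn.Module:
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

ensemble = [make_head(state_dim=8, n_actions=4) for _ in range(5)]

def predict_with_confidence(state: torch.Tensor):
    # Return the ensemble mean prediction and a per-action uncertainty estimate.
    with torch.no_grad():
        outs = torch.stack([head(state) for head in ensemble])  # (n_heads, n_actions)
    return outs.mean(dim=0), outs.std(dim=0)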

Hierarchical learning from human preferences and curiosity

A novel hierarchical reinforcement learning method that introduces non-expert human preferences at the high level and curiosity to speed up the convergence of sub-policies toward sub-goals, drastically reducing the amount of human effort required compared with standard imitation learning approaches.

DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

This work demonstrates a real-world human-in-the-loop RL application where a camera automatically recognizes a user's facial expressions as feedback to the agent while the agent explores a maze and proposes an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards.
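One way to combine the two signals at action-selection time can be sketched as below; the single weighting coefficient used here is an illustrative simplification, not the paper's exact rule, and the networks are placeholders.

# Minimal sketch of acting on a weighted combination of an environment-reward
# Q-function and a learned human-feedback model, in the spirit of using both
# human feedback and distant rewards.
import torch

def select_action(q_net, h_net, state: torch.Tensor, human_weight: float) -> int:
    # Greedy action w.r.t. a weighted sum of Q-values and predicted human reward.
    with torch.no_grad():
        combined = q_net(state) + human_weight * h_net(state)
    return int(combined.argmax())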

Goal-driven active learning

This work proposes a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process and introduces the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space.

Using LLMs for Augmenting Hierarchical Agents with Common Sense Priors

This paper exploits the planning capabilities of LLMs while using RL to provide learning from the environment, resulting in a hierarchical agent that uses LLMs to solve long-horizon tasks; agents trained with this approach outperform baseline methods and, once trained, do not need access to LLMs during deployment.

GUIDE: Real-Time Human-Shaped Agents

GUIDE, a framework for real-time human-guided reinforcement learning, is introduced; it enables continuous human feedback and grounds that feedback into dense rewards to accelerate policy learning, reducing the need for human input while allowing continual training.

Shared Autonomy via Deep Reinforcement Learning

This paper uses human-in-the-loop reinforcement learning with neural network function approximation to learn an end-to-end mapping from environmental observation and user input to agent action, with task reward as the only form of supervision.

PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training

This work presents an off-policy, interactive RL algorithm that capitalizes on the strengths of both feedback and off-policy learning, and is able to utilize real-time human feedback to effectively prevent reward exploitation and learn new behaviors that are difficult to specify with standard reward functions.
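The relabeling idea behind the approach can be sketched as follows, assuming a replay buffer of dict-like transitions and a learned reward model; both are placeholders for illustration, not the paper's code.

# Minimal sketch of relabeling experience: stored transitions keep their states
# and actions, but their rewards are recomputed whenever the learned
# (feedback-based) reward model changes, so off-policy learning stays consistent.
import torch

def relabel_replay_buffer(buffer, reward_model):
    # Overwrite stored rewards with the current reward model's predictions.
    with torch.no_grad():
        for transition in buffer:
            transition["reward"] = float(reward_model(transition["state"], transition["action"]))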
...

Learning from Demonstrations for Real World Reinforcement Learning

This paper presents an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages this data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.
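The large-margin supervised loss that DQfD adds on top of the usual TD loss for demonstration transitions can be sketched roughly as below; the margin value and tensor shapes are illustrative assumptions.

# Minimal sketch of the large-margin classification loss applied to
# demonstration transitions: the Q-value of the demonstrated action is pushed
# above all other actions by at least `margin`.
import torch
import torch.nn.functional as F

def margin_loss(q_values: torch.Tensor, demo_actions: torch.Tensor, margin: float = 0.8) -> torch.Tensor:
    n_actions = q_values.shape[1]
    # Add `margin` to every action except the demonstrated one, then take the max.
    margins = margin * (1.0 - F.one_hot(demo_actions, n_actions).float())
    best = (q_values + margins).max(dim=1).values
    demo_q = q_values.gather(1, demo_actions.unsqueeze(1)).squeeze(1)
    return (best - demo_q).mean()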

Deep Reinforcement Learning from Human Preferences

This work explores goals defined in terms of (non-expert) human preferences between pairs of trajectory segments in order to effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion.
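A minimal sketch of the underlying reward-learning step, assuming a Bradley-Terry-style preference model over trajectory segments; the network, shapes, and hyperparameters are placeholders, not the paper's code.

# Minimal sketch of learning a reward model from pairwise preferences: segment
# returns under the learned reward feed a softmax preference probability,
# trained with cross-entropy against the human label.
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))  # r(s) per step
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def preference_loss(segment_a: torch.Tensor, segment_b: torch.Tensor, label: float) -> torch.Tensor:
    # label = 1.0 if the human preferred segment_a, 0.0 if segment_b.
    return_a = reward_model(segment_a).sum()    # segments are (T, state_dim) tensors
    return_b = reward_model(segment_b).sum()
    logits = torch.stack([return_a, return_b])
    target = torch.tensor([label, 1.0 - label])
    return -(target * torch.log_softmax(logits, dim=0)).sum()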

Interactively shaping agents via human reinforcement: the TAMER framework

Results from two domains demonstrate that lay users can train TAMER agents without defining an environmental reward function (as in an MDP) and indicate that human training within the TAMER framework can reduce sample complexity over autonomous learning algorithms.

Human-level control through deep reinforcement learning

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Reinforcement learning from simultaneous human and MDP reward

A novel algorithm is introduced that shares the same spirit as TAMER+RL but learns simultaneously from both reward sources, enabling the human feedback to come at any time during the reinforcement learning process.

A Large-Scale Study of Agents Learning from Human Reward

The results show for the first time that an agent using TAMER can successfully learn to play Infinite Mario, a challenging reinforcement-learning benchmark problem based on the popular video game, given feedback from both adult and child trainers.

Maximum Entropy Deep Inverse Reinforcement Learning

It is shown that the Maximum Entropy paradigm for IRL lends itself naturally to the efficient training of deep architectures, and the approach achieves performance commensurate with the state of the art on existing benchmarks while exceeding it on an alternative benchmark based on highly varying reward structures.

A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans

This work aims to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer's target policy.

Deep Reinforcement Learning with Double Q-Learning

This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
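The adaptation amounts to decoupling action selection from action evaluation in the bootstrap target; a minimal sketch, with illustrative shapes:

# Minimal sketch of the Double DQN target: the online network selects the
# argmax action and the target network evaluates it, reducing overestimation.
import torch

def double_dqn_target(online_net, target_net, next_states, rewards, dones, gamma=0.99):
    with torch.no_grad():
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # select with online net
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)  # evaluate with target net
        return rewards + gamma * (1.0 - dones) * next_q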

Asynchronous Methods for Deep Reinforcement Learning

A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
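A minimal sketch of the advantage actor-critic loss that each worker optimizes; the asynchronous gradient sharing across workers is omitted, and the network outputs and shapes are illustrative assumptions.

# Minimal sketch of the per-worker advantage actor-critic objective: a policy
# gradient term with an advantage baseline, a value regression term, and an
# entropy bonus that encourages exploration.
import torch

def a2c_loss(policy_logits, values, actions, returns, entropy_coef=0.01, value_coef=0.5):
    log_probs = torch.log_softmax(policy_logits, dim=1)
    probs = log_probs.exp()
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    advantages = returns - values.detach()
    policy_loss = -(chosen * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    entropy = -(probs * log_probs).sum(dim=1).mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy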