GAIL

Overview

GAIL (Generative Adversarial Imitation Learning) was first proposed in Generative Adversarial Imitation Learning, which derives the optimization objective of GAIL from the perspective of occupancy measure. Compared to other approaches, GAIL neither suffers from the compounding-error problem of behavioral cloning, nor needs to expensively learn an intermediate reward function as inverse reinforcement learning does. However, like other methods, GAIL is still exposed to “the curse of dimensionality”, which makes scalability especially valuable in high-dimensional problems.

Quick Facts

  1. GAIL consists of a generator and a discriminator, trained in an adversarial manner.

  2. The generator is optimized against a surrogate reward, usually with policy-gradient reinforcement learning methods such as TRPO, owing to the on-policy sampling nature of the training procedure.

  3. The discriminator can simply be optimized with standard gradient-descent methods, such as Adam, to distinguish expert data from generated data; a minimal sketch of such an update follows this list.
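
To make fact 3 concrete, the following is a minimal sketch (in PyTorch) of a GAN-style discriminator update on expert and generated state-action pairs. The network architecture, label convention (expert pairs labeled 0, generated pairs labeled 1), and names such as Discriminator and discriminator_update are illustrative assumptions rather than a reference implementation.

import torch
import torch.nn as nn

# Hypothetical discriminator: maps (state, action) pairs to a logit for the
# probability that the pair was produced by the generator policy.
class Discriminator(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))  # raw logits

def discriminator_update(disc, optimizer, expert_batch, policy_batch):
    """One gradient step distinguishing expert from generated (policy) data."""
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(*expert_batch)   # (obs, act) from the expert dataset
    policy_logits = disc(*policy_batch)   # (obs, act) sampled by the generator
    # Expert pairs are labeled 0, generated pairs 1 (one common convention).
    loss = bce(expert_logits, torch.zeros_like(expert_logits)) + \
           bce(policy_logits, torch.ones_like(policy_logits))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()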

Key Equations or Key Graphs

The objective function of GAIL’s adversarial training is given below:

\[
\min_{\pi} \max_{D} \; \mathbb{E}_{\pi}\left[\log D(s,a)\right] + \mathbb{E}_{\pi_E}\left[\log\left(1 - D(s,a)\right)\right] - \lambda H(\pi)
\]

where \(\pi\) is the generator policy, \(\pi_E\) is the expert policy, \(D\) is the discriminator, \(H(\pi)\) is the causal entropy of the policy \(\pi\), and \(\lambda \ge 0\) weights the entropy regularizer. This is a min-max optimization problem, solved iteratively in an adversarial manner: the discriminator is trained to tell expert state-action pairs apart from generated ones, while the generator is trained to produce data that the discriminator judges as expert-like.
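
The generator never observes the true environment reward; instead it is trained with a surrogate reward derived from the discriminator output. The sketch below follows the convention of the original paper, where the policy minimizes \(\log D(s,a)\), i.e. maximizes \(-\log D(s,a)\); it reuses the hypothetical discriminator sketched above, so the exact names and sign convention are assumptions.

import torch

def surrogate_reward(disc, obs, act, eps=1e-8):
    """Surrogate reward for the generator, derived from the discriminator.

    With expert pairs labeled 0 and generated pairs labeled 1, sigmoid(D(s, a))
    is the predicted probability that (s, a) was generated, so -log D(s, a) is
    large when the pair looks expert-like. The policy-gradient learner
    (e.g. TRPO) then maximizes this reward instead of the environment reward.
    """
    with torch.no_grad():
        d = torch.sigmoid(disc(obs, act))     # probability of "generated"
        return -torch.log(d.clamp(min=eps))   # higher when expert-like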

Pseudo-Code

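As a stand-in for formal pseudo-code, below is a high-level sketch of the GAIL training loop implied by the objective above, alternating discriminator updates with policy-gradient updates on the surrogate reward. The helpers collect_trajectories, sample_expert_batch, and policy_gradient_update are hypothetical placeholders for the environment rollout, expert-data sampling, and RL learner (e.g. TRPO) of an actual implementation, and the discriminator helpers are those sketched above.

def gail_train(policy, disc, disc_optimizer, expert_dataset, n_iterations):
    for it in range(n_iterations):
        # 1. Roll out the current generator policy in the environment.
        obs, act = collect_trajectories(policy)

        # 2. Update the discriminator to separate expert from generated pairs.
        expert_obs, expert_act = sample_expert_batch(expert_dataset, len(obs))
        discriminator_update(disc, disc_optimizer,
                             (expert_obs, expert_act), (obs, act))

        # 3. Relabel the rollout with the surrogate reward and take a
        #    policy-gradient step (TRPO in the original paper).
        rewards = surrogate_reward(disc, obs, act)
        policy_gradient_update(policy, obs, act, rewards)
    return policy
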
Extensions

Reference

  1. Ho, Jonathan, and Stefano Ermon. “Generative adversarial imitation learning.” Advances in neural information processing systems 29 (2016): 4565-4573.

  2. Song, Jiaming, et al. “Multi-agent generative adversarial imitation learning.” arXiv preprint arXiv:1807.09936 (2018).

  3. Finn, Chelsea, et al. “A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models.” arXiv preprint arXiv:1611.03852 (2016).