### Model-based RL
1. **Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion** NIPS 2018. [paper](https://arxiv.org/abs/1807.01675)

    *Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee*

2. **Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning** ICML 2018. [paper](https://arxiv.org/abs/1803.00101)

    *Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I. Jordan, Joseph E. Gonzalez, Sergey Levine* 
    
3. **Value Prediction Network** NIPS 2017. [paper](https://arxiv.org/abs/1707.03497)

    *Junhyuk Oh, Satinder Singh, Honglak Lee*
    
4. **Imagination-Augmented Agents for Deep Reinforcement Learning** NIPS 2017. [paper](https://arxiv.org/abs/1707.06203)

    *Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra*
    
5. **Continuous Deep Q-Learning with Model-based Acceleration** ICML 2016. [paper](https://arxiv.org/abs/1603.00748)
    
    *Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine*

6. **Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning** CoRL 2017. [paper](http://proceedings.mlr.press/v78/kalweit17a/kalweit17a.pdf)
    
    *Gabriel Kalweit, Joschka Boedecker*
    
7. **Model-Ensemble Trust-Region Policy Optimization** ICLR 2018. [paper](https://arxiv.org/abs/1802.10592)
    
    *Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel*

8. **Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models** NIPS 2018. [paper](https://arxiv.org/abs/1805.12114)
    
    *Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine*
    
9. **Dyna, an integrated architecture for learning, planning, and reacting** ACM SIGART Bulletin 1991. [paper](https://dl.acm.org/citation.cfm?id=122377)

    *Richard S. Sutton* (a minimal Dyna-Q sketch follows after this list)
    
10. **Learning Continuous Control Policies by Stochastic Value Gradients** NIPS 2015. [paper](https://arxiv.org/abs/1510.09142)
    
    *Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, Tom Erez*
    
11. **Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks** ICLR 2017. [paper](https://arxiv.org/abs/1605.07127)
    
    *Stefan Depeweg, José Miguel Hernández-Lobato, Finale Doshi-Velez, Steffen Udluft*
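
Since Dyna (entry 9) is the template that most of the model-based methods above build on, here is a minimal tabular Dyna-Q sketch: learn a transition model from real experience and interleave planning updates from that model with direct Q-learning. It is an illustrative sketch only; the `env.reset()`/`env.step()` interface, the hyperparameters, and the random-replay planning loop are assumptions for this example, not details taken from any of the papers listed.

```python
# Minimal tabular Dyna-Q sketch: direct RL + model learning + planning.
# The env interface and hyperparameters are illustrative assumptions.
import random
from collections import defaultdict

def dyna_q(env, n_actions, episodes=100, alpha=0.1, gamma=0.95,
           epsilon=0.1, planning_steps=10):
    Q = defaultdict(float)   # Q[(state, action)] -> action-value estimate
    model = {}               # model[(state, action)] -> (reward, next_state, done)

    def greedy(s):
        return max(range(n_actions), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection from the current Q estimates
            a = random.randrange(n_actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)  # assumed (next_state, reward, done) interface

            # (1) direct RL: Q-learning update from the real transition
            target = r + gamma * (0.0 if done else Q[(s2, greedy(s2))])
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # (2) model learning: record the observed transition
            model[(s, a)] = (r, s2, done)

            # (3) planning: apply the same update to transitions replayed from the model
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                ptarget = pr + gamma * (0.0 if pdone else Q[(ps2, greedy(ps2))])
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])

            s = s2
    return Q
```

The papers above largely replace the tabular model here with learned neural dynamics models (ensembles, Bayesian networks, or value-prediction models) and use the imagined rollouts for value expansion or policy search rather than one-step replay.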