"multilayer feedforward networks with as few as one hidden layer are indeed capable of universal approximation in a very precise and satisfactory sense."
This recipe is based on the original DQN paper, Playing Atari with Deep Reinforcement Learning by DeepMind. In the paper, they used a concept called **experience replay**, which involved randomly sampling the previous game moves (state, action reward, next state).
*Our experiments reveal several surprising results about large-scale nonconvex optimization. Firstly, asynchronous SGD, rarely applied to nonconvex problems, works very well for training deep networks, particularly when combined with Adagrad adaptive learning rates.*