There are some fine differences between Adadelta and RMSprop. To find out more about them, you can refer to [http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf) and [https://arxiv.org/pdf/1212.5701.pdf](https://arxiv.org/pdf/1212.5701.pdf).
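To make the contrast concrete, here is a minimal sketch of the two update rules side by side, written with NumPy. This is an illustration, not a library API: RMSprop keeps a decaying average of squared gradients and divides the step by its root, while Adadelta additionally replaces the global learning rate with a decaying average of past squared updates.

```python
import numpy as np

def rmsprop_step(w, g, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSprop update: scale the step by a running RMS of gradients."""
    cache = decay * cache + (1 - decay) * g**2
    w = w - lr * g / (np.sqrt(cache) + eps)
    return w, cache

def adadelta_step(w, g, cache, delta_acc, decay=0.95, eps=1e-6):
    """One Adadelta update: no learning rate; the step size is derived
    from a running average of past squared updates (delta_acc)."""
    cache = decay * cache + (1 - decay) * g**2
    update = -np.sqrt(delta_acc + eps) / np.sqrt(cache + eps) * g
    delta_acc = decay * delta_acc + (1 - decay) * update**2
    return w + update, cache, delta_acc
```

Running either step in a loop on a simple quadratic loss shows the key behavioral difference: RMSprop's progress is governed by the chosen `lr`, while Adadelta starts slowly and adapts its own effective step size.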
An example of distributed gradient descent with a parameter server as taken from [https://research.google.com/archive/large_deep_networks_nips2012.html](https://research.google.com/archive/large_deep_networks_nips2012.html)
An example of a TensorFlow graph as taken from [http://download.tensorflow.org/paper/whitepaper2015.pdf](http://download.tensorflow.org/paper/whitepaper2015.pdf)
An example of distributed TensorFlow graph computation as taken from [http://download.tensorflow.org/paper/whitepaper2015.pdf](http://download.tensorflow.org/paper/whitepaper2015.pdf)
Gradient descent, and all the major optimizer algorithms, can be computed either in a centralized fashion (left side of the following figure) or in a distributed fashion (right side). The latter involves a master process that talks to multiple workers provisioning GPUs and CPUs:
![](img/96de5e4c-3af3-4ba3-9afa-6d1ad513b03c.png)
An example of single machine and distributed system structure as taken from [http://download.tensorflow.org/paper/whitepaper2015.pdf](http://download.tensorflow.org/paper/whitepaper2015.pdf)
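The master/worker pattern described above can be simulated in a few lines of plain NumPy. This is a toy, single-process sketch of data-parallel SGD with a parameter server, not the actual distributed TensorFlow API: the "server" holds the parameters, each "worker" computes a gradient on its own shard of the data, and the server averages the gradients and applies the update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)

def worker_gradient(w, X_shard, y_shard):
    """Gradient of the mean-squared error on one worker's data shard."""
    err = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ err / len(y_shard)

# Parameter server state: one shared copy of the parameters
w = np.zeros(3)
lr = 0.1
shards = np.array_split(np.arange(200), 4)  # split the data across 4 workers

for step in range(100):
    # Each worker pulls the current parameters and pushes back its gradient
    grads = [worker_gradient(w, X[idx], y[idx]) for idx in shards]
    # The server averages the worker gradients and updates the parameters
    w -= lr * np.mean(grads, axis=0)

print(np.round(w, 2))  # should approach w_true = [2.0, -1.0, 0.5]
```

In a real deployment the workers run as separate processes on different machines and exchange parameters and gradients asynchronously over the network, but the pull-compute-push cycle is the same.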