From 39e010357cc161fb503b300060c06fcfebf74006 Mon Sep 17 00:00:00 2001
From: HongyingG <44694098+HongyingG@users.noreply.github.com>
Date: Wed, 20 Feb 2019 20:44:26 +0800
Subject: [PATCH] sparse_update_en (#605)

* sparse_update_en

* Review

* file path review x2
---
 .../low_level/layers/sparse_update_en.rst | 45 +++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100755 doc/fluid/api_guides/low_level/layers/sparse_update_en.rst

diff --git a/doc/fluid/api_guides/low_level/layers/sparse_update_en.rst b/doc/fluid/api_guides/low_level/layers/sparse_update_en.rst
new file mode 100755
index 000000000..75e541103
--- /dev/null
+++ b/doc/fluid/api_guides/low_level/layers/sparse_update_en.rst
@@ -0,0 +1,45 @@
+.. _api_guide_sparse_update_en:
+
+###############
+Sparse update
+###############
+
+Fluid's :ref:`api_fluid_layers_embedding` layer supports "sparse updates" in both single-node and distributed training: the gradient is stored in a row-sparse structure that keeps only the rows whose gradient is non-zero.
+In distributed training of large embedding layers, sparse updates reduce the volume of communicated data and speed up training.
+
+In Paddle, embedding is implemented by the :code:`lookup_table` operator. The figure below illustrates the forward and backward computation of an embedding:
+
+.. image:: ../../../images/lookup_table_training.png
+   :scale: 50 %
+
+As shown in the figure, only two rows of the input tensor are non-zero. The forward pass records the non-zero rows as ids and computes with only the corresponding two rows of the embedding table; the backward pass likewise updates only those same two rows. A small numerical sketch of such a row-sparse gradient is given at the end of this page.
+
+Example
+--------------------------
+
+API reference: :ref:`api_fluid_layers_embedding`. Here is a simple example:
+
+.. code-block:: python
+
+    import math
+    import paddle.fluid as fluid
+
+    DICT_SIZE = 10000 * 10
+    EMBED_SIZE = 64
+    IS_SPARSE = False
+
+    def word_emb(word, dict_size=DICT_SIZE, embed_size=EMBED_SIZE):
+        # Look up rows of a [dict_size, embed_size] table by the ids in `word`.
+        embed = fluid.layers.embedding(
+            input=word,
+            size=[dict_size, embed_size],
+            dtype='float32',
+            param_attr=fluid.ParamAttr(
+                initializer=fluid.initializer.Normal(scale=1/math.sqrt(dict_size))),
+            is_sparse=IS_SPARSE,
+            is_distributed=False)
+        return embed
+
+The parameters:
+
+- :code:`is_sparse` : Whether to represent the gradient as a sparse tensor in the backward pass. If False, the gradient is an ordinary dense :code:`LoDTensor`. The default is False.
+
+- :code:`is_distributed` : Whether the current training runs in a distributed scenario. Generally, this parameter should only be set for large-scale sparse updates, i.e. when the 0th dimension of the embedding is very large (several million rows or more). For details, please refer to the large-scale sparse API guide :ref:`api_guide_async_training`. The default is False.
+
+- Related API:
+
+  - :ref:`api_fluid_layers_embedding`
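+Sketch: what a row-sparse gradient looks like
+---------------------------------------------
+
+To make "only rows with non-zero gradients are saved" concrete, here is a minimal NumPy sketch (plain NumPy, not a Fluid API; all names are illustrative) of the row-sparse gradient for the two selected rows in the figure above:
+
+.. code-block:: python
+
+    import numpy as np
+
+    # A toy 5-row embedding table; suppose only rows 1 and 3 were looked up.
+    table = np.random.rand(5, 4).astype('float32')
+
+    # Dense gradient: one row per table row, most rows all-zero.
+    dense_grad = np.zeros_like(table)
+    dense_grad[1] = 0.1
+    dense_grad[3] = -0.2
+
+    # Row-sparse form: store only the touched row ids and their values.
+    rows = np.array([1, 3], dtype='int64')   # ids of the updated rows
+    values = dense_grad[rows]                # shape [2, 4] instead of [5, 4]
+
+    # With a vocabulary of millions of rows, shipping `rows` and `values`
+    # instead of `dense_grad` is what cuts the communication volume.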
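+Usage sketch
+------------
+
+The following is a minimal single-machine training sketch built on the :code:`word_emb` helper above, assuming :code:`IS_SPARSE` is set to True. The layers and optimizer are standard Fluid APIs, but the surrounding program (shapes, loss, feed data) is an illustrative assumption, not part of the embedding API itself:
+
+.. code-block:: python
+
+    import numpy as np
+
+    # Ids into the embedding table; Fluid expects int64 ids with last dim 1.
+    word = fluid.layers.data(name='word', shape=[1], dtype='int64')
+    label = fluid.layers.data(name='label', shape=[1], dtype='float32')
+
+    embed = word_emb(word)  # assumes IS_SPARSE = True -> sparse gradients
+    pred = fluid.layers.fc(input=embed, size=1)
+    loss = fluid.layers.mean(
+        fluid.layers.square_error_cost(input=pred, label=label))
+
+    # With is_sparse=True, the embedding gradient handed to the optimizer is
+    # row-sparse: only the rows looked up in this batch are updated.
+    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)
+
+    exe = fluid.Executor(fluid.CPUPlace())
+    exe.run(fluid.default_startup_program())
+
+    ids = np.array([[1], [3]], dtype='int64')  # two rows, as in the figure
+    y = np.random.rand(2, 1).astype('float32')
+    loss_val, = exe.run(fluid.default_main_program(),
+                        feed={'word': ids, 'label': y},
+                        fetch_list=[loss])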