`SelectedRows` is a sparse tensor data type designed to support `embedding` operators. The gradient of an embedding table is a sparse tensor: only a few rows in that tensor have non-zero values. It is straightforward to represent such a sparse tensor with the following data structure:
```cpp
class SelectedRows {
 private:
  vector<int> rows_;  // indices of the non-zero rows
  Tensor value_;      // values of those rows; shape is [rows_.size(), ...]
  int height_;        // the first dimension of the dense shape
};
```
The field `height_` is the first dimension of the `SelectedRows`. `rows_` stores the indices of the rows that are non-zero. The `value_` field is an N-dim tensor of shape `[rows_.size() /* NUM_ROWS */, ...]`, which supplies the values for those rows. The overall shape of the `SelectedRows` is therefore `[height_] + value_.shape[1:]`.
Suppose that a SelectedRows-typed variable `x` has many rows, but only two of them have values: row 73 is `[1, 2]` and row 84 is `[3, 4]`. The `SelectedRows` representation would be:
```
x = SelectedRows {
  rows = [73, 84],
  value = [[1, 2], [3, 4]]
}
```
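To make the mapping between the sparse and dense views concrete, below is a minimal, self-contained sketch that scatters the example above back into its dense form. It uses `std::vector` in place of the real `Tensor` type and assumes a `height_` of 2048; both are illustrative choices, not part of the design.
```cpp
#include <iostream>
#include <vector>

int main() {
  // The SelectedRows example above: assume height_ = 2048 rows in the dense
  // view, but only rows 73 and 84 carry values.
  const int height = 2048;                                         // height_
  const std::vector<int> rows = {73, 84};                          // rows_
  const std::vector<std::vector<float>> value = {{1, 2}, {3, 4}};  // value_

  // Dense view: [height] + value.shape[1:] == [2048, 2], filled with zeros.
  std::vector<std::vector<float>> dense(height, std::vector<float>(2, 0.0f));
  for (size_t i = 0; i < rows.size(); ++i) {
    dense[rows[i]] = value[i];  // scatter the i-th stored row to row rows[i]
  }

  std::cout << "dense[73][1] = " << dense[73][1] << "\n";  // prints 2
  std::cout << "dense[84][0] = " << dense[84][0] << "\n";  // prints 3
  return 0;
}
```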
## SelectedRows in Protobuf
`SelectedRows` is a kind of `Variable`, so `VarDesc` in protobuf should describe the `SelectedRows` information. Only the tensor dimensions of a `SelectedRows` can be described at compile time, since `rows_` and `value_` depend on the training data.
So we use `TensorDesc` to unify `data_type` and `dims`. A `LodTensorDesc` contains a `TensorDesc` and a `lod_level`. The description of a `SelectedRows` is just a tensor description.
```proto
message TensorDesc {
  required DataType data_type = 1;
  repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
}
```
## InferShape for Selected Rows
Just like the `LoD` information, the `InferShape` method will infer the output tensor type as well. The operator should decide whether its output is a `SelectedRows` or a dense tensor.
For example, the gradient operator of `TableLookup` will always generate a `SelectedRows` output, so its `InferShape` method should mark the output type accordingly.
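A minimal, self-contained sketch of that idea follows. `InferShapeContext`, `VarType`, `SetOutputType`, and the output name `Embedding@GRAD` are simplified, hypothetical stand-ins for illustration, not the framework's actual API.
```cpp
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-ins for the real framework types,
// used only to keep the sketch self-contained.
enum class VarType { kDense, kSelectedRows };

struct InferShapeContext {
  std::unordered_map<std::string, VarType> output_types;
  void SetOutputType(const std::string& name, VarType type) {
    output_types[name] = type;
  }
};

// The gradient of a table lookup touches only the looked-up rows, so the
// gradient of the embedding table is always marked as a SelectedRows output.
void TableLookupGradInferShape(InferShapeContext* ctx) {
  // ... shape inference for the other outputs would go here ...
  ctx->SetOutputType("Embedding@GRAD", VarType::kSelectedRows);
}

int main() {
  InferShapeContext ctx;
  TableLookupGradInferShape(&ctx);  // records Embedding@GRAD as SelectedRows
  return 0;
}
```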
## Sparse Operators
There are several operators that should be written to support `SelectedRows`. They are:
1. Operators which generate a `SelectedRows` gradient, e.g. the gradient of `TableLookupOp`.
2. Optimization operators which support a `SelectedRows` gradient, e.g. `SGD` or `AdaGrad` for `SelectedRows`. However, there should be only one `SGD` operator; `OpWithKernel::Run` should select a suitable kernel for either a dense tensor or a `SelectedRows` input, as sketched after this list.
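A minimal sketch of the kernel selection this implies follows. The gradient wrapper (`std::variant`), the kernel names `SGDDenseKernel` and `SGDSparseKernel`, and the dispatch function `SGDRun` are illustrative assumptions standing in for `OpWithKernel::Run`, not the real operator code.
```cpp
#include <cstdio>
#include <variant>
#include <vector>

// Hypothetical, simplified stand-ins for the real framework types: the single
// "SGD operator" picks a dense or a sparse kernel from the gradient's type.
struct SelectedRowsGrad {
  std::vector<int> rows;                  // rows_
  std::vector<std::vector<float>> value;  // value_
};
using DenseGrad = std::vector<std::vector<float>>;
using Grad = std::variant<DenseGrad, SelectedRowsGrad>;

void SGDDenseKernel(std::vector<std::vector<float>>* param,
                    const DenseGrad& grad, float lr) {
  for (size_t r = 0; r < param->size(); ++r)
    for (size_t c = 0; c < (*param)[r].size(); ++c)
      (*param)[r][c] -= lr * grad[r][c];
}

void SGDSparseKernel(std::vector<std::vector<float>>* param,
                     const SelectedRowsGrad& grad, float lr) {
  // Update only the rows that actually carry gradient values.
  for (size_t i = 0; i < grad.rows.size(); ++i)
    for (size_t c = 0; c < grad.value[i].size(); ++c)
      (*param)[grad.rows[i]][c] -= lr * grad.value[i][c];
}

// The single SGD "operator": select a suitable kernel at run time.
void SGDRun(std::vector<std::vector<float>>* param, const Grad& grad, float lr) {
  if (std::holds_alternative<SelectedRowsGrad>(grad))
    SGDSparseKernel(param, std::get<SelectedRowsGrad>(grad), lr);
  else
    SGDDenseKernel(param, std::get<DenseGrad>(grad), lr);
}

int main() {
  std::vector<std::vector<float>> param(100, std::vector<float>(2, 1.0f));
  SGDRun(&param, SelectedRowsGrad{{73, 84}, {{1, 2}, {3, 4}}}, 0.1f);
  std::printf("param[73][1] = %f\n", param[73][1]);  // 1 - 0.1 * 2 = 0.8
  return 0;
}
```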
<liclass="toctree-l2"><aclass="reference internal"href="../getstarted/build_and_install/index_en.html">Install and Build</a><ul>
<liclass="toctree-l3"><aclass="reference internal"href="../getstarted/build_and_install/docker_install_en.html">PaddlePaddle in Docker Containers</a></li>
<liclass="toctree-l3"><aclass="reference internal"href="../getstarted/build_and_install/build_from_source_en.html">Installing from Sources</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/usage/k8s/k8s_en.html">Paddle On Kubernetes</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/usage/k8s/k8s_aws_en.html">Distributed PaddlePaddle Training on AWS with Kubernetes</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/dev/build_en.html">Build PaddlePaddle from Source Code and Run Unit Test</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/dev/new_layer_en.html">Write New Layers</a></li>
<spanid="design-doc-selected-rows"></span><h1>Design Doc: Selected Rows<aclass="headerlink"href="#design-doc-selected-rows"title="Permalink to this headline">¶</a></h1>
<p><codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code> is a kind of sparse tensor data type, which is designed to support <codeclass="docutils literal"><spanclass="pre">embedding</span></code> operators. The gradient of embedding table is a sparse tensor. Only a few rows are non-zero values in that tensor. It is straightforward to represent the sparse tensor by the following sparse tensor data structure:</p>
<p>The field <codeclass="docutils literal"><spanclass="pre">height_</span></code> shows the first dimension of <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code>. The <codeclass="docutils literal"><spanclass="pre">rows</span></code> are the indices of which rows of <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code> are non-zeros. The <codeclass="docutils literal"><spanclass="pre">value_</span></code> field is an N-dim tensor and shape is <codeclass="docutils literal"><spanclass="pre">[rows.size()</span><spanclass="pre">/*</span><spanclass="pre">NUM_ROWS</span><spanclass="pre">*/,</span><spanclass="pre">...]</span></code>, which supplies values for each row. The dimension of <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code> satisfies <codeclass="docutils literal"><spanclass="pre">[height_]</span><spanclass="pre">+</span><spanclass="pre">value_.shape[1:]</span></code>.</p>
<p>Suppose that a SelectedRows-typed variable <codeclass="docutils literal"><spanclass="pre">x</span></code> has many rows, but only two of them have values – row 73 is <codeclass="docutils literal"><spanclass="pre">[1,</span><spanclass="pre">2]</span></code> and row 84 is <codeclass="docutils literal"><spanclass="pre">[3,</span><spanclass="pre">4]</span></code>, the <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code> representation would be:</p>
<spanid="selectedrows-in-protobuf"></span><h2>SelectedRows in Protobuf<aclass="headerlink"href="#selectedrows-in-protobuf"title="Permalink to this headline">¶</a></h2>
<p><codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code> is a kind of <codeclass="docutils literal"><spanclass="pre">Variable</span></code>. <codeclass="docutils literal"><spanclass="pre">VarDesc</span></code> in protobuf should describe the <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code> information. Only the tensor dimension of a <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code> will be described in compile-time since the <codeclass="docutils literal"><spanclass="pre">rows_</span></code> and <codeclass="docutils literal"><spanclass="pre">value_</span></code> are related to training data.
So we use <codeclass="docutils literal"><spanclass="pre">TensorDesc</span></code> to unify <codeclass="docutils literal"><spanclass="pre">data_type</span></code> and <codeclass="docutils literal"><spanclass="pre">dims</span></code>. A LodTensorDesc contains a <codeclass="docutils literal"><spanclass="pre">TensorDesc</span></code> and <codeclass="docutils literal"><spanclass="pre">lod_level</span></code>. The description of <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code> is a Tensor description.</p>
<spanclass="k">repeated</span><spanclass="kt">int64</span><spanclass="na">dims</span><spanclass="o">=</span><spanclass="mi">2</span><spanclass="p">;</span><spanclass="c1">// [UNK, 640, 480] is saved as [-1, 640, 480]</span>
<spanid="infershape-for-selected-rows"></span><h2>InferShape for Selected Rows<aclass="headerlink"href="#infershape-for-selected-rows"title="Permalink to this headline">¶</a></h2>
<p>Just like <codeclass="docutils literal"><spanclass="pre">LoD</span></code> information, <codeclass="docutils literal"><spanclass="pre">InferShape</span></code> method will inference output tensor type as well. The operator should decide whether its output is a <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code> or <codeclass="docutils literal"><spanclass="pre">Dense</span></code> tensor.</p>
<p>For example, the gradient operator of <codeclass="docutils literal"><spanclass="pre">TableLookup</span></code> will always generate <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code>. Its <codeclass="docutils literal"><spanclass="pre">InferShape</span></code> method should be like following</p>
<spanid="sparse-operators"></span><h2>Sparse Operators<aclass="headerlink"href="#sparse-operators"title="Permalink to this headline">¶</a></h2>
<p>There are several operators should be written to support <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code>. They are:</p>
<olclass="simple">
<li>Operators which generates <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code> gradient. e.g. Gradient of <codeclass="docutils literal"><spanclass="pre">TableLookupOp</span></code>.</li>
<li>Optimize operators which support <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code> gradient. e.g. <codeclass="docutils literal"><spanclass="pre">SGD</span></code> or <codeclass="docutils literal"><spanclass="pre">AdaGrad</span></code> for <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code>. However, there should be only one <codeclass="docutils literal"><spanclass="pre">SGD</span></code> operator. <codeclass="docutils literal"><spanclass="pre">OpWithKernel::Run</span></code> should select a suitable kernel for both <codeclass="docutils literal"><spanclass="pre">dense</span></code> tensor or <codeclass="docutils literal"><spanclass="pre">SelectedRows</span></code>.</li>
Built with <ahref="http://sphinx-doc.org/">Sphinx</a> using a <ahref="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <ahref="https://readthedocs.org">Read the Docs</a>.
`SelectedRows` is a kind of sparse tensor data type, which is designed to support `embedding` operators. The gradient of embedding table is a sparse tensor. Only a few rows are non-zero values in that tensor. It is straightforward to represent the sparse tensor by the following sparse tensor data structure:
```cpp
class SelectedRows {
private:
vector<int> rows_;
Tensor value_;
int height_;
};
```
The field `height_` shows the first dimension of `SelectedRows`. The `rows` are the indices of which rows of `SelectedRows` are non-zeros. The `value_` field is an N-dim tensor and shape is `[rows.size() /* NUM_ROWS */, ...]`, which supplies values for each row. The dimension of `SelectedRows` satisfies `[height_] + value_.shape[1:]`.
Suppose that a SelectedRows-typed variable `x` has many rows, but only two of them have values -- row 73 is `[1, 2]` and row 84 is `[3, 4]`, the `SelectedRows` representation would be:
```
x = SelectedRow {
rows = [73, 84],
value = [[1, 2], [3,4]]
}
```
## SelectedRows in Protobuf
`SelectedRows` is a kind of `Variable`. `VarDesc` in protobuf should describe the `SelectedRows` information. Only the tensor dimension of a `SelectedRows` will be described in compile-time since the `rows_` and `value_` are related to training data.
So we use `TensorDesc` to unify `data_type` and `dims`. A LodTensorDesc contains a `TensorDesc` and `lod_level`. The description of `SelectedRows` is a Tensor description.
```proto
message TensorDesc {
required DataType data_type = 1;
repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
Just like `LoD` information, `InferShape` method will inference output tensor type as well. The operator should decide whether its output is a `SelectedRows` or `Dense` tensor.
For example, the gradient operator of `TableLookup` will always generate `SelectedRows`. Its `InferShape` method should be like following
There are several operators should be written to support `SelectedRows`. They are:
1. Operators which generates `SelectedRows` gradient. e.g. Gradient of `TableLookupOp`.
2. Optimize operators which support `SelectedRows` gradient. e.g. `SGD` or `AdaGrad` for `SelectedRows`. However, there should be only one `SGD` operator. `OpWithKernel::Run` should select a suitable kernel for both `dense` tensor or `SelectedRows`.