`SelectedRows` is a sparse tensor data type designed to support `embedding` operators. The gradient of an embedding table is a sparse tensor in which only a few rows are non-zero. Such a tensor can be represented straightforwardly by the following data structure:
```cpp
class SelectedRows {
  ...
};
```
The `height_` field is the first dimension of the `SelectedRows`. The `rows` field holds the indices of its non-zero rows. The `value_` field is an N-dimensional tensor of shape `[rows.size() /* NUM_ROWS */, ...]` that supplies the values for those rows. The overall shape of a `SelectedRows` is therefore `[height_] + value_.shape[1:]`.
Suppose a SelectedRows-typed variable `x` has many rows, but only two of them have values: row 73 is `[1, 2]` and row 84 is `[3, 4]`. The `SelectedRows` representation would then be:
...
...
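To make the relationship between `height_`, `rows`, and `value_` concrete, here is a minimal sketch restricted to the 2-D case. The names `SelectedRowsSketch`, `Dims`, `ToDense`, and the `width` member are hypothetical and only illustrate the layout described above. Accumulating duplicate row indices is an assumption, not something this document specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for SelectedRows, restricted to the 2-D case.
struct SelectedRowsSketch {
  std::vector<int64_t> rows;  // indices of the non-zero rows
  std::vector<float> value;   // rows.size() * width entries, row-major
  int64_t height;             // first dimension of the logical tensor
  int64_t width;              // stands in for value_.shape[1:]
};

// Logical shape: [height_] + value_.shape[1:].
std::vector<int64_t> Dims(const SelectedRowsSketch& s) {
  return {s.height, s.width};
}

// Expand into a dense row-major buffer; rows that are not listed stay zero.
// Assumption: if a row index appears more than once, its values are summed.
std::vector<float> ToDense(const SelectedRowsSketch& s) {
  const int64_t n = static_cast<int64_t>(s.rows.size());
  assert(static_cast<int64_t>(s.value.size()) == n * s.width);
  std::vector<float> dense(static_cast<std::size_t>(s.height * s.width), 0.0f);
  for (int64_t i = 0; i < n; ++i) {
    assert(s.rows[i] >= 0 && s.rows[i] < s.height);
    for (int64_t j = 0; j < s.width; ++j) {
      dense[static_cast<std::size_t>(s.rows[i] * s.width + j)] +=
          s.value[static_cast<std::size_t>(i * s.width + j)];
    }
  }
  return dense;
}
```

For the variable `x` above, one could write `SelectedRowsSketch x{{73, 84}, {1.0f, 2.0f, 3.0f, 4.0f}, 100, 2};`, where the height of 100 is an arbitrary choice made for illustration. Only the two listed rows consume storage until `ToDense` is called.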
## SelectedRows in Protobuf
`SelectedRows` is a type of `Variable`, so `VarDesc` in protobuf should describe the `SelectedRows` information. Only the tensor dimensions of a `SelectedRows` are described at compile time, because `rows_` and `value_` depend on the training data.
We therefore use `TensorDesc` to unify `data_type` and `dims`. A `LodTensorDesc` contains a `TensorDesc` and a `lod_level`. The description of a `SelectedRows` is a tensor description.
```proto
...
```
## InferShape for Selected Rows
As with `LoD` information, the `InferShape` method also infers the output tensor type. The operator should decide whether its output is a `SelectedRows` or a `Dense` tensor.
For example, the gradient operator of `TableLookup` will always generate `SelectedRows`. Its `InferShape` method should look like the following:
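A minimal sketch of that idea, not the project's actual code: `VarType`, `VarInfo`, and `InferTableLookupGradShape` are hypothetical names standing in for the framework's compile-time descriptions. The only behavior taken from the text is that the output is always marked as `SelectedRows`. That the gradient shares the table's dimensions follows from it being a gradient with respect to the table.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tag for the two output kinds discussed above.
enum class VarType { kDenseTensor, kSelectedRows };

// Hypothetical compile-time description of a variable.
struct VarInfo {
  VarType type;
  std::vector<int64_t> dims;
};

// Hypothetical InferShape for the gradient of a table lookup: the gradient
// with respect to the table has the table's dims and is always SelectedRows.
VarInfo InferTableLookupGradShape(const VarInfo& table) {
  assert(!table.dims.empty());
  return VarInfo{VarType::kSelectedRows, table.dims};
}
```

An operator whose output may be either kind would branch on its inputs' `type` at this point instead of returning a fixed value.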
## Sparse Operators
Several operators need to be written to support `SelectedRows`:
1. Operators that generate a `SelectedRows` gradient, e.g. the gradient of `TableLookupOp`.
2. Optimizer operators that support a `SelectedRows` gradient, e.g. `SGD` or `AdaGrad` for `SelectedRows`. However, there should be only one `SGD` operator; `OpWithKernel::Run` should select a suitable kernel for either a `dense` tensor or a `SelectedRows`.
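The "one operator, two kernels" idea can be sketched as follows. This is an illustration under assumptions, not how `OpWithKernel::Run` is implemented: `GradKind`, `Grad`, `SgdDenseKernel`, `SgdSelectedRowsKernel`, and `SgdRun` are hypothetical names, and the parameter is modeled as a flat row-major buffer with `width` columns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class GradKind { kDense, kSelectedRows };  // hypothetical type tag

// Hypothetical gradient holder. For kDense, `value` covers the whole
// parameter; for kSelectedRows it holds rows.size() * width entries.
struct Grad {
  GradKind kind;
  std::vector<int64_t> rows;
  std::vector<float> value;
};

// Dense kernel: every element of the parameter is updated.
void SgdDenseKernel(std::vector<float>& param, const Grad& g, float lr) {
  assert(g.value.size() == param.size());
  for (std::size_t i = 0; i < param.size(); ++i) param[i] -= lr * g.value[i];
}

// Sparse kernel: only the rows listed in g.rows are touched.
void SgdSelectedRowsKernel(std::vector<float>& param, std::size_t width,
                           const Grad& g, float lr) {
  assert(g.value.size() == g.rows.size() * width);
  for (std::size_t i = 0; i < g.rows.size(); ++i) {
    const std::size_t row = static_cast<std::size_t>(g.rows[i]);
    assert((row + 1) * width <= param.size());
    for (std::size_t j = 0; j < width; ++j) {
      param[row * width + j] -= lr * g.value[i * width + j];
    }
  }
}

// Single SGD entry point that picks a kernel from the gradient's kind.
void SgdRun(std::vector<float>& param, std::size_t width, const Grad& g,
            float lr) {
  switch (g.kind) {
    case GradKind::kDense:
      SgdDenseKernel(param, g, lr);
      break;
    case GradKind::kSelectedRows:
      SgdSelectedRowsKernel(param, width, g, lr);
      break;
  }
}
```

The sparse kernel's cost scales with the number of listed rows rather than with the size of the embedding table, which is the motivation for keeping the gradient in `SelectedRows` form.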
<span id="design-doc-selected-rows"></span><h1>Design Doc: Selected Rows<a class="headerlink" href="#design-doc-selected-rows" title="Permalink to this headline">¶</a></h1>
<p><code class="docutils literal"><span class="pre">SelectedRows</span></code> is a sparse tensor data type designed to support <code class="docutils literal"><span class="pre">embedding</span></code> operators. The gradient of an embedding table is a sparse tensor in which only a few rows are non-zero. Such a tensor can be represented straightforwardly by the following data structure:</p>
<p>The <code class="docutils literal"><span class="pre">height_</span></code> field is the first dimension of the <code class="docutils literal"><span class="pre">SelectedRows</span></code>. The <code class="docutils literal"><span class="pre">rows</span></code> field holds the indices of its non-zero rows. The <code class="docutils literal"><span class="pre">value_</span></code> field is an N-dimensional tensor of shape <code class="docutils literal"><span class="pre">[rows.size()</span> <span class="pre">/*</span> <span class="pre">NUM_ROWS</span> <span class="pre">*/,</span> <span class="pre">...]</span></code> that supplies the values for those rows. The overall shape of a <code class="docutils literal"><span class="pre">SelectedRows</span></code> is therefore <code class="docutils literal"><span class="pre">[height_]</span> <span class="pre">+</span> <span class="pre">value_.shape[1:]</span></code>.</p>
<p>Suppose a SelectedRows-typed variable <code class="docutils literal"><span class="pre">x</span></code> has many rows, but only two of them have values: row 73 is <code class="docutils literal"><span class="pre">[1,</span> <span class="pre">2]</span></code> and row 84 is <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">4]</span></code>. The <code class="docutils literal"><span class="pre">SelectedRows</span></code> representation would then be:</p>
<span id="selectedrows-in-protobuf"></span><h2>SelectedRows in Protobuf<a class="headerlink" href="#selectedrows-in-protobuf" title="Permalink to this headline">¶</a></h2>
<p><code class="docutils literal"><span class="pre">SelectedRows</span></code> is a type of <code class="docutils literal"><span class="pre">Variable</span></code>, so <code class="docutils literal"><span class="pre">VarDesc</span></code> in protobuf should describe the <code class="docutils literal"><span class="pre">SelectedRows</span></code> information. Only the tensor dimensions of a <code class="docutils literal"><span class="pre">SelectedRows</span></code> are described at compile time, because <code class="docutils literal"><span class="pre">rows_</span></code> and <code class="docutils literal"><span class="pre">value_</span></code> depend on the training data.
We therefore use <code class="docutils literal"><span class="pre">TensorDesc</span></code> to unify <code class="docutils literal"><span class="pre">data_type</span></code> and <code class="docutils literal"><span class="pre">dims</span></code>. A LodTensorDesc contains a <code class="docutils literal"><span class="pre">TensorDesc</span></code> and a <code class="docutils literal"><span class="pre">lod_level</span></code>. The description of a <code class="docutils literal"><span class="pre">SelectedRows</span></code> is a tensor description.</p>
<span id="infershape-for-selected-rows"></span><h2>InferShape for Selected Rows<a class="headerlink" href="#infershape-for-selected-rows" title="Permalink to this headline">¶</a></h2>
<p>As with <code class="docutils literal"><span class="pre">LoD</span></code> information, the <code class="docutils literal"><span class="pre">InferShape</span></code> method also infers the output tensor type. The operator should decide whether its output is a <code class="docutils literal"><span class="pre">SelectedRows</span></code> or a <code class="docutils literal"><span class="pre">Dense</span></code> tensor.</p>
<p>For example, the gradient operator of <code class="docutils literal"><span class="pre">TableLookup</span></code> will always generate <code class="docutils literal"><span class="pre">SelectedRows</span></code>. Its <code class="docutils literal"><span class="pre">InferShape</span></code> method should look like the following:</p>
</div>
<div class="section" id="sparse-operators">
<span id="sparse-operators"></span><h2>Sparse Operators<a class="headerlink" href="#sparse-operators" title="Permalink to this headline">¶</a></h2>
<p>Several operators need to be written to support <code class="docutils literal"><span class="pre">SelectedRows</span></code>:</p>
<ol class="simple">
<li>Operators that generate a <code class="docutils literal"><span class="pre">SelectedRows</span></code> gradient, e.g. the gradient of <code class="docutils literal"><span class="pre">TableLookupOp</span></code>.</li>
<li>Optimizer operators that support a <code class="docutils literal"><span class="pre">SelectedRows</span></code> gradient, e.g. <code class="docutils literal"><span class="pre">SGD</span></code> or <code class="docutils literal"><span class="pre">AdaGrad</span></code> for <code class="docutils literal"><span class="pre">SelectedRows</span></code>. However, there should be only one <code class="docutils literal"><span class="pre">SGD</span></code> operator; <code class="docutils literal"><span class="pre">OpWithKernel::Run</span></code> should select a suitable kernel for either a <code class="docutils literal"><span class="pre">dense</span></code> tensor or a <code class="docutils literal"><span class="pre">SelectedRows</span></code>.</li>
</ol>
</div>