# Design Doc: Gradient Operators Registration

## The Problem Posed

Currently, for each C++ operator class definition, a *gradient operator creator* function is registered, which takes a C++ operator instance as input and returns the corresponding gradient operator instance.
However, we noticed two problems with the current design:
1. As we decided to separate the *compilation* and the *execution* phases, we need to change the creator so that it takes an `OpDesc` protobuf message in a `ProgramDesc` and inserts the corresponding gradient `OpDesc` messages into the `ProgramDesc` message.
1. For some operators, the gradient computation requires more than one operator and can be written in terms of existing operators. For example, the gradient of the *minus* operator consists of two operators -- an *identity* operator followed by a *scale* operator. Hence the registration mechanism needs to support a mapping from an operator to a set of operators for the gradient computation.
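
To make the second point concrete, here is a minimal sketch of how the gradient of *minus* could be described as two operator descriptions. Everything in it is an assumption made for illustration: `OpDescSketch` is a simplified stand-in rather than the real `OpDesc`, and the slot names (`X`, `Y`, `Out`), the `scale` attribute, and the `@GRAD` suffix are hypothetical conventions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an operator description; not the real OpDesc.
struct OpDescSketch {
  std::string type;
  std::map<std::string, std::string> inputs;   // slot name -> variable name
  std::map<std::string, std::string> outputs;  // slot name -> variable name
  std::map<std::string, float> attrs;          // attribute name -> value
};

// Assumed naming convention for the gradient of a variable.
inline std::string GradName(const std::string& var) { return var + "@GRAD"; }

// Forward: Out = X - Y, so dX = dOut (identity) and dY = -dOut (scale by -1).
std::vector<OpDescSketch> MakeMinusGradOps(const OpDescSketch& fwd) {
  const std::string d_out = GradName(fwd.outputs.at("Out"));

  OpDescSketch dx;
  dx.type = "identity";
  dx.inputs["X"] = d_out;
  dx.outputs["Out"] = GradName(fwd.inputs.at("X"));

  OpDescSketch dy;
  dy.type = "scale";
  dy.inputs["X"] = d_out;
  dy.outputs["Out"] = GradName(fwd.inputs.at("Y"));
  dy.attrs["scale"] = -1.0f;

  return {dx, dy};
}
```

The point of the sketch is only that one forward operator maps to a list of gradient operators, which a single `grad_op_type` string cannot express.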
## The Current Implementation
Instances of the C++ class `OpInfo` are stored in an associative map whose key is the operator type. The `grad_op_type` indicates the associated gradient operator type. An operator can create its gradient operator by invoking `OpInfo::creator_` of the gradient operator. The pseudo code is as follows
The function takes an `OpDescBind` of the forward operator and returns one or more gradient operator descriptions. `OpDescBind` is a C++ wrapper around the protobuf message `OpDesc` that allows rapid manipulation of `OpDesc`.
The `GradOpDescMaker` will be registered in `OpInfo` and will replace the `grad_op_type_` field. The `OpInfo` should look like
```cpp
struct OpInfo {
...
...
};
```
The `grad_op_maker_` is `nullptr` if the operator does not have any associated gradient operators.
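
As a rough illustration of how a caller might use this field, the sketch below looks up an operator in an associative map and checks `grad_op_maker_` before invoking it. The types `OpDescBindSketch`, `OpInfoSketch` and `OpInfoMapSketch`, the function `MakeGradOps`, and the maker signature are assumptions made for this example, not the actual definitions.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for OpDescBind.
struct OpDescBindSketch {
  std::string type;
};

// Assumed signature: one forward description in, zero or more gradient
// descriptions out.
using GradOpMakerSketch =
    std::function<std::vector<OpDescBindSketch>(const OpDescBindSketch&)>;

struct OpInfoSketch {
  // Left empty (compares equal to nullptr) when the operator has no gradient.
  GradOpMakerSketch grad_op_maker_;
};

// Associative map keyed by operator type.
using OpInfoMapSketch = std::unordered_map<std::string, OpInfoSketch>;

// Returns the gradient descriptions for `fwd`, or an empty vector when the
// operator is not registered or has no gradient maker.
std::vector<OpDescBindSketch> MakeGradOps(const OpInfoMapSketch& infos,
                                          const OpDescBindSketch& fwd) {
  auto it = infos.find(fwd.type);
  if (it == infos.end() || it->second.grad_op_maker_ == nullptr) {
    return {};
  }
  return it->second.grad_op_maker_(fwd);
}
```

Returning an empty vector for a missing maker is a choice made for the sketch; a real implementation might prefer to report an error for unregistered operators.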
We propose a base class called `GradOpDescMakerBase` to let operator developers generate gradient operators easily. The public interface of that class is
We can write many helper functions now that `GradOpDescMakerBase` is a class. The basic helper functions get the variables of `Input`, `Output`, `InputGradient` and `OutputGradient` of the forward operator.
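
The helper functions could look roughly like the sketch below. It is not the proposed interface itself: the class names carry a `Sketch` suffix to mark them as hypothetical, the description type is a simplified stand-in for `OpDescBind`, and the `@GRAD` suffix for gradient variable names is an assumed convention.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for OpDescBind.
struct OpDescBindSketch {
  std::string type;
  std::map<std::string, std::vector<std::string>> inputs;
  std::map<std::string, std::vector<std::string>> outputs;
};

class GradOpDescMakerBaseSketch {
 public:
  explicit GradOpDescMakerBaseSketch(const OpDescBindSketch& fwd_op)
      : fwd_op_(fwd_op) {}
  virtual ~GradOpDescMakerBaseSketch() = default;

  // Each concrete maker returns one or more gradient operator descriptions.
  virtual std::vector<OpDescBindSketch> operator()() const = 0;

 protected:
  // Variables bound to an input or output slot of the forward operator.
  std::vector<std::string> Input(const std::string& slot) const {
    return fwd_op_.inputs.at(slot);
  }
  std::vector<std::string> Output(const std::string& slot) const {
    return fwd_op_.outputs.at(slot);
  }
  // Gradient variables of the same slots.
  std::vector<std::string> InputGradient(const std::string& slot) const {
    return ToGrad(Input(slot));
  }
  std::vector<std::string> OutputGradient(const std::string& slot) const {
    return ToGrad(Output(slot));
  }

 private:
  // Assumed convention: the gradient of variable `v` is named `v@GRAD`.
  static std::vector<std::string> ToGrad(std::vector<std::string> names) {
    for (auto& name : names) name += "@GRAD";
    return names;
  }
  OpDescBindSketch fwd_op_;  // copy of the forward operator description
};

// Example maker for a forward op Out = X, whose gradient is dX = dOut.
class IdentityGradMakerSketch : public GradOpDescMakerBaseSketch {
 public:
  using GradOpDescMakerBaseSketch::GradOpDescMakerBaseSketch;
  std::vector<OpDescBindSketch> operator()() const override {
    OpDescBindSketch grad;
    grad.type = "identity";
    grad.inputs["X"] = OutputGradient("Out");
    grad.outputs["Out"] = InputGradient("X");
    return {grad};
  }
};
```

With helpers like these, a concrete maker only states which forward slots feed which gradient slots and never builds gradient variable names by hand.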
We should change the registration macros at the same time. In the current solution, there is no difference between forward operators and backward operators, so `REGISTER_OP` registers just one operator. If `REGISTER_OPERATOR` is given an `OpProtoAndCheckerMaker` and a `GradOpDescMaker`, we just list them in the same macro. This can be done with a macro that uses `__VA_ARGS__`.
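
One possible shape for such a macro is sketched below, assuming the listed classes can be told apart by type. The macro name `REGISTER_OPERATOR_SKETCH`, the tag types, the registrar template, and the boolean fields are all hypothetical; the sketch only shows how `__VA_ARGS__` can forward an arbitrary list of makers to a variadic template.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

// Hypothetical record of what was registered for an operator.
struct OpInfoSketch {
  bool has_proto_maker = false;
  bool has_grad_maker = false;
};

// Function-local static avoids static-initialization-order problems.
inline std::map<std::string, OpInfoSketch>& OpInfoMapSketch() {
  static std::map<std::string, OpInfoSketch> instance;
  return instance;
}

// Hypothetical tag base classes used to tell the two kinds of makers apart.
struct ProtoMakerTag {};
struct GradMakerTag {};

// Records in `info` which kind of maker T is.
template <typename T>
void FillOne(OpInfoSketch* info) {
  if (std::is_base_of<ProtoMakerTag, T>::value) info->has_proto_maker = true;
  if (std::is_base_of<GradMakerTag, T>::value) info->has_grad_maker = true;
}

template <typename... Makers>
struct OperatorRegistrarSketch {
  explicit OperatorRegistrarSketch(const char* op_type) {
    OpInfoSketch info;
    // Expand the pack: call FillOne<M>(&info) once per listed maker.
    int expand[] = {0, (FillOne<Makers>(&info), 0)...};
    (void)expand;
    OpInfoMapSketch()[op_type] = info;
  }
};

// All maker classes are listed after the operator type in one macro call.
#define REGISTER_OPERATOR_SKETCH(op_type, ...) \
  static OperatorRegistrarSketch<__VA_ARGS__> registrar_##op_type(#op_type)

struct MinusProtoMaker : ProtoMakerTag {};
struct MinusGradMaker : GradMakerTag {};

// An operator with a gradient lists both makers; one without lists only one.
REGISTER_OPERATOR_SKETCH(minus, MinusProtoMaker, MinusGradMaker);
REGISTER_OPERATOR_SKETCH(fill_zeros, MinusProtoMaker);
```

An operator without a gradient simply omits the gradient maker from the list, which in this sketch leaves `has_grad_maker` false, mirroring a `nullptr` `grad_op_maker_`.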