<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>Writing New Layers &mdash; PaddlePaddle  documentation</title>
    
    <link rel="stylesheet" href="../../_static/classic.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '../../',
        VERSION:     '',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="../../_static/jquery.js"></script>
    <script type="text/javascript" src="../../_static/underscore.js"></script>
    <script type="text/javascript" src="../../_static/doctools.js"></script>
    <script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
    <link rel="top" title="PaddlePaddle  documentation" href="../../index.html" />
    <link rel="up" title="Examples and demos" href="../index.html" />
    <link rel="next" title="Cluster Train" href="../../cluster/index.html" />
    <link rel="prev" title="Chinese Word Embedding Model Tutorial" href="../embedding_model/index.html" /> 
  </head>
  <body role="document">
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../../genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="../../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="../../cluster/index.html" title="Cluster Train"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="../embedding_model/index.html" title="Chinese Word Embedding Model Tutorial"
             accesskey="P">previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="../../index.html">PaddlePaddle  documentation</a> &raquo;</li>
          <li class="nav-item nav-item-1"><a href="../index.html" accesskey="U">Examples and demos</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="writing-new-layers">
<h1>Writing New Layers<a class="headerlink" href="#writing-new-layers" title="Permalink to this headline"></a></h1>
<p>This tutorial shows how to write a custom layer in PaddlePaddle. It uses the fully connected layer as an example and walks through the following steps for writing a new layer.</p>
<ul class="simple">
<li>Derive the equations for the forward and backward parts of the layer.</li>
<li>Implement the C++ class for the layer.</li>
<li>Write a gradient check unit test to make sure the gradients are computed correctly.</li>
<li>Implement the Python wrapper for the layer.</li>
</ul>
<div class="section" id="derive-equations">
<h2>Derive Equations<a class="headerlink" href="#derive-equations" title="Permalink to this headline"></a></h2>
<p>First, we need to derive the equations of the <em>forward</em> and <em>backward</em> parts of the layer. The forward part computes the output given an input. The backward part computes the gradients of the input and the parameters given the gradients of the output.</p>
<p>The illustration of a fully connected layer is shown in the following figure. In a fully connected layer, all output nodes are connected to all the input nodes.</p>
<img alt="../../_images/FullyConnected.jpg" src="../../_images/FullyConnected.jpg" />
<p>The <em>forward part</em> of a layer transforms an input into the corresponding output.
A fully connected layer takes a dense input vector <span class="math">\(x\)</span> with dimension <span class="math">\(D_i\)</span>. It uses a transformation matrix <span class="math">\(W\)</span> of size <span class="math">\(D_i \times D_o\)</span> to project <span class="math">\(x\)</span> into a <span class="math">\(D_o\)</span>-dimensional vector, and adds a bias vector <span class="math">\(b\)</span> of dimension <span class="math">\(D_o\)</span> to the result.</p>
<div class="math">
\[y = f(W^T x + b)\]</div>
<p>where <span class="math">\(f(.)\)</span> is a nonlinear <em>activation</em> function, such as sigmoid, tanh, or ReLU.</p>
<p>The transformation matrix <span class="math">\(W\)</span> and bias vector <span class="math">\(b\)</span> are the <em>parameters</em> of the layer. The <em>parameters</em> of a layer are learned during training, using the gradients computed in the <em>backward pass</em>. The backward pass computes the gradients of the output with respect to all parameters and inputs. The optimizer can then use the chain rule to compute the gradients of the loss function with respect to each parameter. Suppose our loss function is <span class="math">\(c(y)\)</span>, then</p>
<div class="math">
\[\frac{\partial c(y)}{\partial x} = \frac{\partial c(y)}{\partial y} \frac{\partial y}{\partial x}\]</div>
<p>Suppose <span class="math">\(z = W^T x + b\)</span>, so that <span class="math">\(y = f(z)\)</span>, then</p>
<div class="math">
\[\frac{\partial y}{\partial z} = \frac{\partial f(z)}{\partial z}\]</div>
<p>This derivative can be automatically computed by our base layer class.</p>
<p>Then, for the fully connected layer, we need to compute <span class="math">\(\frac{\partial z}{\partial x}\)</span>, <span class="math">\(\frac{\partial z}{\partial W}\)</span>, and <span class="math">\(\frac{\partial z}{\partial b}\)</span>.</p>
<div class="math">
\[\begin{split}\frac{\partial z}{\partial x} = W \\
\frac{\partial z_j}{\partial W_{ij}} = x_i \\
\frac{\partial z}{\partial b} = \mathbf 1 \\\end{split}\]</div>
<p>where <span class="math">\(\mathbf 1\)</span> is an all-ones vector, <span class="math">\(W_{ij}\)</span> is the entry at the i-th row and j-th column of the matrix <span class="math">\(W\)</span>, <span class="math">\(z_j\)</span> is the j-th component of the vector <span class="math">\(z\)</span>, and <span class="math">\(x_i\)</span> is the i-th component of the vector <span class="math">\(x\)</span>.</p>
<p>Then we can use the chain rule to calculate <span class="math">\(\frac{\partial c(y)}{\partial x}\)</span> and <span class="math">\(\frac{\partial c(y)}{\partial W}\)</span>. The details of the computation will be given in the next section.</p>
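<p>The derivation above can be checked on a single input vector with a few lines of plain Python. This is only an illustrative sketch, not PaddlePaddle code: the function names <cite>fc_forward</cite> and <cite>fc_backward</cite> are hypothetical, the activation is assumed to be the sigmoid, and <cite>W</cite> is stored as a list of <span class="math">\(D_i\)</span> rows with <span class="math">\(D_o\)</span> entries each.</p>

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def fc_forward(W, b, x):
    """Return (z, y) with z = W^T x + b and y = sigmoid(z); W is D_i x D_o."""
    d_in, d_out = len(W), len(W[0])
    z = [sum(W[i][j] * x[i] for i in range(d_in)) + b[j] for j in range(d_out)]
    y = [sigmoid(v) for v in z]
    return z, y


def fc_backward(W, x, y, dc_dy):
    """Apply the chain rule to get dc/dx, dc/dW and dc/db from dc/dy."""
    d_in, d_out = len(W), len(W[0])
    # dc/dz = dc/dy * f'(z); for the sigmoid, f'(z) = y * (1 - y).
    dc_dz = [dc_dy[j] * y[j] * (1.0 - y[j]) for j in range(d_out)]
    # dz_j/dx_i = W_ij, so dc/dx_i = sum_j W_ij * dc/dz_j.
    dc_dx = [sum(W[i][j] * dc_dz[j] for j in range(d_out)) for i in range(d_in)]
    # dz_j/dW_ij = x_i, so dc/dW_ij = x_i * dc/dz_j.
    dc_dW = [[x[i] * dc_dz[j] for j in range(d_out)] for i in range(d_in)]
    # dz/db is the all-ones vector, so dc/db = dc/dz.
    dc_db = list(dc_dz)
    return dc_dx, dc_dW, dc_db
```

<p>Here <cite>dc_dy</cite> plays the role of the output gradient that a layer receives from the layers after it, and the three returned values are the quantities that the backward part of the layer has to produce.</p>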
</div>
<div class="section" id="implement-c-class">
<h2>Implement C++ Class<a class="headerlink" href="#implement-c-class" title="Permalink to this headline"></a></h2>
<p>The C++ class of the layer implements the initialization, forward, and backward parts of the layer. The fully connected layer is implemented in <cite>paddle/gserver/layers/FullyConnectedLayer.h</cite> and <cite>paddle/gserver/layers/FullyConnectedLayer.cpp</cite>. A simplified version of the code is listed below.</p>
<p>The class needs to derive from the base class <cite>paddle::Layer</cite>, and it needs to override the following functions:</p>
<ul class="simple">
<li>constructor and destructor.</li>
<li><cite>init</cite> function. It is used to initialize the parameters and settings.</li>
<li><cite>forward</cite>. It implements the forward part of the layer.</li>
<li><cite>backward</cite>. It implements the backward part of the layer.</li>
<li><cite>prefetch</cite>. It determines which rows of the parameter matrix to prefetch from the parameter server. You do not need to override this function if your layer does not need remote sparse update (most layers do not).</li>
</ul>
<p>The header file is listed below:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>namespace paddle {
/**
 * A layer has full connections to all neurons in the previous layer.
 * It computes an inner product with a set of learned weights, and
 * (optionally) adds biases.
 *
 * The config file api is fc_layer.
 */

class FullyConnectedLayer : public Layer {
protected:
  WeightList weights_;
  std::unique_ptr&lt;Weight&gt; biases_;

public:
  explicit FullyConnectedLayer(const LayerConfig&amp; config)
      : Layer(config) {}
  ~FullyConnectedLayer() {}

  bool init(const LayerMap&amp; layerMap, const ParameterMap&amp; parameterMap);

  Weight&amp; getWeight(int idx) { return *weights_[idx]; }

  void prefetch();
  void forward(PassType passType);
  void backward(const UpdateCallback&amp; callback = nullptr);
};
}  // namespace paddle
</pre></div>
</div>
<p>It defines the parameters as class variables. We use the <cite>Weight</cite> class as an abstraction of parameters. It supports multi-threaded updates. The details of this class are described together with the implementation below.</p>
<ul class="simple">
<li><cite>weights_</cite> is a list of weights for the transformation matrices. The current implementation can have more than one input, so it keeps a list of weights, one per input.</li>
<li><cite>biases_</cite> is a weight for the bias vector.</li>
</ul>
<p>The fully connected layer does not have layer configuration hyper-parameters. If a layer has hyper-parameters, a common practice is to store them in <cite>LayerConfig&amp; config</cite> and copy them into class variables in the constructor.</p>
<p>The following code snippet implements the <cite>init</cite> function.</p>
<ul class="simple">
<li>First, every <cite>init</cite> function must call the <cite>init</cite> function of the base class: <cite>Layer::init(layerMap, parameterMap);</cite>. This statement initializes the required variables and connections for each layer.</li>
<li>Then it initializes all the weight matrices <span class="math">\(W\)</span>. The current implementation can have more than one input, so it keeps a list of weights.</li>
<li>Finally, it initializes the bias.</li>
</ul>
<p>The code is listed below:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>bool FullyConnectedLayer::init(const LayerMap&amp; layerMap,
                               const ParameterMap&amp; parameterMap) {
  /* Initialize the basic parent class */
  Layer::init(layerMap, parameterMap);

  /* initialize the weightList */
  CHECK(inputLayers_.size() == parameters_.size());
  for (size_t i = 0; i &lt; inputLayers_.size(); i++) {
    // Obtain the dimensions of the weight matrix
    size_t height = inputLayers_[i]-&gt;getSize();
    size_t width = getSize();

    // create a new weight
    if (parameters_[i]-&gt;isSparse()) {
      CHECK_LE(parameters_[i]-&gt;getSize(), width * height);
    } else {
      CHECK_EQ(parameters_[i]-&gt;getSize(), width * height);
    }
    Weight* w = new Weight(height, width, parameters_[i]);

    // append the new weight to the list
    weights_.emplace_back(w);
  }

  /* initialize biases_ */
  if (biasParameter_.get() != NULL) {
    biases_ = std::unique_ptr&lt;Weight&gt;(new Weight(1, getSize(), biasParameter_));
  }

  return true;
}
</pre></div>
</div>
<p>The implementation of the forward part has the following steps.</p>
<ul class="simple">
<li>Every layer must call <cite>Layer::forward(passType);</cite> at the beginning of its <cite>forward</cite> function.</li>
<li>Then it allocates memory for the output using <cite>reserveOutput(batchSize, size);</cite>. This step is necessary because batches may have different sizes. <cite>reserveOutput</cite> changes the size of the output accordingly. For the sake of efficiency, new memory is allocated only when the matrix needs to grow; the existing memory block is reused when the matrix shrinks.</li>
<li>Then it computes <span class="math">\(\sum_i W_i^T x_i + b\)</span> using matrix operations, where <span class="math">\(x_i\)</span> is the i-th input and <span class="math">\(W_i\)</span> is its transformation matrix. <cite>getInput(i).value</cite> retrieves the matrix of the i-th input. Each input is a <span class="math">\(batchSize \times dim\)</span> matrix, where each row represents a single input in a batch. For a complete list of supported matrix operations, please refer to <cite>paddle/math/Matrix.h</cite> and <cite>paddle/math/BaseMatrix.h</cite>.</li>
<li>Finally, it applies the activation function using <cite>forwardActivation();</cite>. This automatically applies the activation function specified in the network configuration.</li>
</ul>
<p>The code is listed below:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>void FullyConnectedLayer::forward(PassType passType) {
  Layer::forward(passType);

  /* malloc memory for the output_ if necessary */
  int batchSize = getInput(0).getBatchSize();
  int size = getSize();

  {
    // Set up the size of the output.
    reserveOutput(batchSize, size);
  }

  MatrixPtr outV = getOutputValue();

  // Apply the transformation matrix to each input.
  for (size_t i = 0; i != inputLayers_.size(); ++i) {
    auto input = getInput(i);
    CHECK(input.value) &lt;&lt; &quot;The input of &#39;fc&#39; layer must be matrix&quot;;
    i == 0 ? outV-&gt;mul(input.value, weights_[i]-&gt;getW(), 1, 0)
           : outV-&gt;mul(input.value, weights_[i]-&gt;getW(), 1, 1);
  }

  /* add the bias-vector */
  if (biases_.get() != NULL) {
    outV-&gt;addBias(*(biases_-&gt;getW()), 1);
  }

  /* activation */ {
    forwardActivation();
  }
}
</pre></div>
</div>
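<p>The accumulation over several inputs in the loop above can be sketched in plain Python. The sketch assumes that <cite>mul(a, b, scaleAB, scaleT)</cite> computes <cite>this = scaleAB * (a * b) + scaleT * this</cite>; this reading is inferred from the call pattern in the listing (scale 0 for the first input, 1 afterwards) and should be confirmed against <cite>paddle/math/Matrix.h</cite>. The names <cite>matmul</cite>, <cite>mul_into</cite>, and <cite>fc_forward_batch</cite> are hypothetical, and the activation step is left out.</p>

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))]
            for r in range(len(a))]


def mul_into(out, a, b, scale_ab, scale_t):
    """Update out in place: out = scale_ab * (a * b) + scale_t * out."""
    prod = matmul(a, b)
    for r in range(len(out)):
        for c in range(len(out[0])):
            out[r][c] = scale_ab * prod[r][c] + scale_t * out[r][c]


def fc_forward_batch(inputs, weights, bias=None):
    """inputs[i] is batchSize x dim_i, weights[i] is dim_i x size."""
    batch_size = len(inputs[0])
    size = len(weights[0][0])
    # Stale values stand in for a reused output buffer.
    out = [[99.0] * size for _ in range(batch_size)]
    for i, (x, w) in enumerate(zip(inputs, weights)):
        # The first input overwrites the buffer, later inputs accumulate.
        mul_into(out, x, w, 1.0, 0.0 if i == 0 else 1.0)
    if bias is not None:
        # Add the same bias row to every example in the batch.
        for row in out:
            for c in range(size):
                row[c] += bias[c]
    return out
```

<p>Using scale 0 for the first input means the output buffer never has to be cleared explicitly, which matters because <cite>reserveOutput</cite> may hand back memory that still holds values from the previous batch.</p>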
<p>The implementation of the backward part has the following steps.</p>
<ul class="simple">
<li><cite>backwardActivation();</cite> computes the gradients of the activation. The gradients are multiplied in place into the gradients of the output, which can be retrieved using <cite>getOutputGrad()</cite>.</li>
<li>Compute the gradients of the bias. Notice that we can use <cite>biases_-&gt;getWGrad()</cite> to get the gradient matrix of the corresponding parameter. After the gradient of one parameter is updated, the layer <em>MUST</em> call <cite>getParameterPtr()-&gt;incUpdate(callback);</cite>. This is used for parameter updates over multiple threads or multiple machines.</li>
<li>Then it computes the gradients of the transformation matrices and inputs, and it calls <cite>incUpdate</cite> for the corresponding parameter. This lets the framework know whether it has gathered all the gradients for one parameter, so that it can overlap other work (e.g., network communication) with the computation.</li>
</ul>
<p>The code is listed below:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>void FullyConnectedLayer::backward(const UpdateCallback&amp; callback) {
  /* Do derivation for activations.*/ {
    backwardActivation();
  }

  if (biases_ &amp;&amp; biases_-&gt;getWGrad()) {
    biases_-&gt;getWGrad()-&gt;collectBias(*getOutputGrad(), 1);

    /* Increasing the number of gradient */
    biases_-&gt;getParameterPtr()-&gt;incUpdate(callback);
  }

  bool syncFlag = hl_get_sync_flag();

  for (size_t i = 0; i != inputLayers_.size(); ++i) {
    /* Calculate the W-gradient for the current layer */
    if (weights_[i]-&gt;getWGrad()) {
      MatrixPtr input_T = getInputValue(i)-&gt;getTranspose();
      MatrixPtr oGrad = getOutputGrad();
      {
        weights_[i]-&gt;getWGrad()-&gt;mul(input_T, oGrad, 1, 1);
      }
    }


    /* Calculate the input layers error */
    MatrixPtr preGrad = getInputGrad(i);
    if (NULL != preGrad) {
      MatrixPtr weights_T = weights_[i]-&gt;getW()-&gt;getTranspose();
      preGrad-&gt;mul(getOutputGrad(), weights_T, 1, 1);
    }

    {
      weights_[i]-&gt;getParameterPtr()-&gt;incUpdate(callback);
    }
  }
}
</pre></div>
</div>
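<p>The three gradient computations in the listing above can be written out for a single input in plain Python. This is a hedged sketch rather than the PaddlePaddle implementation: it assumes that <cite>collectBias</cite> adds the column sums of the output gradient to the bias gradient, that the scale arguments <cite>1, 1</cite> of <cite>mul</cite> mean "add the product to the existing gradient", and that the output gradient has already been multiplied by the activation derivative. All function names are hypothetical.</p>

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))]
            for r in range(len(a))]


def add_into(dst, src):
    """Accumulate src into dst in place."""
    for r in range(len(dst)):
        for c in range(len(dst[0])):
            dst[r][c] += src[r][c]


def fc_backward_batch(x, w, out_grad, w_grad, b_grad, in_grad):
    """x: batchSize x dim, w: dim x size, out_grad: batchSize x size."""
    # Bias gradient: sum the output gradient over the batch (column sums).
    for row in out_grad:
        for c, g in enumerate(row):
            b_grad[c] += g
    # Weight gradient: x^T * out_grad, accumulated into w_grad.
    add_into(w_grad, matmul(transpose(x), out_grad))
    # Input gradient: out_grad * w^T, accumulated into in_grad.
    add_into(in_grad, matmul(out_grad, transpose(w)))
```

<p>Accumulating into the gradient buffers instead of overwriting them is what allows a parameter or an input that is shared by several layers to collect contributions from each of them.</p>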
<p>The <cite>prefetch</cite> function specifies the rows that need to be fetched from the parameter server during training. It is only useful for remote sparse training. In remote sparse training, the full parameter matrix is stored in a distributed manner on the parameter server. When the layer uses a batch for training, only a subset of locations of the input is non-zero in this batch. Thus, the layer only needs the rows of the transformation matrix corresponding to the locations of these non-zero entries. The <cite>prefetch</cite> function specifies the ids of these rows.</p>
<p>Most layers do not need remote sparse training, in which case you do not need to override this function. The implementation for the fully connected layer is listed below:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>void FullyConnectedLayer::prefetch() {
  for (size_t i = 0; i != inputLayers_.size(); ++i) {
    auto* sparseParam =
        dynamic_cast&lt;SparsePrefetchRowCpuMatrix*&gt;(weights_[i]-&gt;getW().get());
    if (sparseParam) {
      MatrixPtr input = getInputValue(i);
      sparseParam-&gt;addRows(input);
    }
  }
}
</pre></div>
</div>
<p>Finally, you can use <cite>REGISTER_LAYER(fc, FullyConnectedLayer);</cite> to register the layer. <cite>fc</cite> is the identifier of the layer, and <cite>FullyConnectedLayer</cite> is the class name of the layer:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>namespace paddle {
REGISTER_LAYER(fc, FullyConnectedLayer);
}
</pre></div>
</div>
<p>If the <cite>cpp</cite> file is put into <cite>paddle/gserver/layers</cite>, it will be automatically added to the compilation list.</p>
</div>
<div class="section" id="write-gradient-check-unit-test">
<h2>Write Gradient Check Unit Test<a class="headerlink" href="#write-gradient-check-unit-test" title="Permalink to this headline"></a></h2>
<p>An easy way to verify the correctness of a new layer&#8217;s implementation is to write a gradient check unit test. A gradient check unit test uses the finite difference method to verify the gradient of a layer. It modifies the input with a small perturbation <span class="math">\(\Delta x\)</span> and observes the change of the output <span class="math">\(\Delta y\)</span>; the gradient can then be approximated as <span class="math">\(\frac{\Delta y}{\Delta x }\)</span>. This value is compared with the gradient computed by the <cite>backward</cite> function of the layer to verify the gradient computation. Notice that the gradient check only tests the correctness of the gradient computation; it does not guarantee the correctness of the implementation of the <cite>forward</cite> and <cite>backward</cite> functions. You need to write more sophisticated unit tests to make sure your layer is implemented correctly.</p>
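<p>The idea of a gradient check can be illustrated outside of PaddlePaddle with a short Python sketch. It uses a central difference, <span class="math">\(\frac{f(x + \epsilon) - f(x - \epsilon)}{2 \epsilon}\)</span>, which is a common variant of the one-sided <span class="math">\(\frac{\Delta y}{\Delta x}\)</span> described above. The function names, the step size, and the tolerance are assumptions chosen for illustration and are not taken from the PaddlePaddle test utilities.</p>

```python
def numeric_gradient(f, x, eps=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += eps
        minus = list(x)
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
    return grad


def check_gradient(f, analytic_grad, x, eps=1e-6, tol=1e-4):
    """Return True if the analytic gradient agrees with the numeric one at x."""
    numeric = numeric_gradient(f, x, eps)
    analytic = analytic_grad(x)
    return all(abs(n - a) <= tol * max(1.0, abs(n), abs(a))
               for n, a in zip(numeric, analytic))


def sum_of_squares(v):
    return sum(t * t for t in v)


def sum_of_squares_grad(v):
    # The exact gradient of sum_i v_i^2 is 2 * v.
    return [2.0 * t for t in v]
```

<p>For example, <cite>check_gradient(sum_of_squares, sum_of_squares_grad, [1.0, -2.0, 0.5])</cite> passes, while a deliberately wrong gradient such as <cite>lambda v: [3.0 * t for t in v]</cite> is rejected.</p>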
<p>All the gradient check unit tests are located in <cite>paddle/gserver/tests/test_LayerGrad.cpp</cite>. We recommend putting your test into a new test file if you are writing a new layer. The gradient check unit test of the fully connected layer is listed below. It has the following steps.</p>
<ul>
<li><p class="first">Create layer configuration. A layer configuration can include the following attributes:</p>
<blockquote>
<div><ul class="simple">
<li>size of the bias parameter. (4096 in our example)</li>
<li>type of the layer. (fc in our example)</li>
<li>size of the layer. (4096 in our example)</li>
<li>activation type. (sigmoid in our example)</li>
<li>dropout rate. (0.1 in our example)</li>
</ul>
</div></blockquote>
</li>
<li><p class="first">Configure the input of the layer. In our example, we have only one input.</p>
<blockquote>
<div><blockquote>
<div><ul>
<li><p class="first">type of the input. (<cite>INPUT_DATA</cite> in our example) It can be one of the following types:</p>
<blockquote>
<div><ul class="simple">
<li><cite>INPUT_DATA</cite>: dense vector.</li>
<li><cite>INPUT_LABEL</cite>: integer.</li>
<li><cite>INPUT_DATA_TARGET</cite>: dense vector, but it is not used to compute gradients.</li>
<li><cite>INPUT_SEQUENCE_DATA</cite>: dense vector with sequence information.</li>
<li><cite>INPUT_HASSUB_SEQUENCE_DATA</cite>: dense vector with both sequence and sub-sequence information.</li>
<li><cite>INPUT_SEQUENCE_LABEL</cite>: integer with sequence information.</li>
<li><cite>INPUT_SPARSE_NON_VALUE_DATA</cite>: 0-1 sparse data.</li>
<li><cite>INPUT_SPARSE_FLOAT_VALUE_DATA</cite>: float sparse data.</li>
</ul>
</div></blockquote>
</li>
</ul>
</div></blockquote>
<ul class="simple">
<li>name of the input. (<cite>layer_0</cite> in our example)</li>
<li>size of the input. (8192 in our example)</li>
<li>number of non-zeros, only useful for sparse inputs.</li>
<li>format of sparse data, only useful for sparse inputs.</li>
</ul>
</div></blockquote>
</li>
<li><p class="first">Call <cite>config.layerConfig.add_inputs();</cite> once for each input.</p>
</li>
<li><p class="first">Call <cite>testLayerGrad</cite> to perform gradient checks. It has the following arguments.</p>
<blockquote>
<div><ul class="simple">
<li>layer and input configurations. (<cite>config</cite> in our example)</li>
<li>type of the layer. (<cite>fc</cite> in our example)</li>
<li>batch size of the gradient check. (100 in our example)</li>
<li>whether the input is transposed. Most layers need to set it to <cite>false</cite>. (<cite>false</cite> in our example)</li>
<li>whether to use weights. Some layers or activations perform normalization so that the sum of their output is a constant. For example, the sum of output of a softmax activation is one. In this case, we cannot correctly compute the gradients using regular gradient check techniques. A weighted sum of the output, which is not a constant, is utilized to compute the gradients. (<cite>true</cite> in our example, because the activation of a fully connected layer can be softmax)</li>
</ul>
</div></blockquote>
</li>
</ul>
<p>The code is listed below:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>void testFcLayer(string format, size_t nnz) {
  // Create layer configuration.
  TestConfig config;
  config.biasSize = 4096;
  config.layerConfig.set_type(&quot;fc&quot;);
  config.layerConfig.set_size(4096);
  config.layerConfig.set_active_type(&quot;sigmoid&quot;);
  config.layerConfig.set_drop_rate(0.1);
  // Setup inputs.
  config.inputDefs.push_back(
      {INPUT_DATA, &quot;layer_0&quot;, 8192, nnz, ParaSparse(format)});
  config.layerConfig.add_inputs();
  LOG(INFO) &lt;&lt; config.inputDefs[0].sparse.sparse &lt;&lt; &quot; &quot;
            &lt;&lt; config.inputDefs[0].sparse.format;
  for (auto useGpu : {false, true}) {
    testLayerGrad(config, &quot;fc&quot;, 100, /* trans */ false, useGpu,
                  /* weight */ true);
  }
}
</pre></div>
</div>
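<p>The reason for the weighted-sum option of <cite>testLayerGrad</cite> can be seen in a small Python experiment. The sum of the outputs of a softmax is always one, so a finite difference of the plain sum is zero no matter how the input is perturbed, while a weighted sum with fixed weights still varies with the input. The weights and the input below are arbitrary values chosen for illustration, and the helper names are hypothetical.</p>

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def numeric_gradient(f, x, eps=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += eps
        minus = list(x)
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
    return grad


z = [0.5, -1.0, 2.0]
weights = [0.3, -1.2, 0.7]  # arbitrary fixed weights, for illustration only

# Plain sum of the outputs: constant, so its gradient is (numerically) zero.
plain = numeric_gradient(lambda v: sum(softmax(v)), z)

# Weighted sum of the outputs: not constant, so the gradient carries information.
weighted = numeric_gradient(
    lambda v: sum(w * y for w, y in zip(weights, softmax(v))), z)
```

<p>With the plain sum, a wrong <cite>backward</cite> implementation could go unnoticed because the reference gradient is zero everywhere; the weighted sum avoids this degenerate case.</p>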
<p>If you are creating a new file for the test, such as <cite>paddle/gserver/tests/test_FCGrad.cpp</cite>, you need to add the file to <cite>paddle/gserver/tests/CMakeLists.txt</cite>. An example is given below. All the unit tests run when you execute the command <cite>make tests</cite>. Notice that some layers might need high accuracy for the gradient check unit tests to work well. In that case, set <cite>WITH_DOUBLE</cite> to <cite>ON</cite> when configuring cmake.</p>
<p>The code is listed below:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span>add_unittest_without_exec(test_FCGrad
    test_FCGrad.cpp
    LayerGradUtil.cpp
    TestUtil.cpp)

add_test(NAME test_FCGrad
    COMMAND test_FCGrad)
</pre></div>
</div>
</div>
<div class="section" id="implement-python-wrapper">
<h2>Implement Python Wrapper<a class="headerlink" href="#implement-python-wrapper" title="Permalink to this headline"></a></h2>
<p>Implementing a Python wrapper allows us to use the added layer in configuration files. All the Python wrappers are in the file <cite>python/paddle/trainer/config_parser.py</cite>. An example of the Python wrapper for the fully connected layer is listed below. It has the following steps:</p>
<ul>
<li><p class="first">Use <cite>&#64;config_layer(&#39;fc&#39;)</cite> as the decorator of the Python wrapper class. <cite>fc</cite> is the identifier of the layer.</p>
</li>
<li><p class="first">Implement the <cite>__init__</cite> constructor function.</p>
<blockquote>
<div><ul class="simple">
<li>It first calls the base constructor function <cite>super(FCLayer, self).__init__(name, &#39;fc&#39;, size, inputs=inputs, **xargs)</cite>. <cite>FCLayer</cite> is the Python wrapper class name, and <cite>fc</cite> is the layer identifier name. They must be correct in order for the wrapper to work.</li>
<li>Then it computes the size, dimensions, and format (whether sparse) of each transformation matrix, and creates the bias parameter.</li>
</ul>
</div></blockquote>
</li>
</ul>
<p>The code is listed below:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="nd">@config_layer</span><span class="p">(</span><span class="s1">&#39;fc&#39;</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">FCLayer</span><span class="p">(</span><span class="n">LayerBase</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span>
            <span class="bp">self</span><span class="p">,</span>
            <span class="n">name</span><span class="p">,</span>
            <span class="n">size</span><span class="p">,</span>
            <span class="n">inputs</span><span class="p">,</span>
            <span class="n">bias</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
            <span class="o">**</span><span class="n">xargs</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">FCLayer</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">__init__</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="s1">&#39;fc&#39;</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">inputs</span><span class="o">=</span><span class="n">inputs</span><span class="p">,</span> <span class="o">**</span><span class="n">xargs</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">input_index</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">inputs</span><span class="p">)):</span>
            <span class="n">input_layer</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_input_layer</span><span class="p">(</span><span class="n">input_index</span><span class="p">)</span>
            <span class="n">psize</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">config</span><span class="o">.</span><span class="n">size</span> <span class="o">*</span> <span class="n">input_layer</span><span class="o">.</span><span class="n">size</span>
            <span class="n">dims</span> <span class="o">=</span> <span class="p">[</span><span class="n">input_layer</span><span class="o">.</span><span class="n">size</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">config</span><span class="o">.</span><span class="n">size</span><span class="p">]</span>
            <span class="n">format</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">inputs</span><span class="p">[</span><span class="n">input_index</span><span class="p">]</span><span class="o">.</span><span class="n">format</span>
            <span class="n">sparse</span> <span class="o">=</span> <span class="n">format</span> <span class="o">==</span> <span class="s2">&quot;csr&quot;</span> <span class="ow">or</span> <span class="n">format</span> <span class="o">==</span> <span class="s2">&quot;csc&quot;</span>
            <span class="k">if</span> <span class="n">sparse</span><span class="p">:</span>
                <span class="n">psize</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">inputs</span><span class="p">[</span><span class="n">input_index</span><span class="p">]</span><span class="o">.</span><span class="n">nnz</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">create_input_parameter</span><span class="p">(</span><span class="n">input_index</span><span class="p">,</span> <span class="n">psize</span><span class="p">,</span> <span class="n">dims</span><span class="p">,</span> <span class="n">sparse</span><span class="p">,</span> <span class="n">format</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">create_bias_parameter</span><span class="p">(</span><span class="n">bias</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">config</span><span class="o">.</span><span class="n">size</span><span class="p">)</span>
</pre></div>
</div>
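<p>The per-input bookkeeping in the wrapper above can be isolated into a small standalone function to make the rule explicit: a dense transformation matrix needs <cite>input size * layer size</cite> values, while a sparse one (format <cite>csr</cite> or <cite>csc</cite>) only stores its <cite>nnz</cite> non-zero values. The function name <cite>input_parameter_shape</cite> is hypothetical and is not part of <cite>config_parser.py</cite>.</p>

```python
def input_parameter_shape(input_size, layer_size, fmt="", nnz=0):
    """Mirror the size computation of the FCLayer wrapper for one input.

    Returns (psize, dims, sparse), the values the wrapper passes on to
    create_input_parameter.
    """
    dims = [input_size, layer_size]
    sparse = fmt in ("csr", "csc")
    # A sparse parameter only stores its non-zero entries.
    psize = nnz if sparse else layer_size * input_size
    return psize, dims, sparse
```

<p>For the gradient check example in the previous section, with an input of size 8192 and a layer of size 4096, a dense parameter holds 8192 * 4096 = 33554432 values.</p>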
<p>In the network configuration, the layer can be specified using the following code snippet. The arguments of this class are:</p>
<ul class="simple">
<li><cite>name</cite> is the name identifier of the layer instance.</li>
<li><cite>type</cite> is the type of the layer, specified using the layer identifier.</li>
<li><cite>size</cite> is the output size of the layer.</li>
<li><cite>bias</cite> specifies whether this layer instance has a bias.</li>
<li><cite>inputs</cite> specifies a list of layer instance names as inputs.</li>
</ul>
<p>The code is listed below:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">Layer</span><span class="p">(</span>
    <span class="n">name</span> <span class="o">=</span> <span class="s2">&quot;fc1&quot;</span><span class="p">,</span>
    <span class="nb">type</span> <span class="o">=</span> <span class="s2">&quot;fc&quot;</span><span class="p">,</span>
    <span class="n">size</span> <span class="o">=</span> <span class="mi">64</span><span class="p">,</span>
    <span class="n">bias</span> <span class="o">=</span> <span class="bp">True</span><span class="p">,</span>
    <span class="n">inputs</span> <span class="o">=</span> <span class="p">[</span><span class="n">Input</span><span class="p">(</span><span class="s2">&quot;pool3&quot;</span><span class="p">)]</span>
<span class="p">)</span>
</pre></div>
</div>
<p>We also recommend implementing a helper for the Python wrapper, which makes it easier to write models. You can refer to <cite>python/paddle/trainer_config_helpers/layers.py</cite> for examples.</p>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
  <h3><a href="../../index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">Writing New Layers</a><ul>
<li><a class="reference internal" href="#derive-equations">Derive Equations</a></li>
<li><a class="reference internal" href="#implement-c-class">Implement C++ Class</a></li>
<li><a class="reference internal" href="#write-gradient-check-unit-test">Write Gradient Check Unit Test</a></li>
<li><a class="reference internal" href="#implement-python-wrapper">Implement Python Wrapper</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="../embedding_model/index.html"
                        title="previous chapter">Chinese Word Embedding Model Tutorial</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="../../cluster/index.html"
                        title="next chapter">Cluster Train</a></p>
  <div role="note" aria-label="source link">
    <h3>This Page</h3>
    <ul class="this-page-menu">
      <li><a href="../../_sources/demo/new_layer/index.txt"
            rel="nofollow">Show Source</a></li>
    </ul>
   </div>
<div id="searchbox" style="display: none" role="search">
  <h3>Quick search</h3>
    <form class="search" action="../../search.html" method="get">
      <input type="text" name="q" />
      <input type="submit" value="Go" />
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
    <p class="searchtip" style="font-size: 90%">
    Enter search terms or a module, class or function name.
    </p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer" role="contentinfo">
        &copy; Copyright 2016, PaddlePaddle developers.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.3.5.
    </div>
  </body>
</html>