<spanid="paddle::DataProvider::create__DataConfigCR.b"></span><spanclass="target"id="paddleclasspaddle_1_1DataProvider_1ad782dc59f7366c19ba4375101159ba95"></span><emclass="property">static</em><aclass="reference internal"href="#_CPPv2N6paddle12DataProviderE"title="paddle::DataProvider">DataProvider</a> *<codeclass="descname">create</code><spanclass="sig-paren">(</span><emclass="property">const</em> DataConfig &<em>config</em>, bool <em>useGpu</em><spanclass="sig-paren">)</span><aclass="headerlink"href="#_CPPv2N6paddle12DataProvider6createERK10DataConfigb"title="Permalink to this definition">¶</a></dt>
<spanid="paddle::DataProvider::registrar___ClassRegistrar:DataProvider.DataConfig.b:"></span><spanclass="target"id="paddleclasspaddle_1_1DataProvider_1acc7ff8754097b2ecdd6c85ba98e78e18"></span>ClassRegistrar<<aclass="reference internal"href="#_CPPv2N6paddle12DataProviderE"title="paddle::DataProvider">DataProvider</a>, DataConfig, bool><codeclass="descname">registrar_</code><aclass="headerlink"href="#_CPPv2N6paddle12DataProvider10registrar_E"title="Permalink to this definition">¶</a></dt>
<spanid="paddle::DataProvider::registrar___ClassRegistrar:DataProvider.DataConfig.ModelConfig.b:"></span><spanclass="target"id="paddleclasspaddle_1_1DataProvider_1ae40e5169aa51da3fb0e903ff75b8ab01"></span>ClassRegistrar<<aclass="reference internal"href="#_CPPv2N6paddle12DataProviderE"title="paddle::DataProvider">DataProvider</a>, DataConfig, <aclass="reference internal"href="../../api/api.html#_CPPv211ModelConfig"title="ModelConfig">ModelConfig</a>, bool><codeclass="descname">registrar_</code><aclass="headerlink"href="#_CPPv2N6paddle12DataProvider10registrar_E"title="Permalink to this definition">¶</a></dt>
<spanid="paddle::MultiDataProvider::MultiDataProvider__DataConfigCR.b"></span><spanclass="target"id="paddleclasspaddle_1_1MultiDataProvider_1a9335b9b57d19fccd0b374746b0e28336"></span><codeclass="descname">MultiDataProvider</code><spanclass="sig-paren">(</span><emclass="property">const</em> DataConfig &<em>config</em>, bool <em>useGpu</em><spanclass="sig-paren">)</span><aclass="headerlink"href="#_CPPv2N6paddle17MultiDataProvider17MultiDataProviderERK10DataConfigb"title="Permalink to this definition">¶</a></dt>
<spanid="paddle::MultiDataProvider::MultiDataProvider__DataConfigCR.ModelConfigCR.b"></span><spanclass="target"id="paddleclasspaddle_1_1MultiDataProvider_1acd993858e31a0f829e7ca3034bcdb655"></span><codeclass="descname">MultiDataProvider</code><spanclass="sig-paren">(</span><emclass="property">const</em> DataConfig &<em>config</em>, <emclass="property">const</em><aclass="reference internal"href="../../api/api.html#_CPPv211ModelConfig"title="ModelConfig">ModelConfig</a>&<em>modelConfig</em>, bool <em>useGpu</em><spanclass="sig-paren">)</span><aclass="headerlink"href="#_CPPv2N6paddle17MultiDataProvider17MultiDataProviderERK10DataConfigRK11ModelConfigb"title="Permalink to this definition">¶</a></dt>
<spanid="paddle::PyDataProvider2::PyDataProvider2__DataConfigCR.b"></span><spanclass="target"id="paddleclasspaddle_1_1PyDataProvider2_1a493a2635a8b83aeb9be3ce6db0d3aa55"></span><codeclass="descname">PyDataProvider2</code><spanclass="sig-paren">(</span><emclass="property">const</em> DataConfig &<em>config</em>, bool <em>useGpu</em><spanclass="sig-paren">)</span><aclass="headerlink"href="#_CPPv2N6paddle15PyDataProvider215PyDataProvider2ERK10DataConfigb"title="Permalink to this definition">¶</a></dt>
<spanid="paddle::PyDataProvider2::PyDataProvider2__DataConfigCR.ModelConfigCR.b"></span><spanclass="target"id="paddleclasspaddle_1_1PyDataProvider2_1a2281a7aca7c68247414746864688e7bb"></span><codeclass="descname">PyDataProvider2</code><spanclass="sig-paren">(</span><emclass="property">const</em> DataConfig &<em>config</em>, <emclass="property">const</em><aclass="reference internal"href="../../api/api.html#_CPPv211ModelConfig"title="ModelConfig">ModelConfig</a>&<em>modelConfig</em>, bool <em>useGpu</em><spanclass="sig-paren">)</span><aclass="headerlink"href="#_CPPv2N6paddle15PyDataProvider215PyDataProvider2ERK10DataConfigRK11ModelConfigb"title="Permalink to this definition">¶</a></dt>
<spanclass="k">def</span><spanclass="nf">process</span><spanclass="p">(</span><spanclass="n">settings</span><spanclass="p">,</span><spanclass="n">filename</span><spanclass="p">):</span><spanclass="c1"># settings is not used currently.</span>
<spanclass="n">f</span><spanclass="o">=</span><spanclass="nb">open</span><spanclass="p">(</span><spanclass="n">filename</span><spanclass="p">,</span><spanclass="s1">'r'</span><spanclass="p">)</span><spanclass="c1"># open one of the training files</span>
<spanclass="k">for</span><spanclass="n">line</span><spanclass="ow">in</span><spanclass="n">f</span><spanclass="p">:</span><spanclass="c1"># read each line</span>
<spanclass="n">f</span><spanclass="o">.</span><spanclass="n">close</span><spanclass="p">()</span><spanclass="c1"># close file</span>
</pre></div>
</td></tr></table></div>
<p>If the user did not give the <codeclass="code docutils literal"><spanclass="pre">data_layer</span></code>’s name, PaddlePaddle will use
the order of the <codeclass="code docutils literal"><spanclass="pre">data_layer</span></code> definitions to roughly determine which feature goes to
which <codeclass="code docutils literal"><spanclass="pre">data_layer</span></code>. This order may not be correct, so DEFINING THE
<codeclass="code docutils literal"><spanclass="pre">data_layer</span></code> NAMES EXPLICITLY IS THE RECOMMENDED WAY TO PROVIDE DATA.</p>
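<p>As a minimal sketch of what naming features explicitly can look like, the function below builds one sample as a dictionary keyed by layer name instead of relying on position. The layer names <strong>pixel</strong> and <strong>label</strong>, the <strong>label;values</strong> line format, and the helper name <strong>parse_line</strong> are all assumptions made for illustration. It also assumes the provider accepts samples keyed by layer name. It is plain Python and does not call PaddlePaddle.</p>

```python
def parse_line(line):
    """Turn one text line into a sample keyed by data_layer name.

    Assumed (hypothetical) line format: "<label>;<v0> <v1> ...".
    """
    label, pixels = line.strip().split(";")
    return {
        # Keys are hypothetical data_layer names, not fixed by PaddlePaddle.
        "pixel": [float(p) for p in pixels.split()],
        "label": int(label),
    }


sample = parse_line("5;0.0 0.5 1.0\n")
```

<p>Because each feature carries its layer name, the mapping no longer depends on the order in which the layers were defined.</p>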
<p>Now, this simple example of using PyDataProvider is finished.
The only thing the user needs to know is how to generate <strong>one sample</strong> from
<strong>one data file</strong>.
...
...
@@ -168,7 +229,7 @@ And PaddlePadle will do all of the rest things:</p>
<h2>DataProvider for the sequential model<aclass="headerlink"href="#dataprovider-for-the-sequential-model"title="Permalink to this headline">¶</a></h2>
<p>A sequence model takes sequences as its input. A sequence is made up of several
timesteps. A so-called timestep does not necessarily have anything to do
with ‘time’. It simply means that the order of the data is taken into
consideration in model design and training.
For example, the sentence can be interpreted as a kind of sequence data in NLP
tasks.</p>
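<p>To make the idea of a sentence as a sequence concrete, here is a small sketch in plain Python. Each word is one timestep, and the sequence is the ordered list of word IDs. The vocabulary <strong>word_dict</strong> and the helper <strong>sentence_to_sequence</strong> are hypothetical names used only for illustration. A real task would load its dictionary from a file.</p>

```python
# Hypothetical vocabulary; a real task would load it from a dictionary file.
word_dict = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}


def sentence_to_sequence(sentence, vocab):
    """Map each word (one timestep) to its integer ID, preserving word order."""
    unk = vocab["<unk>"]
    return [vocab.get(word, unk) for word in sentence.lower().split()]


sequence = sentence_to_sequence("The cat sat", word_dict)
```

<p>Reordering the words yields a different sequence, and that ordering is exactly the information a sequence model uses.</p>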
...
...
@@ -278,24 +339,73 @@ Please refer to the following section reference for details.</p>
<h2>Reference<aclass="headerlink"href="#reference"title="Permalink to this headline">¶</a></h2>
<divclass="section"id="provider">
<h3>@provider<aclass="headerlink"href="#provider"title="Permalink to this headline">¶</a></h3>
<p>‘@provider’ is a Python <aclass="reference external"href="http://www.learnpython.org/en/Decorators">Decorator</a> that constructs a PyDataProvider in
PaddlePaddle from a user-defined function. Its parameters are:</p>
<ulclass="simple">
<li><aclass="reference internal"href="#input-types">input_types</a> defines format of the data input.</li>
<li>should_shuffle defines whether to shuffle data or not. By default, it is set
to true during training, and false during testing.</li>
<li>pool_size is the memory pool size (in number of samples) in DataProvider.
-1 means no limit.</li>
<li>can_over_batch_size defines whether PaddlePaddle can store a few more
samples than pool_size. It is better to set it to True to avoid some deadlocks.</li>
<li>calc_batch_size is a function that defines how to calculate batch size. This is
useful in sequential models, where batch size may be counted by sequence
or by token. By default, each sample or sequence counts as 1 when calculating
batch size.</li>
<li>cache is a data cache strategy, see <aclass="reference internal"href="#cache">cache</a>.</li>
<li>init_hook is a function invoked once the data provider is initialized,
see <aclass="reference internal"href="#init-hook">init_hook</a>.</li>
<dlclass="function">
<dtid="paddle.trainer.PyDataProvider2.provider">
<codeclass="descclassname">paddle.trainer.PyDataProvider2.</code><codeclass="descname">provider</code><spanclass="sig-paren">(</span><em>input_types=None</em>, <em>should_shuffle=None</em>, <em>pool_size=-1</em>, <em>min_pool_size=-1</em>, <em>can_over_batch_size=True</em>, <em>calc_batch_size=None</em>, <em>cache=0</em>, <em>check=False</em>, <em>check_fail_continue=False</em>, <em>use_dynamic_order=True</em>, <em>init_hook=None</em>, <em>**kwargs</em><spanclass="sig-paren">)</span><aclass="headerlink"href="#paddle.trainer.PyDataProvider2.provider"title="Permalink to this definition">¶</a></dt>
<dd><p>Provider decorator. Use it to turn a function into a PyDataProvider2 object.
In this function, the user only needs to get each sample for some train/test
<spanclass="k">def</span><spanclass="nf">process</span><spanclass="p">(</span><spanclass="n">settings</span><spanclass="p">,</span><spanclass="n">filename</span><spanclass="p">):</span><spanclass="c1"># settings is not used currently.</span>
<spanclass="n">f</span><spanclass="o">=</span><spanclass="nb">open</span><spanclass="p">(</span><spanclass="n">filename</span><spanclass="p">,</span><spanclass="s1">'r'</span><spanclass="p">)</span><spanclass="c1"># open one of the training files</span>
<spanclass="k">for</span><spanclass="n">line</span><spanclass="ow">in</span><spanclass="n">f</span><spanclass="p">:</span><spanclass="c1"># read each line</span>