Commit 546fa1a8, authored by Travis CI

Deploy to GitHub Pages: 624d22d9

Parent ad7f6583
...@@ -87,6 +87,11 @@ roi_pool ...@@ -87,6 +87,11 @@ roi_pool
.. autoclass:: paddle.v2.layer.roi_pool .. autoclass:: paddle.v2.layer.roi_pool
:noindex: :noindex:
pad
----
.. autoclass:: paddle.v2.layer.pad
:noindex:
Norm Layer Norm Layer
========== ==========
...@@ -133,6 +138,11 @@ grumemory ...@@ -133,6 +138,11 @@ grumemory
.. autoclass:: paddle.v2.layer.grumemory .. autoclass:: paddle.v2.layer.grumemory
:noindex: :noindex:
gated_unit
-----------
.. autoclass:: paddle.v2.layer.gated_unit
:noindex:
Recurrent Layer Group Recurrent Layer Group
===================== =====================
...@@ -340,6 +350,11 @@ bilinear_interp ...@@ -340,6 +350,11 @@ bilinear_interp
.. autoclass:: paddle.v2.layer.bilinear_interp .. autoclass:: paddle.v2.layer.bilinear_interp
:noindex: :noindex:
dropout
--------
.. autoclass:: paddle.v2.layer.dropout
:noindex:
dot_prod dot_prod
--------- ---------
.. autoclass:: paddle.v2.layer.dot_prod .. autoclass:: paddle.v2.layer.dot_prod
...@@ -402,6 +417,11 @@ scale_shift ...@@ -402,6 +417,11 @@ scale_shift
.. autoclass:: paddle.v2.layer.scale_shift .. autoclass:: paddle.v2.layer.scale_shift
:noindex: :noindex:
factorization_machine
---------------------
.. autoclass:: paddle.v2.layer.factorization_machine
:noindex:
Sampling Layers Sampling Layers
=============== ===============
...@@ -420,22 +440,6 @@ multiplex ...@@ -420,22 +440,6 @@ multiplex
.. autoclass:: paddle.v2.layer.multiplex .. autoclass:: paddle.v2.layer.multiplex
:noindex: :noindex:
Factorization Machine Layer
============================
factorization_machine
---------------------
.. autoclass:: paddle.v2.layer.factorization_machine
:noindex:
Slicing and Joining Layers
==========================
pad
----
.. autoclass:: paddle.v2.layer.pad
:noindex:
.. _api_v2.layer_costs: .. _api_v2.layer_costs:
Cost Layers Cost Layers
...@@ -526,6 +530,11 @@ multibox_loss ...@@ -526,6 +530,11 @@ multibox_loss
.. autoclass:: paddle.v2.layer.multibox_loss .. autoclass:: paddle.v2.layer.multibox_loss
:noindex: :noindex:
detection_output
----------------
.. autoclass:: paddle.v2.layer.detection_output
:noindex:
Check Layer Check Layer
============ ============
...@@ -534,31 +543,10 @@ eos ...@@ -534,31 +543,10 @@ eos
.. autoclass:: paddle.v2.layer.eos .. autoclass:: paddle.v2.layer.eos
:noindex: :noindex:
Miscs Activation
===== ==========
dropout
--------
.. autoclass:: paddle.v2.layer.dropout
:noindex:
Activation with learnable parameter
===================================
prelu prelu
-------- --------
.. autoclass:: paddle.v2.layer.prelu .. autoclass:: paddle.v2.layer.prelu
:noindex: :noindex:
gated_unit
-----------
.. autoclass:: paddle.v2.layer.gated_unit
:noindex:
Detection output Layer
======================
detection_output
----------------
.. autoclass:: paddle.v2.layer.detection_output
:noindex:
...@@ -73,3 +73,10 @@ wmt14 ...@@ -73,3 +73,10 @@ wmt14
.. automodule:: paddle.v2.dataset.wmt14 .. automodule:: paddle.v2.dataset.wmt14
:members: :members:
:noindex: :noindex:
wmt16
+++++
.. automodule:: paddle.v2.dataset.wmt16
:members:
:noindex:
...@@ -729,6 +729,191 @@ sequence.</p> ...@@ -729,6 +729,191 @@ sequence.</p>
<dd><p>Converts dataset to recordio format</p> <dd><p>Converts dataset to recordio format</p>
</dd></dl> </dd></dl>
</div>
<div class="section" id="wmt16">
<h2>wmt16<a class="headerlink" href="#wmt16" title="Permalink to this headline"></a></h2>
<p>ACL2016 Multimodal Machine Translation. Please see this website for more
details: <a class="reference external" href="http://www.statmt.org/wmt16/multimodal-task.html#task1">http://www.statmt.org/wmt16/multimodal-task.html#task1</a></p>
<p>If you use the dataset created for your task, please cite the following paper:
Multi30K: Multilingual English-German Image Descriptions.</p>
<dl class="docutils">
<dt>&#64;article{elliott-EtAl:2016:VL16,</dt>
<dd>author = {{Elliott}, D. and {Frank}, S. and {Sima&#8217;an}, K. and {Specia}, L.},
title = {Multi30K: Multilingual English-German Image Descriptions},
booktitle = {Proceedings of the 6th Workshop on Vision and Language},
year = {2016},
pages = {70&#8211;74}</dd>
</dl>
<p>}</p>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">train</code><span class="sig-paren">(</span><em>src_dict_size</em>, <em>trg_dict_size</em>, <em>src_lang='en'</em><span class="sig-paren">)</span></dt>
<dd><p>WMT16 train set reader.</p>
<p>This function returns the reader for train data. Each sample the reader
returns is made up of three fields: the source language word index sequence,
target language word index sequence and next word index sequence.</p>
<p>NOTE:
The original link for training data is:
<a class="reference external" href="http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz">http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz</a></p>
<p>paddle.dataset.wmt16 provides a tokenized version of the original dataset by
using moses&#8217;s tokenization script:
<a class="reference external" href="https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl">https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl</a></p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>src_dict_size</strong> (<em>int</em>) &#8211; Size of the source language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>trg_dict_size</strong> (<em>int</em>) &#8211; Size of the target language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>src_lang</strong> (<em>string</em>) &#8211; A string indicating which language is the source
language. Available options are: &#8220;en&#8221; for English
and &#8220;de&#8221; for German.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The train reader.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">callable</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">test</code><span class="sig-paren">(</span><em>src_dict_size</em>, <em>trg_dict_size</em>, <em>src_lang='en'</em><span class="sig-paren">)</span></dt>
<dd><p>WMT16 test set reader.</p>
<p>This function returns the reader for test data. Each sample the reader
returns is made up of three fields: the source language word index sequence,
target language word index sequence and next word index sequence.</p>
<p>NOTE:
The original link for test data is:
<a class="reference external" href="http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/mmt16_task1_test.tar.gz">http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/mmt16_task1_test.tar.gz</a></p>
<p>paddle.dataset.wmt16 provides a tokenized version of the original dataset by
using moses&#8217;s tokenization script:
<a class="reference external" href="https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl">https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl</a></p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>src_dict_size</strong> (<em>int</em>) &#8211; Size of the source language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>trg_dict_size</strong> (<em>int</em>) &#8211; Size of the target language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>src_lang</strong> (<em>string</em>) &#8211; A string indicating which language is the source
language. Available options are: &#8220;en&#8221; for English
and &#8220;de&#8221; for German.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The test reader.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">callable</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">validation</code><span class="sig-paren">(</span><em>src_dict_size</em>, <em>trg_dict_size</em>, <em>src_lang='en'</em><span class="sig-paren">)</span></dt>
<dd><p>WMT16 validation set reader.</p>
<p>This function returns the reader for validation data. Each sample the reader
returns is made up of three fields: the source language word index sequence,
target language word index sequence and next word index sequence.</p>
<p>NOTE:
The original link for validation data is:
<a class="reference external" href="http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/validation.tar.gz">http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/validation.tar.gz</a></p>
<p>paddle.dataset.wmt16 provides a tokenized version of the original dataset by
using moses&#8217;s tokenization script:
<a class="reference external" href="https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl">https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl</a></p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>src_dict_size</strong> (<em>int</em>) &#8211; Size of the source language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>trg_dict_size</strong> (<em>int</em>) &#8211; Size of the target language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>src_lang</strong> (<em>string</em>) &#8211; A string indicating which language is the source
language. Available options are: &#8220;en&#8221; for English
and &#8220;de&#8221; for German.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The validation reader.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">callable</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">get_dict</code><span class="sig-paren">(</span><em>lang</em>, <em>dict_size</em>, <em>reverse=False</em><span class="sig-paren">)</span></dt>
<dd><p>Returns the word dictionary for the specified language.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>lang</strong> (<em>string</em>) &#8211; A string indicating which language is the source
language. Available options are: &#8220;en&#8221; for English
and &#8220;de&#8221; for German.</li>
<li><strong>dict_size</strong> (<em>int</em>) &#8211; Size of the specified language dictionary.</li>
<li><strong>reverse</strong> (<em>bool</em>) &#8211; If reverse is set to False, the returned python
dictionary will use word as key and use index as value.
If reverse is set to True, the returned python
dictionary will use index as key and word as value.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The word dictionary for the specific language.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">dict</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">fetch</code><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
<dd><p>Downloads the entire dataset.</p>
</dd></dl>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">convert</code><span class="sig-paren">(</span><em>path</em>, <em>src_dict_size</em>, <em>trg_dict_size</em>, <em>src_lang</em><span class="sig-paren">)</span></dt>
<dd><p>Converts dataset to recordio format.</p>
</dd></dl>
</div> </div>
</div> </div>
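The wmt16 reader documentation above describes each sample as three index sequences (source, target, and next word) and says `get_dict` can return either a word-to-index or an index-to-word mapping. The sketch below illustrates that layout in plain Python without importing paddle. The helper names `build_dict` and `make_sample` are hypothetical, and two details are assumptions rather than documented behavior: that the three special tokens take the first three indices and count toward the dictionary size, and that the source sequence is wrapped in `<s>` and `<e>`.

```python
# Illustrative sketch only: these helpers are hypothetical and do not call paddle.
START, END, UNK = "<s>", "<e>", "<unk>"


def build_dict(words, dict_size, reverse=False):
    """Build a toy dictionary with at most `dict_size` entries.

    Assumption: the three special tokens come first and count toward
    `dict_size`. With reverse=False the result maps word -> index;
    with reverse=True it maps index -> word.
    """
    vocab = ([START, END, UNK] + list(words))[:dict_size]
    if reverse:
        return {idx: word for idx, word in enumerate(vocab)}
    return {word: idx for idx, word in enumerate(vocab)}


def make_sample(src_words, trg_words, src_dict, trg_dict):
    """Turn one sentence pair into (source ids, target ids, next-word ids)."""
    # Assumption: the source sentence is wrapped in start and end marks.
    src_ids = (
        [src_dict[START]]
        + [src_dict.get(w, src_dict[UNK]) for w in src_words]
        + [src_dict[END]]
    )
    # Out-of-vocabulary target words fall back to the <unk> index.
    trg_body = [trg_dict.get(w, trg_dict[UNK]) for w in trg_words]
    trg_ids = [trg_dict[START]] + trg_body   # decoder input
    trg_next = trg_body + [trg_dict[END]]    # decoder input shifted by one
    return src_ids, trg_ids, trg_next


src_dict = build_dict(["a", "dog", "runs"], dict_size=6)
trg_dict = build_dict(["ein", "hund", "rennt"], dict_size=5)  # "rennt" is cut off
sample = make_sample(["a", "dog", "runs"], ["ein", "hund", "rennt"],
                     src_dict, trg_dict)
print(sample)  # ([0, 3, 4, 5, 1], [0, 3, 4, 2], [3, 4, 2, 1])
```

In this toy run the target dictionary is too small to hold "rennt", so that word maps to the `<unk>` index 2, and the next-word sequence is the target sequence shifted left by one position with `<e>` appended.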
...@@ -87,6 +87,11 @@ roi_pool ...@@ -87,6 +87,11 @@ roi_pool
.. autoclass:: paddle.v2.layer.roi_pool .. autoclass:: paddle.v2.layer.roi_pool
:noindex: :noindex:
pad
----
.. autoclass:: paddle.v2.layer.pad
:noindex:
Norm Layer Norm Layer
========== ==========
...@@ -133,6 +138,11 @@ grumemory ...@@ -133,6 +138,11 @@ grumemory
.. autoclass:: paddle.v2.layer.grumemory .. autoclass:: paddle.v2.layer.grumemory
:noindex: :noindex:
gated_unit
-----------
.. autoclass:: paddle.v2.layer.gated_unit
:noindex:
Recurrent Layer Group Recurrent Layer Group
===================== =====================
...@@ -340,6 +350,11 @@ bilinear_interp ...@@ -340,6 +350,11 @@ bilinear_interp
.. autoclass:: paddle.v2.layer.bilinear_interp .. autoclass:: paddle.v2.layer.bilinear_interp
:noindex: :noindex:
dropout
--------
.. autoclass:: paddle.v2.layer.dropout
:noindex:
dot_prod dot_prod
--------- ---------
.. autoclass:: paddle.v2.layer.dot_prod .. autoclass:: paddle.v2.layer.dot_prod
...@@ -402,6 +417,11 @@ scale_shift ...@@ -402,6 +417,11 @@ scale_shift
.. autoclass:: paddle.v2.layer.scale_shift .. autoclass:: paddle.v2.layer.scale_shift
:noindex: :noindex:
factorization_machine
---------------------
.. autoclass:: paddle.v2.layer.factorization_machine
:noindex:
Sampling Layers Sampling Layers
=============== ===============
...@@ -420,22 +440,6 @@ multiplex ...@@ -420,22 +440,6 @@ multiplex
.. autoclass:: paddle.v2.layer.multiplex .. autoclass:: paddle.v2.layer.multiplex
:noindex: :noindex:
Factorization Machine Layer
============================
factorization_machine
---------------------
.. autoclass:: paddle.v2.layer.factorization_machine
:noindex:
Slicing and Joining Layers
==========================
pad
----
.. autoclass:: paddle.v2.layer.pad
:noindex:
.. _api_v2.layer_costs: .. _api_v2.layer_costs:
Cost Layers Cost Layers
...@@ -526,6 +530,11 @@ multibox_loss ...@@ -526,6 +530,11 @@ multibox_loss
.. autoclass:: paddle.v2.layer.multibox_loss .. autoclass:: paddle.v2.layer.multibox_loss
:noindex: :noindex:
detection_output
----------------
.. autoclass:: paddle.v2.layer.detection_output
:noindex:
Check Layer Check Layer
============ ============
...@@ -534,31 +543,10 @@ eos ...@@ -534,31 +543,10 @@ eos
.. autoclass:: paddle.v2.layer.eos .. autoclass:: paddle.v2.layer.eos
:noindex: :noindex:
Miscs Activation
===== ==========
dropout
--------
.. autoclass:: paddle.v2.layer.dropout
:noindex:
Activation with learnable parameter
===================================
prelu prelu
-------- --------
.. autoclass:: paddle.v2.layer.prelu .. autoclass:: paddle.v2.layer.prelu
:noindex: :noindex:
gated_unit
-----------
.. autoclass:: paddle.v2.layer.gated_unit
:noindex:
Detection output Layer
======================
detection_output
----------------
.. autoclass:: paddle.v2.layer.detection_output
:noindex:
...@@ -73,3 +73,10 @@ wmt14 ...@@ -73,3 +73,10 @@ wmt14
.. automodule:: paddle.v2.dataset.wmt14 .. automodule:: paddle.v2.dataset.wmt14
:members: :members:
:noindex: :noindex:
wmt16
+++++
.. automodule:: paddle.v2.dataset.wmt16
:members:
:noindex:
...@@ -748,6 +748,191 @@ sequence.</p> ...@@ -748,6 +748,191 @@ sequence.</p>
<dd><p>Converts dataset to recordio format</p> <dd><p>Converts dataset to recordio format</p>
</dd></dl> </dd></dl>
</div>
<div class="section" id="wmt16">
<h2>wmt16<a class="headerlink" href="#wmt16" title="永久链接至标题"></a></h2>
<p>ACL2016 Multimodal Machine Translation. Please see this website for more
details: <a class="reference external" href="http://www.statmt.org/wmt16/multimodal-task.html#task1">http://www.statmt.org/wmt16/multimodal-task.html#task1</a></p>
<p>If you use the dataset created for your task, please cite the following paper:
Multi30K: Multilingual English-German Image Descriptions.</p>
<dl class="docutils">
<dt>&#64;article{elliott-EtAl:2016:VL16,</dt>
<dd>author = {{Elliott}, D. and {Frank}, S. and {Sima&#8217;an}, K. and {Specia}, L.},
title = {Multi30K: Multilingual English-German Image Descriptions},
booktitle = {Proceedings of the 6th Workshop on Vision and Language},
year = {2016},
pages = {70&#8211;74}</dd>
</dl>
<p>}</p>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">train</code><span class="sig-paren">(</span><em>src_dict_size</em>, <em>trg_dict_size</em>, <em>src_lang='en'</em><span class="sig-paren">)</span></dt>
<dd><p>WMT16 train set reader.</p>
<p>This function returns the reader for train data. Each sample the reader
returns is made up of three fields: the source language word index sequence,
target language word index sequence and next word index sequence.</p>
<p>NOTE:
The original link for training data is:
<a class="reference external" href="http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz">http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz</a></p>
<p>paddle.dataset.wmt16 provides a tokenized version of the original dataset by
using moses&#8217;s tokenization script:
<a class="reference external" href="https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl">https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl</a></p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>src_dict_size</strong> (<em>int</em>) &#8211; Size of the source language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>trg_dict_size</strong> (<em>int</em>) &#8211; Size of the target language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>src_lang</strong> (<em>string</em>) &#8211; A string indicating which language is the source
language. Available options are: &#8220;en&#8221; for English
and &#8220;de&#8221; for German.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">The train reader.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first last">callable</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">test</code><span class="sig-paren">(</span><em>src_dict_size</em>, <em>trg_dict_size</em>, <em>src_lang='en'</em><span class="sig-paren">)</span></dt>
<dd><p>WMT16 test set reader.</p>
<p>This function returns the reader for test data. Each sample the reader
returns is made up of three fields: the source language word index sequence,
target language word index sequence and next word index sequence.</p>
<p>NOTE:
The original link for test data is:
<a class="reference external" href="http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/mmt16_task1_test.tar.gz">http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/mmt16_task1_test.tar.gz</a></p>
<p>paddle.dataset.wmt16 provides a tokenized version of the original dataset by
using moses&#8217;s tokenization script:
<a class="reference external" href="https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl">https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl</a></p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>src_dict_size</strong> (<em>int</em>) &#8211; Size of the source language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>trg_dict_size</strong> (<em>int</em>) &#8211; Size of the target language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>src_lang</strong> (<em>string</em>) &#8211; A string indicating which language is the source
language. Available options are: &#8220;en&#8221; for English
and &#8220;de&#8221; for German.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">The test reader.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first last">callable</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">validation</code><span class="sig-paren">(</span><em>src_dict_size</em>, <em>trg_dict_size</em>, <em>src_lang='en'</em><span class="sig-paren">)</span></dt>
<dd><p>WMT16 validation set reader.</p>
<p>This function returns the reader for validation data. Each sample the reader
returns is made up of three fields: the source language word index sequence,
target language word index sequence and next word index sequence.</p>
<p>NOTE:
The original link for validation data is:
<a class="reference external" href="http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/validation.tar.gz">http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/validation.tar.gz</a></p>
<p>paddle.dataset.wmt16 provides a tokenized version of the original dataset by
using moses&#8217;s tokenization script:
<a class="reference external" href="https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl">https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl</a></p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>src_dict_size</strong> (<em>int</em>) &#8211; Size of the source language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>trg_dict_size</strong> (<em>int</em>) &#8211; Size of the target language dictionary. Three
special tokens will be added into the dictionary:
&lt;s&gt; for start mark, &lt;e&gt; for end mark, and &lt;unk&gt; for
unknown word.</li>
<li><strong>src_lang</strong> (<em>string</em>) &#8211; A string indicating which language is the source
language. Available options are: &#8220;en&#8221; for English
and &#8220;de&#8221; for German.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">The validation reader.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first last">callable</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">get_dict</code><span class="sig-paren">(</span><em>lang</em>, <em>dict_size</em>, <em>reverse=False</em><span class="sig-paren">)</span></dt>
<dd><p>Returns the word dictionary for the specified language.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>lang</strong> (<em>string</em>) &#8211; A string indicating which language is the source
language. Available options are: &#8220;en&#8221; for English
and &#8220;de&#8221; for German.</li>
<li><strong>dict_size</strong> (<em>int</em>) &#8211; Size of the specified language dictionary.</li>
<li><strong>reverse</strong> (<em>bool</em>) &#8211; If reverse is set to False, the returned python
dictionary will use word as key and use index as value.
If reverse is set to True, the returned python
dictionary will use index as key and word as value.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">The word dictionary for the specific language.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first last">dict</p>
</td>
</tr>
</tbody>
</table>
</dd></dl>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">fetch</code><span class="sig-paren">(</span><span class="sig-paren">)</span></dt>
<dd><p>Downloads the entire dataset.</p>
</dd></dl>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.dataset.wmt16.</code><code class="descname">convert</code><span class="sig-paren">(</span><em>path</em>, <em>src_dict_size</em>, <em>trg_dict_size</em>, <em>src_lang</em><span class="sig-paren">)</span></dt>
<dd><p>Converts dataset to recordio format.</p>
</dd></dl>
</div> </div>
</div> </div>