Commit 659d1ff7 authored by Travis CI

Deploy to GitHub Pages: d3e003be

Parent 77785bd6
...@@ -287,26 +287,62 @@ label for ctc</li> ...@@ -287,26 +287,62 @@ label for ctc</li>
<dt> <dt>
<em class="property">class </em><code class="descclassname">paddle.v2.evaluator.</code><code class="descname">chunk</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <em class="property">class </em><code class="descclassname">paddle.v2.evaluator.</code><code class="descname">chunk</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p>Chunk evaluator is used to evaluate segment labelling accuracy for a <dd><p>Chunk evaluator is used to evaluate segment labelling accuracy for a
sequence. It calculates the chunk detection F1 score.</p> sequence. It calculates precision, recall and F1 scores for the chunk detection.</p>
<p>A chunk is correctly detected if its beginning, end and type are correct. <p>To use chunk evaluator, several concepts need to be clarified firstly.</p>
Other chunk type is ignored.</p> <ul class="simple">
<p>For each label in the label sequence, we have:</p> <li><strong>Chunk type</strong> is the type of the whole chunk and a chunk consists of one or several words. (For example in NER, ORG for organization name, PER for person name etc.)</li>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">tagType</span> <span class="o">=</span> <span class="n">label</span> <span class="o">%</span> <span class="n">numTagType</span> <li><strong>Tag type</strong> indicates the position of a word in a chunk. (B for begin, I for inside, E for end, S for single)</li>
<span class="n">chunkType</span> <span class="o">=</span> <span class="n">label</span> <span class="o">/</span> <span class="n">numTagType</span> </ul>
<span class="n">otherChunkType</span> <span class="o">=</span> <span class="n">numChunkTypes</span> <p>We can name a label by combining tag type and chunk type. (e.g. B-ORG for the beginning of an organization name)</p>
<p>The construction of label dictionary should obey the following rules:</p>
<ul class="simple">
<li>Use one of the listed labelling schemes. These schemes differ in how they indicate chunk boundaries.</li>
</ul>
<div class="highlight-text"><div class="highlight"><pre><span></span>Scheme  Description
plain   Use the same label for the whole chunk.
IOB     Two labels for chunk type X: B-X for chunk beginning and I-X for chunk inside.
IOE     Two labels for chunk type X: E-X for chunk ending and I-X for chunk inside.
IOBES   Four labels for chunk type X: B-X for chunk beginning, I-X for chunk inside, E-X for chunk end and S-X for single-word chunk.
</pre></div>
</div>
<p>To make this concrete, consider an NER example.
Assume there are three named entity types, ORG, PER and LOC, which are the &#8216;chunk types&#8217; here.
If the &#8216;IOB&#8217; scheme is used, the label set is extended to B-ORG, I-ORG, B-PER, I-PER, B-LOC, I-LOC and O,
where B-ORG marks the beginning of an ORG chunk and I-ORG marks the inside of one.
The prefixes added to the chunk types are the &#8216;tag types&#8217;; here there are two, B and I.
The training data must be labeled accordingly.</p>
<ul class="simple">
<li>Map labels to ids according to the equations and assignment rules listed below.</li>
</ul>
<p>The following equations extract the tag type and chunk type from a label.</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>tagType = label % numTagType
chunkType = label / numTagType
otherChunkType = numChunkTypes
</pre></div>
</div>
<p>The following table shows the mapping rule between tagType and tag type in each scheme.</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>Scheme  Begin  Inside  End  Single
plain   0      -       -    -
IOB     0      1       -    -
IOE     -      0       1    -
IOBES   0      1       2    3
</pre></div> </pre></div>
</div> </div>
<p>The total number of different labels is numTagType*numChunkTypes+1. <p>Continuing the NER example, the label dict should look like this to satisfy the above equations:</p>
We support 4 labelling scheme. <div class="highlight-text"><div class="highlight"><pre><span></span>B-ORG 0
The tag type for each of the scheme is shown as follows:</p> I-ORG 1
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">Scheme</span> <span class="n">Begin</span> <span class="n">Inside</span> <span class="n">End</span> <span class="n">Single</span> B-PER 2
<span class="n">plain</span> <span class="mi">0</span> <span class="o">-</span> <span class="o">-</span> <span class="o">-</span> I-PER 3
<span class="n">IOB</span> <span class="mi">0</span> <span class="mi">1</span> <span class="o">-</span> <span class="o">-</span> B-LOC 4
<span class="n">IOE</span> <span class="o">-</span> <span class="mi">0</span> <span class="mi">1</span> <span class="o">-</span> I-LOC 5
<span class="n">IOBES</span> <span class="mi">0</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span> O 6
</pre></div> </pre></div>
</div> </div>
<p>&#8216;plain&#8217; means the whole chunk must contain exactly the same chunk label.</p> <p>In this example, chunkType has three values: 0 for ORG, 1 for PER and 2 for LOC. Because the scheme is
&#8220;IOB&#8221;, tagType has two values: 0 for B and 1 for I.
Take I-LOC as an illustration of the mapping rules above.
For I-LOC, the label id is 5, so tagType = 5 % 2 = 1 and chunkType = 5 / 2 = 2 (integer division), which means I-LOC is part of a LOC chunk
and its tag is I.</p>
<p>The simple usage is:</p> <p>The simple usage is:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="nb">eval</span> <span class="o">=</span> <span class="n">chunk_evaluator</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">chunk_scheme</span><span class="p">,</span> <span class="n">num_chunk_types</span><span class="p">)</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="nb">eval</span> <span class="o">=</span> <span class="n">chunk_evaluator</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">chunk_scheme</span><span class="p">,</span> <span class="n">num_chunk_types</span><span class="p">)</span>
</pre></div> </pre></div>
......
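Review note: the label-decoding rule documented in this hunk can be checked with a small standalone sketch. The names `SCHEME_TAGS`, `decode_label` and `label_name` are hypothetical helpers written for illustration and are not part of `paddle.v2.evaluator`. The sketch assumes Python 3, so the documented `label / numTagType` becomes integer division `//`.

```python
# Hypothetical helpers for illustration; not part of paddle.v2.evaluator.
# Tag names per scheme, ordered by tagType id as in the documented mapping table.
SCHEME_TAGS = {
    "plain": (None,),            # a single tag type; labels carry no prefix
    "IOB": ("B", "I"),
    "IOE": ("I", "E"),
    "IOBES": ("B", "I", "E", "S"),
}


def decode_label(label, scheme, num_chunk_types):
    """Split a label id into (tagType, chunkType) using the documented equations."""
    num_tag_types = len(SCHEME_TAGS[scheme])
    # Valid ids are 0 .. numTagType * numChunkTypes (the last one is "other").
    if not 0 <= label <= num_tag_types * num_chunk_types:
        raise ValueError("label id out of range: %d" % label)
    tag_type = label % num_tag_types
    chunk_type = label // num_tag_types  # integer division
    return tag_type, chunk_type


def label_name(label, scheme, chunk_names):
    """Render a label id as text, e.g. 5 -> 'I-LOC' under IOB."""
    tag_type, chunk_type = decode_label(label, scheme, len(chunk_names))
    if chunk_type == len(chunk_names):  # otherChunkType = numChunkTypes
        return "O"
    tag = SCHEME_TAGS[scheme][tag_type]
    name = chunk_names[chunk_type]
    return name if tag is None else "%s-%s" % (tag, name)


if __name__ == "__main__":
    names = ["ORG", "PER", "LOC"]
    for label_id in range(7):
        print(label_id, label_name(label_id, "IOB", names))
```

Running it with the three chunk types from the NER example reproduces the label dict shown in the hunk, with id 6 decoding to the "other" chunk type.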
The source diff is too large to display. You can view the blob instead.
...@@ -294,26 +294,62 @@ label for ctc</li> ...@@ -294,26 +294,62 @@ label for ctc</li>
<dt> <dt>
<em class="property">class </em><code class="descclassname">paddle.v2.evaluator.</code><code class="descname">chunk</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt> <em class="property">class </em><code class="descclassname">paddle.v2.evaluator.</code><code class="descname">chunk</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p>Chunk evaluator is used to evaluate segment labelling accuracy for a <dd><p>Chunk evaluator is used to evaluate segment labelling accuracy for a
sequence. It calculates the chunk detection F1 score.</p> sequence. It calculates precision, recall and F1 scores for the chunk detection.</p>
<p>A chunk is correctly detected if its beginning, end and type are correct. <p>To use chunk evaluator, several concepts need to be clarified firstly.</p>
Other chunk type is ignored.</p> <ul class="simple">
<p>For each label in the label sequence, we have:</p> <li><strong>Chunk type</strong> is the type of the whole chunk and a chunk consists of one or several words. (For example in NER, ORG for organization name, PER for person name etc.)</li>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">tagType</span> <span class="o">=</span> <span class="n">label</span> <span class="o">%</span> <span class="n">numTagType</span> <li><strong>Tag type</strong> indicates the position of a word in a chunk. (B for begin, I for inside, E for end, S for single)</li>
<span class="n">chunkType</span> <span class="o">=</span> <span class="n">label</span> <span class="o">/</span> <span class="n">numTagType</span> </ul>
<span class="n">otherChunkType</span> <span class="o">=</span> <span class="n">numChunkTypes</span> <p>We can name a label by combining tag type and chunk type. (e.g. B-ORG for the beginning of an organization name)</p>
<p>The construction of label dictionary should obey the following rules:</p>
<ul class="simple">
<li>Use one of the listed labelling schemes. These schemes differ in how they indicate chunk boundaries.</li>
</ul>
<div class="highlight-text"><div class="highlight"><pre><span></span>Scheme  Description
plain   Use the same label for the whole chunk.
IOB     Two labels for chunk type X: B-X for chunk beginning and I-X for chunk inside.
IOE     Two labels for chunk type X: E-X for chunk ending and I-X for chunk inside.
IOBES   Four labels for chunk type X: B-X for chunk beginning, I-X for chunk inside, E-X for chunk end and S-X for single-word chunk.
</pre></div>
</div>
<p>To make this concrete, consider an NER example.
Assume there are three named entity types, ORG, PER and LOC, which are the &#8216;chunk types&#8217; here.
If the &#8216;IOB&#8217; scheme is used, the label set is extended to B-ORG, I-ORG, B-PER, I-PER, B-LOC, I-LOC and O,
where B-ORG marks the beginning of an ORG chunk and I-ORG marks the inside of one.
The prefixes added to the chunk types are the &#8216;tag types&#8217;; here there are two, B and I.
The training data must be labeled accordingly.</p>
<ul class="simple">
<li>Map labels to ids according to the equations and assignment rules listed below.</li>
</ul>
<p>The following equations extract the tag type and chunk type from a label.</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>tagType = label % numTagType
chunkType = label / numTagType
otherChunkType = numChunkTypes
</pre></div>
</div>
<p>The following table shows the mapping rule between tagType and tag type in each scheme.</p>
<div class="highlight-text"><div class="highlight"><pre><span></span>Scheme  Begin  Inside  End  Single
plain   0      -       -    -
IOB     0      1       -    -
IOE     -      0       1    -
IOBES   0      1       2    3
</pre></div> </pre></div>
</div> </div>
<p>The total number of different labels is numTagType*numChunkTypes+1. <p>Continuing the NER example, the label dict should look like this to satisfy the above equations:</p>
We support 4 labelling scheme. <div class="highlight-text"><div class="highlight"><pre><span></span>B-ORG 0
The tag type for each of the scheme is shown as follows:</p> I-ORG 1
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">Scheme</span> <span class="n">Begin</span> <span class="n">Inside</span> <span class="n">End</span> <span class="n">Single</span> B-PER 2
<span class="n">plain</span> <span class="mi">0</span> <span class="o">-</span> <span class="o">-</span> <span class="o">-</span> I-PER 3
<span class="n">IOB</span> <span class="mi">0</span> <span class="mi">1</span> <span class="o">-</span> <span class="o">-</span> B-LOC 4
<span class="n">IOE</span> <span class="o">-</span> <span class="mi">0</span> <span class="mi">1</span> <span class="o">-</span> I-LOC 5
<span class="n">IOBES</span> <span class="mi">0</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span> O 6
</pre></div> </pre></div>
</div> </div>
<p>&#8216;plain&#8217; means the whole chunk must contain exactly the same chunk label.</p> <p>In this example, chunkType has three values: 0 for ORG, 1 for PER and 2 for LOC. Because the scheme is
&#8220;IOB&#8221;, tagType has two values: 0 for B and 1 for I.
Take I-LOC as an illustration of the mapping rules above.
For I-LOC, the label id is 5, so tagType = 5 % 2 = 1 and chunkType = 5 / 2 = 2 (integer division), which means I-LOC is part of a LOC chunk
and its tag is I.</p>
<p>The simple usage is:</p> <p>The simple usage is:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="nb">eval</span> <span class="o">=</span> <span class="n">chunk_evaluator</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">chunk_scheme</span><span class="p">,</span> <span class="n">num_chunk_types</span><span class="p">)</span> <div class="highlight-python"><div class="highlight"><pre><span></span><span class="nb">eval</span> <span class="o">=</span> <span class="n">chunk_evaluator</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">chunk_scheme</span><span class="p">,</span> <span class="n">num_chunk_types</span><span class="p">)</span>
</pre></div> </pre></div>
......
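Review note: the docs in this hunk say the evaluator reports precision, recall and F1 for chunk detection, and the removed text defines a chunk as correct when its beginning, end and type all match. The sketch below shows one way those scores could be computed for the IOB scheme only. The function names are hypothetical, and treating a stray I-X as the start of a new chunk is an assumption of this sketch, not confirmed behaviour of the Paddle evaluator.

```python
# Hypothetical sketch for the IOB scheme; not the Paddle implementation.
def extract_chunks_iob(labels, num_chunk_types):
    """Return a set of (start, end, chunkType) tuples, with end inclusive."""
    chunks = set()
    start, current = None, None  # the currently open chunk, if any
    for i, label in enumerate(labels):
        tag_type, chunk_type = label % 2, label // 2  # IOB has 2 tag types
        is_other = chunk_type == num_chunk_types
        # An I-X label (tagType 1) continues an open chunk of the same type.
        continues = (not is_other) and tag_type == 1 and chunk_type == current
        if current is not None and not continues:
            chunks.add((start, i - 1, current))
            current = None
        if not is_other and not continues:
            # B-X opens a chunk. Assumption: a stray I-X also opens one.
            start, current = i, chunk_type
    if current is not None:
        chunks.add((start, len(labels) - 1, current))
    return chunks


def chunk_scores(predicted, gold, num_chunk_types):
    """Return (precision, recall, f1) over exactly matching chunks."""
    pred_chunks = extract_chunks_iob(predicted, num_chunk_types)
    gold_chunks = extract_chunks_iob(gold, num_chunk_types)
    correct = len(pred_chunks & gold_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Label ids follow the NER label dict: B-ORG=0 ... I-LOC=5, O=6.
    gold = [0, 1, 6, 2, 3, 6, 4]  # B-ORG I-ORG O B-PER I-PER O B-LOC
    pred = [0, 1, 6, 2, 6, 6, 4]  # the PER chunk is cut short
    print(chunk_scores(pred, gold, num_chunk_types=3))
```

In this example two of the three predicted chunks match gold chunks exactly, so precision, recall and F1 are all 2/3. The truncated PER chunk counts as wrong even though its start and type are right.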
This diff is collapsed.