提交 a74060d4 编写于 作者: Y yangyaming

modify usage document of chunk evaluator

上级 8411a73e
...@@ -347,32 +347,45 @@ def chunk_evaluator( ...@@ -347,32 +347,45 @@ def chunk_evaluator(
excluded_chunk_types=None, ): excluded_chunk_types=None, ):
""" """
Chunk evaluator is used to evaluate segment labelling accuracy for a Chunk evaluator is used to evaluate segment labelling accuracy for a
sequence. It calculates the chunk detection F1 score. sequence. It calculates precision, recall and F1 score of the chunk detection.
A chunk is correctly detected if its beginning, end and type are correct. To use chunk evaluator, the construction of label dict should obey the following rules:
Other chunk type is ignored. (1) Use one of the listed labelling schemes. These schemes differ in ways indicating chunk boundry.
For each label in the label sequence, we have:
.. code-block:: python
Scheme Begin Inside End Single
plain 0 - - -
IOB 0 1 - -
IOE - 0 1 -
IOBES 0 1 2 3
.. code-block:: python .. code-block:: python
tagType = label % numTagType To make it clear, let's illustrate by a NER example.
chunkType = label / numTagType Assuming that there are two named entity types including ORG and PER which are called 'chunk type' here,
otherChunkType = numChunkTypes if 'IOB' scheme were used, the label set will be extended to a set including B-ORG, I-ORG, B-PER, I-PER and O,
in which B-ORG for begining of ORG and I-ORG for end of ORG.
Prefixes which are called 'tag type' here are added to chunk types and there are two tag types including B and I.
Of course, the training data should be labeled accordingly.
The total number of different labels is numTagType*numChunkTypes+1. (2) Map can be done correctly by the listed equations.
We support 4 labelling scheme.
The tag type for each of the scheme is shown as follows:
.. code-block:: python
tagType = label % numTagType
chunkType = label / numTagType
otherChunkType = numChunkTypes
.. code-block:: python .. code-block:: python
Scheme Begin Inside End Single Continue the NER example, and the label dict should like this to satify above equations:
plain 0 - - -
IOB 0 1 - - .. code-block:: python
IOE - 0 1 - B-ORG 0
IOBES 0 1 2 3 I-ORG 1
B-PER 2
I-PER 3
O 4
.. code-block:: python
'plain' means the whole chunk must contain exactly the same chunk label. Realizing that the number of is chunk type is 2 and number of tag type is 2, it is easy to validate this.
The simple usage is: The simple usage is:
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册