@@ -175,13 +175,6 @@ The raw data needs to be preprocessed into formats that PaddlePaddle can handle.
...
@@ -175,13 +175,6 @@ The raw data needs to be preprocessed into formats that PaddlePaddle can handle.
4. Construct the markings in BIO format;
4. Construct the markings in BIO format;
5. Obtain the integer index corresponding to the word according to the dictionary.
5. Obtain the integer index corresponding to the word according to the dictionary.
```python
# import paddle.v2.dataset.conll05 as conll05
# conll05.corpus_reader does step 1 and 2 as mentioned above.
# conll05.reader_creator does step 3 to 5.
# conll05.test gets preprocessed training instances.
```
After preprocessing, a training sample contains nine features, namely: word sequence, predicate, predicate context (5 columns), region mark sequence, label sequence. The following table is an example of a training sample.
After preprocessing, a training sample contains nine features, namely: word sequence, predicate, predicate context (5 columns), region mark sequence, label sequence. The following table is an example of a training sample.
| word sequence | predicate | predicate context(5 columns) | region mark sequence | label sequence|
| word sequence | predicate | predicate context(5 columns) | region mark sequence | label sequence|
...
@@ -209,6 +202,8 @@ We trained a language model on the English Wikipedia to get a word vector lookup
...
@@ -209,6 +202,8 @@ We trained a language model on the English Wikipedia to get a word vector lookup
@@ -217,13 +217,6 @@ The raw data needs to be preprocessed into formats that PaddlePaddle can handle.
...
@@ -217,13 +217,6 @@ The raw data needs to be preprocessed into formats that PaddlePaddle can handle.
4. Construct the markings in BIO format;
4. Construct the markings in BIO format;
5. Obtain the integer index corresponding to the word according to the dictionary.
5. Obtain the integer index corresponding to the word according to the dictionary.
```python
# import paddle.v2.dataset.conll05 as conll05
# conll05.corpus_reader does step 1 and 2 as mentioned above.
# conll05.reader_creator does step 3 to 5.
# conll05.test gets preprocessed training instances.
```
After preprocessing, a training sample contains nine features, namely: word sequence, predicate, predicate context (5 columns), region mark sequence, label sequence. The following table is an example of a training sample.
After preprocessing, a training sample contains nine features, namely: word sequence, predicate, predicate context (5 columns), region mark sequence, label sequence. The following table is an example of a training sample.
| word sequence | predicate | predicate context(5 columns) | region mark sequence | label sequence|
| word sequence | predicate | predicate context(5 columns) | region mark sequence | label sequence|
...
@@ -251,6 +244,8 @@ We trained a language model on the English Wikipedia to get a word vector lookup
...
@@ -251,6 +244,8 @@ We trained a language model on the English Wikipedia to get a word vector lookup