From 9901e1e4a57f66f3e74bc0b4e21d51a441d231fe Mon Sep 17 00:00:00 2001
From: Steffy-zxf <48793257+Steffy-zxf@users.noreply.github.com>
Date: Thu, 4 Jul 2019 16:19:35 +0800
Subject: [PATCH] Update README.md

---
 demo/multi-label-classification/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/demo/multi-label-classification/README.md b/demo/multi-label-classification/README.md
index e54b651d..f576259f 100644
--- a/demo/multi-label-classification/README.md
+++ b/demo/multi-label-classification/README.md
@@ -46,11 +46,11 @@ reader = hub.reader.MultiLabelClassifyReader(
 
 `hub.dataset.Toxic()` automatically downloads the dataset from the network and extracts it into the `$HOME/.paddlehub/dataset` directory.
 
-`module.get_vaocab_path()` returns the path to the vocabulary file of the pre-trained model.
+`module.get_vocab_path()` returns the path to the vocabulary file of the pre-trained model.
 
 `max_seq_len` must match the sequence length passed to the context interface in Step 1.
 
-The `data_generator` in MultiLabelClassifyReader automatically segments the data into words using the model's vocabulary and returns, as an iterator, the tensors BERT requires: `input_ids`, `position_ids`, `segment_id`, and the sequence mask `input_mask`.
+The `data_generator` in MultiLabelClassifyReader automatically tokenizes the data using the model's vocabulary and returns, as an iterator, the tensors BERT requires: `input_ids`, `position_ids`, `segment_id`, and the sequence mask `input_mask`.
 
 **NOTE**: The order of the tensors returned by the Reader is fixed. By default they are returned in the order input_ids, position_ids, segment_id, input_mask.
 
-- 
GitLab
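The README text touched by this patch says the reader's `data_generator` tokenizes each example with the model's vocabulary and yields `input_ids`, `position_ids`, `segment_id`, and `input_mask` in that fixed order. As a rough illustration of that contract only, here is a plain-Python sketch. The toy vocabulary, the whitespace tokenizer, and the names `TOY_VOCAB`, `build_features`, and `data_generator` are hypothetical; this is not PaddleHub's implementation or API.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy vocabulary; a real pre-trained module ships its own vocab file.
TOY_VOCAB: Dict[str, int] = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "you": 4, "are": 5, "great": 6, "awful": 7,
}

# Fixed order: (input_ids, position_ids, segment_id, input_mask)
Features = Tuple[List[int], List[int], List[int], List[int]]


def build_features(text: str, vocab: Dict[str, int], max_seq_len: int) -> Features:
    """Turn one sentence into BERT-style inputs, returned in a fixed order."""
    assert max_seq_len >= 2, "need room for [CLS] and [SEP]"
    # Lowercased whitespace split stands in for the model's real tokenizer.
    tokens = [t.lower() for t in text.split()]
    # Reserve two slots for [CLS] and [SEP], truncating the sentence if needed.
    tokens = ["[CLS]"] + tokens[: max_seq_len - 2] + ["[SEP]"]
    input_ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    n_real = len(input_ids)
    n_pad = max_seq_len - n_real
    position_ids = list(range(max_seq_len))   # positions 0 .. max_seq_len-1
    segment_id = [0] * max_seq_len             # single-sentence task: all segment 0
    input_mask = [1] * n_real + [0] * n_pad    # 1 = real token, 0 = padding
    input_ids = input_ids + [vocab["[PAD]"]] * n_pad
    return input_ids, position_ids, segment_id, input_mask


def data_generator(
    texts: List[str], vocab: Dict[str, int], max_seq_len: int
) -> Iterator[Features]:
    """Yield one feature tuple per example, like an iterator-style reader."""
    for text in texts:
        yield build_features(text, vocab, max_seq_len)


if __name__ == "__main__":
    for input_ids, position_ids, segment_id, input_mask in data_generator(
        ["You are great", "you are AWFUL indeed"], TOY_VOCAB, max_seq_len=6
    ):
        print(input_ids, input_mask)
```

A real BERT reader would use the module's own tokenizer and the vocabulary file that `module.get_vocab_path()` points to, and would convert the lists to tensors. The point of the sketch is that unpacking the tuple in the same fixed order the reader uses keeps each tensor matched to the right model input.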