Merge pull request #1800 from pengli09/emb_doc

The description for vocabulary file is not consistent with the latest file

7384966f · Peng LI · GitHub · c1b47b20 · bf1a4afb · 7384966f

Showing 2 changed files with 4 additions and 2 deletions

doc/tutorials/embedding_model/index_cn.md +2 -1

doc/tutorials/embedding_model/index_en.md +2 -1

--- a/doc/tutorials/embedding_model/index_cn.md
+++ b/doc/tutorials/embedding_model/index_cn.md
@@ -6,9 +6,10 @@

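 ## Introduction ###
 ### Chinese Dictionary ###
-Our dictionary is produced by segmenting corpora from Baidu Zhidao and Baidu Baike with an in-house word segmentation tool. The segmentation style is as follows: "《红楼梦》" is split into "《", "红楼梦", "》", and "《红楼梦》". The dictionary is UTF-8 encoded and has 2 columns: the word itself and its frequency. It contains 3206325 words and 3 special tokens:
+Our dictionary is produced by segmenting corpora from Baidu Zhidao and Baidu Baike with an in-house word segmentation tool. The segmentation style is as follows: "《红楼梦》" is split into "《", "红楼梦", "》", and "《红楼梦》". The dictionary is UTF-8 encoded and has 2 columns: the word itself and its frequency. It contains 3206326 words and 4 special tokens:
  - `<s>`: the start of a segmented sequence
  - `<e>`: the end of a segmented sequence
+  - `PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING`: a placeholder with no actual meaning
  - `<unk>`: an unknown word

 ### Pretrained Chinese Word Embedding Model ###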

--- a/doc/tutorials/embedding_model/index_en.md
+++ b/doc/tutorials/embedding_model/index_en.md
@@ -6,9 +6,10 @@ We thank @lipeng for the pull request that defined the model schemas and pretrai

 ## Introduction ###
 ### Chinese Word Dictionary ###
-Our Chinese-word dictionary is created on Baidu ZhiDao and Baidu Baike by using in-house word segmentor. For example, the participle of "《红楼梦》" is "《"，"红楼梦"，"》"，and "《红楼梦》". Our dictionary (using UTF-8 format) has has two columns: word and its frequency. The total word count is 3206325, including 3 special token:
+Our Chinese-word dictionary is created on Baidu ZhiDao and Baidu Baike by using in-house word segmentor. For example, the participle of "《红楼梦》" is "《"，"红楼梦"，"》"，and "《红楼梦》". Our dictionary (using UTF-8 format) has has two columns: word and its frequency. The total word count is 3206326, including 4 special token:
  - `<s>`: the start of a sequence
  - `<e>`: the end of a sequence
+  - `PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING`: a placeholder, just ignore it and its embedding
  - `<unk>`: a word not included in dictionary

 ### Pretrained Chinese Word Embedding Model ###
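
For reviewers who want to check a vocabulary file against the updated description, the following is a minimal sketch of reading the two-column format (word, frequency) and confirming that the four special tokens are present. It is not part of this pull request. The function name `load_vocab`, the whitespace column separator, and the sample rows with their frequencies are all assumptions made for illustration; only the token names and the two-column UTF-8 layout come from the documentation above.

```python
import io

# Special tokens named in the updated documentation. The spelling
# "PALCEHOLDER" is kept exactly as it appears in the diff.
SPECIAL_TOKENS = (
    "<s>",
    "<e>",
    "PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING",
    "<unk>",
)


def load_vocab(stream):
    """Parse 'word frequency' lines into (vocab, specials_found).

    Assumption: the two columns are separated by whitespace and the
    frequency is the last field on each line.
    """
    vocab = {}
    for raw in stream:
        parts = raw.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, freq = parts
        vocab[word] = int(freq)
    specials_found = [tok for tok in SPECIAL_TOKENS if tok in vocab]
    return vocab, specials_found


# Hypothetical sample rows; the frequencies are made up.
sample = io.StringIO(
    "<s>\t10\n"
    "<e>\t10\n"
    "PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING\t0\n"
    "<unk>\t5\n"
    "红楼梦\t42\n"
)
vocab, specials_found = load_vocab(sample)
missing = [tok for tok in SPECIAL_TOKENS if tok not in vocab]
```

With a real file, the same function could be called on a handle opened with `encoding="utf-8"`; an empty `missing` list would indicate that all four documented special tokens are present.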