Commit 8db295fd authored by Travis CI

Deploy to GitHub Pages: 7384966f

Parent b489cf6c
@@ -6,9 +6,10 @@ We thank @lipeng for the pull request that defined the model schemas and pretrai
 ## Introduction ###
 ### Chinese Word Dictionary ###
-Our Chinese word dictionary is built from Baidu ZhiDao and Baidu Baike text using an in-house word segmenter. For example, "《红楼梦》" is segmented into "《", "红楼梦", "》", and "《红楼梦》". The dictionary is UTF-8 encoded and has two columns: the word and its frequency. The total word count is 3206325, including 3 special tokens:
+Our Chinese word dictionary is built from Baidu ZhiDao and Baidu Baike text using an in-house word segmenter. For example, "《红楼梦》" is segmented into "《", "红楼梦", "》", and "《红楼梦》". The dictionary is UTF-8 encoded and has two columns: the word and its frequency. The total word count is 3206326, including 4 special tokens:
 - `<s>`: the start of a sequence
 - `<e>`: the end of a sequence
+- `PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING`: a placeholder; ignore it and its embedding
 - `<unk>`: a word not included in the dictionary
 ### Pretrained Chinese Word Embedding Model ###
......
This diff is collapsed.
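The dictionary format described in this change (UTF-8, one word and its frequency per line, plus four special tokens) can be read with a short loader. This is a minimal Python sketch, not code from the commit. It assumes the two columns are whitespace-separated, that a missing frequency means 0, and that IDs are assigned in file order; the description states none of these. `load_dict` and `check_special_tokens` are hypothetical helper names.

```python
import io
from typing import Dict, List, TextIO, Tuple

# The four special tokens named in the dictionary description.
# PALCEHOLDER is spelled exactly as in the dictionary; look it up verbatim.
SPECIAL_TOKENS = ["<s>", "<e>", "PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING", "<unk>"]


def load_dict(stream: TextIO) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Parse `word frequency` lines into (word -> id, word -> frequency).

    Assumptions (not documented): whitespace-separated columns, IDs assigned
    in file order, a missing frequency treated as 0, later duplicates ignored.
    """
    word_to_id: Dict[str, int] = {}
    freq: Dict[str, int] = {}
    for line in stream:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        word = parts[0]
        if word in word_to_id:
            continue  # keep only the first occurrence of a word
        word_to_id[word] = len(word_to_id)
        freq[word] = int(parts[1]) if len(parts) > 1 else 0
    return word_to_id, freq


def check_special_tokens(word_to_id: Dict[str, int]) -> List[str]:
    """Return the special tokens that are missing from the loaded dictionary."""
    return [tok for tok in SPECIAL_TOKENS if tok not in word_to_id]


if __name__ == "__main__":
    # Hypothetical sample; the real file's token order and frequencies are not given here.
    sample = io.StringIO(
        "<s> 0\n<e> 0\nPALCEHOLDER_JUST_IGNORE_THE_EMBEDDING 0\n<unk> 0\n"
        "《 120\n红楼梦 45\n》 118\n《红楼梦》 40\n"
    )
    vocab, freq = load_dict(sample)
    print(len(vocab), check_special_tokens(vocab))  # 8 []
```

For the real file, pass `open(path, encoding="utf-8")` to `load_dict`. If the assumptions above hold, `len(vocab)` should match the stated total of 3206326 entries.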
@@ -233,10 +233,11 @@
 <span id="introduction"></span><h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline"></a></h2>
 <div class="section" id="chinese-word-dictionary">
 <span id="chinese-word-dictionary"></span><h3>Chinese Word Dictionary<a class="headerlink" href="#chinese-word-dictionary" title="Permalink to this headline"></a></h3>
-<p>Our Chinese word dictionary is built from Baidu ZhiDao and Baidu Baike text using an in-house word segmenter. For example, &#8220;《红楼梦》&#8221; is segmented into &#8220;《&#8221;, &#8220;红楼梦&#8221;, &#8220;》&#8221;, and &#8220;《红楼梦》&#8221;. The dictionary is UTF-8 encoded and has two columns: the word and its frequency. The total word count is 3206325, including 3 special tokens:</p>
+<p>Our Chinese word dictionary is built from Baidu ZhiDao and Baidu Baike text using an in-house word segmenter. For example, &#8220;《红楼梦》&#8221; is segmented into &#8220;《&#8221;, &#8220;红楼梦&#8221;, &#8220;》&#8221;, and &#8220;《红楼梦》&#8221;. The dictionary is UTF-8 encoded and has two columns: the word and its frequency. The total word count is 3206326, including 4 special tokens:</p>
 <ul class="simple">
 <li><code class="docutils literal"><span class="pre">&lt;s&gt;</span></code>: the start of a sequence</li>
 <li><code class="docutils literal"><span class="pre">&lt;e&gt;</span></code>: the end of a sequence</li>
+<li><code class="docutils literal"><span class="pre">PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING</span></code>: a placeholder; ignore it and its embedding</li>
 <li><code class="docutils literal"><span class="pre">&lt;unk&gt;</span></code>: a word not included in the dictionary</li>
 </ul>
 </div>
......
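@@ -6,9 +6,10 @@
 ## Introduction ###
 ### Chinese Dictionary ###
-Our dictionary is produced by segmenting the Baidu ZhiDao and Baidu Baike corpora with an in-house word segmentation tool. The segmentation style is as follows: "《红楼梦》" is split into "《", "红楼梦", "》", and "《红楼梦》". The dictionary is UTF-8 encoded, and the output has 2 columns: the word itself and its frequency. The dictionary contains 3206325 words and 3 special tokens:
+Our dictionary is produced by segmenting the Baidu ZhiDao and Baidu Baike corpora with an in-house word segmentation tool. The segmentation style is as follows: "《红楼梦》" is split into "《", "红楼梦", "》", and "《红楼梦》". The dictionary is UTF-8 encoded, and the output has 2 columns: the word itself and its frequency. The dictionary contains 3206326 words and 4 special tokens:
 - `<s>`: the start of a segmented sequence
 - `<e>`: the end of a segmented sequence
+- `PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING`: a placeholder with no actual meaning
 - `<unk>`: an unknown word
 ### Pretrained Model for Chinese Word Embeddings ###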
......
This diff is collapsed.
@@ -240,10 +240,11 @@
 <span id="id2"></span><h2>Introduction<a class="headerlink" href="#" title="Permalink to this headline"></a></h2>
 <div class="section" id="">
 <span id="id3"></span><h3>Chinese Dictionary<a class="headerlink" href="#" title="Permalink to this headline"></a></h3>
-<p>Our dictionary is produced by segmenting the Baidu ZhiDao and Baidu Baike corpora with an in-house word segmentation tool. The segmentation style is as follows: &#8220;《红楼梦》&#8221; is split into &#8220;《&#8221;, &#8220;红楼梦&#8221;, &#8220;》&#8221;, and &#8220;《红楼梦》&#8221;. The dictionary is UTF-8 encoded, and the output has 2 columns: the word itself and its frequency. The dictionary contains 3206325 words and 3 special tokens:</p>
+<p>Our dictionary is produced by segmenting the Baidu ZhiDao and Baidu Baike corpora with an in-house word segmentation tool. The segmentation style is as follows: &#8220;《红楼梦》&#8221; is split into &#8220;《&#8221;, &#8220;红楼梦&#8221;, &#8220;》&#8221;, and &#8220;《红楼梦》&#8221;. The dictionary is UTF-8 encoded, and the output has 2 columns: the word itself and its frequency. The dictionary contains 3206326 words and 4 special tokens:</p>
 <ul class="simple">
 <li><code class="docutils literal"><span class="pre">&lt;s&gt;</span></code>: the start of a segmented sequence</li>
 <li><code class="docutils literal"><span class="pre">&lt;e&gt;</span></code>: the end of a segmented sequence</li>
+<li><code class="docutils literal"><span class="pre">PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING</span></code>: a placeholder with no actual meaning</li>
 <li><code class="docutils literal"><span class="pre">&lt;unk&gt;</span></code>: an unknown word</li>
 </ul>
 </div>
......
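To illustrate how the special tokens described in this change are typically used, here is a hedged Python sketch that turns an already-segmented sentence into IDs. It wraps the sequence in `<s>` and `<e>`, maps out-of-dictionary words to `<unk>`, and never emits the placeholder's ID, following the note to ignore that entry and its embedding. The wrapping convention, the helper name `encode`, and the toy vocabulary are assumptions for illustration; the commit does not specify how downstream models consume these tokens.

```python
from typing import Dict, List

START, END, UNK = "<s>", "<e>", "<unk>"
PLACEHOLDER = "PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING"  # spelled as in the dictionary


def encode(words: List[str], word_to_id: Dict[str, int]) -> List[int]:
    """Map segmented words to IDs, framed by <s> and <e>.

    Unknown words, and the placeholder itself if it ever appears in the input,
    map to <unk>, so the placeholder's meaningless embedding is never looked up.
    """
    unk_id = word_to_id[UNK]
    body = [
        unk_id if w == PLACEHOLDER else word_to_id.get(w, unk_id)
        for w in words
    ]
    return [word_to_id[START]] + body + [word_to_id[END]]


if __name__ == "__main__":
    # Toy vocabulary with hypothetical IDs; real IDs come from the dictionary file.
    vocab = {START: 0, END: 1, PLACEHOLDER: 2, UNK: 3, "《": 4, "红楼梦": 5, "》": 6}
    print(encode(["《", "红楼梦", "》", "贾宝玉"], vocab))  # [0, 4, 5, 6, 3, 1]
```

Keeping the placeholder in the vocabulary instead of deleting it is a deliberate choice in this sketch: assuming IDs index rows of the pretrained embedding matrix, removing an entry would shift every later ID out of alignment with its row.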