Commit a741056d authored by Travis CI

Deploy to GitHub Pages: 45280a07

Parent aafb7043
@@ -260,7 +260,143 @@ out = recurrent_group(step=outer_step, input=SubsequenceInput(emb))
## Example 3: two inputs, two outputs, with unequal input lengths
**Unequal input lengths** means that the multiple inputs of recurrent_group may have different lengths at each time step, but one input whose length matches the output must be designated, indicated by <font color="red">targetInlink</font>. Reference configurations: single-level RNN (`sequence_rnn_multi_unequalength_inputs.conf`), two-level RNN (`sequence_nest_rnn_multi_unequalength_inputs.conf`).
### How to read two-level sequence data
Let us first look at how the single-level and two-level sequence data is organized, together with the dataprovider (see `rnn_data_provider.py`):
```python
data2 = [
    [[[1, 2], [4, 5, 2]], [[5, 4, 1], [3, 1]], 0],
    [[[0, 2], [2, 5], [0, 1, 2]], [[1, 5], [4], [2, 3, 6, 1]], 1],
]

@provider(input_types=[integer_value_sub_sequence(10),
                       integer_value_sub_sequence(10),
                       integer_value(2)],
          should_shuffle=False)
def process_unequalength_subseq(settings, file_name):  # dataprovider for the two-level RNN
    for d in data2:
        yield d

@provider(input_types=[integer_value_sequence(10),
                       integer_value_sequence(10),
                       integer_value(2)],
          should_shuffle=False)
def process_unequalength_seq(settings, file_name):  # dataprovider for the single-level RNN
    for d in data2:
        words1 = reduce(lambda x, y: x + y, d[0])
        words2 = reduce(lambda x, y: x + y, d[1])
        yield words1, words2, d[2]
```
data2 contains two samples, and each sample has two features, denoted fea1 and fea2.
- Single-level sequences: the two samples are [[1, 2, 4, 5, 2], [5, 4, 1, 3, 1]] and [[0, 2, 2, 5, 0, 1, 2], [1, 5, 4, 2, 3, 6, 1]], i.e. each feature's sub-sequences concatenated (see the sketch after this list).
- Two-level sequences: the two samples are
  - **Sample 1**: [[[1, 2], [4, 5, 2]], [[5, 4, 1], [3, 1]]]. fea1 and fea2 each contain 2 sub-sequences: fea1=[[1, 2], [4, 5, 2]], fea2=[[5, 4, 1], [3, 1]].
  - **Sample 2**: [[[0, 2], [2, 5], [0, 1, 2]], [[1, 5], [4], [2, 3, 6, 1]]]. fea1 and fea2 each contain 3 sub-sequences: fea1=[[0, 2], [2, 5], [0, 1, 2]], fea2=[[1, 5], [4], [2, 3, 6, 1]].
  - **Note**: within a sample, every feature must have the same number of sub-sequences. "Two inputs, two outputs, with unequal input lengths" means that the length of fea1's input at time step i may differ from the length of fea2's input at the same step. For example, at the second time step of sample 1, fea1's sub-sequence is [4, 5, 2] and fea2's is [3, 1]; their lengths 3 and 2 are not equal.
- In both the single-level and the two-level version, the labels of the two samples are 0 and 1 respectively.
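The correspondence between the two forms can be checked with a few lines of plain Python that mirror the `reduce` call in `process_unequalength_seq` (a standalone sketch, independent of PaddlePaddle):

```python
# Standalone sketch: flatten the two-level samples in data2 into the
# single-level form, reproducing what reduce(lambda x, y: x + y, ...) does.
data2 = [
    [[[1, 2], [4, 5, 2]], [[5, 4, 1], [3, 1]], 0],
    [[[0, 2], [2, 5], [0, 1, 2]], [[1, 5], [4], [2, 3, 6, 1]], 1],
]

for fea1, fea2, label in data2:
    words1 = [w for sub in fea1 for w in sub]  # same as reduce(lambda x, y: x + y, fea1)
    words2 = [w for sub in fea2 for w in sub]
    print(words1, words2, label)
# [1, 2, 4, 5, 2] [5, 4, 1, 3, 1] 0
# [0, 2, 2, 5, 0, 1, 2] [1, 5, 4, 2, 3, 6, 1] 1
```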
### Model configuration
The single-level RNN configuration (`sequence_rnn_multi_unequalength_inputs.conf`) and the two-level RNN configuration (`sequence_nest_rnn_multi_unequalength_inputs.conf`) compute exactly the same thing; the only difference is whether the input is a single-level or a two-level sequence. Let us look at how each of them is implemented.
- Single-level sequence:
  - The input goes through one simple recurrent_group. At every time step, the current input y and the previous step's output rnn_state are combined by a fully connected layer, exactly like the `step` function of `sequence_rnn.conf` in Example 2. Here the two inputs x1 and x2 are each processed by calrnn, which returns the state at that time step, so encoder1_rep and encoder2_rep are single-level sequences. Finally, the last time step of encoder1_rep is added to every time step of encoder2_rep to obtain context (a plain-Python sketch of this last part follows the code block below).
  - Note that in every sample fed to this recurrent_group, fea1 and fea2 have equal length. This is not a coincidence: when the inputs of a recurrent_group are single-level sequences, all inputs must have the same length.
```python
def step(x1, x2):
    # One recurrent step: a fully connected layer over the current input and
    # the previous state, as in the step function of sequence_rnn.conf.
    def calrnn(y):
        mem = memory(name='rnn_state_' + y.name, size=hidden_dim)
        out = fc_layer(input=[y, mem],
                       size=hidden_dim,
                       act=TanhActivation(),
                       bias_attr=True,
                       name='rnn_state_' + y.name)
        return out

    encoder1 = calrnn(x1)
    encoder2 = calrnn(x2)
    return [encoder1, encoder2]

encoder1_rep, encoder2_rep = recurrent_group(
    name="stepout",
    step=step,
    input=[emb1, emb2])

# Add the last state of encoder1_rep to every time step of encoder2_rep.
encoder1_last = last_seq(input=encoder1_rep)
encoder1_expandlast = expand_layer(input=encoder1_last,
                                   expand_as=encoder2_rep)
context = mixed_layer(input=[identity_projection(encoder1_expandlast),
                             identity_projection(encoder2_rep)],
                      size=hidden_dim)
```
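The last three layers above are easiest to read as a broadcast-and-sum over time steps. Here is a minimal framework-free sketch (plain Python lists stand in for the layer outputs; the real layers operate on PaddlePaddle sequences) of what last_seq, expand_layer, and the two summed identity projections compute:

```python
# Toy stand-ins for the layer outputs: 2 vs. 3 time steps, hidden_dim=2.
encoder1_rep = [[1, 2], [3, 4]]
encoder2_rep = [[10, 10], [20, 20], [30, 30]]

encoder1_last = encoder1_rep[-1]                           # last_seq
encoder1_expandlast = [encoder1_last] * len(encoder2_rep)  # expand_layer(expand_as=encoder2_rep)
context = [[a + b for a, b in zip(x, y)]                   # summed identity projections
           for x, y in zip(encoder1_expandlast, encoder2_rep)]
print(context)  # [[13, 14], [23, 24], [33, 34]]
```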
- Two-level sequence:
  - In the two-level RNN, each of the two input features is run through its own chain of fully connected recurrent steps (`inner_step1` and `inner_step2` handle fea1 and fea2 respectively), which is exactly what the `outer_step` function of `sequence_nest_rnn.conf` does in Example 2. The difference is that here the inputs `[SubsequenceInput(emb1), SubsequenceInput(emb2)]` have different lengths at each time step.
  - The `outer_step` function can process the two features independently, but we must use <font color=red>targetInlink</font> to specify which input the output format of the recurrent_group (the per-step sub-sequence lengths) follows; here it follows emb2 (a small sketch after the code block below illustrates this).
  - Finally, as before, the last time step of encoder1_rep is added to every time step of encoder2_rep to obtain context.
```python
def outer_step(x1, x2):
    outer_mem1 = memory(name="outer_rnn_state1", size=hidden_dim)
    outer_mem2 = memory(name="outer_rnn_state2", size=hidden_dim)

    # inner_step1/inner_step2 run over the sub-sequences of fea1 and fea2;
    # each is booted from the corresponding outer memory.
    def inner_step1(y):
        inner_mem = memory(name='inner_rnn_state_' + y.name,
                           size=hidden_dim,
                           boot_layer=outer_mem1)
        out = fc_layer(input=[y, inner_mem],
                       size=hidden_dim,
                       act=TanhActivation(),
                       bias_attr=True,
                       name='inner_rnn_state_' + y.name)
        return out

    def inner_step2(y):
        inner_mem = memory(name='inner_rnn_state_' + y.name,
                           size=hidden_dim,
                           boot_layer=outer_mem2)
        out = fc_layer(input=[y, inner_mem],
                       size=hidden_dim,
                       act=TanhActivation(),
                       bias_attr=True,
                       name='inner_rnn_state_' + y.name)
        return out

    encoder1 = recurrent_group(
        step=inner_step1,
        name='inner1',
        input=x1)
    encoder2 = recurrent_group(
        step=inner_step2,
        name='inner2',
        input=x2)

    sentence_last_state1 = last_seq(input=encoder1, name='outer_rnn_state1')
    sentence_last_state2_ = last_seq(input=encoder2, name='outer_rnn_state2')

    encoder1_expand = expand_layer(input=sentence_last_state1,
                                   expand_as=encoder2)

    return [encoder1_expand, encoder2]

encoder1_rep, encoder2_rep = recurrent_group(
    name="outer",
    step=outer_step,
    input=[SubsequenceInput(emb1), SubsequenceInput(emb2)],
    targetInlink=emb2)

encoder1_last = last_seq(input=encoder1_rep)
encoder1_expandlast = expand_layer(input=encoder1_last,
                                   expand_as=encoder2_rep)
context = mixed_layer(input=[identity_projection(encoder1_expandlast),
                             identity_projection(encoder2_rep)],
                      size=hidden_dim)
```
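To make the effect of targetInlink=emb2 concrete, here is a small illustrative loop (plain Python, not PaddlePaddle code) over sample 1 of data2: at each outer time step the two sub-sequences may have different lengths, and the sub-sequence length of the output follows emb2.

```python
# Illustrative only: the two features of sample 1 from data2.
fea1 = [[1, 2], [4, 5, 2]]   # sub-sequences fed through inner_step1
fea2 = [[5, 4, 1], [3, 1]]   # sub-sequences fed through inner_step2

for i, (sub1, sub2) in enumerate(zip(fea1, fea2)):
    out_len = len(sub2)      # targetInlink=emb2: output length at step i follows fea2
    print("outer step %d: len(fea1 sub-seq)=%d, len(fea2 sub-seq)=%d, output length=%d"
          % (i, len(sub1), len(sub2), out_len))
# outer step 0: len(fea1 sub-seq)=2, len(fea2 sub-seq)=3, output length=3
# outer step 1: len(fea1 sub-seq)=3, len(fea2 sub-seq)=2, output length=2
```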
## Example 4: generation with beam_search
......