Why does the GRU layer not include weighting of the input?
Created by: wojtuss
The dynamic GRU algorithm described in the documentation (http://paddlepaddle.org/docs/0.14.0/api/fluid/en/layers.html#dynamic-gru) includes multiplication of the input x_t by the weight matrices W_x, but PaddlePaddle's implementation of the operator does not perform this multiplication.
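To make the discrepancy concrete, here is a minimal pure-Python sketch of a single GRU time step written two ways: once with the input weights applied inside the step, as in the documented formulas, and once taking an input that a preceding FC layer has already projected to width 3*D. The helper names, the update/reset/candidate ordering inside the 3*D vector, and the sigmoid/tanh activations are assumptions made for illustration; this is not PaddlePaddle's code.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate(act, a, b, bias):
    """Elementwise act(a + b + bias)."""
    return [act(ai + bi + ci) for ai, bi, ci in zip(a, b, bias)]

def gru_step_full(x, h, p):
    """Documented form: the input weights W_*x are applied inside the step."""
    u = gate(sigmoid, matvec(p["Wux"], x), matvec(p["Wuh"], h), p["bu"])
    r = gate(sigmoid, matvec(p["Wrx"], x), matvec(p["Wrh"], h), p["br"])
    rh = [ri * hi for ri, hi in zip(r, h)]
    m = gate(math.tanh, matvec(p["Wcx"], x), matvec(p["Wch"], rh), p["bc"])
    return [(1 - ui) * hi + ui * mi for ui, hi, mi in zip(u, h, m)]

def gru_step_preprojected(xp, h, p):
    """Operator-style form: xp is the 3*D vector produced by a preceding FC."""
    d = len(h)
    # Assumed layout of xp: update part, reset part, candidate part.
    xu, xr, xc = xp[:d], xp[d:2 * d], xp[2 * d:]
    u = gate(sigmoid, xu, matvec(p["Wuh"], h), p["bu"])
    r = gate(sigmoid, xr, matvec(p["Wrh"], h), p["br"])
    rh = [ri * hi for ri, hi in zip(r, h)]
    m = gate(math.tanh, xc, matvec(p["Wch"], rh), p["bc"])
    return [(1 - ui) * hi + ui * mi for ui, hi, mi in zip(u, h, m)]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
M, D = 4, 3  # input width and hidden size (arbitrary small values)
p = {k: rand_matrix(rng, D, M) for k in ("Wux", "Wrx", "Wcx")}
p.update({k: rand_matrix(rng, D, D) for k in ("Wuh", "Wrh", "Wch")})
p.update({k: [rng.uniform(-1, 1) for _ in range(D)] for k in ("bu", "br", "bc")})

x = [rng.uniform(-1, 1) for _ in range(M)]
h0 = [rng.uniform(-1, 1) for _ in range(D)]

# The "FC layer": one (3*D, M) matrix stacking W_ux, W_rx and W_cx row-wise.
W_fc = p["Wux"] + p["Wrx"] + p["Wcx"]

h_fused = gru_step_full(x, h0, p)
h_split = gru_step_preprojected(matvec(W_fc, x), h0, p)
max_diff = max(abs(a - b) for a, b in zip(h_fused, h_split))
```

Under these assumptions the two forms produce the same hidden state, so moving the FC multiplication into the GRU step changes where the work is done, not the result.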
Here are my questions:
1. Why is the input multiplication left out of the operator?
2. Could the usual FC layer (actually the MUL operator) that precedes the dynamic GRU op be incorporated into the GRU layer in nn.py?
3. Could the combination of two FC layers and two GRUs for the two opposite directions (as in the CRNN-CTC model) be merged in nn.py into a single bidirectional variant of the GRU op?
The changes proposed in questions 2 and 3 would allow for significant optimization using a single MKL-DNN GRU operator.
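The bidirectional merge can be sketched in the same spirit. The snippet below compares two separate FC-plus-recurrent pipelines (one left to right, one right to left) with a single call that does one stacked projection per time step and then runs both directions. `toy_step` is a deliberately simplified stand-in for a GRU step on pre-projected input, and all function names are hypothetical; for a real GRU each direction's projection would be 3*D wide rather than `size`.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def toy_step(xp, h):
    # Stand-in for a GRU step on pre-projected input; NOT a real GRU.
    return [math.tanh(a + 0.5 * b) for a, b in zip(xp, h)]

def scan(step, projected, size, reverse=False):
    """Run a recurrent step over the sequence, optionally right to left."""
    h = [0.0] * size
    steps = range(len(projected))
    out = [None] * len(projected)
    for t in (reversed(steps) if reverse else steps):
        h = step(projected[t], h)
        out[t] = h
    return out

def two_pipelines(xs, W_fwd, W_bwd, size):
    """Two FC layers feeding two unidirectional recurrent ops."""
    fwd = scan(toy_step, [matvec(W_fwd, x) for x in xs], size)
    bwd = scan(toy_step, [matvec(W_bwd, x) for x in xs], size, reverse=True)
    return [f + b for f, b in zip(fwd, bwd)]  # concatenate per time step

def fused_bidirectional(xs, W_both, size):
    """One stacked projection, then both directions inside a single call."""
    proj = [matvec(W_both, x) for x in xs]
    fwd = scan(toy_step, [q[:size] for q in proj], size)
    bwd = scan(toy_step, [q[size:] for q in proj], size, reverse=True)
    return [f + b for f, b in zip(fwd, bwd)]

rng = random.Random(1)
T, M, size = 5, 4, 3  # sequence length, input width, per-direction width
xs = [[rng.uniform(-1, 1) for _ in range(M)] for _ in range(T)]
W_fwd = [[rng.uniform(-1, 1) for _ in range(M)] for _ in range(size)]
W_bwd = [[rng.uniform(-1, 1) for _ in range(M)] for _ in range(size)]

separate = two_pipelines(xs, W_fwd, W_bwd, size)
fused = fused_bidirectional(xs, W_fwd + W_bwd, size)  # rows stacked
max_diff = max(abs(a - b) for s, f in zip(separate, fused) for a, b in zip(s, f))
```

The outputs agree because stacking the two weight matrices row-wise yields both projections from one multiplication; whether a given MKL-DNN primitive accepts weights in this layout is not something the sketch establishes.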