TODO list for Transformer.
Created by: lcy-seso
- implement the layer normalization operator and the Python wrapper.
  - CPU implementation.
  - GPU implementation.
  - Python wrapper.
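For reference while implementing the operator, the computation itself is small: normalize each position over the feature dimension, then apply a learned scale and shift. A minimal NumPy sketch (shapes and names are illustrative, not the Paddle operator's API):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize over the last (feature) axis, then scale by gamma and shift by beta.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(2, 4, 8)                # (batch, seq_len, d_model)
out = layer_norm(x, np.ones(8), np.zeros(8))
```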
- enhance the matmul operator to support 4-D tensors as its inputs (https://github.com/PaddlePaddle/Paddle/issues/7319). Fixed by PR: https://github.com/PaddlePaddle/Paddle/pull/7656
- prepare the dataset. Fixed by PR: https://github.com/PaddlePaddle/Paddle/pull/7661
- wrap the masked positional embedding.
- enhance the lookup_table operator to support a padding index for the special padding token (https://github.com/PaddlePaddle/Paddle/issues/7309).
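The intended behavior of a padding index can be sketched without the operator: rows gathered for the padding token are forced to zero so padding positions contribute nothing downstream. A NumPy sketch (names are illustrative, not the lookup_table API):

```python
import numpy as np

def lookup_with_padding(table, ids, padding_idx=0):
    # Gather embedding rows; zero out rows that correspond to the padding token.
    out = table[ids]
    out[ids == padding_idx] = 0.0
    return out

table = np.random.randn(10, 4)              # vocab of 10, embedding dim 4
ids = np.array([[3, 0, 7], [0, 1, 2]])      # 0 is the padding token here
emb = lookup_with_padding(table, ids, padding_idx=0)
```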
- wrap the multi-head dot product attention. This differs from the attention used in ConvS2S.
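The head-splitting part of the wrapper can be sketched independently of the input/output projections: split d_model into n_heads slices, attend per head, and merge the heads back. A simplified NumPy sketch (projection matrices omitted; names and shapes are illustrative):

```python
import numpy as np

def split_heads(x, n_heads):
    # (batch, seq, d_model) -> (batch, n_heads, seq, d_model // n_heads)
    b, t, d = x.shape
    return x.reshape(b, t, n_heads, d // n_heads).transpose(0, 2, 1, 3)

def multi_head_attention(q, k, v, n_heads):
    qh, kh, vh = split_heads(q, n_heads), split_heads(k, n_heads), split_heads(v, n_heads)
    d_k = qh.shape[-1]
    scores = qh @ kh.transpose(0, 1, 3, 2) / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ vh                               # (batch, heads, seq, d_k)
    b, h, t, dk = out.shape
    return out.transpose(0, 2, 1, 3).reshape(b, t, h * dk)

x = np.random.randn(2, 5, 16)
y = multi_head_attention(x, x, x, n_heads=4)
```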
- wrap the position-wise feed-forward network.
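The position-wise FFN is two linear layers with a ReLU in between, applied identically at every position: FFN(x) = max(0, x W1 + b1) W2 + b2. A NumPy sketch (weight names and dimensions are illustrative):

```python
import numpy as np

def position_wise_ffn(x, w1, b1, w2, b2):
    # Same two-layer MLP applied to each position independently.
    hidden = np.maximum(0.0, x @ w1 + b1)
    return hidden @ w2 + b2

d_model, d_ff = 8, 32
x = np.random.randn(2, 5, d_model)
w1, b1 = np.random.randn(d_model, d_ff), np.zeros(d_ff)
w2, b2 = np.random.randn(d_ff, d_model), np.zeros(d_model)
y = position_wise_ffn(x, w1, b1, w2, b2)
```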
- wrap the basic computation block.
- build the entire model.
- enhance the documentation of operators used in Transformer.
- add beam search for Transformer.
- clean up the code and merge the entire project into the models repo (merge the work part by part).
- Learning Rate Scheduler
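The schedule from the Transformer paper raises the learning rate linearly for warmup_steps and then decays it proportionally to the inverse square root of the step number. A sketch of the formula (default values follow the base model in the paper):

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    # lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The rate peaks exactly at step == warmup_steps, where the two terms of the min are equal.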
- Residual Dropout
- Label Smoothing
  - label smoothing operator.
  - Python wrapper.
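The target transformation behind the operator: replace the one-hot target with (1 - eps) on the true class and spread eps uniformly over the remaining classes. A NumPy sketch (function name and eps value are illustrative):

```python
import numpy as np

def smooth_labels(labels, n_classes, eps=0.1):
    # (1 - eps) on the true class, eps / (n_classes - 1) on every other class.
    smoothed = np.full((len(labels), n_classes), eps / (n_classes - 1))
    smoothed[np.arange(len(labels)), labels] = 1.0 - eps
    return smoothed

targets = smooth_labels(np.array([2, 0]), n_classes=4, eps=0.1)
```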
- Scaled Dot Product Attention
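The core computation is Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with masked positions pushed to a large negative score before the softmax. A NumPy sketch (mask convention is illustrative: True means "attend"):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)   # (batch, seq, seq)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # block masked positions
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

q = np.random.randn(2, 5, 8)
out, w = scaled_dot_product_attention(q, q, q)
```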
- Weight sharing between embedding and pre-softmax linear transformation layers
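The idea of this item: one weight matrix serves both as the embedding table and, transposed, as the pre-softmax output projection; the paper also scales the embedding by sqrt(d_model). A NumPy sketch (variable names are illustrative):

```python
import numpy as np

vocab, d_model = 10, 8
shared = np.random.randn(vocab, d_model)   # single shared parameter matrix

ids = np.array([1, 4, 4])
emb = shared[ids] * np.sqrt(d_model)       # embedding lookup (scaled)
logits = emb @ shared.T                    # output projection reuses the table
```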