Created by: jczaja
This PR introduces fused_embedding_fc_lstm_op along with a corresponding fuse pass.
The general idea is to replace the lookup table of embeddings with a lookup table of Weights_x * embeddings. That way we can skip the W_x * x multiplication in the LSTM equations and replace it with a table lookup. W_x * embeddings is computed only once, in the fuse pass, using SGEMM.
Performance and functional testing was done using the Test_Text_classification C-API test with the Senta model and the data.txt input. Accuracy on the provided input is the same as with fusion_lstm.
Execution time is ~10% lower than with fusion_lstm, i.e. fused_embedding_fc_lstm takes ~90% of the time of fusion_lstm.
Memory consumption is higher, by a factor of 4*hidden_size / embedding_size, since the fused table stores 4*hidden_size values per token instead of embedding_size. For the benchmark used, peak memory consumption was four times higher.
Notes & missing items:
- Unit tests will be implemented in the next PR.
- This PR does not invalidate fusion_lstm, which should still be used for models that do not have an embedding, e.g. CRNN-CTC, DeepSpeech, etc.
- The BatchCompute path is enabled but was not validated (Test_Text_Classification requires batch size 1).
- The operator implementation is heavily based on fusion_lstm_op. It would be good to factor out the common part in the future to avoid redundant code.
- Currently the lookup table attributes is_sparse and is_distributed are not supported (the new op is not created when the lookup op has either of these attributes set).
- A similar concept could be implemented for GRU ops.