Created by: jczaja
This PR introduces fused_embedding_fc_lstm_op along with a corresponding fuse pass.
The general idea is to replace the lookup table of embeddings with a lookup table of Weights_x * embeddings. That way we can skip the W_x * x multiplication in the LSTM equations and replace it with a table lookup. W_x * embeddings is computed only once, in the fuse pass, using SGEMM.
Performance and functional testing was done using the Test_Text_classification C-API test with the Senta model and the data.txt input. Accuracy on the provided input is the same as with fusion_lstm.
Execution time is ~10% lower than with fusion_lstm, i.e. fused_embedding_fc_lstm takes ~90% of the time of fusion_lstm.
Memory consumption is higher, by a factor of 4*hidden_size / embedding_size, since the fused table stores 4*hidden_size values per token instead of embedding_size. For the benchmark used, peak memory consumption was four times higher.
Notes & missing items:
- Unit tests will be implemented in the next PR.
- This PR does not invalidate fusion_lstm, which should still be used for models that do not have an embedding, e.g. CRNN-CTC, DeepSpeech, etc.
- The BatchCompute path is enabled but was not validated (Test_Text_Classification requires batch size 1).
- The operator implementation is heavily based on fusion_lstm_op. It would be good to factor out the common part in the future to avoid redundant code.
- Currently the lookup table attributes is_sparse and is_distributed are not supported (the new op is not created when the lookup op has either of these attributes set).
- A similar concept could be implemented for GRU ops.