Speed up Transformer inference (#1476)
* Add py-reader and parallel-executor support in Transformer inference
* Add static k, v cache for encoder output in Transformer inference
* Replace the cache from compute_qkv with cache from split_heads in Transformer inference (see the first sketch below)
* Fuse k, q, v projection in Transformer
* Revert the fused k, q, v projection in Transformer to be compatible with saved models
* Use gather_op to replace sequence_expand_op in Transformer inference (see the second sketch below)
* Add fluid_transformer.md
* Refine README for released models and data in Transformer
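A minimal NumPy sketch, not the PR's actual Fluid code, of the idea behind caching k, v after `split_heads` rather than after `compute_qkv`: the per-head layout is computed once per decoding step and appended to the cache, so earlier steps never need to be re-split. The names `split_heads`, `step_attention`, and the `cache` dict layout are illustrative assumptions.

```python
import numpy as np

def split_heads(x, n_head):
    # [batch, seq, d_model] -> [batch, n_head, seq, d_head]
    b, s, d = x.shape
    return x.reshape(b, s, n_head, d // n_head).transpose(0, 2, 1, 3)

def step_attention(q_t, k_t, v_t, cache, n_head):
    # q_t/k_t/v_t: [batch, 1, d_model] projections for the current step.
    # Split the new step's k, v once, then append to the head-split cache,
    # instead of caching pre-split tensors and re-splitting every step.
    cache["k"] = np.concatenate([cache["k"], split_heads(k_t, n_head)], axis=2)
    cache["v"] = np.concatenate([cache["v"], split_heads(v_t, n_head)], axis=2)
    q = split_heads(q_t, n_head)
    scores = q @ cache["k"].transpose(0, 2, 1, 3).transpose(0, 1, 3, 2) if False else \
             q @ cache["k"].transpose(0, 1, 3, 2) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ cache["v"]  # [batch, n_head, 1, d_head]

# Usage: start each layer's cache empty along the time axis.
batch, n_head, d_model = 2, 8, 512
cache = {"k": np.zeros((batch, n_head, 0, d_model // n_head), "float32"),
         "v": np.zeros((batch, n_head, 0, d_model // n_head), "float32")}
out = step_attention(*(np.random.rand(batch, 1, d_model).astype("float32")
                       for _ in range(3)), cache, n_head)
```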
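Likewise, a hedged sketch of why gather can replace sequence_expand in beam search (the PR uses Fluid's gather_op; the shapes and `parent_idx` values here are illustrative). After top-k pruning, each surviving hypothesis only needs the cached states of its parent beam, which is a single gather along the batch-of-beams axis rather than expanding states to every candidate.

```python
import numpy as np

batch_beams, n_head, seq, d_head = 4, 8, 10, 64
cache_k = np.random.rand(batch_beams, n_head, seq, d_head).astype("float32")

# Parent beam chosen for each surviving hypothesis after top-k pruning.
parent_idx = np.array([0, 0, 2, 3])

# One gather reorders the cache in place of a sequence_expand-style copy.
cache_k = np.take(cache_k, parent_idx, axis=0)
```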