[MKL-DNN] Advice on runtime context related crash in Transformer (#16841) · Issue · PaddlePaddle / Paddle

[MKL-DNN] Advice on runtime context related crash in Transformer

Created by: jczaja

When test_analyzer_transformer.profile_mkldnn is run with the "--test_all_data" option, it crashes because the matmul inputs have mismatched dims. When MKL-DNN is used (for any op), the wrong Tensor is chosen for the op to work on, starting from the third iteration (when the input data shapes differ from those of the warmup and the first iteration).

The problem is that the runtime context holds Variables pointing to "scale_0.tmp_0" tensors that come from previous iterations, when the shape of those Tensors was different. At the same time, the scope does contain a "scale_0.tmp_0" with the proper dims. As a test/workaround I forced Paddle to recreate the runtime context on every run. With this hack the crash goes away (everything passes). The code I'm referring to (which I disabled to force the runtime context to be recreated): https://github.com/PaddlePaddle/Paddle/blob/4267a81afcab6ccc4d84eab8ffad0dff24fd8d65/paddle/fluid/framework/operator.cc#L897
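
To make the failure mode concrete, here is a minimal, self-contained sketch of how a cached context can keep returning a tensor with old dims while the scope already holds a new one under the same name. This is not Paddle's code: ToyTensor, ToyScope, ToyOp and the reuse_context flag are hypothetical names invented for illustration, and the sketch assumes that each iteration installs a fresh tensor object in the scope, which may not be exactly what happens in the real run.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tensor: only the dims matter here.
struct ToyTensor {
  std::vector<int> dims;
};

// Hypothetical stand-in for a scope: maps variable names to tensors.
// Assumption: Set() installs a NEW tensor object under the name.
class ToyScope {
 public:
  void Set(const std::string& name, std::vector<int> dims) {
    auto tensor = std::make_shared<ToyTensor>();
    tensor->dims = std::move(dims);
    vars_[name] = tensor;
  }

  std::shared_ptr<ToyTensor> Find(const std::string& name) const {
    auto it = vars_.find(name);
    return it == vars_.end() ? nullptr : it->second;
  }

 private:
  std::map<std::string, std::shared_ptr<ToyTensor>> vars_;
};

// Hypothetical op that resolves its input through a cached "context".
class ToyOp {
 public:
  ToyOp(std::string input_name, bool reuse_context)
      : input_name_(std::move(input_name)), reuse_context_(reuse_context) {}

  // Returns the dims of the tensor the op would actually work on.
  std::vector<int> Run(const ToyScope& scope) {
    // With reuse enabled, the lookup happens only on the first run,
    // so later runs keep the tensor that was resolved back then.
    if (!reuse_context_ || !cached_input_) {
      cached_input_ = scope.Find(input_name_);
    }
    if (!cached_input_) {
      throw std::runtime_error("input not found: " + input_name_);
    }
    return cached_input_->dims;
  }

 private:
  std::string input_name_;
  bool reuse_context_;
  std::shared_ptr<ToyTensor> cached_input_;
};
```

In this toy model, an op constructed with reuse_context set to true keeps reporting the dims it saw on its first run even after the scope has been given a differently shaped "scale_0.tmp_0", while an op constructed with reuse_context set to false always sees the current dims. The second case corresponds to the hack of recreating the context on every run.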

@luotao1 Could you please advise on how to fix this problem?
