Malformed graph of Ernie when run with the benchmark application
Created by: Sand3r-
Current behaviour
The error was discovered thanks to level 3 logging enabled via the GLOG_v environment variable. GLOG reported the following:
Some operators use the same variables for reading/writing their inputs and outputs. For example, when the fp32 model is run, one can observe that a scale op as well as a transpose2 op both accept transpose_4.tmp_0 as their input (while according to the original graph they do not).
An excerpt of the GLOG output documenting this:
```
operator.cc:172 CPUPlace Op(scale), inputs:{X[transpose_4.tmp_0:float[1, 12, 128, 64]({})]}, outputs:{Out[scale_12.tmp_0:float[1, 12, 128, 64]({})]}.
(...several ops later...)
operator.cc:172 CPUPlace Op(transpose2), inputs:{X[transpose_4.tmp_0:float[1, 12, 128, 64]({})]}, outputs:{Out[fc_66.tmp_0:float[1, 128, 12, 64]({})], XShape[transpose_47.tmp_1:[0, 1, 12, 128, 64]({})]}.
```
As far as I understand, this is a bug, since variable names should be unique (as long as they are enclosed in the same scope).
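For reference, here is a minimal sketch of how the duplication can be spotted automatically in the GLOG output. It assumes the log was captured with GLOG_v=3 and redirected to a file named glog_run.txt; the file name and the regular expressions are assumptions based on the excerpt above, not part of Paddle itself.

```python
import re
from collections import defaultdict

# Pattern for the "Op(name), inputs:{...}, outputs:{...}" lines shown above.
OP_LINE = re.compile(r"Op\((\w+)\), inputs:\{(.*?)\}, outputs:\{(.*)\}")
# Variable names appear as "[var_name:" inside the input/output lists.
VAR_NAME = re.compile(r"\[([\w.]+):")

readers = defaultdict(list)   # variable name -> ops that read it
writers = defaultdict(list)   # variable name -> ops that write it

with open("glog_run.txt") as log:   # hypothetical file holding the GLOG_v=3 output
    for line in log:
        match = OP_LINE.search(line)
        if not match:
            continue
        op, inputs, outputs = match.groups()
        for var in VAR_NAME.findall(inputs):
            readers[var].append(op)
        for var in VAR_NAME.findall(outputs):
            writers[var].append(op)

# Variables consumed by more than one op; compare against the original graph
# to spot names that should not be shared (e.g. transpose_4.tmp_0 feeding
# both scale and transpose2 in the excerpt above).
for var, ops in sorted(readers.items()):
    if len(ops) > 1:
        print(f"{var}: read by {ops}, written by {writers.get(var, [])}")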
To illustrate the issue further, please see the following figure, which depicts a different model (ernie_quant) that suffers from the same problem:
This is a blocking issue for the INT8 Ernie quantization task, since our quantization system associates scales with variable names. If a variable name repeats in several places, we end up with the same scale being applied where we did not intend it.
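To make the impact on quantization concrete, here is a purely hypothetical illustration (the dict-based lookup and the scale values below are invented for the example, not the actual Paddle quantization code): when scales are keyed by variable name, two distinct tensors that end up sharing a name collide on a single entry.

```python
# Hypothetical sketch: calibration scales keyed by variable name.
calibration_scales = {}

def record_scale(var_name, scale):
    # A second tensor that ends up with the same name silently
    # overwrites the scale recorded for the first one.
    calibration_scales[var_name] = scale

record_scale("transpose_4.tmp_0", 0.021)  # tensor meant for the original transpose2 input
record_scale("transpose_4.tmp_0", 0.187)  # unrelated tensor that received the same name

# Both ops are now quantized with 0.187, although one of them expected 0.021.
print(calibration_scales["transpose_4.tmp_0"])
```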
Reproduction
- Based on commit 8da0cd53
- CPU: including MKLDNN version v0.20
- OS platform: Ubuntu 16.04
- CMake flags:
-DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_GPU=OFF -DON_INFER=ON -DWITH_MKLDNN=ON -DWITH_TESTING=ON -DWITH_PROFILER=ON -DWITH_STYLE_CHECK=OFF -DWITH_INFERENCE_API_TEST=ON
- API information

To reproduce:
- Build Paddle.
- Build the benchmark inference application for Ernie: https://github.com/PaddlePaddle/benchmark/tree/master/Inference/c%2B%2B/ernie
- Run any 4-input Ernie model.
@luotao1 Could you please assign someone to help solve this issue?