Created by: jczaja
This is the first PR aiming to reduce MKL-DNN memory consumption to a level close to the native CPU Paddle implementation. The relevant discussion takes place in #21493.
The changes introduced are deliberately limited, e.g.:
- only FP32 inference is supported (no INT8 support yet)
- memory savings happen only when the original model's params size is the same as the DNNL weights size

Both limitations can be overcome with further changes in the core of PaddlePaddle.
Model | Peak memory consumption: develop | Peak memory consumption: this PR |
---|---|---|
googlenetv1 | 600 MB | 580 MB |
demark | 622 MB | 600 MB |
mobilenetv1 | 640 MB | 580 MB |
I would appreciate it if @LeoZhao-Intel and @zhangting2020 could share their opinions on these changes. Please tell me whether changes of this kind are conceptually acceptable to you. In short: the original model's param is replaced with DNNL's params.
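To make the idea concrete, below is a minimal sketch of the mechanism, assuming the DNNL 1.x C++ API; the helper name `ReorderWeightsInPlace` and its signature are hypothetical and not part of this PR. When the blocked layout chosen by DNNL occupies the same number of bytes as the plain layout, the reordered weights can be written back over the original param buffer, so only one copy of the weights stays alive.

```cpp
#include <cstring>
#include <vector>

#include "dnnl.hpp"

// Hypothetical helper illustrating the idea behind this PR: reorder FP32
// weights into the DNNL-preferred blocked layout and, when both layouts
// occupy the same number of bytes, overwrite the original param buffer so
// no second copy of the weights needs to be cached.
bool ReorderWeightsInPlace(void* weights_data,
                           const dnnl::memory::desc& plain_md,    // e.g. oihw
                           const dnnl::memory::desc& blocked_md,  // DNNL-chosen
                           dnnl::engine& eng, dnnl::stream& strm) {
  // Savings are only possible when the sizes match (the limitation
  // mentioned above).
  if (plain_md.get_size() != blocked_md.get_size()) return false;

  dnnl::memory src_mem(plain_md, eng, weights_data);

  // Reorder into a temporary buffer, then copy back over the original
  // allocation; from here on the original tensor holds DNNL-format weights.
  std::vector<char> tmp(blocked_md.get_size());
  dnnl::memory dst_mem(blocked_md, eng, tmp.data());

  dnnl::reorder(src_mem, dst_mem).execute(strm, src_mem, dst_mem);
  strm.wait();

  std::memcpy(weights_data, tmp.data(), tmp.size());
  return true;
}
```

With something like this in place, an operator can hand the original tensor's buffer straight to DNNL instead of keeping a separate reordered copy in the weights cache.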
Note: CI can only pass once approval is given.