Created by: jczaja
This PR is open for discussion. The oneDNN grad kernel of pool2d needs "MidOut" data from the forward pass, similar to how it is done for the LRN op. So far the oneDNN cache was used instead of MidOut, but that approach is very complex, especially when multi-threaded training is running. Using MidOut will also reduce cache operations for dygraph mode.
Notes:
- MidOut is added regardless of the execution backend (oneDNN, GPU, native CPU)
- MidOut data is oneDNN-specific, so its format is opaque; hence we do not validate this data in the unit tests.
@luotao1 Please take a look and let us know whether this idea is acceptable.
PR types
Breaking changes
PR changes
Ops
Describe
The oneDNN pool2d grad kernel uses data computed during the forward pass. This PR adds a MidOut tensor to store that data and make it available during grad execution.
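To illustrate the idea behind MidOut (independent of the actual oneDNN workspace format, which is opaque): the forward pass can save an auxiliary tensor, and the backward pass consumes it directly instead of re-deriving forward state from a cache. Below is a minimal NumPy sketch of this pattern for max pool2d, where the MidOut-like tensor holds the flat input index of each pooled maximum. All names here (`mid`, `max_pool2d_forward`, `max_pool2d_backward`) are hypothetical and not part of Paddle's API.

```python
import numpy as np

def max_pool2d_forward(x, ksize=2, stride=2):
    """Forward max pool over NCHW input. Also returns an auxiliary
    index tensor (analogous to MidOut in this PR): for each output
    element, the flat index of the max within its input plane."""
    n, c, h, w = x.shape
    oh, ow = (h - ksize) // stride + 1, (w - ksize) // stride + 1
    out = np.empty((n, c, oh, ow), dtype=x.dtype)
    mid = np.empty((n, c, oh, ow), dtype=np.int64)  # MidOut-like data
    for i in range(oh):
        for j in range(ow):
            hs, ws = i * stride, j * stride
            window = x[:, :, hs:hs + ksize, ws:ws + ksize].reshape(n, c, -1)
            local = window.argmax(axis=2)
            out[:, :, i, j] = np.take_along_axis(
                window, local[..., None], axis=2)[..., 0]
            # Convert the local window index into a flat index
            # into the (h, w) input plane.
            mid[:, :, i, j] = (hs + local // ksize) * w + (ws + local % ksize)
    return out, mid

def max_pool2d_backward(grad_out, mid, input_shape):
    """Backward pass: scatter output gradients using the saved
    indices, with no need to recompute or cache forward state."""
    n, c, h, w = input_shape
    grad_in = np.zeros((n, c, h * w), dtype=grad_out.dtype)
    flat_mid = mid.reshape(n, c, -1)
    flat_go = grad_out.reshape(n, c, -1)
    ni, ci = np.meshgrid(np.arange(n), np.arange(c), indexing="ij")
    np.add.at(grad_in, (ni[..., None], ci[..., None], flat_mid), flat_go)
    return grad_in.reshape(n, c, h, w)
```

The key point is that `mid` is produced once in the forward op and handed to the grad op as a regular tensor, which sidesteps the multi-threaded cache-lookup complexity described above.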