diff --git a/doc/fluid/design/mkldnn/acquire_api/acquire_api.md b/doc/fluid/design/mkldnn/acquire_api/acquire_api.md
new file mode 100644
index 0000000000000000000000000000000000000000..8e14f9b233117c83318b03242169f86999703bf5
--- /dev/null
+++ b/doc/fluid/design/mkldnn/acquire_api/acquire_api.md
@@ -0,0 +1,76 @@
+# Design Doc: MKL-DNN Acquire API
+
+MKL-DNN kernels that use the MKL-DNN API directly tend to be quite complex, because of:
+* the number of MKL-DNN API calls needed, most of which are in fact repeated across all MKL-DNN kernels
+* the caching mechanism for MKL-DNN objects (conceptually the same across all Paddle MKL-DNN kernels)
+* the still evolving MKL-DNN API, which makes Paddle MKL-DNN kernels difficult to maintain
+
+Hence the Acquire API was created as a wrapper around the MKL-DNN API that addresses the issues listed above.
+
+### Common functionality
+Each MKL-DNN kernel essentially creates MKL-DNN memory objects, then creates MKL-DNN computational primitives and, as a last step, triggers the execution of the created primitives. Creating each of these MKL-DNN objects requires at least a few calls to the MKL-DNN API, and the code becomes much more complex once caching of the created objects is added. Moreover, this code is very similar across MKL-DNN kernels, so the Acquire API was designed to provide an easy way of creating and caching these MKL-DNN objects. Having the common code implemented once inside the Acquire API and reused in operators means that implementing a given operator requires less effort. It also makes the integration of MKL-DNN kernels shorter and less error-prone.
+
+### Details of Acquire API
+The basic element of the Acquire API is the so-called handler. The base MKLDNNHandler class implements the code common to all operators using the Acquire API. In the picture below, the rightmost nodes (grouped under "Base MKLDNNHandler") represent the common functionality used by the Softmax and activation MKL-DNN kernels.
Apart from the base MKLDNNHandler, there are derived handlers that implement functionality specific to a given operator, e.g. constructing the caching key for that operator or adding non-standard functions for getting workspace memory objects (grouped under "Derived Handlers"). The leftmost nodes are the entry functions (Compute) of the Softmax and activation MKL-DNN kernels.
+
+![](images/acquire.svg)
+
+Caching of MKL-DNN objects is already implemented in the base MKLDNNHandler, so most of the time you do not have to consider caching when implementing a derived handler.
+
+### Usage of the Acquire API for MKL-DNN kernel implementation
+
+#### 1. Creating an MKLDNNHandler
+As a first step, one needs to create a derived handler for the target MKL-DNN kernel (operator). For the LRN op it would be LRNMKLDNNHandler, which inherits from MKLDNNHandlerT.
+The goal of a derived handler is to provide operator-specific functionality: creating the caching key and creating the forward and backward MKL-DNN primitive descriptors.
+It is best to look into existing examples of derived handlers and implement a new one by analogy.
+
+Example code of calling the created LRN MKLDNNHandler:
+
+    const float alpha = ctx.Attr<float>("alpha") * static_cast<float>(n);
+    const float beta = ctx.Attr<float>("beta");
+    const float k = ctx.Attr<float>("k");
+    bool is_test = ctx.Attr<bool>("is_test");
+
+    auto dims = paddle::framework::vectorize(x->dims());
+
+    platform::LRNMKLDNNHandler handler(dims, n, alpha, beta, k, x->format(),
+                                       is_test, dev_ctx, ctx.GetPlace(),
+                                       ctx.op().Output("Out"));
+
+#### 2. Creating MKL-DNN memory objects
+Once we have a derived handler, it is time to get the needed MKL-DNN memory objects. Memory objects can either wrap Tensor data or allocate data on their own.
+The family of functions for getting memory objects includes:
+* AcquireSrcMemory
+* AcquireDstMemory
+* AcquireDiffDstMemory
+* etc...
+
+Each of them expects a Tensor to be passed as a parameter, so that the MKL-DNN memory object wraps the Tensor (the recommended way).
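The wrap-or-allocate choice described above can be sketched in plain C++. This is only an illustration of the idea: `FakeTensor` and `FakeMemory` are hypothetical stand-ins for Paddle's `Tensor` and `mkldnn::memory`, not the actual classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for paddle::framework::Tensor.
struct FakeTensor {
  std::vector<float> buffer;
};

// A memory object either wraps an existing tensor buffer (recommended) or,
// when no tensor is supplied, owns an allocation of its own - the same two
// behaviors the Acquire*Memory functions provide.
class FakeMemory {
 public:
  // Wraps the tensor's data; no copy, no ownership.
  explicit FakeMemory(FakeTensor* t) : data_(t->buffer.data()) {}

  // No tensor given: allocate a buffer of `count` floats internally.
  explicit FakeMemory(std::size_t count)
      : owned_(count, 0.0f), data_(owned_.data()) {}

  float* data() { return data_; }
  bool owns_data() const { return !owned_.empty(); }

 private:
  std::vector<float> owned_;  // non-empty only for self-allocated memory
  float* data_ = nullptr;
};
```

In the real kernels the wrapping constructor is what runs when a Tensor is passed to AcquireSrcMemory/AcquireDstMemory.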
If this is not possible, as in the case of some workspace memory objects, then omitting the Tensor will trigger the creation of an MKL-DNN memory object with its own allocation.
+
+Example usage based on the LRN MKL-DNN kernel:
+
+    auto src_memory = handler.AcquireSrcMemory(x);    // x is the input tensor of LRN
+    auto dst_memory = handler.AcquireDstMemory(out);  // out is the output tensor of LRN
+
+#### 3. Creating MKL-DNN computational primitives
+Once we have the handler and the MKL-DNN memory objects, we can get the computational MKL-DNN primitive. This is done with AcquireForwardPrimitive (for a forward pass op) and AcquireBackwardPrimitive (for a grad pass op).
+
+Example usage based on the LRN MKL-DNN kernel:
+
+    auto lrn_p = handler.AcquireForwardPrimitive(*src_memory, *dst_memory);
+
+#### 4. Execution of MKL-DNN computational primitives
+Having the memory objects and the computational primitive, we may trigger its execution. Example for the LRN op:
+
+    std::vector<mkldnn::primitive> pipeline = {*lrn_p};
+    mkldnn::stream(mkldnn::stream::kind::eager).submit(pipeline).wait();
+
+#### 5. Registering the MKL-DNN memory format in the corresponding Tensor
+The last step is to register the MKL-DNN output memory object's format inside the output Tensor, e.g. set Tensor::format_ to the MKL-DNN enum that corresponds to the way the Tensor data is arranged (NCHW, NCHW16C etc.). This enum can be taken from the dst memory object (the wrapper of the Output tensor) in the forward pass, or from the diff_src memory object (the wrapper of the X_grad Tensor) in the backward pass.
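All of the Acquire* calls used in the steps above share one caching pattern, implemented once in the base handler: look the object up under a string key and create it only on a miss, so repeated kernel invocations reuse the same object. A minimal sketch follows, with hypothetical `Fake*` names rather than the real MKLDNNHandlerT code:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an MKL-DNN primitive or memory object.
struct FakePrimitive {
  explicit FakePrimitive(std::string d) : desc(std::move(d)) {}
  std::string desc;
};

// Sketch of the acquire pattern: check the key/value cache first and
// create the object only on a miss.
class FakeHandler {
 public:
  std::shared_ptr<FakePrimitive> AcquirePrimitive(const std::string& key) {
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;  // cache hit: reuse
    auto p = std::make_shared<FakePrimitive>("primitive for " + key);
    cache_.emplace(key, p);                     // cache miss: create + store
    return p;
  }

  std::size_t Size() const { return cache_.size(); }

 private:
  std::unordered_map<std::string, std::shared_ptr<FakePrimitive>> cache_;
};
```

In the real code the derived handler builds the key from operator-specific data (see step 1), so different operator instances get distinct cached objects.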
+
+Example of registering the MKL-DNN format in the output tensor:
+
+    out->set_layout(framework::DataLayout::kMKLDNN);
+    out->set_format(platform::GetMKLDNNFormat(*dst_memory));
diff --git a/doc/fluid/design/mkldnn/acquire_api/images/acquire.svg b/doc/fluid/design/mkldnn/acquire_api/images/acquire.svg new file mode 100644 index 0000000000000000000000000000000000000000..a304a6c8ad84bde9b9eb28341d6f4f173b32d698 --- /dev/null +++ b/doc/fluid/design/mkldnn/acquire_api/images/acquire.svg @@ -0,0 +1,111 @@ + + + + + + +%3 + +cluster_A + +Derived Handlers + +cluster_B + +Base MKLDNNHandler + + +Node0x490c380 + +SoftmaxMKLDNNKernel::Compute() + + +Node0x4915e90 + +SoftmaxMKLDNNHandler::SoftmaxMKLDNNHandler<forward>() + + +Node0x490c380->Node0x4915e90 + + + + +Node0x49164c0 + +MKLDNNHandlerT::AcquireSrcMemory() + + +Node0x490c380->Node0x49164c0 + + + + +Dst + +MKLDNNHandlerT::AcquireDstMemory() + + +Node0x490c380->Dst + + + + +Node0x491bca0 + +MKLDNNHandlerT::AcquireForwardPrimitive() + + +Node0x490c380->Node0x491bca0 + + + + +Node0x4ab38f0 + +MKLDNNActivationKernel::Compute() + + +Node0x4b2e4f0 + +ActivationMKLDNNHandler::ActivationMKLDNNHandler<forward>() + + +Node0x4ab38f0->Node0x4b2e4f0 + + + + +Node0x4ab38f0->Node0x49164c0 + + + + +Node0x4ab38f0->Dst + + + + +Node0x4ab38f0->Node0x491bca0 + + + + +Node0x496cfc0 + +MKLDNNHandlerT::AcquireForwardPrimitiveDescriptor() + + +Node0x4915e90->Node0x496cfc0 + + + + +Node0x4b2e4f0->Node0x496cfc0 + + + + + diff --git a/doc/fluid/design/mkldnn/acquire_api/index_en.rst b/doc/fluid/design/mkldnn/acquire_api/index_en.rst new file mode 100644 index 0000000000000000000000000000000000000000..1038704070862a3c6a5c98df156260fe0bca6e47 --- /dev/null +++ b/doc/fluid/design/mkldnn/acquire_api/index_en.rst @@ -0,0 +1,7 @@ +MKL-DNN Acquire API +-------------------------------------- + +.. 
toctree::
+   :maxdepth: 1
+
+   acquire_api.md
diff --git a/doc/fluid/design/mkldnn/acquire_api/scripts/acquire.dot b/doc/fluid/design/mkldnn/acquire_api/scripts/acquire.dot
new file mode 100644
index 0000000000000000000000000000000000000000..3332fc6eb3def2743408845d7c09630034e69a66
--- /dev/null
+++ b/doc/fluid/design/mkldnn/acquire_api/scripts/acquire.dot
@@ -0,0 +1,53 @@
+digraph {
+  rankdir=LR
+  weight=0.5
+  concentrate=true
+  splines=ortho
+  newrank=true
+  nodesep=1
+
+  node[width=4.4,shape=box]
+
+  Node0x490c380 [shape=record,label="SoftmaxMKLDNNKernel::Compute()\l"];
+
+  Node0x4ab38f0 [shape=record,label="MKLDNNActivationKernel::Compute()\l"];
+
+  subgraph cluster_A {
+    label="Derived Handlers"
+    node[width=7.4,shape=box]
+    style=dotted
+    // Dummy[shape=record,label="", color=invis];
+    Node0x4915e90 [shape=record,label="SoftmaxMKLDNNHandler::SoftmaxMKLDNNHandler\<forward\>()\l"];
+    Node0x4b2e4f0 [shape=record,label="ActivationMKLDNNHandler::ActivationMKLDNNHandler\<forward\>()\l"];
+  }
+
+  subgraph cluster_B {
+    label="Base MKLDNNHandler"
+    style=dotted
+    node[width=6.2,shape=box]
+    Node0x49164c0 [shape=record,label="MKLDNNHandlerT::AcquireSrcMemory()\l"];
+    Dst[shape=record,label="MKLDNNHandlerT::AcquireDstMemory()\l"];
+    Node0x491bca0 [shape=record,label="MKLDNNHandlerT::AcquireForwardPrimitive()\l"];
+    Node0x496cfc0 [shape=record,label="MKLDNNHandlerT::AcquireForwardPrimitiveDescriptor()\l"];
+  }
+
+  Node0x490c380 -> Node0x4915e90[style="bold"];
+  Node0x490c380 -> Node0x49164c0;
+  Node0x490c380 -> Node0x491bca0;
+  Node0x490c380 -> Dst;
+  Node0x4915e90 -> Node0x496cfc0;
+
+  {rank=same Node0x4ab38f0 Node0x490c380 }                   // Compute level
+  {rank=same Node0x4915e90 Node0x4b2e4f0 }                   // Derived handler level
+  {rank=same Node0x49164c0 Dst Node0x491bca0 Node0x496cfc0 } // Base handler level
+
+  Node0x4b2e4f0 -> Node0x496cfc0
+  Node0x4ab38f0 -> Node0x49164c0
+  Node0x4ab38f0 -> Dst
+  Node0x4ab38f0 -> Node0x491bca0
+  Node0x4ab38f0 -> Node0x4b2e4f0[style="bold"]
+}