1. 02 Jul, 2018 2 commits
  2. 07 Jun, 2018 2 commits
    • split reduce op into multiple libraries, accelerate the compiling (#11029) · d48172f2
      dzhwinter committed
      * "split into multiple .ccl"
      
      * "refine file structure"
      
      * "refine files"
      
      * "remove the cmakelist"
      
      * "fix typo"
      
      * "fix typo"
      
      * fix ci
    • Mkldnn layout (#11040) · 3ff9ba0e
      mozga-intel committed
      * Add MKLDNN layout support in Paddle
      
      Add MKLDNN layout in Paddle so that an MKLDNN-friendly memory layout
      can be used in MKLDNN-enabled OP kernels. Before this commit, NCHW
      was hardcoded in all MKLDNN op kernels. As a result, a
      non-optimized execution path was selected in the MKLDNN primitive,
      which hurt performance.
      Besides the framework change, three MKLDNN OP kernels were updated
      to use the new MKLDNN layout: conv, pool2d, and batch_norm.
      Other MKLDNN OP kernels need to be updated in a similar way to
      achieve the best performance.
      
      * Add MKLDNN layout support in activation OP
      
      * Don't populate layout from input to output when kMKLDNN in
      
      * Refine pool mkldnn op kernel
      
      * MKLDNN layout
      
      * Remove the inheritance from tensor file
      
      * MKLDNN layout: refactoring
      
      * Remove additional #define to register new operator
      
      * Prepare mkldnn tests to work with layout
  3. 18 Apr, 2018 1 commit
  4. 17 Apr, 2018 1 commit
  5. 12 Feb, 2018 1 commit
  6. 10 Feb, 2018 2 commits
  7. 09 Feb, 2018 1 commit
  8. 17 Jan, 2018 1 commit
  9. 03 Jan, 2018 1 commit
  10. 27 Dec, 2017 1 commit
    • "refine kernel registrar" (#6998) · 35c1683e
      dzhwinter committed
      * "refine kernel registrar"
      
      * "refine registrar with multikey"
      
      * "fix register"
      
      * "refine multikernel register"
      
      * "fix CI"
      
      * "fix CI"
      
      * "fix registry"
      
      * "switch GPU to CUDA"
      
      * "add register macro test case"
      
      * "fix CI"
  11. 25 Dec, 2017 1 commit
  12. 24 Dec, 2017 1 commit
  13. 22 Dec, 2017 1 commit
  14. 21 Dec, 2017 1 commit
  15. 20 Dec, 2017 1 commit
  16. 12 Dec, 2017 1 commit
    • Refine device context (#6433) · 61ec0b95
      QI JUN committed
      The main changes are:
      
      - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
      - remove `eigen_device` interface in base class `DeviceContext`
      - remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
      - remove unused `platform::EigenDeviceConverter`
      - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
      - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
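The first change in the commit above (templating math functors and kernels on a device context type instead of a place) might look roughly like the following. This is a minimal sketch only: `CPUDeviceContext`, `CUDADeviceContext`, `ScaleFunctor`, and `ScaleKernel` here are hypothetical stand-ins written for illustration, not Paddle's actual classes or signatures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for device contexts; these are NOT Paddle's real classes.
struct CPUDeviceContext {
  static std::string Name() { return "CPU"; }
};
struct CUDADeviceContext {
  static std::string Name() { return "CUDA"; }
};

// A math functor parameterized by the device context type (not by a place).
template <typename DeviceContext, typename T>
struct ScaleFunctor {
  void operator()(const DeviceContext& /*ctx*/, T factor,
                  std::vector<T>* data) const {
    for (T& v : *data) v *= factor;  // scale every element in place
  }
};

// A kernel parameterized the same way, so it can pass its context
// straight through to the functor without any conversion step.
template <typename DeviceContext, typename T>
struct ScaleKernel {
  std::string Compute(const DeviceContext& ctx, T factor,
                      std::vector<T>* data) const {
    ScaleFunctor<DeviceContext, T> functor;
    functor(ctx, factor, data);
    return "scale on " + DeviceContext::Name();
  }
};
```

Under this assumption, a caller would instantiate `ScaleKernel<CPUDeviceContext, float>` and pass it a `CPUDeviceContext`; the kernel and the functor then agree on the context type at compile time.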
  17. 08 Nov, 2017 1 commit
    • Polish OpWithKernel · bbdac7f7
      Yu Yang committed
      * Change `IndicateDataType` to `GetKernelType` to make it easier to
        understand.
      * Change `OpKernelKey` to `OpKernelType`
      * Let operator developers customize which kernel the operator will
        use at runtime.
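The idea of letting an operator choose its own kernel through `GetKernelType` can be sketched as below. Only the names `GetKernelType` and `OpKernelType` come from the commit message; the fields, the `Register` and `Run` helpers, and the `AlwaysFloat64Op` class are assumptions made up for this illustration and do not reflect Paddle's real interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>

enum class DataType { kFloat32, kFloat64 };
enum class Place { kCPU, kCUDA };

// Hypothetical kernel key: the fields are assumptions for illustration.
struct OpKernelType {
  DataType data_type;
  Place place;
  bool operator<(const OpKernelType& o) const {
    return std::tie(data_type, place) < std::tie(o.data_type, o.place);
  }
};

class OperatorWithKernel {
 public:
  virtual ~OperatorWithKernel() = default;

  // Default policy: pick the kernel matching the input type and place.
  virtual OpKernelType GetKernelType(DataType input_type, Place place) const {
    return {input_type, place};
  }

  // Hypothetical registration helper; kernels are just names here.
  void Register(OpKernelType key, std::string name) {
    kernels_[key] = std::move(name);
  }

  // Looks up the kernel selected by GetKernelType.
  std::string Run(DataType input_type, Place place) const {
    auto it = kernels_.find(GetKernelType(input_type, place));
    return it == kernels_.end() ? "no kernel" : it->second;
  }

 private:
  std::map<OpKernelType, std::string> kernels_;
};

// An operator that overrides the policy and always asks for a float64 kernel.
class AlwaysFloat64Op : public OperatorWithKernel {
 public:
  OpKernelType GetKernelType(DataType, Place place) const override {
    return {DataType::kFloat64, place};
  }
};
```

In this sketch the base class owns the lookup, and a subclass changes only the selection policy, which is one plausible reading of "customize which kernel the operator will use at runtime".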
  18. 01 Nov, 2017 1 commit
    • Feature/executor use program bind (#5196) · 1363ddb6
      Yu Yang committed
      * Init commit
      
      * Make executor use ProgramDescBind
      
      * Change Attribute from BlockDesc to BlockDescBind
      
      * Since we will get the program desc in RNN, just BlockDesc is not
        enough.
  19. 29 Oct, 2017 2 commits
  20. 24 Oct, 2017 1 commit
  21. 19 Oct, 2017 2 commits
  22. 18 Oct, 2017 1 commit
  23. 17 Oct, 2017 1 commit
  24. 13 Oct, 2017 1 commit
  25. 10 Oct, 2017 1 commit
  26. 06 Oct, 2017 2 commits
  27. 05 Oct, 2017 5 commits
  28. 04 Oct, 2017 3 commits