1. 21 Jun, 2018 4 commits
    • J
      - MKLDNN Softmax Grad Op · 98f3ad3b
      Jacek Czaja committed
      - Added hash function inside the MKLDNN softmax op to be used as a handle for storing primitives in a
      context
      
      - Style fixes to softmax mkldnn op
      
      - Fixes after review
      
      - Coding style
      
      - Fix to style
      
      - style fixes
      
      - style fix
      
      - style fixes
      
      - Fix to code style check
      
      - Rephrasing a comment
      
      fix to broken merge
      
      Fixes to rebase
      
      Conflicts:
      	benchmark/fluid/models/machine_translation.py
      	cmake/external/mkldnn.cmake
      	paddle/fluid/operators/softmax_mkldnn_op.cc
      
      - Bumped revision of MKL-DNN up to have softmax backward primitive
      
      - Added choosing MKLDNN softmax grad operator
      
      - First reuse of softmax backward
      
      - Reinvented reusing for softmax
      
      - Fix to crash in reinvented reuse
      
      - Clang format fixes
      
      - Clang format fixes
      
      - Improved softmax mkldnn reuse mechanism
      
      - clang format fixes
      
      - Fix to broken merge
      
      - Fix
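The hash-keyed primitive reuse mentioned in this commit can be illustrated with a minimal sketch. It assumes a string key built from the tensor dimensions plus a suffix, and a map that stands in for the device context. `MakeKeySketch`, `PrimitiveCacheSketch`, and `FakeSoftmaxPrimitive` are hypothetical names used only for illustration; they are not taken from the Paddle or MKL-DNN sources.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical key builder: joins the dimensions and a suffix into one string.
std::string MakeKeySketch(const std::vector<int>& dims,
                          const std::string& suffix) {
  std::string key;
  for (int d : dims) {
    key += std::to_string(d);
    key += '-';
  }
  return key + suffix;
}

// Hypothetical stand-in for a context that stores reusable objects by key.
class PrimitiveCacheSketch {
 public:
  // Returns the object stored under `key`, or builds and stores it once.
  // The caller must request the same type T that was stored under the key.
  template <typename T, typename Factory>
  std::shared_ptr<T> GetOrCreate(const std::string& key, Factory make) {
    auto it = blobs_.find(key);
    if (it != blobs_.end()) {
      return std::static_pointer_cast<T>(it->second);
    }
    std::shared_ptr<T> created = make();
    assert(created != nullptr);  // the factory must produce an object
    blobs_[key] = created;
    return created;
  }

  std::size_t size() const { return blobs_.size(); }

 private:
  std::unordered_map<std::string, std::shared_ptr<void>> blobs_;
};

// Placeholder for an expensive-to-create primitive.
struct FakeSoftmaxPrimitive {
  std::vector<int> dims;
};
```

With this shape, a second request for the same dimensions returns the stored object instead of rebuilding it, which is the general idea behind reusing primitives across iterations.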
    • T
      Revert "Merge pull request #11628 from PaddlePaddle/revert-11102-mozga-intel/Sum_mkldnn_layout" · d5fb8fa7
      tensor-tang committed
      This reverts commit 4d8e8ee2, reversing
      changes made to d6a9f005.
    • T
      Revert "MKLDNN layout: Support for sum operator" · 90780e22
      tensor-tang committed
    • C
      Add No Mutex · c99fca5f
      chengduoZH committed
  2. 19 Jun, 2018 2 commits
  3. 16 Jun, 2018 1 commit
  4. 14 Jun, 2018 2 commits
    • Q
      Fix NCCLBcast hang up bug in Parallel Executor (#11377) · 046bb5c8
      Qiyang Min committed
      * 1. Create buddy allocator in each place before broadcasting the variables with NcclBcast
      2. Check the memory usage of ALL GPUs rather than the first one
      
      * 1. Make NCCLGroupGuard guard only the ncclBcast part, which prevents ncclGroupEnd from blocking exception throwing
      2. NOTE the usage of NCCLGroupGuard
      
      * Remove the memory usage check of GPUs
      
      * Fix code style
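The narrowed guard scope described in this commit can be sketched as follows. This is only an illustration under stated assumptions: `GroupGuardSketch` stands in for NCCLGroupGuard, its constructor and destructor stand in for the ncclGroupStart and ncclGroupEnd calls, and `BroadcastSketch` is a hypothetical function. No real NCCL calls are made.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical counters that stand in for the NCCL group state.
int g_group_depth = 0;
int g_group_ends = 0;
int g_broadcasts = 0;

// RAII guard: the constructor stands in for ncclGroupStart(),
// the destructor stands in for ncclGroupEnd().
class GroupGuardSketch {
 public:
  GroupGuardSketch() { ++g_group_depth; }
  ~GroupGuardSketch() {
    assert(g_group_depth > 0);  // every end must match a start
    --g_group_depth;
    ++g_group_ends;
  }
  GroupGuardSketch(const GroupGuardSketch&) = delete;
  GroupGuardSketch& operator=(const GroupGuardSketch&) = delete;
};

void BroadcastSketch(const std::vector<int>& buffer_sizes) {
  // Checks that may throw run before the guard exists, so an exception
  // never has to pass through the group-end call during unwinding.
  for (int size : buffer_sizes) {
    if (size <= 0) throw std::invalid_argument("empty buffer");
  }
  // The guard covers only the broadcast calls themselves.
  GroupGuardSketch guard;
  for (int size : buffer_sizes) {
    (void)size;      // a real implementation would call ncclBcast here
    ++g_broadcasts;
  }
}
```

Keeping the guard this tight means a failed precondition surfaces as an ordinary exception before any group is opened.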
    • X
      Remove cuptiFinalize. · d2afd210
      Xin Pan committed
      In the CUPTI samples, only cuptiFlush is used.
      I can't find any place that calls cuptiFinalize, and
      this API can error out as not_implemented in some
      CUDA installations.
  5. 13 Jun, 2018 1 commit
  6. 12 Jun, 2018 1 commit
  7. 11 Jun, 2018 1 commit
  8. 08 Jun, 2018 2 commits
  9. 07 Jun, 2018 1 commit
    • M
      Mkldnn layout (#11040) · 3ff9ba0e
      mozga-intel committed
      * Add MKLDNN layout support in Paddle
      
      Add MKLDNN layout in Paddle so that an MKLDNN-friendly memory layout
      can be used in MKLDNN-enabled OP kernels. Before this commit, NCHW
      was hardcoded in all MKLDNN op kernels. As a result, a
      non-optimized execution path was selected in the MKLDNN primitive, which
      resulted in worse performance.
      Besides the framework change, three MKLDNN OP kernels were updated
      to use the new MKLDNN layout: conv, pool2d, and batch_norm.
      Other MKLDNN OP kernels also need to be updated in a similar way to
      achieve the best performance.
      
      * Add MKLDNN layout support in activation OP
      
      * Don't populate layout from input to output when kMKLDNN in
      
      * Refine pool mkldnn op kernel
      
      * MKLDNN layout
      
      * Remove the inheritance from tensor file
      
      * MKLDNN layout: refactoring
      
      * Remove additional #define to register new operator
      
      * Prepare mkldnn tests to work with layout
  10. 06 Jun, 2018 5 commits
  11. 01 Jun, 2018 4 commits
  12. 31 May, 2018 1 commit
  13. 30 May, 2018 2 commits
  14. 23 May, 2018 1 commit
  15. 22 May, 2018 1 commit
    • X
      multi-thread handlerequest · b4dd4c04
      Xin Pan committed
          Experiment on vgg flower with 2 trainers and 1 ps.
          More trainers could give more speedup.
      
          After:
          Pass = 0, Iters = 327, Speed = (7.52) img/s
          Before:
          Pass = 0, Iters = 385, Speed = (6.77) img/s
  16. 21 May, 2018 2 commits
  17. 17 May, 2018 1 commit
    • J
      - Draft of reuse of pooling mkldnn operator · 5f133305
      Jacek Czaja committed
      - Finished draft of pooling reusing of operators
      
      - Using gethash in PoolGrad added
      
      - Removed diagnostic
      
      - Added pool mkldnn grad reusing of primitives
      
      - Added diagnostic
      
      - Removed diagnostic
      
      - added dependency to mkldnn data type for pooling mkldnn
      
      - Added mkldnn memory data type determining based on template type of op
      
      - Compilation warning fix
      
      - coding style fixes
  18. 15 May, 2018 2 commits
    • X
      Fix a profiler race condition · 94c0a64d
      Xin Pan committed
      In multi-threaded conditions, EnableProfiler can
      be called after a RecordEvent is constructed. In this
      case, the RecordEvent constructor will not initialize anything,
      but the RecordEvent destructor will do something since EnableProfiler
      was called.
      This PR fixes it.
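One way to avoid this kind of race is to have each event remember whether the profiler was enabled when the event was constructed, and to let the destructor act only on that remembered state. The sketch below assumes that approach; it is not necessarily how the PR fixes the issue. `RecordEventSketch`, `EnableProfilerSketch`, and the global counters are hypothetical stand-ins, not Paddle's profiler API.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical global profiler state and bookkeeping counters.
std::atomic<bool> g_profiler_enabled{false};
std::atomic<int> g_pushed{0};
std::atomic<int> g_popped{0};

void EnableProfilerSketch() { g_profiler_enabled.store(true); }

class RecordEventSketch {
 public:
  explicit RecordEventSketch(std::string name)
      : name_(std::move(name)), active_(g_profiler_enabled.load()) {
    // Record the start only if the profiler was already enabled.
    if (active_) g_pushed.fetch_add(1);
  }

  ~RecordEventSketch() {
    // Use the state captured at construction, not the current global state,
    // so an event that never started is never finished.
    if (active_) {
      assert(g_popped.load() < g_pushed.load());  // a pop needs a prior push
      g_popped.fetch_add(1);
    }
  }

  const std::string& name() const { return name_; }

 private:
  std::string name_;
  bool active_;
};
```

An event created before the profiler is enabled then does nothing in either its constructor or its destructor, even if the profiler is turned on in between.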
    • Y
      Polish cmake · dc6ce071
      yuyang18 committed
  19. 14 May, 2018 2 commits
  20. 11 May, 2018 1 commit
  21. 09 May, 2018 1 commit
  22. 08 May, 2018 1 commit
  23. 07 May, 2018 1 commit