1. 10 2月, 2018 1 次提交
  2. 15 1月, 2018 2 次提交
  3. 14 1月, 2018 1 次提交
    • D
      "cudnn operators change to cudnn kernel" (#6660) · 5ad1aef0
      dzhwinter 提交于
      * "unified operators"
      
      * "add CUDNN register"
      
      * "add use cudnn attribute"
      
      * "add attribute"
      
      * "test conv tranpose op"
      
      * "remove duplicated attr"
      
      * "fix op test"
      
      * "add attribute to set cudnn"
      
      * "add more log"
      
      * "need layout op register support"
      
      * "add more log"
      
      * "change GetExpectedKernelType "
      
      * "fix Get attr in conv_op"
      
      * "fix CI"
      
      * "fix tests"
      
      * "removed kernel priority fallback"
      
      * "fix CI"
      
      * "fix stack pointer bug"
      
      * "refine buggy interface"
      
      * "add const cast to save life"
      
      * "fix get_output_with_grad"
      
      * "fix op test with dataformat"
      
      * ""fix pooling
      
      * "fix pooling test"
      
      * "fix CI"
      
      * "fix with_gpu error"
      
      * "add transform needed functional check"
      
      * "fix unpack list error"
      
      * "comment out parallel.do temporary"
      
      * "fix CI"
      
      * "fix compile doc error"
      
      * "make threshold larger"
      5ad1aef0
  4. 09 1月, 2018 1 次提交
    • Y
      Port WarpCTC Operator (#5107) · b5fda272
      Yiqun Liu 提交于
      * Add Seq2BatchFunctor, which will be used in WarpCTCOp.
      
      * Implement WrapCTCFunctor and WrapCTCKernel.
      
      * Add unittest of warpctc_op.
      
      * Modify the check_output inferface in python unittest framework to allow check a subset of outputs.
      
      * Use absolute offset lod in warpctc_op and related functors.
      
      * Refine the comments of warpctc_op.
      
      * The new python unittest supports checking a subset of the outputs, so revoke the previous change.
      
      * Rename the transform from LoDTensor to Tensor with shape [max_sequence_length, num_sequences, sequence_width] to PaddingSequenceFunctor.
      
      * Update to the newest codes.
      
      * Rename the PaddingSequenceFunctor to PaddingLoDTensorFunctor and remove the computation of dimensions out of the functos.
      b5fda272
  5. 26 12月, 2017 1 次提交
  6. 24 12月, 2017 1 次提交
    • D
      Feature/operator run place (#6783) · 735eba29
      dzhwinter 提交于
      * "change operator interface"
      
      * "move devicepool to device_context"
      
      * "fix operator test"
      
      * "fix op_registry Run interface"
      
      * "net op passed. Need to fix nccl multi-Context"
      
      * "add nccl group function"
      
      * "add nccl group function"
      
      * "fix gpu count exceed 32 error"
      
      * "fix recurrent op, nccl op"
      
      * "change the other operators interface with Place"
      
      * "fix typo"
      
      * "fix pybind"
      
      * "fix device in python side"
      
      * "fix pybind failed"
      
      * "add init for test"
      
      * "fix CI"
      735eba29
  7. 15 12月, 2017 1 次提交
  8. 07 12月, 2017 1 次提交
  9. 29 11月, 2017 1 次提交
  10. 24 11月, 2017 1 次提交
  11. 11 11月, 2017 2 次提交
  12. 26 10月, 2017 1 次提交
    • Q
      Cudnn batch norm op (#5067) · 56b723c4
      Qiao Longfei 提交于
      * init cudnn batch norm op
      
      * rename batch_norm_cudnn_op.cc batch_norm_op.cu
      
      * correct name style
      
      * add ExtractNCWHD, simplify code
      
      * fix ExtractNCWHD
      
      * use CUDNN_ENFORCE instead of PADDLE_ENFORCE
      56b723c4
  13. 24 10月, 2017 2 次提交
  14. 18 10月, 2017 1 次提交
    • M
      MatMul operator (#4856) · 16489827
      Markus Kliegl 提交于
      * initial matmul operator
      
      Similar to np.matmul, but also has transpose_X and transpose_Y flags,
      and only supports tensors from rank 1 to 3 inclusive.
      
      For GPU, uses cublas?gemmStridedBatched. For CPU, uses
      cblas_?gemm_batch if available via MKL; otherwise a simple serial
      implementation that loops over the batch dimension is employed for now.
      16489827
  15. 16 10月, 2017 1 次提交
  16. 15 10月, 2017 1 次提交
  17. 31 8月, 2017 1 次提交
  18. 10 8月, 2017 4 次提交
  19. 04 8月, 2017 1 次提交
  20. 15 7月, 2017 1 次提交
  21. 13 7月, 2017 1 次提交
  22. 12 7月, 2017 1 次提交
  23. 11 7月, 2017 2 次提交
  24. 04 7月, 2017 3 次提交