1. 19 Nov, 2019 1 commit
  2. 20 Mar, 2019 1 commit
  3. 08 Mar, 2019 1 commit
  4. 07 Mar, 2019 1 commit
  5. 20 Dec, 2018 1 commit
  6. 18 Dec, 2018 1 commit
  7. 17 Dec, 2018 1 commit
  8. 19 Nov, 2018 1 commit
    • Optimize the layer_norm operator with AVX intrinsic function (#14417) · f4c869d8
      Committed by Yihua Xu
      * Optimize layer_norm operator with AVX intrinsic functions
      
      * Revert the wrong modifications
      
      * Implement the jit kernel for layer_norm operator
      
      * Add math header file to fix the compile issue (test=develop)
      
      * Add math header file to fix the compile issue (test=develop)
      
      * Fix the intrinsic header file issue (test=develop)
      
      * Fix the conflicts (test=develop)
      
      * Revert for CUDA compiler (test=develop)
      
      * Fix the CUDA dependency (test=develop)
      
      * Fix the macro issues (test=develop)
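As a rough illustration of what "optimizing layer_norm with AVX intrinsics" can look like, here is a minimal sketch that normalizes a single row. It is not the kernel from this commit: the helper names `RowSum` and `LayerNormRow` and the default `epsilon` are assumptions made for the example, and the learned scale and bias of a full layer_norm are left out. Only the row sum is vectorized, and only when the compiler defines `__AVX__`; otherwise the scalar loop does all the work.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>
#ifdef __AVX__
#include <immintrin.h>
#endif

// Sum of n floats. Uses 8-wide AVX adds when AVX is enabled at compile
// time; the scalar loop handles the tail (or everything without AVX).
inline float RowSum(const float* x, std::size_t n) {
  float sum = 0.f;
  std::size_t i = 0;
#ifdef __AVX__
  __m256 acc = _mm256_setzero_ps();
  for (; i + 8 <= n; i += 8) {
    acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));
  }
  float lanes[8];
  _mm256_storeu_ps(lanes, acc);  // spill the 8 partial sums
  for (float v : lanes) sum += v;
#endif
  for (; i < n; ++i) sum += x[i];
  return sum;
}

// Hypothetical helper: normalizes one row in place to zero mean and
// (approximately) unit variance. Scale and bias are omitted.
inline void LayerNormRow(float* x, std::size_t n, float epsilon = 1e-5f) {
  assert(n > 0);
  const float mean = RowSum(x, n) / static_cast<float>(n);
  float sq = 0.f;
  for (std::size_t i = 0; i < n; ++i) {
    const float d = x[i] - mean;
    sq += d * d;
  }
  const float inv_std = 1.f / std::sqrt(sq / static_cast<float>(n) + epsilon);
  for (std::size_t i = 0; i < n; ++i) x[i] = (x[i] - mean) * inv_std;
}
```

Guarding the intrinsics behind `__AVX__` keeps the same source buildable on compilers or targets without AVX support, which is in the spirit of the header and CUDA-compiler fixes listed in the commit message.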
  9. 16 Nov, 2018 1 commit
    • Refine operator cmake (#14413) · a2d9b344
      Committed by Wu Yi
      * wip simplify operator framework
      
      * wip
      
      * wip
      
      * done test=develop
      
      * clean test=develop
      
      * fix test=develop
      
      * fix deps test=develop
      
      * fix cpu build test=develop
      
      * fix tensorrt build test=develop
      
      * fix tests test=develop
      
      * fix test=develop
      
      * fix cpu build test=develop
  10. 04 May, 2018 1 commit
  11. 25 Mar, 2018 2 commits
  12. 15 Feb, 2018 1 commit
    • Update tensor_util.h (#8422) · cfffb1a3
      Committed by Yi Wang
      * Update tensor_util.h
      
      * Update with moved TensorDesc
      
      * Fix tensor_util.cu
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Make tensor_util.cu a symbolic link
  13. 12 Feb, 2018 1 commit
  14. 10 Feb, 2018 2 commits
  15. 05 Feb, 2018 3 commits
  16. 03 Feb, 2018 1 commit
  17. 24 Jan, 2018 1 commit
  18. 22 Dec, 2017 1 commit
  19. 12 Dec, 2017 1 commit
    • Refine device context (#6433) · 61ec0b95
      Committed by QI JUN
      The main changes are:
      
      - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
      - remove `eigen_device` interface in base class `DeviceContext`
      - remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
      - remove unused `platform::EigenDeviceConverter`
      - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
      - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
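The first change in the list above, templating math functors and kernels on the device context type instead of a place type, can be sketched as follows. This is an illustrative sketch only: `CPUDeviceContext` here is a hypothetical empty stand-in, and `ScaleFunctor` and `ScaleKernel` are invented names, not classes taken from the commit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a CPU device context (not the real class).
struct CPUDeviceContext {};

// Math functor templated on the device context type rather than a place.
template <typename DeviceContext, typename T>
struct ScaleFunctor;

// CPU specialization: the context is passed in explicitly, so the functor
// can reach device resources without a separate place-to-device lookup.
template <typename T>
struct ScaleFunctor<CPUDeviceContext, T> {
  void operator()(const CPUDeviceContext& ctx, T factor,
                  std::vector<T>* data) const {
    (void)ctx;  // unused in this toy example
    assert(data != nullptr);
    for (T& v : *data) v *= factor;
  }
};

// The kernel takes the same template parameter and simply forwards the
// context to the functor it uses.
template <typename DeviceContext, typename T>
struct ScaleKernel {
  void Compute(const DeviceContext& ctx, T factor,
               std::vector<T>* data) const {
    ScaleFunctor<DeviceContext, T>()(ctx, factor, data);
  }
};
```

With this shape, a kernel instantiated for one context type can only select functors specialized for that same context, so a mismatch shows up as a compile error rather than at run time.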
  20. 25 Oct, 2017 1 commit
    • CPU Batch Norm Op (#4964) · ee998a9c
      Committed by Qiao Longfei
      * init batch norm op
      
      * prepare input output
      
      * compute mean_out var_out save_mean save_var on CPU
      
      * active is test
      
      * use eigen to do computation
      
      * complete batch norm forward
      
      * set default momentum to 0.9
      
      * add batch norm grad op in CPU
      
      * add tensor_format and NHWC support, add python test
      
      * add test training
      
      * add batch norm gradient test
      
      * improve comment, fix forward Python UnitTest
      
      * add gradient test
      
      * fix eigen warning
      
      * follow name style
      
      * fix a bug
      
      * change float to T
      
      * add simple forward test
      
      * test with different place
      
      * add backward test
      
      * refine python test
      
      * remove old python test code
      
      * code clean
      
      * follow code style
      
      * update comment
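Two items from the commit message above, NHWC support and the default momentum of 0.9, can be illustrated with a small sketch. It assumes the common convention `running = running * momentum + batch * (1 - momentum)` for the moving average; the function names `ChannelMeanNHWC` and `UpdateRunningMean` are hypothetical, and the operator in the commit uses Eigen and also tracks variance, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-channel mean of an NHWC tensor stored as a flat vector. In NHWC the
// channel index varies fastest, so element i belongs to channel i % C.
inline std::vector<float> ChannelMeanNHWC(const std::vector<float>& x,
                                          std::size_t channels) {
  assert(channels > 0 && !x.empty() && x.size() % channels == 0);
  std::vector<float> mean(channels, 0.f);
  const std::size_t per_channel = x.size() / channels;
  for (std::size_t i = 0; i < x.size(); ++i) mean[i % channels] += x[i];
  for (float& m : mean) m /= static_cast<float>(per_channel);
  return mean;
}

// Moving-average update of the running mean (assumed convention, see the
// lead-in). The default momentum matches the 0.9 named in the commit.
inline void UpdateRunningMean(const std::vector<float>& batch_mean,
                              std::vector<float>* running,
                              float momentum = 0.9f) {
  assert(running != nullptr && running->size() == batch_mean.size());
  for (std::size_t c = 0; c < batch_mean.size(); ++c) {
    (*running)[c] =
        (*running)[c] * momentum + batch_mean[c] * (1.f - momentum);
  }
}
```

For example, a two-channel input `{1, 10, 3, 20}` has channel means 2 and 15; starting from a running mean of zero, one update moves it to roughly 0.2 and 1.5.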
  21. 10 Oct, 2017 1 commit
  22. 28 Sep, 2017 1 commit
  23. 20 Sep, 2017 1 commit
  24. 23 Aug, 2017 1 commit
  25. 11 Aug, 2017 1 commit
  26. 08 Aug, 2017 1 commit
  27. 07 Aug, 2017 1 commit
  28. 05 Aug, 2017 1 commit
  29. 02 Aug, 2017 1 commit
  30. 01 Aug, 2017 1 commit
  31. 26 Jul, 2017 1 commit
  32. 25 Jul, 2017 1 commit
  33. 19 Jul, 2017 2 commits
  34. 17 Jul, 2017 2 commits
    • set correct place for output tensor · 2a03e380
      Committed by qijun
    • Op variant inputs (#2901) · a0caf234
      Committed by Yan Chunwei
      * add inputs
      
      * add ut for multiple inputs
      
      * fix AddToLayer
      
      * op_desc -> op_proto
      
      * CreateArgumentOffsetMap -> CreateInOutOffsetMap
      
      * move CreateInOutOffsetMap from OperatorBase to op registry
      
      * arg_idxs_ -> in_out_idxs_