1. 24 6月, 2021 2 次提交
  2. 23 6月, 2021 1 次提交
    • J
      Added split op bf16/fp32 oneDNN kernel (#33584) · 68106509
      jakpiase 提交于
      * base changes for split op
      
      * 90% of split functionality added
      
      * full fp32 functionality
      
      * added bf16 test
      
      * added submemory caching
      
      * added bf test to static mode whitelist
      
      * minor change
      
      * enabled split op for inference
      
      * minor fix
      
      * minor fix
      68106509
  3. 21 6月, 2021 1 次提交
  4. 16 6月, 2021 1 次提交
  5. 11 6月, 2021 1 次提交
  6. 10 6月, 2021 2 次提交
  7. 09 6月, 2021 2 次提交
  8. 02 6月, 2021 4 次提交
  9. 01 6月, 2021 2 次提交
  10. 27 5月, 2021 2 次提交
  11. 26 5月, 2021 2 次提交
  12. 25 5月, 2021 1 次提交
    • C
      modify complex template for elementwise ops (#33071) · dbc08d69
      chentianyu03 提交于
      * modify complex template for elementwise ops
      
      * modify mul, div grad struct
      
      * add complex template for CudaShuffleDownSync CudaShuffleXorSync funcs and fix the bug when delete cuda<9000
      
      * fix shuffle func args bug
      
      * fix shuffle func args bug
      
      * fix shuffle func args bug
      dbc08d69
  13. 20 5月, 2021 2 次提交
    • C
      Add complex template type (#32857) · 738bf20e
      chentianyu03 提交于
      * add complex template file
      
      * add numtraits for complex template
      
      * add complex template type register
      
      * modify specify template of complex
      
      * modify specify template of complex
      
      * modify specify template of complex
      
      * modify specify template of complex
      
      * make TensorCheckerVisitor support complex type
      
      * fix operator= error
      
      * add complex template
      
      * add complex template type
      
      * add complex template type to pyarray transform
      
      * add complex template type to pyarray transform
      
      * remove complex type for dlpack register
      
      * set dlpack supprot complex type
      
      * set dlpack supprot complex type
      
      * set dlpack supprot complex type
      
      * remove explict for complex constructor
      
      * add complex unit test file
      738bf20e
    • L
      14949521
  14. 19 5月, 2021 1 次提交
  15. 12 5月, 2021 1 次提交
  16. 07 5月, 2021 1 次提交
  17. 06 5月, 2021 2 次提交
  18. 30 4月, 2021 2 次提交
  19. 29 4月, 2021 1 次提交
  20. 28 4月, 2021 1 次提交
  21. 27 4月, 2021 1 次提交
  22. 25 4月, 2021 1 次提交
  23. 21 4月, 2021 5 次提交
    • A
      bf0ec9b8
    • Z
      【NPU】Merge NPU ccl code (#32381) · c3158527
      zhang wenhui 提交于
      * add allreduce and broadcast without test (#31024)
      
      add allreduce and broadcast without test
      
      * Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      * [NPU] add npu kernel for communication op (#31437)
      
      * add allreduce and broadcast without test
      
      * add c_broadcast_test case
      
      * build c_comm_init and c_create_group operators
      
      * make the whole thing compile
      
      * add broadcast and init op test case but run failed
      
      * make unit test compile
      
      * fix broadcast test bug and change into hcom for ccl
      
      * change c_comm_init and c_create_group ops accordingly
      
      * make tests compile
      
      * transfer code to 27
      
      * compiled successfully in 28, but run failed
      
      * test broadcast in 28, but failed
      
      * make hcom primitives work
      
      * change hccl data type for base.h
      
      * fix broadcast bug
      
      * make attributes work
      
      * fix group name bug
      
      * add allreduce but test failed
      
      * allreduce bug for qiuliang
      
      * allreduce finished
      
      * add allgather and reducescatter
      
      * merge all op code
      
      * add allgather test
      
      * finish run all ccl op test exclude send/recv
      
      * all all op and test exclude send/recv
      
      * send_v2_npu.cc recv_v2_npiu.cc compiled
      
      * fix ccl core dump bug and test allgather, reducescatter, broadcast op
      
      * fix allreduce bug just for test
      
      * hcom send&recv test pass, without hcom_destroy
      
      * for qiuliang test
      
      * Ascend Send&Recv Test Pass
      
      * all op (ex send/recv) ok
      
      * fix bug
      
      * merge all ccl op
      
      * style merge to PaddlePaddle
      
      * merge style
      
      * new merge style
      
      * merge style 2
      
      * insert an empty at the end
      
      * disable ctest for hcom to pass ci
      Co-authored-by: Nvoid-main <voidmain1313113@gmail.com>
      Co-authored-by: Nf2hkop <f2huestc@outlook.com>
      
      * Add auto-increasing tag id for Hcom OPs (#31702)
      
      * add c_reduce_sum op (#31793)
      
      add c_reduce_sum op
      
      * update Ascendrc hccl to 20.3 (#32126)
      
      update Ascendrc hccl to 20.3 (#32126)
      
      * fix merge code
      
      * change cmake.txt1
      
      * [NPU] Support npu kernel for c sync stream op (#31386)
      
      * sync stream npu op
      
      * add with_ascend_acl
      
      * update c++ unittest
      
      * compile all failed
      
      * try to pre commit
      
      * after pre commit
      
      * merge&compile&test hccl successfully!
      
      * fix code style
      
      * fix code style
      
      * fix bugs about hccl
      
      * fix some bugs
      
      * fix code style
      
      * fix style
      
      * fix style
      
      * fix
      
      * fixed
      
      * merge develop
      Co-authored-by: Nlw921014 <liuwei921014@yeah.net>
      Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
      Co-authored-by: Nf2hkop <f2huestc@outlook.com>
      Co-authored-by: Nxiayanming <41795079@qq.com>
      c3158527
    • L
      [NPU] register npu finalize on exit (#32390) · 8e4c1936
      Leo Chen 提交于
      * [NPU] register finalize on exit
      
      * fix
      8e4c1936
    • flush denormal in the tracer op, test=develop (#32350) · 9ff85561
      石晓伟 提交于
      * flush denormal in the tracer op, test=develop
      
      * add cmake dependencies, test=develop
      
      * add a macro, test=develop
      
      * fix the windows case, test=develop
      9ff85561
    • J
      Added oneDNN reduce_op GRAD kernel (#32280) · ead83422
      jakpiase 提交于
      ead83422
  24. 19 4月, 2021 1 次提交
    • L
      [NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8
      Leo Chen 提交于
      * [NPU] support GarbageCollector for npu (#31874)
      
      * support GarbageCollector for npu
      
      * fix typo
      
      * fix gather_grad
      
      * disable NPUDefaultStreamGarbageCollector on NPU
      
      * [NPU] support npu for memcpy op (#31808)
      
      * support npu for memcpy op
      
      * add ut
      
      * fix ut
      
      * fix typo
      
      * 【NPU】fix bug of using temp vector (#31963)
      
      * fix bug when beta1_pow on cpu (#31995)
      
      * [NPU] support npu profiler (#31684)
      
      * support npu profiler
      
      * add python api
      
      * fix bugs
      
      * add wrapper for incomplete type
      
      * update profile proto
      
      * record npu wait
      
      * add xpu placeholder
      
      * fix adam (#32016)
      
      * [NPU] enable async copy and  add wait before sync operation (#31956)
      
      * enable async copy and  add wait before sync operation
      
      * remove unneccessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * make TensorFromVector/TensorToVector sync
      
      * [NPU] Support dataloader on npu place. (#31867)
      
      * [NPU] Wait on NPUPlace (#32086)
      
      * [NPU] fix cast op (#32121)
      
      * fix npu kernel of cast op to handle casting to same dtype
      
      * add comments
      
      * [NPU] support cann 20.3 (#32044)
      
      * fix compile problem on cann 20.3
      
      * fix ut
      
      * fix test_mul
      
      * fix check_finite_and_scale
      
      * fix lookup_table_v2_grad
      
      * fix cmake
      
      * support print op
      
      * [NPU] Support npu save load (#31893)
      
      * support save load for NPU
      
      * add save load npu unittest
      
      * support np.array transform in NPU
      
      * fix errors
      
      * delete dygraph in unittest
      
      * add Wait
      
      * fix unittest
      
      * fix review comment
      
      * fix unittest problem
      
      * fix little problem
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace
      
      * refine code
      
      * fix NPUDeviceContext in all c++ unittest (#32198)
      
      * fix NPUDeviceContext in all c++ unittest
      
      * refine log
      Co-authored-by: Npangyoki <pangyoki@126.com>
      
      * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
      
      * enable async copy and  add wait before sync operation
      
      * remove unneccessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * change TensorFromVector to FillNpuTensorWithConstant
      
      * fix ignored api
      
      * delete extra unittest
      
      * fix little error
      
      * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
      
      * change TensorCopySync to TensorCopy
      
      * delete useless Wait and add StreamWait
      
      * fix npu_stream error
      
      * fix check_finite_and_unscale_op_npu TensorCopy
      
      * only save stream wait
      
      * fix NPUDeviceContext in all c++ unittest
      
      * delete wait
      Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>
      
      * delete useless unittest file (#32206)
      
      * Fix op test (#32231)
      
      * fix conditional block (#32243)
      
      * fix adam bug again (#32246)
      
      * fix compile
      
      * fix ut
      
      * fix ut
      Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
      Co-authored-by: Npangyoki <pangyoki@126.com>
      cbe5c9f8