1. 21 Apr, 2021 1 commit
    • 【NPU】Merge NPU ccl code (#32381) · c3158527
      Committed by zhang wenhui
      * add allreduce and broadcast without test (#31024)
      
      add allreduce and broadcast without test
      
      * Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      * [NPU] add npu kernel for communication op (#31437)
      
      * add allreduce and broadcast without test
      
      * add c_broadcast_test case
      
      * build c_comm_init and c_create_group operators
      
      * make the whole thing compile
      
      * add broadcast and init op test case but run failed
      
      * make unit test compile
      
      * fix broadcast test bug and change into hcom for ccl
      
      * change c_comm_init and c_create_group ops accordingly
      
      * make tests compile
      
      * transfer code to 27
      
      * compiled successfully in 28, but run failed
      
      * test broadcast in 28, but failed
      
      * make hcom primitives work
      
      * change hccl data type for base.h
      
      * fix broadcast bug
      
      * make attributes work
      
      * fix group name bug
      
      * add allreduce but test failed
      
      * allreduce bug for qiuliang
      
      * allreduce finished
      
      * add allgather and reducescatter
      
      * merge all op code
      
      * add allgather test
      
      * finish run all ccl op test exclude send/recv
      
      * add all op and test exclude send/recv
      
      * send_v2_npu.cc recv_v2_npiu.cc compiled
      
      * fix ccl core dump bug and test allgather, reducescatter, broadcast op
      
      * fix allreduce bug just for test
      
      * hcom send&recv test pass, without hcom_destroy
      
      * for qiuliang test
      
      * Ascend Send&Recv Test Pass
      
      * all op (ex send/recv) ok
      
      * fix bug
      
      * merge all ccl op
      
      * style merge to PaddlePaddle
      
      * merge style
      
      * new merge style
      
      * merge style 2
      
      * insert an empty at the end
      
      * disable ctest for hcom to pass ci
      Co-authored-by: void-main <voidmain1313113@gmail.com>
      Co-authored-by: f2hkop <f2huestc@outlook.com>
      
      * Add auto-increasing tag id for Hcom OPs (#31702)
      
      * add c_reduce_sum op (#31793)
      
      add c_reduce_sum op
      
      * update Ascendrc hccl to 20.3 (#32126)
      
      update Ascendrc hccl to 20.3 (#32126)
      
      * fix merge code
      
      * change cmake.txt1
      
      * [NPU] Support npu kernel for c sync stream op (#31386)
      
      * sync stream npu op
      
      * add with_ascend_acl
      
      * update c++ unittest
      
      * compile all failed
      
      * try to pre commit
      
      * after pre commit
      
      * merge&compile&test hccl successfully!
      
      * fix code style
      
      * fix code style
      
      * fix bugs about hccl
      
      * fix some bugs
      
      * fix code style
      
      * fix style
      
      * fix style
      
      * fix
      
      * fixed
      
      * merge develop
      Co-authored-by: lw921014 <liuwei921014@yeah.net>
      Co-authored-by: Void Main <voidmain1313113@gmail.com>
      Co-authored-by: f2hkop <f2huestc@outlook.com>
      Co-authored-by: xiayanming <41795079@qq.com>
  2. 07 Apr, 2021 1 commit
    • 【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3
      Committed by zhang wenhui
      * Ascend rc (#30483)
      
      * Fix compilation on CANN20.1 and older (#30494)
      
      Fix compilation on CANN20.1 and older
      
      * Add distribution supported (#30578)
      
      Add distribution supported
      
      * Build parser for Hcom* operators (#30627)
      
      Build parser for Hcom* operators
      
      * Pass device_ids info from launch to trainer. (#30632)
      
      Pass device_ids info from launch to trainer
      
      * Add Hccl program group (#30642)
      
      Add Hccl program group
      
      * Add startup bash files of test_ascend_group. (#30645)
      
      Add startup bash files of test_ascend_group
      
      * cleanup (#30646)
      
      cleanup test_ascend_group.py
      
      * [Feature] Build parser to support distributed training (#30658)
      
      [Feature] Build parser to support distributed training
      
      * fix compilation on ascend-20.1 (#30722)
      
      fix compilation on ascend-20.1
      
      * Dev/fix ascend string (#30749)
      
      Dev/fix ascend string
      
      * code style (#30781)
      
      code style
      
      * Merge ascend_optimizer and ascend_parser. (#30776)
      
      Merge ascend_optimizer and ascend_parser.
      
      * Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)
      
      Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug
      
      * Add paddle ascend distribution training supported (#30796)
      
      Add paddle ascend distribution training supported
      
      * pass cxx_flags to gloo cmake (#30857)
      
      * Destroy session first. (#30954)
      
      Destroy session first.
      
      * merge
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style, test=develop
      
      * fix, test=develop
      
      * fix
      
      * fix log fatal, test=develop
      
      * fix enforce style, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix rccl, test=develop
      
      * fix test, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix node_num, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      Co-authored-by: hutuxian <hutuxian2011@sina.cn>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: Void Main <voidmain1313113@gmail.com>
      Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
      Co-authored-by: dingsiyu <18369187719@163.com>
      Co-authored-by: OleNet <olenet@126.com>
  3. 01 Apr, 2021 1 commit
  4. 26 Mar, 2021 1 commit
  5. 17 Jan, 2021 1 commit
  6. 15 Jan, 2021 1 commit
    • Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103) · 13d75736
      Committed by pangyoki
      * add view strategy on squeeze,unsqueeze,reshape,flatten
      
      * add squeeze unittest
      
      * add unittests
      
      * use View strategy as name rather than Reuse Allocation
      
      * fix view api doc
      
      * fix format
      
      * use core.ops when input of reshape2 is Tensor
      
      * fix test_cross_entropy_loss error because of reshape2
      
      * fix test_cross_entropy_loss error because of reshape2
      
      * add inplace strategy
      
      * add elementwise_add sub
      
      * let backward op not use inplace
      
      * grad op do not use inplace
      
      * fix memory increase error and add leaf error message
      
      * delete selected_rows
      
      * change op_function
      
      * little change
      
      * solve HandleViewBetweenInputAndOutput
      
      * add unittest and leaf error message
      
      * merge view error
      
      * optimize op_function_generator format and support sum inplace op
      
      * fix format of basic_engine
      
      * fix format for framework
      
      * little change of variable wrapper
      
      * add reshape, squeeze, unsqueeze, scatter api
      
      * add relu elu tanh softmax inplace api
      
      * fix test_squeeze_op unittest
      
      * fix test_relu_op unittest
      
      * fix comment problems
      
      * delete sample code of inplace api
      
      * add reference of grad_pending_nodes in basic_engine
      
      * fix unittest name
      
      * add inplace apis into wlist
      
      * fix error message
      
      * add PADDLE_ENFORCE for set grad op twice
      
      * fix head file error
  7. 09 Jan, 2021 1 commit
    • add View(reuse allocation) strategy on squeeze, unsqueeze, reshape, flatten op (#29913) · da16b33f
      Committed by pangyoki
      * add view strategy on squeeze,unsqueeze,reshape,flatten
      
      * add squeeze unittest
      
      * add unittests
      
      * use View strategy as name rather than Reuse Allocation
      
      * fix view api doc
      
      * fix format
      
      * use core.ops when input of reshape2 is Tensor
      
      * fix test_cross_entropy_loss error because of reshape2
      
      * delete selected_rows
      
      * change op_function
      
      * little change
      
      * solve HandleViewBetweenInputAndOutput
  8. 08 Jan, 2021 1 commit
    • Fix dtype of ungenerated grad var (#28511) · 8696335f
      Committed by Leo Chen
      * fix dtype of ungenerated grad var
      
      * update ut
      
      * refine code
      
      * set default dtype
      
      * fix could_use_cudnn bug
      
      * remove debug code
      
      * re-implement
      
      * fix bug
  9. 07 Jan, 2021 1 commit
  10. 06 Jan, 2021 1 commit
  11. 02 Dec, 2020 1 commit
    • Add pure fp16 training with master weights. (#27712) · be3777a5
      Committed by Zhen Wang
      * add the weight decay func for the momentum op
      
      * Add the multi_precision function in Momentum Optimizer.
      
      * Make sure that the initial value of master weights are same with the fp16 weights.
      
      * add static loss scaling.
      
      * add the rescale_grad function in the pure fp16 training.
      
      * use the original momentum updating method.
      
      * Polish some codes, such as variable names.
      
      * add docstring for apis.
      
      * update the var creation details of _create_master_weight.
      
      * not modify codes about imperative momentum updating.
      
      * Fix the error of test_dist_sparse_tensor_load_momentum UT.
      
      * add unit test for multi precision fp16 training.
      
      * add more unit tests for CI.
      
      * Use lower threshold values for allclose comparing in test_multi_precision_fp16_train UT.
      
      * For CI Coverage Checking.
  12. 18 Nov, 2020 1 commit
  13. 02 Nov, 2020 1 commit
  14. 29 Oct, 2020 2 commits
  15. 28 Oct, 2020 1 commit
  16. 14 Oct, 2020 1 commit
  17. 12 Oct, 2020 1 commit
  18. 27 Sep, 2020 1 commit
    • add support to float64 input of warpctc op. (#27399) · 1501a80f
      Committed by Li Fuchen
      * add float64 input to ctc_loss
      
      * modified error message of  warpctc
      
      * update repo and tag of warpctc
      
      * add test for warpctc with float64 input
      
      * modified warpctc.cmake to make sure build always
      
      * resolved sample code bug of warpctc
      
      * add core.ops in warpctc dygraph
      
      * fix a bug of test
  19. 21 Sep, 2020 1 commit
    • Quant op dev (#25932) · 02606d45
      Committed by huangxu96
      * Finished ChannelWiseQuantDequantAbsMaxOp and Passed unittests.
      
      * Finished channel-wise quantize strategy in imperative quantization.
      
      * Added Cuda code of ChannelWiseQuantDequantMaxAbsOP
      Add Cuda code of ChannelWiseQuantDequantMaxAbsOp
      
      * Add quant_axis for channel_wise quant.
      
      * fixed a bug in unittests, which will not trigger axis = 1 case and cannot meet the coverage rate requirement.
      
      * Added some assert information and fixed some coding style mistakes.
  20. 14 Sep, 2020 1 commit
    • Update amp_check_finite_and_scale_op and add an updating_loss_scaling op for... · d708b210
      Committed by Zhen Wang
      Update amp_check_finite_and_scale_op and add an updating_loss_scaling op for static graph amp training. (#26240)
      
      * update amp_check_finite_and_scale_op for static_amp.
      
      * use amp_check_finite_and_scale in static graph amp.
      
      * update grads to zero when grads own infinite values(as for amp_checkout_finite_and_scale op).
      
      * add update_loss_scaling op in cpp.
      
      * add update_loss_scaling_op unit test.
      
      * update the doc of the check_finite_and_unscale op
      
      * Update the process of gradients updating skipping if the gradients have infinite values.
      
      * update the way to zero grads.
      
      * update test_update_loss_scaling_op.py
      
      * add log info when find infinite grads.
      
      * add the unit test for UpdateLossScaling Layer.
  21. 08 Sep, 2020 1 commit
  22. 27 Aug, 2020 1 commit
  23. 25 Aug, 2020 1 commit
  24. 24 Aug, 2020 1 commit
    • api2.0 paddle.nn.Bilinear and paddle.nn.functional.bilinear (#26399) · 422a1620
      Committed by wanghuancoder
      * api2.0 paddle.nn.Bilinear and paddle.nn.functional.bilinear, test=develop
      
      * api2.0 fix code examples, test=develop
      
      * modify test_bilinear_api, about place,to_tensor , test=develop
      
      * re pass pre-commit, test=develop
      
      * Update common.py
      
      * fix BilinearTensorProduct ci error, test=develop
  25. 23 Aug, 2020 1 commit
  26. 19 Aug, 2020 1 commit
  27. 18 Aug, 2020 1 commit
  28. 17 Aug, 2020 1 commit
  29. 12 Aug, 2020 1 commit
  30. 09 Jul, 2020 1 commit
  31. 08 Jul, 2020 1 commit
    • fix instance norm in dy (#24717) · 52be62c5
      Committed by ceci3
      * fix bn & in in dy, test=develop
      
      * update instance_norm,test=develop
      
      * fix bugs,test=develop
      
      * add more case in unittest,test=develop
      
      * fix,test=develop
      
      * fix,test=develop
  32. 10 Jun, 2020 1 commit
  33. 04 Jun, 2020 1 commit
  34. 01 Jun, 2020 1 commit
  35. 18 May, 2020 1 commit
  36. 22 Apr, 2020 1 commit
  37. 21 Apr, 2020 1 commit
  38. 31 Mar, 2020 1 commit
    • Feature/expand params in auto-generated pybind functions for dygraph operators (#23181) · 488b2387
      Committed by Leo Chen
      * expand parameters, test=develop
      
      * support resnet, test=develop
      
      * fix resnet, test=develop
      
      * support duplicable out, test=develop
      
      * support ptb
      
      * fix bugs, test=develop
      
      * support null input, test=develop
      
      * fix bugs, test=develop
      
      * fix batchNorm is_test, test=develop
      
      * refine code, test=develop
      
      * follow comments, test=develop
      
      * follow comments, test=develop
      
      * follow comments, test=develop
      
      * follow comments, test=develop
  39. 19 Jan, 2020 1 commit