1. 14 Apr, 2021 1 commit
    • Z
      fix matrix_inverse_op with rocm (#32128) · 995b5f2c
      zhulei authored
      * fix matrix_inverse_op with rocm
      
      * fix matrix_inverse_op with rocm
      
      * fix matrix_inverse_op with rocm
      
      * fix matrix_inverse_op with rocm
      995b5f2c
  2. 13 Apr, 2021 2 commits
  3. 12 Apr, 2021 3 commits
    • R
      [ROCM] fix some unittests (#32129) · bd2a4e23
      ronnywang authored
      * [ROCM] fix test_gru_rnn_op
      
      * [ROCM] fix test_expand_op
      
      * [ROCM] fix test_cross_entropy_loss
      
      * [ROCM] fix test_conv_nn_grad
      
      * [ROCM] fix test_bilinear_tensor_product_op
      
      * [ROCM] fix elementwise_op_function
      
      * [ROCM] fix test_lstm_cudnn_op
      
      * [ROCM] fix test_gpu_package_without_gpu_device
      
      * [ROCM] fix test_gru_unit_op
      
      * [ROCM] fix test_imperative_optimizer
      
      * [ROCM] fix rnn
      
      * [ROCM] fix group_norm_op
      
      * [ROCM] fix test_pool3d_api
      
      * [ROCM] fix test_pool3d_op
      bd2a4e23
    • L
      d8afe407
    • T
      fix concat_grad on kunlun (#32151) · a2387ef2
      TTerror authored
      * fix concat_grad on kunlun
      
      * fix concat_grad on kunlun
      a2387ef2
  4. 10 Apr, 2021 1 commit
  5. 09 Apr, 2021 3 commits
    • N
      make high precision for avg_pool and adaptive_avg_pool when data_type is float16 (#31887) · ec2ffb68
      niuliling123 authored
      * make high precision for avg_pool
      ec2ffb68
    • L
      [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen authored
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, bug failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinilize
      
      * add system allocatot test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code stype
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
      ccf5709d
    • Y
  6. 08 Apr, 2021 2 commits
  7. 07 Apr, 2021 4 commits
    • D
      add uint8 type for flatten op (#32120) · 297290a8
      danleifeng authored
      * add uint8 type for flatten;test=develop
      297290a8
    • Z
      【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3
      zhang wenhui authored
      * Ascend rc (#30483)
      
      * Fix compilcation on CANN20.1 and older (#30494)
      
      Fix compilcation on CANN20.1 and older
      
      * Add distribution supported (#30578)
      
      Add distribution supported
      
      * Build praser for Hcom* operators (#30627)
      
      Build praser for Hcom* operators
      
      * Pass device_ids info from launch to trainer. (#30632)
      
      Pass device_ids info from launch to trainer
      
      * Add Hccl program group (#30642)
      
      Add Hccl program group
      
      * Add startup bash files of test_ascend_group. (#30645)
      
      Add startup bash files of test_ascend_group
      
      * cleanup (#30646)
      
      cleanup test_ascend_group.py
      
      * [Feature] Build parser to support distributed training (#30658)
      
      [Feature] Build parser to support distributed training
      
      * fix compilation on ascend-20.1 (#30722)
      
      fix compilation on ascend-20.1
      
      * Dev/fix ascend string (#30749)
      
      Dev/fix ascend string
      
      * code style (#30781)
      
      code style
      
      * Merge ascend_optimizer and ascend_parser. (#30776)
      
      Merge ascend_optimizer and ascend_parser.
      
      * Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)
      
      Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug
      
      * Add paddle ascend distribution training supported (#30796)
      
      Add paddle ascend distribution training supported
      
      * pass cxx_flags to gloo cmake (#30857)
      
      * Destroy session first. (#30954)
      
      Destroy session first.
      
      * merge
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style, test=develop
      
      * fix, test=develop
      
      * fix
      
      * fix log fatal, test=develop
      
      * fix enforce style, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix rccl, test=develop
      
      * fix test, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix node_num, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      Co-authored-by: hutuxian <hutuxian2011@sina.cn>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: Void Main <voidmain1313113@gmail.com>
      Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
      Co-authored-by: dingsiyu <18369187719@163.com>
      Co-authored-by: OleNet <olenet@126.com>
      8c7c53b3
    • O
      improve performance of DepthwiseConv(NHWC) (#31677) · 363b25aa
      Ouyang Chao authored
      * improve performance of DepthwiseConv(NWHC)
      363b25aa
    • T
      Struct SparseValue && Bug Fix (#31721) · a881b4d5
      tangwei12 authored
      * add PullSparseValue for pull sparse
      
      * fix bug for PullSparseValue
      
      * add test mode in lookuptable
      
      * revert API change
      
      * add comment for is_training
      a881b4d5
  8. 06 Apr, 2021 3 commits
  9. 03 Apr, 2021 1 commit
  10. 02 Apr, 2021 2 commits
  11. 01 Apr, 2021 6 commits
    • Q
      a4b30a12
    • H
      remove useless code (#32001) · 9c5d0286
      hutuxian authored
      9c5d0286
    • Z
      [Paddle-TRT] add anchor generator op plugin (#31730) · b807e408
      zlsh80826 authored
      * add anchor generator op plugin
      
      * add anchor generator unit_test
      
      * remove dbg info
      
      * remove redundant line
      
      * replace assertion with paddle enforce
      
      * dynamic plugin replaces assertion with paddle enforce
      
      * anchor generator support dynamic shape on spatial axis
      
      * anchor generator test with fp16, dynamic shape
      
      * add anchor generator test all
      
      * add back main
      
      * reduce test input size to not exceed the timelimit of ci
      
      * change super to InferencePassTest for python2 compatibility
      
      * reuse paddle operator anchor generator
      
      * move creator construct to header with default
      
      * add cuda ifdef
      
      * reduce line
      
      * change super to InferencePassTest for python2 compatibility
      
      * fix anchor generator fp16 serialize setting
      
      * split unittest from test_all
      
      * restrict anchor generator input format before version 7234
      
      * anchor generator only support greater than trt7.1
      
      * change min_graph_size to 2
      
      * min_graph size to 3 if dynamic shape
      
      * reduce dynamic shape size to avoid trt search tactic too long to exceed time limit
      
      * remove anchor from fetch list
      
      * anchor generator support all trt version
      
      * fix memory not allocated but if serialized
      b807e408
    • Z
      Optimize the perf of SameDimsAdd CUDA Kernel (#31872) · 4acc87be
      Zhang Zheng authored
      4acc87be
    • Z
      Support uint8_t for fill_constant_op (#31911) · 980227f9
      Zhang Zheng authored
      980227f9
    • K
      new group (#31682) · 07741593
      kuizhiqing authored
      * new group
      
      * ci compatible fix
      
      * assert nccl
      07741593
  12. 31 Mar, 2021 7 commits
    • K
      fix one error massage (#31904) · 6f85e241
      Kqnonrime authored
      * fix one error massage
      
      * fix a error message
      
      * new fix three error messages
      
      * new fix three error messages
      
      * new fix some error
      
      * new fix one error message
      6f85e241
    • T
      delete cuda9 code (#31883) · ea738dda
      tianshuo78520a authored
      ea738dda
    • W
      Update eigen version to f612df27 (#31832) · 495e7f9c
      wuhuanzhou authored
      * update eigen version to f612df27, test=develop
      
      * fix compilation error, test=develop
      
      * remove patch command in eigen, test=develop
      
      * fix compilation error caused by call Eigen function with float16 and bfloat16, test=develop
      
      * fix unittest error, test=develop
      
      * fix unittest error caused by precision, test=develop
      
      * remove patch files used by old version eigen, test=develop
      495e7f9c
    • W
      update compilation with C++14 (#31815) · 587d99ae
      wuhuanzhou authored
      * update compilation with C++14, test=develop
      
      * fix compilation error in eigen, test=develop
      587d99ae
    • T
      fix split core (#31892) · 393b3bd6
      Thunderbrook authored
      * fix split core
      
      * format
      393b3bd6
    • T
      fix some bug in transformer training in xpu (#31918) · 52b05bac
      taixiurong authored
      52b05bac
    • F
      [ROCM] Add ROCm support for warpctc op (#31817) · ef8323d4
      furnace authored
      * bugfix for warpctc
      
      * fix warpctc commit id
      
      * fix warpctc commit id
      
      * fix warpctc commit id
      
      * fix warpctc commit id
      
      * fix warpctc commit id
      
      * fix WARPCTC_WITH_HIP invalid
      
      * Add logs to find out why can not dlopen libwarpctc.so
      
      * fix warpctc commit id
      
      * fix unit test test_warpctc_op
      
      * Optime failed log for dlopen
      
      * Optime failed log for dlopen
      
      * Delete extra changes
      
      * fix warpctc commit id
      
      * fix warpctc commit id
      
      * Add is_compiled_with_rocm for test_warpctc_op
      
      * fix warpctc commit id
      
      * Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed
      
      * Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed
      
      * Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed
      
      * fix code style problems
      ef8323d4
  13. 30 Mar, 2021 2 commits
  14. 29 Mar, 2021 3 commits