1. 09 Apr, 2021 4 commits
    • L
      [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen committed
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, bug failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinilize
      
      * add system allocatot test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code stype
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
      ccf5709d
    • S
      fix unittest timeour (#32161) · a73cb679
      Shang Zhizhou committed
      a73cb679
    • A
      [Dy2Stat] Fix undefined var used in For (#32153) · 4636d136
      Aurelius84 committed
      * fix undefind var in For
      
      * fix code style
      4636d136
    • A
      [Dy2Stat] Support DictCmp and zip grammer (#32159) · 55730d95
      Aurelius84 committed
      * support DictCmp and zip grammar
      
      * fix code style
      55730d95
  2. 08 Apr, 2021 3 commits
  3. 07 Apr, 2021 4 commits
    • D
      add uint8 type for flatten op (#32120) · 297290a8
      danleifeng committed
      * add uint8 type for flatten;test=develop
      297290a8
    • Z
      【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3
      zhang wenhui committed
      * Ascend rc (#30483)
      
      * Fix compilcation on CANN20.1 and older (#30494)
      
      Fix compilcation on CANN20.1 and older
      
      * Add distribution supported (#30578)
      
      Add distribution supported
      
      * Build praser for Hcom* operators (#30627)
      
      Build praser for Hcom* operators
      
      * Pass device_ids info from launch to trainer. (#30632)
      
      Pass device_ids info from launch to trainer
      
      * Add Hccl program group (#30642)
      
      Add Hccl program group
      
      * Add startup bash files of test_ascend_group. (#30645)
      
      Add startup bash files of test_ascend_group
      
      * cleanup (#30646)
      
      cleanup test_ascend_group.py
      
      * [Feature] Build parser to support distributed training (#30658)
      
      [Feature] Build parser to support distributed training
      
      * fix compilation on ascend-20.1 (#30722)
      
      fix compilation on ascend-20.1
      
      * Dev/fix ascend string (#30749)
      
      Dev/fix ascend string
      
      * code style (#30781)
      
      code style
      
      * Merge ascend_optimizer and ascend_parser. (#30776)
      
      Merge ascend_optimizer and ascend_parser.
      
      * Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)
      
      Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug
      
      * Add paddle ascend distribution training supported (#30796)
      
      Add paddle ascend distribution training supported
      
      * pass cxx_flags to gloo cmake (#30857)
      
      * Destroy session first. (#30954)
      
      Destroy session first.
      
      * merge
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style, test=develop
      
      * fix, test=develop
      
      * fix
      
      * fix log fatal, test=develop
      
      * fix enforce style, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix rccl, test=develop
      
      * fix test, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix node_num, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      Co-authored-by: hutuxian <hutuxian2011@sina.cn>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: Void Main <voidmain1313113@gmail.com>
      Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
      Co-authored-by: dingsiyu <18369187719@163.com>
      Co-authored-by: OleNet <olenet@126.com>
      8c7c53b3
    • J
      [3D-parallelism] Hybrid Model Parallelism (#32074) · 1e60a0c4
      JZ-LIANG committed
      1e60a0c4
    • C
  4. 06 Apr, 2021 3 commits
  5. 02 Apr, 2021 3 commits
    • J
    • W
      support save/load single tensor (#31756) · 43367e4b
      WeiXin committed
      * support save/load single tensor
      
      * compatibility modification according to unnittest
      
      * Some python2.7 don't have 'copyreg' modules
      
      * Handle a syntax error.
      
      * Dealing with compatibility problems on Mac.
      
      * Dealing with compatibility problems on Mac.
      
      * edit unittest to improve coverage.
      
      * Modify the code according to the review comments
      
      * Reduce redundant code.
      
      * support for static graph loading dygraph state_dict
      
      * edit code according to CI
      
      * edit unittest
      
      * edit unnittest
      
      * delete redundant file
      
      * edit code according to Comments
      
      * edit english doc
      
      * edit english doc
      
      * edit English DOC.
      
      * get/set_tensor->get/set_value; return_numpy=False
      
      * get/set_tensor->get/set_value; return_numpy=False
      
      * edit unnittest
      
      * edit unnittest
      
      * polish code.
      43367e4b
    • S
      graph engine (#31226) · 94736d60
      seemingwang committed
      * graph engine demo
      
      * upload unsaved changes
      
      * fix dependency error
      
      * fix shard_num problem
      
      * py client
      
      * remove lock and graph-type
      
      * add load direct graph
      
      * add load direct graph
      
      * add load direct graph
      
      * batch random_sample
      
      * batch_sample_k
      
      * fix num_nodes size
      
      * batch brpc
      
      * batch brpc
      
      * add test
      
      * add test
      
      * add load_nodes; change add_node function
      
      * change sample return type to pair
      
      * resolve conflict
      
      * resolved conflict
      
      * resolved conflict
      
      * separate server and client
      
      * merge pair type
      
      * fix
      
      * resolved conflict
      
      * fixed segment fault; high-level VLOG for load edges and load nodes
      
      * random_sample return 0
      
      * rm useless loop
      
      * test:load edge
      
      * fix ret -1
      
      * test: rm sample
      
      * rm sample
      
      * random_sample return future
      
      * random_sample return int
      
      * test fake node
      
      * fixed here
      
      * memory leak
      
      * remove test code
      
      * fix return problem
      
      * add common_graph_table
      
      * random sample node &test & change data-structure from linkedList to vector
      
      * add common_graph_table
      
      * sample with srand
      
      * add node_types
      
      * optimize nodes sample
      
      * recover test
      
      * random sample
      
      * destruct weighted sampler
      
      * GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * WeightedGraphEdgeBlob to GraphEdgeBlob
      
      * pybind sample nodes api
      
      * pull nodes with step
      
      * fixed pull_graph_list bug; add test for pull_graph_list by step
      
      * add graph table;name
      
      * add graph table;name
      
      * add pybind
      
      * add pybind
      
      * add FeatureNode
      
      * add FeatureNode
      
      * add FeatureNode Serialize
      
      * add FeatureNode Serialize
      
      * get_feat_node
      
      * avoid local rpc
      
      * fix get_node_feat
      
      * fix get_node_feat
      
      * remove log
      
      * get_node_feat return  py:bytes
      
      * merge develop with graph_engine
      
      * fix threadpool.h head
      
      * fix
      
      * fix typo
      
      * resolve conflict
      
      * fix conflict
      
      * recover lost content
      
      * fix pybind of FeatureNode
      
      * recover cmake
      
      * recover tools
      
      * resolve conflict
      
      * resolve linking problem
      
      * code style
      
      * change test_server port
      
      * fix code problems
      
      * remove shard_num config
      
      * remove redundent threads
      
      * optimize start server
      
      * remove logs
      
      * fix code problems by reviewers' suggestions
      Co-authored-by: Huang Zhengjie <270018958@qq.com>
      Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
      Co-authored-by: suweiyue <suweiyue@baidu.com>
      Co-authored-by: luobin06 <luobin06@baidu.com>
      Co-authored-by: liweibin02 <liweibin02@baidu.com>
      94736d60
  6. 01 Apr, 2021 7 commits
    • S
      Support control flow in DataParallel (#31625) · 8460698b
      ShenLiang committed
      * support control flow
      
      * supoort sync_parameters_buffers
      
      * fix the bug of sparse embedding
      8460698b
    • C
      add custom init grad for backward function (#31540) · 83b953f5
      chentianyu03 committed
      * add custom init grad for backward function
      
      * add custom init grad for backward function
      
      * handle when the grad_tensor is none
      
      * handle when the grad_tensor is none
      
      * fix the args type error on windows platform
      
      * modify the args order and doc
      
      * format code
      
      * add grad_tensor to xpu
      
      * modify the grad_tensor type check
      
      * add paddle.backward api to support multi tensors gradient compute
      
      * add paddle.backward api to support multi tensors gradient compute
      
      * add paddle.atuograd module and backward api
      
      * change tensor.backward func args
      
      * modify tensor backward api
      
      * remove create_graph intputs args
      
      * add doc and examplex code for backward api
      
      * when have the same tensor, throw error
      
      * modify test Init func args
      
      * modify the execute.Init func args in test files
      
      * add paddle.autograd package in setup.py.in
      
      * modify error msg, remove _run_backward method in class Tensor
      
      * add test cases for backward api
      83b953f5
    • T
      LOG CLEAN (#31819) · 0589ed21
      tangwei12 committed
      * upgrade vlog
      
      * train from dataset fetch optimize
      0589ed21
    • Z
      [Paddle-TRT] add anchor generator op plugin (#31730) · b807e408
      zlsh80826 committed
      * add anchor generator op plugin
      
      * add anchor generator unit_test
      
      * remove dbg info
      
      * remove redundant line
      
      * replace assertion with paddle enforce
      
      * dynamic plugin replaces assertion with paddle enforce
      
      * anchor generator support dynamic shape on spatial axis
      
      * anchor generator test with fp16, dynamic shape
      
      * add anchor generator test all
      
      * add back main
      
      * reduce test input size to not exceed the timelimit of ci
      
      * change super to InferencePassTest for python2 compatibility
      
      * reuse paddle operator anchor generator
      
      * move creator construct to header with default
      
      * add cuda ifdef
      
      * reduce line
      
      * change super to InferencePassTest for python2 compatibility
      
      * fix anchor generator fp16 serialize setting
      
      * split unittest from test_all
      
      * restrict anchor generator input format before version 7234
      
      * anchor generator only support greater than trt7.1
      
      * change min_graph_size to 2
      
      * min_graph size to 3 if dynamic shape
      
      * reduce dynamic shape size to avoid trt search tactic too long to exceed time limit
      
      * remove anchor from fetch list
      
      * anchor generator support all trt version
      
      * fix memory not allocated but if serialized
      b807e408
    • Z
      Support uint8_t for fill_constant_op (#31911) · 980227f9
      Zhang Zheng committed
      980227f9
    • K
      new group (#31682) · 07741593
      kuizhiqing committed
      * new group
      
      * ci compatible fix
      
      * assert nccl
      07741593
    • C
      Refactor and simplify hook design & add Tensor.register_hook API (#31775) · dbeb3ea4
      Chen Weihang committed
      * refactor and simplify hook design
      
      * fix reducer add hook error
      
      * add Tensor.register_hook basic impl
      
      * refine prepare data impl
      
      * revert prepare data change
      
      * support register_hook for Tensor
      
      * add hook test in model
      
      * polish tests and doc example
      
      * fix double grad test failed
      
      * remove reduce hook func
      
      * fix set empty error
      
      * polish code by comments
      
      * change reduce_hook to mutable_hook
      
      * remove useless tmp_ins
      
      * fix shape code format error
      
      * fix shape code format error
      dbeb3ea4
  7. 31 Mar, 2021 4 commits
    • W
      Update eigen version to f612df27 (#31832) · 495e7f9c
      wuhuanzhou committed
      * update eigen version to f612df27, test=develop
      
      * fix compilation error, test=develop
      
      * remove patch command in eigen, test=develop
      
      * fix compilation error caused by call Eigen function with float16 and bfloat16, test=develop
      
      * fix unittest error, test=develop
      
      * fix unittest error caused by precision, test=develop
      
      * remove patch files used by old version eigen, test=develop
      495e7f9c
    • T
      fix some bug in transformer training in xpu (#31918) · 52b05bac
      taixiurong committed
      52b05bac
    • W
      support minus-int idx to LayerList (#31750) · 5394194e
      Wenyu committed
      * support minus-int idx to LayerList
      * update layerlist test
      5394194e
    • F
      [ROCM] Add ROCm support for warpctc op (#31817) · ef8323d4
      furnace committed
      * bugfix for warpctc
      
      * fix warpctc commit id
      
      * fix warpctc commit id
      
      * fix warpctc commit id
      
      * fix warpctc commit id
      
      * fix warpctc commit id
      
      * fix WARPCTC_WITH_HIP invalid
      
      * Add logs to find out why can not dlopen libwarpctc.so
      
      * fix warpctc commit id
      
      * fix unit test test_warpctc_op
      
      * Optime failed log for dlopen
      
      * Optime failed log for dlopen
      
      * Delete extra changes
      
      * fix warpctc commit id
      
      * fix warpctc commit id
      
      * Add is_compiled_with_rocm for test_warpctc_op
      
      * fix warpctc commit id
      
      * Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed
      
      * Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed
      
      * Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed
      
      * fix code style problems
      ef8323d4
  8. 30 Mar, 2021 8 commits
    • L
    • J
      Added int8 kernel for oneDNN LSTM op (#31894) · 6dca7a1d
      jakpiase committed
      6dca7a1d
    • Z
      fix bug when dtype of to_tensor is core.VarType (#31931) · 245252b8
      Zhou Wei committed
      245252b8
    • W
      fe284868
    • C
      add deprecated for softmax_with_cross_entropy (#31722) · 73a6fa3e
      chajchaj committed
      * add deprecated for softmax_with_cross_entropy, test=develop
      
      * test for deprecated in english doc, test=develop
      
      * test deprecated for softmax_with_cross_entropy in english doc, test=develop
      
      * fix readme and English doc for cross_entropy, test=develop
      
      * rm test for softmax_with_cross_entropy deprecated, test=develop
      
      * update readme for CrossEntropyLoss, test=develop
      
      * fix readme format, test=develop
      
      * fix readme format, test=develop
      
      * fix readme format for cross_entropy, test=develop
      
      * add softmax_switch and fix softlabel for cross_entropy, test=develop
      
      * 1)recovery softmax_with_cross_entropy in fluid 2) change softmax_switch to use_softmax 3) add example for softlabel for cross_entropy, test=develop
      
      * fix Example number for cross_entropy, test=develop
      
      * fix code format, test=develop
      
      * fix for CI-Coverage, test=develop
      
      * fix for CI-Coverage, test=develop
      
      * fix ci-coverage for Non-ASCII character '\xe2' in file, test=develop
      
      * fix ci-coverage for Non-ASCII character '\xe2' in nn.layer.loss.py, test=develop
      
      * update description for doc when use_softmax=Fasle, test=develop
      
      * fix some docs and code example for cross_entropy, test=develop
      
      * delete redundant description for soft_label parameter of cross_entropy, test=develop
      
      * fix some comment for test_cross_entropy_loss.py, test=develop
      73a6fa3e
    • S
      fix batchnorm when inpu dims < 3 (#31933) · 8084b759
      Shang Zhizhou committed
      * fix batchnorm when inpu dims < 3
      
      * add unittest for batchnorm dims = 2
      8084b759
    • Z
      [Paddle-TRT] yolobox (#31755) · 64ee255f
      zlsh80826 committed
      * yolobox converter and plugin
      
      * yolobox unittest
      
      * add dynamic shape restriction
      
      * fix git merge log
      64ee255f
    • A
      Fix segment Fault from set_value (#31891) · c4b60efa
      Aurelius84 committed
      * Avoid raising warning while import paddle
      
      * fix segment fault of set_value
      
      * fix code style
      c4b60efa
  9. 29 Mar, 2021 4 commits
    • L
      525c32e3
    • R
      123949eb
    • Z
      [Paddle-TRT] roi_align_plugin (#31732) · e3a38d79
      zlsh80826 committed
      * add roi_align_plugin
      
      * add roi align unit_test
      
      * add roi align serialization
      
      * remove roi align static plugin because of batch dim issue
      
      * refine roi align unittest and add fp16/serialization
      
      * add trt roi align condition to op_teller
      
      * refine error message
      
      * remove unnecessary reshape layer
      e3a38d79
    • Z
      [Paddle-TRT] trt affine channel converter (#31628) · bfb5cf55
      zlsh80826 committed
      * trt affine channel converter
      
      * add trt affine channel base test
      
      * add trt affine channel NHWC
      
      * remove asterisk for python2 compatibility
      
      * trt affine channel converter
      
      * add trt affine channel base test
      
      * add trt affine channel NHWC
      
      * remove asterisk for python2 compatibility
      
      * fix rebase
      
      * move LodTensor to Tensor
      
      * add dbg info
      
      * affine channel converter only support NCHW
      
      * scale,bias are parameters, use create_parameters api
      
      * reduce test input size to not exceed the timelimit of ci
      
      * refine affine channel unittest and add serialization/dynamic test
      
      * change super to InferencePassTest for python2 compatibility
      
      * change super to InferencePassTest for python2 compatibility
      
      * fix affine channel fp16 serialize setting
      bfb5cf55