1. 16 Sep, 2021 1 commit
  2. 14 Sep, 2021 1 commit
    • C
      Add api paddle.device.cuda.empty_cache to release idle gpu memory held by allocator. (#35427) · 83932715
      chenenquan committed
      * Add empty_cache api to release idle gpu memory held by allocator, test=develop
      
      * Add empty_cache api to release idle gpu memory held by allocator, test=develop
      
      * Add empty_cache api to release idle gpu memory held by allocator, test=develop
      
      * Fix test coverage problem for empty_cache
      
      * delete redundant check for empty_cache
      
      * fix the problem of empty_cache's doc
      
      * delete the nvidia-smi comment in doc of empty_cache, test=document_fix
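As a rough illustration of the API this commit adds, the sketch below drops a tensor and then calls `paddle.device.cuda.empty_cache()` so the allocator can return idle GPU memory. The helper name `release_idle_gpu_memory` is hypothetical, and the guard clauses (Paddle not installed, CPU-only build, no visible GPU) are assumptions added so the snippet degrades gracefully; they are not part of the commit.

```python
def release_idle_gpu_memory():
    """Return True if empty_cache() was called, False if no CUDA-enabled Paddle is usable.

    Hypothetical helper for illustration only.
    """
    try:
        import paddle
    except ImportError:
        # Assumption: treat a missing Paddle install as "nothing to release".
        return False
    if not paddle.is_compiled_with_cuda() or paddle.device.cuda.device_count() == 0:
        return False

    paddle.set_device("gpu")
    # Allocate a tensor, then drop the only reference to it.
    tensor = paddle.randn([512, 512], dtype="float32")
    del tensor
    # Ask the allocator to release the idle GPU memory it is still holding.
    paddle.device.cuda.empty_cache()
    return True


print(release_idle_gpu_memory())
```

The call only affects memory the allocator holds but no live tensor uses, so tensors that are still referenced are unaffected.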
  3. 08 Sep, 2021 1 commit
  4. 31 Aug, 2021 1 commit
  5. 26 Aug, 2021 1 commit
  6. 24 Aug, 2021 1 commit
    • W
      add fetch, test=develop (#35019) · a5060b55
      wanghuancoder committed
      * add fetch, test=develop
      
      * fix fetch2op, test=develop
      
      * fix fetch2op, test=develop
      
      * refine, test=develop
      
      * fix fetch ctx, test=develop
      
      * add wait, test=develop
      
      * rename fetch2 to fetch_v2, test=develop
      
      * merge, test=develop
  7. 18 Aug, 2021 2 commits
    • W
      code refactoring for new executor (#34970) · 40d4d834
      wanghuancoder committed
      * code refactoring, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
    • Z
      Add function to disable paddle signal handler (#34577) · dd533dd3
      Zhanlue Yang committed
      * Add function to disable paddle signal handler
      
      Paddle uses google::InstallFailureSignalHandler to handle selected system signals,
      mainly for debugging and bug-reporting purposes.
      
      However, this can conflict with other Python packages that capture the same signals,
      such as tvm.
      
      To resolve this, we add a function that disables the signal handler.
      
      * Remove signal test from WIN32 platform
      
      * Remove redundant return from disable_signal_handler() function
      
      * Add detailed messages to en_doc
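A minimal sketch of how the function from this commit might be used, assuming it is exposed as `paddle.disable_signal_handler()` in the installed Paddle version. The wrapper name `disable_paddle_signal_handler` and the fallback behaviour when Paddle or the function is missing are assumptions made for illustration.

```python
def disable_paddle_signal_handler():
    """Return True if Paddle's signal handler was disabled, False otherwise.

    Hypothetical wrapper for illustration only.
    """
    try:
        import paddle
    except ImportError:
        # Assumption: nothing to disable when Paddle is not installed.
        return False
    if not hasattr(paddle, "disable_signal_handler"):
        # Paddle builds that predate this commit do not provide the function.
        return False
    # Turn off Paddle's C++-level signal handler so another package
    # (for example tvm) can install its own handlers without conflict.
    paddle.disable_signal_handler()
    return True


print(disable_paddle_signal_handler())
```

Calling it early, before the other signal-capturing package is used, is the safer ordering.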
  8. 17 Aug, 2021 2 commits
    • C
      Copy boost optional to Paddle (#34780) · 9be41447
      chentianyu03 committed
      * copy boost optional.hpp to paddle
      
      * copy boost optional.hpp to paddle
      
      * move directories
      
      * del fluid/utils
      
      * modify .hpp to .h
      
      * move directories
      
      * modify to paddle::optional
      
      * add modification description
      
      * format code style for the files in paddle/utils
      
      * format code style
    • Z
      Add some passes which can be applied to Program (#34730) · 8046e33d
      Zeng Jinle committed
      * add inplace passes and tests
      
      * update
      
      * fix use_cuda undefined
      fix compile error of op compat
      
      * add more ut
      
      * fix CPU CI error
      
      * check adam unique
      
      * fix mac/windows ci, improve coverage
      
      * fix ci error
      
      * follow weihang's comment
      
      * fix BlockDesc::MoveFrom
      
      * follow qiuliang's comment
      
      * update
      
      * follow huihuang's comments
  9. 13 Aug, 2021 1 commit
  10. 11 Aug, 2021 2 commits
  11. 06 Aug, 2021 1 commit
  12. 05 Aug, 2021 1 commit
    • H
      New executor dev (#34407) · 012d12b5
      hong committed
      * first test version
      
      * add test exec;
      
      * add data transfer; test=develop
      
      * add new exec head;
      
      * add memcpy; test=develop
      
      * add python fetch
      
      * add new test
      
      * add graph node; test=develop
      
      * remove useless new executor test; test=develop
      
      * remove gperf dependency; test=develop
      
      * fix compile bugs; test=develop
      
      * remove useless code; test=develop
      
      * remove useless code; test=develop
      
      * add unit test; test=develop
      
      * polish code; test=develop
      
      * polish code; test=develop
      
      * add interpreter cmakefile; test=develop
      
      * remove useless code; test=develop
  13. 03 Aug, 2021 1 commit
  14. 02 Aug, 2021 1 commit
  15. 29 Jul, 2021 1 commit
    • Z
      add fix op run order pass (#34427) · 79e758c6
      Zeng Jinle committed
      * add fix op run order pass
      
      * add ut for fix_op_run_order
      
      * fix ci error
      
      * improve coverage
      
      * improve coverage again and fix cpu test case
      
      * follow some comments
  16. 27 Jul, 2021 1 commit
  17. 23 Jul, 2021 1 commit
  18. 22 Jul, 2021 2 commits
  19. 19 Jul, 2021 1 commit
    • C
      Add Cuda event and stream API (#32460) · 9c7f6af5
      chentianyu03 committed
      * add cuda event and stream api
      
      * add cuda event and stream api
      
      * add get_current_stream api
      
      * add get_current_stream api
      
      * init streams
      
      * modify get_current_stream
      
      * modify get_current_stream
      
      * add synchronize func
      
      * add current_stream doc and test file
      
      * move get_current_stream into CUDA macro
      
      * move CudaEvent into CUDA macro
      
      * move _get_current_stream and _device_synchronize into cuda macro
      
      * modify the macro of cuda stream and event
      
      * add test case for synchronize
      
      * add paddle.devices.cuda module
      
      * event and stream support hip
      
      * add doc for stream and event class
      
      * move cuda stream and event into single pybind
      
      * add cuda_streams_py.cc to cmakelist
      
      * add _device_synchronize and _get_current_stream to core module
      
      * add test case for cudastream and cudaevent
      
      * move __all__ in streams.py
      
      * fix test fail
      
      * add cuda to devices __all__
      
      * fix current_stream doc writing error
      
      * move devices to device directory, and merge device.py into __init__.py
      
      * add required:gpu to sample codes
      
      * remove cuda direction from device/__init__.py
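The stream API introduced here can be sketched as follows, assuming the public entry points ended up as `paddle.device.cuda.current_stream()` and `paddle.device.cuda.synchronize()`. The helper name `sync_current_stream` and the guard clauses for CPU-only environments are assumptions for illustration.

```python
def sync_current_stream():
    """Return True if the current CUDA stream was synchronized, False without a usable GPU.

    Hypothetical helper for illustration only.
    """
    try:
        import paddle
    except ImportError:
        # Assumption: treat a missing Paddle install as "no GPU work to wait for".
        return False
    if not paddle.is_compiled_with_cuda() or paddle.device.cuda.device_count() == 0:
        return False

    paddle.set_device("gpu")
    # Fetch the stream that kernels on the current device are queued on.
    stream = paddle.device.cuda.current_stream()
    # Block until all work queued on that stream has finished.
    stream.synchronize()
    # Block until all work on the whole device has finished.
    paddle.device.cuda.synchronize()
    return True


print(sync_current_stream())
```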
  20. 15 Jul, 2021 1 commit
  21. 13 Jul, 2021 1 commit
  22. 06 Jul, 2021 1 commit
  23. 30 Jun, 2021 1 commit
  24. 29 Jun, 2021 1 commit
  25. 23 Jun, 2021 1 commit
    • W
      optimize attr default value (#33357) · 5d2eb678
      wanghuancoder committed
      * optimize attr default value, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * fix bug in AttrReader, test=develop
      
      * fix bug, test=develop
      
      * fix double_grad, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * fix checker null, test=develop
      
      * for test, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
  26. 09 Jun, 2021 1 commit
    • W
      paddle.save support object save to memory. (#32999) · cdd6437a
      WeiXin committed
      * support state_dict save to memory.
      
      * Perfect unittest
      
      * perfect unittest.
      
      * support saving binary var to memory
      
      * polish code.
      
      * package save/load files into pybind/io.py
      
      * polish code.
      
      * add example for save to memory; remove useless save/load functions (_load_static_dict, _save_dygraph_dict)
      
      * delete _load_static/dygraph_dict;_save_static/dygraph_dict
      
      * edit example of paddle.save/load
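A hedged sketch of what "save to memory" looks like from Python, assuming `paddle.save` and `paddle.load` accept an `io.BytesIO` object in place of a file path. The helper name `roundtrip_state_dict_in_memory` and the choice of a small `Linear` layer are illustrative assumptions.

```python
from io import BytesIO


def roundtrip_state_dict_in_memory():
    """Save a layer's state_dict to a BytesIO buffer and load it back.

    Hypothetical helper for illustration only. Returns the sorted parameter
    names, or None when Paddle is not installed.
    """
    try:
        import paddle
    except ImportError:
        return None

    paddle.disable_static()
    linear = paddle.nn.Linear(5, 10)

    buffer = BytesIO()
    # Write the state_dict into the in-memory buffer instead of a file.
    paddle.save(linear.state_dict(), buffer)
    # Rewind so that load() reads from the start of the buffer.
    buffer.seek(0)
    loaded = paddle.load(buffer)
    return sorted(loaded.keys())


print(roundtrip_state_dict_in_memory())
```

Rewinding the buffer with `seek(0)` before loading is the step most easily forgotten.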
  27. 25 May, 2021 1 commit
  28. 22 Apr, 2021 2 commits
    • W
      support save/load binary format tensor. (#32211) · f4d9adc7
      WeiXin committed
      * support save/load binary format tensor
      
      * Fix error when create cudaplace
      
      * Fix error when create cudaplace
      
      * Fix error when create cudaplace
      
      * get device context from pool.
      
      * move define of 'SerializeToStream' and 'DeserializeFromStream' to 'lod_tensor.cc' and 'selected_rows.cc'.
      
      * improve coverage.
      
      * improve coverage.
      
      * polish API
      
      * deal with conflict
      
      * disable save/load large file in unittest
      
      * split unittest.
    • T
      e58c705b
  29. 21 Apr, 2021 1 commit
  30. 19 Apr, 2021 1 commit
    • L
      [NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8
      Leo Chen committed
      * [NPU] support GarbageCollector for npu (#31874)
      
      * support GarbageCollector for npu
      
      * fix typo
      
      * fix gather_grad
      
      * disable NPUDefaultStreamGarbageCollector on NPU
      
      * [NPU] support npu for memcpy op (#31808)
      
      * support npu for memcpy op
      
      * add ut
      
      * fix ut
      
      * fix typo
      
      * [NPU] fix bug of using temp vector (#31963)
      
      * fix bug when beta1_pow on cpu (#31995)
      
      * [NPU] support npu profiler (#31684)
      
      * support npu profiler
      
      * add python api
      
      * fix bugs
      
      * add wrapper for incomplete type
      
      * update profile proto
      
      * record npu wait
      
      * add xpu placeholder
      
      * fix adam (#32016)
      
      * [NPU] enable async copy and  add wait before sync operation (#31956)
      
      * enable async copy and  add wait before sync operation
      
      * remove unnecessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * make TensorFromVector/TensorToVector sync
      
      * [NPU] Support dataloader on npu place. (#31867)
      
      * [NPU] Wait on NPUPlace (#32086)
      
      * [NPU] fix cast op (#32121)
      
      * fix npu kernel of cast op to handle casting to same dtype
      
      * add comments
      
      * [NPU] support cann 20.3 (#32044)
      
      * fix compile problem on cann 20.3
      
      * fix ut
      
      * fix test_mul
      
      * fix check_finite_and_scale
      
      * fix lookup_table_v2_grad
      
      * fix cmake
      
      * support print op
      
      * [NPU] Support npu save load (#31893)
      
      * support save load for NPU
      
      * add save load npu unittest
      
      * support np.array transform in NPU
      
      * fix errors
      
      * delete dygraph in unittest
      
      * add Wait
      
      * fix unittest
      
      * fix review comment
      
      * fix unittest problem
      
      * fix little problem
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance
      
      * refine code
      
      * fix NPUDeviceContext in all c++ unittest (#32198)
      
      * fix NPUDeviceContext in all c++ unittest
      
      * refine log
      Co-authored-by: pangyoki <pangyoki@126.com>
      
      * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
      
      * enable async copy and  add wait before sync operation
      
      * remove unnecessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * change TensorFromVector to FillNpuTensorWithConstant
      
      * fix ignored api
      
      * delete extra unittest
      
      * fix little error
      
      * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
      
      * change TensorCopySync to TensorCopy
      
      * delete useless Wait and add StreamWait
      
      * fix npu_stream error
      
      * fix check_finite_and_unscale_op_npu TensorCopy
      
      * only save stream wait
      
      * fix NPUDeviceContext in all c++ unittest
      
      * delete wait
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      
      * delete useless unittest file (#32206)
      
      * Fix op test (#32231)
      
      * fix conditional block (#32243)
      
      * fix adam bug again (#32246)
      
      * fix compile
      
      * fix ut
      
      * fix ut
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
      Co-authored-by: pangyoki <pangyoki@126.com>
  31. 15 Apr, 2021 2 commits
    • 1
      tree-based-model (#31696) · a8c3a902
      123malin committed
      * add index_dataset and index_sampler for tree-based model
    • T
      heterps support pscore (#32093) · 9f8c8f96
      Thunderbrook committed
      * pscore support heterps
      
      * fleet cmake
      
      * fleet wrapper
      
      * macro
      
      * solve conflict
      
      * solve conflict
      
      * add unittest
      
      * paddle enforce
      
      * unittest
      
      * unittest
      
      * unittest
  32. 09 Apr, 2021 1 commit
    • L
      [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen committed
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, but failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinalize
      
      * add system allocator test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code style
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
  33. 08 Apr, 2021 1 commit
  34. 07 Apr, 2021 1 commit
    • Z
      [NPU] Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3
      zhang wenhui committed
      * Ascend rc (#30483)
      
      * Fix compilation on CANN20.1 and older (#30494)
      
      Fix compilation on CANN20.1 and older
      
      * Add distribution supported (#30578)
      
      Add distribution supported
      
      * Build parser for Hcom* operators (#30627)
      
      Build parser for Hcom* operators
      
      * Pass device_ids info from launch to trainer. (#30632)
      
      Pass device_ids info from launch to trainer
      
      * Add Hccl program group (#30642)
      
      Add Hccl program group
      
      * Add startup bash files of test_ascend_group. (#30645)
      
      Add startup bash files of test_ascend_group
      
      * cleanup (#30646)
      
      cleanup test_ascend_group.py
      
      * [Feature] Build parser to support distributed training (#30658)
      
      [Feature] Build parser to support distributed training
      
      * fix compilation on ascend-20.1 (#30722)
      
      fix compilation on ascend-20.1
      
      * Dev/fix ascend string (#30749)
      
      Dev/fix ascend string
      
      * code style (#30781)
      
      code style
      
      * Merge ascend_optimizer and ascend_parser. (#30776)
      
      Merge ascend_optimizer and ascend_parser.
      
      * Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)
      
      Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug
      
      * Add paddle ascend distribution training supported (#30796)
      
      Add paddle ascend distribution training supported
      
      * pass cxx_flags to gloo cmake (#30857)
      
      * Destroy session first. (#30954)
      
      Destroy session first.
      
      * merge
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style, test=develop
      
      * fix, test=develop
      
      * fix
      
      * fix log fatal, test=develop
      
      * fix enforce style, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix rccl, test=develop
      
      * fix test, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix node_num, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix ids str, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      
      * fix style code, test=develop
      Co-authored-by: hutuxian <hutuxian2011@sina.cn>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: Void Main <voidmain1313113@gmail.com>
      Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
      Co-authored-by: dingsiyu <18369187719@163.com>
      Co-authored-by: OleNet <olenet@126.com>