1. 10 Sep, 2021 1 commit
    • H
      add cumprod op (#35185) · 4e509f46
      hlygit66666 committed
      * add test_cumprod_op
      
      * Revert "add test_cumprod_op"
      
      This reverts commit c96cf6dff5d09ae7d8cc72c1e8ae4369a153aa19.
      
      * recommit
      
      * add error message
      
      * test input(x) initialize
      
      * test use cpu
      
      * update test code
      
      * add test type
      
      * add test case
      
      * solve ci problem
      
      * add complex case test
      
      * add complex case test
      
      * fix review problem
      
      * fix conflict
      
      * fix some docs
      
      * change test case
      
      * change test case
      
      * fix review problems again
      
      * fix docs
      
      * fix inclusivescan bug
      4e509f46
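For readers unfamiliar with the operation, a cumulative product can be sketched in plain Python as below. This is only an illustration of the semantics, not the kernel added in this commit; the helper names `cumprod_1d` and `cumprod_2d` and the `dim` argument are assumptions made for the example.

```python
from itertools import accumulate
from operator import mul


def cumprod_1d(values):
    """Return the running product of a flat sequence (inclusive scan)."""
    return list(accumulate(values, mul))


def cumprod_2d(matrix, dim):
    """Running product of a list-of-lists along dim 0 (down columns) or 1 (along rows)."""
    if dim == 1:
        return [cumprod_1d(row) for row in matrix]
    if dim == 0:
        # Scan each column, then transpose back to row-major layout.
        columns = [cumprod_1d(col) for col in zip(*matrix)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("dim must be 0 or 1 for a 2-D input")


print(cumprod_1d([1, 2, 3, 4]))
print(cumprod_2d([[1, 2], [3, 4]], dim=0))
```

The same scan works for complex inputs, since it relies only on multiplication.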
  2. 02 Sep, 2021 1 commit
    • X
      Add SVD Op and its GPU and CPU kernel (#34953) · 7e5fb462
      xiongkun committed
      * Add SVD Op and its GPU and CPU kernel
      
      * Remove CUDAPlace in test_svd_op, make the test available in CPU package
      
      * modify the file
      
      * fix windows bug/ fix ROCM / fix test timeout
      
      * pass the CIs
      
      * improve error report
      
      * for code review
      
      * some modification to test_svd_op
      
      * change python code style
      
      * expose the svd interface for document
      7e5fb462
  3. 31 Aug, 2021 1 commit
    • Z
      New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d
      Zhanlue Yang committed
      [Background]
      Growth in code size can be irreversible in the long run, leading to huge release packages that
      not only hamper the user experience but also exceed a hard limit of PyPI.
      
      In particular, the NV_FATBIN section takes up 86% of the compiled dylib size, owing to the large number of GPU
      arches supported.
      
      This PR aims to prune this NV_FATBIN.
      
      [Solution]
      In the new release strategy, two types of whl packages will be involved:
      
      Cubin PIP package:
      The PIP package supports a narrower range of GPU arches, containing
      sm_60, sm_70, sm_75, sm_80 cubins, covering Pascal - Ampere arches
      
      JIT release package:
      This is a fallback for the Cubin PIP package, containing compute_35, compute_50, compute_60,
      compute_70, compute_75, compute_80, with best performance and GPU arches coverage.
      
      However, it takes around 10 min to install due to the JIT compilation.
      
      [How to use]
      The new release strategy is disabled by default.
      To compile for Cubin PIP package, add this to cmake: -DCUBIN_RELEASE_PIP
      To compile for JIT release package, add this to cmake: -DJIT_RELEASE_WHL
      2f3b393d
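The difference between the two package types can be sketched with CUDA's general compatibility rules: a cubin built for sm_XY runs only on devices with the same major version and an equal or higher minor version, while PTX for compute_XY can be JIT-compiled on any device of equal or newer compute capability. The snippet below is a rough illustration under those rules; the function names and the tuple encoding of compute capability are assumptions, not part of this PR.

```python
# Arch lists taken from the commit message; (major, minor) compute capabilities.
CUBIN_ARCHES = [(6, 0), (7, 0), (7, 5), (8, 0)]
PTX_ARCHES = [(3, 5), (5, 0), (6, 0), (7, 0), (7, 5), (8, 0)]


def runs_prebuilt_cubin(device_cc):
    """True if some shipped cubin is binary-compatible with the device."""
    major, minor = device_cc
    return any(major == m and minor >= n for m, n in CUBIN_ARCHES)


def runs_via_jit(device_cc):
    """True if some shipped PTX can be JIT-compiled for the device."""
    return any(device_cc >= arch for arch in PTX_ARCHES)


# A Maxwell device (5.2) is not covered by the cubin package but can use the JIT package.
print(runs_prebuilt_cubin((5, 2)), runs_via_jit((5, 2)))
```

Under these assumptions, devices older than Pascal need the JIT release package, which matches the "backup" role described above.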
  4. 20 Aug, 2021 1 commit
  5. 18 Aug, 2021 1 commit
    • Z
      Add function to disable paddle signal handler (#34577) · dd533dd3
      Zhanlue Yang committed
      * Add function to disable paddle signal handler
      
      Paddle uses google::InstallFaultSignalHandler to handle selected system signals,
      mainly for debugging and bug-reporting purposes.
      
      However, this can conflict with other Python packages that capture the same signals,
      such as tvm.
      
      To resolve this issue, we add a function that disables the signal handler
      
      * Remove signal test from WIN32 platform
      
      * Remove redundant return from disable_signal_handler() function
      
      * Add detailed messages to en_doc
      dd533dd3
  6. 16 Aug, 2021 1 commit
    • D
      add unique_consecutive_op (#34334) · 875cfd57
      duanboqiang committed
      * add unique_consecutive_op
      
      * add unique_consecutive_op
      
      * add unique_consecutive_op
      
      * add unique_consecutive_op
      
      * add unique_consecutive_op
      
      * add unique_consecutive_op
      
      * add unique_consecutive_op
      
      * add unique_consecutive_op
      
      * remove unity build
      
      * add unique_consecutive op
      
      * add unique_consecutive op
      
      * add enable static
      
      * add noqa
      
      * add space line
      
      * add default case.
      
      * add comma
      
      * add space line
      
      * modify unique_consecutive unittest
      
      * optimize ut coverage
      
      * rebase develop
      
      * improve coverage
      
      * update en docs
      
      * update en docs
      
      * update en docs
      
      * update en docs
      
      * update en docs
      
      * update en doc
      875cfd57
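As a rough illustration of what a "unique consecutive" operation computes, the sketch below collapses runs of equal neighbouring elements in a flat list and also reports the inverse indices and run lengths. It is a plain-Python stand-in, not the operator from this commit; the function name and return layout are assumptions for the example.

```python
from itertools import groupby


def unique_consecutive(values):
    """Collapse consecutive duplicates.

    Returns (unique_values, inverse_indices, counts), where
    inverse_indices[i] is the position in unique_values of values[i].
    """
    unique_values, inverse_indices, counts = [], [], []
    for value, run in groupby(values):
        run_length = len(list(run))
        inverse_indices.extend([len(unique_values)] * run_length)
        unique_values.append(value)
        counts.append(run_length)
    return unique_values, inverse_indices, counts


print(unique_consecutive([1, 1, 2, 2, 3, 1, 1, 2]))
```

Unlike a global `unique`, a value that reappears after a different value starts a new run, so it shows up again in the output.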
  7. 13 Aug, 2021 1 commit
    • T
      New Einsum API (#33821) · 8c8667f0
      Tongxin Bai committed
      * OP dot: refactor CPU kernels and get better loop performance.
      
      * Minor fix on code format.
      
      * Fixed minor errors.
      
      * Add new API: einsum
      
      * Update the Einsum unit test.
      
      One case failed with matmul_v2, where the dtype is int64:
      
      a = np.arange(2 * 3 * 1).reshape(2, 3, 1)
      b = np.arange(1)
      paddle.einsum("...i, ...i", a, b)
      
      * Test cases in test_einsum test floating point dtypes only.
      
      As of now Paddle only supports float/double dtypes in matmul, which is
      one of the building blocks of this Einsum implementation. We decided not to
      test einsum against other dtypes.
      
      * Polish format.
      
      * More formatting.
      
      * Format...
      
      * Einsum: improve test coverage.
      
      * Einsum: bug fixes and more testcases for testing error messages
      
      * Einsum: fix format..
      
      * Einsum: fixed typo and format.
      
      * Einsum: format again...
      
      * Einsum: applied suggested changes.
      
      * Einsum API: improve API documentation.
      
      * Einsum API: apply suggested changes.
      
      * Einsum API: Add dygraph only note.
      
      * Einsum API: Add dygraph only note.
      
      * Einsum API: fixed unittest.
      8c8667f0
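To make the failing `"...i, ...i"` case from the commit message concrete: the expression multiplies the operands along their last axis and sums over it, broadcasting the leading dimensions. The plain-Python sketch below reproduces that contraction on nested lists; `batched_dot` is a hypothetical helper written for this illustration and is unrelated to the actual Einsum implementation.

```python
def batched_dot(a, b):
    """Contract the last axis of nested-list `a` with the flat vector `b`.

    Mirrors einsum("...i,...i", a, b) when `b` is one-dimensional;
    a length-1 last axis is broadcast against the other operand.
    """
    if a and isinstance(a[0], list):
        return [batched_dot(sub, b) for sub in a]
    n = max(len(a), len(b))
    if len(a) not in (1, n) or len(b) not in (1, n):
        raise ValueError("last axes are not broadcastable")
    a_full = a * n if len(a) == 1 else a
    b_full = b * n if len(b) == 1 else b
    return sum(x * y for x, y in zip(a_full, b_full))


# Same shapes as the commit message: a is (2, 3, 1), b is (1,).
a = [[[0], [1], [2]], [[3], [4], [5]]]
b = [0]
print(batched_dot(a, b))
```

With these inputs the result has shape (2, 3), and every entry is zero because `b` holds a single zero.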
  8. 28 Jul, 2021 1 commit
  9. 19 Jul, 2021 1 commit
    • C
      Add Cuda event and stream API (#32460) · 9c7f6af5
      chentianyu03 committed
      * add cuda event and stream api
      
      * add cuda event and stream api
      
      * add get_current_stream api
      
      * add get_current_stream api
      
      * init streams
      
      * modify get_current_stream
      
      * modify get_current_stream
      
      * add synchronize func
      
      * add current_stream doc and test file
      
      * move get_current_stream into CUDA macro
      
      * move CudaEvent into CUDA macro
      
      * move _get_current_stream and _device_synchronize into cuda macro
      
      * modify the macro of cuda stream and event
      
      * add test case for synchronize
      
      * add paddle.devices.cuda module
      
      * event and stream support hip
      
      * add doc for stream and event class
      
      * move cuda stream and event into single pybind
      
      * add cuda_streams_py.cc to cmakelist
      
      * add _device_synchronize and _get_current_stream to core module
      
      * add test case for cudastream and cudaevent
      
      * move __all__ in streams.py
      
      * fix test fail
      
      * add cuda to devices __all__
      
      * fix current_stream doc writing error
      
      * move devices to device directory, and merge device.py into __init__.py
      
      * add required:gpu to sample codes
      
      * remove cuda directory from device/__init__.py
      9c7f6af5
  10. 12 Jul, 2021 1 commit
  11. 23 Jun, 2021 1 commit
  12. 22 Jun, 2021 1 commit
    • Z
      [API/OP]Add a new API paddle.diagonal (#33586) · ad106290
      zhangbo9674 committed
      * new api diagonal, test=develop
      
      * add new api diagonal, test=develop
      
      * new api diagonal, test=develop
      
      * add new api paddle.diagonal, test=develop
      
      * use framework::stride replace ComputeDimStride
      
      * replace cudaMalloc/cudaMemcpy by TensorFormVector in cudaKernel and cudaGradKernel
      
      * improve function: when attr(offset) exceeds attr(axis1) or attr(axis2), set the diagonal dim to 0
      
      * fix RP-Mac-CI bug: replace framework::stride() by ComputDimStride.
      
      * polish code-block
      
      * polish code of python API diagonal
      
      * api supports dtype of float16 and bool
      
      * api supports dtype of float16 and bool
      
      * modify unittest code
      
      * modify unittest code
      
      * polish dtype description
      
      * polish code-block
      ad106290
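The offset behaviour mentioned in the commit message (an out-of-range offset yields a diagonal of length 0) can be sketched for the 2-D case as follows. This is an illustrative plain-Python version only; `diagonal_2d` is a hypothetical name, and the real API also takes `axis1` and `axis2` for higher-rank inputs.

```python
def diagonal_2d(matrix, offset=0):
    """Return the diagonal of a list-of-lists matrix.

    offset > 0 selects a diagonal above the main one, offset < 0 one below.
    An offset beyond the matrix bounds gives an empty list.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    if offset >= 0:
        length = max(0, min(rows, cols - offset))
        return [matrix[i][i + offset] for i in range(length)]
    length = max(0, min(rows + offset, cols))
    return [matrix[i - offset][i] for i in range(length)]


m = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(diagonal_2d(m), diagonal_2d(m, 1), diagonal_2d(m, -1), diagonal_2d(m, 5))
```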
  13. 21 Jun, 2021 1 commit
  14. 17 Jun, 2021 1 commit
  15. 16 Jun, 2021 2 commits
  16. 15 Jun, 2021 1 commit
    • Z
      Add digamma_op and unittest (#33278) · 02a6d49a
      zyfncg committed
      * Add digamma_op and unittest
      
      * add digamma_op api
      
      * remove special DigammaCudaKernel and correct some docs
      
      * remove unused headers
      
      * fix api doc error
      02a6d49a
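For context, the digamma function is the derivative of the logarithm of the gamma function. One common way to evaluate it for positive real arguments, shown here purely as an illustration and not as the kernel in this commit, is to shift the argument upward with the recurrence psi(x) = psi(x + 1) - 1/x and then apply the asymptotic series. The threshold of 6 and the number of series terms are choices made for this sketch.

```python
import math


def digamma(x):
    """Approximate digamma(x) for real x > 0."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    # Recurrence: psi(x) = psi(x + 1) - 1/x, applied until x is large enough.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


print(digamma(1.0))  # close to minus the Euler-Mascheroni constant
```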
  17. 11 Jun, 2021 2 commits
  18. 09 Jun, 2021 2 commits
  19. 27 May, 2021 1 commit
  20. 07 May, 2021 1 commit
    • Z
      remove packages in __all__ (#32759) · a77ade0e
      zhiboniu committed
      * [OPs] Bug fix, fix the segment mean for illegal syncthreads usage. (#32596) (#32610)
      
      * [OPs] Bug fix, fix the segment mean for illegal syncthreads usage.
      
      * remove packages in __all__
      
      * create new public api level paddle.callbacks;paddle.hub;paddle.utils.unique_name
      Co-authored-by: Zhong Hui <zhonghui.net@gmail.com>
      a77ade0e
  21. 27 Apr, 2021 2 commits
  22. 25 Apr, 2021 1 commit
  23. 24 Apr, 2021 1 commit
  24. 22 Apr, 2021 2 commits
  25. 14 Apr, 2021 1 commit
  26. 09 Apr, 2021 1 commit
    • L
      [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen committed
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, but failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinalize
      
      * add system allocator test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code style
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
      ccf5709d
  27. 01 Apr, 2021 1 commit
    • C
      add custom init grad for backward function (#31540) · 83b953f5
      chentianyu03 committed
      * add custom init grad for backward function
      
      * add custom init grad for backward function
      
      * handle when the grad_tensor is none
      
      * handle when the grad_tensor is none
      
      * fix the args type error on windows platform
      
      * modify the args order and doc
      
      * format code
      
      * add grad_tensor to xpu
      
      * modify the grad_tensor type check
      
      * add paddle.backward api to support multi tensors gradient compute
      
      * add paddle.backward api to support multi tensors gradient compute
      
      * add paddle.autograd module and backward api
      
      * change tensor.backward func args
      
      * modify tensor backward api
      
      * remove create_graph inputs args
      
      * add doc and example code for backward api
      
      * when have the same tensor, throw error
      
      * modify test Init func args
      
      * modify the execute.Init func args in test files
      
      * add paddle.autograd package in setup.py.in
      
      * modify error msg, remove _run_backward method in class Tensor
      
      * add test cases for backward api
      83b953f5
  28. 15 Jan, 2021 1 commit
    • P
      Add Inplace strategy (Output reuse Input Varbase) in dygraph (#30103) · 13d75736
      pangyoki committed
      * add view strategy on squeeze,unsqueeze,reshape,flatten
      
      * add squeeze unittest
      
      * add unittests
      
      * use View strategy as name rather than Reuse Allocation
      
      * fix view api doc
      
      * fix format
      
      * use core.ops when input of reshape2 is Tensor
      
      * fix test_cross_entropy_loss error because of reshape2
      
      * fix test_cross_entropy_loss error because of reshape2
      
      * add inplace strategy
      
      * add elementwise_add sub
      
      * let backward op not use inplace
      
      * grad op do not use inplace
      
      * fix memory increase error and add leaf error message
      
      * delete selected_rows
      
      * change op_function
      
      * little change
      
      * solve HandleViewBetweenInputAndOutput
      
      * add unittest and leaf error message
      
      * merge view error
      
      * optimize op_function_generator format and support sum inplace op
      
      * fix format of basic_engine
      
      * fix format for framework
      
      * little change of variable wrapper
      
      * add reshape, squeeze, unsqueeze, scatter api
      
      * add relu elu tanh softmax inplace api
      
      * fix test_squeeze_op unittest
      
      * fix test_relu_op unittest
      
      * fix comment problems
      
      * delete sample code of inplace api
      
      * add reference of grad_pending_nodes in basic_engine
      
      * fix unittest name
      
      * add inplace apis into wlist
      
      * fix error message
      
      * add PADDLE_ENFORCE for set grad op twice
      
      * fix header file error
      13d75736
  29. 07 Jan, 2021 1 commit
  30. 17 Dec, 2020 2 commits
    • C
      add conj op for complex types (#29527) · 71063b81
      chentianyu03 committed
      * add conj op for complex types
      
      * add conj for complex types
      
      * add more test case
      
      * add conj_op test
      
      * modify conj api and impl
      
      * add complex type for fill_constant_op xpu
      
      * add setConstant for complex type
      
      * remove complex conj test file
      
      * user define grad for test_conj_op
      
      * add test case for static mode of conj api
      
      * modify conj doc
      
      * change input args name to x
      
      * remove useless codes
      
      * conj support real types
      
      * add conj test case for real number
      71063b81
    • C
      [Complex] Add real & imag op and api for complex tensor (#29672) · 6cfa59de
      Chen Weihang committed
      * add complex real op & api & unittest
      
      * add imag op & api & unittest
      
      * refactor op impl
      
      * revert simplified writing due to compile failure
      
      * polish details
      
      * polish grad op code
      6cfa59de
  31. 09 Dec, 2020 2 commits
  32. 07 Dec, 2020 1 commit
  33. 01 Dec, 2020 1 commit