- 09 2月, 2022 1 次提交
-
-
由 sneaxiy 提交于
-
- 08 2月, 2022 1 次提交
-
-
由 Wilber 提交于
* gpu_context.. * update * update * update
-
- 06 2月, 2022 1 次提交
-
-
由 Wilber 提交于
-
- 27 1月, 2022 1 次提交
-
-
由 Feiyu Chan 提交于
-
- 25 1月, 2022 1 次提交
-
-
由 Weilong Wu 提交于
* Added selected_rows and rw_lock to pten * Renamed the unit test target to fix CI * Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid * Remove rw_lock.h,rw_lock_test.cc in fluid * Use pten::RWLock and pten::AutoRDLock, fix CI * Use pten::SelectedRows * Use pten::SelectedRows * Fix to pass NPU CI * Use pten::SelectedRows, to pass NPU CI * To fix NPU CI * To fix NPU CI again
-
- 24 1月, 2022 2 次提交
-
-
由 YuanRisheng 提交于
[Pten]Refactor elementwise_add grad / double grad / triple grad Kernel and move them to pten (#39048) * refactor elementwise add grad * fix compile bugs * fix unit test bugs * fix file conflicts * fix bugs when buildPtenContext
-
由 z8hanghuan 提交于
* support sparse of adam, *test=kunlun * add pre-commit-config.yaml * support sparse of adam in KL2,*test=kunlun * support sparse of adam in KL2, *test=kunlun * modify xpu.cmake, *test=kunlun * support sparse of adam, rm some wait, *test=kunlun * support sparse of adam, rm some wait, *test=kunlun * support sparse of adam, *test=kunlun * support sparse of adam, *test=kunlun * support sparse of adam, *test=kunlun * support sparse of adam, *test=kunlun * support sparse of adam, *test=kunlun
-
- 21 1月, 2022 4 次提交
-
-
由 chentianyu03 提交于
-
由 Weilong Wu 提交于
-
由 Aurelius84 提交于
* Migrate Dim and DDim from paddle::framework into pten namespace * fix paddle::framework::Array * fix framework::Array
-
由 Wilber 提交于
* add cpu_context. * update * update * update * update * update * fix ci problem * fix npu ci problem * update * fix ci compile
-
- 20 1月, 2022 1 次提交
-
-
由 Aurelius84 提交于
* Migrate bfloat16/float16/complex from platform into pten::common * fix typo * fix code style
-
- 18 1月, 2022 1 次提交
-
-
由 Zhanlue Yang 提交于
* Merged LoDTensor with Tensor,test=allcases * Patched python level LoDTensor * Patched python level LoDTensor * Merge Tensor into DenseTensor * Fixed namespace issues,test=allcases * Fixed merge issues * Fixed inference issues * Fixed NPU test issues * Fixed merge issues
-
- 17 1月, 2022 2 次提交
-
-
由 Wilber 提交于
* add pten::Place data structure. * update ci problem * fix ci problem * update * using platform::Place=pten::Place * remove BOOST_GET_CONST for CPUPlace and GPUPlace * compile pass 25%. * compile pass 45% * compile pass 60% * remove boost_get for xpu npu mlu and ipu * compile pass on cpu and gpu. * fix compile problem * fix compile error. * update * fix ci problem * update * ci approve * fix ci problem * fix ci eager test problem * remove BOOST_GET_CONST * fix npu compile
-
由 sneaxiy 提交于
-
- 15 1月, 2022 1 次提交
-
-
由 Zhanlue Yang 提交于
* Merged LoDTensor with Tensor,test=allcases * Patched python level LoDTensor * Fixed example code failure * Polished function names, removed duplicated forward declarations
-
- 13 1月, 2022 1 次提交
-
-
由 石晓伟 提交于
-
- 12 1月, 2022 2 次提交
-
-
由 Chen Weihang 提交于
* remove hybird dir * resolve conflit
-
由 limingshu 提交于
* first commit * fix wrong filename * fix the wrong spell name * fix gpu config warper * modify according to pr advices * fix GpuLauchConfig1D api bugs * change the config for dropout grad * fix bugs * modification according to pr advices * modification according to pr advices
-
- 11 1月, 2022 1 次提交
-
-
由 zyfncg 提交于
* refactor matmul directory in pten * fix merge conflict * add dot_grad kernel * add dot_grad kernel in pten * add matmul_grad kernel * update the code * delete useless code in fluid * fix some bug of running matmul grad kernel * fix merge conflict * refactor some code * refactor code
-
- 10 1月, 2022 2 次提交
-
-
由 Zhanlue Yang 提交于
* Added shared_ptr<Allocation> member & corresponding interfaces to Storage * Removed original pten::Allocation from Storage and adjusted the interfaces accordingly * Fixed issues with storage offset * Used place to malloc allocation for TensorStorage * [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor * Fixed issues with place * Added comments * Moved mutable_data with stream argument to DenseTensor * Added set_offset interface * Fixed CI issues,test=allcases * [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor * Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor * Modified framework::Tensor to inherit from DenseTensor * Reverted changes too pten_layout() interface * Removed friend classes * Rearranged cfunction calls from tensor.data<void>() to tensor.data() * Fixed CI issues * Fixed lite issues * Fixed data() interface issues,test=allcases * Resolved IsInitialized() issues * Fixed ResetHolder() issues * Fixed MKLDNN & Storage issues * Resolved ShareBufferWith() issues * Fixed LoD issues
-
由 andyjpaddle 提交于
* add maxunpool3d op * update doc for maxunpool3d op * update doc for maxunpool3d op * update doc for maxunpool3d op * update sample code for maxunpool3d * add maxunpool1d op * update some code for maxunpool1d
-
- 30 12月, 2021 2 次提交
-
-
由 Haohongxiang 提交于
* add cpu kernel of lstsq * update * modify code style * modify unittest * remove support for complex
-
由 zhangkaihuo 提交于
将cuSparse的handle与DeviceContext进行绑定,避免op中进行创建和销毁 添加对cuSparse中dense和sparse转换的API进行封装 添加对封装的API的单测
-
- 24 12月, 2021 1 次提交
-
-
由 zhiboniu 提交于
-
- 20 12月, 2021 2 次提交
- 17 12月, 2021 1 次提交
-
-
由 zlsh80826 提交于
From --ptxas-options=-v, SegmentOpsKernel uses 66 registers in a block. There are two ways to resolve this problem: Reduce the threads per block launch configuration add __launch_bound__ to give information to nvcc compiler for reducing registers usage this PR chooses __launch_bound__ solution because changing gpu_launch_config may affect other ops.
-
- 09 12月, 2021 2 次提交
-
-
由 jianghaicheng 提交于
-
由 Chen Weihang 提交于
-
- 08 12月, 2021 1 次提交
-
-
由 sneaxiy 提交于
* fix CUDA Graph H2D bug again * fix no return bug
-
- 03 12月, 2021 1 次提交
-
-
由 ronnywang 提交于
* refine structure for cuda and rocm * update * update * update * update
-
- 30 11月, 2021 1 次提交
-
-
由 Guoxia Wang 提交于
* support data_format='NHWC' for prelu channel mode
-
- 27 11月, 2021 1 次提交
-
-
由 Aganlengzi 提交于
* [NPU] reorganization for device API abstraction * [NPU] delete old files * [NPU] fix npu_collective_helper * [NPU] fix collective_helper * [NPU] fix ut * [NPU] mod memory allocation and hccl_helper * [NPU] fix place_type * [NPU] split enfoce.h * move acl* call into npu_info * merge conflict * fix merge * merge conflict * merge conflict
-
- 25 11月, 2021 1 次提交
-
-
由 Zhanlue Yang 提交于
* Added GradTensorHolder to Eager Dygraph * Added accumulation codes to Eager Dygraph * Fix windows-ci issue * Fix NPU-CI issue * Fixed CI-Coverage issue
-
- 24 11月, 2021 1 次提交
-
-
由 zyfncg 提交于
* add scalar and scalar_array * remove DenseTensor include from Scalar and ScalarArray * remove inner header from scalar_array * refactor the method of fill_constant and add some comment
-
- 23 11月, 2021 1 次提交
-
-
由 ronnywang 提交于
* Added HCCL backend support in dynamic graph mode * fix segmentation fault * add ut
-
- 09 11月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
* try to fix CUDA Graph H2D copy bug * remove useless code * fix ci * fix ROCM CI * fix CUDA_VERSION * improve CI coverage
-
- 29 10月, 2021 1 次提交
-
-
由 zhouweiwei2014 提交于
* add new API: paddle.linalg.triangular_solve * add new API/OP: paddle.linalg.triangular_solve * add new API/OP: paddle.linalg.triangular_solve * fix comment
-
- 28 10月, 2021 1 次提交
-