- 21 1月, 2022 2 次提交
-
-
由 Aurelius84 提交于
* Migrate Dim and DDim from paddle::framework into pten namespace * fix paddle::framework::Array * fix framework::Array
-
由 Wilber 提交于
* add cpu_context. * update * update * update * update * update * fix ci problem * fix npu ci problem * update * fix ci compile
-
- 20 1月, 2022 1 次提交
-
-
由 Aurelius84 提交于
* Migrate bfloat16/float16/complex from platform into pten::common * fix typo * fix code style
-
- 18 1月, 2022 1 次提交
-
-
由 Zhanlue Yang 提交于
* Merged LoDTensor with Tensor,test=allcases * Patched python level LoDTensor * Patched python level LoDTensor * Merge Tensor into DenseTensor * Fixed namespace issues,test=allcases * Fixed merge issues * Fixed inference issues * Fixed NPU test issues * Fixed merge issues
-
- 17 1月, 2022 2 次提交
-
-
由 Wilber 提交于
* add pten::Place data structure. * update ci problem * fix ci problem * update * using platform::Place=pten::Place * remove BOOST_GET_CONST for CPUPlace and GPUPlace * compile pass 25%. * compile pass 45% * compile pass 60% * remove boost_get for xpu npu mlu and ipu * compile pass on cpu and gpu. * fix compile problem * fix compile error. * update * fix ci problem * update * ci approve * fix ci problem * fix ci eager test problem * remove BOOST_GET_CONST * fix npu compile
-
由 sneaxiy 提交于
-
- 15 1月, 2022 1 次提交
-
-
由 Zhanlue Yang 提交于
* Merged LoDTensor with Tensor,test=allcases * Patched python level LoDTensor * Fixed example code failure * Polished function names, removed duplicated forward declarations
-
- 13 1月, 2022 1 次提交
-
-
由 石晓伟 提交于
-
- 12 1月, 2022 2 次提交
-
-
由 Chen Weihang 提交于
* remove hybird dir * resolve conflit
-
由 limingshu 提交于
* first commit * fix wrong filename * fix the wrong spell name * fix gpu config warper * modify according to pr advices * fix GpuLauchConfig1D api bugs * change the config for dropout grad * fix bugs * modification according to pr advices * modification according to pr advices
-
- 11 1月, 2022 1 次提交
-
-
由 zyfncg 提交于
* refactor matmul directory in pten * fix merge conflict * add dot_grad kernel * add dot_grad kernel in pten * add matmul_grad kernel * update the code * delete useless code in fluid * fix some bug of running matmul grad kernel * fix merge conflict * refactor some code * refactor code
-
- 10 1月, 2022 2 次提交
-
-
由 Zhanlue Yang 提交于
* Added shared_ptr<Allocation> member & corresponding interfaces to Storage * Removed original pten::Allocation from Storage and adjusted the interfaces accordingly * Fixed issues with storage offset * Used place to malloc allocation for TensorStorage * [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor * Fixed issues with place * Added comments * Moved mutable_data with stream argument to DenseTensor * Added set_offset interface * Fixed CI issues,test=allcases * [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor * Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor * Modified framework::Tensor to inherit from DenseTensor * Reverted changes too pten_layout() interface * Removed friend classes * Rearranged cfunction calls from tensor.data<void>() to tensor.data() * Fixed CI issues * Fixed lite issues * Fixed data() interface issues,test=allcases * Resolved IsInitialized() issues * Fixed ResetHolder() issues * Fixed MKLDNN & Storage issues * Resolved ShareBufferWith() issues * Fixed LoD issues
-
由 andyjpaddle 提交于
* add maxunpool3d op * update doc for maxunpool3d op * update doc for maxunpool3d op * update doc for maxunpool3d op * update sample code for maxunpool3d * add maxunpool1d op * update some code for maxunpool1d
-
- 30 12月, 2021 2 次提交
-
-
由 Haohongxiang 提交于
* add cpu kernel of lstsq * update * modify code style * modify unittest * remove support for complex
-
由 zhangkaihuo 提交于
将cuSparse的handle与DeviceContext进行绑定,避免op中进行创建和销毁 添加对cuSparse中dense和sparse转换的API进行封装 添加对封装的API的单测
-
- 24 12月, 2021 1 次提交
-
-
由 zhiboniu 提交于
-
- 20 12月, 2021 2 次提交
- 17 12月, 2021 1 次提交
-
-
由 zlsh80826 提交于
From --ptxas-options=-v, SegmentOpsKernel uses 66 registers in a block. There are two ways to resolve this problem: Reduce the threads per block launch configuration add __launch_bound__ to give information to nvcc compiler for reducing registers usage this PR chooses __launch_bound__ solution because changing gpu_launch_config may affect other ops.
-
- 09 12月, 2021 2 次提交
-
-
由 jianghaicheng 提交于
-
由 Chen Weihang 提交于
-
- 08 12月, 2021 1 次提交
-
-
由 sneaxiy 提交于
* fix CUDA Graph H2D bug again * fix no return bug
-
- 03 12月, 2021 1 次提交
-
-
由 ronnywang 提交于
* refine structure for cuda and rocm * update * update * update * update
-
- 30 11月, 2021 1 次提交
-
-
由 Guoxia Wang 提交于
* support data_format='NHWC' for prelu channel mode
-
- 27 11月, 2021 1 次提交
-
-
由 Aganlengzi 提交于
* [NPU] reorganization for device API abstraction * [NPU] delete old files * [NPU] fix npu_collective_helper * [NPU] fix collective_helper * [NPU] fix ut * [NPU] mod memory allocation and hccl_helper * [NPU] fix place_type * [NPU] split enfoce.h * move acl* call into npu_info * merge conflict * fix merge * merge conflict * merge conflict
-
- 25 11月, 2021 1 次提交
-
-
由 Zhanlue Yang 提交于
* Added GradTensorHolder to Eager Dygraph * Added accumulation codes to Eager Dygraph * Fix windows-ci issue * Fix NPU-CI issue * Fixed CI-Coverage issue
-
- 24 11月, 2021 1 次提交
-
-
由 zyfncg 提交于
* add scalar and scalar_array * remove DenseTensor include from Scalar and ScalarArray * remove inner header from scalar_array * refactor the method of fill_constant and add some comment
-
- 23 11月, 2021 1 次提交
-
-
由 ronnywang 提交于
* Added HCCL backend support in dynamic graph mode * fix segmentation fault * add ut
-
- 09 11月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
* try to fix CUDA Graph H2D copy bug * remove useless code * fix ci * fix ROCM CI * fix CUDA_VERSION * improve CI coverage
-
- 29 10月, 2021 1 次提交
-
-
由 zhouweiwei2014 提交于
* add new API: paddle.linalg.triangular_solve * add new API/OP: paddle.linalg.triangular_solve * add new API/OP: paddle.linalg.triangular_solve * fix comment
-
- 28 10月, 2021 1 次提交
-
- 26 10月, 2021 1 次提交
-
-
由 feng_shuai 提交于
-
- 28 9月, 2021 1 次提交
-
-
由 Lijunhui 提交于
* Add paddle.linalg.eig op * remove comments * remove comments * extend batch_size to the origin * add real times complex functor & destroy the backward complex output bug * terminate output diff when input real tensors * correct tiny doc errors * move functions from eig_helper to svd_helper and remove eig_helper * remove tensor.Resize * remove no longer used code * use existing lapack functions * reply review comments 21/27 * remove .cu as this op is only executed on CPU * remove const_cast & add const in argument list for read-only references * fix sample code error in CI * remove template typename Tbase and more * remove eig exposure in paddle.* * add 'name=None' in eig python implementation * handle the unittest * try to solve the unittest * solve CI coverage * remove no longer used code * polish API doc and more * reply review comments * polish unittest, commit plan B * polish unittest
-
- 26 9月, 2021 1 次提交
-
-
由 crystal 提交于
-
- 24 9月, 2021 1 次提交
-
-
由 Weilong Wu 提交于
* Add linalg.solve op, test=develop * Fix a bug caused by accidental deletion * updated description and fix a bug: missing a comma * Add linalg.solve op, test=develop * updated solve op backward logic * updated solve op backward logic again * Add linalg.solve Op, test=develop * Updated and modified to fit CI requirements * Fix a bug * 1)Add more test cases; 2)Fix a wrong usage in reduces operation; 3)Remove redundant code * Remove redundant comments * 1)Removed redundant code; 2)Updated to enhance code robustness * Removed redundant code * Updated API documents
-
- 23 9月, 2021 1 次提交
-
-
由 From00 提交于
-
- 22 9月, 2021 2 次提交
-
-
由 TeslaZhao 提交于
* Pass compat of conv_transpose_bias_mkldnn_fuse_pass * Fix a bug of strided_slice op, about the axes parameter access memory out of bounds * Fix a bug of transpose op, about accessing memory out of bounds of the perm param * op:transpose_op supports bool type
-
由 zhouweiwei2014 提交于
* support extern third_party lapack on Linux/Windows/Mac * fix ci
-
- 21 9月, 2021 1 次提交
-
-
由 Adam Osewski 提交于
* Create stateful OneDNNAXPYHandler object. This makes it possible to call it multiple times without recreating the oneDNN primitives every time. * Prepare SGDOpKernel to reuse its implementation from OneDNN kernel. * OneDNN SGD kernel. * Update call to use new OneDNNAXPYHandler object api. * Setup seed in proper place. * Enable OneDNN kernel only for single case. * For dense param and sparse grad. * Small refactor. * Enable oneDNN by op attr or by cmd line flag. * Use int64_t type for number of elements. * Support dense param and grad from OneDNN kernel. * Enable SGD OneDNN kernel when use MP BF16 optimizer. * Force non-copyable/movable OneDNNAXPYHandler. * Reuse OneDNNAXPYHandler for spare tensors in SUM op. * Fix SFINAE rules. * Remove recording event inside AXPY. * Get rid of internal primitive caching. * Stop use PP cache mechanims to store mem and primitive obj. * Handler obj store and reuse needed desc & prim * Do not derive from MKLDNNHandlerT
-
- 19 9月, 2021 1 次提交
-
-
由 limingshu 提交于
* Optimization of pool2d grad, first commit. * remove useless print codes * refine codes * refine codes * seal more operation into template specialization * fix template struct error in MaxPool2dGrad. * Fix header including error * refine code with comment * Seal the param-preparation codes into function for common use. * Seal the param-preparation codes into function for common use. * Seal the param-preparation into funciton and make it common for other kernels * polish code and erase useless template speicalization * Rerun triger * rerun trigger
-