- 29 Nov, 2021 3 commits
-
-
Committed by TTerror
* add expand_v2/expand_as_v2 for kunlun * update expand_as_v2 * update expand_as_v2 * support float16/bool * update xpu.cmake
-
Committed by piotrekobiIntel
-
Committed by wanghuancoder
* support fetch lodtensor array, test=develop * refine, test=develop * refine, test=develop * refine, test=develop
-
- 27 Nov, 2021 1 commit
-
-
Committed by Aganlengzi
* [NPU] reorganization for device API abstraction * [NPU] delete old files * [NPU] fix npu_collective_helper * [NPU] fix collective_helper * [NPU] fix ut * [NPU] mod memory allocation and hccl_helper * [NPU] fix place_type * [NPU] split enfoce.h * move acl* call into npu_info * merge conflict * fix merge * merge conflict * merge conflict
-
- 26 Nov, 2021 2 commits
-
-
Committed by zhaocaibei123
* test * test * rm test * update * update * update * add unittest * update * update save
-
Committed by Chen Weihang
-
- 25 Nov, 2021 7 commits
-
-
Committed by zyfncg
* add scalar and scalar_array * remove DenseTensor include from Scalar and ScalarArray * remove inner header from scalar_array * refactor the method of fill_constant and add some comment * add fill_constant kernel using ScalarArray * modify some prompt * remove fill_constant kernel with no shape
-
Committed by furnace
* [NPU] add int64 support for argsort op * [NPU] delete debug codes
-
Committed by furnace
* [NPU] add NPU kernel for prior_box op * [NPU] delete debug codes
-
Committed by Yiqun Liu
-
Committed by Zhen Wang
-
Committed by Zhanlue Yang
* Added GradTensorHolder to Eager Dygraph * Added accumulation codes to Eager Dygraph * Fix windows-ci issue * Fix NPU-CI issue * Fixed CI-Coverage issue
-
Committed by xiongkun
* clear LoDTensorArray * fix bugs * fix * fix gpu
-
- 24 Nov, 2021 5 commits
-
-
Committed by piotrekobiIntel
* Add second batch of deprecated mkldnn namespace and macro changes * Unlock CI * Fix temporary namespace alias placing
-
Committed by Aurelius84
-
Committed by YuanRisheng
* elementwise_mul refactor * perfect code in test * delete redundant code * fix bugs when run test_multiply * adjust the location of macro * fix bugs when run ci
-
Committed by zyfncg
* add scalar and scalar_array * remove DenseTensor include from Scalar and ScalarArray * remove inner header from scalar_array * refactor the method of fill_constant and add some comment
-
Committed by feng_shuai
-
- 23 Nov, 2021 6 commits
-
-
Committed by Qi Li
* [XPU] Reorganize xpu device codes in platform, test=develop * fix xpu_header.h, test=develop
-
Committed by Li Min
Add support for bias is none for fused_attention op.
-
Committed by sneaxiy
* enhance scatter err msg check * fix ci error
-
Committed by YuanRisheng
* elementwise_div refactor * fix compile bugs in windows ci
-
Committed by ronnywang
* Added HCCL backend support in dynamic graph mode * fix segmentation fault * add ut
-
Committed by Aurelius84
* Add transfer_layout/dtype op * clean useless codes * fix unused var * add optest in white.txt * split into data_transfer.cc * fix cmake * modify according reviewer comment * replace cast_op with transfer_dtype_op
-
- 22 Nov, 2021 6 commits
-
-
Committed by Feiyu Chan
* disable copying of datatype when sharing buffer between two tensors. * fix for mkldnn operator kernels (elementwise_add, sum, softplus, softmax, scale, activation), manually set the data type when reusing memory by ShareBufferWith.
-
Committed by andyjpaddle
* add isclose op, test=develop * add isclose op, test=develop * add isclose api, test=develop * rm useless code * rm useless code * update python api of isclose * add some unittest of isclose op, test=develop
-
Committed by zhupengyang
-
Committed by zyfncg
* support zero dim for slice op * support zero dim Tensor in set_value op * polish some debug log
-
Committed by chentianyu03
* add cast kernel * add cast cuda kernel * add cast kernel * make cast kernel output dtype undefined * get cast dtype from vardesc * move cast to manipulation and add test case * add castinfershape * avoid reinitilaze variable * InitializeVariable support datatype * merge develop branch * fix merge bug * revert modify initializeVariable * revert modify on InitializeVariable * revert modify on InitializeVariable * mutable support reset dtype * enable make pten tensor from variable when def_arg.type is undefined * fix build pten ctx start_idx error * copy pten out tensor to variable * merge develop branch * fix non pten kernel cast failed * add reset allocation place for remake tensor * fix inplace realloc error * add mutable on pten kernles and remove unused cast files * rename function names * fix output type error * fix conflict with develop branch * set data type to variable with pten's dtype * fix test_cast_api type mismatch * densorTensro mutable_data support 0 bytes value * fix the inplace bug of reshape kernel * fix pten.backend != variable.place when moving storage, palce mismatch bug * fix conflict with develop branch * Fix bug of paddle::experimental::MovesStorage * fix ReMakePtenDenseTensor place mismatch bug * Revert "fix ReMakePtenDenseTensor place mismatch bug" This reverts commit 86336032f60b8a15eacd2c1ff2fa513f5d8dfd1a. * fix ReMakePtenDenseTensor place mismatch bug * reverts the set_lod interface, test=develop * modify by the review options * modify error message * add & for const input arguments * add reference in params * elementwise_sub add mutable_data * fix ResetHolderWithType check size bug * add dependence pten_tensor to test_cast_api object * remove unused code to pass ci coverage Co-authored-by: Chen Weihang <chenweihang@baidu.com> Co-authored-by: YuanRisheng <yuanrisheng@baidu.com> Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
-
Committed by Leo Chen
-
- 19 Nov, 2021 6 commits
-
-
Committed by lilong12
-
Committed by jiangcheng
* optimize cache-key by replace GraphToProgram to Dot string * fix compile failure bug
-
Committed by wuhuanzhou
* GeneratePass support attr condition and mapping, test=develop * fix coverage, test=develop * Add fuse_resnet_unit pass, test=develop * fix CI errors, test=develop * fix CI errors, test=develop * fix unittest error when compiling without CUDA, test=develop * fix static ci error, test=develop * limit kernel size must equal 1, test=develop
-
Committed by Feiyu Chan
-
Committed by Siming Dai
* add cpu version, using set: sum, min, max * add cpu version: mean * improve cpu code and fix dynamic memory allcation problem * fix arg error, add index judge, delete fp16 * fix bug in CudaAtomicMax and CudaAtomicMin * add CUDA version * fix grad_op bug for index * add op test, add correct cpu grad op * Add correct CUDA Mean grad * [Add] Successful MEAN and SUM * [Add] Successful MIN and MAX in CPU * [Add] Successful MIN and MAX in CUDA * fix windows dtype ci * fix ROCM ci by adding HIP flag * rename fused_gather_scatter to send_recv * unify name as send and recv * change zero index return time * add send_recv incubate api * fix index data type, add unittest case for API * delete redundant input tensor * fix en example and docs, add default value in pool_type * add shape judge and max grid judge * fix comment * fix index type bug * add const & * fix en docs * delete numpy in examples * add unittest for int input * fix send_recv comment * change send_recv to graph_send_recv
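For readers unfamiliar with the operator this commit introduces, the following is a minimal pure-Python sketch of the gather-then-scatter-reduce idea behind `graph_send_recv`, with the four pool types the commit mentions (sum, mean, min, max). It is an illustration only, not the Paddle kernel: the function name `send_recv_sketch`, the list-of-lists input, and the choice to leave rows that receive no message at zero are assumptions made for this example.

```python
from typing import List, Sequence


def send_recv_sketch(
    x: Sequence[Sequence[float]],
    src_index: Sequence[int],
    dst_index: Sequence[int],
    pool_type: str = "sum",
) -> List[List[float]]:
    """Gather rows of x by src_index, then reduce them into rows given by dst_index.

    Hypothetical helper for illustration; rows that receive no message stay zero
    (an assumption of this sketch).
    """
    if len(src_index) != len(dst_index):
        raise ValueError("src_index and dst_index must have the same length")
    if pool_type not in ("sum", "mean", "min", "max"):
        raise ValueError("unsupported pool_type: " + pool_type)

    num_rows = len(x)
    width = len(x[0]) if num_rows else 0

    # Collect, for every destination row, the source rows sent to it.
    buckets: List[List[Sequence[float]]] = [[] for _ in range(num_rows)]
    for s, d in zip(src_index, dst_index):
        buckets[d].append(x[s])

    out: List[List[float]] = []
    for messages in buckets:
        if not messages:
            out.append([0.0] * width)
            continue
        columns = list(zip(*messages))  # reduce column by column
        if pool_type == "sum":
            row = [float(sum(c)) for c in columns]
        elif pool_type == "mean":
            row = [sum(c) / len(c) for c in columns]
        elif pool_type == "min":
            row = [float(min(c)) for c in columns]
        else:  # "max"
            row = [float(max(c)) for c in columns]
        out.append(row)
    return out


if __name__ == "__main__":
    features = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]
    print(send_recv_sketch(features, [0, 1, 2, 0], [1, 2, 1, 0], "sum"))
```

Grouping messages per destination first keeps the four reductions symmetric; a real kernel would more likely reduce in place with atomic operations, as the CudaAtomicMax/CudaAtomicMin items in the commit suggest.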
-
Committed by LiYuRio
-
- 18 Nov, 2021 4 commits
-
-
Committed by Li Min
* fix bug to support dropout eval grad computing. * Remove useless code.
-
Committed by YuanRisheng
* elementwise_add kernel refactor * fix compile bugs in elementwise_add refactor * fix compile bugs when run in npu/xpu * fix bugs when run unit test * fix bugs when run ci-windows * modify code as recommended * code format adjust * fix bugs when run ci * fix compile bug when run in ci-windows * elementwise_sub refactor * add PD_DLL_DECL for elementwise_sub * fix bugs when compile
-
Committed by Zhen Wang
* Add the `GetFetchNames` method in CinnGraphSymbolization. * Use unordered_set instead vector as the type of fetch_var_names. * Reuse the definition of kCompilationKey. * Use CompileOptions to set fetch_var_ids. * Update the argument passing of GraphCompiler.Build. * Fix some bugs in CinnGraphSymbolization::GetFetchIds.
-
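Committed by zhangkaihuo
topk has two implementations: cub and a hand-written kernel. cub obtains the top k by sorting; measurements on multiple data sets show that it outperforms the hand-written kernel only when input_width >= 128 and k exceeds 75% of input_width.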
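The selection rule described in this commit message can be sketched as a small predicate. This is a hedged illustration, not the code from the commit: the function name `prefer_cub_sort` is hypothetical, and reading "exceeds 75%" as a strict inequality is an assumption.

```python
def prefer_cub_sort(input_width: int, k: int) -> bool:
    """Return True when the sort-based cub path is expected to win.

    Hypothetical helper: encodes the rule from the commit message,
    input_width >= 128 and k greater than 75% of input_width.
    Integer arithmetic (4 * k > 3 * input_width) avoids float rounding.
    """
    return input_width >= 128 and 4 * k > 3 * input_width


if __name__ == "__main__":
    for width, k in [(64, 64), (128, 96), (128, 97), (1000, 800)]:
        path = "cub sort" if prefer_cub_sort(width, k) else "hand-written kernel"
        print(f"input_width={width}, k={k}: {path}")
```

In every other case, the hand-written kernel remains the choice under this rule.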
-