- 24 Dec, 2021 13 commits
-
-
Committed by chentianyu03
* combine reduce_cuda codes
* support float16 in pten reduce_mean
* replace ReduceCudaKernel impl with pten reduce impl
* mv reduce funcs into reduce_cuda_impl
* rm unused codes and headers
* mv GetReduceDim into reduce_cuda_impl
* recover GetReduceDim in reduce_op.h
* add new dispatch macro
* fix pool op output not inited and cause transform to pten::denseTensor error
* fix output tensor not initialized error
* rename new dispatch macro and format code style
* rm reduce_functor_op.h file
-
Committed by Zhanlue Yang
[Unify Tensors PR #1] Replaced pten::Allocation with shared_ptr<memory::Allocation> for Storage (#38301)
* Added shared_ptr<Allocation> member & corresponding interfaces to Storage
* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly
* Fixed issues with storage offset
* Used place to malloc allocation for TensorStorage
-
Committed by zmxdream
* remove pre-init id in common_sparse_tabl.cc
-
Committed by zhouweiwei2014
* add new API/OP: paddle.Tensor.exponential_
* fix CI
-
Committed by 努力努力在努力丶
* [MLU] add mlu op interface
* [MLU] fix alpha of activation op
-
Committed by yaoxuefeng
add pull gpups sparse op
-
Committed by Baibaifan
-
Committed by 王明冬
-
Committed by Chen Weihang
-
Committed by zhiboniu
-
Committed by zhouweiwei2014
* add new API/OP: paddle.poisson
* fix comment
-
Committed by baoachun
* add conv+hard_sigmoid fuse pass ut
* update conv_elementwise_add_mkldnn_fuse_pass ut
* update conv_hard_sigmoid_mkldnn_fuse_pass ut
* update conv+hard_sigmoid and conv+hard_swish fuse pass ut
* update ut
* update ut
-
Committed by Jiabin Yang
* Rearranged Eager AutoCodeGen directory structure
* Removed USE_OP in Eager AutoCodeGen
* Enabled generation for Operators without Grad/Inputs/Outputs
* Resolved operators without input
* Fixed merge conflicts
* Enabled Eager AutoCodeGen for 10+ more operators
* Refactored Eager AutoCodeGen with more organized helper objects
* Enabled Eager AutoCodeGen for operators with multiple OpBases
* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument
* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen
* Adjusted function generation/call between Python-C API & Dygraph API
* Synchronized auto-generated Python-C API with Dygraph Forward Functions
* support more eager tensor api
* fix merge compile error
* fix compile error and fit develop code
* support pure CPU
* fix some logic error in eager_mode
* support _varbase_creator in eager mode
* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs
* for eager mode
* refine
* support multiple constructor for eager tensor
* add place related code
* polish code
* specific randint with dtype of int64
* Support pure cpu test
* eager logic
* refine test in pure cpu
* eager logic
* eager logic
* eager logic, test=develop
* skip core.eager when in inference, test=develop
* refine, test=develop
* refine, test=develop
* call RetainGrad after run forward kernel, test=develop
* refine, test=develop
* support dygraph util, meta, guard test
* support inference test
* refine test and fix initializer failed
Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: Wang Huan <wanghuan29@baidu.com>
-
- 23 Dec, 2021 16 commits
-
-
Committed by Chen Weihang
-
Committed by Jacek Czaja
* First set of fixes
* Make GetBlob more likely to find a blob
* Lint
-
Committed by Sing_chan
* block warning when build demo_ci and infer_ut
* use build pipeline clone to test
-
Committed by yaoxuefeng
add mem pool
-
Committed by wuhuanzhou
* add erfinv API, test=develop
* fix gradient accuracy error, test=develop
* fix cuda compilation error on Windows, test=develop
* fix M_2_SQRTPI undeclared identifier on Windows, test=develop
-
Committed by liutiexing
* add align for WorkQueue
* add spinlock
* merge develop
* merge
* Add EventsWaiter
* Revert "Add EventsWaiter"
  This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.
* update EventsWaiter
* fix
* split workqueue files
* add more tests
* fix
* bugfix
* bugfix
* update
Co-authored-by: liutiexing <liutiexing@google.com>
-
Committed by zyfncg
* add empty and empty_like kernel in pten
* add empty dev_api
-
Committed by Wilber
* support external stream.
* update
* update
* update
-
Committed by houj04
-
Committed by baoachun
* add mkldnn conv_elementwise_add_mkldnn_fuse_pass ut
* update mkldnn conv_elementwise_add_mkldnn_fuse_pass ut
* update conv_elementwise_add_mkldnn_fuse_pass ut
* update conv_elementwise_add_mkldnn_fuse_pass ut
* update conv_elementwise_add_mkldnn_fuse_pass ut
* restrict conv2d data_format in conv_elementwise_add_mkldnn_fuse_pass
* update conv_elementwise_add_mkldnn_fuse_pass OpCompat
* update conv_elementwise_add_mkldnn_fuse_pass ut
* update ut
-
Committed by 王明冬
-
Committed by zhouweiwei2014
* add new API: paddle.clone; Tensor.element_size; nn.utils.parameters_to_vector
* fix comment
-
Committed by heliqi
* add flatten2_matmul squeeze2_matmul reshape2_matmul test case
* modify skip func to ignore_pass_case func
* rebuild CI
* add test_xx_matmul_fuse_pass timeout
* add test_map_xx_pass timeout
* add max_duration of test case
* add trt skip
* add timeout
* del commented code
-
Committed by Chen Weihang
-
Committed by Chen Weihang
* move dot kernel impl
* remove needless cmake items
-
Committed by 石晓伟
* updates the pten allocation, test=develop
* avoids an error message, test=develop
-
- 22 Dec, 2021 11 commits
-
-
Committed by tianshuo78520a
-
Committed by crystal
* optimize gelu backward
* optimize gelu backward
* optimize code
* Number to expression
* Replacement number
-
Committed by Yang
-
Committed by Chen Weihang
* change functions to funcs
* remove useless code
-
Committed by Chen Weihang
* add pten kernel cmake
* add pten kernel cmake function
* fix compile error
* add enforce include for full kernel
* fix compile failed
* change cuda to gpu
* fix cmake function error
-
Committed by baoachun
* add mkldnn reshape_transpose_matmul fuse pass ut and op version check
* update reshape_transpose_matmul_mkldnn_fuse_pass ut
* update ut
-
Committed by baoachun
* update mkldnn batch_norm_activation fuse pass ut
* update ut
* update mkldnn batch_norm_act_fuse_pass ut
* update batch_norm_act_fuse_pass ut
* update ut
-
Committed by 王明冬
-
Committed by LiYuRio
-
Committed by Chen Weihang
-
Committed by YuanRisheng
* move flatten
* fix bugs of test
* modify header file
* add copy declare
* fix compile bugs
-