- 27 Jan, 2022 11 commits
-
-
Committed by Aganlengzi
* [Demo] custom kernel based on pten kernel * merge and npu custom work well * del comments * delete other code * fix CUDAContext * fix not found small_vector.h * support NPU * fix NPUContext * fix DeviceContext support * add UT * fix call * add UT * fix * fix for comments and ut * add MACRO control * fix multi input output * support env CUSTOM_DEVICE_ROOT * deal with special cases * fix for Windows * try coverage with test_custom_kernel_dot.py * fix test_custom_kernel_dot * fix test_custom_kernel_dot * fix merge * fix merge * fix CI * update * merge and fix * remove WITH_CUSTOM_KERNEL * fix merge * merge and fix * fix ut * fix ut for mac * add more UT * add more UT * fix
-
Committed by zhouweiwei2014
-
Committed by joanna.wozna.intel
* Update pass in quant2_int8_mkldnn_pass * Back to the previous scale_matmul order * Change place of cpu_quantize_placement_pass
-
Committed by wenbin
* shuffle channel pass * add ut * timeout fix * makefile fix
-
Committed by caozhou
* update planner * update unitest * update dist matmul * update auto converter
-
Committed by QingshuChen
* optimize kunlun/xpu softmax_with_cross_entropy add unitest *test=kunlun * minor *test=kunlun * minor *test=kunlun * minor *test=kunlun * minor *test=kunlun
-
Committed by zhangkaihuo
* fix bug: 1. atten: set the default value of attn_dropout_rate to None 2. ffn: add activation parameter * for pure fp16 * Add a SparseCsrTensor * remove unused functional * remove const * remove SetMemoberTensor * remove non_zero_nums_, the number of non zero elements of each batch can be obtained from the crows * SparseCooTensor * add SetMember * merge upstream; add SetMember * merge upstream * merge upstream; add newline at end of file * add newline at end of file * remove newline at end of file * remove newline at end of file * stash * user pten::framework::make_ddim * user pten::framework::make_ddim * merge upstream; use the latest mutable_data * merge upstream; use the latest mutable_data * return mutable dense tensor
-
Committed by caozhou
* update dist param grad for pass * update unitest * update unitests * fix conflict
-
Committed by Wangzheee
* Paddle-Inference:fix_concat_slice * Paddle-Inference:fix_concat_slice * Paddle-Inference:fix_concat_slice * Paddle-Inference:fix_concat_slice * [Paddle-Inference]: fix concat slice * [Paddle-Inference]: fix concat slice * [Paddle-Inference]: fix concat slice
-
Committed by huangxu96
Support the cases that the indices shape size is larger than the arr shape size
-
Committed by zhangbo9674
* add master weight for opt state_dict * check empty of master weight * strict gpu test * refine unittest
-
- 26 Jan, 2022 10 commits
-
-
Committed by hlygit66666
* add fuse_relu_depthwise_conv_pass unittest * fix atol and rtol * fix according to review * add FuseBatchNormAddActPass and unittest * Update test_dist_fuse_bn_add_act_pass.py * solve conflict
-
Committed by Weilong Wu
* Added selected_rows and rw_lock to pten * Renamed the unit test target to fix CI * Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid * Remove rw_lock.h,rw_lock_test.cc in fluid * Use pten::RWLock and pten::AutoRDLock, fix CI * Use pten::SelectedRows * Use pten::SelectedRows * Fix to pass NPU CI * Selected_Rows inherits from TensorBase * Use pten::SelectedRows, to pass NPU CI * To fix NPU CI * To fix NPU CI again * Use paddle/pten/core/enforce and polish code * Support imperative selected_rows_to_lod_tensor * Polish code
-
Committed by qipengh
* [MLU]Add conv2d op * [MLU]fix comment * [MLU]adapt NCHW of conv2d op
-
Committed by yaozhixin
-
Committed by yaozhixin
-
Committed by zyfncg
-
Committed by Li Min
* Optimize layer_norm fwd when cols is 1024.
-
Committed by yaozhixin
-
Committed by houj04
* add sigmoid cross entropy with logits to kl2. test=kunlun * add sigmoid cross entropy with logits to kl2. test=kunlun * follow comments. test=kunlun
-
Committed by joeqiao12
-
- 25 Jan, 2022 17 commits
-
-
Committed by YuanRisheng
-
Committed by hlygit66666
* add fuse_relu_depthwise_conv_pass unittest * fix atol and rtol * fix according to review * Add fuse_bn_act_pass unittest * rm others * add fuse_bn_act_pass
-
Committed by Zhang Jun
* [inference] update convert reduce op&ut,test=develop * update * update * update * add int32 support * add int32 support * add comments * trt < 7.0 do not support int32 * test=develop * update * test=develop
-
Committed by joeqiao12
* [MLU]add mlu kernel for fill_constant op * delete device_context DEPS
-
Committed by 石晓伟
-
Committed by feng_shuai
-
Committed by sneaxiy
* assert _compile_dir include file existence * polish
-
Committed by fwenguang
-
Committed by joeqiao12
* [MLU]add mlu kernel for concat and split op * delete device_context DEPS
-
Committed by Yuang Liu
-
Committed by Baibaifan
-
Committed by Haohongxiang
* support param groups in grad_clip * update * modify for review
-
Committed by kuizhiqing
-
Committed by TTerror
-
Committed by Noel
-
Committed by caozhou
* update reshard for newest completion * update unitest * merge newest
-
Committed by Zhanlue Yang
-
- 24 Jan, 2022 2 commits
-
-
Committed by Tongxin Bai
* [autograd] static Jacobian pass tests. * [autograd] apply CR suggested changes. * [autograd] more tests. * [autograd] add CPUPlace in tests. * [autograd] bug fixes. * [autograd] reformatted.
-
Committed by sneaxiy
-