- 11 Feb, 2022 5 commits
-
-
Committed by Feiyu Chan
* move operators/math/math_function_* to pten/kernels/func * namespace from `paddle::operators::math` to `pten::funcs`
-
Committed by Zhang Zheng
* Optimize performance of softmax_bwd when axis!=-1 * fix * fix * fix * fix
-
Committed by Lijunhui
* bilinear_fw init * optimize code * pre-compute linear_interp input index
-
Committed by Chen Weihang
* move grad get expected pten kernel args * fix reduce sum error * fix element_sub_grad failed * revert kernel judge change
-
Committed by Zhang Ting
* improve backward performance * support different dtypes for elementwise ops
-
- 10 Feb, 2022 7 commits
-
-
Committed by fwenguang
* [MLU] add mlu kernel for accuracy op * fix license format * fix error message
-
Committed by furnace
[NPU] add reduce_min
-
Committed by hong
* move masked select cpu kernel * add masked selected gpu kernel; test=develop * fix bugs; test=develop * bug fix; test=develop * bug fix; test=develop * add namespace to set mask array; test=develop * fix bug; test=develop * fix bugs; test=develop * fix ddim bug; test=develop * fix npu op bug; test=develop * fix xpu dependency bug; test=develop * move kernel args to sig.cc; test=develop
-
Committed by crystal
* optimize conv1d forward * add conv opt * Optimize memory copy * delete share data with * set num_filters=512 * add nlc optimize * Optimize num_filter=512 data on A100 and V100 * Fix the workspace_size size setting of filter
-
Committed by zhangbo9674
* add squeeze unsqueeze stack * add unittest * add cpu kernel
-
Committed by zhangbo9674
* add dropout * add reshape * add slice * refine slice unittest * refine slice unittest * add cpu bf16 kernel
-
Committed by Leo Chen
* update isnan registration * fix compile
-
- 09 Feb, 2022 13 commits
-
-
Committed by Zhang Zheng
* Optimize performance of softmax_fwd when axis!=-1 * use functor * support hip * fix functor
-
Committed by niuliling123
-
Committed by mhhhh1
-
Committed by fwenguang
-
Committed by fwenguang
-
Committed by fwenguang
-
Committed by Jiabin Yang
* merge legacy to fluid * Remove legacy code * Remove legacy code * Remove DataType test * Using Tensor directly instead of using EagerTensor * support gradient_accumulation * make test_imperative_lod_tensor_to_selected_rows longer * make test_imperative_lod_tensor_to_selected_rows longer
-
Committed by Yiqun Liu
-
Committed by hong
* add trace op * bug fix * bug fix; test=develop * thrust bug fix; test=develop * remove useless register; test=develop * fix bug; test=develop * update trace kernel; test=develop * move kernel args to trace_sig; test=develop
-
Committed by Chen Weihang
-
Committed by sneaxiy
-
Committed by huzhiqiang
-
Committed by hong
* add norm cpu * update code; * norm bug fix * move norm op to pten; test=develop * move norm op to pten; test=develop * add norm util; test=develop * fix norm npu bug; test=develop * fix norm kernel bug; test=develop * move kernel args to pten; test=develop * move kernel args to pten sig; test=develop
-
- 08 Feb, 2022 7 commits
-
-
Committed by sneaxiy
* add more int id type support for embedding * add ut * add more ut * fix ci error
-
Committed by Yiqun Liu
-
Committed by Jacek Czaja
* - 38126 potential fix * - fix * - build fix * - another candidate fix * - compilation fix * - another fix * - Fix to activation of NHWC being first oneDNN op in chain on oneDNN ops * - compilation fix * - added NHWC rotating for elementwise being first op * - compilation fix * - compilation fix * - Added UT * - cosmetic fixes
-
Committed by zhangbo9674
* add concat & split * add concat kernel * add concat unittest * add split unittest
-
Committed by Wilber
* gpu_context.. * update * update * update
-
Committed by niuliling123
* Replace clip, bce_loss, full and full_like with elementwise
-
Committed by Chen Weihang
* adapt selectedrows in execution * impl selected rows branch * support selectedrow in infershape utils * fix device compile failed * fix new exe test failed * revert some changes
-
- 07 Feb, 2022 2 commits
-
-
Committed by tanzhipeng
-
Committed by jakpiase
* Added adam kernel * CI rerun
-
- 06 Feb, 2022 1 commit
-
-
Committed by Wilber
-
- 04 Feb, 2022 1 commit
-
-
Committed by Chen Weihang
-
- 02 Feb, 2022 1 commit
-
-
Committed by Jiabin Yang
-
- 30 Jan, 2022 1 commit
-
-
Committed by fwenguang
-
- 29 Jan, 2022 2 commits
-
-
Committed by Li Min
* Add fp16 support for scale/bias for fused_layernnorm_residual_dropout_bias op. * Remove useless code. * Remove useless code. * Optimize layer_norm fwd when cols is 1024. * Remove useless code. * Minors. * Minors. * Modifications according to reviews. * Minors. * Optimize layer_norm bwd kernel when cols is 1024. * Polish layer_norm_bwd_1024 kernel. * Limit ln_bwd_1024_kernel to paddle_with_cuda. * Fix double type compile error. * Add optimization of ln bwd for fused_dropout_add_ln op. * Polish codes.
-
Committed by Chen Weihang
* open header for custom kernel * add core utils * tidy core code * tidy header * tidy include * tidy namespace * resolve conflict * fix unittest and coverage * remove platform using * resolve conflict * resolve conflict * fix digamma namespace error * fix xpu full kernel error * fix xpu full kernel error * polish details * add place for lib storage
-