- 24 Oct, 2022 1 commit
-
-
Committed by Ghost Screaming
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong.
* support pure bfloat16
* support bf16 linear
* update PR to pass CI
* tiny fix where_grad_kernel.cu
* Support bfloat16 type for reducer and sharding.
* Fix some bugs.
* Polish code.
* Polish code.
* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>
-
- 21 Oct, 2022 1 commit
-
-
Committed by JingZhuangzhuang
* Add infer prune function
* add fusion op
-
- 20 Oct, 2022 4 commits
-
-
Committed by Yiqun Liu
* Simplify the conv code. (#45966)
* Record whether the conv algorithm was obtained by exhaustive search, to fix an autotune cache bug. (#47065)
-
Committed by liu zhengxi
Add value check & error message for gather_tree (cherry-pick of #47051)
-
Committed by sneaxiy
Fix some operators when the tensor.numel() > INT32_MAX
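The commit message does not say why these operators failed. One common cause for this class of bug is an element index or count held in a 32-bit signed integer. The sketch below is not taken from the Paddle source; it only emulates, in plain Python, how such a value wraps once it exceeds INT32_MAX. The helper name `as_int32` is hypothetical.

```python
INT32_MAX = 2**31 - 1


def as_int32(value: int) -> int:
    """Emulate storing a Python int in a 32-bit signed integer (two's complement wrap)."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - 2**32 if value >= 2**31 else value


numel = INT32_MAX + 1      # one element more than an int32 can count
wrapped = as_int32(numel)  # the count becomes negative after the wrap
print(numel, wrapped)
```

An index that wraps like this either reads the wrong element or makes a loop bound negative, which is why such kernels typically switch to a 64-bit index type for large tensors.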
-
Committed by sneaxiy
support pure bfloat16 for more ops
-
- 19 Oct, 2022 2 commits
-
-
Committed by Zhang Ting
* strided_slice grad add fp16 support
-
Committed by xiongkun
* [Dy2Static] Support TypeHint for function decorated by @to_static (#47121)
* Add TypeHint Transformer
* add unittest for typehint transformer
* [Dy2Static] Remove GradTransformer (#47063)
* [Dy2Static] Remove GradTransformer
  1. fix einsum infershape bugs.
  2. remove grad_transformer and unify paddle.grad and paddle.static.gradient.
  3. add dygraph_and_dy2static_only decorator for dy2static.
* fix bugs
* rename
-
- 18 Oct, 2022 2 commits
-
-
Committed by zhouweiwei2014
Add three APIs: sparse.is_same_shape, sparse.reshape, and sparse.transpose
-
Committed by Wang Bojun
* draft with debug print
* remove debug print
* bug fix for ci
-
- 17 Oct, 2022 3 commits
-
-
Committed by zhangkaihuo
cherry-pick: #46322, #46245. Sparse API supports static graph mode
-
Committed by Zhang Zheng
Optimize performance of depthwise_conv.
Config: input[2048, 1024, 4, 4], filter[1024, 1, 4, 4], stride=1, pad=0, dilation=1
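For orientation, the output shape implied by this configuration can be worked out with the standard convolution size formula. This is a minimal sketch that assumes NCHW input layout and a filter laid out as [out_channels, in_channels / groups, kH, kW]; the commit message states neither, and both helper names are hypothetical.

```python
def conv_out_size(in_size, kernel, stride=1, pad=0, dilation=1):
    """Standard convolution output size along one spatial dimension."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * pad - effective_kernel) // stride + 1


def depthwise_conv_out_shape(input_shape, filter_shape, stride=1, pad=0, dilation=1):
    """Output shape for an NCHW input (assumed layout, not stated in the commit)."""
    n, _, h, w = input_shape
    out_channels, _, kh, kw = filter_shape
    return [
        n,
        out_channels,
        conv_out_size(h, kh, stride, pad, dilation),
        conv_out_size(w, kw, stride, pad, dilation),
    ]


shape = depthwise_conv_out_shape([2048, 1024, 4, 4], [1024, 1, 4, 4])
print(shape)  # [2048, 1024, 1, 1]
```

Under these assumptions the 4x4 filter covers the whole 4x4 input, so each channel reduces to a single output value.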
-
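Committed by Zhang Zheng
To improve performance, move the label bounds check from the Python side into the kernel, which reduces calls to extra ops such as min, max, and synchronous copies. The template parameter IgnoreIndex currently takes effect only when ignore_index lies in [0, dim). However, when a label value is out of bounds and ignore_index equals that label, the computation should still proceed normally. Although the current logic does not produce wrong results, it is still logically flawed, and the template parameter IgnoreIndex is unnecessary.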
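The ordering described above can be illustrated in plain Python: test the label against ignore_index first, and only then apply the bounds check, so an out-of-range ignore_index still works. The commit message does not name the operator, so using a softmax cross-entropy loss here is an assumption, and the function name and the default `ignore_index=-100` are hypothetical.

```python
import math


def softmax_cross_entropy(logits, labels, ignore_index=-100):
    """Per-sample loss; ignored labels contribute 0, other out-of-range labels raise."""
    losses = []
    for row, label in zip(logits, labels):
        dim = len(row)
        # Check ignore_index first, whether or not it lies in [0, dim).
        if label == ignore_index:
            losses.append(0.0)
            continue
        # The bounds check happens inside the "kernel", not in a separate pre-pass.
        if not 0 <= label < dim:
            raise ValueError(f"label {label} is outside [0, {dim})")
        # Numerically stable log-sum-exp of the logits.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        losses.append(log_sum_exp - row[label])
    return losses


losses = softmax_cross_entropy([[0.0, 0.0], [1.0, 2.0]], [0, 5], ignore_index=5)
print(losses)
```

With this ordering no separate code path keyed on the range of ignore_index is needed, which matches the commit's remark that the IgnoreIndex template parameter is unnecessary.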
-
- 13 Oct, 2022 2 commits
-
-
Committed by 傅剑寒
Fix set_value failure when source tensor is fp16 Dtype and destination value is a number (dev PR link: #46801)
-
Committed by Sławomir Siwek
* Revert pool+grad oneDNN kernel conversion (#45989)
* [PHI] transpose2_grad op migration (#46139)
* op migrated, Copy(OneDNNContext, ...) added
* mutable_data & op registration in fluid removed
* refactoring
* OneDNNGetDataType to uppercase
* missing cpu check added, handler moved to .h file
* name changed to transpose_grad
* Copy changed back to TensorCopy
* Resizing corrected, Copy(OneDNNContext) removed
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
Co-authored-by: Paulina Gacek <paulina.gacek@intel.com>
-
- 12 Oct, 2022 1 commit
-
-
Committed by niuliling123
Cherry-pick 46541: ensure automatic layout tuning works for the ResNet50, TSM, and deeplabv3 models with zero modifications
-
- 11 Oct, 2022 6 commits
-
-
Committed by Feiyu Chan
-
Committed by Sławomir Siwek
-
Committed by Sławomir Siwek
-
Committed by Sławomir Siwek
* [PHI] Migrate gelu kernels (#45596)
* gaussian random
* mkldnn to onednn renaming
* fix merge conflicts
* remove fluid code
* onednn renaming
* gelu fwd
* sort activations
* gelu gradient
* remove unused macros
* merge conflicts
* fix merge conflicts
* remove extra constraint from gelu op
* [PHI] relu6_grad kernel (#46501)
* Relu6
* remove fluid handler
* add individual kernel signature
* coding style
* replace bounded_relu with clip
* whitespace
* code style
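The "replace bounded_relu with clip" item rests on a simple identity: relu6(x) equals x clipped to the range [0, 6]. The sketch below shows only that identity in plain Python; it is not the oneDNN or PHI kernel code, and both function names are illustrative.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def relu6(x):
    """ReLU6 expressed as a clip: min(max(x, 0), 6)."""
    return clip(x, 0.0, 6.0)


values = [relu6(v) for v in (-3.0, 0.5, 6.0, 9.0)]
print(values)  # [0.0, 0.5, 6.0, 6.0]
```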
-
Committed by Sławomir Siwek
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
-
Committed by YuanRisheng
* fix concat bug
* fix ci bugs
* fix ci bugs
-
- 10 Oct, 2022 5 commits
-
-
Committed by Sławomir Siwek
[cherry-pick] [PHI] Migrate concat+grad, expand+grad, fill_constant … oneDNN kernels (#45863) (#46727)
* [PHI] Migrate concat+grad, expand+grad, fill_constant, nearest_interp and bilinear_interp oneDNN kernels (#45863)
* Migrate concat+grad, expand+grad, fill_constant, nearest_interp_v2 and bilinear_interp_v2 oneDNN kernels to PHI
* Remove old namespace variable
* Fix invalid out dims error
* Add mutable_data method to concat output
* Add check for -1 dim before computing out_dims
* Capitalize oneDNNGetDataType function name
* Change fill_constant kernel to correct PHI kernel
* Attempt to fix dims error
* Fix fill_constant (full) kernel
* update dependencies
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
-
Committed by Sławomir Siwek
* [PHI] Migrate sgd and stack oneDNN kernels (#46374)
* Convert slice+grad oneDNN fluid kernels to PHI
* Change mutable_data to Alloc
* Refactor licences
* update dependencies
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
-
Committed by Sławomir Siwek
* Convert split, pad and pad3d kernels
* Convert slice+grad oneDNN fluid kernels to PHI
* change out->mutable_data to dev_ctx.Alloc
Co-authored-by: Piotr Paturej <48731682+piotrekobi@users.noreply.github.com>
-
Committed by Sławomir Siwek
* init
* remove softmaxop
* merge dev
* correct dir
* style
-
Committed by Sławomir Siwek
* First approach
* Shape kernel corrected
* Compilation error fixed
* Resize corrected
* Registered types added
* Mistake corrected & types added
* sum kernel deleted
Co-authored-by: Paulina Gacek <paulina.gacek.pl@gmail.com>
-
- 29 Sep, 2022 3 commits
-
-
Committed by 傅剑寒
Add FP16 support for uniform in dygraph mode on Nvidia GPU (dev PR link: PR46212)
-
Committed by zyfncg
* set flag of clip_extra in save_inference_model to true (#46151)
* open the clip_extra flag in paddle.static.save_inference_model, test=allcase (#46456)
* Open the clip_extra flag in TracedLayer.save_inference_model (#46473)
* open the clip_extra flag in paddle.static.save_inference_model, test=allcase
* set the default value of clip_extra in TracedLayer from False to True, test=allcase
* update english doc of paddle.static.save_inference_model, test=document_fix (#46484)
* Fix clip_extra logic in remove_training_info (#46534)
* fix clip_extra code in remove_training_info
* revert rnn opmaker clear
-
Committed by Lin Manhui
[CherryPick][Fix] Remove std::trunc() in FloorDivideFunctor and InverseFloorDivideFunctor (#45051) (#46504)
-
- 28 Sep, 2022 1 commit
-
-
Committed by zyfncg
[cherry-pick] Clear extra attrs of some ops in OpMaker (#46150, #46321, #46418, #46451, #46457) (#46553)
* Clear extra attributes of some Op in OpMaker (Part4) (#46060)
* clear extra attr of some ops in opmaker
* revert clear use_cudnn for pool
* fix test_operator_desc
* fix Attr interface of OperatorBase
* clear extra attrs of condition op in opmaker (#46150)
* Clear extra attrs of lookup_table_v2 in OpMaker (#46321)
* clear extra attrs of look_up_table_v2 in opmaker
* fix bug
* clear extra attrs of quantize op in opmaker (#46418)
* delete repeated item
* clear extra attrs of distribute op in opmaker (#46451)
* clear extra attrs of sequence_softmax in opmaker (#46457)
-
- 27 Sep, 2022 2 commits
-
-
Committed by zhaoyingli
-
Committed by zyfncg
* Clear extra attrs of elementwise op in OpMaker (#45845)
* clear extra attrs of elementwise op in opmaker
* fix op_debug_string_test
* fix bug of grad_add
* fix sort of runtime attrs
* Clear extra attrs of scale in OpMaker (#45984)
* clear extra attr of scale in opmaker
* fix sum bug
* fix merge conflict
* fix minus
* Clear extra attributes of some Op in OpMaker (Part4) (#46060)
* clear extra attr of some ops in opmaker
* revert clear use_cudnn for pool
* fix test_operator_desc
* fix Attr interface of OperatorBase
* fix code style
-
- 26 Sep, 2022 1 commit
-
-
Committed by Hui Zhang
* fix sub sign reverse for mkldnn
* refactor code as comment
* remove useless
-
- 20 Sep, 2022 6 commits
-
-
Committed by houj04
* [XPU] update xdnn activations. (#46246)
* [XPU] update xpu cmake. test=kunlun
-
Committed by HongyuJia
* polish code comments
* polish data_device_transform.cc
-
Committed by Jiabin Yang
* [Eager] Fix ocr (#46124)
* fix linspace error in amp
* fix log
* fix amp error
* fix ocr error which caused by amp
* add more check
* rename dtype ns
* [Eager Bug fix]Fix Detection (#46147)
* fix linspace error in amp
* fix log
* fix amp error
* Revert "Simplify size op impl (#45808)" This reverts commit c252b1de.
* fix_seg
* fix detection
Co-authored-by: Chen Weihang <sunny_cwh@163.com>
-
Committed by Ghost Screaming
* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong.
* Cherry-pick of PR 46045
* Fix bug of reduce_sum kp op.
* Fix bug of reduce_sum kp operator compilation. If compilation device is XPU, eigen kernel should be ignored.
-
Committed by WangZhen
* Fix TransDataBackend Error when call unsqueeze using MKL Tensor
* Add UT
* Refine UT
-
Committed by zhangkaihuo
cherry-pick: #46016, #46021, #45974
* [Sparse]Sparse add support gpu (#45974)
* [Sparse]Remove unused code (#46021)
* [Sparse] Add infer meta (#46016)
-