- 11 Apr 2023, 1 commit

Submitted by Yiqun Liu
* Fix scale kernel for low precision, cherry-pick #50998.
* Fix the FP16 precision problem of add_n. (#50129)
* Change squared_l2_norm to reuse ReduceKernel, and register fp16 and bf16 kernels, which is a cherry-pick of #48315.
* Cherry-pick the fix of MPTypeTrait in KP, which is implemented in #50993.
* Cherry-pick the multi-precision support of AdamW for bf16, #48041.
* Fix compiling error.
* Cherry-pick the fix of CubTensorReduceImpl for bfloat16 in #50993.
* Fix unittest.
---------
Co-authored-by: liuruyan <44316842+liuruyan@users.noreply.github.com>
-
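The multi-precision AdamW cherry-pick above relies on a standard mixed-precision trick: keep a float32 "master" copy of each bf16 parameter, because typical update steps are smaller than bf16's resolution and would otherwise round away. A minimal pure-Python sketch, not Paddle's implementation; `to_bf16` and the step sizes are illustrative assumptions:

```python
import struct

def to_bf16(x: float) -> float:
    # Round a float to bfloat16 precision: keep the top 16 bits of the
    # float32 encoding, rounding the mantissa to nearest.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x8000) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# A 1e-4 update is far below bf16's spacing near 1.0 (~2e-3), so a
# bf16-only weight never moves; the fp32 master copy accumulates it.
w_bf16, master = to_bf16(1.0), 1.0
for _ in range(100):
    w_bf16 = to_bf16(w_bf16 - 1e-4)  # rounds back to 1.0 every step
    master -= 1e-4                   # fp32 keeps the small updates

print(w_bf16, round(master, 6))
```

At the end, the bf16-only weight is still 1.0 while the master weight has moved to about 0.99; the optimizer casts the master back to bf16 only when producing the parameter for the forward pass.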
- 09 Apr 2023, 2 commits

Submitted by Yiqun Liu
* Cherry-pick the register of bfloat16 for amp_kernel, pull request #45541.
* Cherry-pick the master_grad support of adamw, pull request #51141.
* Add bf16 for some ops in static mode (#51582)
* Add bfloat16 support for some APIs in static mode.
* Fix codestyle.
* Revert the change of layer_function_generator.py.
---------
Co-authored-by: Shaojie WANG <wsjmessi@163.com>
-
Submitted by Yiqun Liu
* Register exp/expm1/logit bf16 activation op kernels (#48702)
* Register more bf16 ops
* Update to register corresponding backward ops
* Add bf16 type support for compare OP (#46413)
* Clarify the quotes
* Change code style format
* Support bfloat16
* Add bfloat16 support for more ops (#48272)
* [Bfloat16] Register bfloat16 datatype for squared l2 norm (#50908)
* Sync the pull request #51903.
* Add some header files back.
* Modify cmake file for CUDA 11.8 compile (#49020)
* Add op_library(fused_embedding_eltwise_layernorm_op DEPS bert_encoder_functor)
* Fix compiling error.
* Cherry-pick pull request #51396.
---------
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: limingshu <61349199+JamesLim-sy@users.noreply.github.com>
Co-authored-by: Shaojie WANG <wsjmessi@163.com>
Co-authored-by: zqw_1997 <118182234+zhengqiwen1997@users.noreply.github.com>
-
- 20 Mar 2023, 1 commit

Submitted by LiYuRio
-
- 09 Jan 2023, 1 commit

Submitted by Haohongxiang
-
- 29 Dec 2022, 1 commit

Submitted by YuanRisheng
* Cherry-pick #45860
* [BUG FIX] Fix MetaTensor's bug when running infermeta (#46265)
* Fix sum bug
* Fix CI bugs
* Update code according to comments
-
- 02 Nov 2022, 1 commit

Submitted by Siming Dai
-
- 24 Oct 2022, 1 commit

Submitted by Ghost Screaming
* Fix bug of reduce_sum op: when input.numel() > INT32_MAX, its result is wrong.
* Support pure bfloat16
* Support bf16 linear
* Update PR to pass CI
* Tiny fix of where_grad_kernel.cu
* Support bfloat16 type for reducer and sharding.
* Fix some bugs.
* Polish code.
* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>
-
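The reduce_sum fix above is the classic 32-bit index overflow: once a tensor has more than INT32_MAX elements, a linear index held in a 32-bit integer wraps negative and the kernel addresses the wrong elements. A tiny sketch of the failure mode (pure Python via `ctypes`, not the actual kernel code):

```python
import ctypes

INT32_MAX = 2**31 - 1

def add_as_int32(a: int, b: int) -> int:
    # Add with the result truncated to int32, mimicking a 32-bit
    # index variable being incremented inside a reduction kernel.
    return ctypes.c_int32(a + b).value

# Stepping a 32-bit linear index past INT32_MAX wraps it negative.
print(add_as_int32(INT32_MAX, 1))  # -2147483648
```

The usual fix, as in this commit, is to select a 64-bit index type whenever `numel()` can exceed INT32_MAX.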
- 20 Oct 2022, 2 commits

Submitted by liu zhengxi
Add value check & error message for gather_tree, cherry-pick #47051
-
Submitted by sneaxiy
Support pure bfloat16 for more ops
-
- 17 Oct 2022, 2 commits

Submitted by Zhang Zheng
Optimize performance of depthwise_conv. Config: input[2048, 1024, 4, 4], filter[1024, 1, 4, 4], stride=1, pad=0, dilation=1.
-
Submitted by Zhang Zheng
To improve performance, move the label boundary check from the Python side into the kernel, avoiding extra op calls such as min, max, and synchronous copies. The current template parameter IgnoreIndex only takes effect when ignore_index falls in [0, dim); however, when some label value is out of bounds and ignore_index equals that label, the computation should still proceed normally. Although the current logic does not produce wrong results, it is logically flawed, and the IgnoreIndex template parameter is unnecessary.
-
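The behavior the commit above argues for can be sketched in a few lines: a label equal to ignore_index is skipped before any bounds check, so it stays valid even when ignore_index lies outside [0, dim), while any other out-of-range label is rejected inside the kernel. This is an illustrative pure-Python sketch, not the CUDA kernel; `softmax_ce_with_check` is a hypothetical name:

```python
import math

def softmax_ce_with_check(logits, labels, ignore_index=-100):
    # Per-row softmax cross-entropy with the label check done "in kernel":
    # ignored labels short-circuit before the bounds check.
    dim = len(logits[0])
    losses = []
    for row, label in zip(logits, labels):
        if label == ignore_index:
            losses.append(0.0)  # ignored: no bounds check needed
            continue
        if not (0 <= label < dim):
            raise ValueError(f"label {label} out of range [0, {dim})")
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_z - row[label])
    return losses
```

Doing this check inside the kernel removes the separate min/max ops and the device-to-host sync the Python-side check required.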
- 11 Oct 2022, 1 commit

Submitted by Feiyu Chan
-
- 29 Sep 2022, 1 commit

Submitted by 傅剑寒
Add FP16 support for uniform in dygraph mode on NVIDIA GPU. Dev PR link: PR46212
-
- 27 Sep 2022, 1 commit

Submitted by zhaoyingli
-
- 20 Sep 2022, 2 commits

Submitted by HongyuJia
* Polish code comments
* Polish data_device_transform.cc
-
Submitted by Jiabin Yang
* [Eager] Fix OCR (#46124)
* Fix linspace error in amp
* Fix log
* Fix amp error
* Fix OCR error caused by amp
* Add more checks
* Rename dtype ns
* [Eager bug fix] Fix detection (#46147)
* Revert "Simplify size op impl (#45808)". This reverts commit c252b1de.
* Fix seg
* Fix detection
Co-authored-by: Chen Weihang <sunny_cwh@163.com>
-
- 19 Sep 2022, 3 commits

Submitted by RichardWooSJTU
[vision.ops.nms] Fix return-order error and duplicate results with specific inputs (#46148) (#46193)
-
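The NMS fix above concerns the operator's output contract: kept indices must come back in descending-score order, with each box appearing at most once. A minimal reference sketch of that contract (plain Python, not Paddle's CUDA kernel; function names are illustrative):

```python
def nms(boxes, scores, iou_threshold=0.5):
    # Greedy non-maximum suppression: visit boxes in descending-score
    # order, keep a box only if it does not overlap a kept box too much.
    def iou(a, b):
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        ix1, iy1 = max(ax1, bx1), max(ay1, by1)
        ix2, iy2 = min(ax2, bx2), min(ay2, by2)
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((ax2 - ax1) * (ay2 - ay1)
                 + (bx2 - bx1) * (by2 - by1) - inter)
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []  # indices, naturally deduplicated and score-ordered
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Because `keep` is built by a single pass over the score-sorted order, the output is deterministic and duplicate-free by construction, which is the property the fix restores.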
Submitted by sneaxiy
-
Submitted by Chen Weihang
This reverts commit c252b1de.
-
- 14 Sep 2022, 1 commit

Submitted by engineer1109
Fix compilation errors with CUDA 11.7.
-
- 13 Sep 2022, 1 commit

Submitted by JingZhuangzhuang
-
- 09 Sep 2022, 1 commit

Submitted by Chen Weihang
* Simplify size op
* Trans to CUDA manually
* Fix copy error
-
- 07 Sep 2022, 2 commits
- 06 Sep 2022, 4 commits
Submitted by ykkk2333
-
Submitted by xiaohemaikoo
-
Submitted by LielinJiang
* Fix grad error of group_norm op when CUDA version == 11.7
-
Submitted by Wen Sun
-
- 05 Sep 2022, 1 commit

Submitted by sneaxiy
-
- 02 Sep 2022, 2 commits

Submitted by Yuanle Liu
-
Submitted by thunder95
* Add dist CUDA kernel
* Reuse some funcs in phi
* Use p_norm
* Fix code style: explicit
* Fix code style
* Fix bug
* Remove unused headers
-
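The "use p_norm" bullet above reflects the identity the kernel exploits: the pairwise distance of two tensors is just the p-norm of their elementwise difference, so the dist kernel can reuse phi's existing p_norm machinery instead of a bespoke reduction. A hedged pure-Python sketch (function names are illustrative, not Paddle's API):

```python
def p_norm(v, p=2.0):
    # p-norm of a flat vector; p = inf gives the max-abs norm.
    if p == float("inf"):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def dist(x, y, p=2.0):
    # dist(x, y, p) == p_norm(x - y, p): no separate reduction needed.
    return p_norm([a - b for a, b in zip(x, y)], p)
```

For example, `dist([1, 2], [4, 6], 2)` is the Euclidean length of the difference vector `[-3, -4]`, i.e. 5.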
- 01 Sep 2022, 2 commits

Submitted by HongyuJia
* Copy kernel file to phi
* Delete some code
* Migrate uniform_random, test=kunlun
* Fix input error, test=kunlun
* Fix GPU register error, test=kunlun
* Add include file, test=kunlun
* Try to fix error from CI, test=kunlun
* Polish other PR
* Fix CI-coverage error, test=kunlun
-
Submitted by Leo Chen
* Refine cmake of framework
* Add deps for dense tensor
* Fix deps
* Remove alloc(ctx)
* Add depends on mkldnn
-
- 31 Aug 2022, 3 commits

Submitted by Aurelius84
* [OpAttr] output_size of unpool supports Tensor type
* Fix coverage
* Fix contain_var
-
Submitted by Charles-hit
* Fix split bug
* Solve function redefinition
* Fix fluid.layers.split and add unit test
* Delete splitInferMeta register in unary.cc
* Modify test_split_op GPU unit test
* Modify test_split_op GPU unit test place param
* Refactor split op and fix infershape bugs
* Add () in && and ||
* Fix split C++ unit test
* Fix split infershape
-
Submitted by Li Min
-
- 30 Aug 2022, 3 commits