- 25 Jul, 2022: 1 commit
Committed by zhouweiwei2014
cherry-pick #43626
-
- 19 Jul, 2022: 1 commit
Committed by chenjian
* add serialization for new field in event node (#43405) * add serialization for new field in event node * fix a bug * add more field to memory record (#43578) * Add infer shape in dygraph (#43822) * record memory and op supplement info * update * update * fix a bug * fix memory recording * fix a bug * update * update * fix a bug * update * fix a bug * fix a bug * fix a bug * update dygraph record * add infer shape record * fix * fix * fix * add comments * fix a bug * fix * fix * add record op info * fix file mode * add op input shape info * fix dependency
-
- 12 Jul, 2022: 1 commit
Committed by chenjian
* add new field for event node * fix * fix bug * fix bug * fix clang * fix clang format * fix code format
-
- 30 Jun, 2022: 2 commits
Committed by Wangzheee
* fix emb pass for ernie3.0 * fix emb pass for ernie3.0 * fix emb pass for ernie3.0
-
Committed by JingZhuangzhuang
-
- 29 Jun, 2022: 1 commit
Committed by ronnywang
* cherry pick 43890
-
- 28 Jun, 2022: 2 commits
Committed by pangyoki
-
Committed by zhoutianzi666
* elementwise support * commit
-
- 27 Jun, 2022: 2 commits
Committed by Guanghua Yu
* update quantization clip and round * fix quantization clip and round Attribute * fix typo
-
Committed by Chen Weihang
* Create Tensor by paddle::empty in custom operator (#41840) * create tensor by empty in custom op * fix some bug * update relu custom op demo (#43173) * Fix incompatible error for custom op Placetype (#43749) * fix incompatible error * remove default constructor * add macro * fix cpu make error * add DefaultGPUPlace api Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
-
- 25 Jun, 2022: 1 commit
Committed by Leo Chen
* lazy creating work queue * fix dry_run
-
- 24 Jun, 2022: 2 commits
Committed by Aganlengzi
* Use all sitepackages path as the library/include path (#42940) * Fix several unit tests and increase the unit tests stability (#43670) * Reduce gather op unit tests size and increase the timeout * Add NVIDIA_TF32_OVERRIDE for multi-processes environment * Remove record test for device event ut * Fix 3 unittest errors (#43532) * Fix test_fuse_resnet_unit failure * Fix test_imperative_auto_mixed_precision failure * Fix sparse_attention_op error * Fix sparse_attention_op error * Use fixed random seed (#43659) * for CI test_collective_sendrecv_api Co-authored-by: zlsh80826 <rewang@nvidia.com> Co-authored-by: Shijie <505749828@qq.com>
-
Committed by wawltor
-
- 23 Jun, 2022: 5 commits
Committed by lidanqing
-
Committed by zyfncg
-
Committed by WJJ1995
-
Committed by heliqi
* cherry pick from develop 43621 * code format * paddle2onnx update to 0.9.8
-
Committed by lidanqing
* Correct elementwise quantization (#43693) * [Bug fix] Do not quantize weights Y when matmul X and Y both other ops outputs (#43297) * fix some matmul that X and Y both other ops outputs, do not dequantize the Y. * fix CI format * fix according to review Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>
-
- 22 Jun, 2022: 6 commits
Committed by ccrrong
* add bilinear_interp_v2 converter * update op_teller.cc * add unittest for bilinear_interp_v2 converter * code format * bug fix * code format and add unittest * remove merged modify in op_teller.cc * code format * code format * fix scale init error
-
Committed by xiaoxiaohehe001
-
Committed by Yiqun Liu
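cherry-pick #42750. QA reported that the optimization in #42750 improves solov2 model performance by 6%, so it is cherry-picked to 2.3. #41096 moved the Python implementation of linspace from fluid.layers.tensor to paddle.tensor.creation, and that PR is not in the release/2.3 branch. The Python changes from #42750 are therefore ported to fluid.layers.tensor.linspace instead.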
-
Committed by Zhang Ting
[cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax (#43719) cherry-pick #43635 #43681 #43474
-
Committed by Sing_chan
Only the format tool upgrades (clang-format, yapf, cmake-format) are cherry-picked to release/2.3. Lint tools such as cpplint are not moved, because cpplint errors will not be fixed in release/2.3. pre_commit.sh is also moved to release/2.3 so that both PR-CI-pre-commit and PR-CI-pre-commit-23 work. clang-format is pre-installed to avoid repeated installation, which pre-commit's multi-threaded execution would otherwise cause.
-
Committed by zyfncg
-
- 21 Jun, 2022: 4 commits
Committed by Guanghua Yu
* cherry pick #43088 #40664 * fix clang format
-
Committed by chalsliu
* Update CUDA and TensorRT version for CI * disable ut * Update TensorRT for CUDA 10.2
-
Committed by niuliling123
Remove redundant prints in layout autotune. Background: layout autotune logging increased the amount of information printed when running models.
-
Committed by zhoutianzi666
-
- 20 Jun, 2022: 1 commit
Committed by xiongkun
* cherry pick from #43397 * fix code
-
- 17 Jun, 2022: 2 commits
Committed by weishengying
-
Committed by WangXi
* Rename dropout is test (#43098) * replace dropout_is_test with is_test. * improve atol on a100. * fused_attention fused_feedforward api support Model Tensor Parallel (#42985) * fix is_test bug in fused_feedforward. (#43508) Co-authored-by: Li Min <11663212+limin2021@users.noreply.github.com>
-
- 15 Jun, 2022: 1 commit
Committed by zyfncg
* fix bug of strided_slice (#43388) * fix stride_slice bug * fix bug * fix bug of infer shape for slice (#43443)
-
- 14 Jun, 2022: 1 commit
Committed by xiongkun
* [EinsumOp] Polish forward logic and backward logic for optimize (#42603) * change logic for optimize * modify * merge * change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0 (#43010) * [EinsumOp] Make EinsumOp support bfloat16. (#43085) * change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0 * make EinsumOp support bf16 * add unittest for BF16 * add condition for test_BF16 * fix bugs * fix * change the backward api to fit einsum op
-
- 09 Jun, 2022: 1 commit
Committed by zhupengyang
-
- 08 Jun, 2022: 3 commits
Committed by niuliling123
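The reduce amax/amin kernels and the frobenius_norm kernel were originally implemented with Eigen, and their files took a long time to compile, so this PR replaces them with KP implementations. It also removes duplicated functionality in DefaultElementwiseOperator to reduce the compile time of the elementwise_double_grad op.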
-
Committed by tianshuo78520a
Remove the whl package size comparison in 2.3.
-
Committed by heliqi
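Fix the version conflict between the protobuf that the onnxruntime backend depends on and the protobuf used by the framework or installed externally.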
-
- 07 Jun, 2022: 1 commit
Committed by niuliling123
Delete ElementwiseKernel in BroadcastKernel. This removes duplicated functionality calls across all Broadcast code paths and reduces both compile time and file size.
-
- 06 Jun, 2022: 1 commit
Committed by niuliling123
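Remove the per-rank instantiation and the Elementwise calls from the Broadcast function to reduce compile time. This change is adapted from PR #42645 on the develop branch. Because develop and the release branch have diverged significantly, it could not be cherry-picked, so the PR was resubmitted against release2.3. Instantiating Broadcast for each rank caused many underlying template expansions, which made reduce_sum_grad_kernel.cu.o excessively large. After this change, both the .o size and the compile time decrease.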
-
- 31 May, 2022: 1 commit
Committed by tianshuo78520a
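Remove the checks on build directory size and inference library size. These checks compare against develop, where differences are expected, so they are disabled in release jobs.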
-