- 26 Oct, 2021 1 commit
-
-
Committed by xiongkun
[cherry-pick] Support CPU Parallel in DataParallel Interface by GLOO to speed up training (#35745) (#36605) * User specified backend (#35745) * remove tensordot
-
- 25 Oct, 2021 8 commits
-
-
Committed by WangXi
* Revert "Add fused_dropout wrapper to ease use. (#36185) (#36640)" This reverts commit 05d7e2fd. * [hybrid] seed and dropout op support force-cpu (#35820) * [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid * [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid * [HIP] fix op not support AMD GPU bug * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] fix seed ci failed issue * add AsExtra for force_cpu of seed op * Add fused_dropout wrapper to ease use. (#36185) * [hybrid] static model parallel dropout support deterministic RandomSeedGenerator (#36228) Co-authored-by: xiayanming <41795079@qq.com> Co-authored-by: Li Min <11663212+limin2021@users.noreply.github.com>
-
Committed by Wilber
-
Committed by Wilber
-
Committed by JingZhuangzhuang
-
Committed by wenbin
-
Committed by Liu-xiandong
Add paddle.nn.functional.sparse_attention API. This PR wraps the sparse_attention functionality at the Python layer; the main OP code is in #PR35676. Unit tests for the wrapped Python interface are also added.
-
Committed by zhangbo9674
Add fp16 kernel for clip_op.
-
Committed by From00
* Add new API tensordot cherry-pick #36273
-
- 24 Oct, 2021 1 commit
-
-
Committed by Jack Zhou
* add viterbi decode cpu kernel * add viterbi decoder api in paddle.text * add a data buffer once to avoid create many small pieces of data buffer frequently * fix viterbi max_seq_length bug * fix seq_len=1 bug * fix device context * move split out of for loop * remove INVERSE_SUB * remove 2 GET_CAST_MASK * remove 1 loop * remove Functor * add to_static deploy code * use MAX_FUNC instead of ELE_MAX * add MaxFunctor * impl max_func * remove MaxFunctor * remove cast op * use REGISTER_OP_WITHOUT_GRADIENT * add viterbi cuda kernel * add FIX_BLOCKDIM_CASE macro * add MKL add, mul; add get data mask * add arange mkl impl * add CPU Argmax * add cpu gather * use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL * use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP * use SAME_DIMS_ELEMENT_BINARY_OP * add SimpleBroadcastBinaryOP * use int instead of int64_t to accelerate * optimize SimpleBroadcastBinaryOP * optimize SimpleBroadcastBinaryOP * optimize performance in both single thread and multithread situation * remove useless line * remove useless code * add CREATE_TENSOR_BUFFER macro * add INIT_REQUIRED_TENSOR macro * add comment * fix windows ci * add viterbi unittest * remove cuda add functor * remove cuda equal * remove a template function * fix windows ci * fix windows dtype * remove some template instance * remove useless header file * remove some blockdim * remove transpose impl * accelerate cpu performance on single thread situation * viterbi_decode->crf_decode * rename crf params name * add viterbi api test * remove useless import * add enable_static * use viterbi decoder * fix viterbi len=1 * fix viterbi unittest * remove useless comments * reconstruct viterbi decode * remove ADD,SUB,MUL structure * fix coverage * remove CREATE_TENSOR * add name args * crf.py->ops.py; with_start_stop_tag->include_start_end_tag * update crf_decode en docs * fix viterbi decode en docs * fix some review comments * add FIXED_BLOCK_DIM_CASE in cuda * 
push_back->emplace_back * crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag * paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode * fix viterbi_decode en docs
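The commit above adds Viterbi decoding kernels and a paddle.text.viterbi_decode API. As an illustration of the underlying dynamic program only, here is a minimal pure-Python sketch. It is not the Paddle kernel: the function name `viterbi_decode_sketch` and its list-based arguments are assumptions made for this example, and it handles neither batching nor the BOS/EOS tag option mentioned in the commit message.

```python
def viterbi_decode_sketch(emissions, transitions):
    """Return (best_score, best_path) for one sequence.

    emissions:   list of T rows, each with N per-tag scores.
    transitions: N x N nested list; transitions[i][j] is the score of
                 moving from tag i to tag j.
    Hypothetical helper for illustration, not Paddle's signature.
    """
    num_tags = len(transitions)
    scores = list(emissions[0])      # best score ending in each tag so far
    backpointers = []                # one list of N predecessors per later step

    for emit in emissions[1:]:
        step_ptrs, new_scores = [], []
        for j in range(num_tags):
            # Best previous tag for landing on tag j at this step.
            best_prev = max(range(num_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            step_ptrs.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
        scores = new_scores
        backpointers.append(step_ptrs)

    # Pick the best final tag, then walk the backpointers in reverse.
    last = max(range(num_tags), key=lambda i: scores[i])
    best_score = scores[last]
    path = [last]
    for ptrs in reversed(backpointers):
        last = ptrs[last]
        path.append(last)
    path.reverse()
    return best_score, path
```

With a sequence of length 1 the loop body never runs and the path is simply the highest-scoring tag, which is the edge case the "fix seq_len=1 bug" entries in the commit message refer to.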
-
- 21 Oct, 2021 2 commits
-
-
Committed by littletomatodonkey
* fix replicate pad when input size is 0 * add unit test
-
Committed by 0x45f
* remove no_value using var.name
-
- 20 Oct, 2021 2 commits
- 19 Oct, 2021 1 commit
-
-
Committed by Liu-xiandong
The code in this PR supports only CUDA 11.2. CI currently has no GPU with CUDA 11.2, so all tests are skipped automatically. The new OP is paddle._C_ops.sparse_attention. The Python API will be handled in a follow-up PR. This PR lacks tests on dynamic and static graphs; they will be added in subsequent PRs.
-
- 18 Oct, 2021 1 commit
-
-
Committed by 0x45f
[Cherry-pick][Dy2stat]fix no_grad context error in train mode when using save/load (#36434) (#36463) Fixes GPU memory growing continuously in train mode under a no_grad context after a model is loaded through the jit.save/load interface.
-
- 15 Oct, 2021 1 commit
-
-
Committed by wuhuanzhou
* [WIP]Verify the correctness of graph rewrited by GeneratePass, test=develop * add delete subgraph and unittest, test=develop * check simple pass, test=develop * fix coverage, test=develop * limit with input_spec via Paddle API, test=develop
-
- 13 Oct, 2021 1 commit
-
-
Committed by jakpiase
-
- 12 Oct, 2021 1 commit
-
-
Committed by Aurelius84
* Fix stop_gradient in RunProgramOp * fix reference
-
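- 11 Oct, 2021 1 commit
-
-
Committed by wuhuanzhou
For arguments that fail the condition in the overridden __getattr__, always raise AttributeError, matching the behavior of the non-overridden version. (cherry picked from PR #36229)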
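Why raising AttributeError matters can be shown with a small, generic Python sketch. The `Options` class below is hypothetical and unrelated to the actual Paddle class touched by the PR; it only demonstrates that `hasattr` and `getattr` with a default rely on `__getattr__` raising AttributeError for names it cannot resolve.

```python
class Options:
    """Hypothetical container that resolves unknown attributes from a dict."""

    def __init__(self, **options):
        self._options = dict(options)

    def __getattr__(self, name):
        # __getattr__ runs only after normal attribute lookup has failed.
        # Read _options through __dict__ to avoid recursing into __getattr__.
        options = self.__dict__.get("_options", {})
        if name in options:
            return options[name]
        # Anything that does not qualify raises AttributeError, exactly as a
        # class without an overridden __getattr__ would.
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


opts = Options(lr=0.1)
print(opts.lr)                      # resolved through __getattr__
print(hasattr(opts, "missing"))     # False, because AttributeError was raised
print(getattr(opts, "missing", 7))  # falls back to the default
```

If `__getattr__` raised a different exception type (for example KeyError), `hasattr` and the three-argument `getattr` would propagate it instead of returning False or the default.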
-
- 30 Sep, 2021 3 commits
-
-
Committed by zhaoyingli
* update func name * skip cpu * update unittest * update unittest
-
Committed by 李季
* fix raw optim * pre-commit test file Co-authored-by: sneaxiy <sneaxiy@126.com>
-
Committed by 李季
-
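- 29 Sep, 2021 2 commits
-
-
Committed by Lijunhui
Add the eig operator to PaddlePaddle's linear algebra library; it computes the eigendecomposition of a general square matrix. Cherry-picked from #35674.
-
Committed by hlygit66666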
* add get_device_name and get_device_capability * fix docs * fix docs * fix decs
-
- 27 Sep, 2021 4 commits
-
-
Committed by Yanxing Shi
* Initial Commit * fix py2 error * fix wrong words and doc * test=document_fix * fix _gpuDeviceProperties
-
Committed by zhiboniu
remove recent linalg api in paddle.init; add args 'name' in some new linalg api interface
-
Committed by LJQ❤️
-
Committed by JYChen
cherry-pick from #35352 Add new detection api paddle.vision.ops.psroi_pool and paddle.vision.ops.PSRoIPool
-
- 26 Sep, 2021 4 commits
-
-
Committed by Huihuang Zheng
This PR adds the det and slogdet APIs to release/2.2. It is cherry-picked from #34992 and #36013.
-
Committed by Weilong Wu
This PR adds linalg.solve to Paddle's linear algebra module. Call paddle.linalg.solve to use it.
-
Committed by ronnywang
* add randperm_op_npu * fix test_set_value_op_npu
-
Committed by zhangbo9674
1. Split function GradScaler::minimize() into GradScaler::step() + GradScaler::update() 2. Add GradScaler::unscale_(optimizer)
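The point of splitting minimize() into step() and update(), with a separate unscale_(), is that gradients can be unscaled (and, for example, clipped) before the optimizer step. The toy class below is a pure-Python sketch of that control flow under simplifying assumptions: `ToyGradScaler`, its arguments, and its plain-list "parameters" are hypothetical and are not Paddle's GradScaler, and the scale is only halved on overflow, never grown.

```python
import math


class ToyGradScaler:
    """Hypothetical loss scaler showing the unscale_ / step / update split."""

    def __init__(self, init_scale=1024.0, backoff=0.5):
        self.scale = init_scale
        self.backoff = backoff
        self._unscaled = False
        self._found_inf = False

    def unscale_(self, grads):
        # Divide the scaled gradients in place, once per iteration.
        if self._unscaled:
            return
        for i, g in enumerate(grads):
            grads[i] = g / self.scale
        self._found_inf = any(not math.isfinite(g) for g in grads)
        self._unscaled = True

    def step(self, params, grads, lr):
        # Unscale if the caller has not done so, then apply plain SGD
        # unless an inf/nan gradient was found.
        self.unscale_(grads)
        if not self._found_inf:
            for i, g in enumerate(grads):
                params[i] -= lr * g

    def update(self):
        # Shrink the scale after an overflow and reset per-iteration state.
        if self._found_inf:
            self.scale *= self.backoff
        self._unscaled = False
        self._found_inf = False


scaler = ToyGradScaler()
params, grads = [1.0], [2.0 * 1024.0]   # gradient of a loss scaled by 1024
scaler.unscale_(grads)                   # grads are now true gradients
grads[0] = max(-1.0, min(1.0, grads[0])) # clipping is possible at this point
scaler.step(params, grads, lr=0.25)
scaler.update()
print(params, scaler.scale)
```

A combined minimize() leaves no point between unscaling and the parameter update at which the caller could inspect or clip the true gradients.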
-
- 24 Sep, 2021 2 commits
-
-
Committed by Huihuang Zheng
Add a basic Cost Model. It uses the executor to run a program and profiles it to get per-op time. This is an early, basic version; more functions will be added in the future.
-
Committed by JingZhuangzhuang
-
- 23 Sep, 2021 3 commits
-
-
Committed by crystal
cherry-pick #35812, fix the Eigh OP
-
Committed by wangguanzhong
-
Committed by TeslaZhao
* Pass compat of conv_transpose_bias_mkldnn_fuse_pass * Fix a bug of strided_slice op, about the axes parameter access memory out of bounds * Fix a bug of transpose op, about accessing memory out of bounds of the perm param * op:transpose_op supports bool type
-
- 22 Sep, 2021 2 commits
-
-
Committed by baoachun
-
Committed by zhangbo9674
[cherry-pick]increase test_imperative_auto_mixed_precision time PROPERTIES TIMEOUT (#35863) (#35898) Increase test_imperative_auto_mixed_precision PROPERTIES TIMEOUT from 120s to 300s.
-