- 26 10月, 2021 10 次提交
-
-
由 zhangkaihuo 提交于
This is a fusion operator to compute feed forward layer in transformer model architecture.
-
由 feng_shuai 提交于
-
由 smallv0221 提交于
* Add bincount op * upload cpu version * fix unitest * fix unittest * fix unittest * fix en doc * add more test * fix en doc * add more test case * fix test * fix input vailidation * fix input check * fix unittest * fix test * fix en doc cherry-pick
-
由 Yulong Ao 提交于
-
由 xiongkun 提交于
Support various length support for SelectedRows in GLOO::AllGather (#36637) In cpu parallel using gloo, add various length support for SelectedRows
-
由 Leo Chen 提交于
* refine amp level * fix typo * update tracer._amp_level
-
由 yaoxuefeng 提交于
* add slotrecord datafeed (#36099) * fix multi-node (#36329)
-
由 HydrogenSulfate 提交于
-
由 Li Min 提交于
功能:本PR的目标是提高attention模块的计算性能。 为了减少框架层对op的调度开销,本PR通过在C++层手动实现attention模块,对外提供attention 大op; 为了减少防存开销,本PR采取了两种优化方法: (1)在q,k,v计算时通过共享输入X,将该处的gemm,transpose和bias add从三次调用减少为一次; (2)使用kernel融合优化技术,在不同cuda kernel之间通过寄存器传输数据;
-
由 xiongkun 提交于
[cherry-pick] Support CPU Parallel in DataParallel Interface by GLOO to speed up training (#35745) (#36605) * User specified backend (#35745) * remove tensordot
-
- 25 10月, 2021 8 次提交
-
-
由 WangXi 提交于
* Revert "Add fused_dropout wrapper to ease use. (#36185) (#36640)" This reverts commit 05d7e2fd. * [hybrid] seed and dropout op support force-cpu (#35820) * [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid * [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid * [HIP] fix op not support AMD GPU bug * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] fix seed ci failed issue * add AsExtra for force_cpu of seed op * Add fused_dropout wrapper to ease use. (#36185) * [hybrid] static model parallel dropout support deterministic RandomSeedGenerator (#36228) Co-authored-by: Nxiayanming <41795079@qq.com> Co-authored-by: NLi Min <11663212+limin2021@users.noreply.github.com>
-
由 Wilber 提交于
-
由 Wilber 提交于
-
由 JingZhuangzhuang 提交于
-
由 wenbin 提交于
-
由 Liu-xiandong 提交于
Add paddle.nn.functional.sparse_attention API 本个PR主要将sparse_attention功能在python层进行了一层封装,OP的主体代码见:#PR35676 此外,对于封装的python 接口,增加了相应的单测。
-
由 zhangbo9674 提交于
Add fp16 kernel for clip_op.
-
由 From00 提交于
* Add new API tensordot cherry-pick #36273
-
- 24 10月, 2021 1 次提交
-
-
由 Jack Zhou 提交于
* add viterbi decode cpu kernel * add viterbi decoder api in paddle.text * add a data buffer once to avoid create many small pieces of data buffer frequently * fix viterbi max_seq_length bug * fix seq_len=1 bug * fix device context * move split out of for loop * remove INVERSE_SUB * remove 2 GET_CAST_MASK * remove 1 loop * remove Functor * add to_static deploy code * use MAX_FUNC instead of ELE_MAX * add MaxFunctor * impl max_func * remove MaxFunctor * remove cast op * use REGISTER_OP_WITHOUT_GRADIENT * add viterbi cuda kernel * add FIX_BLOCKDIM_CASE macro * add MKL add, mul; add get data mask * add arange mkl impl * add CPU Argmax * add cpu gather * use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL * use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP * use SAME_DIMS_ELEMENT_BINARY_OP * add SimpleBroadcastBinaryOP * use int instead of int64_t to accelerate * optimize SimpleBroadcastBinaryOP * optimize SimpleBroadcastBinaryOP * optimize performance in both single thread and multithread situation * remove useless line * remove useless code * add CREATE_TENSOR_BUFFER macro * add INIT_REQUIRED_TENSOR macro * add comment * fix windows ci * add viterbi unittest * remove cuda add functor * remove cuda equal * remove a template function * fix windows ci * fix windows dtype * remove some template instance * remove useless header file * remove some blockdim * remove transpose impl * accelerate cpu performance on single thread situation * viterbi_decode->crf_decode * rename crf params name * add viterbi api test * remove useless import * add enable_static * use viterbi decoder * fix viterbi len=1 * fix viterbi unittest * remove useless comments * reconstruct viterbi decode * remove ADD,SUB,MUL structure * fix coverage * remove CREATE_TENSOR * add name args * crf.py->ops.py; with_start_stop_tag->include_start_end_tag * update crf_decode en docs * fix viterbi decode en docs * fix some review comments * add FIXED_BLOCK_DIM_CASE in cuda * push_back->emplace_back * crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag * paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode * fix viterbi_decode en docs
-
- 21 10月, 2021 2 次提交
-
-
由 littletomatodonkey 提交于
* fix replicate pad when input size is 0 * add unit test
-
由 0x45f 提交于
* remove no_value using var.name
-
- 20 10月, 2021 2 次提交
- 19 10月, 2021 2 次提交
-
-
由 Liu-xiandong 提交于
The code of this PR can only support CUDA 11.2. Currently, CI does not have GPU with CUDA 11.2 , and all tests will be skipped automatically. The new OP is paddle._C_ops.sparse_attention. Regarding the work of the python API, it will be resolved in a follow-up PR. The code of this PR lacks tests on dynamic graphs and static graphs, and will be added in subsequent PRs.
-
由 ceci3 提交于
* quant support matmul_v2 * fix format
-
- 18 10月, 2021 1 次提交
-
-
由 0x45f 提交于
[Cherry-pick][Dy2stat]fix no_grad context error in train mode when using save/load (#36434) (#36463) 修复使用jit.save/load接口加载模型后,在train模式和no_grad上下文中,显存会一直增长的问题
-
- 15 10月, 2021 2 次提交
-
-
由 wuhuanzhou 提交于
* [WIP]Verify the correctness of graph rewrited by GeneratePass, test=develop * add delete subgraph and unittest, test=develop * check simple pass, test=develop * fix coverage, test=develop * limit with input_spec via Paddle API, test=develop
-
由 Yanxing Shi 提交于
* add sparse_embedding doc * modify sample code * fix sample code error
-
- 13 10月, 2021 2 次提交
- 12 10月, 2021 1 次提交
-
-
由 Aurelius84 提交于
* Fix stop_gradient in RunProgramOp * fix reference
-
- 11 10月, 2021 1 次提交
-
-
由 wuhuanzhou 提交于
对于__getattr__重载后不满足条件的参数,全部抛出AttributeError异常,达到与未重载版本一致。 (cherry picked from PR #36229)
-
- 30 9月, 2021 3 次提交
-
-
由 zhaoyingli 提交于
* update func name * skip cpu * update unittest * update unittest
-
由 李季 提交于
* fix raw optim * pre-commit test file Co-authored-by: Nsneaxiy <sneaxiy@126.com> Co-authored-by: Nsneaxiy <sneaxiy@126.com>
-
由 李季 提交于
-
- 29 9月, 2021 2 次提交
-
-
由 Lijunhui 提交于
向PaddlePaddle中的线性代数库添加eig算子,该算子计算一般方阵的特征分解。 cherry-pick 自#35674.
-
由 hlygit66666 提交于
* add get_device_name and get_device_capability * fix docs * fix docs * fix decs
-
- 27 9月, 2021 3 次提交
-
-
由 Yanxing Shi 提交于
* Initial Commit * fix py2 error * fix wrong words and doc * test=document_fix * fix _gpuDeviceProperties
-
由 Jiawei Wang 提交于
* fix unique unstack dim 0 * fix unique_op format
-
由 zhiboniu 提交于
remove recent linalg api in paddle.init; add args 'name' in some new linalg api interface
-