- 23 Nov, 2021 3 commits
- 
- 
Committed by wangguanqun: * save/load in ps runtime(the_one_ps) (#36097) * add trainer desc config to distributed strategy * code style modified * data_feed set lod * fix bug * code style * fix bug * save load * save load * save unittest * add unittest of the_one_ps * unittest * add todo in communicator sendsparse * fix bug in save_inference_model (#37362)
- 
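Committed by 0x45f: [Dy2stat]Allow users to switch eval/train mode when using @to_static to decorate a function (#37383) (#37432) Before this PR, when @to_static decorated a standalone function, the generated Program could not switch between train and eval modes and could only run in train mode. As a result, GPU memory kept growing when the user called the function repeatedly after dynamic-to-static conversion. After this PR, a standalone function decorated with @to_static can switch between train and eval modes via function.train() or function.eval().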
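The calling pattern in the @to_static commit above can be illustrated with a minimal pure-Python sketch. `to_static_sketch` and `ModeSwitchingFunction` are hypothetical names invented for this illustration. This is not Paddle's implementation. It only mimics a decorated function that carries a train/eval flag and exposes `train()` and `eval()`.

```python
import functools


class ModeSwitchingFunction:
    """Hypothetical stand-in for a decorated function that carries a train/eval flag."""

    def __init__(self, fn):
        functools.update_wrapper(self, fn)
        self._fn = fn
        self.training = True  # assumption: train mode is the default

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def __call__(self, *args, **kwargs):
        # Pass the current mode to the wrapped function as a keyword argument.
        return self._fn(*args, training=self.training, **kwargs)


def to_static_sketch(fn):
    """Hypothetical decorator; it only attaches the mode switch."""
    return ModeSwitchingFunction(fn)


@to_static_sketch
def scale(x, training):
    # Toy behavior: halve the input in train mode, pass it through in eval mode.
    return x * 0.5 if training else x


train_out = scale(4.0)  # runs in train mode
scale.eval()            # switch to eval mode
eval_out = scale(4.0)   # runs in eval mode
```

With Paddle itself, the commit message states that the equivalent switch is `function.train()` or `function.eval()` on the decorated function.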
- 
Committed by Wilber
 
- 
- 22 Nov, 2021 2 commits
- 
- 
Committed by ceci3: * fix a quantization bug Co-authored-by: XGZhang <46363693+XGZhang11@users.noreply.github.com>
- 
Committed by Siming Dai: * Add paddle.incubate.graph_send_recv API * fix bug in CudaAtomicMin and CudaAtomicMax * add empty line
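As a rough reference for what a graph send/receive operation computes, the sketch below gathers rows of `x` by source index and reduces them into the rows given by the destination index. The function name `graph_send_recv_sketch`, the list-of-lists input, and the choice to leave rows without incoming messages at zero are assumptions for illustration. The sketch is plain Python and does not call the Paddle API.

```python
def graph_send_recv_sketch(x, src_index, dst_index, pool_type="sum"):
    """Gather rows of x by src_index, then reduce them into rows given by dst_index."""
    num_rows, width = len(x), len(x[0])

    # Collect, for every destination row, the source rows sent to it.
    buckets = [[] for _ in range(num_rows)]
    for src, dst in zip(src_index, dst_index):
        buckets[dst].append(x[src])

    reducers = {
        "sum": sum,
        "mean": lambda col: sum(col) / len(col),
        "max": max,
        "min": min,
    }
    reduce_fn = reducers[pool_type]

    out = []
    for rows in buckets:
        if not rows:
            # Assumption: a row that receives no message stays zero.
            out.append([0] * width)
        else:
            # Reduce column by column over all received rows.
            out.append([reduce_fn(col) for col in zip(*rows)])
    return out


x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]
src_index = [0, 1, 2, 0]
dst_index = [1, 2, 1, 0]
summed = graph_send_recv_sketch(x, src_index, dst_index, "sum")
```

Here row 1 receives rows 0 and 2 of `x`, so its summed result is `[2, 8, 10]`.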
 
- 
- 19 Nov, 2021 1 commit
- 
- 
Committed by 0x45f: This PR enables the dynamic-to-static module to correctly convert statements of the form for i in [1, 2, 3].
 
- 
- 16 Nov, 2021 2 commits
- 
- 
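Committed by zhangkaihuo: Fixes several problems found while fine-tuning fused_transformer_encoder_layer: support for attn_mask=None in fused_attention_op: PR; pre_layer_norm handling problem: PR; parameter handling and incorrect computation: PR; incorrect add_bias computation: PR; support for pure fp16: PR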
- 
Committed by zyfncg: Fixed incorrect dimension checking when a 1-D Tensor is indexed with an ellipsis (...).
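To show why the 1-D case is a corner case, here is a minimal sketch of ellipsis expansion during index normalization. The helper name `expand_ellipsis` is hypothetical and the logic is a generic illustration, not the code from this commit. For a 1-D tensor indexed as `x[..., 0]`, the ellipsis stands for zero dimensions, which a naive dimension check can miscount.

```python
def expand_ellipsis(index, ndim):
    """Replace a single Ellipsis in `index` with the full slices it stands for.

    Simplification: None (new-axis) entries are not handled in this sketch.
    """
    if not isinstance(index, tuple):
        index = (index,)
    if sum(item is Ellipsis for item in index) > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")

    # Number of dimensions consumed by the non-ellipsis entries.
    explicit = sum(item is not Ellipsis for item in index)
    if explicit > ndim:
        raise IndexError(f"too many indices: {explicit} for a {ndim}-D tensor")

    expanded = []
    for item in index:
        if item is Ellipsis:
            # The ellipsis may legitimately expand to zero slices.
            expanded.extend([slice(None)] * (ndim - explicit))
        else:
            expanded.append(item)
    return tuple(expanded)


one_d = expand_ellipsis((Ellipsis, 0), ndim=1)    # ellipsis covers zero dims
three_d = expand_ellipsis((0, Ellipsis), ndim=3)  # ellipsis covers two dims
```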
 
- 
- 15 Nov, 2021 1 commit
- 
- 
Committed by Zeng Jinle: * add mlperf optimization PRs * update
 
- 
- 10 Nov, 2021 1 commit
- 
- 
Committed by Jack Zhou: * fix rnn grad bug when num_layers is set to 2 and dropout_prob is set to 0 * add more tests for rnn
 
- 
- 08 Nov, 2021 2 commits
- 
- 
Committed by Weilong Wu: Renamed the variable and function; removed the original template function; removed the tests_properties in CMakeLists.txt
- 
Committed by zyfncg: att, Fix issue: 36902
 
- 
- 01 Nov, 2021 1 commit
- 
- 
Committed by Liu-xiandong: * fix cusparse compile bug in CUDA11.2, test=develop * fix bug
 
- 
- 30 Oct, 2021 1 commit
- 
- 
Committed by Yiqun Liu: Cherry-pick #36525
 
- 
- 29 Oct, 2021 1 commit
- 
- 
Committed by Feiyu Chan: 2. add complex data type support for paddle.shape at graph assembly.
 
- 
- 28 Oct, 2021 10 commits
- 
- 
Committed by 0x45f
- 
Committed by pangyoki: Cherry-pick PR #36511
- 
Committed by zhaoyingli
- 
Committed by Ligoml: * fix device docs;test=document_fix * update __init__.py
- 
Committed by pangyoki: * add paddle.version.cuda and paddle.version.cudnn API * fix little bug * fix bug * add doc string * fix mkdir error * fix windows path * fix new paddle/version path * fix unittest * fix format
- 
Committed by XGZhang: * [cherry-pick 2.2]support quantization of bert support quantization for matmul_v2 * Update quantization_pass.py
- 
Committed by Li Min: * Fix bug when pre_layer_norm is false.
- 
Committed by Xiaoxu Chen: * update fft api path (#36219) * update fft api path * add sample code for ihfft2 Co-authored-by: chenfeiyu <chenfeiyu@baidu.com> * fix fft axis (#36321) fix: `-1` is used when fft's axis is `0` * use unified external error message for cufft api (#36114) * fft: modify sample code result (#36325) * dynamic load mkl as a fft backend when it is available and requested (#36414) * add rocm support for fft api (#36415) * move signal apis * move fft and signal API path (#2) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos in signal.py (#3) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * disable Cache when CUFFT_VERSION >= 10200 (#4) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * Add LRUCache for fft plans * add LRUCache for cufft and hipfft (#5) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * WIP: add cache * delete move constructor and operator= for CuFFTHandle and FFTConfig * remove log from CuFFTHandle and FFTConfig * add lrucache for fft rocm backend * disable LRUCache when CUFFT_VERSION >= 10200 * disable copy and move for hipFFTHandle; format code Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> * remove debug message of cufftHandler * roll_op: support Tensor as input for shifts (#36727) * fix fftshift/ifftshift on static mode * update roll_op version * add more test cases for fftshift/ifftshift Co-authored-by: zhiboniu <31800336+zhiboniu@users.noreply.github.com> Co-authored-by: chenfeiyu <chenfeiyu@baidu.com> Co-authored-by: LJQ ❤️ <33169170+lijiaqi0612@users.noreply.github.com>
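The idea of an LRU cache for FFT plans, mentioned in the commit above, can be sketched in plain Python. The class name `PlanLRUCache`, the key layout `(size, transform_type)`, and the string stand-in for a plan are assumptions for illustration. The actual C++ cache in the commit may be organized differently.

```python
from collections import OrderedDict


class PlanLRUCache:
    """Keep at most `capacity` plans and evict the least recently used one."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._plans = OrderedDict()

    def get_or_create(self, key, build_plan):
        if key in self._plans:
            # Cache hit: mark the plan as most recently used.
            self._plans.move_to_end(key)
            return self._plans[key]
        # Cache miss: build the (expensive) plan and store it.
        plan = build_plan(key)
        self._plans[key] = plan
        if len(self._plans) > self.capacity:
            # Evict the least recently used entry (the oldest one).
            self._plans.popitem(last=False)
        return plan

    def keys(self):
        return list(self._plans)


built = []


def build_plan(key):
    # Hypothetical plan builder; records which keys triggered a build.
    built.append(key)
    return f"plan{key}"


cache = PlanLRUCache(capacity=2)
cache.get_or_create((8, "c2c"), build_plan)
cache.get_or_create((16, "c2c"), build_plan)
cache.get_or_create((8, "c2c"), build_plan)   # hit, no rebuild
cache.get_or_create((32, "r2c"), build_plan)  # evicts (16, "c2c")
```

Keying the cache on everything that determines a plan (size, transform type, and so on) lets repeated transforms of the same shape skip plan construction.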
- 
Committed by 0x45f: show paddle traceback after last user code traceback
 
- 
- 27 Oct, 2021 4 commits
- 
- 
Committed by zhangkaihuo: This PR contains the layer-level code for fused_transformer, including the layer code for FusedFeedForward and for FusedTransformerEncoderLayer.
- 
Committed by Huihuang Zheng: Update `cond` English document
- 
Committed by huangjun12
- 
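Committed by Li Min: Purpose: this PR aims to improve the computational performance of the attention module. To reduce the framework's op scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op. To reduce memory access overhead, this PR applies two optimizations: (1) when computing q, k, and v, the shared input X is used to reduce the gemm, transpose, and bias add at that point from three calls to one; (2) kernel fusion is used so that data is passed between different CUDA kernels through registers.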
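Optimization (1) above rests on a simple identity: three projections of the same input X equal one projection with the three weight matrices concatenated column-wise. The sketch below checks this with plain Python lists. The function names and the tiny integer matrices are illustrative assumptions, and bias add and transpose are left out for brevity.

```python
def matmul(a, b):
    """Plain row-by-column matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def qkv_separate(x, w_q, w_k, w_v):
    # Three separate gemm calls that all read the same input x.
    return matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)


def qkv_fused(x, w_q, w_k, w_v):
    # Concatenate the weights column-wise into one [d_in, 3 * d_out] matrix.
    # In practice this packing would be done once, not on every call.
    w_qkv = [rq + rk + rv for rq, rk, rv in zip(w_q, w_k, w_v)]
    packed = matmul(x, w_qkv)  # a single gemm call
    d = len(w_q[0])
    q = [row[:d] for row in packed]
    k = [row[d:2 * d] for row in packed]
    v = [row[2 * d:] for row in packed]
    return q, k, v


x = [[1, 2], [3, 4]]
w_q = [[1, 0], [0, 1]]
w_k = [[2, 1], [0, 1]]
w_v = [[0, 1], [1, 0]]

q, k, v = qkv_fused(x, w_q, w_k, w_v)
```

Both paths produce the same q, k, and v. The fused path reads X once and launches one gemm instead of three.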
 
- 
- 26 Oct, 2021 11 commits
- 
- 
Committed by Wilber
- 
Committed by baoachun: * fix wrong trt dim when input dim is 2 * update leaky_relu and instance_norm converter unit test * add instance_norm input dim check
- 
Committed by Wangzheee: [Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, matmul, mul) convert pass, fix (matmul, mul) op_teller (#36652) (#36737)
- 
Committed by Steffy-zxf: * Add FasterTokenizer Operator (#34491) Add Tokenizer related functionalities for Transformer model in order that the process of training and predicting is consistent. * support the text string as an input Tensor * support the "VOCAB" unordered_map<wstring, int> as an input Tensor to lookup tokens * Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization. * It first applies basic tokenization, followed by wordpiece tokenization. * optimize fast tokenizer * remove const_cast Co-authored-by: zhoushunjie <zhoushunjie@baidu.com> Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
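The two-stage pipeline named in the commit above (basic tokenization, then wordpiece) can be sketched as follows. This is a simplified pure-Python illustration of greedy longest-match wordpiece, not the operator's C++ code. The lowercasing step, the `[UNK]` token, the `##` continuation prefix, and the tiny vocabulary are assumptions in the style of uncased BERT. The vocabulary is a token-to-id dict, mirroring the string-to-int map mentioned in the commit.

```python
def basic_tokenize(text):
    """Lowercase, split on whitespace, and emit punctuation as separate tokens."""
    tokens, current = [], []
    for ch in text.lower():
        if ch.isalnum():
            current.append(ch)
            continue
        if current:
            tokens.append("".join(current))
            current = []
        if not ch.isspace():
            tokens.append(ch)  # punctuation becomes its own token
    if current:
        tokens.append("".join(current))
    return tokens


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def tokenize_to_ids(text, vocab):
    pieces = [p for word in basic_tokenize(text) for p in wordpiece_tokenize(word, vocab)]
    return pieces, [vocab[p] for p in pieces]


vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "!": 4, "hello": 5}
pieces, ids = tokenize_to_ids("Hello unaffable!", vocab)
```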
- 
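Committed by zhangkaihuo: * add op: fused_feedforward(backward) (#35611) This PR contains the backward code of fused_feedforward. Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias. fused_feedforward is a fused operator that fuses and wraps the operators of the transformer model's feed forward layer so that the front end exposes a single interface; fusion reduces some memory accesses and kernel launch time, which improves performance. * Move fused_attention and fused_feedforward functional api path to incubate (#36704) Moves the Python API interfaces added in PRs #35905 and #35843 into the incubate directory.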
- 
Committed by Haohongxiang: * fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer (#36237) * fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer * update * update * fix bugs in mp_layers, pp_layers and HybridParallelClipGrad (#36144) * fix calling bug of HybridParallelClipGrad * fix bugs of HybridParallelClipGrad * add unittest of pp with HybridParallelClipGrad * fix bugs in mp_layers.py * update * fix bugs in pp_layers.py * update * [HybridParallel]Rebuild code for pipeline (#36396) * add no_sync for parameters sync * add pipeline for moe * [HybridParallel]Support fp16 in dygraph hybrid parallel (#36420) * [HybridParallel]Support fp16 in dygraph hybrid parallel * update * update * update for recompute * add unittest of pp+fp16 * add unittest of recompute+fp16 * update * modify ut * modify ut of cond (#36475) * fix bugs of ClipGradByGlobalNorm in HybridParallel (#36555) * fix bugs of ClipGradByGlobalNorm * add unittests * add unittests * [HybridParallel]fix bug of check_inf in fleet_base.py (#36651) * fix bug of check_inf * fix allreduce * support ClipGradByGlobalNorm in sharding (#36012) * support ClipGradByGlobalNorm in sharding * support ClipGradByGlobalNorm in sharding * test=allcase * Update test_linalg_cond.py * Update hybrid_parallel_util.py * Update hybrid_parallel_util.py Co-authored-by: ShenLiang <1422485404@qq.com> Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
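For context on why global-norm clipping needs care under hybrid parallelism, the sketch below simulates it over several ranks in plain Python. It assumes each gradient lives on exactly one rank, so that summing the per-rank squared norms does not count any parameter twice. The function names and the list-based "ranks" are illustrative and do not reflect the HybridParallelClipGrad code.

```python
import math


def local_squared_norm(grads):
    """Sum of squares of every gradient element held by one rank."""
    return sum(g * g for grad in grads for g in grad)


def clip_by_global_norm(grads_per_rank, clip_norm):
    # Stand-in for an all-reduce(sum) of the per-rank squared norms.
    # Assumption: no gradient is replicated across ranks.
    global_sq = sum(local_squared_norm(grads) for grads in grads_per_rank)
    global_norm = math.sqrt(global_sq)

    # Every rank applies the same scale, so the update direction is preserved.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [
        [[g * scale for g in grad] for grad in grads]
        for grads in grads_per_rank
    ]
    return clipped, global_norm


# Two simulated ranks, each holding one single-element gradient.
grads_per_rank = [[[3.0]], [[4.0]]]
clipped, global_norm = clip_by_global_norm(grads_per_rank, clip_norm=1.0)
```

If each rank clipped by its own local norm instead, the ranks would apply different scales, which is the kind of inconsistency a global reduction avoids.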
- 
Committed by zhangkaihuo: This is a fusion operator to compute feed forward layer in transformer model architecture.
- 
Committed by feng_shuai
- 
Committed by smallv0221: * Add bincount op * upload cpu version * fix unittest * fix unittest * fix unittest * fix en doc * add more test * fix en doc * add more test case * fix test * fix input validation * fix input check * fix unittest * fix test * fix en doc cherry-pick
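As a reference for what a bincount operation computes, here is a minimal pure-Python sketch. The parameter names `weights` and `minlength` follow the common NumPy-style convention and are an assumption here. The sketch does not call the Paddle op.

```python
def bincount(x, weights=None, minlength=0):
    """Count occurrences of each non-negative integer in x.

    With `weights`, each occurrence adds its weight instead of 1.
    The result has at least `minlength` bins.
    """
    if any(v < 0 for v in x):
        raise ValueError("bincount input must be non-negative")
    if weights is not None and len(weights) != len(x):
        raise ValueError("weights must have the same length as x")

    # One bin per value from 0 to max(x), padded up to minlength.
    size = max(max(x, default=-1) + 1, minlength)
    counts = [0.0 if weights is not None else 0] * size
    for i, v in enumerate(x):
        counts[v] += weights[i] if weights is not None else 1
    return counts


plain = bincount([1, 1, 3])
weighted = bincount([1, 1, 3], weights=[0.5, 0.5, 2.0])
padded = bincount([2], minlength=6)
```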
- 
Committed by Yulong Ao
- 
Committed by xiongkun: Support variable-length SelectedRows in GLOO::AllGather (#36637) In CPU parallel training using gloo, add variable-length support for SelectedRows
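One common way to all-gather buffers whose lengths differ per rank is to exchange the lengths first, pad to the maximum, gather fixed-size buffers, and trim. The sketch below simulates that pattern in plain Python for row indices only. It is an assumption about the general technique, not a description of how this commit implements it, and `allgather` is a simulated helper rather than a gloo call.

```python
def allgather(values_per_rank):
    """Simulated fixed-size all-gather: every rank receives every rank's value."""
    return [list(values_per_rank) for _ in values_per_rank]


def allgather_variable_length(rows_per_rank, pad_value=-1):
    # Step 1: exchange the lengths so every rank knows how much to keep.
    lengths = allgather([len(rows) for rows in rows_per_rank])[0]
    max_len = max(lengths, default=0)

    # Step 2: pad every buffer to the same size and gather the padded buffers.
    padded = [rows + [pad_value] * (max_len - len(rows)) for rows in rows_per_rank]
    gathered = allgather(padded)[0]

    # Step 3: trim the padding using the exchanged lengths and concatenate.
    merged = []
    for buf, n in zip(gathered, lengths):
        merged.extend(buf[:n])
    return merged


# Three simulated ranks holding different numbers of row indices.
rows_per_rank = [[0, 3], [7], [1, 4, 9]]
merged = allgather_variable_length(rows_per_rank)
```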
 
- 
