- 26 11月, 2021 2 次提交
-
-
由 zhouweiwei2014 提交于
cherry-pick #36714
-
由 zmx 提交于
* fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * [heterps]bug fix for _run_from_dataset * fix heter_server.cc * fix launch_utils.py * fix heter_section_worker.cc * fix. test=develop * fix. test=develop
-
- 25 11月, 2021 3 次提交
-
-
由 Steffy-zxf 提交于
* fix data parallel when VOCAB var in program * fix ci coverage
-
由 kuizhiqing 提交于
-
由 pangyoki 提交于
Cherry-pick PR 37420, fix inplace bug when the first grad_var(loss_grad) is inplace var (#37420) (#37488) fix inplace bug,Cherry pick PR #37420
-
- 24 11月, 2021 1 次提交
-
-
由 Li Min 提交于
Add support for bias is none for fused_attention op.
-
- 23 11月, 2021 6 次提交
-
-
由 lilong12 提交于
-
由 zmx 提交于
* bug fix for DeserializeSelectedRows. test=develop (#36520) * fix SerializeSelectedRows (#36543) * bug fix for DeserializeSelectedRows. test=develop * fix bug for SerializeSelectedRows. test=develop * update. test=develop * [Heterps]Refactor Heter Pipeline Parameter Server (#36845) * change username * fix * fix * fix * fix * fix * update * update * update unittests * fix * update * fix * update * fix * fix * fix * update * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update send_and_recv op. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * update. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix unit. notest,test=coverage * fix ut. notest, test=coverage * update. notest,test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix. notest, test=coverage * fix. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * fix ut. notest, test=coverage * add func. notest, test=coverage * fix ut. notest, test=coverage * fix. test=develop * fix. test=develop * Fix unit test for send_and_recv_cpu & send_and_recv_gpu (#37129) * [heterps]fix ut for heter_pipeline_trainer.cc (#37136) * fix ut. test=develop * fix ut. test=develop * [heterps]bug fix for local training with --heter_worker_num (#37166) * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * [heterps]Refactor heterogenous worker (#37244) * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * refactor heter trainer. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix. test=develop * fix ut. test=develop * fix ut. test=develop * fix ut. test=develop * [heterps]add heterps mode judgement (#37298) * [heterps]change default executor for heter trainer (#37314) * fix pslib. test=develop * add device to train_from_dataset. test=develop * refine fleet.stop_worker. test=develop * fix ut. test=develop * fix ut. test=develop * fix executor & ut. test=develop * fix executor & ut. test=develop * fix executor & ut. test=develop * [heterps]remove api for heter pipeline ps (#37396) * fix api. test=develop * fix api. test=develop * fix code style. test=release/2.2 * fix CMakeLists. test=develop (#37454)
-
由 zhupengyang 提交于
-
由 wangguanqun 提交于
* save/load in ps runtime(the_one_ps) (#36097) * add trainer desc config to distributed strategy * code style modified * data_feed set lod * fix bug * code style * fix bug * save load * save load * save unittest * add unittest of the_one_ps * unittest * add todo in communicator sendsparse * fix bug in save_inference_model (#37362)
-
由 0x45f 提交于
[Dy2stat]Allow users to switch eval/train mode when using @to_static to decorate a function (#37383) (#37432) 本PR之前使用@to_static装饰一个单独的function时,对于生成的Program无法切换train/eval模式,只能运行在train模式下。这也就导致动转静后用户多次调用function显存会一直增长。 本PR之后,使用@to_static装饰一个单独的function时,可以通过function.train()或者function.eval()的方式来切换train/eval模式。
-
由 Wilber 提交于
-
- 22 11月, 2021 2 次提交
-
-
由 ceci3 提交于
* fix a quantization bug Co-authored-by: NXGZhang <46363693+XGZhang11@users.noreply.github.com>
-
由 Siming Dai 提交于
* Add paddle.incubate.graph_send_recv API * fix bug in CudaAtomicMin and CudaAtomicMax * add empty line
-
- 19 11月, 2021 1 次提交
-
-
由 0x45f 提交于
该PR使得动转静模块能够正确转换如下的for i in [1, 2, 3]语句。
-
- 16 11月, 2021 2 次提交
-
-
由 zhangkaihuo 提交于
修复了fused_transformer_encoder_layer fine-tune过程发现的一些问题: fused_attention_op添加attn_mask=None的支持:PR pre_layer_norm处理问题:PR 参数处理,计算错误的问题:PR add_bias计算错误问题:PR 添加pure fp16的支持:PR
-
由 zyfncg 提交于
修复了一维Tensor在使用省略号(...)索引时维度检测异常的问题。
-
- 15 11月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
* add mlperf optimization PRs * update
-
- 10 11月, 2021 1 次提交
-
-
由 Jack Zhou 提交于
* fix rnn grad bug when num_layers is set 2 and dropout_prob is set 0 * add more test for rnn
-
- 08 11月, 2021 2 次提交
-
-
由 Weilong Wu 提交于
Renamed the variable and function Removed the original template function Removed the tests_properties in CMakeLists.txt
-
由 zyfncg 提交于
att,Fix issue:36902
-
- 01 11月, 2021 1 次提交
-
-
由 Liu-xiandong 提交于
* fix cusparse compile bug in CUDA11.2, test=develop * fix bug
-
- 30 10月, 2021 1 次提交
-
-
由 Yiqun Liu 提交于
Cherry-pick #36525
-
- 29 10月, 2021 1 次提交
-
-
由 Feiyu Chan 提交于
2. add complex data type support for paddle.shape at graph assembly.
-
- 28 10月, 2021 10 次提交
-
-
由 0x45f 提交于
-
由 pangyoki 提交于
Cherry-pick PR #36511
-
由 zhaoyingli 提交于
-
由 Ligoml 提交于
* fix device docs;test=document_fix * update __init__.py
-
由 pangyoki 提交于
* add paddle.version.cuda and paddle.version.cudnn API * fix little bug * fix bug * add doc string * fix mkdir error * fix windows path * fix new paddle/version path * fix unittest * fix format
-
由 XGZhang 提交于
* [cherry-pick 2.2]support quantization of bert support quantization for maumul_v2 * Update quantization_pass.py
-
由 Li Min 提交于
* Fix bug when pre_layer_norm is false.
-
由 Xiaoxu Chen 提交于
* update fft api path (#36219) * update fft api path * add sample code for ihfft2 Co-authored-by: Nchenfeiyu <chenfeiyu@baidu.com> * fix fft axis (#36321) fix: `-1` is used when fft's axis is `0` * use unified external error message for cufft api (#36114) * fft: modify sample code result (#36325) * dynamic load mkl as a fft backend when it is avaialble and requested (#36414) * add rocm support for fft api (#36415) * move signal apis * move fft and signal API path (#2) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos in signal.py (#3) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * disable Cache when CUFFT_VERSION >= 10200 (#4) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * Add LRUCache for fft plans * add LRUCache for cuff and hipfft (#5) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * WIP: add cache * delete move constructor and operator= for CuFFTHandle and FFTConfig * remove log from CuFFTHandle and FFTConfig * add lrucache for fft rocm backend * disable LRUCache when CUFFT_VERSION >= 10200 * disbale copy and move for hipFFTHandle; format code Co-authored-by: NXiaoxu Chen <chenxx_id@163.com> * remove debug message of cufftHandler * roll_op: support Tensor as input for shifts (#36727) * fix fftshift/ifftshift on static mode * update roll_op version * add more test cases for fftshift/ifftshift Co-authored-by: Nzhiboniu <31800336+zhiboniu@users.noreply.github.com> Co-authored-by: Nchenfeiyu <chenfeiyu@baidu.com> Co-authored-by: LJQ
❤ ️ <33169170+lijiaqi0612@users.noreply.github.com> -
由 0x45f 提交于
show paddle traceback after last user code traceback
-
- 27 10月, 2021 4 次提交
-
-
由 zhangkaihuo 提交于
本PR是fused_transformer的layer层代码,包含FusedFeedForward的layer层代码和FusedTransformerEncoderLayer的代码。
-
由 Huihuang Zheng 提交于
Update `cond` English document
-
由 huangjun12 提交于
-
由 Li Min 提交于
功能:本PR的目标是提高attention模块的计算性能。 为了减少框架层对op的调度开销,本PR通过在C++层手动实现attention模块,对外提供attention 大op; 为了减少防存开销,本PR采取了两种优化方法: (1)在q,k,v计算时通过共享输入X,将该处的gemm,transpose和bias add从三次调用减少为一次; (2)使用kernel融合优化技术,在不同cuda kernel之间通过寄存器传输数据;
-
- 26 10月, 2021 2 次提交