- 19 Nov, 2021 2 commits
-
-
Committed by 0x45f
Set net.forward to the original forward function in flops when net is a dy2stat model.
-
Committed by Liu-xiandong
* fix cusparse compile bug in CUDA11.2, test=develop * modify sparse_attention docs, test=document_fix (#36554) * modify sparse_attention docs, test=develop * add warning * add warning, test=document_fix
-
- 17 Nov, 2021 4 commits
-
-
Committed by Wangzheee
* fix_qkv_plugin: half_scale * [Paddle-Inference] fix_qkv_plugin: fix half scale
-
Committed by Wangzheee
-
Committed by Wangzheee
-
Committed by JingZhuangzhuang
-
- 16 Nov, 2021 3 commits
-
-
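Committed by zhangkaihuo
Fixes several issues found while fine-tuning fused_transformer_encoder_layer: adds support for attn_mask=None in fused_attention_op: PR; fixes pre_layer_norm handling: PR; fixes parameter handling and incorrect computation: PR; fixes incorrect add_bias computation: PR; adds pure fp16 support: PR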
-
Committed by zyfncg
Fixes incorrect dimension checking when a 1-D Tensor is indexed with an ellipsis (...).
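The edge case behind this fix can be illustrated without Paddle. The sketch below is not Paddle's implementation; `expand_ellipsis` is a hypothetical helper that shows how an ellipsis expands to "as many full slices as needed", which on a 1-D tensor can be zero slices (for example `x[..., 0]`).

```python
def expand_ellipsis(index, ndim):
    """Replace a single Ellipsis in `index` with the full slices it stands for.

    Hypothetical helper for illustration only; not a Paddle API.
    """
    if not isinstance(index, tuple):
        index = (index,)
    if sum(1 for item in index if item is Ellipsis) > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")
    # Ellipsis and None (new axis) do not consume an existing dimension.
    consumed = sum(
        1 for item in index if item is not Ellipsis and item is not None
    )
    if consumed > ndim:
        raise IndexError("too many indices for a %d-D tensor" % ndim)
    expanded = []
    for item in index:
        if item is Ellipsis:
            # May expand to zero slices, e.g. x[..., 0] on a 1-D tensor.
            expanded.extend([slice(None)] * (ndim - consumed))
        else:
            expanded.append(item)
    return tuple(expanded)
```

A dimension check that counts the ellipsis itself as an indexed dimension would wrongly reject `x[..., 0]` on a 1-D tensor; counting only the items that consume a dimension avoids that.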
-
Committed by 石晓伟
Co-authored-by: Pei Yang <peiyang@baidu.com>
-
- 15 Nov, 2021 1 commit
-
-
Committed by Zeng Jinle
* add mlperf optimization PRs * update
-
- 10 Nov, 2021 1 commit
-
-
Committed by Jack Zhou
* fix rnn grad bug when num_layers is set to 2 and dropout_prob is set to 0 * add more tests for rnn
-
- 08 Nov, 2021 2 commits
-
-
Committed by Weilong Wu
Renamed the variable and function. Removed the original template function. Removed the tests_properties in CMakeLists.txt.
-
Committed by zyfncg
att, Fix issue: 36902
-
- 01 Nov, 2021 2 commits
-
-
Committed by Liu-xiandong
* fix cusparse compile bug in CUDA11.2, test=develop * fix bug
-
Committed by Feng Xing
-
- 30 Oct, 2021 1 commit
-
-
Committed by Yiqun Liu
Cherry-pick #36525
-
- 29 Oct, 2021 2 commits
-
-
Committed by Wilber
-
Committed by Feiyu Chan
2. add complex data type support for paddle.shape at graph assembly.
-
- 28 Oct, 2021 13 commits
-
-
Committed by 0x45f
-
Committed by pangyoki
Cherry-pick PR #36511
-
Committed by zhaoyingli
-
Committed by Ligoml
* fix device docs;test=document_fix * update __init__.py
-
Committed by pangyoki
* add paddle.version.cuda and paddle.version.cudnn API * fix little bug * fix bug * add doc string * fix mkdir error * fix windows path * fix new paddle/version path * fix unittest * fix format
-
Committed by XGZhang
-
Committed by XGZhang
* [cherry-pick 2.2] support quantization of bert; support quantization for matmul_v2 * Update quantization_pass.py
-
Committed by Li Min
* Fix fused_attention English doc, test=document_fix
-
Committed by feng_shuai
* change api to support trt8
-
Committed by Li Min
* Fix bug when pre_layer_norm is false.
-
Committed by Xiaoxu Chen
* update fft api path (#36219) * update fft api path * add sample code for ihfft2 Co-authored-by: chenfeiyu <chenfeiyu@baidu.com> * fix fft axis (#36321) fix: `-1` is used when fft's axis is `0` * use unified external error message for cufft api (#36114) * fft: modify sample code result (#36325) * dynamic load mkl as a fft backend when it is available and requested (#36414) * add rocm support for fft api (#36415) * move signal apis * move fft and signal API path (#2) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos in signal.py (#3) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * disable Cache when CUFFT_VERSION >= 10200 (#4) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * Add LRUCache for fft plans * add LRUCache for cufft and hipfft (#5) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * WIP: add cache * delete move constructor and operator= for CuFFTHandle and FFTConfig * remove log from CuFFTHandle and FFTConfig * add lrucache for fft rocm backend * disable LRUCache when CUFFT_VERSION >= 10200 * disable copy and move for hipFFTHandle; format code Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> * remove debug message of cufftHandler * roll_op: support Tensor as input for shifts (#36727) * fix fftshift/ifftshift on static mode * update roll_op version * add more test cases for fftshift/ifftshift Co-authored-by: zhiboniu <31800336+zhiboniu@users.noreply.github.com> Co-authored-by: chenfeiyu <chenfeiyu@baidu.com> Co-authored-by: LJQ ❤️ <33169170+lijiaqi0612@users.noreply.github.com>
-
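The commit above mentions an LRU cache for FFT plans. As a rough illustration of the idea only (the actual cache lives in Paddle's C++ code and is not shown here), a least-recently-used plan cache might be sketched as follows. `PlanCache`, `get_or_create`, and `build_plan` are hypothetical names, and the cache key is assumed to be whatever identifies a plan (shape, dtype, transform type).

```python
from collections import OrderedDict


class PlanCache:
    """Minimal LRU cache sketch; all names here are illustrative assumptions."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._plans = OrderedDict()  # oldest entry first, newest last

    def get_or_create(self, key, build_plan):
        if key in self._plans:
            # Cache hit: mark the plan as most recently used.
            self._plans.move_to_end(key)
            return self._plans[key]
        # Cache miss: build the (expensive) plan and remember it.
        plan = build_plan(key)
        self._plans[key] = plan
        if len(self._plans) > self.capacity:
            # Evict the least recently used plan.
            self._plans.popitem(last=False)
        return plan
```

Reusing a plan avoids rebuilding it for every transform with the same configuration, while the capacity bound keeps the number of live plans limited.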
Committed by 0x45f
Show the Paddle traceback after the last user-code traceback
-
- 27 Oct, 2021 9 commits
-
-
Committed by wenbin
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
-
Committed by zhangkaihuo
This PR contains the layer-level code for fused_transformer, including the FusedFeedForward layer and the FusedTransformerEncoderLayer.
-
Committed by xiongkun
* bugfix: only check backend when mode == Collective
-
Committed by Huihuang Zheng
Update `cond` English document
-
Committed by baoachun
-
Committed by huangjun12
-
Committed by whs
-
Committed by Guoxia Wang
* fix BatchNorm for fp16
-
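Committed by Li Min
Purpose: this PR aims to improve the computational performance of the attention module. To reduce the framework's op scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op. To reduce memory access overhead, this PR applies two optimizations: (1) when computing q, k, and v, the input X is shared, so the gemm, transpose, and bias add there go from three calls to one; (2) kernel fusion is used, passing data between different CUDA kernels through registers.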
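Optimization (1) can be sketched in plain Python. This is an illustration of the idea, not the PR's C++/CUDA code: because q, k, and v all multiply the same input X, the three weight matrices can be concatenated column-wise and applied in one matrix multiply, and the result split afterwards. The helper names (`matmul`, `qkv_separate`, `qkv_fused`) are assumptions made for this example, and bias add and transpose are omitted for brevity.

```python
def matmul(a, b):
    """Naive matrix multiply for lists of lists (illustration only)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def qkv_separate(x, wq, wk, wv):
    # Three separate gemm calls, each reading X again.
    return matmul(x, wq), matmul(x, wk), matmul(x, wv)


def qkv_fused(x, wq, wk, wv):
    # Concatenate the weights column-wise into [Wq | Wk | Wv] ...
    w = [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]
    # ... so a single gemm reads X once and produces q, k, v side by side.
    out = matmul(x, w)
    dq, dk = len(wq[0]), len(wk[0])
    q = [row[:dq] for row in out]
    k = [row[dq:dq + dk] for row in out]
    v = [row[dq + dk:] for row in out]
    return q, k, v
```

Both functions return the same q, k, and v; the fused form replaces three calls that each read X with one.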
-