- 27 Oct, 2021 20 commits
-
-
Committed by wuhuanzhou
* GeneratePass support attr condition and mapping, test=develop * fix coverage, test=develop
-
Committed by pangyoki
* add paddle.version.cuda and paddle.version.cudnn API * fix little bug * fix bug * add doc string * fix mkdir error * fix windows path * fix new paddle/version path * fix unittest * fix format
-
Committed by zhaoyingli
-
Committed by wangxinxin08
* add dcnv2 plugin
-
Committed by zlsh80826
-
Committed by JZ-LIANG
* revise completion for backward * revise completion for update * revise completion for update * update unittest
-
Committed by piotrekobiIntel
* Add WIP version of elementwise_div_mkldnn without working dy grad * Add dy gradient calculation implementation, disable broadcast tests * Readd removed tests from static_mode_white_list * Add bfloat16 gradient tests, remove int8 and uint8 support * - Change the way dy grad is calculated to improve performance - Refactor BinaryMKLDNNHandler to use a default parameter * Change copyright year * Refactor as suggested * Attempt to bypass CI Approval not accepting max_relative_error * Fix formatting issue
-
Committed by xiaoxiao-luomu
* gloo hdfs set check & gloo connect retry * add vlog * print gloo connect addr & add vlog * . * modify vlog * modify vlog * modify vlog * Update __init__.py deleted extra clear_model
-
Committed by fuqianya
* add DenseNet
-
Committed by Feiyu Chan
* WIP: add cache * delete move constructor and operator= for CuFFTHandle and FFTConfig * remove log from CuFFTHandle and FFTConfig * add lrucache for fft rocm backend * disable LRUCache when CUFFT_VERSION >= 10200 * disable copy and move for hipFFTHandle; format code * clean debug code Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
-
Committed by Hui Zhang
* Layer.to return self * add device required
-
Committed by zhangkaihuo
This PR adds the layer-level code for fused_transformer, including the layer code for FusedFeedForward and FusedTransformerEncoderLayer.
-
Committed by xiongkun
* bugfix: only check backend when mode == Collective * fix bug
-
Committed by baoachun
* fix matmul dim error * fix wrong dim check in matmul
-
Committed by Feiyu Chan
* fix fftshift/ifftshift on static mode * update roll_op version * add more test cases for fftshift/ifftshift
-
Committed by taixiurong
-
Committed by Wilber
-
Committed by huangjun12
* add eigvalsh with is_test * add eigvalsh op * fix backward bug * forward and backward, float and complex, unittest * remove eigvalsh_helper.h * remove changes of cusolver.h * fix unittest * fix unittest bug * update code following eigh * fix test * update lapack * pull develop * update functor * fix unittest bug * fix details * add tensor_method_func * fix notes
-
Committed by whs
-
Committed by 0x45f
-
- 26 Oct, 2021 18 commits
-
-
Committed by Jiabin Yang
* remove additional warning in layer.to * remove additional warning in layer.to * remove additional warning in layer.to * remove additional warning in layer.to * remove additional warning in layer.to
-
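Committed by Li Min
Purpose: this PR aims to improve the computational performance of the attention module. To reduce the framework's op-scheduling overhead, the PR implements the attention module by hand in C++ and exposes it as a single large attention op. To reduce memory-access overhead, the PR applies two optimizations: (1) when computing q, k, and v, the input X is shared, so the gemm, transpose, and bias add are each invoked once instead of three times; (2) kernel fusion is used, so that data is passed between different CUDA kernels through registers.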
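The shared-input optimization in point (1) can be illustrated with a small sketch. This is not the PR's C++/CUDA implementation. It is a pure-Python toy, with hypothetical helper names (`matmul`, `add_bias`, `qkv_separate`, `qkv_fused`), that assumes the three projection weights have the same output width. It only shows that concatenating the weights column-wise lets one gemm and one bias add produce the same q, k, and v as three separate calls.

```python
# Toy illustration only: all names here are hypothetical, not Paddle APIs.

def matmul(a, b):
    """Multiply two row-major matrices stored as lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + bias[j] for j, v in enumerate(row)] for row in m]

def qkv_separate(x, wq, wk, wv, bq, bk, bv):
    """Baseline: three gemm calls and three bias adds on the same input x."""
    return (add_bias(matmul(x, wq), bq),
            add_bias(matmul(x, wk), bk),
            add_bias(matmul(x, wv), bv))

def qkv_fused(x, wq, wk, wv, bq, bk, bv):
    """One gemm and one bias add over column-concatenated weights.

    Assumes wq, wk, and wv all have the same number of columns.
    """
    d = len(wq[0])
    w_all = [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]
    b_all = bq + bk + bv
    out = add_bias(matmul(x, w_all), b_all)
    # Slice the wide result back into the three projections.
    q = [row[:d] for row in out]
    k = [row[d:2 * d] for row in out]
    v = [row[2 * d:] for row in out]
    return q, k, v

x = [[1.0, 2.0], [3.0, 4.0]]
wq = [[1.0, 0.0], [0.0, 1.0]]
wk = [[2.0, 1.0], [1.0, 2.0]]
wv = [[0.5, -1.0], [1.5, 0.0]]
bq, bk, bv = [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]

# Both paths compute the same products in the same order, so results match.
assert qkv_fused(x, wq, wk, wv, bq, bk, bv) == qkv_separate(x, wq, wk, wv, bq, bk, bv)
```

On a GPU the benefit comes from launching one larger kernel instead of three smaller ones and reading X once; the toy above only demonstrates the algebraic equivalence.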
-
Committed by Feiyu Chan
-
Committed by zhulei
-
Committed by Leo Chen
* cache exception in child thread * add ut * fix ut
-
Committed by liutiexing
* add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * update * update * update Error MSG * update EventsWaiter * Add Cancel For ThreadPool * Add UT for Cancel
-
Committed by Huihuang Zheng
Update `cond` English document
-
Committed by Li Min
Move the Python API interfaces added in PRs #35905 and #35843 into the incubate directory.
-
Committed by Qi Li
* [NPU] fix argsort op, test=develop * remove debug files, test=develop * fix typo, test=develop * address review comments, test=develop
-
Committed by baoachun
* fix wrong trt dim when input dim is 2 * update leaky_relu and instance_norm converter unit test * add instance_norm input dim check
-
Committed by Zhen Wang
* Fix the null ptr bug in build_cinn_pass. * Add test for empty&ctrl var.
-
Committed by Feiyu Chan
* move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos
-
Committed by Leo Chen
-
Committed by Wangzheee
[Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, matmul, mul) convert pass, fix (matmul, mul) op_teller (#36652) * new_Matmul2ToMatmulToMul * new_Matmul2ToMatmulToMul * fix paddle_pass_builder * fix paddle_pass_builder * fix paddle_pass_builder * tem * tem * Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass * Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass * add matmul_broadcast_unitest * fix op_teller
-
Committed by Jack Zhou
* optimize fast tokenizer
-
Committed by xiongkun
* In cpu parallel using gloo, add various length support for SelectedRows * fix bug * fix bugs * fix by code review * remove timeout
-
Committed by JingZhuangzhuang
* fix pool2d convert case * add pool2d convert test case for trt6
-
Committed by feng_shuai
-
- 25 Oct, 2021 2 commits
-
-
Committed by zhaocaibei123
-
Committed by Aganlengzi
* [NPU] modifications for model ernie-1.0 * rollback 503003 and change cast to dtype
-