- 27 Oct, 2021 1 commit
-
-
Committed by whs
-
- 26 Oct, 2021 14 commits
-
-
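Committed by Li Min
Purpose: this PR aims to improve the computational performance of the attention module. To reduce the framework's op scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op. To reduce memory-access overhead, this PR applies two optimizations: (1) when computing q, k, and v, the shared input X is used so that the gemm, transpose, and bias add at that point are each called once instead of three times; (2) kernel fusion is used to pass data between different CUDA kernels through registers.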
-
Committed by Feiyu Chan
-
Committed by zhulei
-
Committed by Leo Chen
* cache exception in child thread * add ut * fix ut
-
Committed by liutiexing
* add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * update * update * update Error MSG * update EventsWaiter * Add Cancel For ThreadPool * Add UT for Cancel
-
Committed by Li Min
Move the Python API interfaces added in PRs #35905 and #35843 to the incubate directory.
-
Committed by Qi Li
* [NPU] fix argsort op, test=develop * remove debug files, test=develop * fix typo, test=develop * address review comments, test=develop
-
Committed by baoachun
* fix wrong trt dim when input dim is 2 * update leaky_relu and instance_norm converter unit test * add instance_norm input dim check
-
Committed by Zhen Wang
* Fix the null ptr bug in build_cinn_pass. * Add test for empty&ctrl var.
-
Committed by Leo Chen
-
Committed by Wangzheee
[Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, matmul, mul) convert pass, fix (matmul, mul) op_teller (#36652) * new_Matmul2ToMatmulToMul * new_Matmul2ToMatmulToMul * fix paddle_pass_builder * fix paddle_pass_builder * fix paddle_pass_builder * tem * tem * Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass * Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass * add matmul_broadcast_unitest * fix op_teller
-
Committed by Jack Zhou
* optimize fast tokenizer
-
Committed by xiongkun
* In cpu parallel using gloo, add variable-length support for SelectedRows * fix bug * fix bugs * fix by code review * remove timeout
-
Committed by feng_shuai
-
- 25 Oct, 2021 10 commits
-
-
Committed by zhaocaibei123
-
Committed by Aganlengzi
* [NPU] modifications for model ernie-1.0 * rollback 503003 and change cast to dtype
-
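Committed by zhangkaihuo
This PR contains the backward code for fused_feedforward. Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias. fused_feedforward is a fused operator: it fuses and wraps the operators of the transformer model's feed-forward layer so that the front end exposes a single interface. Fusion reduces some memory accesses and kernel launch time, which improves performance.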
-
Committed by smallv0221
* Add bincount op * upload cpu version * fix unittest * fix unittest * fix unittest * fix en doc * add more test * fix en doc * add more test case * fix test * fix input validation * fix input check * fix unittest * fix test * fix en doc
-
Committed by tianshuo78520a
CI build PR and dev whl
-
Committed by Zhen Wang
* Init the functions of CinnCompiler. * Add the unit test for CinnCompiler. * Fix some compilation errors. * Update the UT of cinn_compiler. * Use Decomposer&OpFusion passes in CinnCompiler::CompileGraph. * Update some comments. * Uncomment some includes in build_cinn_pass.cc. * Use refs instead of ptrs as returned types of FindGraph & Compile in CinnCompiler. * Use the merged CinnGraphSymbolization functions in CinnCompiler.
-
Committed by TTerror
* add some ops to train ssd on kunlun * add some ops to train ssd on kunlun * add some ops to train ssd on kunlun * update cast op unittest * update cast op unittest * update cast op unittest * update xpu cmake * update cast unittest
-
Committed by liutiexing
* add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * update * update * update Error MSG * update EventsWaiter
-
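Committed by whs
-
Committed by zhangkaihuo
This PR contains only the forward code for fused_feedforward. Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias. fused_feedforward is a fused operator: it fuses and wraps the operators of the transformer model's feed-forward layer so that the front end exposes a single interface. Fusion reduces some memory accesses and kernel launch time, which improves performance.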
-
- 24 Oct, 2021 1 commit
-
-
Committed by Zhen Wang
-
- 23 Oct, 2021 6 commits
-
-
Committed by jiangcheng
* add cinn graph symbolization * fix some bug * add paddle scope to cinn scope * add paddle scope to CINN scope in Symbolization, and add feed op when build cinn pass * fix some bug * fix some bug by review advice * optimize code problem * revert build_cinn_pass and move the change to https://github.com/PaddlePaddle/Paddle/pull/36503 * fix some bug after co-compilation * perfect single test script * remove scope and rename feed_target to input_tensor * using std::unordered_map instead of absl::flat_hash_map * fix single test bug * revert to previous version for WITH_CINN has add in later PR * full error information for CI * full enforce information for CI pass
-
Committed by wenbin
* disable padding if dynamic shape * add parentheses * correct
-
Committed by baoachun
-
Committed by Wilber
* add file check * add ut
-
Committed by jiangcheng
* add transformer of paddle desc and cinn desc * change LOG(FATAL) to PADDLE_THROW for ci * full error information for ci * fix some problem as review advice * fix some bug * move var type utils to transform_desc header file * add if NOT WITH_CINN control whether compile * build_strategy check whether open WITH_CINN * add control WITH_CINN in cmake
-
Committed by Huihuang Zheng
This PR added some changes to match the CINN change for compilation. It also tried to fix JiangCheng's Problem in PR: https://github.com/PaddlePaddle/Paddle/pull/36100 These changes include: 1. Set `CINN_GIT_TAG` to a newer tag 2. CINN now just `make cinnapi -j` 3. We have to add `-DPY_VERSION=${PY_VERSION} -DWITH_TESTING=ON` to CINN cmake args 4. For CINN's third party dependencies, we could just include headers without target_link_libraries 5. Moved `cinn.cmake` from `paddle/cmake` to `paddle/cmake/external` to match old style. External folder contains `lite`, which is the same level of `cinn` 6. CINN added `-DNAMESPACE=cinn_gflags` in `gflags.cmake` to have different gflag namespaces between CINN and Paddle. It solved re-define problem. 7. Change namespace of `::google::` in gflags to `::GFLAGS_NAMESPACE`
-
- 22 Oct, 2021 6 commits
-
-
Committed by wenbin
* slice * add UT
-
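Committed by zhangbo9674
-
Committed by Li Min
Purpose: this PR aims to improve the computational performance of the attention module. To reduce the framework's op scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op. To reduce memory-access overhead, this PR applies two optimizations: (1) when computing q, k, and v, the shared input X is used so that the gemm, transpose, and bias add at that point are each called once instead of three times; (2) kernel fusion is used to pass data between different CUDA kernels through registers.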
-
Committed by Leo Chen
* [hapi] support dygraph amp O2 * fix problem of static pure fp16 in hapi * fix bug * fix format * fix ut * follow comments * update ut * update amp save/load * fix ut * refine code format
-
Committed by Weilong Wu
* Support elementwise_add triple grad Kernel * Change code-format to follow CI std * Removed unreasonable code, and fixed an input uninitialized issue * Support elementwise_add triple grad Kernel * Change code-format to follow CI std * Removed unreasonable code, and fixed an input uninitialized issue
-
Committed by Wilber
-
- 21 Oct, 2021 2 commits