- 05 May, 2023 2 commits
  - Committed by shentanyue
  - Committed by sprouteer
- 04 May, 2023 1 commit
  - Committed by weishengying
- 28 Apr, 2023 1 commit
  - Committed by HongyuJia
- 27 Apr, 2023 4 commits
  - Committed by zhupengyang
  - Committed by wuhuachaocoding
  - Committed by HongyuJia
    * [CINN Support 0D-Tensor] CINN supports 0D-Tensor with trick temporarily * Add unittest
  - Committed by Galaxy1458
    * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop
- 26 Apr, 2023 1 commit
  - Committed by Galaxy1458
    * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop
- 25 Apr, 2023 3 commits
  - Committed by sprouteer
  - Committed by wuhuachaocoding
  - Committed by YuanRisheng
    * add flags for phi * fix compile bugs * fix ci bugs * fix inference bugs * fix cinn bugs * fix cinn bugs * polish code according to comments * fix ci bugs * fix ci bugs
- 24 Apr, 2023 4 commits
  - Committed by niuliling123
  - Committed by zhupengyang
  - Committed by 张春乔
  - Committed by Galaxy1458
    * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop
- 23 Apr, 2023 2 commits
  - Committed by risemeup1
    * apply gcc12 to gpups * apply gcc12 to gpups * apply gcc12 to gpups * apply gcc12 to gpups * apply gcc12 to gpups * apply gcc12 to gpups * apply gcc12 to gpups * apply gcc12 to gpups * apply gcc12 to gpups * test * test * apply gcc12 to gpups * apply_gcc12_to_gpups * fix compiler bug * fix compiler bug * test * fix dangling-pointer compiler * fix dangling-pointer compiler * fix dangling-pointer compiler * apply_gcc12_to_gpups * apply gcc12 to gpups * Update cuda_streams_py.cc
  - Committed by Galaxy1458
    * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop * test,test=develop
- 21 Apr, 2023 4 commits
  - Committed by JYChen
    * support 0-D output and 0-D as index in __getitem__ * fix tests * fix inference and UT * add unittest for setitem * fix xpu test * fix xpu 0-d
  - Committed by YuhangLi
  - Committed by Ghost Screaming
    * Fix bug of reduce_sum op: when input.numel() > INT32_MAX, its result is wrong. * Remove climits. * Fix bug of BlockDesc::MoveFrom(). It is used to rebuild main_program_desc from the ProgramDesc modified by a Fusion Pass. As some fused operators need to create new Variables in the modified ProgramDesc, MoveFrom uses std::move() to move these VarDesc to main_program_desc. As a result, the pointers held by the modified ProgramDesc become nullptr. Calling block()->Program()->proto() first calls ProgramDesc::Flush(), which may cause a segmentation fault.
  - Committed by zhupengyang
- 20 Apr, 2023 4 commits
  - Committed by tianshuo78520a
    This reverts commit 543efcc5.
  - Committed by HongyuJia
    * [CustomOP error] Add attrs type check * fix global variable order bug * include unordered_set * fix ParseAttrStr compile error
  - Committed by huangjiyi
    * update * update * Revert "update" * fix bug * update
  - Committed by risemeup1
- 19 Apr, 2023 4 commits
  - Committed by Sonder
    * trans fused attention to phi * add optional param * trans fused_attention_grad to phi * add fused attention grad register info * fix include * test=kunlun * add fused attention to static build list * add remove * update remove
  - Committed by YuanRisheng
    * fix performance bugs * fix ci bugs
  - Committed by Jiabin Yang
  - Committed by csy0225
- 18 Apr, 2023 3 commits
  - Committed by huangjiyi
    * update * fix bug * update * fix bug
  - Committed by Galaxy1458
  - Committed by 张春乔
- 17 Apr, 2023 4 commits
  - Committed by zhoutianzi666
    * initial commit for cutlass_teller * second commit for cutlass_teller * add conv2d_depthwise python template * add conv2d_depthwise cutlass template * /zhoukangkang/paddle_cutlass/Paddle/paddle/fluid/framework/ir/cutlass_teller.h * refine code in Conv2dFusionCanSupport * add macro in cutlass_teller.h * add 3x3 5x5 teller * add groups not 1 or conv2d_depthwise teller * only generate conv2d_depthwise kernels whose ic is a multiple of 8 * add EXPLICIT in cutlass_teller.h * final commit * add split_k_slices in conv2d_depthwise * make stages == 2 * refactor part of the code * add CutlassFusionType * solve illegal memory * make stride_h=stride_w && make dilation==1 * must check HasAttr(use_cutlass) before GetAttrIfExists * add CONV2D_DEPTHWISE_BIAS_SILU to OpType2String * modify decl.h and util.cu
  - Committed by Galaxy1458
  - Committed by Sonder
    * add register info for eigh and eig_gard * add sync_batch_norm_op.cu register info * add lamb output register info * add unique register info * change type name * change type name * add output register info for check_finite_and_unscale * update cmake and config file * add register info for adagrad * fix build error * add sync to run_unittests.sh * add register info for unique_consecutive * fix build error * add eigh to STATIC_BUILD_TESTS * update eig_kernel.cc * update eig_kernel.cc * fix infer meta error * fix unique register error * fix lamb register info error * fix lamb register info * update lamb register info * fix lamb * remove one Output Register * update static build file * add eigh op to disable_wingpu_test * update run_unittests
  - Committed by Haohongxiang
- 14 Apr, 2023 3 commits
  - Committed by jjyaoao
    * delete SupportNPU(), SupportMLU() * delete npu branch
  - Committed by Feiyu Chan
    1. Modify the set_value op to use Scalars to represent the attr `values`, instead of a set of attributes of various types (#52408).
    2. Add a program converter, with the set_value op as an example, which converts `paddle::framework::ProgramDesc` between the old and new formats (the differences are mainly operators with incompatible updates in their definitions).
    3. The program version and operator version map are now always saved when serializing `paddle::framework::ProgramDesc`, to identify the version.
    4. Provide an option `legacy_format=false` when serializing `paddle::framework::ProgramDesc`; it decides whether to convert the ProgramDesc back to a legacy format that paddle 2.4.2 or earlier versions can load and execute.
    5. Deserialization of `paddle::framework::ProgramDesc` now automatically detects whether the bytes it receives are in the legacy format (contains any of the operators that have been incompatibly updated and have any attribute of type `Scalar`) and converts them to the new format. For a faithful deserialization without the automatic conversion, protobuf's deserialization can be used instead; this is not recommended, but it can be useful for testing.
  - Committed by zhupengyang