- 19 May, 2023: 1 commit
Committed by limingshu
* Reorganize the forward code of flash-attention. * Fix forward. * Remove some unused code. * Simplify code and fix backward. * Change all LOG(INFO) to VLOG and fix the backward. * Add scale for AF2 flash_attn; many thanks to xreki and shaojie for debugging this code. * Reduce the performance impact of debug printing. * Unify the initialization of flashattn arguments. * Rewrite the reshape of temp_mask and temp_bias. * API supports use_flash_attn. * Fix compiling error on CI. * Try to crop the flash-attention lib. * Correct the condition for whether flash-attn can be used. * Remove the softmax_out argument. * Remove is_causal. * Polish code. * Fix qkv_transpose_out's shape and scaling of Q * K. * Update commit of flash-attention. --------- Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
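For reference, the "scaling of Q * K" mentioned above is the usual softmax(Q·Kᵀ·scale)·V of scaled dot-product attention. The sketch below is a plain-Python illustration that assumes the conventional default scale of 1/sqrt(head_dim). It is not Paddle's flash-attention kernel, and the function names are hypothetical.

```python
import math


def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v, scale=None):
    """q: [Lq][d], k: [Lk][d], v: [Lk][dv], all nested lists of floats."""
    d = len(q[0])
    if scale is None:
        # Assumed default: 1 / sqrt(head_dim).
        scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # Scaled dot product of this query with every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # Attention output is the weighted sum of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

When the query scores all keys equally, the weights are uniform and the output is the mean of the value rows.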
- 18 May, 2023: 1 commit
Committed by Hulek
* Fused elementwise kernels and ops * change fuse pass name * adjust .pbtxt files * adjust quantization attributes * add missing arguments and fix others, review fixed * simplify fused kernel registration * fix elementwise unit tests * reuse one fused elementwise op * adjust proto * Add supported datatypes * Change 'Scale' to 'scale' in tests, change some tests to onednn * Revert breaking changes * Fix unit tests * Delete obsolete test cases * Delete commented out code * Fix codestyle * delete temporary condition * fix conflicts and delete duplicate fusing * Fix code after merge * Move tests to new directory * fix test volatility * Rename test_elementwise_add_onednn_op.py to test_elementwise_add_mkldnn_op.py * Update CMakeLists.txt add mkldnn op test --------- Co-authored-by: Silv3S <slawomir.siwek@intel.com>
- 15 May, 2023: 1 commit
Committed by ronnywang
- 11 May, 2023: 1 commit
Committed by Kaipeng Deng
* move DataLoader to paddle.io. test=develop
- 06 May, 2023: 1 commit
Committed by Yiqun Liu
* Add fused_gate_attention API. * Implement FusedDropout API. * Fix doc and add unittest. * Skip for non-gpu device. * Add unittest.
- 30 April, 2023: 1 commit
Committed by zhouweiwei2014
- 27 April, 2023: 1 commit
Committed by xiaoguoguo626807
* modify concat_grad add sum comp rule * modify opcompat
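The math behind these two composite rules can be sketched in plain Python. The sketch assumes concatenation along axis 0 and treats `sum` as an elementwise add of several inputs. The gradient of concat splits the upstream gradient back into slices that match each input. The gradient of a sum passes the upstream gradient unchanged to every input. The helper names below are hypothetical and are not Paddle APIs.

```python
def concat_grad(inputs, out_grad):
    """Split the upstream gradient of an axis-0 concat into per-input slices."""
    grads = []
    offset = 0
    for x in inputs:
        n = len(x)  # each input owns a contiguous slice of the output
        grads.append(out_grad[offset:offset + n])
        offset += n
    return grads


def sum_grad(num_inputs, out_grad):
    """Each addend of an elementwise sum receives the upstream gradient as-is."""
    return [list(out_grad) for _ in range(num_inputs)]


print(concat_grad([[1, 2], [3]], [10, 20, 30]))  # [[10, 20], [30]]
print(sum_grad(2, [1.0, 2.0]))                   # [[1.0, 2.0], [1.0, 2.0]]
```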
- 26 April, 2023: 2 commits
Committed by lijialin03
* modify numel in lbfgs and add a new test case. test=develop * change param 'lr' to 'learning_rate' in lbfgs and its test * add opt LBFGS and change test
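The core of an L-BFGS step is the two-loop recursion. It approximates the inverse Hessian times the gradient using the last few curvature pairs, where s_i = x_{i+1} - x_i and y_i = g_{i+1} - g_i. The pure-Python sketch below shows only that textbook recursion. It is not Paddle's LBFGS optimizer, and `lbfgs_direction` is a hypothetical name.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, s_list, y_list):
    """Return the L-BFGS search direction -H_k * grad via the two-loop recursion.

    s_list / y_list hold the most recent curvature pairs, oldest first.
    """
    q = list(grad)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = [0.0] * len(s_list)
    # First loop: newest pair to oldest.
    for i in reversed(range(len(s_list))):
        alphas[i] = rhos[i] * dot(s_list[i], q)
        q = [qj - alphas[i] * yj for qj, yj in zip(q, y_list[i])]
    # Initial inverse-Hessian scaling gamma = s^T y / y^T y from the newest pair.
    if s_list:
        gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = [gamma * qj for qj in q]
    # Second loop: oldest pair to newest.
    for i in range(len(s_list)):
        beta = rhos[i] * dot(y_list[i], r)
        r = [rj + (alphas[i] - beta) * sj for rj, sj in zip(r, s_list[i])]
    return [-rj for rj in r]
```

With an empty history the direction reduces to plain steepest descent (-grad). The learning rate then scales this direction, or a line search picks the step along it.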
Committed by warrentdrew
* add leaky relu composite rule * add public python api * unset default negative slope * fix unittest case
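A composite rule expresses an operator through simpler primitives. As a plain-Python illustration, leaky ReLU can be written as a comparison, a multiply and an elementwise select. This is not Paddle's actual rule. The slope is a required argument here, mirroring the "unset default negative slope" change. The sketch assumes that x == 0 takes the scaled branch; both branches give 0 there anyway.

```python
def where(cond, a, b):
    # Elementwise select, standing in for a `where` primitive.
    return [ai if c else bi for c, ai, bi in zip(cond, a, b)]


def leaky_relu_composite(x, negative_slope):
    # Decompose leaky_relu into a comparison, a scale and a select.
    positive = [xi > 0 for xi in x]
    scaled = [negative_slope * xi for xi in x]
    return where(positive, x, scaled)
```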
- 23 April, 2023: 1 commit
Committed by LoneRanger
* relocate metri_op.py * relocate nn.py * fix bug * fix bug * fix bug * fix bug * fix bug * fix bug * fix variable->tensor and fix __all__ * fix ctr_metric_bundle and sparse_embedding * fix bug of function init * fix bug of importing sparse_embedding and ctr_metric_bundle * fix bug * Update __init__.py
- 22 April, 2023: 1 commit
Committed by zhouweiwei2014
- 21 April, 2023: 2 commits
Committed by JYChen
* support 0-D output and 0-D tensors as indices in __getitem__ * fix tests * fix inference and UT * add unittest for setitem * fix xpu test * fix xpu 0-d
- 20 April, 2023: 1 commit
Committed by Wang Xin
* remove ASCEND* keyword * update docstring * bug fixed * bug fixed
- 19 April, 2023: 3 commits
Committed by zyfncg
Committed by Jiabin Yang
Committed by zhangyuqin1998
* fix graph_reindex * fix * Update op_compat.yaml
- 17 April, 2023: 1 commit
Committed by Chitsing KUI
* add random control for fused dropout add * add __init__
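Conceptually, a fused dropout-add computes dropout(x) + y in one pass. The plain-Python sketch below assumes upscale-in-train semantics, where kept elements are divided by 1 - p. It shows how a fixed seed makes the random mask reproducible. `dropout_add` is a hypothetical name, and the sketch says nothing about how Paddle's fused kernel manages its random state.

```python
import random


def dropout_add(x, y, p=0.5, seed=None, training=True):
    """Return dropout(x) + y elementwise for flat lists (hypothetical helper)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # Inference (or no dropout): plain elementwise add.
        return [a + b for a, b in zip(x, y)]
    rng = random.Random(seed)  # a fixed seed makes the mask reproducible
    keep_scale = 1.0 / (1.0 - p)  # upscale-in-train: rescale kept elements
    out = []
    for a, b in zip(x, y):
        kept = rng.random() >= p  # keep with probability 1 - p
        out.append((a * keep_scale if kept else 0.0) + b)
    return out
```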
- 14 April, 2023: 2 commits
Committed by Feiyu Chan
1. modify set_value op, using Scalars to represent attr `values` instead of a set of attributes of various types; (#52408) 2. add a program converter, with set_value op as an example, which provides the functionality to convert `paddle::framework::ProgramDesc` between old and new formats (the differences are mainly some operators with incompatible updates in their definitions); 3. program version and operator version map are now always saved when serializing `paddle::framework::ProgramDesc`, to identify the version; 4. provide an option `legacy_format=false` in serialization of `paddle::framework::ProgramDesc`, which decides whether to convert the ProgramDesc back to a legacy format that paddle 2.4.2 or earlier versions can load and execute; 5. deserialization of `paddle::framework::ProgramDesc` now automatically detects whether the bytes it receives are in legacy format (they contain any of the operators that have been incompatibly updated and have any attribute of type `Scalar`) and converts them to the new format. If you want a faithful deserialization without the automatic conversion, you can use protobuf's deserialization instead. Though not recommended, this can be used for testing.
Committed by Jiabin Yang
* add more infer var type * fix split error * fix ut * fix top_k infer vartype * fix top_k infer vartype
- 12 April, 2023: 2 commits
Committed by Huihuang Zheng
* [Do NOT merge] Expr PR on Composite * Expr PR on Composite * Revert some composite experiments * Remove unnecessary composite code * Add rsqrt as sub primitives
Committed by chenjian
* fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * isamp * gpu * cpu * noamp * fix instance_norm * fix * fix unit test * fix unit test * add unit test * fix * add big data tests * fix * fix * fix * fix * fix * fix * fix * add test case * fix * fix * fix * fix * fix * remove amp test --------- Co-authored-by: heyanru01 <429520051@qq.com>
- 10 April, 2023: 1 commit
Committed by lzydev
* autogen segment_pool * delete legacy_dygraph about segment_pool
- 04 April, 2023: 3 commits
Committed by JYChen
Committed by LoneRanger
* relocate debugger.py * fix bug * fix bug * fix bug * fix bug
Committed by Jiabin Yang
- 03 April, 2023: 2 commits
Committed by cyber-pioneer
* polish prim arg None check * fix bug
Committed by cyber-pioneer
* simplify bn vjp code * simplify composite rule * polish name
- 31 March, 2023: 2 commits
Committed by Xiaoxu Chen
Committed by 张春乔
* autofix Co-authored-by: Liyulingyue <83450930+Liyulingyue@users.noreply.github.com> * revert changes in python/paddle/distributed/fleet/utils/hybrid_parallel_util.py * empty commit, trigger ci * fix test_slice --------- Co-authored-by: SigureMo <sigure.qaq@gmail.com>
- 30 March, 2023: 2 commits
Committed by cyber-pioneer
* fix_prim * fix bug * add note * fix logic * fix * add note * fix check * fix bug * fix bug * fix bug * add debug * fix check * fix bug * sync print log * fix test case * change default * change test case time
Committed by cyberslack_lee
[CodeStyle][C416][C417] rewrite unnecessary comprehension with function call and use generator instead of map (#52140) * codestyle c416 c417 * fix error * fix inc * unify all C4 rules into one * fix inc --------- Co-authored-by: SigureMo <sigure.qaq@gmail.com>
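The two flake8-comprehensions rules named in this commit can be illustrated with a small before/after example. C416 flags a comprehension that merely copies its input; replace it with `list()`, `set()` or `dict()`. C417 flags `map()` with a `lambda`; replace it with a comprehension or generator expression. The variable names are illustrative only.

```python
pairs = [("a", 1), ("b", 2)]
words = ["x", "yy", "zzz"]

# C416: the comprehension only copies `pairs`, so call the constructor directly.
before_c416 = {k: v for k, v in pairs}
after_c416 = dict(pairs)

# C417: map() with a lambda is clearer as a comprehension.
before_c417 = list(map(lambda w: w.upper(), words))
after_c417 = [w.upper() for w in words]

# When the result is only consumed once, a generator expression replaces map().
before_total = sum(map(lambda w: len(w), words))
after_total = sum(len(w) for w in words)
```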
- 29 March, 2023: 2 commits
Committed by Yichen Zhang
* add group_norm composite rule * add test for scale_grad and bias_grad * resolve conflicts * remove amp in composite_rule.py * add float16 test * deal with NHWC format * keep the composite rule in float16 identical as original kernel * resolve conflicts
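Group normalization splits the channels into groups and normalizes each group with its own mean and biased variance. It then applies a per-channel scale and bias. The plain-Python sketch below handles one NCHW sample with the spatial dimensions flattened. It illustrates the math a composite rule would decompose into primitives, not Paddle's actual rule. The epsilon default of 1e-5 is an assumption.

```python
import math


def group_norm(x, groups, scale=None, bias=None, epsilon=1e-5):
    """x: list of C channels, each a flat list of H*W values (one NCHW sample)."""
    num_channels = len(x)
    if num_channels % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = num_channels // groups
    out = []
    for g in range(groups):
        chans = x[g * per_group:(g + 1) * per_group]
        values = [v for ch in chans for v in ch]
        mean = sum(values) / len(values)
        # Biased variance over every element in the group.
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + epsilon)  # the "rsqrt" step
        for c_offset, ch in enumerate(chans):
            c = g * per_group + c_offset
            s = 1.0 if scale is None else scale[c]
            b = 0.0 if bias is None else bias[c]
            out.append([(v - mean) * inv_std * s + b for v in ch])
    return out
```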
Committed by sneaxiy
* fix generate_kernels.py in CUDA 12.0 * fix attrs bug
- 28 March, 2023: 5 commits
Committed by kangguangli
* remove api `class ParallelExecutor` * remove other references
Committed by Nyakku Shigure
Committed by Kim
Committed by 张春乔
Committed by Jiabin Yang
* optimize composite rule by making scalar shape as []1 * fix shape usage for 0D * fix rules * fix 0D error * fix flatten 0D error * fix bn eval mode * fix bn test * fix flatten
- 27 March, 2023: 1 commit
Committed by cyber-pioneer