- 07 Aug, 2023 1 commit
-
-
Committed by umiswing
* [FlashAttn] add flash randomness control (#52902) * add flash randomness control * fix VLOG undefined * [WIP] Integration flash attention 2 (#55758) * Work for fa-2 padded fwd. Code to be cleaned. * Work for fa2 unpadded fwd. * Work for padded-bwd, dk get small diff on np.random.seed(0) * Anyway I pass paddle's utest, except return softmax without dropout. * Clean code. * Modify interface. * Clean code and add some check. * Easy compile for dev. * Fix ci. * Fix ci-build. * Add std c++17 option again. * Limit max job when compiling fa2. * Remove const_cast * Add fwd params, to be cleaned. * Clean code. * Add bwd params. * Clean code. * Add enforce. * Use v2.0.4 * Pass RNG state to fa2 capi * Fix review. * Add assert * Skip compile for sm less than 80. --------- Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
-
- 02 Aug, 2023 1 commit
-
-
Committed by wuhuachaocoding
-
- 21 Jul, 2023 1 commit
-
-
Committed by Tian
* add paddle.async_save to reduce time cost by checkpoint saving * adapt save_for_auto_inference to paddle.async_save * modify UT * modify UT * fix on cpu only version * revert commit on save_auto_inference * fix threading
-
- 18 Jul, 2023 2 commits
-
-
Committed by zhenhailiu
* new_frl_shard_redece * add mp guard * add test
-
Committed by lzy
* make top_p_sampling support threshold * delete __nv_bfloat16
-
- 15 Jul, 2023 1 commit
-
-
Committed by sneaxiy
* fix new launch * fix ps uit
-
- 12 Jul, 2023 1 commit
-
-
Committed by sneaxiy
* fix hybrid_parallel_sharding_model.py * Update hybrid_parallel_sharding_model.py
-
- 04 Jul, 2023 1 commit
-
-
Committed by Tian
-
- 29 Jun, 2023 1 commit
-
-
Committed by pangengzheng
* support add(x_float32, y_bfloat16) or add(x_float32, y_float16) * polish
-
- 14 Jun, 2023 1 commit
-
-
Committed by pangengzheng
* support sharding stage1 * fix unittest * format * pass sharded sharding params_and_grads to inner_opt apply_optimize * change sharding gradient allreduce to reduce * support save state_dict adaptively and support sharding with mp * fix sharding test * test set_state_dict * add more unit test * fix global norm of mp case * polish * hack to calculate global norm in order to remove diff in calculating global norm values in HybridParallelClipGrad compared to dp * remove print
-
- 13 Jun, 2023 1 commit
-
-
Committed by Yuang Liu
-
- 08 Jun, 2023 2 commits
- 29 May, 2023 1 commit
-
-
Committed by lzy
-
- 26 May, 2023 1 commit
-
-
Committed by Leo Chen
* add log for memory stats * fix string_split in einsum * Set random seed for test_tensordot (#53004) --------- Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
-
- 23 May, 2023 1 commit
-
-
Committed by Leo Chen
* add host memory stats * add ut
-
- 22 May, 2023 1 commit
-
-
Committed by LiYuRio
-
- 19 May, 2023 2 commits
-
-
Committed by Zhang Zheng
* Add large dim test of log_softmax * fix
-
Committed by Danyang Zhang
* delete bf16 of cross entropy in new frl * delete bf16 of cross entropy grad
-
- 14 May, 2023 2 commits
-
-
Committed by wuhuachaocoding
-
Committed by ShenLiang
* add utest * rm hack code
-
- 11 May, 2023 1 commit
-
-
Committed by Yuang Liu
-
- 27 Apr, 2023 1 commit
-
-
Committed by ShenLiang
* add utest * fix utest
-
- 26 Apr, 2023 2 commits
-
-
Committed by sneaxiy
-
Committed by wuhuachaocoding
Co-authored-by: gongweibao <gongweibao@baidu.com>
-
- 24 Apr, 2023 1 commit
-
-
Committed by Chitsing KUI
* save env log for each worker * fix ut
-
- 21 Apr, 2023 1 commit
-
-
Committed by zhouweiwei2014
-
- 17 Apr, 2023 2 commits
-
-
Committed by Haohongxiang
-
Committed by sneaxiy
-
- 14 Apr, 2023 11 commits
-
-
Committed by Zhang Zheng
-
Committed by cyberslack_lee
-
Committed by cyberslack_lee
-
Committed by chenxujun
* Add digamma, dirichlet tests * Fix code
-
Committed by superwinner1
* add erf FP16 test
-
Committed by chenxujun
-
Committed by Feiyu Chan
1. modify the set_value op to use Scalars to represent the attribute `values`, instead of a set of attributes of various types; (#52408) 2. add a program converter, with the set_value op as an example, which converts `paddle::framework::ProgramDesc` between the old and new formats (the differences are mainly operators whose definitions received incompatible updates); 3. the program version and operator version map are now always saved when serializing `paddle::framework::ProgramDesc`, to identify the version; 4. provide an option `legacy_format=false` when serializing `paddle::framework::ProgramDesc`; it decides whether to convert the ProgramDesc back to a legacy format that paddle 2.4.2 or earlier versions can load and execute; 5. deserialization of `paddle::framework::ProgramDesc` now automatically detects whether the bytes it receives are in the legacy format (contain any of the operators that have been incompatibly updated and have any attribute of type `Scalar`) and converts them to the new format. If you want a faithful deserialization without the automatic conversion, you can use protobuf's deserialization instead. Though not recommended, this can be used for testing.
-
Committed by zhupengyang
-
Committed by duanyanhui
-
Committed by Kim Yann
-
Committed by ronnywang
-