- 29 Jan 2022, 1 commit
Committed by Li Min
* Add fp16 support for scale/bias for fused_layernorm_residual_dropout_bias op.
* Remove useless code.
* Remove useless code.
* Optimize layer_norm fwd when cols is 1024.
* Remove useless code.
* Minors.
* Minors.
* Modifications according to reviews.
* Minors.
* Optimize layer_norm bwd kernel when cols is 1024.
* Polish layer_norm_bwd_1024 kernel.
* Limit ln_bwd_1024_kernel to paddle_with_cuda.
* Fix double type compile error.
* Add optimization of ln bwd for fused_dropout_add_ln op.
* Polish codes.
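
A hedged sketch of what the "cols is 1024" specialization mentioned above can look like (illustrative only; the kernel name, block size, variance formula, and epsilon handling are assumptions, not Paddle's implementation): a block of 256 threads owns one 1024-column row, each thread loads four contiguous floats, and the mean/variance reduction runs in shared memory so the row never leaves the block.

```cuda
// One block = one row of exactly 1024 columns; 256 threads * float4 loads.
// Assumes x and y are 16-byte aligned and rows have exactly 1024 columns.
__global__ void LayerNormFwd1024(const float* __restrict__ x,
                                 const float* __restrict__ gamma,
                                 const float* __restrict__ beta,
                                 float* __restrict__ y, float epsilon) {
  constexpr int kCols = 1024;
  const int row = blockIdx.x;
  const float4 v =
      reinterpret_cast<const float4*>(x + row * kCols)[threadIdx.x];

  // Per-thread partial sums, then a block-wide tree reduction.
  __shared__ float s_sum[256], s_sqsum[256];
  s_sum[threadIdx.x] = v.x + v.y + v.z + v.w;
  s_sqsum[threadIdx.x] = v.x * v.x + v.y * v.y + v.z * v.z + v.w * v.w;
  __syncthreads();
  for (int stride = 128; stride > 0; stride >>= 1) {
    if (threadIdx.x < stride) {
      s_sum[threadIdx.x] += s_sum[threadIdx.x + stride];
      s_sqsum[threadIdx.x] += s_sqsum[threadIdx.x + stride];
    }
    __syncthreads();
  }
  const float mean = s_sum[0] / kCols;
  const float var = s_sqsum[0] / kCols - mean * mean;
  const float inv_std = rsqrtf(var + epsilon);

  // Normalize the four values this thread owns and write them back.
  float4 out;
  const int c = threadIdx.x * 4;
  out.x = (v.x - mean) * inv_std * gamma[c] + beta[c];
  out.y = (v.y - mean) * inv_std * gamma[c + 1] + beta[c + 1];
  out.z = (v.z - mean) * inv_std * gamma[c + 2] + beta[c + 2];
  out.w = (v.w - mean) * inv_std * gamma[c + 3] + beta[c + 3];
  reinterpret_cast<float4*>(y + row * kCols)[threadIdx.x] = out;
}

// Assumed launch: one block per row.
// LayerNormFwd1024<<<rows, 256>>>(x, gamma, beta, y, 1e-5f);
```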

- 26 Jan 2022, 1 commit
Committed by Li Min
* Optimize layer_norm fwd when cols is 1024.

- 17 Dec 2021, 1 commit
Committed by sneaxiy
* support multi precision update for LAMB
* hide some api
* fix ci uts
* fix lamb output of dygraph
* remove some changes to some PR
* try to fix Py3 CI compile error
* fix test_imperative_optimizer, add lars ut, add layer_norm ut
* fix ut, fix format
* fix ut
* fix windows ci
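
For the "multi precision update" item above, here is a minimal sketch of the general master-weight pattern (shown with a plain SGD step for brevity; LAMB adds trust-ratio scaling on top, and all names here are assumptions, not Paddle's API): the optimizer keeps an fp32 master copy of each fp16 parameter, applies the update in fp32, and writes a rounded fp16 view back for the forward/backward pass.

```cuda
#include <cuda_fp16.h>

// Hypothetical multi-precision update: fp16 params/grads, fp32 master weights.
__global__ void MultiPrecisionSgdUpdate(__half* param_fp16,
                                        float* master_param,
                                        const __half* grad_fp16,
                                        float lr, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float g = __half2float(grad_fp16[i]);  // upcast the fp16 gradient
    float p = master_param[i] - lr * g;    // update in full precision
    master_param[i] = p;                   // exact fp32 master copy
    param_fp16[i] = __float2half(p);       // rounded copy used by the model
  }
}
```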

- 03 Dec 2021, 1 commit
Committed by ronnywang
* refine structure for cuda and rocm
* update
* update
* update
* update

- 23 Sep 2021, 1 commit
Committed by Li Min

- 08 Sep 2021, 1 commit
Committed by zhangkaihuo
The bug was incorrect access to mean and var, which could read out of bounds: the shape of mean and var is [batch_size], while the thread idx ranges over 0~feature_size, so mean[idx] and var[idx] are wrong. When batch_size=1, the correct access is mean[0] and var[0]; a unit test with batch_size=1 is added.
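
To make the out-of-bounds access concrete, here is a minimal hypothetical CUDA sketch (not the actual Paddle kernel; the names, layout, and epsilon are assumptions): mean and var hold one value per row, so they must be indexed by the row index, never by the per-feature thread index.

```cuda
#include <cuda_runtime.h>

__global__ void LayerNormApply(const float* x, const float* mean,
                               const float* var, float* y,
                               int batch_size, int feature_size) {
  int row = blockIdx.x;  // one block per row, row < batch_size
  for (int idx = threadIdx.x; idx < feature_size; idx += blockDim.x) {
    // Bug as described: mean[idx] / var[idx] reads past the end of the
    // [batch_size] arrays whenever idx >= batch_size.
    // float m = mean[idx];
    // float v = var[idx];

    // Fix: index by the row; with batch_size == 1 this is mean[0], var[0].
    float m = mean[row];
    float v = var[row];
    y[row * feature_size + idx] =
        (x[row * feature_size + idx] - m) * rsqrtf(v + 1e-5f);
  }
}
```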

- 23 Aug 2021, 1 commit
Committed by Li Min
Refactor the organization of the layer_norm CUDA impl so that it can be reused in the fused attention op. Extract the layer_norm CUDA impl from layer_norm_op.cu into layer_norm_kernel.cu.h. Define fused/attention_layer_norm.h, which can be used in the fused attention op in the next PR.

- 24 Jun 2021, 1 commit
Committed by Leo Chen

- 22 Jun 2021, 1 commit
Committed by zhiboniu

- 15 Jun 2021, 1 commit
Committed by Shang Zhizhou
* 1, remove layernorm dynamic fp16; 2, let reshape out in dynamic shape
* remove useless code

- 12 Jun 2021, 1 commit
Committed by zhiboniu
* Eliminate numerical differences of LayerNorm; fix LayerNorm NaN bug with large data input
* fix bug with large shape of data input
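
A hedged sketch of the usual root cause and remedy for this kind of NaN (illustrative only; not necessarily the exact change made in this commit): computing variance as E[x^2] - E[x]^2 can overflow or come out slightly negative for large inputs, and taking sqrt/rsqrt of that yields NaN, whereas Welford's single-pass update keeps the running mean and squared-deviation sum numerically stable.

```cuda
// Welford's algorithm: single pass, numerically stable mean and variance.
__host__ __device__ void WelfordMeanVar(const float* x, int n,
                                        float* mean_out, float* var_out) {
  float mean = 0.f;
  float m2 = 0.f;  // sum of squared deviations from the current mean
  for (int i = 0; i < n; ++i) {
    float delta = x[i] - mean;
    mean += delta / (i + 1);
    m2 += delta * (x[i] - mean);
  }
  *mean_out = mean;
  *var_out = (n > 0) ? m2 / n : 0.f;
}
```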

- 08 Jun 2021, 1 commit
Committed by Shang Zhizhou
* add dynamic layer_norm plugin
* fix bug
* fix numpy.allclose
* fix format
* fix code style
* remove shape in dynamic shape
* code format
* remove layer norm fp16
* fix format

- 19 Mar 2021, 1 commit
Committed by ronnywang

- 02 Mar 2021, 1 commit
Committed by Qi Li

- 15 Jan 2021, 1 commit
Committed by Yang Zhang
built-in `rsqrt` is shadowed
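
A tiny hypothetical illustration of what "built-in `rsqrt` is shadowed" means (not the Paddle code; names are assumptions): a local variable named `rsqrt` hides CUDA's built-in device function, so either the local must be renamed or the built-in reached unambiguously, for example via `rsqrtf`.

```cuda
__device__ float InverseStd(float variance, float epsilon) {
  float rsqrt = 0.f;                     // shadows the built-in ::rsqrt
  // rsqrt = rsqrt(variance + epsilon);  // would not compile: `rsqrt` is now a float
  rsqrt = rsqrtf(variance + epsilon);    // unambiguous float overload of the built-in
  return rsqrt;
}
```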

- 14 Dec 2020, 1 commit
Committed by Leo Chen
* fix compile problem when cuda_arch < 6000
* refine code
* refine code
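
One common cause of a "compile problem when cuda_arch < 6000" in reduction-heavy kernels, and the standard guard for it (a hedged sketch of the well-known CUDA programming guide pattern, not necessarily the change made here): native double-precision atomicAdd only exists for compute capability 6.0 and above, so older architectures need a CAS-based fallback.

```cuda
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
// Fallback for pre-Pascal GPUs: emulate double atomicAdd with atomicCAS.
__device__ double AtomicAddDouble(double* address, double val) {
  unsigned long long int* address_as_ull =
      reinterpret_cast<unsigned long long int*>(address);
  unsigned long long int old = *address_as_ull, assumed;
  do {
    assumed = old;
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);
  return __longlong_as_double(old);
}
#else
// Compute capability >= 6.0 provides double atomicAdd natively.
__device__ double AtomicAddDouble(double* address, double val) {
  return atomicAdd(address, val);
}
#endif
```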

- 10 Dec 2020, 1 commit
Committed by Leo Chen
* layernorm fw opt
* layernorm bw opt
* fix typo, test=develop
* remove const dim3 for windows CI compatibility
* merge develop
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

- 07 Dec 2020, 1 commit
Committed by Leo Chen

- 02 Dec 2020, 1 commit
Committed by furnace
* add fp16 for layer_norm op
* revert layernorm api
* fix forward
* fix forward
* fix backward for layernorm with fp16
* fix unit test for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
* fix with_mkldnn compile error for layernorm with fp16
* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
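
A hedged illustration of the "static_cast<float> to static_cast<U>" detail above (names and structure are assumptions, not Paddle's code): with an fp16 input type T, the statistics are accumulated in a wider type U, so the casts must target U rather than a hard-coded float in case U is something other than float.

```cuda
// T: storage type of the row (e.g. half); U: accumulation type (e.g. float).
template <typename T, typename U>
__device__ void AccumulateRow(const T* row, int cols, U* mean, U* var) {
  U sum = static_cast<U>(0);
  U sqsum = static_cast<U>(0);
  for (int i = 0; i < cols; ++i) {
    U v = static_cast<U>(row[i]);  // was static_cast<float>(row[i])
    sum += v;
    sqsum += v * v;
  }
  *mean = sum / cols;
  *var = sqsum / cols - (*mean) * (*mean);
}
```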

- 14 May 2020, 1 commit
Committed by lijianshe02
* API/OP (group_norm, layer_norm, unpool) error message enhancement test=develop

- 20 Apr 2020, 1 commit
Committed by mapingshuo
restrict block num of layer_norm_grad cuda kernel to 128, test=develop
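
A minimal sketch of the grid-capping idea this message describes (illustrative; the kernel body and names are assumptions): cap the launch at 128 blocks and let each block walk the rows with a grid-stride loop, so correctness never depends on the grid being one block per row.

```cuda
constexpr int kMaxBlockNum = 128;

__global__ void LayerNormGradSketch(const float* dy, float* dx,
                                    int rows, int cols) {
  // Grid-stride loop over rows: the result is the same for any gridDim.x,
  // so the launch side may cap the block count at kMaxBlockNum.
  for (int row = blockIdx.x; row < rows; row += gridDim.x) {
    for (int col = threadIdx.x; col < cols; col += blockDim.x) {
      dx[row * cols + col] = dy[row * cols + col];  // placeholder for real grad math
    }
  }
}

// Assumed launch: never request more than kMaxBlockNum blocks.
// int block_num = rows < kMaxBlockNum ? rows : kMaxBlockNum;
// LayerNormGradSketch<<<block_num, 256>>>(dy, dx, rows, cols);
```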

- 06 Jan 2020, 1 commit
Committed by Pei Yang
* add gelu plugin
* align trt bert with gpu
* add support for fused fc with relu
* add unittest for bert trt
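
For reference, the GELU activation that such a plugin evaluates, written in its common tanh approximation (a general formula, not code taken from the plugin itself):

```cuda
// gelu(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
__device__ float GeluTanhApprox(float x) {
  const float kAlpha = 0.7978845608028654f;  // sqrt(2 / pi)
  return 0.5f * x * (1.f + tanhf(kAlpha * (x + 0.044715f * x * x * x)));
}
```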

- 05 Sep 2018, 1 commit
Committed by Yu Yang

- 08 Aug 2018, 1 commit
Committed by sneaxiy

- 12 Feb 2018, 1 commit
Committed by qingqing01

- 10 Feb 2018, 2 commits

- 03 Feb 2018, 2 commits
Committed by chengduoZH
Committed by chengduoZH