- 12 Apr 2023: 1 commit

Committed by Yiqun Liu

- 29 Aug 2022: 1 commit

Committed by Zhang Ting

- 23 Aug 2022: 2 commits
- 05 Jun 2022: 1 commit

Committed by Sing_chan
* use yapf to format all Python files
* exclude two unit-test files from yapf, because they rely on writing and reading files and formatting would break them
* disable diff_py_file, because too many diff files cause the following commands to fail

- 28 Apr 2022: 1 commit

Committed by sneaxiy
* add gradient merge for DistributedFusedLamb
* use master acc gradient
* fix CI unit tests
* polish
* remove math_function_impl.h change
* fix test_update_loss_scaling_op.py
* try to fix XPU/NPU CI
* add gradient merge unit test

- 26 Apr 2022: 1 commit

Committed by WangXi

- 15 Apr 2022: 1 commit

Committed by Allen Guo
* add mixed-precision support for IPU
* restore cast_model_to_fp16 API
* update unit tests

- 16 Mar 2022: 1 commit

Committed by qipengh

- 19 Feb 2022: 1 commit

Committed by sneaxiy
* add DistributedFusedLamb op
* polish code
* fix compile error
* make compatible with pten changes
* fix ROCm compile error
* improve coverage
* update upstream/develop
* fix cast_with_ptr.h
* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1
* fix clip before allreduce
* add use_master_param_norm
* polish code
* fix bug
* fix ROCm CI

- 07 Feb 2022: 1 commit

Committed by arlesniak
* amp list updated
* tests updated
* gray list updated
* amp list updated
* test updated

- 13 Jan 2022: 1 commit

Committed by jakpiase
* base changes for mul reimplementation
* empty commit
* tmp save
* full implementation of mul bf16/fp32 fwd bwd
* CI fix
* CI rerun
* change unity build CMake to avoid GPU issues
* remove mul mkldnn from unity build
* skip tests if not cpu_bf16
* CI fix
* CI fix
* CI fix

- 28 Dec 2021: 1 commit

Committed by Li Min
* Fix scatter_op fp16 perf problem.
* Add scatter into black list.
* Add scatter into black list for dygraph.

- 20 Dec 2021: 1 commit

Committed by sneaxiy
* support FP16 for more ops
* add amp list tests
* refine reduce_mean_grad
* fix OP benchmark CI
* fix fp16 reduce_mean
* update unit tests (some problems remain)
* remove mean/reduce_mean fp16 kernel

- 17 Dec 2021: 1 commit

Committed by sneaxiy
* support multi-precision update for LAMB
* hide some APIs
* fix CI unit tests
* fix LAMB output in dygraph
* remove some changes related to another PR
* try to fix Py3 CI compile error
* fix test_imperative_optimizer, add LARS unit test, add layer_norm unit test
* fix unit test, fix format
* fix unit test
* fix Windows CI

- 27 Oct 2021: 1 commit

Committed by zhangkaihuo
This PR adds the layer-level code for fused_transformer, including the layer code for FusedFeedForward and the code for FusedTransformerEncoderLayer.

- 14 Oct 2021: 1 commit

Committed by Zhang Zheng

- 21 Sep 2021: 1 commit

Committed by Adam Osewski
* Create stateful OneDNNAXPYHandler object. This makes it possible to call it multiple times without recreating the oneDNN primitives every time.
* Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.
* OneDNN SGD kernel.
* Update call to use new OneDNNAXPYHandler object API.
* Set up seed in proper place.
* Enable OneDNN kernel only for single case.
* For dense param and sparse grad.
* Small refactor.
* Enable oneDNN by op attr or by cmd line flag.
* Use int64_t type for number of elements.
* Support dense param and grad from OneDNN kernel.
* Enable SGD OneDNN kernel when using MP BF16 optimizer.
* Force non-copyable/movable OneDNNAXPYHandler.
* Reuse OneDNNAXPYHandler for sparse tensors in SUM op.
* Fix SFINAE rules.
* Remove recording event inside AXPY.
* Get rid of internal primitive caching.
* Stop using PP cache mechanism to store mem and primitive obj.
* Handler obj stores and reuses needed desc & prim.
* Do not derive from MKLDNNHandlerT.

- 10 Sep 2021: 1 commit

Committed by ShenLiang

- 24 Aug 2021: 1 commit

Committed by Adam Osewski
* Small corrections.
* Fix lr for bf16.
* Revert some changes.

- 17 Aug 2021: 1 commit

Committed by Roc

- 05 Aug 2021: 1 commit

Committed by WangXi

- 22 Jul 2021: 2 commits
- 19 Jul 2021: 1 commit

Committed by Leo Chen
* pass found_inf to adam
* add unit test
* fix bug
* refine unit test
* move the unit test to a different directory
* disable unit test on CPU
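These commits name the ops but do not describe their behavior. As a rough illustration, assuming the common mixed-precision pattern in which an unscale step divides gradients by the loss scale and sets a `found_inf` flag, and the optimizer then skips its update when that flag is true, a pure-Python sketch could look like this. The functions are hypothetical stand-ins, not Paddle's kernels, and plain SGD replaces Adam to keep the example short.

```python
import math
from typing import List, Tuple


def check_finite_and_unscale(grads: List[float], loss_scale: float) -> Tuple[List[float], bool]:
    """Hypothetical stand-in: divide grads by the loss scale and flag any inf/NaN."""
    unscaled = [g / loss_scale for g in grads]
    found_inf = any(not math.isfinite(g) for g in unscaled)
    return unscaled, found_inf


def sgd_step(params: List[float], grads: List[float], lr: float, found_inf: bool) -> List[float]:
    """Hypothetical optimizer step (SGD standing in for Adam).

    When found_inf is set, the parameters are returned unchanged so that
    an overflowed step does not corrupt the weights.
    """
    if found_inf:
        return list(params)
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    params = [1.0, 1.0]
    grads, found_inf = check_finite_and_unscale([2.0, 4.0], loss_scale=2.0)
    print(sgd_step(params, grads, lr=0.5, found_inf=found_inf))  # [0.5, 0.0]
    grads, found_inf = check_finite_and_unscale([float("inf"), 4.0], loss_scale=2.0)
    print(sgd_step(params, grads, lr=0.5, found_inf=found_inf))  # [1.0, 1.0]
```

Passing the flag into the optimizer, rather than branching in Python around the optimizer call, lets the skip decision stay on the device; that is one plausible motivation, not something these commit messages state.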

- 16 Jul 2021: 1 commit

Committed by Leo Chen
* add clear_float_status op
* refine infershape
* fix typo
* refine check_finite_and_scale
* refine code

- 05 Jul 2021: 1 commit

Committed by jiangcheng
* make reduce_sum default to fp32 and add it to the AMP black list
* running reduce_sum in fp32 by default avoids returning inf when the sum exceeds 65504, the largest finite fp16 value
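To see why the 65504 limit matters, the sketch below (not Paddle code) emulates fp16 and fp32 accumulation in pure Python by round-tripping values through the `struct` module's `e` (half-precision) and `f` (single-precision) formats. The helper names `to_fp16`, `to_fp32`, `reduce_sum_fp16` and `reduce_sum_fp32` are hypothetical. Summing one hundred copies of 1000.0 overflows to inf when every partial sum is rounded to fp16, but stays exact with an fp32 accumulator.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest fp16 value, mapping overflow to +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct raises instead of rounding to inf; fp16 arithmetic yields inf.
        return float("inf") if x > 0 else float("-inf")


def to_fp32(x: float) -> float:
    """Round x to the nearest fp32 value (inputs here stay within fp32 range)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_sum_fp16(values):
    """Accumulate in fp16: every partial sum is rounded back to fp16."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
    return acc


def reduce_sum_fp32(values):
    """Accumulate fp16 inputs in fp32, as when reduce_sum runs in fp32."""
    acc = 0.0
    for v in values:
        acc = to_fp32(acc + to_fp16(v))
    return acc


if __name__ == "__main__":
    data = [1000.0] * 100  # true sum is 100000, above FP16_MAX
    print(reduce_sum_fp16(data))  # inf
    print(reduce_sum_fp32(data))  # 100000.0
```

The fp32 result here is itself larger than 65504, so the output must also stay in fp32 for the fix to help; keeping the op on the AMP black list, which runs it in fp32, covers both the accumulation and the output.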

- 01 Jul 2021: 1 commit

Committed by taixiurong

- 29 Jun 2021: 1 commit

Committed by taixiurong

- 21 Jun 2021: 1 commit

Committed by WangXi

- 16 Jun 2021: 1 commit

Committed by zhiboniu

- 10 Jun 2021: 1 commit

Committed by Baibaifan

- 26 May 2021: 1 commit

Committed by JZ-LIANG

- 07 May 2021: 1 commit

Committed by joanna.wozna.intel
* Add casting initializers for bf16 training
* Changes after review
* Correct test and add comment

- 28 Apr 2021: 1 commit

Committed by arlesniak

- 23 Apr 2021: 1 commit

Committed by Leo Chen
* refactor_check_finite_and_scale_npu_kernel
* fix compile
* add alloc_float_status op
* add alloc_float_status op
* add FloatStatus for check_finite_and_unscale
* refine code
* remove unnecessary logic
* refine for fleet

- 22 Apr 2021: 1 commit

Committed by Yuang Liu

- 21 Apr 2021: 2 commits
- 15 Apr 2021: 1 commit

Committed by fangshuixun007
fix test sync_with_cpp (#32212)