- 29 4月, 2022 1 次提交
-
-
由 WangXi 提交于
[cherry-pick 2.3] Add fused_multi_transformer op to optimize transformer generation performance (#42311) * Add fused_multi_transformer op to optimize transformer generation performance (#41814) * fix fused_multi_transformer compile failed in cuda arch < sm53 (#42315) * fix ci timeout
-
- 22 4月, 2022 1 次提交
-
-
由 Allen Guo 提交于
add mixed-precission support for ipu cherry-pick from #41733
-
- 05 4月, 2022 1 次提交
-
-
由 Guanghua Yu 提交于
-
- 01 4月, 2022 1 次提交
-
-
由 danleifeng 提交于
-
- 28 3月, 2022 3 次提交
-
-
由 danleifeng 提交于
* add fused_seqpool_cvm op;test=develop
-
由 Ligoml 提交于
* update docs dtype(core.VarDesc.VarType) * fix code style, test=document_fix fix code style, test=document_fix Co-authored-by: NChen Long <1300851984@qq.com>
-
由 Guanghua Yu 提交于
* add adaround post-quant method
-
- 25 3月, 2022 1 次提交
-
-
由 Jiabin Yang 提交于
* refactor eager flags * fix flags error when we switch from eager to dygraph * fix ci problem * fix ci * fix ci * merge develop and fix code style * merge develop and fix code style * fix op test error * fix op test error * fix op test error * fix op test error * fix op test error * merge develop
-
- 24 3月, 2022 1 次提交
-
-
由 zhangbo9674 提交于
* approve amp for intermediate_dygraph * add amp_utils for intermediate_dygraph * add amp needcast check for mlu & npu * test unittest * add SetGradNode for set_stop_gradient && add checktensor for GradientHooks * refine code * refien unittest of imperative_amp for new dygraph * inplace api skip amp * add test_imperative_qat_amp for intermediate amp * refine code * refine test_amp ci strategy * refine unittest code * refine amp_utils code * refine amp getpromotetype for some special op * refine unittest code
-
- 16 3月, 2022 3 次提交
-
-
由 joanna.wozna.intel 提交于
* Modify save_quant_model.py to support differnet input and output filenames * Correct wrong order of arguments
-
由 Ming-Xu Huang 提交于
-
由 qipengh 提交于
-
- 15 3月, 2022 1 次提交
-
-
由 Guanghua Yu 提交于
* add some op for full_quantization
-
- 11 3月, 2022 1 次提交
-
-
由 Guanghua Yu 提交于
-
- 04 3月, 2022 1 次提交
-
-
由 Jiabin Yang 提交于
-
- 03 3月, 2022 2 次提交
-
-
由 Baibaifan 提交于
-
由 Jiabin Yang 提交于
* eager, test=develop * fix bug, test=develop * eager, test=develop * merge legacy to fluid * eager, test=develop * eager, test=develop * Refactor TensorAdd func by template and remove gradient_accumulation in eager * Remove needless target name * eager, test=develop * eager, test=develop * Use overload instead of template * Remove legacy code * Remove legacy code * selectedrows, test=develop * Remove DataType test * eager, test=develop * eager, test=develop * support gan, test=develop * Using Tensor directly instead of using EagerTensor * support gradient_accumulation * make test_imperative_lod_tensor_to_selected_rows longer * make test_imperative_lod_tensor_to_selected_rows longer * refine code * ptb, test=develop * Rename all EagerTensor to Tensor * Rename some EagerTensor to Tensor * rename EagerTensor to EagerVariable * eager, test=develop * eager, test=develop * eager, test=develop * eager, test=develop * add more test * eager, test=develop * Support copiable selected rows and merge develop * save load, eager, test=develop * save load, eager, test=develop * refine, test=develop * remove useless _set_value method * refine, test=develop * refine, test=develop * revert static_runner, test=develop * EagerTensor to Tensor, test=develop * refine, test=develop * refine, test=develop * clear grad, test=develop * merge, develop * merge, develop * merge, test=develop * merge, test=develop * Support quant and part of slice * support legacy static save * extend slim tests time * remove imperative on inference * remove imperative on inference * merge develop * fix typo * fix typo * split slice related code into 2 part for imperative and eager * split slice from inference * split slice from inference * fix test_tensor_register_hook Co-authored-by: NWang Huan <wanghuan29@baidu.com> Co-authored-by: NWeilong Wu <veyron_wu@163.com> Co-authored-by: Nwanghuancoder <wanghuancoder@163.com>
-
- 01 3月, 2022 2 次提交
-
-
由 joanna.wozna.intel 提交于
* Add mobilenetv3_large performance test * Disable the BF16 test if the device does not support BF16 computations * Change test timeout
-
由 wenbin 提交于
* remove * pass * more pass
-
- 19 2月, 2022 1 次提交
-
-
由 sneaxiy 提交于
* add DistributedFusedLamb op * polish code * fix compile error * compatible with pten changement * fix rocm compile error * improve converage * update upstream/develop * fix cast_with_ptr.h * add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1 * fix clip before allreduce * add use_master_param_norm * code polish * fix bug * fix ROCM ci
-
- 14 2月, 2022 1 次提交
-
-
由 Sławomir Siwek 提交于
* mish unit tests * code format * remove unused imports * code format * remove hard-coded shape values * remove timeouts * remove timeouts v2 * restore timeouts
-
- 09 2月, 2022 1 次提交
-
-
由 Wangzheee 提交于
* rebuild matmul pass: trt and gpu_cpu * rebuild matmul pass: trt and gpu_cpu * rebuild matmul pass: trt and gpu_cpu * rebuild matmul pass: trt and gpu_cpu
-
- 07 2月, 2022 1 次提交
-
-
由 arlesniak 提交于
* amp list updated * tests updated * gray list updated * amp list updated * test updated
-
- 27 1月, 2022 1 次提交
-
-
由 joanna.wozna.intel 提交于
* Upadate pass in quant2_int8_mkldnn_pass * Back to the previous scale_matmul order * Change place of cpu_quantize_placement_pass
-
- 21 1月, 2022 1 次提交
-
-
由 ceci3 提交于
-
- 13 1月, 2022 1 次提交
-
-
由 jakpiase 提交于
* base changes for mul reimplementation * empty commit * tmp save * full implementation of mul bf16/fp32 fwd bwd * CI fix * CI rerun * changed unity build cmake to avoid gpu issues * removed mul mkldnn from unity build * added skipping tests if not cpu_bf16 * CI fix * CI fix * CI fix
-
- 12 1月, 2022 1 次提交
-
-
由 Sylwester Fraczek 提交于
* fix conv act int8 scale * add unit test for conv+hard_swish
-
- 06 1月, 2022 1 次提交
-
-
由 minghaoBD 提交于
-
- 05 1月, 2022 2 次提交
-
-
由 Jiaqi Liu 提交于
* make post training quant API support dataloader
-
由 joanna.wozna.intel 提交于
* Quantize nearest_interp and nearest_interp_v2 * Check if avx_core supported * Add depthwise_conv2d to supported quantization list
-
- 28 12月, 2021 1 次提交
-
-
由 Li Min 提交于
* Fix scatter_op fp16 perf problem. * Add scatter into black list. * Add scatter into black list for dygraph.
-
- 22 12月, 2021 1 次提交
-
-
由 Guanghua Yu 提交于
-
- 20 12月, 2021 1 次提交
-
-
由 sneaxiy 提交于
* support FP16 for more ops * add amp list tests * refine reduce_mean_grad * fix OP benchmark ci * fix fp16 reduce_mean * updat ut, but still have some problems * remove mean/reduce_mean fp16 kernel
-
- 17 12月, 2021 1 次提交
-
-
由 sneaxiy 提交于
* support multi precision update for LAMB * hide some api * fix ci uts * fix lamb output of dygraph * remove some changes to some PR * try to fix Py3 CI compile error * fix test_imperative_optimizer, add lars ut, add layer_norm ut * fix ut, fix format * fix ut * fix windows ci
-
- 14 12月, 2021 3 次提交
-
-
由 Sylwester Fraczek 提交于
* add map_matmul passes to quant2_int8_mkldnn_pass * fix fc+act fuse (activation scale) * ci fix, c++17 structured bindings not available * fix ci static check
-
由 Guanghua Yu 提交于
-
由 Sylwester Fraczek 提交于
* reshape+transpose+matmul_v2 * in_name->input_name * fix pr-ci-static-check
-
- 13 12月, 2021 1 次提交
-
-
由 xiongkun 提交于
* fix single card 8 unittests in new executor * fix * fix
-
- 10 12月, 2021 2 次提交
-
-
由 Guanghua Yu 提交于
* Support sub graph quant-post
-
由 Guanghua Yu 提交于
-