Unverified commit fad4b3b4, authored by Yuang Liu, committed by GitHub

[hybrid performance] Optimize gradient fusion for pipeline mode by sorting the gradients by dtype (#35070)

Parent: b6dc16cb
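The change sorts the gradient/parameter pairs by dtype before the pipeline gradient-fusion pass segments them, so gradients of the same dtype sit next to each other and can be coalesced into fewer fused buffers. Below is a minimal, Paddle-independent sketch of why contiguity matters; the `pairs` data and `dtype_of` helper are hypothetical stand-ins for the real grad/param pairs and the block-variable dtype lookup, for illustration only:

```python
# Minimal sketch: dtype-sorting reduces the number of fused buffers.
from itertools import groupby

pairs = [("w1@GRAD", "w1", "fp16"), ("w2@GRAD", "w2", "fp32"),
         ("w3@GRAD", "w3", "fp16"), ("w4@GRAD", "w4", "fp32")]

def dtype_of(pair):
    return pair[2]

# Unsorted, the dtypes alternate: a fuser that only merges same-dtype
# neighbours would need four separate buffers here.
print(len([list(g) for _, g in groupby(pairs, key=dtype_of)]))  # 4

# Sorted fp16-first, as the commit does: only two buffers remain.
order = {"fp16": 0, "fp32": 1}
sorted_pairs = sorted(pairs, key=lambda p: order.get(dtype_of(p), 2))
print(len([list(g) for _, g in groupby(sorted_pairs, key=dtype_of)]))  # 2
```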
@@ -5216,6 +5216,9 @@ class PipelineOptimizer(object):
         if len(grad_param_pairs) == 0:
             return
+        grad_param_pairs = self._sort_grad_param_by_dtype(main_block,
+                                                          grad_param_pairs)
         grad_param_segments = []
         merged_suffix = '@MERGED@FP16' if fp16 else '@MERGED'
         dtype = paddle.float16 if fp16 else paddle.float32
@@ -5409,6 +5412,24 @@ class PipelineOptimizer(object):
         return fused_merged_gradients
 
+
+    def _sort_grad_param_by_dtype(self, main_block, grad_param_pairs):
+        # sort the grad-param pairs by dtype: fp16 first, then fp32, then others
+        fp16_pairs = []
+        fp32_pairs = []
+        other_pairs = []
+        for pairs in grad_param_pairs:
+            dtype = main_block.var(pairs[0]).dtype
+            if dtype == paddle.float32:
+                fp32_pairs.append(pairs)
+            elif dtype == paddle.float16:
+                fp16_pairs.append(pairs)
+            else:
+                other_pairs.append(pairs)
+        sorted_pairs = fp16_pairs
+        sorted_pairs.extend(fp32_pairs)
+        sorted_pairs.extend(other_pairs)
+        return sorted_pairs
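To make the resulting order concrete, here is a hedged, self-contained illustration of the same bucketing the new helper performs; `FakeVar`, `FakeBlock`, and `sort_by_dtype` are hypothetical stand-ins, not part of the commit:

```python
import paddle

# Hypothetical stand-ins for a Paddle block and its variables; only the
# `.var(name).dtype` lookup used by _sort_grad_param_by_dtype is mimicked.
class FakeVar:
    def __init__(self, dtype):
        self.dtype = dtype

class FakeBlock:
    def __init__(self, vars):
        self._vars = vars

    def var(self, name):
        return self._vars[name]

def sort_by_dtype(block, grad_param_pairs):
    # same bucketing as the new helper: fp16 first, then fp32, then others
    fp16, fp32, other = [], [], []
    for pair in grad_param_pairs:
        dtype = block.var(pair[0]).dtype
        if dtype == paddle.float16:
            fp16.append(pair)
        elif dtype == paddle.float32:
            fp32.append(pair)
        else:
            other.append(pair)
    return fp16 + fp32 + other

block = FakeBlock({
    "a@GRAD": FakeVar(paddle.float32),
    "b@GRAD": FakeVar(paddle.float16),
    "c@GRAD": FakeVar(paddle.float64),
})
pairs = [("a@GRAD", "a"), ("b@GRAD", "b"), ("c@GRAD", "c")]
print(sort_by_dtype(block, pairs))
# [('b@GRAD', 'b'), ('a@GRAD', 'a'), ('c@GRAD', 'c')]
```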
     def _get_var_size(self, var):
         dtype_to_size = {
             core.VarDesc.VarType.FP16: 2,
......
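The diff is folded at this point; only the first entry of `dtype_to_size` is visible. As a rough sketch of the usual pattern (the remaining map entries and the size computation below are assumptions, not the commit's code), `_get_var_size` would multiply the variable's element count by the per-dtype byte width:

```python
from functools import reduce
from paddle.fluid import core

# Assumed continuation of the folded mapping; only FP16: 2 appears in
# the diff, the other entries are standard element widths.
dtype_to_size = {
    core.VarDesc.VarType.FP16: 2,
    core.VarDesc.VarType.FP32: 4,
    core.VarDesc.VarType.FP64: 8,
    core.VarDesc.VarType.INT32: 4,
    core.VarDesc.VarType.INT64: 8,
}

def var_size_in_bytes(var):
    # element count of the static shape times the per-element width
    numel = reduce(lambda a, b: a * b, var.shape, 1)
    return numel * dtype_to_size[var.dtype]
```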