1. 22 Dec, 2022: 1 commit
  2. 21 Dec, 2022: 1 commit
      Merge gpugraph to develop (#48507) · 1acddc34
      Committed by lxsbupt
      * merge gpugraph to develop, fix code style
      
      * update for untrainable params for stage3. (#48577)
      
      * merge gpugraph to develop, trigger ci
      
      * [CodeStyle][isort][Dy2St] sort imports in test_error (#48746)
      
      * [CodeStyle][isort][Dy2St] sort imports in test_error
      
      * update lineno
      
      * Clear extra input (Bias, ResidualData) in OpMaker of conv2d (#47579)
      
      * delete Bias and ResidualData in OpMaker of conv2d
      
      * delete extra input of conv3d
      
      * refactor pass of conv_bias_fusion
      
      * fix mkldnn dependency
      
      * fix mkldnn compile
      
      * fix test_conv_bias_mkldnn_fuse_pass
      
      * polish some code
      
      * remove useless log
      
      * fix analyzer_vit_ocr_tester
      
      * fix conv_activation_mkldnn_fuse_pass
      
      * fix test_analyzer_ocr
      
      * add fused_conv_sig
      
      * fix performance regression
      
      * fix performance regression
      
      * make bilinear interpolate stable. (#48644)
      
      * make bilinear interpolate stable.
      
      * fix code
      
      * clear tmp var in ptq (#48660)
      
      * merge gpugraph to develop, fix py-api comment
      
      * merge gpugraph to develop, fix mac-python3
      
      * merge gpugraph to develop, fix mac-python3
      
      * [Dy2St] replace deprecated `load_module` with `exec_module` (#48679)
      
      * merge gpugraph to develop, fix mac-python3
      
      * modify d2d copy to xpu::copy in xpu kernel, test=kunlun (#48710)
      
      * rm _test_eager_guard (#48767)
      
      * delete sampling_id api (#48543)
      
      * [NPU] add FLAGS_npu_storage_format env to enable npu storage format, test=develop (#48774)
      
      * optimize nchw<->nhwc kernel in fp16 model (#48692)
      
      * fix: oss only supports sm>=75 (#48731)
      
      * update kl1 op list and optimize matmul unittest for kunlun (#48775)
      
      *test=kunlun
      
      * Fix accuracy fp16 kernel return fp32 tensor error (#48803)
      
      * [phi::DenseTensor] Replace Tensor with phi::DenseTensor (#48682)
      
      * [Zero-Dim] Support 0D for paddle.diagflat (#48735)
      
      * [Zero-Dim] Support 0D for paddle.diagflat
      
      * [fluid api clear] Move batch norm1 (#47965)
      
      * modify slice infershape
      
      * code style
      
      * modify slice_unittest
      
      * temp fix
      
      * batch_norm api move
      
      * code_style
      
      * codestyle
      
      * ci_static
      
      * add __init__
      
      * reset other change
      
      * revert .cc
      
      * add import batchnorm
      
      * conflict and revert
      
      * fix bug
      
      * fix third conflict one day
      
      * fix conflict
      
      * fix conflict bug
      
      * fix conflict bug
      
      * modify api
      
      * code_style
      
      * modify doc
      
      * add lost doc stable
      
      * fix conflict bug
      
      * ci lack of gpu
      
      * [remove fluid] PRelu BilinearTensorProduct Conv2DTranspose SequenceConv RowConv (#48654)
      
      * [remove fluid] PRelu BilinearTensorProduct
      
      * [remove fluid] PRelu BilinearTensorProduct Conv2DTranspose SequenceConv RowConv
      
      * [remove fluid] PRelu BilinearTensorProduct Conv2DTranspose SequenceConv RowConv
      
      * [remove fluid] PRelu BilinearTensorProduct Conv2DTranspose SequenceConv RowConv
      
      * [remove fluid] PRelu BilinearTensorProduct Conv2DTranspose SequenceConv RowConv
      
      * [remove fluid] PRelu BilinearTensorProduct Conv2DTranspose SequenceConv RowConv
      
      * [remove fluid] PRelu BilinearTensorProduct Conv2DTranspose SequenceConv RowConv
      
      * [remove fluid] PRelu BilinearTensorProduct Conv2DTranspose SequenceConv RowConv
      
      * merge gpugraph to develop, rollback graph_send_recv
      
      * fix ci (#48730)
      
      * Remove redundant numpy output in Example code (1/3), test=document_fix (#48678)
      
      * Revise the English API docs (#48219)
      
      * Revise the examples of paddle.nn.dynamic_decode and paddle.nn.functional.diag_embed
      
      * mma qk tensor_core (#48087)
      
      * use mma for QK dot computing in fused_multi_transformer.
      * Update fused_multi_transformer_op.cu.h
      
      * remove lrn which is not used in paddle 2.0 (#47945)
      
      * replace scatter_nd and scatter_nd_add with paddle.scatter_nd and paddle.scatter_nd_add (#47960)
      
      * [PHI] Migrate mul_grad kernel (#48061)
      
      * cleanup unused code
      
      * unify is_int8 is_bfloat16
      
      * Simplify matmul_v2 FWD kernel
      
      * remove RunKernel methods
      
      * remove import namespace
      
      * remove headers
      
      * clean fluid/phi cross imports
      
      * remove fluid axpy_handler
      
      * delete fluid methods
      
      * activations
      
      * OneDNNMemDesc
      
      * MKLDNNFormatForSize
      
      * MatchShapeToLayout
      
      * MKLDNNMemoryFormat
      
      * MKLDNNFormat
      
      * ReorderMKLDNNHandler
      
      * to_void_cast
      
      * review suggestions
      
      * interpolate
      
      * remove fluid dependency
      
      * init
      
      * ExecuteMatMulV2
      
      * rm fluid kernel
      
      * matmul_grad
      
      * remove mutable_data
      
      * mul_grad
      
      * delete unnecessary shape and slice op (#48112)
      
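      * Revise the English docs.
      
      * Revise the English docs for segment operators and others.
      
      * Re-revise the English doc formatting of paddle.einsum, paddle.unique_consecutive,
      paddle.disable_signal_handler.
      
      * Re-revise the English doc formatting; test=docs_preview
      
      * Update extension.py
      
      * Re-revise the English doc formatting; test=docs_preview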
      
      * Re-revise the English doc formatting.
      Pending acceptance:
      - paddle.linalg.svd
      - paddle.nn.functional.diag_embed
      - paddle.set_grad_enabled
      - paddle.disable_signal_handler
      - paddle.cumprod
      - paddle.device.cuda.stream_guard
      
      To be revised:
      - paddle.nn.dynamic_decode
      - paddle.einsum
      - paddle.unique_consecutive
      - paddle.linalg.svd
      - paddle.incubate.segment_min
      - paddle.incubate.segment_max
      - paddle.incubate.segment_sum
      - paddle.incubate.segment_mean
      
      ;test=docs_preview
      
      * Re-revise the English doc formatting.
      Pending acceptance:
      - paddle.linalg.svd
      - paddle.nn.functional.diag_embed
      - paddle.set_grad_enabled
      - paddle.disable_signal_handler
      - paddle.cumprod
      - paddle.device.cuda.stream_guard
      - paddle.nn.dynamic_decode
      - paddle.unique_consecutive
      - paddle.linalg.svd
      
      To be revised:
      - paddle.einsum
      - paddle.incubate.segment_min
      - paddle.incubate.segment_max
      - paddle.incubate.segment_sum
      - paddle.incubate.segment_mean
      
      ;test=docs_preview
      
      * Re-revise the English doc formatting.
      Pending acceptance:
      - paddle.linalg.svd
      - paddle.nn.functional.diag_embed
      - paddle.set_grad_enabled
      - paddle.disable_signal_handler
      - paddle.cumprod
      - paddle.device.cuda.stream_guard
      - paddle.nn.dynamic_decode
      - paddle.unique_consecutive
      - paddle.linalg.svd
      
      To be revised:
      - paddle.einsum
      - paddle.incubate.segment_min
      - paddle.incubate.segment_max
      - paddle.incubate.segment_sum
      - paddle.incubate.segment_mean
      
      ;test=docs_preview
      
      * update
      
      * test=docs_preview
      
      * update formula; test=docs_preview
      
      * update formula; test=docs_preview
      
      * remove this operator; test=docs_preview
      
      * add hyper link; test=docs_preview
      
      * add default value; test=docs_preview
      
      * update format; test=docs_preview
      
      * empty commit; test=docs_preview
      
      * fix codestyle issues; test=docs_preview
      
      * empty commit; test=docs_preview
      Co-authored-by: lzy <569782149@qq.com>
      Co-authored-by: Vvsmile <450864116@qq.com>
      Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
      Co-authored-by: RichardWooSJTU <37864677+RichardWooSJTU@users.noreply.github.com>
      Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
      Co-authored-by: Nyakku Shigure <sigure.qaq@gmail.com>
      
      * [PHI] Migrate squeeze and squeeze_grad kernels (#48634)
      
      * squeeze kernel
      
      * squeeze fwd
      
      * whitespace
      
      * Fix API docs under the paddle.nn.functional and paddle.nn packages (#48581)
      
      * assign cve number to pdsa, test=document_fix (#48846)
      
      * [fluid remove]: remove paddle.fluid.layers.yolo_box and paddle.fluid.layers.yolov3_loss (#48722)
      
      * remove paddle.fluid.layers.nn.temporal_shift
      
      * code check
      
      * rm unittest
      
      * remove fluid.yolo_box
      
      * remove fluid.yolov3_loss
      
      * change the comments of yolov3_loss to yolo_loss
      
      * merge gpugraph to develop, fix windows compile
      
      * merge gpugraph to develop, fix windows compile
      
      * merge gpugraph to develop, fix windows compile
      
      * Try adding eval() to speed up Eigen performance. (#48855)
      
      * [Fluid Clean] move inplace_apis_in_dygraph_only from paddle.fluid.dygraph.inplace_utils to paddle.utils (#48744)
      
      * move inplace_apis_in_dygraph_only from paddle.fluid.dygraph.inplace_utils to paddle.utils
      
      * modify conflict
      
      * modify conflict
      
      * modify conflict
      
      * modify conflict
      
      * modify conflict
      
      * modify conflict
      
      * modify conflict
      
      * modify static-check ci error
      
      * fix conflict
      
      * modify failed tests
      
      * fix conflict
      
      * fix conflict
      
      * fix pool2d examples
      
      * modify conflict
      
      * fix failed tests
      
      * fix conflict
      
      * fix failed tests
      
      * modify problem of deleting pool2d
      
      * merge gpugraph to develop, fix windows compile
      
      * clean fluid task: transfer gaussian random api (#48529)
      
      * Delete duplicate quant nodes in QAT (#48751)
      
      * rm autograd func dynamic eager tests (#48788)
      
      * Setuptools optimization (#48770)
      
      * optimize setup.py
      
      * modify setup.py
      
      * modify setup.py
      
      * modify setup.py
      
      * modify setup.py after zhangbo reviewed
      
      * [CodeStyle][F811] fix some test cases shadowed by the same name (#48745)
      
      * [CodeStyle][F811] fix some unittests
      
      * fix setup.py
      
      * remove ignore from flake8 config
      
      * remove repeat TestAbsDoubleGradCheck
      
      * fix rrelu test
      
      * fix fft ut
      
      * add noqa in fluid.lstm ut
      
      * add rtol and atol in test_matmul_v2_op
      
      * update rtol
      
      * empty commit
      
      * empty commit
      
      * revert changes in matmul ut and add noqa
      
      * rename test case name
      
      * set free_when_no_cache_hit default value to true (#48815)
      
      * [Clean Fluid] Rm and mv some fluid dygraph apis (#48576)
      
      Remove fluid dygraph apis
      GroupNorm
      TreeConv
      Move fluid dygraph apis
      Flatten
      SpectralNorm
      
      * [Inference] inference add cinn interface (#48741)
      
      * Clean and migrate fluid APIs of paddle.fluid.layers.control_flow (#48233)
      
      * Merge branch 'reduce_sum' of https://github.com/GhostScreaming/Paddle into mine_fluid_clean_common.
      
      * Fix some bugs.
      
      * Clean APIs in python/paddle/fluid/layers/control_flow.py
      
      * Polish code style.
      
      * Change API.
      
      * Fix some bugs.
      
      * Fix some bugs.
      
      * remove gpu_info.h from phi dependencies (#48811)
      
      * [Paddle Inference] Add onehot trt converter (#48655)
      
      * add onehot trt converter
      
      * add unittest
      
      * fix bug
      
      * opt code
      
      * fix bug
      
      * fix depth_tensor
      
      * fix unittest
      
      * fix bug
      
      * fix unittest
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * fix bug
      
      * [PHI decoupling] remove  bbox_util.h from phi dependencies (#48761)
      
      * remove bbox_util.h from phi
      
      * add file bbox_util.h
      
      * reframe bbox_util.h
      
      * Optimize Paddle diagonal (#47904)
      
      * [API Clean] Clean __all__ to avoid exposing useless API (#48713)
      
      * [API Clean] Clean __all__ to avoid exposing useless API
      
      * fix import
      
      * fix typo
      
      * remove tracedLayer unittest
      
      * Clean fluid APIs in distributed and fleet files (#48851)
      
      * Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result
      is wrong.
      
      * Remove climits.
      
      * Clean fluid API in paddle/distributed and paddle/fleetx folders.
      Includes the following files:
      python/paddle/distributed/__init__.py
      python/paddle/distributed/collective.py
      python/paddle/distributed/fleet/utils/fs.py
      python/paddle/distributed/fleet/utils/hybrid_parallel_inference.py
      python/paddle/distributed/fleet/utils/hybrid_parallel_util.py
      python/paddle/distributed/fleet/utils/internal_storage.py
      python/paddle/distributed/launch/context/device.py
      python/paddle/distributed/parallel.py
      python/paddle/distributed/parallel_with_gloo.py
      python/paddle/distributed/spawn.py
      python/paddle/framework/__init__.py
      Note that 'paddle.fluid.dygraph.parallel.ParallelEnv'
      and 'fluid.framework.core' remain unchanged in those files.
      ParallelEnv is used by paddle.fluid.dygraph.parallel.DataParallel.
      However, APIs in paddle.fluid.dygraph.parallel can't be
      migrated to paddle.distributed, as there are cyclic import
      dependencies in modules like paddle.static and paddle.tensor. And
      'fluid.framework.core' will be changed to import framework.core
      after fluid.core is migrated.
      
      * Change TODO authors.
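      As a hedged illustration of the reduce_sum issue noted above (a generic Python sketch, not Paddle's kernel code), the snippet below uses the standard ctypes module to emulate storing an element count in a signed 32-bit integer: once numel exceeds INT32_MAX the stored value wraps to a negative number, while a 64-bit type keeps it intact. The helper names are hypothetical.

```python
import ctypes

INT32_MAX = 2**31 - 1


def numel_as_int32(numel: int) -> int:
    """Hypothetical helper: store numel in a signed 32-bit slot.

    ctypes.c_int32 performs no overflow checking, so out-of-range values
    wrap modulo 2**32, as a 32-bit int typically does in C/C++ kernels.
    """
    return ctypes.c_int32(numel).value


def numel_as_int64(numel: int) -> int:
    """Hypothetical helper: store numel in a signed 64-bit slot."""
    return ctypes.c_int64(numel).value


if __name__ == "__main__":
    big = INT32_MAX + 1  # one element past the 32-bit limit
    print(numel_as_int32(big))  # -2147483648 (wrapped)
    print(numel_as_int64(big))  # 2147483648 (correct)
```

      Any index or count derived from the wrapped value is wrong, which is why element counts for very large tensors are usually kept in a 64-bit type.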
      
      * rm kunlun xpu2_op_list (#48826)
      
      *test=kunlun
      
      * remove detection_output, iou_similarity and bipartite_match (#48773)
      
      * Set WaiterType of kGpuSync to kCPU (#48758)
      
      * [Migrate Fluid] Migrate Decoder, BeamSearchDecoder (#48754)
      
      * [Inference] Enable infer shape cache. (#48312)
      
      * [Fluid Clean] remove unfold, deformable_roi_pooling, shard_index, hard_swish, mish, uniform_random, unbind (#48451)
      
      * fix-gpups setup.py (#48888)
      
      * fix-gpups
      
      * test=document_fix
      
      * [PHI decoupling] move cuda_graph from fluid to phi (#48686)
      
      * move cuda_graph from fluid to phi
      
      * move device_memory_aligment from fluid to phi
      
      * Revert "move device_memory_aligment from fluid to phi"
      
      This reverts commit b92fcd39a0a50fdac13278f49be0237a85f3a13f.
      
      * update xpu cmake
      
      * fix English docs typo errors (#48599)
      
      * fix English docs typo errors
      
      these are the same errors as those fixed in Chinese PR 5468
      
      * update docs; test=docs_preview
      Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
      
      * [XPU] add load op into oplist. (#48860)
      
      * [XPU] add load op into oplist.
      
      * remove test_sampling_id_op_xpu.py
      
      * [fluid clean] remove fluid.dygraph.rnn.lstmcell and fluid.dygraph.rnn.grucell (#48719)
      
      * refine bsd doc (#48882)
      
      * [Paddle Inference] General optimization for no_varlen embedding layernorm (#48580)
      
      * general optimization no_varlen embedding layernorm
      
      * fix tmp directories (#48863)
      
      * rm dygraph_to_static eager guard tests part2 mnist2ptb_lm (#48793)
      
      * rm dygraph_to_static eager guard tests part2 mnist2ptb_lm
      
      * merge gpugraph to develop, fix the_one_ps.py for gpups
      
      * [remove fluid] under unittests of linear api (#48564)
      
      * [remove fluid] under unittests of linear api
      
      * [remove fluid] under unittests of linear api
      
      * [remove fluid] under unittests of linear api
      
      * [remove fluid] under unittests of linear api
      
      * [remove fluid] under unittests of linear api
      
      * [remove fluid] under unittests of linear api
      
      * [remove fluid] fluid dygraph linear api
      
      * [remove fluid] fluid dygraph linear api
      
      * [remove fluid] fluid dygraph linear api
      
      * [remove fluid.layers.cross_entropy] remove unit tests (part 1) (#48726)
      
      * replace layers.cross_entropy with paddle.entropy
      
      * fix args
      
      * fix codestyle
      
      * proper fix (#48360)
      
      Reenabled ext_reorder recording for TransDataLayoutFromOneDNN
      
      * [remove fluid.layers.matmul] remove fluid.layers.matmul in example code (#48818)
      
      * replace fluid.layers.matmul in fluid/io.py
      
      * fix doc error in fluid.layers.nn.sampling_id
      
      * remove test_auto_search_dist_matmul_op.py (#48794)
      
      * delete mean api (#48764)
      
      * clean test_op_name_conflict (#48704)
      
      * opt kernel_selection error msg (#48864)
      
      * rewrite delete_weight_dequant_linear_op_encoder/decoder pass (#48650)
      
      * rewrite delete_weight_dequant_linear_op_encoder/decoder pass
      
      * [XPU] add set_value and set_value_grad (#48845)
      
      * merge gpugraph to develop, fix gpups ut
      
      * Add QuantizedMatmul in QAT (#47997)
      
      * fix 'BlasAXPBY unimplemented' error with custom device (#48762)
      
      * fix 'BlasAXPBY unimplemented' error with custom device
      
      * fix utils CMakeLists bug
      
      * first commit (#38143)
      
      * [Auto Parallel] Add cluster partition and dm to pm (#48320)
      
      * add cluster_partition and device_meshes to process_meshes funcs
      
      * add unittest
      
      * fix paddle2cinn float16 type support bug (#48249)
      
      * remove pool2d from fluid (#48512)
      
      * remove pool2d
      
      * [fluid remove]: remove paddle.fluid.layers.detection_map, paddle.fluid.metrics.DetectionMAP and paddle.fluid.evaluator.DetectionMAP (#48674)
      
      * remove paddle.fluid.layers.nn.temporal_shift
      
      * code check
      
      * rm unittest
      
      * remove paddle.fluid.layers.detection_map and the class:DetectionMAP
      
      * [PHI decoupling] move "flags.h" from fluid to phi (#48696)
      
      * add set_lr & get_lr for stage2 optimizer. (#48857)
      
      * move share_buffer kernel to phi (#48858)
      
      * move share_buffer kernel to phi
      
      * fix ut
      
      * add source file
      
      * fix Windows links
      
      * [Kernel Selection] Simplify kernel selection process in phi, reduce search number to half (#47771)
      
      * simplify SelectKernelOrThrowError function in phi
      
      * opt kernel_selection process
      
      * polish code, fix backend error
      
      * Support static graph code-gen for scalar and int_array (#48792)
      
      * add support_tensor for code_gen to static graph
      
      * support code-gen for int_array
      
      * polish code
      
      * fix bug of data_type
      
      * clean unittest test_model_cast_to_bf16 (#48705)
      
      * rm dy2static eager tests part1 bert2loop (#48790)
      
      * rm dygraph_to_static eager guard tests part3 reinforce2yolo (#48795)
      
      * rm distribution uniform eager guard test (#48768)
      
      * rm distribution uniform eager guard test
      
      * review
      
      * replace cross_entropy in python/paddle/fluid/tests/unittests/test_[a-n]*.py except test_dist_transpiler.py (#48913)
      
      * replace cross_entropy except in python/paddle/fluid/tests/unittests/*.py && unittests/*/*.py (#48922)
      
      * [Paddle Inference]add cutlass act set in conv_elementwise_add_act_fuse_pass (#48838)
      
      * add cutlass act set in conv_elementwise_add_act_fuse_pass
      
      * move fluid.layers.create_global_var to static.create_global_var (#48777)
      
      * Modified the Kernel policy. When the compute is NHWC (#48563)
      
      * temporarily disable set_value (#48942)
      
      * xpu support inplace flatten (#48909)
      
      This PR catches up with the latest XPU white-list strategy
      (https://github.com/PaddlePaddle/Paddle/pull/48606),
      since the original list only includes 'fluid'-style names, but the new list
      must include 'phi'-style names as well.
      Refer to paddle/phi/core/kernel_factory.cc for more details.
      
      * fix:vit_attention ut (#48884)
      
      * mv fused_bias_dropout_residual_ln to fluid manual dir (#48824)
      
      * mv fused_bias_dropout_residual_ln to fluid manual dir
      
      * rm useless comments
      
      * bug fix (#48829)
      
      * move ops_extra_info_gen.py from phi to fluid (#48926)
      
      * fix scale type in alpha and beta (#48887)
      
      * [inference][trt] upgrade prelu op  (#48528)
      
      * add prelu
      
      * Revise multiple docs as requested, corresponding to Chinese #5453 (#48886)
      
      * fix doc
      
      * test=document_fix
      Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
      
      * replace cross_entropy in python/paddle/fluid/tests/unittests/*.py except test*.py (#48919)
      
      * [remove fluid] Remove fluid APIs (#48641)
      
      * [CodeStyle] fix renamed files not being monitored by Codestyle Check (#48892)
      
      * [fluid remove]: remove paddle.fluid.layers.box_coder and paddle.fluid.layers.polygon_box_transform (#48896)
      
      * remove fluid_box_coder and polygon_box_transform
      
      * code check
      
      * [Custom XPU Support] Custom extension support xpu backend (#48733)
      
      * support custom_xpu
      
      * update cmake to test xpu
      
      * support custom_xpu, verify mechanism
      
      * fix test_custom_relu_op_xpu_setup.py, test=kunlun
      
      * fix FLAGS_init_allocated_mem
      
      * cancel TIMEOUT property
      
      * reset FLAGS_init_allocated_mem property
      
      * rm mlu ops eager guard tests (#48769)
      
      * rm npu instance_np op for eager guard tests (#48785)
      
      * remove xpu eager guard tests (#48786)
      
      * [remove fluid.layers.cross_entropy] remove unit tests (part 3)  (#48918)
      
      * replace cross_entropy in python/paddle/fluid/tests/unittests/test_[o-z]*.py plus test_dist_transpiler.py
      
      * fix test_prune
      
      * [Inference] optimize some code and fix some bug (#48780)
      
      * clean ir_pass_manager and fix map_depthwise_conv_to_conv_pass
      
      * fix unittest timeout
      
      * [PHI] Migrate reshape kernel (#48749)
      
      * reshape
      
      * typo
      
      * remove header
      
      * support py3 in setup.py (#48905)
      
      * support py3 in setup.py
      
      * support setup.py bdist_wheel in py3
      
      * support py3 in setup.py
      
      * modify run_setup
      
      * [Paddle-TRT] add cast between  int64 tensor  and Paddle-TRT (#45547)
      
      * Add cast between int64 tensor and Paddle-TRT
      * Add Unit testing.
      
      * fix sharding_stage1 amp O2 decorate bug (#48960)
      
      * [remove fluid] fluid dygraph Embedding (#48806)
      
      * [remove fluid] fluid dygraph Embedding
      
      * [remove fluid] fluid dygraph Embedding
      
      * [remove fluid] fluid dygraph Embedding
      
      * [remove fluid] fluid dygraph Embedding
      
      * [remove fluid] fluid dygraph Embedding
      
      * [remove fluid] fluid dygraph Embedding
      
      * fix for mkldnn (#48852)
      
      * H2D data transfer optimization with usage of structure type for stack kernel (#48899)
      
      * first commit.
      
      * refine performance with fast_divmod
      
      * refine performance with fast_divmod
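      As a hedged sketch of the fast_divmod idea referenced above: helpers of this kind typically replace integer division by a fixed divisor with a precomputed multiplier, a high-half multiply, and a shift (the Granlund-Montgomery method). The Python class below is hypothetical, assumes 32-bit unsigned operands, and is meant to show the arithmetic, not to reproduce Paddle's CUDA implementation.

```python
class FastDivMod:
    """Hypothetical helper: divide 32-bit unsigned integers by a fixed
    divisor using a precomputed multiplier and shift instead of a divide."""

    BITS = 32

    def __init__(self, divisor: int):
        if not 1 <= divisor < 2**self.BITS:
            raise ValueError("divisor must be in [1, 2**32)")
        self.divisor = divisor
        # shift = ceil(log2(divisor)): smallest s with 2**s >= divisor
        self.shift = (divisor - 1).bit_length()
        # multiplier = floor(2**32 * (2**shift - divisor) / divisor) + 1
        self.multiplier = (
            (1 << self.BITS) * ((1 << self.shift) - divisor)
        ) // divisor + 1

    def div(self, n: int) -> int:
        """Quotient n // divisor for 0 <= n < 2**32."""
        # high 32 bits of the 64-bit product n * multiplier
        # (what a GPU "multiply high" instruction would return)
        t = (n * self.multiplier) >> self.BITS
        return (t + n) >> self.shift

    def divmod(self, n: int) -> tuple[int, int]:
        """(quotient, remainder), with the remainder from one multiply-subtract."""
        q = self.div(n)
        return q, n - q * self.divisor
```

      For example, FastDivMod(3).divmod(7) returns (2, 1), matching the built-in divmod(7, 3). The multiplier is computed once per divisor, so the saving comes when the same divisor (such as a tensor dimension) is reused across many index computations.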
      
      * rm accuracy and auc in extra __all__ (#48986)
      
      * Add dynamic checks for collective communication on NCCL  (#48915)
      
      * chore: unify `SingleTensor`
      
      * feat: dynamic check
      
      * support sharding in fp16 on xpu,  (#48897)
      
      * support sharding in fp16 on xpu, change reduce_max to reduce_sum for found nan or inf
      
      * update
      
      * Support cross-step stream synchronization for standalone executor (#48809)
      
      * Add UT
      
      * Support cross-step stream synchronization for standalone executor
      
      * Fix typos
      
      * Fix typos
      
      * Update UTs
      
      * Generate static graph code of some ops by yaml (#48771)
      
      * generate static graph code of some ops by yaml, test = develop
      
      * fix 'take_along_axis' yaml style
      
      * reset scatter/scatter_nd_add
      
      * delete the comments of put_along_axis
      
      * fix a bug in GetTrtWeight (#48993)
      
      * add static_ops.yaml for static op (#48991)
      
      * [PHI decoupling] move norm_utils.cu.h from fluid to phi and remove norm_utils.h in fluid (#48930)
      
      * move norm_utils.cu.h from fluid to phi
      
      * remove norm_utils.h in fluid
      
      * fix bugs and replace mutable_data with Alloc
      
      * replace mutable_data with Alloc
      
      * forbid conv op whose weight is not a persistable weight into Paddle-TRT (#48763)
      
      * fix: Move the pass location to the appropriate location (#48951)
      
      * Enhance check_nan_inf implementation for CPU. (#48591)
      
      * Enable to print device info.
      
      * Enhance the nan and inf checking for cpu.
      
      * Implement a common print function.
      
      * Unify the check of complex numbers.
      
      * Rewrite the omp method.
      
      * Count and print the number of nan and inf.
      
      * Change the print content.
      
      * Add unittest.
      
      * [PHI] OneDNN version of Copy (#48539)
      
      * OneDNN version of Copy, transpose kernels adjusted
      
      * style fixes in transpose_grad
      
      * redundant headers deleted
      
      * fix: there are some bugs with trt 8.0 (#48921)
      
      * fix: there are some bugs with trt 8.0
      
      * fix:windows CI trt is too old
      
      * Optimization of Eigh op with ssyevj_batched runtime api (#48560)
      
      * fix codestyle
      
      * add double complex<float> complex<double> dtype support for syevj_batched
      
      * fix use_syevj flag for precision loss when input dtype of syevj_batch is complex128 in some cases
      
      * optimize eigh in different case
      
      * fix missing ; bug
      
      * fix use_syevj bug
      
      * fix use_cusolver_syevj_batched flag
      
      * replace cross_entropy in python/paddle/fluid/tests/unittests/*/*.py except unittests/*.py (#48920)
      
      * [PHI decoupling] replace dependency of inclusive_scan.h from phi (#48980)
      
      * replace dependency of inclusive_scan.h from phi
      
      * format code
      
      * fluid API migration: Assert, increment, cond (#48885)
      
      * [Clean fluid] Add inner function _elementwise_op_with_axis (#48748)
      
      * add inner function _elementwise_op_with_axis
      
      * fix transformer_model
      
      * polish API code
      
      * remove elementwise_div/mul api
      
      * delete API in __all__
      
      * delete elementwise_mul completely
      
      * polish elementwise_mul call
      
      * polish internal api
      
      * resolve conflict, fix rnn.py
      
      * use non-inplace call
      
      * delete elementwise_mul api test
      
      * delete elementwise_mul api test
      
      * clean elementwise_add/sub
      
      * restore _elementwise_op_in_dygraph in nn.py
      
      * test_convert_to_mixed_precision.py use tempfile for temporary models/params (#48819)
      
      * Tighten the Interception strategy (#48947)
      
      * test approve ,test=document_fix
      
      * test approve ,test=document_fix
      
      * test approve ,test=document_fix
      
      * [CodeStyle][isort][F401] fix some regression issues (#48936)
      
      * [CodeStyle][isort][F401] fix some regression issues
      
      * add import paddle to fix eval call
      
      * rm multinode eager guard tests (#48766)
      
      * rm multinode eager guard tests
      
      * remove unwanted tests
      
      * reset process_mpi test
      
      * rm unittests eager guard tests part5 dataloader2dygraph_mnist (#48816)
      
      * [PHI]Add new Tensor type and migrate save_combine kernel (#47856)
      
      * add new tensor
      
      * fix windows compile bugs
      
      * fix ci bugs
      
      * fix ci bugs
      
      * fix ci bugs
      
      * refine according to comments
      
      * fix ci compile bugs
      
      * add raw tensor
      
      * fix ci bugs
      
      * modify code by comment
      
      * delete String
      
      * [Fluid Clean] move BatchNorm from fluid.dygraph.nn to paddle.nn.layer.norm (#48734)
      
      * move BatchNorm from fluid.dygraph.nn to paddle.nn.layer.norm
      
      * modify conflict
      
      * modify pre-commit error
      
      * modify static-check ci error
      
      * fix failed tests
      
      * modify conflict
      
      * modify conflict
      
      * delete import module GRUUnit
      
      * fix failed test
      
      * fix failed tests
      
      * fix failed tests
      
      * fix failed tests
      
      * fix failed test
      
      * fix error in test_fused_resenet_basic_block_op_xpu.py
      
      * modify after xiaoguang reviewed
      
      * [Setup] Ignore @PADDLE_BINARY_DIR@ files (#49002)
      
      * [Setup] Ignore @PADDLE_BINARY_DIR@ files
      
      * test=document_fix
      
      * reshape onednn test reimplemented (#48850)
      
      * - UT reshape onednn
      
      - Fix
      
      test
      
      test2
      
      - test4
      
      - test5
      
      - test6
      
      test7
      
      - test8
      
      - Ut reinvented
      
      - cosmetic
      
      * - fix
      
      * - fix
      
      * - fix
      
      * - fix
      
      * - Fix
      
      * - fix
      
      * - fix
      
      * - fix
      
      * - Fix
      
      * lint
      
      * update fused_multi_transformer_encoder_pass support GPT new matmul API (#48953)
      
      * fit paddle.matmul in fleetx.gpt
      
      * Revert "set free_when_no_cache_hit default value to true (#48815)" (#48968)
      
      This reverts commit 592ed40b.
      
      * [Paddle Inference] fix some transformer unittests (#48929)
      
      * fix some transformer unittests
      
      * Enable Generic-Plugin support FP16 (#48807)
      
      * support conv1d quant & skip calibrate zero-size tensor (#48912)
      
      * enable custom device save model on device memory && fix conflict (#48221)
      
      * [api move] cvm (#48989)
      
      * [api move] cvm
      
      * [api move] cvm
      
      * [api move] cvm
      
      * [api move] cvm
      
      * [api move] cvm
      
      * [api move] cvm
      
      * [api move] cvm
      
      * [api move] ci test
      
      * [api move] ci test
      
      * [api move] ci test
      
      * Bugfix: xpu now only supports single-node multi-card, so bkcl_comm_num should always be set to 1 (#48961)
      
      * rm unittests eager guard tests part23 where2zeros (#48895)
      
      * rm unittests eager guard tests part17 number2pool1d (#48840)
      
      * [NPU] fix FLAGS_npu_storage_format flag in python, test=develop (#48976)
      
      * remove fleet eager guard tests (#48765)
      
      * rm unittests eager guard tests part6 eager_run2expand_v2 (#48817)
      
      * rm unittests eager guard tests part12 imperative_optimizer2resnet (#48833)
      
      * [fluid clean] remove 4 fluid.layers APIs and migrate 2 fluid.layers APIs (#48972)
      
      * fluid clean layer
      
      * docs
      
      * remove reset reference in unittest for `fluid.layers.cross_entropy` (#49012)
      
      * replace cross_entropy in test*.py except python/paddle/fluid/tests/unittests/*.py (#48978)
      
      * remove linear_chain_crf and crf_decoding from fluid (#48996)
      
      * remove linear_chain_crf and crf_decoding
      
      * Generate static graph code of some ops by yaml (#48977)
      
      * generate static graph code of some ops by yaml
      
      * fix the code-style of yaml
      
      * fix the framework_ci for triangular_solve
      
      * change the 'data_type' of scatter
      
      * add the 'out: Out' of scatter_nd_add
      
      * [tools] Update summary env (#48627)
      
      * [tools] remove deprecated API, fix macOS get-version error
      
      * [tools] Rename the value that returns null
      
      * [tools] add gcc, clang, cmake, libc version
      
      * [tools] fix cudnn read error
      
      * [tools] add gpu devices list, driver-based
      
      * [issue] update 3_build-installation-issue.yml
      
      * [tools] fix get gpu list AttributeError
      
      * [Dy2St] transforms.RandomVerticalFlip Support static mode (#49024)
      
      * add static RandomVerticalFlip
      
      * object => unittest.TestCase
      
      * Save fused_attention op memory when dropout_rate = 0.0 (#48902)
      
      * save fused_attention memory when dropout_rate = 0.0
      
      * add ut
      
      * fix ut bug
      
      * fix fused_layernorm_residual_dropout_bias_test.cu
      
      * Correct multiple inputs and outputs (#48872)
      
      * [CodeStyle][isort][Dy2St] sort imports for paddle.jit (#48637)
      
      * isort jit
      
      * refine comment
      
      * remove non-public apis from __all__ (#48952)
      
      * remove non-public apis from __all__
      
      * fix code style
      
      * fix rmsprop_ yaml bug (#49026)
      
      * fix rmsprop_ yaml bug
      
      * Fixed the dead link bug in the API documentation (#48969)
      
      * first pr
      
      * Revise nn.py
      
      * Revise nn.py 2.0
      
      * Revise rnn.py;test=document_fix
      
      * test=document_fix
      Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
      
      * Change mutable_data to ctx.Alloc. (#49001)
      
      * [inference][trt] add more unary op and square (#48534)
      
      * add more unary op and square
      
      * Support ninja (#48932)
      
      * move inplace_apis_indygraph_only from paddle.fluid.dygraph.inplace_utils to paddle.utils
      
      * modify conflict
      
      * modify conflict
      
      * modify conflict
      
      * modify conflict
      
      * modify conflict
      
      * modify conflict
      
      * modify conflict
      
      * modify static-check ci error
      
      * fix conflict
      
      * modify failed tests
      
      * fix conflict
      
      * fix conflict
      
      * fix pool2d examples
      
      * modify conflict
      
      * fix failed tests
      
      * fix conflict
      
      * fix failed tests
      
      * modify problem of deleting pool2d
      
      * support Ninja in setup.py
      
      * support different cmake_generators
      
      * modify after reviewed
      
      * delete unused denotes
      
      * Deleted mkldnn_inplace_pass code (#47818)
      
      * Deleted mkldnn_inplace_pass code
      
      * Fixed error with cmake
      
      * Resolve conflicts
      
      * hide log (#49045)
      
      * test=doucment_fix
      
      * test=document_fix
      
      * [Sparse]Optimize performance of sparse conv on T4 (#49009)
      
      * modify cmake file for cuda11.8 compile (#49020)
      
      * modify cmake file for cuda11.8 compile
      
      * add op_library(fused_embedding_eltwise_layernorm_op DEPS bert_encoder_functor)
      
      * remove dropout from fluid (#48319)
      
      * remove dropout
      
      * nullptr bugfix for XPU pg mode (#49043)
      
      * nullptr bugfix for XPU pg mode
      
      Also a few kernels are added to the xpu whitelist
      
      * increase error msg length
      
      * Divide elementwise case from BroadcastKernel and refine transpose autotune (#33051)
      
      * First Commit.
      
      * add some codes
      
      * add elementwise loader
      
      * fix code styles
      
      * merge with develop
      
      * add some changes both in elementwise and transpose
      
      * add init operation in broadcast kernel.
      
      * change codes according to pr suggestions about transpose file
      
      * fix error for op-benchmark ci
      
      * fix according to ci
      
      * add condition of skipif (#48791)
      
      * add condition of skipif
      
      * fix code format error
      
      * Update test_fused_gate_attention_op.py
      
      update
      
      * rm unittests eager guard tests part9 histogram2imperative_dataloader (#48825)
      
      * rm unittests eager guard tests part9 histogram2imperative_dataloader
      
      * rm basic
      
      * rm unittests eager guard test part14 initializer2layer_norm (#48835)
      
      * rm unittests eager guard test part14 initializer2layer_norm
      
      * minor change
      
      * [Bugfix] recompute dep filter param (#49010)
      
      * recompute dep filter param
      
      * recompute dep for reshard
      
      * [Paddle Inference] rewrite convert_to_mixed_precision (#48853)
      
      * [CodeStyle] fix c++17-extensions warning on macos (#49017)
      
      * fix c++17-extensions warning on macos
      
      * fix type
      
      fix c++17-extensions warning on macos
      
      fix c++17-extensions warning on macos
      
      * Add custom CUDNN finding paths for 64bit Windows (#49066)
      
      * remove prior_box (#49006)
      
      * remove prior_box
      
      * modify the order of parameters of prior_box in multi_box_head api
      
      * InstanceNorm1D, InstanceNorm2D, InstanceNorm3D (#48940)
      
      * modified:   python/paddle/nn/layer/norm.py
      
      * modified:   python/paddle/nn/layer/norm.py
      
      * modified:   python/paddle/nn/layer/norm.py
      
      * modified:   python/paddle/nn/layer/norm.py
      
      * modified:   python/paddle/nn/layer/norm.py
      
      * modified:   python/paddle/nn/layer/norm.py
      
      * test=docs_preview
      
      * modify doc format in InstanceNorm2D
      
      * test=docs_preview
      
      * modified:   python/paddle/nn/functional/loss.py
      	modified:   python/paddle/nn/functional/norm.py
      	modified:   python/paddle/nn/layer/loss.py
      	modified:   python/paddle/nn/layer/norm.py
      
      * test=docs_preview
      
      * test=docs_preview
      
      * [AutoParallel] recompute tuning (#48608)
      
      * [AutoParallel] recompute tuning
      
      * fix conflict
      
      * update comment
      
      * bug fix
      
      * update rc algo
      
      * tiny fix
      
      * fix clear process_group
      
      * remove comment
      
      * update segment print
      
      * fix import OpRole
      
      * adapt amp pass and grad_clip pass for opt_tuner
      
      * update tuning config
      
      * fix import
      
      * annotate recompute info on ops and upgrade recompute pass
      
      * add op_namescope for seed op
      
      * record reserved vars
      
      * fix recompute var's dist_attr
      
      * fix strategy unittest
      
      * adapt for fp16
      
      * update unittest
      
      * revert copy opt
      
      * update unittest
      
      * rename set_recompute_segments
      
      * fix unittest
      
      * fluid API migration: array_read, array_write (#49022)
      
      * del array_write & array_read
      
      * fix import err
      
      * fix import err
      
      * fix example codes
      
      * Keep double-buffer reader for static mode (#49068)
      
      * Fix nullptr to TestFuseGemmEpilogueReluBWDFP* (#48997)
      
      * support fp16 index sample (#47897)
      
      * add index sample fp16 support
      
      * remove fluid APIs in distributed_strategy.py and role_maker.py
      
      * Revert "remove fluid APIs in distributed_strategy.py and role_maker.py"
      
      This reverts commit 223bbee990d3bf69e252fc3c0f19e3873550a264.
      
      * fix instantiated more than once
      
      * clean codes
      
      * rm unittest eager guard tests part20 sparse_mv2split (#48879)
      
      * rm unittests eager guard tests part11 imperative_layer2ocr (#48828)
      
      * rm unittests eager guard tests part11 imperative_layer2ocr
      
      * review
      
      * rm eager guard tests part3_1 (#49059)
      
      * fix: gloo compatible (#49084)
      
      * rm eager guard tests part3_3 (#49061)
      
      * fix bug (#49081)
      
      * [Inference] memory_optimize and mkldnn problem (#49054)
      
      * memory_optimize and mkldnn problem
      
      * update
      
      * update
      
      * update
      
      * Remove/move 16 fluid APIs (#48377)
      
      * remove density_prior_box
      
      * remove anchor_generator
      
      * remove roi_perspective_transform
      
      * remove generate_proposal_labels
      
      * remove generate_mask_labels
      
      * remove generate_proposals
      
      * remove box_clip
      
      * remove retinanet_detection_output
      
      * remove multiclass_nms
      
      * remove locality_aware_nms
      
      * remove matrix_nms
      
      * remove distribute_fpn_proposals
      
      * remove box_decoder_and_assign
      
      * remove collect_fpn_proposals
      
      * remove 2 trt files
      
      * move prior_box to static/nn/common.py
      
      * move multi_box_head to static/nn/common.py
      
      * fix for CI/CE
      
      * remove retinanet_detection_output
      
      * restore compile_vs_runtime_white_list.py
      
      * restore test_retinanet_detection_output to white list
      
      * replace nn.flatten by paddle.flatten, and fix doc for retinanet_target_assign
      
      * add enable_static in demo and fix bug
      
      * remove roi_perspective_transform in test_layers
      
      * remove multi_box_head
      
      * change self.multiclass_nms to _legacy_C_ops.multiclass_nms
      
      * empty commit
      
      * empty commit
      
      * check code style
      
      * fix prior_box
      
      * fix CI
      
      * remove redundant prior_box in detection.py
      
      * fix docs
      
      * remove detection
      
      * fix prior_box en doc
      
      * delete prior_box in common
      
      * remove prior_box from __init__.py
      
      * fix embedding multihead (#49085)
      
      * SetDeviceId in StreamSafeCUDAAllocation (#49080)
      
      * [PHI decoupling] Remove fluid imports from MKLDNN code (#48981)
      
      * fix wrong handler name
      
      * mkldnn_engine -> onednn_engine
      
      * remove fluid/errors.h imports
      
      * remove fluid/enforce.h imports
      
      * remove note and unnecessary import
      
      * remove fluid/pretty_log.h imports
      
      * remove fluid/place.h imports
      
      * remove fluid/data_layout_transform.h imports
      
      * remove fluid/device_context.h imports
      
      * remove mkldnn_helper code
      
      * remove fluid/mkldnn_reuse.h imports
      
      * pretty_log import
      
      * replace cross_entropy in python/paddle/fluid/tests/unittests/*.py (#48975)
      
      * Fix the docs of paddle.amp.decorate and other APIs (#48983)
      
      * The APIs involved are:
      paddle.amp.decorate
      paddle.static.npu_places
      paddle.signal.istft
      paddle.signal.stft
      paddle.linalg.eigvalsh
      paddle.randint_like
      
      * change signal.stft
      
      * mark low of randint_like as optional
      
      * ; test=docs_preview
      
      * fix annotation format; test=docs_preview
      
      * fix formula format
      
      * update models and other parameters of decorate
      
      * test=document_fix
      Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
      
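      * Update part of the docs per online documentation requests 61~70 (#49014)
      
      * Update docstring:
      1. Remove "This OP" from the description of the cast function in python/paddle/tensor/manipulation.py;
      2. Reorder the parameter descriptions of the Print function in python/paddle/fluid/layers/control_flow.py and add optional descriptions;
      3. Add an optional description to the logical_and function in python/paddle/tensor/logic.py;
      4. Add optional descriptions to the from_generator and from_dataset functions of the DataLoader class in python/paddle/fluid/reader.py;
      5. In python/paddle/fluid/layers/nn.py, the param_attr of the crf_decoding function can in practice be treated as having a default value of None, so add an optional description;
      6. Fix the TeX syntax error in the description of the data_norm function in python/paddle/static/nn/common.py, and fix the same issue elsewhere in that file.
      
      * Revise some content based on review comments.
      
      * Drop the third-person singular form from predicate verbs.
      
      * Sync the Chinese documentation changes.
      
      * string-->str; test=document_fix
      Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>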
      
      * merge gpugraph to develop, fix gloo wrapper
      
      * merge gpugraph to develop, fix ci
      
      * merge gpugraph to develop, fix gloo wrapper
      
      * merge gpugraph to develop, fix ci
      
      * merge gpugraph to develop, fix fleet.py
      
      * merge gpugraph to develop, fix merge error
      
      * merge gpugraph to develop, fix merge error
      
      * merge gpugraph to develop, add python ut
      
      * merge gpugraph to develop, fix code style
      
      * merge gpugraph to develop, add c++ ut
      
      * merge gpugraph to develop, fix code style
      
      * merge gpugraph to develop, fix data_feed.h
      
      * merge gpugraph to develop, fix code style
      
      * merge gpugraph to develop, fix code style
      
      * merge gpugraph to develop, fix code style
      
      * merge gpugraph to develop, fix code style
      Co-authored-by: wuhuachaocoding <77733235+wuhuachaocoding@users.noreply.github.com>
      Co-authored-by: Nyakku Shigure <sigure.qaq@gmail.com>
      Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
      Co-authored-by: xiongkun <xiongkun03@baidu.com>
      Co-authored-by: ceci3 <ceci3@users.noreply.github.com>
      Co-authored-by: zhangyikun02 <48021248+zhangyk0314@users.noreply.github.com>
      Co-authored-by: Weilong Wu <veyron_wu@163.com>
      Co-authored-by: 201716010711 <87008376+201716010711@users.noreply.github.com>
      Co-authored-by: Qi Li <qili93@qq.com>
      Co-authored-by: zhoutianzi666 <39978853+zhoutianzi666@users.noreply.github.com>
      Co-authored-by: feng_shuai <fengshuai03@baidu.com>
      Co-authored-by: QingshuChen <chenqingshu@baidu.com>
      Co-authored-by: WangZhen <23097963+0x45f@users.noreply.github.com>
      Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com>
      Co-authored-by: 傅剑寒 <Xs1580802568@gmail.com>
      Co-authored-by: xiaoguoguo626807 <100397923+xiaoguoguo626807@users.noreply.github.com>
      Co-authored-by: wangzhen38 <41941775+wangzhen38@users.noreply.github.com>
      Co-authored-by: Zhou Wei <1183042833@qq.com>
      Co-authored-by: Kevin吴嘉文 <417333277@qq.com>
      Co-authored-by: Zman <35071129+Atlantisming@users.noreply.github.com>
      Co-authored-by: lzy <569782149@qq.com>
      Co-authored-by: Vvsmile <450864116@qq.com>
      Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
      Co-authored-by: RichardWooSJTU <37864677+RichardWooSJTU@users.noreply.github.com>
      Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
      Co-authored-by: hjyp <53164956+Tomoko-hjf@users.noreply.github.com>
      Co-authored-by: Vigi Zhang <VigiZhang@users.noreply.github.com>
      Co-authored-by: zqw_1997 <118182234+zhengqiwen1997@users.noreply.github.com>
      Co-authored-by: Yiqun Liu <Xreki@users.noreply.github.com>
      Co-authored-by: risemeup1 <62429225+risemeup1@users.noreply.github.com>
      Co-authored-by: Guanghua Yu <742925032@qq.com>
      Co-authored-by: 姜永久 <34344716+yjjiang11@users.noreply.github.com>
      Co-authored-by: wanghuancoder <wanghuan29@baidu.com>
      Co-authored-by: Roc <30228238+sljlp@users.noreply.github.com>
      Co-authored-by: Wilber <jiweibo@baidu.com>
      Co-authored-by: Ghost Screaming <mofengshenjieII@163.com>
      Co-authored-by: Netpunk <69072522+Patrick-Star125@users.noreply.github.com>
      Co-authored-by: 六个骨头 <46243324+zrr1999@users.noreply.github.com>
      Co-authored-by: Aurelius84 <zhangliujie@baidu.com>
      Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
      Co-authored-by: liu zhengxi <380185688@qq.com>
      Co-authored-by: heyanru <81976792+heyanru01@users.noreply.github.com>
      Co-authored-by: tianshuo78520a <707759223@qq.com>
      Co-authored-by: huangjiyi <43315610+huangjiyi@users.noreply.github.com>
      Co-authored-by: Infinity_lee <luhputu0815@gmail.com>
      Co-authored-by: houj04 <35131887+houj04@users.noreply.github.com>
      Co-authored-by: lugimzzz <63761690+lugimzzz@users.noreply.github.com>
      Co-authored-by: Wangzheee <634486483@qq.com>
      Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
      Co-authored-by: kangguangli <kangguangli@hotmail.com>
      Co-authored-by: jakpiase <jakpia21@gmail.com>
      Co-authored-by: HongyuJia <jiahongyu@baidu.com>
      Co-authored-by: haosicheng <47998305+HarperCy@users.noreply.github.com>
      Co-authored-by: Chang Xu <molixu7@gmail.com>
      Co-authored-by: Kai Song <50285351+USTCKAY@users.noreply.github.com>
      Co-authored-by: limingshu <61349199+JamesLim-sy@users.noreply.github.com>
      Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
      Co-authored-by: jiangcheng <thisjiang@qq.com>
      Co-authored-by: ccrrong <101700995+ccrrong@users.noreply.github.com>
      Co-authored-by: PuQing <me@puqing.work>
      Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
      Co-authored-by: cyber-pioneer <116002591+cyber-pioneer@users.noreply.github.com>
      Co-authored-by: niuliling123 <51102941+niuliling123@users.noreply.github.com>
      Co-authored-by: james <zhangxiaoci@baidu.com>
      Co-authored-by: wenbin <wang3323032@qq.com>
      Co-authored-by: ZZK <359521840@qq.com>
      Co-authored-by: Zhang Jun <ewalker@live.cn>
      Co-authored-by: yjphhw <43883055+yjphhw@users.noreply.github.com>
      Co-authored-by: Yuanle Liu <yuanlehome@163.com>
      Co-authored-by: Wen Sun <35923278+HermitSun@users.noreply.github.com>
      Co-authored-by: lzydev <1528794076@qq.com>
      Co-authored-by: Paulina Gacek <paulina.gacek@intel.com>
      Co-authored-by: feifei-111 <2364819892@qq.com>
      Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
      Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
      Co-authored-by: weishengying <63448337+weishengying@users.noreply.github.com>
      Co-authored-by: engineer1109 <jialiang.wang@xdxct.com>
      Co-authored-by: gouzil <66515297+gouzil@users.noreply.github.com>
      Co-authored-by: Ryan <44900829+DrRyanHuang@users.noreply.github.com>
      Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>
      Co-authored-by: JYChen <zoooo0820@qq.com>
      Co-authored-by: jjyaoao <88936287+jjyaoao@users.noreply.github.com>
      Co-authored-by: Hulek <jakub.hulek@intel.com>
      Co-authored-by: zhangkaihuo <zhangkaihuo@baidu.com>
      Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
      Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
      Co-authored-by: Tinson Lai <laitingsheng@hotmail.com>
      Co-authored-by: Ayuan <79981115+Ayuan2021@users.noreply.github.com>
      Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
      Co-authored-by: Ming-Xu Huang <mingh@nvidia.com>
      Co-authored-by: wangxiaoning <71813629+wangxn12138@users.noreply.github.com>
      Co-authored-by: Haohongxiang <86215757+haohongxiang@users.noreply.github.com>
      Co-authored-by: HydrogenSulfate <490868991@qq.com>
      Co-authored-by: mjxs <52824616+kk-2000@users.noreply.github.com>
      Co-authored-by: 学渣戊 <x19403@163.com>
  3. 14 Dec 2022: 1 commit
  4. 02 Nov 2022: 1 commit
  5. 01 Nov 2022: 1 commit
  6. 19 Oct 2022: 1 commit
  7. 14 Sep 2022: 1 commit
  8. 01 Aug 2022: 1 commit
    •
      support build with Ninja on Linux (#44210) · 1d79f1f7
      Allen Guo committed
      * support ninja
      
      * fix mkldnn on windows
      
      * fix mkldnn on windows up1
      
      * up2
      
      * up3
      
      * fix gflags
      
      * BUILD_BYPRODUCTS_OPTION -> BUILD_BYPRODUCTS_ARGS
      
      * use CMAKE_COMMAND
      
      * up x
  9. 15 Jul 2022: 1 commit
  10. 01 Jul 2022: 1 commit
  11. 29 Jun 2022: 1 commit
  12. 13 Jun 2022: 1 commit
  13. 09 Jun 2022: 1 commit
  14. 04 Jun 2022: 1 commit
  15. 09 May 2022: 1 commit
  16. 22 Apr 2022: 1 commit
    •
      Ssd sparse table (#41812) · cca57c4a
      zhaocaibei123 committed
      * [cherry-pick2.3]fix compile bug of windows cuda11.5 (#41464)
      
      cherry-pick
      
      fix compile bug of windows cuda11.5 #41433
      
      * fix bug of missing boost when compile cache.cc (#41449)
      
      [cherry-pick #41430] fix bug of random compile failure due to incorrect compile order of dependencies
      
      * Fix eager try catch (#41438) (#41477)
      
      [Cherry-Pick]Fix eager try catch (#41438)
      
      * Cherry-pick-PR41407, fix device_id bug for final_state op in multiprocess testcase (#41407) (#41475)
      
      Cherry-pick PR #41407
      
      * [BugFix] Add error hint for one_hot gpu version (#41335) (#41495)
      
      * add one_hot gpu hint
      
      * move allow_out_of_range judgement
      
      * delete useless unittest
      
      * fix bugs of reshape double grad infermeta (#41459) (#41493)
      
      * [cherrypick-2.3] modify infer gpu memory strategy (#41427), remove cudnn_deterministic=True (#41341)  (#41491)
      Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
      
      * [Cherry-pick][ROCm] fix dcu error in device event base, test=develop (#41523)
      
      Cherry-pick of #41521
      
      * [Cherry-Pick]Cherry pick PR41200, PR41474, PR41382 (#41509)
      
      * Use `self` as a parameter of _hash_with_id function to avoid error caused by hash_id reuse (#41200)
      
      * Add fill_constant_batch_size YAML and UT (#41474)
      
      * Switch some dy2st UT to eager mode (#41382)
      
      * Switch some dy2st UT to eager mode
      
      * Fix test_lstm and remove test_transformer
      
      * Run test_resnet_v2 in old dy mode
      
      * Unittest recover (#41431)
      
      * update name
      
      * update name
      
      * fix test
      
      * fix fleet bind
      
      * update name
      
      * update name
      
      * fix test
      
      * fix gpups wrapper
      
      * remove Push/Pull/Load/Save with context in client and wrapper base class
      
      * fix
      
      * fix
      
      * remove some interface
      
      * fix
      
      * remove
      
      * code style
      
      * recover
      
      * fix
      
      * remove code unused
      
      * remove some unused table & accessor & CommonDenseTable => MemoryDenseTable
      
      * fix
      
      * fix
      
      * fix
      
      * recover
      
      * remove unused code
      
      * recover unittest
      
      * fix
      
      * remove
      
      * fix
      
      * remove unused code
      
      * remove
      
      * fix
      
      * recover
      
      * remove
      Co-authored-by: esythan <esythan@126.com>
      
      * add ssd sparse table
      
      * fix
      
      * add cache shuffle
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * add unit test
      
      * fix
      Co-authored-by: Zhou Wei <1183042833@qq.com>
      Co-authored-by: Sing_chan <51314274+betterpig@users.noreply.github.com>
      Co-authored-by: 0x45f <23097963+0x45f@users.noreply.github.com>
      Co-authored-by: pangyoki <pangyoki@126.com>
      Co-authored-by: Siming Dai <908660116@qq.com>
      Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
      Co-authored-by: Zhang Jun <ewalker@live.cn>
      Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
      Co-authored-by: Qi Li <qili93@qq.com>
      Co-authored-by: esythan <esythan@126.com>
  17. 15 Apr 2022: 1 commit
    •
      solve brpc compile in arm-ubuntu18 (#41649) · 56dafc4f
      ziyoujiyi committed
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud comm ready
      
      * .
      
      * .
      
      * arm_brpc compile
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * only output is ok
      
      * base is ok
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * add switch server bin
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * adapt brpc ssl
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
  18. 10 Mar 2022: 1 commit
    •
      Inference add ONNXRuntime back-end (#39988) · 431afc39
      heliqi committed
      * add onnxruntime predictor
      
      * Add code comments
      
      * support link paddle2onnx onnxruntime
      
      * support onnxruntime with python
      
      * support onnxruntime with python
      
      * support onnxruntime with windows
      
      * paddle2onnx compile with windows
      
      * support windows compile
      
      * support windows compile with onnxruntime
      
      * support windows compile with paddle2onnx
      
      * support mac compile
      
      * compile with mac
      
      * compile with mac
      
      * add code comments
      
      * fix remind word
      
      * code optimization
      
      * add test case
      
      * add test case
      
      * add inference demo_ci test case
      
      * fix compile paddle2onnx with no python
      
      * add inference demo_ci test case
      
      * add inference demo_ci test case
      
      * add inference infer_ut test case
      
      * support c go api and test cases
      
      * add coverage test case
      
      * add coverage test case
      
      * add capi test case
      
      * add capi test case
  19. 27 Jan 2022: 1 commit
  20. 20 Dec 2021: 1 commit
  21. 15 Dec 2021: 1 commit
  22. 07 Dec 2021: 1 commit
    •
      introduce INF-RT (#37669) · 70dea138
      Yan Chunwei committed
      * add infrt code
      
      refined with Paddle's code style.
      
      * rename CinnRtConfig to InfRtConfig
      
      * rename CinnRt to InfRt of some code
      
      * rename CINNRT to INFRT
      
      * remove unnecessary code
      
      * replace CINN to INFRT in the source code
      
      * replace all "cinn" in code to "infrt"
      
      * remove some const_cast
  23. 03 Dec 2021: 1 commit
  24. 11 Nov 2021: 1 commit
  25. 09 Nov 2021: 1 commit
  26. 04 Nov 2021: 1 commit
  27. 24 Oct 2021: 1 commit
  28. 23 Oct 2021: 1 commit
    •
      New Paddle-CINN Compile PR (#36584) · ab732884
      Huihuang Zheng committed
      This PR adds some changes to match the CINN compilation changes. It also tries to fix JiangCheng's problem in PR: https://github.com/PaddlePaddle/Paddle/pull/36100
      
      These changes include:
      1. Set `CINN_GIT_TAG` to a newer tag
      2. CINN is now built with just `make cinnapi -j`
      3. We have to add `-DPY_VERSION=${PY_VERSION} -DWITH_TESTING=ON` to the CINN cmake args
      4. For CINN's third-party dependencies, we can just include headers without target_link_libraries
      5. Moved `cinn.cmake` from `paddle/cmake` to `paddle/cmake/external` to match the old style. The external folder contains `lite`, which is at the same level as `cinn`
      6. CINN added `-DNAMESPACE=cinn_gflags` in `gflags.cmake` so that CINN and Paddle use different gflags namespaces. This solves the redefinition problem.
      7. Change namespace of `::google::` in gflags to `::GFLAGS_NAMESPACE`
  29. 20 Oct 2021: 1 commit
    •
      Add FasterTokenizer Operator (#34491) · 3f2d6a3f
      Steffy-zxf committed
      Add Tokenizer-related functionality for Transformer models so that training and prediction follow a consistent process.
      
      * support the text string as an input Tensor
      * support the "VOCAB" unordered_map<wstring, int> as an input Tensor to look up tokens
      * Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization.
      * It first applies basic tokenization, followed by wordpiece tokenization.
  30. 11 Oct 2021: 1 commit
  31. 27 Sep 2021: 1 commit
  32. 22 Sep 2021: 1 commit
  33. 18 Sep 2021: 1 commit
    •
      Add FFT related operators and APIs (#35665) · 11518a43
      Feiyu Chan committed
      * 1. add interface for fft;
      2. add data type predicate;
      3. fix paddle.roll.
      
      * add fft c2c cufft kernel
      
      * implement argument checking & op calling parts for fft_c2c and fftn_c2c
      
      * add operator and opmaker definitions
      
      * only register float and double for cpu.
      
      * add common code for implementing FFT, add pocketfft as a dependency
      
      * add fft c2c cufft kernel function
      
      * fix bugs in python interface
      
      * add support for c2r, r2c operators, op makers, kernels and kernel functors.
      
      * test and fix bugs
      
      * 1. fft_c2c function: add support for onesided=False;
      2. add complex<float>, complex<double> support for concat and flip.
      
      * 1. fft: fix python api bugs;
      2. shape_op: add support for complex data types.
      
      * fft c2c cufft kernel done with compile and link
      
      * fix shape_op, add mkl placeholder
      
      * remove mkl
      
      * complete fft c2c in gpu
      
      * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft;
      2. change the design: add input and output typenames as template parameters for all FFTFunctors, update pocketfft-based implementation.
      
      * complete fft c2c on gpu in ND
      
      * complete fft c2c on gpu in ND
      
      * complete fft c2c backward in ND
      
      * fix MKL-based implementation
      
      * Add frame op and CPU/GPU kernels.
      
      * Add frame op forward unittest.
      
      * Add frame op forward unittest.
      
      * Remove axis parameter in FrameFunctor.
      
      * Add frame op grad CPU/GPU kernels and unittest.
      
      * Add frame op grad CPU/GPU kernels and unittest.
      
      * Update doc string.
      
      * Update after review and remove librosa requirement in unittest.
      
      * Update grad kernel.
      
      * add fft_c2r op
      
      * Remove data allocation in TransCompute function.
      
      * add fft r2c onesided with cpu(pocketfft/mkl) and gpu
      
      * last fft c2r functor
      
      * fix C2R and R2C for cufft, because the direction is not an option in these cases.
      
      * add fft r2c onesided with cpu(pocketfft/mkl) and gpu
      
      * fix bugs in python APIs
      
      * fix fft_c2r grad kernel
      
      * fix bugs in python APIs
      
      * add cuda fft c2r grad kernel functor
      
      * clean code
      
      * fix fft_c2r python API
      
      * fill fft r2c result with conjugate symmetry (#19)
      
      fill fft r2c result with conjugate symmetry
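      For a real input of length n, the spectrum satisfies X[n-k] = conj(X[k]), so an R2C transform only needs the n//2 + 1 non-redundant bins and the rest can be filled in from them. The sketch below demonstrates that property with a naive pure-Python DFT; the function names are hypothetical and this is an illustration of the identity, not Paddle's kernel.

```python
import cmath


def naive_dft(x):
    """O(n^2) DFT: X[k] = sum_t x[t] * exp(-2j*pi*k*t/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def onesided_rfft(x):
    """Keep only the n//2 + 1 non-redundant bins of a real signal's spectrum."""
    return naive_dft(x)[: len(x) // 2 + 1]


def fill_conjugate_symmetry(half, n):
    """Rebuild the full length-n spectrum from a onesided one via X[n-k] = conj(X[k])."""
    full = list(half) + [0j] * (n - len(half))
    for k in range(len(half), n):
        # n - k always falls inside the onesided part for k >= n//2 + 1.
        full[k] = full[n - k].conjugate()
    return full
```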
      
      * add placeholder for unittests (#24)
      
      * simply parameterize test functions by auto-generating test cases from a parameter list (#25)
      
      * miscellaneous fixes for python APIs (#26)
      
      * add placeholder for unittests
      
      * resize fft inputs before computation if n or s is provided.
      
      * add complex kernels for pad and pad_grad
      
      * simplify argument checking.
      
      * add type promotion
      
      * add int to float or complex promotion
      
      * fix output data type for static mode
      
      * fix fft's input dtype dispatch, import fft to paddle
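      Resizing an input to the requested transform length, as mentioned above, is assumed here to follow `numpy.fft` semantics for the `n` argument: a longer input is cropped and a shorter one is zero-padded at the end. A minimal one-dimensional sketch (the helper name is hypothetical; with `s`, the same rule would apply per transformed axis):

```python
def resize_for_fft(x, n):
    """Crop or zero-pad a 1-D sequence to exactly n points before transforming."""
    if n < 1:
        raise ValueError(f"invalid number of FFT points: {n}")
    if len(x) >= n:
        return list(x[:n])  # crop
    return list(x) + [0] * (n - len(x))  # zero-pad at the end
```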
      
      * fix typos in axes checking (#27)
      
      * fix typos in axes checking
      
      * fix argument checking (#28)
      
      * fix argument checking
      
      * Add C2R Python layer normal and abnormal use cases (#29)
      
      * documents and single case
      
      * test c2r case
      
      * New C2R Python layer normal and exception use cases
      
      * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30)
      
      * Documentation of the common interfaces of c2r and c2c (#31)
      
      * Documentation of the common interfaces of c2r and c2c
      
      * clean c++ code  (#32)
      
      * clean code
      
      * Add numpy-based implementation of spectral ops (#33)
      
      * add numpy reference implementation of spectral ops
      
      * Add fft_c2r numpy based implementation for unittest. (#34)
      
      * add fft_c2r numpy implementation
      
      * Add deframe op and stft/istft api. (#23)
      
      * Add frame api
      
      * Add deframe op and kernels.
      
      * Add stft and istft apis.
      
      * Add deframe api. Update stft and istft apis.
      
      * Fix bug in frame_from_librosa function when input dims >= 3
      
      * Rename deframe to overlap_add.
      
      * Update istft.
      
      * Update after code review.
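      Overlap-add (formerly "deframe") is the inverse of framing: each frame is placed at offset i * hop_length and overlapping samples are summed. The sketch below assumes a simple list-of-frames layout with no padding, which may differ from the frame op's actual tensor layout; the names are hypothetical. It also shows why an ISTFT must normalize by the summed window envelope when hop_length < frame_length.

```python
def frame(x, frame_length, hop_length):
    """Slice a 1-D signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    if frame_length > len(x):
        raise ValueError("frame_length must not exceed the signal length")
    num_frames = 1 + (len(x) - frame_length) // hop_length
    return [x[i * hop_length: i * hop_length + frame_length] for i in range(num_frames)]


def overlap_add(frames, hop_length):
    """Place frame i at offset i * hop_length and sum the overlapping samples."""
    frame_length = len(frames[0])
    out = [0.0] * ((len(frames) - 1) * hop_length + frame_length)
    for i, f in enumerate(frames):
        for j, v in enumerate(f):
            out[i * hop_length + j] += v
    return out
```

      With hop_length equal to frame_length the round trip reproduces the signal exactly; with overlapping frames, framing a signal of ones (length 8, frame_length 4, hop_length 2) and overlap-adding gives 1, 1, 2, 2, 2, 2, 1, 1, i.e. each sample is scaled by the number of frames covering it.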
      
      * Add overlap_add op and stft/istft api unittest (#35)
      
      * Add overlap_add op unittest.
      
      * Register complex kernels of squeeze/unsqueeze op.
      
      * Add stft/istft api unittest.
      
      * Add unittest for fft helper functions (#36)
      
      * add unittests for fft helper functions. add complex kernel for roll op.
      
      * complete static graph unittest for all public api (#37)
      
      * Unittest of op with FFT C2C, C2R and r2c added (#38)
      
      * documents and single case
      
      * test c2r case
      
      * New C2R Python layer normal and exception use cases
      
      * Documentation of the common interfaces of c2r and c2c
      
      * Unittest of op with FFT C2C, C2R and r2c added
      Co-authored-by: lijiaqi <lijiaqi0612@163.com>
      
      * add fft related options to CMakeLists.txt
      
      * fix typos and clean code (#39)
      
      * fix invisible character in mkl branch and fix error in error message
      
      * clean code: remove docstring from unittest for signal.py.
      
      * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40)
      
      * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
      
      * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41)
      
      1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype.
      2. promote floating point tensor to complex tensor for fft_c2c and fft_c2r;
      3. fix unittest to catch UnImplementedError and RuntimeError;
      4. fix compile error by avoiding thrust when cuda is not available.
      5. fix sample code, use paddle.fft instead of paddle.tensor.fft
      
      * remove inclusion of thrust, add __all__ list for fft (#42)
      
      * Add api doc and update unittest. (#43)
      
      * Add doc strings.
      * Update overlap_add op unittest
      
      * fix MKL-based FFT implementation (#44)
      
      * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R
      
      * remove code for debug (#45)
      
      * use dynload for cufft (#46)
      
      * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms.
      
      * add complex support for fill_zeros_like
      
      * use dynload for cufft
      
      * Update doc and unittest. (#47)
      
      * Add doc of frame op and overlap_add op.
      
      * Update unittest.
      
      * use dynload for cufft (#48)
      
      1. use dynload for cufft
      2. fix unittest;
      3. temporarily disable Rocm.
      
      * fix conflicts and merge upstream (#49)
      
      fix conflicts and merge upstream
      
      * fix compile error: only link dynload_cuda when cuda is available (#50)
      
      * fix compile error: only link dynload_cuda when cuda is available
      
      * fix dynload for cufft on windows (#51)
      
      1. fix dynload for cufft on windows;
      2. fix unittests.
      
      * add NOMINMAX to compile on windows (#52)
      
      add NOMINMAX to compile on windows
      
      * explicitly specify capture mode for lambdas (#55)
      
      explicitly specify capture mode for lambdas
      
      * fix fft sample (#53)
      
      * fix fft sample
      
      * update scipy and numpy version for unittests of fft (#56)
      
      update scipy and numpy version for unittests of fft
      
      * Add static graph unittests of frame and overlap_add api. (#57)
      
      * Remove cache of cuFFT & Disable ONEMKL (#59)
      
      1. replace numpy.fft with scipy.fft because numpy<1.20 does not support the ortho norm;
      2. remove cache of cufft plans;
      3. enhance error checking.
      4. default WITH_ONEMKL to OFF
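      The `norm` argument that motivated switching the unittest reference to scipy.fft controls where the 1/n factor goes. The sketch below assumes the three modes accepted by recent versions of `scipy.fft` and `numpy.fft` ("backward", "ortho", "forward") and uses naive pure-Python transforms for illustration only; the function names are hypothetical.

```python
import cmath
import math


def dft(x, norm="backward"):
    """Naive forward DFT with scipy/numpy-style `norm` scaling."""
    n = len(x)
    scale = {"backward": 1.0, "ortho": 1.0 / math.sqrt(n), "forward": 1.0 / n}[norm]
    return [scale * sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X, norm="backward"):
    """Naive inverse DFT; its scale complements the forward one so the round trip is exact."""
    n = len(X)
    scale = {"backward": 1.0 / n, "ortho": 1.0 / math.sqrt(n), "forward": 1.0}[norm]
    return [scale * sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n))
            for t in range(n)]
```

      In every mode the forward and inverse scales multiply to 1/n; "ortho" splits it symmetrically, which makes the transform unitary (energy-preserving).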
      Co-authored-by: jeff41404 <jeff41404@gmail.com>
      Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
      Co-authored-by: KP <109694228@qq.com>
      Co-authored-by: lijiaqi <lijiaqi0612@163.com>
      Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
      Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>
  34. 24 Jun, 2021 1 commit
  35. 09 Jun, 2021 1 commit
  36. 01 Jun, 2021 1 commit
  37. 27 May, 2021 2 commits
  38. 10 May, 2021 1 commit
  39. 21 Apr, 2021 1 commit
      【NPU】Merge NPU ccl code (#32381) · c3158527
      zhang wenhui committed
      * add allreduce and broadcast without test (#31024)
      
      add allreduce and broadcast without test
      
      * Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      Refactor HCCLCommContext to be compatible with Paddle (#31359)
      
      * [NPU] add npu kernel for communication op (#31437)
      
      * add allreduce and broadcast without test
      
      * add c_broadcast_test case
      
      * build c_comm_init and c_create_group operators
      
      * make the whole thing compile
      
      * add broadcast and init op test case but run failed
      
      * make unit test compile
      
      * fix broadcast test bug and change into hcom for ccl
      
      * change c_comm_init and c_create_group ops accordingly
      
      * make tests compile
      
      * transfer code to 27
      
      * compiled successfully in 28, but run failed
      
      * test broadcast in 28, but failed
      
      * make hcom primitives work
      
      * change hccl data type for base.h
      
      * fix broadcast bug
      
      * make attributes work
      
      * fix group name bug
      
      * add allreduce but test failed
      
      * allreduce bug for qiuliang
      
      * allreduce finished
      
      * add allgather and reducescatter
      
      * merge all op code
      
      * add allgather test
      
      * finish running all ccl op tests, excluding send/recv
      
      * add all ops and tests, excluding send/recv
      
      * send_v2_npu.cc recv_v2_npu.cc compiled
      
      * fix ccl core dump bug and test allgather, reducescatter, broadcast op
      
      * fix allreduce bug just for test
      
      * hcom send&recv test pass, without hcom_destroy
      
      * for qiuliang test
      
      * Ascend Send&Recv Test Pass
      
      * all op (ex send/recv) ok
      
      * fix bug
      
      * merge all ccl op
      
      * style merge to PaddlePaddle
      
      * merge style
      
      * new merge style
      
      * merge style 2
      
      * insert an empty line at the end
      
      * disable ctest for hcom to pass ci
      Co-authored-by: void-main <voidmain1313113@gmail.com>
      Co-authored-by: f2hkop <f2huestc@outlook.com>
      
      * Add auto-increasing tag id for Hcom OPs (#31702)
      
      * add c_reduce_sum op (#31793)
      
      add c_reduce_sum op
      
      * update Ascendrc hccl to 20.3 (#32126)
      
      update Ascendrc hccl to 20.3 (#32126)
      
      * fix merge code
      
      * change cmake.txt1
      
      * [NPU] Support npu kernel for c sync stream op (#31386)
      
      * sync stream npu op
      
      * add with_ascend_acl
      
      * update c++ unittest
      
      * compile all failed
      
      * try to pre commit
      
      * after pre commit
      
      * merge&compile&test hccl successfully!
      
      * fix code style
      
      * fix code style
      
      * fix bugs about hccl
      
      * fix some bugs
      
      * fix code style
      
      * fix style
      
      * fix style
      
      * fix
      
      * fixed
      
      * merge develop
      Co-authored-by: lw921014 <liuwei921014@yeah.net>
      Co-authored-by: Void Main <voidmain1313113@gmail.com>
      Co-authored-by: f2hkop <f2huestc@outlook.com>
      Co-authored-by: xiayanming <41795079@qq.com>