1. 31 Mar, 2022 17 commits
    • [new-exec] fit mkldnn op (#41058) · 02cf6764
      Leo Chen committed
      * fix bug that some op has no op_role attr
      
      * add mkldnn support for new executor
      
      * fit for mkldnn data_transfer
      
      * fit for mkldnn data_transfer
    • Using DistConfig in Paddle Inference (#41128) · dc0702fe
      TeslaZhao committed
      * Pass compat of conv_transpose_bias_mkldnn_fuse_pass
      
      * Fix a bug of strided_slice op, about the axes parameter access memory out of bounds
      
      * Fix a bug of strided_slice op, about the axes parameter access memory out of bounds
      
      * Fix a bug of transpose op, about accessing memory out of bounds of the perm param
      
      * op:transpose_op supports bool type
      
      * op:transpose_op supports bool type
      
      * Keep strided_slice op behavior consistent with slice op when starts input is less than -rank
      
      * Using DistConfig in inference
    • Maintain old profiler (#41132) · a6bf2218
      chenjian committed
      * no
      
      * maintain old profiler
      
      * exclude new python record events for old profiler
      
      * maintain old profiler
      
      * maintain
      
      * maintain old profiler
      
      * maintain
      
      * fix cmakes
    • add flatten2,reshape2,squueze2_trt_fuse_pass test cast (#41031) · 7ef69202
      heliqi committed
      * add flatten2,reshape2,squueze2_trt_fuse_pass  test cast
      
      * add flatten2,reshape2,squueze2_trt_fuse_pass  test cast
      
      * add flatten2,reshape2,squueze2_trt_fuse_pass  test cast
    • [FleetExecutor] Add source interceptor and test (#41122) · 4974fdfd
      LiYuRio committed
    • Fix test_run_program_op.py (#41141) · 7c555f4e
      0x45f committed
    • [phi] move yolov3_loss to phi (#40944) · fb93bd5c
      wuyefeilin committed
      * mv yolov3_loss op to phi
      
      * fix as review
      
      * update operator.h
    • fix load bug and add distributed strategy from pslib (#40883) · 47383dca
      wangguanqun committed
      * fix load bug and add distributed strategy from pslib
      
      * add unittest
      
      * use cvm config
      
      * trainer and worker config
      
      * add unittest
      
      * add unittest
      
      * add test
      
      * code style
    • add depend when doing fuse_all_optimizer on program (#41178) · 3b00dc92
      Leo Chen committed
      * fix dependency of fused optimizer
      
      * add ut
    • Add time range duration display (#41029) · 6744754f
      chenjian committed
      * no
      
      * fix bugs
      
      * fix doc according to review
      
      * fix api doc format
      
      * fix api doc according to review
      
      * fix bug and add unit test
      
      * fix record event bug
      
      * optimize chrome tracing display
      
      * fix bug
      
      * add comment
      
      * add unit test
      
      * fix a bug
      
      * fix
      
      * fix
      
      * fix format
    • [KP] fix bug in phi kp (#41069) · ac5548a2
      Liu-xiandong committed
      * [KP] fix bug in phi kp
      
      * delete useless comment
      
      * update
      
      * update
      
      * choose the xpu kp kernel in phi
    • Restrict compilation conditions of optimized topk kernel (#41153) · dea24544
      Zhang Zheng committed
      * Restrict compilation conditions of optimized topk kernel
      
      * fix
    • remove shape check (#41143) · 4b9e748a
      wenbin committed
    • support view strategy in eager_fluid state (#40830) · 2f1c1ae5
      pangyoki committed
      * support view strategy in eager_fluid state
      
      * little change
      
      * little change
      
      * optimize unittest
      
      * fix
    • fix eager_gen node bug (#41165) · 56493c9e
      pangyoki committed
    • Support inplace strategy for pylayer (#41043) · 11d1a51a
      pangyoki committed
      * Supported Complex2Real Conversion for Eager Dygraph
      
      * Supported Complex2Real Conversion for Eager Dygraph
      
      * Enabled complex type promotion test for matmul_v2
      
      * pylayer, test=develop
      
      * Fix CI issues
      
      * Support initializing specific grad tensors to zero for selected operators
      
      * finish forward, test=develop
      
      * create grad node finish, test=develop
      
      * Merged adj_edges_ with GradSlotMeta
      
      * Fixed minor issue
      
      * backward finish, start dbg, test=develop
      
      * Adjusted num runs
      
      * Recovered Eager performance tests configurations
      
      * Recovered Eager performance tests configurations
      
      * finish, test=develop
      
      * polish, test=develop
      
      * polish, test=develop
      
      * refine, test=develop
      
      * eager, test=develop
      
      * Adjusted performance tests configurations
      
      * Fixed Minor Issues with performance tests
      
      * [Phi] Fix macro name typo
      
      * support set_materialize_grads, test=develop
      
      * support mark_non_differentiable, test=develop
      
      * support once_differentiable, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * Moved out Edge from GradSlotMeta
      
      * Fixed issues from merge
      
      * Fixed typo
      
      * Addressed review comments
      
      * Fixed merge issues
      
      * Fixed minor issues
      
      * Fixed minor issue
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * Fixed major issues and enabled auto_prune test cases
      
      * Fixed issues from merge
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * support inplace for pylayer
      Co-authored-by: Njim19930609 <jim19930609@gmail.com>
      Co-authored-by: NWang Huan <wanghuan29@baidu.com>
      Co-authored-by: NAurelius84 <zhangliujie@baidu.com>
      11d1a51a
    • Pg heter cloud (#40911) · 92faeedf
      lilong12 committed
  2. 30 Mar, 2022 23 commits
    • Fix test_jit_save_load (#41114) · 4b61918d
      0x45f committed
    • [Phi] Move Rnn Op from fluid to phi (#41007) · 66cf8b08
      zyfncg committed
      * move rnn kernel to phi
      
      * move infershape of rnn to phi
      
      * fix HIP bug
      
      * rename function
      
      * fix HIP bug
      
      * fix hip bug
    • [MoE] Moe apis (#41092) · aac7879a
      Roc committed
      * add random routing op
      
      add _random_routing api in utils
      
      add random routing ut
      
      * # This is a combination of 10 commits.
      # The first commit's message is:
      add expert count op
      
      add ut for expert_count
      
      # This is the 2nd commit message:
      
      update UT only for cuda
      
      # This is the 3rd commit message:
      
      fix for rocm
      
      # This is the 4th commit message:
      
      update ut
      
      # This is the 5th commit message:
      
      add moe module
      
      # This is the 6th commit message:
      
      add expert count op
      
      add ut for expert_count
      
      # This is the 7th commit message:
      
      update UT only for cuda
      
      # This is the 8th commit message:
      
      update ut
      
      # This is the 9th commit message:
      
      add moe module
      
      # This is the 10th commit message:
      
      make expert count private
      
      * add assign pos op
      
      * fix upper num name
      
      * add api _assign pos
      
      * add ut for assign pos op
      
      * update date
      
      * add op about moe gate
      
      update utils
      
      add limit by capacity op
      
      add ut for limit_by_capacity
      
      add ut for prune_gate_by_capacity
      
      add ut for limit_by_capacity
      
      add ut for prune_gate_by_capacity
      
      * fix for win
      
      * fix bugs in test_limit_by_capacity_op
      
      * update ut
      
      * update for test (timeout)
      
      * fix ut
      
      * update
      
      * update(fix) ut for win
      
      * moe apis in incubate
      
      * # This is a combination of 10 commits.
      # The first commit's message is:
      add expert count op
      
      add ut for expert_count
      
      # This is the 2nd commit message:
      
      update UT only for cuda
      
      # This is the 3rd commit message:
      
      fix for rocm
      
      # This is the 4th commit message:
      
      update ut
      
      # This is the 5th commit message:
      
      add moe module
      
      # This is the 6th commit message:
      
      add expert count op
      
      add ut for expert_count
      
      # This is the 7th commit message:
      
      update UT only for cuda
      
      # This is the 8th commit message:
      
      update ut
      
      # This is the 9th commit message:
      
      add moe module
      
      # This is the 10th commit message:
      
      make expert count private
      
      * add assign pos op
      
      * fix upper num name
      
      * add api _assign pos
      
      * add ut for assign pos op
      
      * update date
      
      * fix for win
      
      * update for test (timeout)
      
      * fix ut
      
      * update
      
      * fix ut for number count
      
      * add apis and utils
      
      * add gate apis
      
      * add moe and grad clip apis
      
      * update moe apis
      
      * add ops for moe gate
      
      * fix
      
      * update for base moe layer api
      
      * add random routing op
      
      add _random_routing api in utils
      
      add random routing ut
      
      * fix for dygraph
      
      * update with random routing
      
      * update
      
      * fix ut for limit by capacity
      
      * update
      
      * update limit by capacity to make switching to single-thread mode easy
      
      * update api docs
      Co-authored-by: hlygit66666 <2570058140@qq.com>
    • Revert "Revert "[Phi] Move elementwise_floordiv and elementwise_pow to phi... · eef46770
      Chen Weihang committed
      Revert "Revert "[Phi] Move elementwise_floordiv and elementwise_pow to phi (#40993)" (#41065)" (#41110)
      
      This reverts commit 3a6f1135.
    • Add new APIs for GPU memory monitoring (max_memory_allocated,... · afe02e9d
      From00 committed
      Add new APIs for GPU memory monitoring (max_memory_allocated, max_memory_reserved, memory_allocated, memory_reserved) (#38657)
      
      * Add new API memory_reserved
      
      * Add memory_allocated, max_memory_reserved and max_memory_allocated
      
      * Fix CI error
      
      * Fix CI error
      
      * Enhance UT
      
      * Add FLAGS_memory_stats_opt
      
      * Add STATS macro functions
      
      * Add StatAllocator
      
      * Fix CI errors
      
      * Add UT
      
      * Fix CI errors
    • Revert "Revert "[Phi] trans logsumexp op (#40790)" (#41068)" (#41109) · ee8eeb45
      Chen Weihang committed
      This reverts commit 054fc997.
    • Revert "Revert "Move some activation to phi (#40727)" (#41056)" (#41095) · 91bb52cd
      hong committed
      This reverts commit 05f3d48e.
    • [DoubleGrad PR #3] Supported higher-order GradNode generation (#41051) · abd2df4c
      Zhanlue Yang committed
      * [Refactor] refactored eager_gen.py PR #2
      
      * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes
      
      * Fixed minor issue
      
      * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition
      
      * Fixed issues
      
      * Supported higher-order grad node generation
      
      * [DoubleGrad PR #4] Supported higher-order GradNode generation
      
      * Fixed yaml typo
    • add _reset_grad_inplace_version (#41101) · cb8afc24
      pangyoki committed
    • [Yaml] Fix topk yaml compilation problem on Windows (#41082) · 95265d5c
      Aurelius84 committed
      * [Yaml] Fix topk yaml compilation on Windows
      
      * fix make_shared
      
      * fix conflict
    • add bilinear interpolate v2 to xpu list and unitteset, *test=kunlun (#41037) · 4e86dff2
      ykkk2333 committed
      * add bilinear interpolate v2 to xpu list and unitteset, *test=kunlun
      
      * Delete ps_usr_print_log
      
      * Delete ps_usr_print_log
      
      * Delete xpu_op_test
    • Apply TransposeFolding & GemmRewriter passes. (#41084) · c761b48b
      Zhen Wang committed
    • [Eager] dlpack (#40811) · 4d300224
      wanghuancoder committed
      * dlpack eager, test=develop
      
      * eager test_base_layer, test=develop
      
      * fix error report, test=develop
      
      * eager _getitem_from_offset, test=develop
      
      * refine, test=develop
      
      * refine offset, test=develop
      
      * add test_inner test_outer, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
    • move elementwise_mul selected rows input (#41042) · 13f1641d
      YuanRisheng committed
    • Optimize the perf of top_k when k is too large (#40941) · 45078d9f
      Zhang Zheng committed
      * Optimize the perf of top_k when k is too large
      
      * fix rocm compile
      
      * fix
      
      * only compile in cuda
      
      * fix log info
    • swish and pow op for xpu test=kunlun (#40654) · d951f3af
      houj04 committed
      * swish and pow op for xpu. test=kunlun
      
      * fix code style. test=kunlun.
      
      * use pow_grad xdnn api. test=kunlun.
    • Optimize the onnxruntime code (#41044) · f12b5260
      heliqi committed
    • suppor inplace in tensor_method_setitem (#40915) · 7170c687
      pangyoki committed
      * suppor inplace in tensor_method_setitem
      
      * delete bump_inplace_version
      
      * optimize inplace unittest
      
      * fix
      
      * fix setitem bug
      
      * update eager_generator
      
      * optimize inplace unittest
      
      * little change
    • Refactor code auto-gene for no_need_buffer (#41025) · 97cd0f51
      zyfncg committed
      * refactor code auto-gene for no_need_buffer
      
      * fix some bug
      
      * delete test code
    • [Phi]fix pad3d infermeta bug (#41020) · 9219495c
      chentianyu03 committed
      * fix pad3d infermeta bug
      
      * add check for construct ScalarArray
    • change to new api in ssync mode (#41022) · 2089b485
      yaoxuefeng committed
      * change to new api in ssync mode
      
      * fix
      
      * fix
      
      * fix
      
      * fix
    • support view strategy in dygraph eager_final state (#40891) · 495ca4aa
      pangyoki committed
      * support view strategy in eager_final state
      
      * perfect reshape kernel
      
      * fix bugs of sig
      
      * add unittest for reshape_sig
      
      * fix bugs when running coverage
      
      * fix inplace bug in final_state eager_gen
      
      * fix python_c_gen
      
      * support view strategy for final state
      
      * fix order of out and xshape in reshape
      
      * fix Coverage_CI unittest timeout error
      
      * support reshape view
      
      * fix reshape_sig
      
      * fix yml and api_base
      Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
    • fix double grad var judging (#41072) · 775ddb5a
      Chen Weihang committed