1. 15 Apr, 2022: 5 commits
    • gpu_graph engine optimization+ (#41455) · ce72690c
      seemingwang committed
      * extract sub-graph
      
      * graph-engine merging
      
      * fix
      
      * fix
      
      * fix heter-ps config
      
      * test performance
      
      * test performance
      
      * test performance
      
      * test
      
      * test
      
      * update bfs
      
      * change cmake
      
      * test
      
      * test gpu speed
      
      * gpu_graph_engine optimization
      
      * add ssd layer to graph_engine
      
      * fix allocation
      
      * fix syntax error
      
      * fix syntax error
      
      * fix pscore class
      
      * fix
      
      * recover test
      
      * recover test
      
      * fix spelling
      
      * recover
      
      * fix
    • [Phi]Reduce kernels into multiple files (#41747) · 1927aff9
      chentianyu03 committed
      * split reduce_kernel
      
      * rm reduce_kernel in cmake
      
      * split reduce_grad kernels
      
      * fix cmake build error
      
      * format code
      
      * fix standalone_executor_test error
    • 【GPUPS】add afsclient and gpupsutil (#41324) · 30a1213b
      danleifeng committed
      * add gpupsutil and afsclient; test=develop
    • [XPUPS]fix hashtable_kernel.kps (#41790) · ef6ff4ef
      zmxdream committed
      * refactor heter comm kernel
      
      * update. test=develop
      
      * update calc_shard_offset. test=develop
      
      * update xpu kernel. test=develop
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * add optimizer kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update hashtable. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * template init. test=develop
      
      * hashtable template init. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix hashtable_kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      Co-authored-by: WorgenZhang <frank08081993@gmail.com>
    • [IPU] add mixed-precision support for ipu (#41733) · d7224482
      Allen Guo committed
      * add mixed-precision support for ipu
      
      * restore cast_model_to_fp16 api
      
      * update UTs
  2. 14 Apr, 2022: 6 commits
    • fbe2c311
    • executor perf statistics (#41648) · cbe7466f
      liutiexing committed
      * executor perf statistics
      
      * fix ut
      
      * fix ut
      
      * fix ut
      
      * add ut
      
      * add ut
    • Fix to #38693 (minimal UT) (#41026) · d0f3296b
      Jacek Czaja committed
      * Add UT
      
      - Added missed data_layout
      
      - Added missing conversions
      
      - NDHWC added
      
      - NDHWC support in data_transform
      
      - another fix
      
      - candidate change
      
      - fix
      
      - fix
      
      - fix
      
      - fix
      
      - fix
      
      - fix
      
      - fix to hack
      
      - compilation fix
      
      - fix to automatic merge
      
      * - reduced UT
      
      * - fix
      
      * - lint
      
      * - fix to lint
    • FC+elementwise_add (residual connection) (#41776) · 92d8d0bc
      Sławomir Siwek committed
      * Change tensor name to match activation
      
      * declare fc_eltwise_add pass
      
      * merge conv_eltwise refactor PR
      
      * first compilable draft
      
      * unittest feedback tools
      
      * Fuse pass tester
      
      * Move IsReachable() to shared file
      
      * 100% coverage of fuse_pass_tester.cc
      
      * register pass
      
      * Add bias node
      
      * Improve unit tests / remove bias node from pattern
      
      * improve fc_eltwiseadd_unittest
      
      * cancel eltwise_add fuse if act is already fused
      
      * Add elementwise_input scale
      
      * Residual MVP
      
      * Add new FC attrs
      
      * Add more test cases
      
      * Add missing op attrs
      
      * Adapt code to new Elementwise pattern
      
      * reuse existing fcpattern
      
      * improve code style
      
      * remove unused arguments
      
      * fix typo
      
      * remove whitespace
      
      * remove int8 related code
      
      * Remove attributes from base ops
      
      * style
      
      * style check
      
      * Remove input from base op
      
      * Set attribute during fuse
      
      * ut timeout
      
      * download and test model
      
      * DRY
      
      * apply feedback from review
      
      * Style check
      
      * fix typo
      
      * cosmetic changes
      
      * explicitly set residual as output
      
      * VIT-OCR accuracy check
      
      * trigger CI
      
      * remove whitespaces
      
      * fix missing data file
    • add mkldnn int8 pass [step3] (#41599) · 8e2d4d30
      baoachun committed
      * add mkldnn int8 pass [step3]
      
      * Add test for compute_propagate_scales_mkldnn_pass
      
      * update pass
      
      * update api comment and python api
      Co-authored-by: wozna <joanna.wozna@intel.com>
    • Added shuffle_channel BF16/FP32 FWD oneDNN kernel (#39756) · c7623d72
      jakpiase committed
      * added shuffle_channel bf16/fp32 fwd kernel
      
      * added missing files
      
      * CI fix
      
      * changed from pten to phi
      
      * tmp save
      
      * added reviewers suggestions
      
      * fix for test
  3. 13 Apr, 2022: 5 commits
  4. 12 Apr, 2022: 3 commits
  5. 10 Apr, 2022: 3 commits
  6. 09 Apr, 2022: 3 commits
    • Unittest recover (#41431) · 7a07c4a5
      zhaocaibei123 committed
      * update name
      
      * update name
      
      * fix test
      
      * fix fleet bind
      
      * update name
      
      * update name
      
      * fix test
      
      * fix gpups wrapper
      
      * remove Push/Pull/Load/Save with context in client and wrapper base class
      
      * fix
      
      * fix
      
      * remove some interface
      
      * fix
      
      * remove
      
      * code style
      
      * recover
      
      * fix
      
      * remove unused code
      
      * remove some unused table & accessor & CommonDenseTable => MemoryDenseTable
      
      * fix
      
      * fix
      
      * fix
      
      * recover
      
      * remove unused code
      
      * recover unittest
      
      * fix
      
      * remove
      
      * fix
      
      * remove useless code
      
      * remove
      
      * fix
      
      * recover
      
      * remove
      Co-authored-by: esythan <esythan@126.com>
    • Autotune the workspace_size_limit in conv. (#40338) · b937cdc5
      limingshu committed
      * Use the maximum workspace_size across all algorithms to limit the workspace size in exhaustive search mode.
      
      * Use the system cudaMalloc and cudaFree to allocate workspace during searching.
      
      * Enable switching between the two kinds of workspace-setting methods.
      Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
    • [new-exec] fix bug that no thread is woken up when adding task to threadpool (#41567) · f581f5bf
      Leo Chen committed
      * fix bug that no thread is woken up when adding task to threadpool
      
      * fix typo
  7. 07 Apr, 2022: 3 commits
  8. 06 Apr, 2022: 1 commit
  9. 05 Apr, 2022: 2 commits
  10. 04 Apr, 2022: 2 commits
  11. 03 Apr, 2022: 1 commit
    • [Phi]Concat grad (#41112) · 3f57ef7a
      chentianyu03 committed
      * add concat_grad kernel
      
      * fix error
      
      * remove comment code
      
      * fix outs nullptr error
      
      * change to phi header
      
      * add concat_grad declare for standalone_executor_test
  12. 02 Apr, 2022: 5 commits
  13. 01 Apr, 2022: 1 commit