1. 15 Apr 2022, 25 commits
    • solve brpc compile in arm-ubuntu18 (#41649) · 56dafc4f
      ziyoujiyi committed
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud comm ready
      
      * .
      
      * .
      
      * arm_brpc compile
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * only output is ok
      
      * base is ok
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * add switch server bin
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * adapt brpc ssl
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
    • gpu_graph engine optimization+ (#41455) · ce72690c
      seemingwang committed
      * extract sub-graph
      
      * graph-engine merging
      
      * fix
      
      * fix
      
      * fix heter-ps config
      
      * test performance
      
      * test performance
      
      * test performance
      
      * test
      
      * test
      
      * update bfs
      
      * change cmake
      
      * test
      
      * test gpu speed
      
      * gpu_graph_engine optimization
      
      * add ssd layer to graph_engine
      
      * fix allocation
      
      * fix syntax error
      
      * fix syntax error
      
      * fix pscore class
      
      * fix
      
      * recover test
      
      * recover test
      
      * fix spelling
      
      * recover
      
      * fix
    • Moe ref (#41836) · c37af19c
      Roc committed
      * moe ref
      
      * ref commit; test=document_fix
      
      * update; test=document_fix
      
      * update test=document_fix
    • e25b75b6
    • [Yaml]add adamw yaml (#41678) · ea0a164b
      chentianyu03 committed
      * add adamw yaml
      
      * fix test case error
      
      * make the name of weight and bias in linear1 and linear2 to be constant
    • [Phi]Reduce kernels into multiple files (#41747) · 1927aff9
      chentianyu03 committed
      * split reduce_kernel
      
      * rm reduce_kernel in cmake
      
      * split reduce_grad kernels
      
      * fix cmake build error
      
      * format code
      
      * fix standalone_executor_test error
    • [DoubleGrad] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode (#41730) · 27f28e82
      Zhanlue Yang committed
      * [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad
      
      * Fixed elementwise issue
      
      * Addressed CI failures
      
      * [DoubleGrad] Enabled test_imperative_triple_grad test cases under eager_mode
      
      * [DoubleGrad] Enabled test_autograd_functional_dynamic.py under eager mode
      
      * Enabled more test cases
      
      * [DoubleGrad] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode
      
      * Adjusted test_imperative_star_gan_with_gradient_penalty.py
    • [Dygraph] Refactor Model Parallel in eager mode (#41761) · e6fb6599
      Haohongxiang committed
      * refactor mp in eager mode
      
      * update
      
      * update
      
      * add uts
    • ff818c77
    • update (#41762) · 482e5b6c
      lilong12 committed
    • [GPUPS]add afsclient and gpupsutil (#41324) · 30a1213b
      danleifeng committed
      * add gpupsutil and afsclient; test=develop
    • [MLU] add mlu softmax kernel (#41816) · 2d6b71a2
      fwenguang committed
    • Add eager string tensor (#41039) · a22b68b8
      Jack Zhou committed
      * Add core.eager.StringTensor __init__ to which pyarray args can be passed
      
      * Add the numpy method of core.eager.StringTensor
      
      * revert tensor.to_string modification
      
      * Add ToPyObject for core.eager.StringTensor
      
      * Add debug string for core.eager.StringTensor
      
      * Remove place args of core.eager.StringTensor temporarily
      
      * Fix check string_tensor error
      
      * remove dtype of core.eager.StringTensor
      
      * add core.eager.StringTensor unittest
      
      * remove pstring from VarDesc
      
      * Add InitStringTensorWithStringTensor
      
      * Remove to_string modification
      
      * Remove zero_copy arg from StringTensor creator
    • [XPUPS]fix hashtable_kernel.kps (#41790) · ef6ff4ef
      zmxdream committed
      * refactor heter comm kernel
      
      * update. test=develop
      
      * update calc_shard_offset. test=develop
      
      * update xpu kernel. test=develop
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * add optimizer kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix: kunlun does not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update hashtable. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * template init. test=develop
      
      * hashtable template init. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix hashtable_kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      Co-authored-by: WorgenZhang <frank08081993@gmail.com>
    • [IPU] add mixed-precision support for ipu (#41733) · d7224482
      Allen Guo committed
      * add mixed-precision support for ipu
      
      * restore cast_model_to_fp16 api
      
      * update UTs
    • polish tensor deprecated method warning (#41807) · e83e44c7
      Chen Weihang committed
    • Add API: Sparse Convolution3D (#41434) · 1665594d
      zhangkaihuo committed
    • support no_need_buffer in eager_fluid state (#41720) · 840d2eb6
      pangyoki committed
      * support no_need_buffer in eager_fluid state
      
      * change no_need_buffer info from fwd_info to bwd_info
      
      * fix CI failure: gru_unit does not use no_need_buffer
      
      * fix conflict between no_need_buffer and dispensable
      
      * use tensor.define in dispensable
      
      * solve conflict
      
      * solve conflict
    • A
    • Z
    • Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda
      limingshu committed
      * change cudnn helper for auto-tune
      
      * Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.
      
      * Fix the bug in calculating and printing current step cache hit rate.
      
      * Improve the autotune cache and fix unittest.
      
      * Change the key from AlgorithmType to int64_t.
      
      * Fix unittest for cpu-only env.
      
      * change ChooseAlgoByWorkspace for heuristic mode
      Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
    • [MLU] add mlu activation kernels (#41751) · 10114859
      fwenguang committed
    • [MLU] add mlu new profiler (#41138) · fc208b7e
      fwenguang committed
      * [MLU] add mlu new profiler
      
      * fix format
    • [Auto Parallel]update cluster (#41722) · 605552a9
      caozhou committed
      * update cluster
    • fix batch norm memory issue (#41717) · 42abcc08
      hong committed
      * try to fix batch norm memory issue
      
      * fix batch norm memory alloc bug
      
      * polish some code
  2. 14 Apr 2022, 15 commits