1. 17 4月, 2022 2 次提交
    • C
      [Perf] Optimize dygraph scheduling performance (#41696) · 7ee31a96
      Chen Weihang 提交于
      * split phi and fluid infermeta context
      
      * resolve conflict
      
      * fix type error
      
      * optimize scheduling perf
      
      * spec small vector size
      
      * replace all grad var name
      
      * fix test failed
      
      * move init defalut signature
      
      * polish details
      
      * polish details
      
      * fix no init bug
      
      * init sig for tests
      
      * add init sig for infer
      
      * fix infrt error
      
      * fix infrt failed
      
      * fix kunlun error
      
      * fix infrt failed
      7ee31a96
    • C
      [CustomOp] Fix PlaceType related compat error (#41826) · b5d9c31c
      Chen Weihang 提交于
      * fix place type related compat error
      
      * fix test failed
      
      * remove dll decl
      
      * revert place type change
      
      * add dll decl
      b5d9c31c
  2. 16 4月, 2022 5 次提交
    • move fc_functor from fluid to phi.test=develop (#41856) · 21aa3adc
      王明冬 提交于
      21aa3adc
    • z8hanghuan's avatar
      modify xpu.cmake,*test=kunlun (#41832) · f3753b7f
      z8hanghuan 提交于
      * modify xpu.cmake,*test=kunlun
      
      * modify xpu.cmake,*test=kunlun
      
      * modify xpu.cmake,*test=kunlun
      
      * modify xpu.cmake,*test=kunlun
      f3753b7f
    • B
      fix_sharding_copy_right (#41849) · 5e5ae0a0
      Baibaifan 提交于
      5e5ae0a0
    • R
      Moe ref (#41864) · e9a63237
      Roc 提交于
      * moe ref
      
      * ref commit; test=document_fix
      
      * update; test=document_fix
      
      * update test=document_fix
      
      * update; test=document_fix
      e9a63237
    • L
      Lml/prim op pywrapper (#41813) · ebf4fe6e
      levi131 提交于
      * native commit for triple grad of sigmod
      
      * Updated unittests files
      
      * init functional jacobian api
      
      * Updated trible_test func
      
      * Updated gradient_checker & test_script
      
      * finish test with dtype float32
      
      * add float64 test case
      
      * polish code
      
      * use atol=1e-5 with dtype float64
      
      * fix for ci
      
      * set timeout for test_jacobian
      
      * fix dygraph grad to support high differential
      
      * polish API docstring
      
      * Updated gradient checker and some related files
      
      * fix double grad strip error for high differential
      
      * fix double grad strip error for high differential
      
      * Add Sigmoid triple grad tests
      
      * fix dygraph double grad dtype error when calling for high differential senario
      
      * Updated triple grad teses func
      
      * Use np.random to initialize ddx
      
      * Updated triple_grad_check func
      
      * add todo for gradient checker and refine some comments
      
      * remove additional code
      
      * add test for warnging in backward.py
      
      * format python code
      
      * support multi input in triple gradient checker
      
      * Add matmul triple grad kernel
      
      * Updated comments of TODO
      
      * Supported some special tests
      
      * Change code-format to follow CI std
      
      * Updated gradient_checker.py
      
      * Fix conflicts
      
      * Removed unnecessary printing log
      
      * Change code style to follow CI std
      
      * merge upstream
      
      * add priops.py
      
      * add_p
      
      * rm useless files
      
      * add sub_p mul_p div_p
      
      * add sqrt_p and tanh_p
      
      * add reshape_p
      
      * add broadcast_p
      
      * Add python primitive wrappers.
      
      * Jvp rules updated.
      
      * JVP rules done for all the 17 primops.
      
      * quick check and fixes.
      
      * add jvp(op, *args)
      
      * add broadcast_p fill_constant_p matmul_p reduce_p reshape_p transpose_p
      
      * add split_p and concat_p
      
      * add gather_p and scatter_add_p
      
      * add slice_select_p and slice_assign_p
      
      * Add transpose rules.
      
      * add multi input check for add_p, sub_p, mul_p, div_p
      
      * update concat_p
      
      * Linearize and transpose in progress..
      
      * refine gather_p and scatter_add_p
      
      * updated.
      
      * update transpose.
      
      * refine slice_assign_p and slice_select_p
      
      * init commit for lower
      
      * Merged with primitive ops.
      
      * small update
      
      * add rules for orig2prim and prim2orig
      
      * add 9 test for prim ops
      
      * add more test and fix some bug
      
      * add more test
      
      * register proto
      
      * Adding primops test.
      
      * add shape valid check for broadcast_p op, and add keepdim attr into reduce_p op proto
      
      * support multi input and multi output for split_p and concat_p
      
      * Test updated.
      
      * update
      
      * fix slice bug for slice_select_p and slice_assign_p
      
      * updated.
      
      * Ops updated.
      
      * Refactor and bug fixes.
      
      * updated.
      
      * finish orig2prim and prim2orig rules
      
      * dtype for axis attr should be long int
      
      * update dtype for axis attr int64_t
      
      * update for iscan CI
      
      * Update primx.
      
      * Refactor vars in primx.
      
      * update for lower transform
      
      * update primx.py
      
      * update
      
      * Fix linearize and transpose.
      
      * Update is_dot
      
      * Update is_dot
      
      * Update is_dot
      
      * add gradient aggregation, fix add_transpose.
      
      * pass first linearize+transpose test.
      
      * update test
      
      * add_prim_op_pywrapper
      
      * Add primops UT
      
      * Fix set_value and update
      
      * Fix code format and PR-CI-Coverage
      Co-authored-by: Nveyron95 <veyron_wu@163.com>
      Co-authored-by: NJiabin Yang <360788950@qq.com>
      Co-authored-by: NTongxin Bai <waffle.bai@gmail.com>
      Co-authored-by: N0x45f <wangzhen45@baidu.com>
      ebf4fe6e
  3. 15 4月, 2022 25 次提交
    • Z
      solve brpc compile in arm-ubantu18 (#41649) · 56dafc4f
      ziyoujiyi 提交于
      * back fl
      
      * delete ssl cert
      
      * .
      
      * make warning
      
      * .
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud commm ready
      
      * .
      
      * .
      
      * arm_brpc compile
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * only output is ok
      
      * base is ok
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * add switch server bin
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * adapt brpc ssl
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      
      * .
      56dafc4f
    • S
      gpu_graph engine optimization+ (#41455) · ce72690c
      seemingwang 提交于
      * extract sub-graph
      
      * graph-engine merging
      
      * fix
      
      * fix
      
      * fix heter-ps config
      
      * test performance
      
      * test performance
      
      * test performance
      
      * test
      
      * test
      
      * update bfs
      
      * change cmake
      
      * test
      
      * test gpu speed
      
      * gpu_graph_engine optimization
      
      * add ssd layer to graph_engine
      
      * fix allocation
      
      * fix syntax error
      
      * fix syntax error
      
      * fix pscore class
      
      * fix
      
      * recover test
      
      * recover test
      
      * fix spelling
      
      * recover
      
      * fix
      ce72690c
    • R
      Moe ref (#41836) · c37af19c
      Roc 提交于
      * moe ref
      
      * ref commit; test=document_fix
      
      * update; test=document_fix
      
      * update test=document_fix
      c37af19c
    • H
      e25b75b6
    • C
      [Yaml]add adamw yaml (#41678) · ea0a164b
      chentianyu03 提交于
      * add adamw yaml
      
      * fix test case error
      
      * make the name of weight and bias in linear1 and linear2 to be constant
      ea0a164b
    • C
      [Phi]Reduce kernels into multiply files (#41747) · 1927aff9
      chentianyu03 提交于
      * split reduce_kernel
      
      * rm reduce_kernel in cmake
      
      * split reduce_grad kernels
      
      * fix cmake build error
      
      * format code
      
      * fix standalone_executor_test error
      1927aff9
    • Z
      [DoubleGrad] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode (#41730) · 27f28e82
      Zhanlue Yang 提交于
      * [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad
      
      * Fixed elementwise issue
      
      * Addressed CI failures
      
      * [DoubleGrad] Enabled test_imperative_triple_grad test cases under eager_mode
      
      * [DoubleGrad] Enabled test_autograd_functional_dynamic.py under eager mode
      
      * Enabled more test cases
      
      * [DoubleGrad] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode
      
      * Adjusted test_imperative_star_gan_with_gradient_penalty.py
      27f28e82
    • H
      [Dygraph] Refactor Model Parallel in eager mode (#41761) · e6fb6599
      Haohongxiang 提交于
      * refactor mp in eager mode
      
      * update
      
      * update
      
      * add uts
      e6fb6599
    • T
      ff818c77
    • L
      update (#41762) · 482e5b6c
      lilong12 提交于
      482e5b6c
    • D
      【GPUPS】add afsclient and gpupsutil (#41324) · 30a1213b
      danleifeng 提交于
      * add gpupsutil and afsclient; test=develop
      30a1213b
    • F
      [MLU] add mlu softmax kernel (#41816) · 2d6b71a2
      fwenguang 提交于
      2d6b71a2
    • J
      Add eager string tensor (#41039) · a22b68b8
      Jack Zhou 提交于
      * Add core.eager.StringTensor __init__ which pyarray args can be passed
      
      * Add the numpy method of core.eager.StringTensor
      
      * revert tensor.to_string modification
      
      * Add ToPyObject for core.eager.StringTensor
      
      * Add debug string for core.eager.StringTensor
      
      * Remove place args of core.eager.StringTensor temporarily
      
      * Fix check string_tensor error
      
      * remove dtype of core.eager.StringTensor
      
      * add core.eager.StringTensor unittest
      
      * remove pstring from VarDesc
      
      * Add InitStringTensorWithStringTensor
      
      * Remove to_string modification
      
      * Remove zero_copy arg from StringTensor creator
      a22b68b8
    • Z
      [XPUPS]fix hashtable_kernel.kps (#41790) · ef6ff4ef
      zmxdream 提交于
      * refactor heter comm kernel
      
      * update. test=develop
      
      * update calc_shard_offset. test=develop
      
      * update xpu kernel. test=develop
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * add optimizer kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update hashtable. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * template init. test=develop
      
      * hashtable template init. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix hashtable_kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      Co-authored-by: NWorgenZhang <frank08081993@gmail.com>
      ef6ff4ef
    • A
      [IPU] add mixed-precission support for ipu (#41733) · d7224482
      Allen Guo 提交于
      * add mixed-precission support for ipu
      
      * restore cast_model_to_fp16 api
      
      * update UTs
      d7224482
    • C
      polish tensor depreacted method warning (#41807) · e83e44c7
      Chen Weihang 提交于
      e83e44c7
    • Z
      Add API: Sparse Convolution3D (#41434) · 1665594d
      zhangkaihuo 提交于
      1665594d
    • P
      support no_need_buffer in eager_fluid state (#41720) · 840d2eb6
      pangyoki 提交于
      * support no_need_buffer in eager_fluid state
      
      * change no_need_buffer info from fwd_info to bwd_info
      
      * fix CI fail, gru_unit donnot use no_need_buffer
      
      * fix conflict between no_need_buffer and dispensable
      
      * use tensor.define in dispensable
      
      * solve conflict
      
      * solve conflict
      840d2eb6
    • A
    • Z
    • L
      Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda
      limingshu 提交于
      * change cudnn helper for auto-tune
      
      * Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.
      
      * Fix the bug in calculating and printing current step cache hit rate.
      
      * Improve the autotune cache and fix unittest.
      
      * Change the key from AlgorithmType to int64_t.
      
      * Fix unittest for cpu-only env.
      
      * change ChooseAlgoByWorkspace for heuristic mode
      Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
      35acfeda
    • F
      [MLU] add mlu activation kernels (#41751) · 10114859
      fwenguang 提交于
      10114859
    • F
      [MLU] add mlu new profiler (#41138) · fc208b7e
      fwenguang 提交于
      * [MLU] add mlu new profiler
      
      * fix format
      fc208b7e
    • C
      [Auto Parallel]update cluster (#41722) · 605552a9
      caozhou 提交于
      * update cluster
      605552a9
    • H
      fix batch norm memory issue (#41717) · 42abcc08
      hong 提交于
      * try to fix batch norm memory issue
      
      * fix batch norm memroy alloc bug
      
      * polish some code
      42abcc08
  4. 14 4月, 2022 8 次提交