1. 17 6月, 2022 1 次提交
  2. 29 4月, 2022 1 次提交
    • W
      [cherry-pick 2.3] Add fused_multi_transformer op to optimize transformer... · 50bfe420
      WangXi 提交于
      [cherry-pick 2.3] Add fused_multi_transformer op to optimize transformer generation performance (#42311)
      
      * Add fused_multi_transformer op to optimize transformer generation performance (#41814)
      
      * fix fused_multi_transformer compile failed in cuda arch < sm53 (#42315)
      
      * fix ci timeout
      50bfe420
  3. 28 4月, 2022 1 次提交
  4. 27 4月, 2022 2 次提交
  5. 26 4月, 2022 1 次提交
  6. 18 4月, 2022 1 次提交
    • R
      fix bugs in moe (#41903) · f92dbfb7
      Roc 提交于
      * fix moe apis (#41650)
      
      * Moe ref (#41836)
      
      * moe ref
      
      * ref commit
      
      * update; document_fix
      
      * update;document_fix
      
      * Moe ref (#41864)
      
      * moe ref
      
      * ref commit; document_fix
      
      * update; document_fix
      
      * update document_fix
      
      * update; document_fix
      f92dbfb7
  7. 12 4月, 2022 1 次提交
    • Y
      [Cherry-Pick]Add... · a0b0a32f
      YuanRisheng 提交于
      [Cherry-Pick]Add hard_swish/kron/linspace/logit/graph_send_recv/multi_dot/maxout/multiplex op yaml file  (#41566)
      
      * [Phi]Add graph_send_recv yaml file (#41206)
      
      * add graph_send_recv yaml
      
      * deal with confict
      
      * fix compile bugs
      
      * cherry-pick pr 41298
      
      * cherry-pick pr41550
      
      * fix compile bugs
      a0b0a32f
  8. 11 4月, 2022 1 次提交
  9. 04 4月, 2022 1 次提交
  10. 02 4月, 2022 2 次提交
    • S
      Add graph apis (#40809) · b0398c8e
      Siming Dai 提交于
      * Add graph_reindex API
      
      * add graph_sample_neighbors api
      
      * Add buffer
      
      * delete VLOG
      
      * delete thrust::copy for output
      
      * add ShareDataWith
      
      * delete graph_reindex hashtable output
      
      * add graph_reindex dispensable
      
      * add reindex unittest, move memset to cuda kernel, change api
      
      * fix conflict
      
      * add reindex buffer for gpu version note
      
      * fix conflicts for op_func_generator
      
      * Add fisher_yates sampling, add dispensable, change infermeta
      
      * add dtype for edge_id
      
      * fix rocm ci and static check ci
      
      * add unittest
      
      * fix unittest
      
      * fix unittest
      
      * fix bug
      b0398c8e
    • X
      Enhance vjp/jvp/Jacobian/Hessian API for supporting dynamic, static graph and... · 9e764d82
      Xiaoxu Chen 提交于
      Enhance vjp/jvp/Jacobian/Hessian API for supporting dynamic, static graph and batched, unbatched mode (#40692)
      
      * modify vjp/jvp for both dynamic and static graph
      
      * enforce jacobian class for supporting first/last batch
      
      * add unittest for jvp, jacobian withlast batch, jacobian with first batch
      
      * fix the incorrect shape when multi-index Jacobian
      
      * enforce Hessian class for supporting dynamic graph
      
      * add Hessian class unittest
      
      * bugfix, jvp double_backward_trick zeros_like return stop_gradient=True in static graph
      
      * add API beta warnnings
      
      * add white_list for cuda11.x ci windows.
      
      * optimize some code snippets and documments
      
      * set unittest timeout to 100 seconds
      
      * move vjp,jvp,Jacobian,Hessian to incubate
      
      * fix vjp,vjp import path of sample code
      
      * fix code style error of augtograd/__init__ file
      9e764d82
  11. 01 4月, 2022 3 次提交
  12. 31 3月, 2022 1 次提交
    • S
      [New API]: miminize_bfgs and miminize_lbfgs (#40710) · e7928a06
      Sing_chan 提交于
      * [New API]: miminize_bfgs and miminize_lbfgs
      
      * modify for python module call correctly
      
      * add functional package, add error raise in static_graph, change assign to set_value
      
      * unify static_graph and dygraph, fix bug when x or H0 is float64
      
      * now only accept input is tensor, put check args in utils.py, put exception test together
      
      * temp
      
      * add more detailed algorithm illustration and comment, reduce test case to limit test time in 15s
      
      * change in_dygraph_mode to in_dynamic_mode
      
      * fix bug of sample code; reduce test case to reduce test time
      
      * change dir to incubate
      e7928a06
  13. 30 3月, 2022 1 次提交
    • R
      [MoE] Moe apis (#41092) · aac7879a
      Roc 提交于
      * add random routing op
      
      add _random_routing api in utils
      
      add random routing ut
      
      * # This is a combination of 10 commits.
      # The first commit's message is:
      add expert count op
      
      add ut for expert_count
      
      # This is the 2nd commit message:
      
      update UT only for cuda
      
      # This is the 3rd commit message:
      
      fix for rocm
      
      # This is the 4th commit message:
      
      update ut
      
      # This is the 5th commit message:
      
      add moe module
      
      # This is the 6th commit message:
      
      add expert count op
      
      add ut for expert_count
      
      # This is the 7th commit message:
      
      update UT only for cuda
      
      # This is the 8th commit message:
      
      update ut
      
      # This is the 9th commit message:
      
      add moe module
      
      # This is the 10th commit message:
      
      make expert count private
      
      * add assign pos op
      
      * fix upper num name
      
      * add api _assign pos
      
      * add ut for assign pos op
      
      * update date
      
      * add op about moe gate
      
      update utils
      
      add limit by capacity op
      
      add ut for limit_by_capacity
      
      add ut for prune_gate_by_capacity
      
      add ut for limit_by_capacity
      
      add ut for prune_gate_by_capacity
      
      * fix for win
      
      * fix bugs in test_limit_by_capacity_op
      
      * update ut
      
      * update for test (timeout)
      
      * fix ut
      
      * update
      
      * update(fix) ut for win
      
      * moe apis in incubate
      
      * # This is a combination of 10 commits.
      # The first commit's message is:
      add expert count op
      
      add ut for expert_count
      
      # This is the 2nd commit message:
      
      update UT only for cuda
      
      # This is the 3rd commit message:
      
      fix for rocm
      
      # This is the 4th commit message:
      
      update ut
      
      # This is the 5th commit message:
      
      add moe module
      
      # This is the 6th commit message:
      
      add expert count op
      
      add ut for expert_count
      
      # This is the 7th commit message:
      
      update UT only for cuda
      
      # This is the 8th commit message:
      
      update ut
      
      # This is the 9th commit message:
      
      add moe module
      
      # This is the 10th commit message:
      
      make expert count private
      
      * add assign pos op
      
      * fix upper num name
      
      * add api _assign pos
      
      * add ut for assign pos op
      
      * update date
      
      * fix for win
      
      * update for test (timeout)
      
      * fix ut
      
      * update
      
      * fix ut for number count
      
      * add apis and utils
      
      * add gate apis
      
      * add moe and grad clip apis
      
      * update moe apis
      
      * add ops for moe gate
      
      * fix
      
      * update for base moe layer api
      
      * add random routing op
      
      add _random_routing api in utils
      
      add random routing ut
      
      * fix for dygraph
      
      * update with ranodm routing
      
      * update
      
      * fix ut for limit by capacity
      
      * update
      
      * update limit by capacity for easily to switch to single thread mode
      
      * update api docs
      Co-authored-by: Nhlygit66666 <2570058140@qq.com>
      aac7879a
  14. 29 3月, 2022 1 次提交
    • R
      [MoE] Moe apis (#40895) · aeade538
      Roc 提交于
      * add random routing op
      
      add _random_routing api in utils
      
      add random routing ut
      
      * # This is a combination of 10 commits.
      # The first commit's message is:
      add expert count op
      
      add ut for expert_count
      
      # This is the 2nd commit message:
      
      update UT only for cuda
      
      # This is the 3rd commit message:
      
      fix for rocm
      
      # This is the 4th commit message:
      
      update ut
      
      # This is the 5th commit message:
      
      add moe module
      
      # This is the 6th commit message:
      
      add expert count op
      
      add ut for expert_count
      
      # This is the 7th commit message:
      
      update UT only for cuda
      
      # This is the 8th commit message:
      
      update ut
      
      # This is the 9th commit message:
      
      add moe module
      
      # This is the 10th commit message:
      
      make expert count private
      
      * add assign pos op
      
      * fix upper num name
      
      * add api _assign pos
      
      * add ut for assign pos op
      
      * update date
      
      * add op about moe gate
      
      update utils
      
      add limit by capacity op
      
      add ut for limit_by_capacity
      
      add ut for prune_gate_by_capacity
      
      add ut for limit_by_capacity
      
      add ut for prune_gate_by_capacity
      
      * fix for win
      
      * fix bugs in test_limit_by_capacity_op
      
      * update ut
      
      * update for test (timeout)
      
      * fix ut
      
      * update
      
      * update(fix) ut for win
      
      * moe apis in incubate
      
      * # This is a combination of 10 commits.
      # The first commit's message is:
      add expert count op
      
      add ut for expert_count
      
      # This is the 2nd commit message:
      
      update UT only for cuda
      
      # This is the 3rd commit message:
      
      fix for rocm
      
      # This is the 4th commit message:
      
      update ut
      
      # This is the 5th commit message:
      
      add moe module
      
      # This is the 6th commit message:
      
      add expert count op
      
      add ut for expert_count
      
      # This is the 7th commit message:
      
      update UT only for cuda
      
      # This is the 8th commit message:
      
      update ut
      
      # This is the 9th commit message:
      
      add moe module
      
      # This is the 10th commit message:
      
      make expert count private
      
      * add assign pos op
      
      * fix upper num name
      
      * add api _assign pos
      
      * add ut for assign pos op
      
      * update date
      
      * fix for win
      
      * update for test (timeout)
      
      * fix ut
      
      * update
      
      * fix ut for number count
      
      * add apis and utils
      
      * add gate apis
      
      * add moe and grad clip apis
      
      * update moe apis
      
      * add ops for moe gate
      
      * fix
      
      * update for base moe layer api
      
      * add random routing op
      
      add _random_routing api in utils
      
      add random routing ut
      
      * fix for dygraph
      
      * update with ranodm routing
      
      * update
      
      * fix ut for limit by capacity
      
      * update
      Co-authored-by: Nhlygit66666 <2570058140@qq.com>
      aeade538
  15. 25 3月, 2022 1 次提交
    • J
      Refactor Dygraph Flags (#40786) · 3085d5e4
      Jiabin Yang 提交于
      * refactor eager flags
      
      * fix flags error when we switch from eager to dygraph
      
      * fix ci problem
      
      * fix ci
      
      * fix ci
      
      * merge develop and fix code style
      
      * merge develop and fix code style
      
      * fix op test error
      
      * fix op test error
      
      * fix op test error
      
      * fix op test error
      
      * fix op test error
      
      * merge develop
      3085d5e4
  16. 22 3月, 2022 1 次提交
    • S
      [phi] Update graph_send_recv OP (#40509) · 67b46e45
      Siming Dai 提交于
      * add out_size shape for graph_send_recv
      
      * fix bug in register kernel: no const int& support
      
      * add out_size in infermeta
      
      * change unittest
      
      * fix unittest
      
      * fix out_size default value
      
      * fix doc
      
      * delete arg mapping
      
      * add sig
      
      * move -1 to 0
      
      * move -1 to 0
      67b46e45
  17. 16 3月, 2022 1 次提交
  18. 14 3月, 2022 1 次提交
  19. 11 3月, 2022 1 次提交
  20. 01 3月, 2022 1 次提交
  21. 25 2月, 2022 1 次提交
  22. 24 2月, 2022 1 次提交
  23. 19 2月, 2022 1 次提交
    • S
      Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61
      sneaxiy 提交于
      * add DistributedFusedLamb op
      
      * polish code
      
      * fix compile error
      
      * compatible with pten changement
      
      * fix rocm compile error
      
      * improve converage
      
      * update upstream/develop
      
      * fix cast_with_ptr.h
      
      * add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1
      
      * fix clip before allreduce
      
      * add use_master_param_norm
      
      * code polish
      
      * fix bug
      
      * fix ROCM ci
      5df3cd61
  24. 28 1月, 2022 1 次提交
  25. 27 1月, 2022 2 次提交
    • S
      Add Khop Graph Sampler API (#39146) · 35f949b5
      Siming Dai 提交于
      * add the test case for the UVA
      
      * add the context load for the uva
      
      * Add graph_sample kernel
      
      * Add graph_sample commit
      
      * add new commit for graph_sample
      
      * add unsigned long long int
      
      * delete some remarks
      
      * add cpu version
      
      * add cuda eids
      
      * add cpu eids
      
      * delete _uva
      
      * optimize speed: emplace_back, last_layer
      
      * add to_uva_tensor
      
      * add cpu return_eids choice
      
      * add gpu return_eids choice
      
      * add cpu reindex_nodes
      
      * add gpu reindex_nodes
      
      * rename op and add OMP for cpu
      
      * add incubate api
      
      * fix the compile problem for the PADDLE_ENFORE and different device
      
      * fix the rcom and windows compile problem
      
      * add unittest for graph_sample_neighbors
      
      * fix cpu unittest and unique problem
      
      * fix uva unittest, fix cuda unique problem
      
      * fix the windows compile problem
      
      * fix the windows rand_r compile problem
      
      * add correct unittest, add src_eids dispensable
      
      * delete black
      
      * combine uva unittest
      
      * mv Sample_index to Sample_Index; check input shape; fix random sample func
      
      * delete memset & cudaMemset
      
      * fix according to PR comments
      
      * fix rocm ci
      
      * modify function names according to the specification
      
      * fix windows_openblas ci
      
      * refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors
      
      * fix rocm ci
      
      * rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc
      
      * add data type
      
      * fix conflict
      Co-authored-by: Nwawltor <fangzeyang0904@hotmail.com>
      35f949b5
    • Z
      Add SparseCooTensor and SparseCsrTensor (#38906) · a7edb3f3
      zhangkaihuo 提交于
      * fix bug:
      1. atten: set the default value of attn_dropout_rate to None
      2. ffn: add activation parameter
      
      * for pure fp16
      
      * Add a SparseCsrTensor
      
      * remove unused functional
      
      * remove const
      
      * remove SetMemoberTensor
      
      * remove non_zero_nums_, the number of non zero elements of each batch can be obtained from the crows
      
      * SparseCooTensor
      
      * add SetMember
      
      * merge upstream; add SetMember
      
      * merge upstream
      
      * merge upstream; add newline at end of file
      
      * add newline at end of file
      
      * remove newline at end of file
      
      * remove newline at end of file
      
      * stash
      
      * user pten::framework::make_ddim
      
      * user pten::framework::make_ddim
      
      * merge upstream; use the latest mutable_data
      
      * merge upstream; use the latest mutable_data
      
      * return mutable dense tensor
      a7edb3f3
  26. 22 12月, 2021 1 次提交
  27. 26 11月, 2021 1 次提交
  28. 23 11月, 2021 1 次提交
  29. 19 11月, 2021 2 次提交
    • W
      Add fuse_resnet_unit pass (#36818) · 3cd3bf29
      wuhuanzhou 提交于
      * GeneratePass support attr condition and mapping, test=develop
      
      * fix coverage, test=develop
      
      * Add fuse_resnet_unit pass, test=develop
      
      * fix CI errors, test=develop
      
      * fix CI errors, test=develop
      
      * fix unittest error when compiling without CUDA, test=develop
      
      * fix static ci error, test=develop
      
      * limit kernel size must equal 1, test=develop
      3cd3bf29
    • S
      Add paddle.incubate.graph_send_recv API (#37205) · 39012536
      Siming Dai 提交于
      * add cpu version, using set: sum, min, max
      
      * add cpu version: mean
      
      * improve cpu code and fix dynamic memory allcation problem
      
      * fix arg error, add index judge, delete fp16
      
      * fix bug in CudaAtomicMax and CudaAtomicMin
      
      * add CUDA version
      
      * fix grad_op bug for index
      
      * add op test, add correct cpu grad op
      
      * Add correct CUDA Mean grad
      
      * [Add] Successful MEAN and SUM
      
      * [Add] Successful MIN and MAX in CPU
      
      * [Add] Successful MIN and MAX in CUDA
      
      * fix windows dtype ci
      
      * fix ROCM ci by adding HIP flag
      
      * rename fused_gather_scatter to send_recv
      
      * unify name as send and recv
      
      * change zero index return time
      
      * add send_recv incubate api
      
      * fix index data type, add unittest case for API
      
      * delete redundant input tensor
      
      * fix en example and docs, add default value in pool_type
      
      * add shape judge and max grid judge
      
      * fix comment
      
      * fix index type bug
      
      * add const &
      
      * fix en docs
      
      * delete numpy in examples
      
      * add unittest for int input
      
      * fix send_recv comment
      
      * change send_recv to graph_send_recv
      39012536
  30. 16 11月, 2021 1 次提交
    • L
      Fix attn_bias_add bug. (#37147) · a9e7a854
      Li Min 提交于
      fused_attention_op的实现中,使用了bias_add,且其实现是通过使用kernel primitive来实现的,之后kernel primitive的WriteData api接口及函数内部实现发生了更改,将判断越界的逻辑移到了template的参数中,使得调用的分支有错误,产生了越界赋值操作,污染了别的显存空间的内容。具体表现为:test_fused_attention_op_api.py 单次执行基本上不会报错,多次循环执行不同shape的输入,结果计算不对,具有偶发性,bug不易察觉。
      a9e7a854
  31. 12 11月, 2021 1 次提交
  32. 28 10月, 2021 1 次提交
  33. 27 10月, 2021 1 次提交
  34. 26 10月, 2021 1 次提交
    • L
      Add fused attention op backward and python layer. (#36498) · 5119428e
      Li Min 提交于
      功能:本PR的目标是提高attention模块的计算性能。
      为了减少框架层对op的调度开销,本PR通过在C++层手动实现attention模块,对外提供attention 大op;
      为了减少防存开销,本PR采取了两种优化方法:
      (1)在q,k,v计算时通过共享输入X,将该处的gemm,transpose和bias add从三次调用减少为一次;
      (2)使用kernel融合优化技术,在不同cuda kernel之间通过寄存器传输数据;
      5119428e