1. 25 Feb, 2022 1 commit
  2. 24 Feb, 2022 1 commit
  3. 19 Feb, 2022 1 commit
    • Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61
      Committed by sneaxiy
      * add DistributedFusedLamb op
      
      * polish code
      
      * fix compile error
      
      * make compatible with pten changes
      
      * fix rocm compile error
      
      * improve coverage
      
      * update upstream/develop
      
      * fix cast_with_ptr.h
      
      * add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1
      
      * fix clip before allreduce
      
      * add use_master_param_norm
      
      * code polish
      
      * fix bug
      
      * fix ROCM ci
  4. 28 Jan, 2022 1 commit
  5. 27 Jan, 2022 2 commits
    • Add Khop Graph Sampler API (#39146) · 35f949b5
      Committed by Siming Dai
      * add the test case for the UVA
      
      * add the context load for the uva
      
      * Add graph_sample kernel
      
      * Add graph_sample commit
      
      * add new commit for graph_sample
      
      * add unsigned long long int
      
      * delete some remarks
      
      * add cpu version
      
      * add cuda eids
      
      * add cpu eids
      
      * delete _uva
      
      * optimize speed: emplace_back, last_layer
      
      * add to_uva_tensor
      
      * add cpu return_eids choice
      
      * add gpu return_eids choice
      
      * add cpu reindex_nodes
      
      * add gpu reindex_nodes
      
      * rename op and add OMP for cpu
      
      * add incubate api
      
      * fix the compile problem for PADDLE_ENFORCE and different devices
      
      * fix the rocm and windows compile problem
      
      * add unittest for graph_sample_neighbors
      
      * fix cpu unittest and unique problem
      
      * fix uva unittest, fix cuda unique problem
      
      * fix the windows compile problem
      
      * fix the windows rand_r compile problem
      
      * add correct unittest, add src_eids dispensable
      
      * delete black
      
      * combine uva unittest
      
      * mv Sample_index to Sample_Index; check input shape; fix random sample func
      
      * delete memset & cudaMemset
      
      * fix according to PR comments
      
      * fix rocm ci
      
      * modify function names according to the specification
      
      * fix windows_openblas ci
      
      * refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors
      
      * fix rocm ci
      
      * rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc
      
      * add data type
      
      * fix conflict
      Co-authored-by: Nwawltor <fangzeyang0904@hotmail.com>
    • Add SparseCooTensor and SparseCsrTensor (#38906) · a7edb3f3
      Committed by zhangkaihuo
      * fix bug:
      1. atten: set the default value of attn_dropout_rate to None
      2. ffn: add activation parameter
      
      * for pure fp16
      
      * Add a SparseCsrTensor
      
      * remove unused functional
      
      * remove const
      
      * remove SetMemoberTensor
      
      * remove non_zero_nums_, the number of non zero elements of each batch can be obtained from the crows
      
      * SparseCooTensor
      
      * add SetMember
      
      * merge upstream; add SetMember
      
      * merge upstream
      
      * merge upstream; add newline at end of file
      
      * add newline at end of file
      
      * remove newline at end of file
      
      * remove newline at end of file
      
      * stash
      
      * use pten::framework::make_ddim
      
      * use pten::framework::make_ddim
      
      * merge upstream; use the latest mutable_data
      
      * merge upstream; use the latest mutable_data
      
      * return mutable dense tensor
  6. 22 Dec, 2021 1 commit
  7. 26 Nov, 2021 1 commit
  8. 23 Nov, 2021 1 commit
  9. 19 Nov, 2021 2 commits
    • Add fuse_resnet_unit pass (#36818) · 3cd3bf29
      Committed by wuhuanzhou
      * GeneratePass support attr condition and mapping, test=develop
      
      * fix coverage, test=develop
      
      * Add fuse_resnet_unit pass, test=develop
      
      * fix CI errors, test=develop
      
      * fix CI errors, test=develop
      
      * fix unittest error when compiling without CUDA, test=develop
      
      * fix static ci error, test=develop
      
      * limit kernel size must equal 1, test=develop
    • Add paddle.incubate.graph_send_recv API (#37205) · 39012536
      Committed by Siming Dai
      * add cpu version, using set: sum, min, max
      
      * add cpu version: mean
      
      * improve cpu code and fix dynamic memory allocation problem
      
      * fix arg error, add index judge, delete fp16
      
      * fix bug in CudaAtomicMax and CudaAtomicMin
      
      * add CUDA version
      
      * fix grad_op bug for index
      
      * add op test, add correct cpu grad op
      
      * Add correct CUDA Mean grad
      
      * [Add] Successful MEAN and SUM
      
      * [Add] Successful MIN and MAX in CPU
      
      * [Add] Successful MIN and MAX in CUDA
      
      * fix windows dtype ci
      
      * fix ROCM ci by adding HIP flag
      
      * rename fused_gather_scatter to send_recv
      
      * unify name as send and recv
      
      * change zero index return time
      
      * add send_recv incubate api
      
      * fix index data type, add unittest case for API
      
      * delete redundant input tensor
      
      * fix en example and docs, add default value in pool_type
      
      * add shape judge and max grid judge
      
      * fix comment
      
      * fix index type bug
      
      * add const &
      
      * fix en docs
      
      * delete numpy in examples
      
      * add unittest for int input
      
      * fix send_recv comment
      
      * change send_recv to graph_send_recv
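The commit list above mentions a CPU version with sum, mean, min and max pooling and a later rename from fused_gather_scatter to send_recv. A minimal plain-Python sketch of that gather-then-scatter-reduce idea is shown here. The function name `send_recv_sketch`, the list-of-lists data layout, and the choice to return zero rows for nodes that receive no message are assumptions made for illustration; this is not the Paddle kernel or its API.

```python
def send_recv_sketch(x, src_index, dst_index, pool_type="sum"):
    """Gather rows of x at src_index, then reduce them into rows at dst_index.

    Hypothetical helper for illustration only. Rows that receive no
    message are left as zeros (an assumption of this sketch).
    """
    num_nodes, dim = len(x), len(x[0])
    # Collect the messages arriving at each destination node.
    inbox = [[] for _ in range(num_nodes)]
    for s, d in zip(src_index, dst_index):
        inbox[d].append(x[s])

    reducers = {
        "sum": sum,
        "mean": lambda col: sum(col) / len(col),
        "min": min,
        "max": max,
    }
    if pool_type not in reducers:
        raise ValueError(f"unsupported pool_type: {pool_type}")
    reduce_fn = reducers[pool_type]

    out = []
    for msgs in inbox:
        if not msgs:
            out.append([0] * dim)
            continue
        # zip(*msgs) iterates over feature columns of the received rows.
        out.append([reduce_fn(col) for col in zip(*msgs)])
    return out


x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]
src = [0, 1, 2, 0]
dst = [1, 2, 1, 0]
result = send_recv_sketch(x, src, dst, pool_type="sum")
```

Here node 1 receives rows 0 and 2 of `x`, node 2 receives row 1, and node 0 receives row 0, so each output row is the reduction of whatever arrived at that node.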
  10. 16 Nov, 2021 1 commit
    • Fix attn_bias_add bug. (#37147) · a9e7a854
      Committed by Li Min
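      The implementation of fused_attention_op uses bias_add, which is itself implemented with kernel primitives. The WriteData API of the kernel primitives and its internal implementation were later changed: the out-of-bounds check was moved into a template parameter, so the call took the wrong branch and performed out-of-bounds writes that corrupted the contents of other GPU memory. Symptoms: a single run of test_fused_attention_op_api.py rarely fails, but running it repeatedly in a loop with inputs of different shapes produces incorrect results. The failure is intermittent, so the bug is hard to notice.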
  11. 12 Nov, 2021 1 commit
  12. 28 Oct, 2021 1 commit
  13. 27 Oct, 2021 1 commit
  14. 26 Oct, 2021 2 commits
    • Add fused attention op backward and python layer. (#36498) · 5119428e
      Committed by Li Min
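      Function: the goal of this PR is to improve the computational performance of the attention module.
      To reduce the framework's op scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
      To reduce memory access overhead, this PR applies two optimizations:
      (1) when computing q, k and v, the input X is shared, so the gemm, transpose and bias add at that point are reduced from three calls to one;
      (2) kernel fusion is used, passing data between different CUDA kernels through registers.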
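Optimization (1) in the description above can be illustrated with a small plain-Python sketch: because q, k and v all multiply the same input X, the three weight matrices can be concatenated column-wise so that one GEMM and one bias add produce all three projections. The helper names (`matmul`, `add_bias`, `qkv_separate`, `qkv_fused`) and the assumption that the three projections have equal width are illustrative only; this is not the C++/CUDA code from the PR.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    """Add a bias vector to every row of m."""
    return [[v + b for v, b in zip(row, bias)] for row in m]


def qkv_separate(x, wq, wk, wv, bq, bk, bv):
    """Baseline: three GEMM + bias-add calls, one per projection."""
    return (add_bias(matmul(x, wq), bq),
            add_bias(matmul(x, wk), bk),
            add_bias(matmul(x, wv), bv))


def qkv_fused(x, wq, wk, wv, bq, bk, bv):
    """One GEMM + bias add over column-concatenated weights, then split.

    Assumes the q, k and v projections all have the same width.
    """
    w = [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]  # concatenate columns
    b = bq + bk + bv
    out = add_bias(matmul(x, w), b)
    d = len(bq)
    q = [row[:d] for row in out]
    k = [row[d:2 * d] for row in out]
    v = [row[2 * d:] for row in out]
    return q, k, v


x = [[1, 2], [3, 4]]
wq, bq = [[1, 0], [0, 1]], [1, 1]
wk, bk = [[2, 0], [0, 2]], [0, 0]
wv, bv = [[0, 1], [1, 0]], [5, 5]
fused = qkv_fused(x, wq, wk, wv, bq, bk, bv)
separate = qkv_separate(x, wq, wk, wv, bq, bk, bv)
```

Both paths return the same q, k and v; the fused path simply reads X once and issues a single, wider matrix multiply.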
    • Move fused_attention and fused_feedforward functional api path to incubate (#36704) · 9aeca2f1
      Committed by Li Min
      Move the Python API interfaces newly added in PRs #35905 and #35843 into the incubate directory.
  15. 17 Oct, 2021 1 commit
  16. 16 Oct, 2021 1 commit
  17. 15 Oct, 2021 1 commit
  18. 26 Sep, 2021 1 commit
  19. 17 Sep, 2021 1 commit
  20. 16 Sep, 2021 1 commit
  21. 16 Jul, 2021 1 commit
  22. 15 Jul, 2021 1 commit
  23. 14 Jul, 2021 1 commit
  24. 12 Jul, 2021 1 commit
  25. 11 Jun, 2021 1 commit
  26. 22 Apr, 2021 1 commit
  27. 21 Apr, 2021 1 commit
  28. 30 Mar, 2021 1 commit
  29. 25 Jan, 2021 1 commit
  30. 13 Jan, 2021 1 commit
  31. 07 Jan, 2021 1 commit
  32. 08 Dec, 2020 1 commit
  33. 28 Oct, 2020 1 commit
  34. 12 Oct, 2020 1 commit
    • refine adam/strided_slice && fix doc for rmsprop/unstack (#27740) · 84d8e49d
      Committed by MRXLT
      * refine parameters order && doc
      
      * update rmsprop doc
      
      * refine adam/transpose/unstack/stride_slice
      
      * fix bug && doc
      
      * fix doc
      
      * bug fix
      
      * bug fix
      
      * fix doc
      
      * fix doc
      
      * fix doc
      
      * fix doc
      
      * deprecate old strided_slice
      
      * update doc
      
      * set default value for name
      
      * update doc
  35. 31 Aug, 2020 1 commit
    • Move hapi to python/paddle root dir. (#26442) · f7fb4c22
      Committed by qingqing01
      * Move hapi from paddle/incubate to paddle
      
      * Remove vision/datasets/utils.py and clean code
      
      * Add sample code for conll05
      
      * Print pull path when saving model
      
      * Fix sample code after parameter_list of SGD is changed to parameters
      
      * Fix bug in wmt16 dataset
  36. 28 Aug, 2020 2 commits