1. 23 Feb, 2022 4 commits
    • Add ProcessGroupNCCL for distributed training (#39737) · 0b205817
      ShenLiang committed
      * add processgroup_nccl
      0b205817
    • ca11a0e5
    • [MLU] add cncl parallel context and mlu resource pool (#39803) · 6241913b
      mhhhh1 committed
      * [MLU] add cncl parallel context and mlu resource pool
      
      * [MLU] fix the cncl_context_test
      6241913b
    • [Eager] Support Eager mode for some model testcase (#39248) · abe232d8
      wanghuancoder committed
      * eager, test=develop
      
      * fix bug, test=develop
      
      * eager, test=develop
      
      * merge legacy to fluid
      
      * eager, test=develop
      
      * eager, test=develop
      
      * Refactor TensorAdd func by template and remove gradient_accumulation in eager
      
      * Remove needless target name
      
      * eager, test=develop
      
      * eager, test=develop
      
      * Use overload instead of template
      
      * Remove legacy code
      
      * Remove legacy code
      
      * selectedrows, test=develop
      
      * Remove DataType test
      
      * eager, test=develop
      
      * eager, test=develop
      
      * support gan, test=develop
      
      * Using Tensor directly instead of using EagerTensor
      
      * support gradient_accumulation
      
      * make test_imperative_lod_tensor_to_selected_rows longer
      
      * make test_imperative_lod_tensor_to_selected_rows longer
      
      * refine code
      
      * ptb, test=develop
      
      * Rename all EagerTensor to Tensor
      
      * Rename some EagerTensor to Tensor
      
      * rename EagerTensor to EagerVariable
      
      * eager, test=develop
      
      * eager, test=develop
      
      * eager, test=develop
      
      * eager, test=develop
      
      * add more test
      
      * eager, test=develop
      
      * Support copiable selected rows and merge develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * clear grad, test=develop
      
      * merge, develop
      
      * merge, develop
      Co-authored-by: JiabinYang <360788950@qq.com>
      Co-authored-by: Weilong Wu <veyron_wu@163.com>
      abe232d8
  2. 22 Feb, 2022 3 commits
  3. 20 Feb, 2022 1 commit
  4. 19 Feb, 2022 2 commits
    • [Pten]Unify paddle/pten::framework::ddim into pten::ddim (#39614) · 2fe04264
      Aurelius84 committed
      * Unify paddle/pten::framework::ddim into pten::ddim
      
      * fix paddle namespace
      
      * compile successfully
      
      * fix npu src file
      
      * fix conflict
      
      * fix conflict
      
      * fix tensorrt compiler error
      
      * fix conflict
      
      * fix conflict
      
      * fix test file conflict
      
      * fix conflict
      
      * fix mlu file conflict
      
      * fix mlu file conflict
      
      * fix cinn header file conflict
      
      * fix conflict
      
      * fix conflict
      
      * fix conflict
      
      * fix conflict
      2fe04264
    • Update record interface using part1 (#39693) · eec6ef81
      chenjian committed
      * fix RecordEvent interface
      
      * modify default level to 4
      
      * update interface use
      
      * add const default trace level
      
      * update record event interface using
      
      * update operator.cc
      
      * update part1
      
      * fix include profiler.h header in ps server
      
      * fix include profiler.h header in ps server
      eec6ef81
  5. 18 Feb, 2022 2 commits
    • [AMP] support GPU BF16 amp for dygraph (#39029) · 7d6d3848
      zhangbo9674 committed
      * support dtype param for auto_cast
      
      * add amp_dtype for tracer
      
      * add unsupported bf16 list
      
      * support bf16 amp for O2
      
      * refine python interface for bfloat16
      
      * refine code
      
      * refine code
      
      * refine unittest
      
      * refine code
      
      * refine code
      
      * add bf16 o1
      
      * refine code by comment
      
      * add gradient accumulator
      
      * add recompute
      7d6d3848
    • add tool: print kernel signaturs (#39670) · 03b875a8
      Shang Zhizhou committed
      * add tool: print kernel signatures
      
      * fix windows compile
      03b875a8
  6. 16 Feb, 2022 3 commits
  7. 15 Feb, 2022 2 commits
    • [PluggableDevice] Add custom runtime support (#38740) · 3e7825f3
      ronnywang committed
      * [CustomRuntime] Add DeviceManager
      
      * [CustomRuntime] Add DeviceInterface
      
      * [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager
      
      * [CustomRuntime] Add plug-in device
      
      * [CustomRuntime] Memory module support PluggableDevice
      
      * [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option
      
      * update
      
      * [API] update API doc based on comments, test=develop
      Co-authored-by: qili93 <qili93@qq.com>
      3e7825f3
    • [PTen]Migrate proto::VarType outside of Pten (#39411) · 7e7e9404
      Aurelius84 committed
      * #1 migrate dist-related type()-> dtype()
      
      * move datatype function from pten -> fluid/framework
      
      * change type() in imperative into convert(dtype())
      
      * modify xx_tensor->type into xx_tensor->dtype
      
      * change the set_type interface and the caller
      
      * modify xx_tensor.type into xx_tensor.dtype
      
      * fix mutable_data(place, dtype())
      
      * change caller of mutable_data in pten and distributed
      
      * change the caller of mutable_data in fluid/framework
      
      * change the caller of mutable_data in imperative directory
      
      * mutable_data: inference
      
      * update the call of mutable_data
      
      * transfer MakePenScalarArray MakePtenScalar ResetHolderWithType
      
      * pass the compile. the next step is remove VarType in Pten
      
      * fix all and remove VarType from pten. success in linux. Next task is other platform
      
      * fix conflict with develop
      
      * fix compiled error
      
      * Fix reset conversion
      
      * fix conflict
      
      * fix compiled problem
      
      * fix typo
      
      * Fix << in tensor_utils.cc
      
      * fix type->dtype
      
      * fix unittest
      
      * fix tensor init constructor
      
      * fix DataTypeSize for BFloat16
      
      * fix code style
      
      * fix npu compiled error
      
      * fix npu
      
      * compile npu successfully
      
      * fix conflict
      
      * fix conflict
      Co-authored-by: xiongkun <xiongkun03@baidu.com>
      7e7e9404
  8. 14 Feb, 2022 3 commits
  9. 11 Feb, 2022 1 commit
  10. 10 Feb, 2022 2 commits
    • Added python-c code generation for final state Eager Dygraph (#39233) · 43f84d0f
      Zhanlue Yang committed
      * Removed debug info
      
      * Added automatic code generation for final state Eager Dygraph
      
      * Modified backward yaml
      
      * Added EagerUtils helper functions for final state CodeGen
      
      * Adjusted CMakeFiles to support compilation for final state auto generated codes
      
      * Added python-c code generation for final state Eager Dygraph
      
      * Fixed minor issue
      
      * Fixed yaml.load() method failure
      
      * Fixed minor issues
      
      * Refactored Python-C Attributes Parsing Functions
      
      * Fixed minor issue with Python-C AddFunctions
      
      * Fixed issues from merge
      
      * Fixed merge issues
      43f84d0f
    • 32d79bb9
  11. 09 Feb, 2022 2 commits
    • [pten] fit pten for amp (#39403) · c5affb78
      Leo Chen committed
      * fit pten for amp
      
      * fix typo
      c5affb78
    • Replace EagerTensor with Tensor (#39376) · 945a3ce9
      Jiabin Yang committed
      * merge legacy to fluid
      
      * Remove legacy code
      
      * Remove legacy code
      
      * Remove DataType test
      
      * Using Tensor directly instead of using EagerTensor
      
      * support gradient_accumulation
      
      * make test_imperative_lod_tensor_to_selected_rows longer
      
      * make test_imperative_lod_tensor_to_selected_rows longer
      945a3ce9
  12. 08 Feb, 2022 1 commit
  13. 07 Feb, 2022 1 commit
  14. 06 Feb, 2022 1 commit
  15. 02 Feb, 2022 1 commit
  16. 30 Jan, 2022 1 commit
  17. 29 Jan, 2022 1 commit
    • [PTen] Tidy pten core headers (#39188) · dd990981
      Chen Weihang committed
      * open header for custom kernel
      
      * add core utils
      
      * tidy core code
      
      * tidy header
      
      * tidy include
      
      * tidy namespace
      
      * resolve conflict
      
      * fix unittest and coverage
      
      * remove platform using
      
      * resolve conflict
      
      * resolve conflict
      
      * fix digamma namespace error
      
      * fix xpu full kernel error
      
      * fix xpu full kernel error
      
      * polish details
      
      * add place for lib storage
      dd990981
  18. 28 Jan, 2022 1 commit
    • [PSLIB] Add Metrics Module, Support User-defined Add Metric (#38789) · 2e6be886
      Fan Zhang committed
      * [PSLIB] Add Metrics Module, Support User-defined Add Metric
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI Coverage
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI
      
      * [PSLIB] Modify According to CI Coverage
      
      * [PSLIB] Modify According to CI Coverage
      
      * [PSLIB] Modify According to CI Coverage
      
      * modify role_maker
      
      * update CMakeLists.txt
      2e6be886
  19. 27 Jan, 2022 5 commits
    • Add Khop Graph Sampler API (#39146) · 35f949b5
      Siming Dai committed
      * add the test case for the UVA
      
      * add the context load for the uva
      
      * Add graph_sample kernel
      
      * Add graph_sample commit
      
      * add new commit for graph_sample
      
      * add unsigned long long int
      
      * delete some remarks
      
      * add cpu version
      
      * add cuda eids
      
      * add cpu eids
      
      * delete _uva
      
      * optimize speed: emplace_back, last_layer
      
      * add to_uva_tensor
      
      * add cpu return_eids choice
      
      * add gpu return_eids choice
      
      * add cpu reindex_nodes
      
      * add gpu reindex_nodes
      
      * rename op and add OMP for cpu
      
      * add incubate api
      
      * fix the compile problem for the PADDLE_ENFORCE and different device
      
      * fix the rocm and windows compile problem
      
      * add unittest for graph_sample_neighbors
      
      * fix cpu unittest and unique problem
      
      * fix uva unittest, fix cuda unique problem
      
      * fix the windows compile problem
      
      * fix the windows rand_r compile problem
      
      * add correct unittest, add src_eids dispensable
      
      * delete black
      
      * combine uva unittest
      
      * mv Sample_index to Sample_Index; check input shape; fix random sample func
      
      * delete memset & cudaMemset
      
      * fix according to PR comments
      
      * fix rocm ci
      
      * modify function names according to the specification
      
      * fix windows_openblas ci
      
      * refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors
      
      * fix rocm ci
      
      * rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc
      
      * add data type
      
      * fix conflict
      Co-authored-by: wawltor <fangzeyang0904@hotmail.com>
      35f949b5
    • 【PTen】Remove ReMakePtenDenseTensor (#39094) · 98c1829b
      zyfncg committed
      * remove remake densetensor
      
      * fix eager test error
      
      * fix bug in eager
      98c1829b
    • [PluggableDevice] Add custom kernel support based on pten kernel management (#38848) · a8879215
      Aganlengzi committed
      * [Demo] custom kernel based on pten kernel
      
      * merge and npu custom work well
      
      * del comments
      
      * delete other code
      
      * fix CUDAContext
      
      * fix not found small_vector.h
      
      * support NPU
      
      * fix NPUContext
      
      * fix DeviceContext support
      
      * add UT
      
      * fix call
      
      * add UT
      
      * fix
      
      * fix for comments and ut
      
      * add MACRO control
      
      * fix multi input output
      
      * support env CUSTOM_DEVICE_ROOT
      
      * deal with special cases
      
      * fix for Windows
      
      * try coverage with test_custom_kernel_dot.py
      
      * fix test_custom_kernel_dot
      
      * fix test_custom_kernel_dot
      
      * fix merge
      
      * fix merge
      
      * fix CI
      
      * update
      
      * merge and fix
      
      * remove WITH_CUSTOM_KERNEL
      
      * fix merge
      
      * merge and fix
      
      * fix ut
      
      * fix ut for mac
      
      * add more UT
      
      * add more UT
      
      * fix
      a8879215
    • compile for afs api (#39113) · 4748486e
      Thunderbrook committed
      * compile for afs api
      
      * with pslib
      4748486e
    • [fleet_executor] add flag to control timer (#39241) · d6d745d2
      Yuang Liu committed
      d6d745d2
  20. 26 Jan, 2022 3 commits