1. 20 Dec, 2021 1 commit
  2. 17 Dec, 2021 2 commits
  3. 16 Dec, 2021 2 commits
  4. 13 Dec, 2021 1 commit
  5. 10 Dec, 2021 3 commits
  6. 09 Dec, 2021 2 commits
  7. 08 Dec, 2021 2 commits
  8. 07 Dec, 2021 2 commits
  9. 03 Dec, 2021 2 commits
  10. 01 Dec, 2021 3 commits
  11. 29 Nov, 2021 3 commits
  12. 27 Nov, 2021 1 commit
    • [NPU] reorganization for device API abstraction (#37110) · 72241a6a
      Aganlengzi committed
      * [NPU] reorganization for device API abstraction
      
      * [NPU] delete old files
      
      * [NPU] fix npu_collective_helper
      
      * [NPU] fix collective_helper
      
      * [NPU] fix ut
      
      * [NPU] mod memory allocation and hccl_helper
      
      * [NPU] fix place_type
      
      * [NPU] split enforce.h
      
      * move acl* call into npu_info
      
      * merge conflict
      
      * fix merge
      
      * merge conflict
      
      * merge conflict
  13. 24 Nov, 2021 2 commits
  14. 23 Nov, 2021 2 commits
  15. 19 Nov, 2021 1 commit
    • Add paddle.incubate.graph_send_recv API (#37205) · 39012536
      Siming Dai committed
      * add cpu version, using set: sum, min, max
      
      * add cpu version: mean
      
      * improve cpu code and fix dynamic memory allocation problem
      
      * fix arg error, add index judge, delete fp16
      
      * fix bug in CudaAtomicMax and CudaAtomicMin
      
      * add CUDA version
      
      * fix grad_op bug for index
      
      * add op test, add correct cpu grad op
      
      * Add correct CUDA Mean grad
      
      * [Add] Successful MEAN and SUM
      
      * [Add] Successful MIN and MAX in CPU
      
      * [Add] Successful MIN and MAX in CUDA
      
      * fix windows dtype ci
      
      * fix ROCM ci by adding HIP flag
      
      * rename fused_gather_scatter to send_recv
      
      * unify name as send and recv
      
      * change zero index return time
      
      * add send_recv incubate api
      
      * fix index data type, add unittest case for API
      
      * delete redundant input tensor
      
      * fix en example and docs, add default value in pool_type
      
      * add shape judge and max grid judge
      
      * fix comment
      
      * fix index type bug
      
      * add const &
      
      * fix en docs
      
      * delete numpy in examples
      
      * add unittest for int input
      
      * fix send_recv comment
      
      * change send_recv to graph_send_recv
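The commit above describes an op that gathers rows by a source index and reduces them into rows selected by a destination index, with `sum`, `mean`, `min` and `max` pooling. A minimal pure-Python sketch of that behavior follows. It is not the Paddle kernel: the function name `graph_send_recv_ref` is hypothetical, and it assumes the output has as many rows as the input and that rows receiving no messages stay zero.

```python
from collections import defaultdict


def graph_send_recv_ref(x, src_index, dst_index, pool_type="sum"):
    """Reference sketch: gather rows x[src] and reduce them into row dst.

    Assumptions (not confirmed by the commit log): the output has the same
    number of rows as x, and rows that receive nothing are left as zeros.
    """
    if len(src_index) != len(dst_index):
        raise ValueError("src_index and dst_index must have the same length")
    if pool_type not in ("sum", "mean", "min", "max"):
        raise ValueError("unsupported pool_type: %r" % (pool_type,))

    width = len(x[0]) if x else 0

    # Group the gathered source rows by their destination row.
    buckets = defaultdict(list)
    for src, dst in zip(src_index, dst_index):
        buckets[dst].append(x[src])

    out = [[0] * width for _ in range(len(x))]
    for dst, rows in buckets.items():
        for col in range(width):
            values = [row[col] for row in rows]
            if pool_type == "sum":
                out[dst][col] = sum(values)
            elif pool_type == "mean":
                out[dst][col] = sum(values) / len(values)
            elif pool_type == "min":
                out[dst][col] = min(values)
            else:  # "max"
                out[dst][col] = max(values)
    return out


x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]
src = [0, 1, 2, 0]
dst = [1, 2, 1, 0]
print(graph_send_recv_ref(x, src, dst, "sum"))  # [[0, 2, 3], [2, 8, 10], [1, 4, 5]]
```

Row 1 of the result is the sum of input rows 0 and 2 because both edges point at destination 1. The real operator works on tensors and also has CUDA kernels, which this sketch does not attempt to model.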
  16. 18 Nov, 2021 1 commit
  17. 17 Nov, 2021 1 commit
  18. 11 Nov, 2021 1 commit
  19. 10 Nov, 2021 1 commit
  20. 09 Nov, 2021 3 commits
  21. 08 Nov, 2021 1 commit
    • Use cuda virtual memory management and merge blocks (#36189) · a1ec1d5a
      wanghuancoder committed
      * Use cuda virtual memory management and merge blocks, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * window dll, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * use autogrowthv2 for system allocator, test=develop
      
      * remove ~CUDAVirtualMemAllocator(), test=develop
      
      * refine, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * fix bug, test=develop
      
      * revert system allocator, test=develop
      
      * revert multiprocessing, test=develop
      
      * fix AutoGrowthBestFitAllocatorV2 mutex, test=develop
      
      * catch cudaErrorInitializationError when create allocator, test=develop
      
      * fix cuMemSetAccess use, test=develop
      
      * refine cuda api use, test=develop
      
      * refine, test=develop
      
      * for test, test=develop
      
      * for test, test=develop
      
      * switch to v2, test=develop
      
      * refine virtual allocator, test=develop
      
      * Record cuMemCreate and cuMemRelease, test=develop
      
      * refine, test=develop
      
      * avoid out of bounds, test=develop
      
      * rename allocator, test=develop
      
      * refine, test=develop
      
      * use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop
      
      * for test,test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
  22. 05 Nov, 2021 1 commit
    • Disable pool&conv_transpose&quantize caching (#36695) · db6c00c4
      Jacek Czaja committed
      * - WIP
      
      - compilation fix
      
      - fix
      
      - fixes
      
      - fix
      
      - fix
      
      - fix again
      
      - fix
      
      - another fix
      
      - another compilation fix
      
      - fix
      
      - fix
      
      - fix
      
      - lint
      
      * - pool2d partially stripped from cache
      
      - pool2d partially stripped of caching
      
      * - compilation fix
      
      * - compilation fix
      
      * - Fix to UT of caching
      
      * - Enabling test_conv3d_mkldnn
      
      * - conv_transpose stripped of cache
      
      * - compilation fix
      
      * - fix
      
      * - fix
      
      * - compilation fix
      
      * - fix
      
      * Reverted disabling caching of conv2d
      
      * - compilation fix
      
      * - ut reverted
  23. 03 Nov, 2021 1 commit
    • Add FLAGS_allow_cinn_ops & FLAGS_deny_cinn_ops for controlling op types used... · 2479664a
      Zhen Wang committed
      Add FLAGS_allow_cinn_ops & FLAGS_deny_cinn_ops for controlling op types used in training with CINN. (#36842)
      
      * Update UT test_parallel_executor_run_cinn.py.
      
      * Add FLAGS_allow_cinn_ops & FLAGS_deny_cinn_ops & FLAGS_cinn_ops_delim.
      
      * Use the custom StringSplit function and remove the FLAGS_cinn_ops_delim flag.
      
      * Add FlagController test.
      
      * Apply lock to the cache_ only in CinnCompiler.
      
      * Add VizGraph & ReadableKey method for CinnCompiler.
      
      * Update the dot style of VizGraph in CinnCompiler.
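The commit above adds an allow list and a deny list of op types, each parsed from a flag string with a custom split function. One way such filtering could work is sketched below. This is an illustration, not Paddle's implementation: the helper names `split_ops` and `is_op_allowed`, the `;` delimiter, and the rule that a non-empty allow list takes precedence over the deny list are all assumptions.

```python
def split_ops(flag_value, delim=";"):
    """Split a flag string into a set of op type names, dropping empty items.

    The ';' delimiter is an assumption made for this sketch.
    """
    return {name.strip() for name in flag_value.split(delim) if name.strip()}


def is_op_allowed(op_type, allow_ops="", deny_ops=""):
    """Decide whether op_type passes the allow/deny filter.

    Assumed precedence: a non-empty allow list is authoritative; otherwise
    a non-empty deny list excludes its members; otherwise every op passes.
    """
    allow = split_ops(allow_ops)
    deny = split_ops(deny_ops)
    if allow:
        return op_type in allow
    if deny:
        return op_type not in deny
    return True


print(is_op_allowed("relu", allow_ops="relu;matmul"))    # True
print(is_op_allowed("conv2d", allow_ops="relu;matmul"))  # False
print(is_op_allowed("relu", deny_ops="relu"))            # False
```

Parsing each flag into a set keeps the membership check cheap and makes stray delimiters or whitespace harmless.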
  24. 02 Nov, 2021 1 commit