1. 02 Sep, 2021 6 commits
    • Add SVD Op and its GPU and CPU kernel (#34953) · 7e5fb462
      xiongkun committed
      * Add SVD Op and its GPU and CPU kernel
      
      * Remove CUDAPlace in test_svd_op, make the test available in CPU package
      
      * modify the file
      
      * fix Windows bug / fix ROCm / fix test timeout
      
      * to pass the CIs
      
      * improve error report
      
      * for code review
      
      * some modification to test_svd_op
      
      * change python code style
      
      * expose the svd interface for document
    • [NPU] Add label_smooth_op (#34828) · e57a88b3
      zhulei committed
      * [NPU] Add label_smooth_op
      
      * [NPU] Add label_smooth_op
    • [hybrid] [npu] fit npu nan/inf check (#35171) · 67ed7e12
      Yuang Liu committed
    • fix static error in summary (#35303) · b28cc734
      wangna11BD committed
    • [Auto Parallel] Logical Partition & Dist Op (#35117) · a622b701
      JZ-LIANG committed
      * support shard reader
      
      * support shard reader
      
      * add parallel mode
      
      * update process mesh
      
      * add method to compute comm_group
      
      * implement dist_embedding forward func
      
      * implement dist matmul forward func
      
      * implement dist reshape forward func
      
      * add transpiler framework
      
      * add transpiler forward
      
      * implement transpiler forward
      
      * implement transpiler backward & update
      
      * add process
      
      * add unit test
      
      * chmod
      
      * chmod
      
      * chmod
      
      * update unit test
      
      * add unit test for GPT
      
      * remove unused print
      
      * rename transpiler --> partitioner
      
      * rename transpiler --> partitioner
      
      * chmod
      
      * chmod
      
      * bug fixed
      
      * remove amp function
      
      * update case for dp mode
      
      * update case for dp mode
    • [npu] add update_loss_scaling npu min value (#35270) · 280d7421
      Baibaifan committed
  2. 01 Sep, 2021 15 commits
  3. 31 Aug, 2021 10 commits
    • Support CostInfo and MemProfiler in InterpreterCore (#34981) · 572bad8a
      Aurelius84 committed
      * polish code
      
      * fix unit test on Windows
      
      * refine pybind interface
      
      * support MemSize statistics of AllocatorPool
      
      * Replace mutex with atomic
    • transformer opt python files (#35206) · e2991555
      Feng Xing committed
      This PR adds the Python files for the fused transformer and defines its interface.
      
      The fused transformer is an optimized version of the transformer layer (in python/paddle/nn/layer/transformer.py). This PR defines four layers (functions):
      (1) FusedMultiHeadAttention: multi-head attention layer
      (2) FusedFeedForward: feed forward layer
      (3) FusedTransformerEncoderLayer: transformer encoder layer
      (4) FusedTransformer: transformer layer
    • [Dy2Stat] Add model ResNet50 for Dy2stat AMP training (#35276) · 079c585c
      Aurelius84 committed
      * Add model for ResNet50 for Dy2stat AMP training
      
      * fix timeout
      
      * fix dataloader
    • [NPU] fix cmake for ascend ci, test=develop (#35255) · f6004ab9
      Qi Li committed
      * [NPU] fix cmake for ascend ci, test=develop
      
      * update paddle_build.sh scripts, test=allcase
    • Revert "Revert "Add copy from tensor (#34406)" (#35173)" (#35256) · 6116f9af
      Shang Zhizhou committed
      * Revert "Revert "Add copy from tensor (#34406)" (#35173)"
      
      This reverts commit 32c1ec42.
      
      * add template instantiation
    • New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d
      Zhanlue Yang committed
      [Background]
      Growth in code size can be irreversible in the long run, leading to huge release packages that
      not only hamper user experience but also exceed a hard limit of PyPI.
      
      In particular, the NV_FATBIN section takes up 86% of the compiled dylib size, owing to the large
      number of GPU architectures supported.
      
      This PR aims to prune the NV_FATBIN section.
      
      [Solution]
      In the new release strategy, two types of whl packages will be involved:
      
      Cubin PIP package:
      The PIP package supports a narrower window of GPU architectures. It contains
      sm_60, sm_70, sm_75, and sm_80 cubins, covering the Pascal through Ampere architectures.
      
      JIT release package:
      This is a backup for the Cubin PIP package. It contains compute_35, compute_50, compute_60,
      compute_70, compute_75, and compute_80, giving the best performance and GPU architecture coverage.
      
      However, it takes around 10 minutes to install because of JIT compilation.
      
      [How to use]
      The new release strategy is disabled by default.
      To build the Cubin PIP package, add this to cmake: -DCUBIN_RELEASE_PIP
      To build the JIT release package, add this to cmake: -DJIT_RELEASE_WHL
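      The choice between the two build modes could be scripted roughly as follows. This is a minimal
      sketch, not part of the PR: the flag names and target lists come from the commit message, while
      the `=ON` value, the helper name `cmake_args`, and the `..` source directory are assumptions
      made for illustration.

```python
# Flags named in the commit message. The "=ON" suffix is an assumption;
# the message gives only the flag names.
RELEASE_FLAGS = {
    "cubin_pip": "-DCUBIN_RELEASE_PIP=ON",
    "jit_whl": "-DJIT_RELEASE_WHL=ON",
}

# Targets listed in the commit message for each package type.
PACKAGE_TARGETS = {
    "cubin_pip": ["sm_60", "sm_70", "sm_75", "sm_80"],
    "jit_whl": ["compute_35", "compute_50", "compute_60",
                "compute_70", "compute_75", "compute_80"],
}


def cmake_args(release_type=None, source_dir=".."):
    """Hypothetical helper: build a cmake argument list.

    release_type=None adds no flag, matching the commit message's note
    that the new release strategy is disabled by default.
    """
    args = ["cmake", source_dir]
    if release_type is not None:
        # Raises KeyError for an unknown release type.
        args.append(RELEASE_FLAGS[release_type])
    return args


if __name__ == "__main__":
    for kind in (None, "cubin_pip", "jit_whl"):
        print(" ".join(cmake_args(kind)), PACKAGE_TARGETS.get(kind, []))
```

      Keeping the flags in a lookup table means an unknown package type fails loudly instead of
      silently producing a default build.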
    • update infer trt ut. (#35261) · 96e7d903
      Wilber committed
    • support fuse layers for ptq (#35015) · ef536250
      XGZhang committed
    • NPU add elementwise_mod (#35245) · 561841d2
      Aganlengzi committed
    • NPU add fill_zeros_like kernel (#35246) · aaaa9965
      Aganlengzi committed
  4. 30 Aug, 2021 3 commits
  5. 29 Aug, 2021 1 commit
  6. 27 Aug, 2021 5 commits