1. 08 Oct, 2021 5 commits
  2. 07 Oct, 2021 2 commits
  3. 05 Oct, 2021 1 commit
    • Added concat BF16/FP32 BWD OneDNN kernel (#35889) · dc4d5719
      jakpiase committed
      * tmp
      
      * added concat BF16/FP32 BWD oneDNN kernel
      
      * minor change
      
      * minor change
      
      * fix for CI
      
      * added formatting
      
      * Reverted deleting static keyword
      
      * added reviewers suggestions
      
      * reverted deleting concat bf16 test file
      
      * fixed concat tests
  4. 30 Sep, 2021 4 commits
  5. 29 Sep, 2021 11 commits
    • Add basic support for CUDA Graph (#36190) · 21b93c3d
      Zeng Jinle committed
      * add basic support for CUDA Graph
      
      * fix ci compile error
      
      * fix LOG print, fix windows CI
      
      * follow comments and update
      
      * small fix for default ctor
      
      * fix rocm compile error
      
      * fix CPU compile error
    • add optest for adamw (#36148) · 69eed34d
      zhaoyingli committed
      * update func name
      
      * skip cpu
      
      * update unittest
      
      * update unittest
    • fix cusparse compile problem, test=develop (#36199) · 3eb50715
      Liu-xiandong committed
      * fix cusparse compile problem, test=develop
      
      * Modify file permissions
    • Add functional autograd API:hessian (#36108) · 1f93582c
      levi131 committed
      * init functional jacobian api
      
      * finish test with dtype float32
      
      * add float64 test case
      
      * polish code
      
      * use atol=1e-5 with dtype float64
      
      * fix for ci
      
      * set timeout for test_jacobian
      
      * init hessian API
      
      * save status
      
      * polish API docstring
      
      * modify docstring
      
      * add utils.py
      
      * save status
      
      * fix dygraph double grad dtype error when calling for high differential scenario
      
      * reinvoke ci
      
      * test_hessian.py is ok
      
      * polish hessian API
      
      * init vhp
      
      * Revert "init vhp"
      
      This reverts commit cbd4d3b66abe82b0ac10721b9eddeb7d82e0a1c8.
      
      * add test for partial_engine.cc
      
      * modify numerical_delta with dtype float32
      
      * merge fix for dtype float64
      
      * spell fix
      
      * polish code
      
      * rm _stop_gradient_pre_process
      Co-authored-by: NJiabinYang <360788950@qq.com>
    • [npu] add box coder (#36171) · 83578cfa
      zhulei committed
      * [npu] add box coder
      
      * [npu] add box coder
    • fix bug of top_k npu op (#36175) · 2b8fd704
      pangyoki committed
    • [NPU] Add group norm (#35937) · c79de728
      zhulei committed
      * [NPU] Add group norm
      
      * [NPU] Add group norm
      
      * [NPU] Add group norm
      
      * [NPU] Add group norm
      
      * [NPU] Add group_norm op
    • [NPU] mod for model bert (#36165) · 7bddf2e8
      Aganlengzi committed
      * merge conflict of paddle_gtest_main.cc
      
      * modify FLAGS_npu_precision_mode and default not to call aclSetCompileopt
    • W
      bec9fc9a
    • Add op paddle.device.cuda.get_device_name and paddle.device.cuda.get_device_capability. (#35672) · f703558d
      hlygit66666 committed
      * add op paddle.device.cuda.get_device_name
      
      * fix some bugs
      
      * fix some bugs
      
      * fix error message bugs
      
      * fix en docs
      
      * fix bugs
      
      * fix bugs
      
      * fix bugs
      
      * add error message test case
      
      * add get_device_name and get_device_capability
      
      * fix review
      
      * fix docs bug
      
      * fix docs
      
      * fix docs
    • fix paddle.device.cuda.get_device_properties doc (#36178) · 6d4435ac
      Yanxing Shi committed
      * Initial Commit
      
      * add unittest and add error information
      
      * modify doc
      
      * fix some error
      
      * fix some word
      
      * fix bug cudaDeviceProp* and modify error explanation
      
      * fix cudaDeviceProp* error and unittest samples
      
      * fix hip error and PADDLE_WITH_HIP
      
      * update style
      
      * fix error is_compiled_with_cuda
      
      * fix paddle.device.cuda.get_device_properties
      
      * fix error for multi thread safe
      
      * update style
      
      * merge conflict
      
      * modify after mentor review
      
      * update style
      
      * delete word
      
      * fix unittest error for windows
      
      * support string input and modify some code
      
      * modify doc to support string input
      
      * fix error for express information
      
      * fix error for express information
      
      * fix unittest for windows
      
      * fix device.startswith('gpu:')
      
      * format error and doc
      
      * fix after review
      
      * format code
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix py2 error
      
      * fix wrong words and doc
      
      * fix _gpuDeviceProperties
      
      * test=document_fix
  6. 28 Sep, 2021 10 commits
    • add roi_align (#35102) · f068e08d
      Feng Ni committed
      * add roi_align in vision/ops.py
    • Add sparse_attention api, test=develop (#35676) · 6b587e93
      Liu-xiandong committed
      Add sparse_attention OPs, python api will be added in next pr
    • add API paddle.linalg.eig (#35674) · bc7e2b92
      Lijunhui committed
      * Add paddle.linalg.eig op
      
      * remove comments
      
      * remove comments
      
      * extend batch_size to the origin
      
      * add real times complex functor & destroy the backward complex output bug
      
      * terminate output diff when input real tensors
      
      * correct tiny doc errors
      
      * move functions from eig_helper to svd_helper and remove eig_helper
      
      * remove tensor.Resize
      
      * remove no longer used code
      
      * use existing lapack functions
      
      * reply review comments 21/27
      
      * remove .cu as this op is only executed on CPU
      
      * remove const_cast & add const in argument list for read-only references
      
      * fix sample code error in CI
      
      * remove template typename Tbase and more
      
      * remove eig exposure in paddle.*
      
      * add 'name=None' in eig python implementation
      
      * handle the unittest
      
      * try to solve the unittest
      
      * solve CI coverage
      
      * remove no longer used code
      
      * polish API doc and more
      
      * reply review comments
      
      * polish unittest, commit plan B
      
      * polish unittest
    • [hybrid] seed and dropout op support force-cpu (#35820) · 58c8f6b3
      xiayanming committed
      * [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid
      
      * [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid
      
      * [HIP] fix op not support AMD GPU bug
      
      * [hybrid] seed and dropout op support force-cpu
      
      * [hybrid] seed and dropout op support force-cpu
      
      * [hybrid] seed and dropout op support force-cpu
      
      * [hybrid] seed and dropout op support force-cpu
      
      * [hybrid] seed and dropout op support force-cpu
      
      * [hybrid] fix seed ci failed issue
      
      * add AsExtra for force_cpu of seed op
    • remove new linalg api in paddle.__init__ (#36151) · 3bb4715e
      zhiboniu committed
      remove recent linalg api in paddle.init;
      add args 'name' in some new linalg api interface
      same change in develop branch to #36112
    • 【Bug fix】Fix dygraph double grad dtype error (#36125) · af4f018a
      Jiabin Yang committed
      * fix dygraph double grad dtype error when calling for high differential scenario
      
      * reinvoke ci
      
      * add test for partial_engine.cc
    • py2 to py3 bug and iface fix for pslib (#36102) · 0e07f20e
      kuizhiqing committed
    • [hybrid] optimizer sharding support optimize cast (#35878) · eef0a943
      WangXi committed
    • Add paddle.device.cuda.get_device_properties (#35661) · 4cbed9e5
      Yanxing Shi committed
      * Initial Commit
      
      * add unittest and add error information
      
      * modify doc
      
      * fix some error
      
      * fix some word
      
      * fix bug cudaDeviceProp* and modify error explanation
      
      * fix cudaDeviceProp* error and unittest samples
      
      * fix hip error and PADDLE_WITH_HIP
      
      * update style
      
      * fix error is_compiled_with_cuda
      
      * fix paddle.device.cuda.get_device_properties
      
      * fix error for multi thread safe
      
      * update style
      
      * merge conflict
      
      * modify after mentor review
      
      * update style
      
      * delete word
      
      * fix unittest error for windows
      
      * support string input and modify some code
      
      * modify doc to support string input
      
      * fix error for express information
      
      * fix error for express information
      
      * fix unittest for windows
      
      * fix device.startswith('gpu:')
      
      * format error and doc
      
      * fix after review
      
      * format code
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix error for doc compile
      
      * fix py2 error
      
      * fix wrong words and doc
      
      * fix _gpuDeviceProperties
    • dlpack fix (#35817) · 74ff59cf
      Siming Dai committed
  7. 27 Sep, 2021 5 commits
    • fix zero tensor for unique, unstack (#36021) · efd35384
      Jiawei Wang committed
      * fix extra op for expand, expand_as, tile, unstack
      
      * fix unique unstack dim 0
      
      * Update expand_v2_op.cc
      
      * fix unique_op format
    • Added flatten and flatten2 BF16/FP32 FWD/BWD kernels (#35892) · e427a0f1
      jakpiase committed
      * refactored reshape multiop kernel and added flatten1/2 kernels
      
      * added formatting for flatten tests
      
      * CI fix
      
      * disabled reshape_kernel ops after successful CI run
      
      * minor fix
    • Add functional autograd API: jacobian (#35917) · ec2f68e8
      levi131 committed
      * init functional jacobian api
      
      * finish test with dtype float32
      
      * add float64 test case
      
      * polish code
      
      * use atol=1e-5 with dtype float64
      
      * fix for ci
      
      * set timeout for test_jacobian
      
      * polish API docstring
      
      * modify docstring
    • Add roi pool (#35084) · 6d62769a
      Wenyu committed
      * add roi pool
      
      * rename input as x
    • support saving model defined parameters without add scale_op (#36119) · 8db6d221
      Haipeng Wang committed
      * add scale_op in model save step is not necessary, just fix the prune method to support static graph and inplace op
      
      * fix jit.save, no need to add scale_op to each outputvar anymore.
      fix prune_with_input, now it supports inplace op
      
      * temporarily disable test_trt_dynamic_shape.TRTDynamicShapeOutOfBound2Test
      
      * allow user to export parameters defined in model
  8. 26 Sep, 2021 2 commits