1. 17 Nov, 2021 2 commits
  2. 16 Nov, 2021 4 commits
    • A
      Added BF16 Pool2d grad (#37081) · f95d44a2
      arlesniak committed
      * Added BF16 Pool2d grad
      
      * upstream pulled
      
      * fix for CI
      
      * fixes after review
      f95d44a2
    • Z
      Make Distributed Pass UT Timeout Smaller (#37199) · a01e27cc
      Zeng Jinle committed
      * make pass ut timeout smaller
      
      * increase ut timeout
      a01e27cc
    • J
      added onednn elu kernel (#37149) · ae40ee32
      jakpiase committed
      ae40ee32
    • L
      Fix attn_bias_add bug. (#37147) · a9e7a854
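      Li Min committed
      The fused_attention_op implementation uses bias_add, which is itself implemented with kernel primitives. The kernel primitive WriteData API and its internal implementation were later changed: the out-of-bounds check was moved into a template parameter, so the call took the wrong branch and performed out-of-bounds writes, corrupting the contents of other GPU memory. Symptom: a single run of test_fused_attention_op_api.py generally does not fail, but running it repeatedly in a loop with inputs of different shapes produces incorrect results intermittently, which makes the bug hard to notice.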
      a9e7a854
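The failure mode described in this commit message can be sketched in a few lines. This is a minimal Python illustration, not Paddle's kernel primitive code: `write_data`, `run`, and the `is_boundary` flag are hypothetical names, with the boolean flag standing in for the C++ template parameter that now carries the bounds check. Two trailing sentinel slots stand in for a neighbouring allocation.

```python
def write_data(dst, src, offset, block_size, valid, is_boundary):
    """Copy one block of `src` into `dst`, starting at `offset`.

    `is_boundary` is a stand-in for the template parameter: the number of
    valid elements is only checked when it is True.
    """
    for i in range(block_size):
        if is_boundary and i >= valid:
            break  # stop at the last in-range element
        dst[offset + i] = src[i]


def run(is_boundary_for_tail):
    n, block_size = 6, 4       # 6 real elements, written in blocks of 4
    padding = 2                # sentinel slots: "someone else's memory"
    buf = [0] * n + [-1] * padding
    block = [7] * block_size
    # The first block is full, so skipping the check is safe here.
    write_data(buf, block, 0, block_size, block_size, is_boundary=False)
    # The tail block has only n - block_size valid elements.
    write_data(buf, block, block_size, block_size, n - block_size,
               is_boundary=is_boundary_for_tail)
    return buf
```

Calling `run(True)` leaves the sentinel slots untouched, while `run(False)` overwrites them. Whether the corruption is visible depends on what happens to sit next to the buffer, which is consistent with the intermittent failures the message reports.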
  3. 15 Nov, 2021 8 commits
  4. 12 Nov, 2021 6 commits
  5. 11 Nov, 2021 8 commits
    • W
      [Bug fixes] Add default arg to enhance varbase ClearGradient func (#36837) · 63f5c2d4
      Weilong Wu committed
      * Add default arg to enhance varbase ClearGradient func
      
      * Removed default arg, use a Flag to enhance varbase ClearGradient func
      
      * Renamed Flags to FLAGS_real_release
      
      * Use default arg to enhance varbase ClearGradient func and expose two func to set/get gradient isEmpty
      
      * Removed DECLARE_bool statement
      
      * Polished Code
      63f5c2d4
    • T
      add where/where_index/masked_select for kunlun (#37053) · f5e7b02a
      TTerror committed
      * add where/where_index/masked_select for kunlun
      
      * fix where/where_index
      
      * update where/masked_select
      f5e7b02a
    • J
      Added softplus + activation oneDNN fuse pass (#36657) · a346c4dc
      jakpiase committed
      * added softplus + activation fuse pass
      
      * minor change
      
      * implemented reviewer suggestion
      
      * minor fix
      
      * minor fix
      
      * added scale_out parameter
      
      * minor fix
      
      * fix for iScan CI
      
      * conditionally disabled logs
      
      * refactored pass builder
      a346c4dc
    • X
      fleet support elastic scale up/down (#36684) · 6af531b7
      xiayanming committed
      * fleet support elastic train
      
      * fleet support elastic train
      
      * support elastic
      
      * add unittest
      
      * fix unittest bug
      
      * fix unittest bug
      
      * fix unittest bug
      
      * fix unittest coverage
      
      * fix unittest coverage
      
      * fix unittest coverage
      
      * fix unittest coverage
      
      * fix unittest coverage
      
      * fix elastic bug
      
      * fix ci fail
      
      * fix ci fail
      
      * fix elastic bug
      
      * fix elastic bug
      
      * fix joint debugging bug
      
      * fix joint debugging bug
      
      * fix windows ci failed
      
      * fix windows ci failed
      6af531b7
    • Z
      [Heterps]Refactor Heter Pipeline Parameter Server (#36845) · a2da1efa
      zmx committed
      * change username
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * update
      
      * update
      
      * update unittests
      
      * fix
      
      * update
      
      * fix
      
      * update
      
      * fix
      
      * fix
      
      * fix
      
      * update
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update send_and_recv op. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix ut. test=develop
      
      * fix unit. notest,test=coverage
      
      * fix ut. notest, test=coverage
      
      * update. notest,test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix. notest, test=coverage
      
      * fix. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * add func. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix. test=develop
      
      * fix. test=develop
      a2da1efa
    • W
      [New features] Support VarBase to expose func (#36965) · 52645667
      Weilong Wu committed
      * Expose func for varbase
      
      * Expose func for varbase and enhance varbase init func
      
      * Change func name and add test case for _CopyGradientWith
      
      * Rename func
      
      * Add test cases to increase coverage
      
      * Refine the logic of _to func
      
      * Replace numel() with _numel(), Add test code
      52645667
    • L
      Get global cluster information (#37084) · 31673a92
      LiYuRio committed
      31673a92
    • W
      update ut (#37089) · 6c183a8e
      Wilber committed
      6c183a8e
  6. 10 Nov, 2021 4 commits
  7. 09 Nov, 2021 3 commits
  8. 08 Nov, 2021 5 commits
    • W
      Use cuda virtual memory management and merge blocks (#36189) · a1ec1d5a
      wanghuancoder committed
      * Use cuda virtual memory management and merge blocks, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * window dll, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * use autogrowthv2 for system allocator, test=develop
      
      * remove ~CUDAVirtualMemAllocator(), test=develop
      
      * refine, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop
      
      * fix bug, test=develop
      
      * revert system allocator, test=develop
      
      * revert multiprocessing, test=develop
      
      * fix AutoGrowthBestFitAllocatorV2 mutex, test=develop
      
      * catch cudaErrorInitializationError when create allocator, test=develop
      
      * fix cuMemSetAccess use, test=develop
      
      * refine cuda api use, test=develop
      
      * refine, test=develop
      
      * for test, test=develop
      
      * for test, test=develop
      
      * switch to v2, test=develop
      
      * refine virtual allocator, test=develop
      
      * Record cuMemCreate and cuMemRelease, test=develop
      
      * refine, test=develop
      
      * avoid out of bounds, test=develop
      
      * rename allocator, test=develop
      
      * refine, test=develop
      
      * use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop
      
      * for test,test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      a1ec1d5a
    • L
      [fix-bug] Support attn_mask=None input cases for fused_attention_op. (#36951) · 472dcca4
      Li Min committed
      fused_attention_op currently does not support attn_mask=None as input. This PR adds that support, along with the corresponding unit-test logic.
      472dcca4
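To illustrate what accepting attn_mask=None means for an attention computation, here is a minimal pure-Python sketch of scaled dot-product attention weights with an optional additive mask. It is an assumption-laden illustration, not the fused_attention_op kernel: `attention_weights` and its list-of-lists inputs are hypothetical, and only the "skip the mask when it is None" behaviour is taken from the commit message.

```python
import math


def attention_weights(q, k, attn_mask=None):
    """Return softmax(q @ k^T / sqrt(d) + attn_mask), row by row.

    q, k: lists of equal-length float vectors.
    attn_mask: optional additive mask with one row per query; when it is
    None, the masking step is skipped entirely.
    """
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        row = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if attn_mask is not None:
            row = [s + m for s, m in zip(row, attn_mask[i])]
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With this shape of API, passing no mask and passing an all-zero mask give the same weights, which is the kind of equivalence a unit test for the None case could check.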
    • W
      add pass and mkldnn base ut. (#36967) · b7e88308
      Wilber committed
      b7e88308
    • Z
      aef8bf2a
    • X
      Add Support for OperatorBase in new executor (#36945) · 251f68e7
      xiongkun committed
      * add scope as membership
      
      * functions complete
      
      * fix bugs: garbage collector
      
      * deal with unknown variable holder
      
      * add
      
      * 1. add unittest for operator_base
      
      * code format
      251f68e7