1. 24 Aug, 2021 10 commits
    • fix bmm bug (#35098) · de645153
      duanboqiang committed
      * fix bmm bug
      
      * bmm style
      
      * fix bmm
    • Update LearningRate for test fit a line BF16 (#34653) · 36f7e751
      Adam Osewski committed
      * Small corrections.
      
      * Fix lr for bf16.
      
      * Revert some changes.
    • [oneDNN] Concat refactoring and disabling caching (#35002) · d9c0f09b
      Jacek Czaja committed
      * - concat refactoring draft
      
      * - compilation fixes
      
      * - yet another compilation fix
      
      * - fix
      
      * - compilation fix
      
      * - fixes to compilation
      
      * - another compilation fix
      
      * - fix
      
      * - Added overloaded AcquirePrimitiveDesc for concat
      
      * - fix
      
      * - reserve introduced
      
      * - UT fixes
      
      * - test concat int8 improved
      
      * - fixes
      
      * - fix to crash
      
      * - lint fixes
      
      * - fixes after review
      
      * - some other fixes from review
    • W
      3b0d8a7b
    • cb28753c
    • add scope guard (#35103) · b0a1d122
      Zeng Jinle committed
    • [NPU] add conv_op_npu and test (#34055) · 00a269de
      ronnywang committed
      * add conv_op_npu and test
      
      * add more tests
      
      * clean headers & support fp16
      
      * update
    • [NPU] add pool2 op and tests (#34770) · da261732
      ronnywang committed
      * add pool2d_op_npu and test
      
      * update
      
      * update pool2d_backward_navie
      
      * clean headers
    • T
    • Add auto completion module for auto parallel (#34813) · 93d862b0
      Yulong Ao committed
      * add auto_parallel dir
      
      * mv to paddle.distributed
      
      * add shard_xx api
      
      * add distributed attrs for var
      
      * add ut, test=develop
      
      * add dist
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update, test=develop
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update, test=develop
      
      * update, test=develop
      
      * update
      
      * update
      
      * delete unused proto
      
      * restore op_desc
      
      * restore type_defs
      
      * update var_desc
      
      * remove dimss_mapping for proto_pybind
      
      * update interface.py
      
      * update framework.py
      
      * update
      
      * update
      
      * add auto_parallel dir
      
      * mv to paddle.distributed
      
      * add shard_xx api
      
      * add distributed attrs for var
      
      * add ut, test=develop
      
      * [WIP] Add the auto completion feature and related codes
      
      * [WIP] Improve the auto completion and related codes
      
      * [WIP] Make the auto completion to support data-parallel
      
      * [WIP] Make the completion support mp and dp+mp
      
      * [WIP] Refactor auto completion unit test for MLP
      
      * [WIP] Refactor the implementation of DistributedOperatorImpl
      
      * [WIP] Improve dims_mapping update rule and fix a bug
      
      * [WIP] Support auto completion for one transformer decoder layer
      
      * [WIP] Add a minor change
      
      * [WIP] Fix a bug within the unit test
      
      * Shard XShape tensor, add embedding completion and refactor code
      
      * Add the distributed_operators dir to setup.py.in
      
      * Improve the completion process and add the unittest for gpt
      
      * fix process_mesh ut
      
      * fix process_mesh ut
      
      * update
      
      * update, test=develop
      
      * Add support for automatically completing distributed attrs of special ops
      
      * update
      
      * update
      
      * update
      
      * fix doc sample codes, test=develop
      
      * improve coverage, test=develop
      
      * add static_mode check, test=develop
      
      * Model the cluster for cost model and physical mapping
      
      * update, test=develop
      
      * add set_placement, test=develop
      
      * Add the check to make sure the candidate tensors' size is greater than zero
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update doc, test=develop
      
      * update, test=develop
      
      * Auto mark dist attrs annotated by user
      
      * update ndarray to nested list, test=develop
      
      * update, test=develop
      
      * Add auto-completion module for auto-parallel (based on PR#33804)
      
      * Remove unnecessary files
      
      * Remove unrelated files for the auto completion pr
      
      * Update the unit test to improve the coverage
      
      * Modify codes based on reviews
      
      * Minor changes for CI
      
      * Improve some codes based on new comments
      
      * Fix bugs caused by shallow copy in attributes.py
      * Improve amend_distributed_attr_for_program in context.py
      * Other changes for weihang's comments
      Co-authored-by: sandyhouse <lilong12@baidu.com>
  2. 23 Aug, 2021 17 commits
  3. 22 Aug, 2021 1 commit
  4. 20 Aug, 2021 12 commits