Commits · 2fa3ce2b4a9b7f2bbd53ae58d0d0b9b27eb00ca2 · 机器未来 / Paddle

Aug 26, 2021 · 9 commits
- Revert "Add copy from tensor (#34406)" · 2fa3ce2b
  Committed by zhangchunle on Aug 26, 2021
```
This reverts commit ac33c0ca.
```
- [NPU] Support npu kernel for StridedSlice op without grad (#34601) · fa6c59a4
  Committed by Bo Liu on Aug 26, 2021
- Add copy from tensor (#34406) · ac33c0ca
  Committed by Shang Zhizhou on Aug 26, 2021
```
* add api

* temp save

* revert

* copytocpu async ok

* fix style

* copy sync ok

* fix compile error

* fix compile error

* api done

* update python async api

* fix compile

* remove async python api; add c++ async unittest

* remove python async api

* update unittest

* update unittest

* add C++ unittest for copytensor

* add unittest

* update namespace utils to class TensorUtils

* add unittest

* update unittest

* update unittest

* update code style

* update code style

* update unittest
```
- Add roi align op npu (#34973) · 289e1818
  Committed by shiyutang on Aug 26, 2021
```
* add_roi_align_npu

* update

* update

* update
```
- [Inference] Replace unordered_map with map to support subgraph stability (#35147) · a1aae040
  Committed by Wilber on Aug 26, 2021
- add temporary MultiThreadedWorkQueue (#35158) · e4a8815d
  Committed by liutiexing on Aug 26, 2021
- fix cast op (#35156) · 412877e6
  Committed by duanboqiang on Aug 26, 2021
- fix the bug of channel-wise quantization for ernie (#34948) · c71025eb
  Committed by XGZhang on Aug 26, 2021
- use spinlock in auto growth (#35139) · 0efda9d9
  Committed by wanghuancoder on Aug 26, 2021
```
* use spinlock in auto growth, test=develop

* refine,test=develop
```
Aug 25, 2021 · 11 commits
- disable test_resnet50_quant (#35149) · b3ef9a68
  Committed by Peihan on Aug 25, 2021
- fix cmaklist for new executor (#35137) · 03cb3132
  Committed by wanghuancoder on Aug 25, 2021
```
* fix cmaklist for new executor, test=develop

* refine, test=develop

* refine, test=develop
```
- Modify ci time count & fix resnet50_quant multi_thread tests (#35141) · 9e54209d
  Committed by Peihan on Aug 25, 2021
```
* Modify ci time count & fix resnet50_quant multi_thread tests

* fix wrong time variable
```
- Fix for expand_v2 op (#35101) · 1f34f7ec
  Committed by jakpiase on Aug 25, 2021
```
* temporary change

* fix for expand_v2

* changes after review, activated ppyolov inference test
```
- fix cpu adamw problem for np.float64 (#35124) · 700205e8
  Committed by zhaoyingli on Aug 25, 2021
- [NPU] Fix the performance problem when 'axis' is not specified (#35116) · 91ba86b1
  Committed by ronnywang on Aug 25, 2021
- fix potential tensor leak in tensor.__setitem__ (#35013) · 763b6d91
  Committed by Leo Chen on Aug 25, 2021
```
* fix index tensor leak in __setitem__

* fix another usage of PyTuple_Pack

* refine code

* refine code

* handle None index

* add Py_DecRef

* revert ut

* refine code

* merge develop

* use RAII

* follow comments
```
- [hybrid performance] optim npu coalesce set constant (#35105) · 4bfd0445
  Committed by Yuang Liu on Aug 25, 2021
- [NPU] add npu_one_hot_v2 (#34937) · d710c3a0
  Committed by ronnywang on Aug 25, 2021
- high-performance SingleThreadedWorkQueue (#35086) · 751a7942
  Committed by liutiexing on Aug 25, 2021
- update elementwise api in kunlun (#35021) · ff96a7d5
  Committed by taixiurong on Aug 25, 2021
Aug 24, 2021 · 13 commits
- Add flags to control whether to check Nan value of hccl_allreduce_sum. (#35093) · 5b737834
  Committed by gongweibao on Aug 24, 2021

- add fetch, test=develop (#35019) · a5060b55
  Committed by wanghuancoder on Aug 24, 2021

* add fetch, test=develop

* fix fetch2op, test=develop

* fix fetch2op, test=develop

* refine, test=develop

* fix fetch ctx, test=develop

* add wait, test=develop

* rename fetch2 to fetch_v2, test=develop

* merge, test=develop


- Add no_sync in data parallel for dynamic graph (#34740) · b09f4d7f
  Committed by Haohongxiang on Aug 24, 2021

* Add no_sync in data parallel for dynamic graph

* modify UT of no_sync

* delete test_parallel_dygraph_dataparallel_no_sync.py

* add test_parallel_dygraph_no_sync.py

* modify run_trainer_with_spawn in UTs

* Add UT of complex control flow in no_sync

* add specific descriptions and notes for no_sync

* check code style

* modify UT's TIMEOUT in CMakeLists.txt


- [NPU] fix NPU ci scripts, test=develop (#35095) · a332352a
  Committed by Qi Li on Aug 24, 2021
- fix bmm bug (#35098) · de645153
  Committed by duanboqiang on Aug 24, 2021
```
* fix bmm bug

* bmm style

* fix bmm
```

- [oneDNN] Concat refactoring and disabling caching (#35002) · d9c0f09b
  Committed by Jacek Czaja on Aug 24, 2021

* - concat refactoring draft

* - cmpilation fixes

* - yet another compilation fix

* - fix

* - compilation fix

* - fixes to compilation

* - another compilation fix

* - fix

* - Added overloaded AcquirePrimitiveDesc for concat

* - fix

* - reserve introduced

* - UT fixes

* - test concat int8 improved

* - fixes

* - fix to crash

* - lint fixes

* - fixes after review

* - some other fixes from review


- cache runtime ctx for executor, test=develop (#35108) · 3b0d8a7b
  Committed by wanghuancoder on Aug 24, 2021
- add the extra and quantization for op def, test=develop (#35076) · cb28753c
  Committed by 王明冬 on Aug 24, 2021
- add scope guard (#35103) · b0a1d122
  Committed by Zeng Jinle on Aug 24, 2021
- [NPU] add conv_op_npu and test (#34055) · 00a269de
  Committed by ronnywang on Aug 24, 2021
```
* add conv_op_npu and test

* add more tests

* clean headers & support fp16

* update
```
- [NPU] add pool2 op and tests (#34770) · da261732
  Committed by ronnywang on Aug 24, 2021
```
* add pool2d_op_npu and test

* update

* update pool2d_backward_navie

* clean headers
```
- Fix a bug of transpose op, about accessing memory out of bounds of the perm param (#35079) · 10563791
  Committed by TeslaZhao on Aug 24, 2021

- Add auto completion module for auto parallel (#34813) · 93d862b0
  Committed by Yulong Ao on Aug 24, 2021

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* add dist

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update

* update

* delete unused proto

* resotre op_desc

* restore type_defs

* update var_desc

* remove dimss_mapping for proto_pybind

* update interface.py

* update framework.py

* update

* update

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* [WIP] Add the auto completion feature and related codes

* [WIP] Improve the auto completion and related codes

* [WIP] Make the auto completion to support data-parallel

* [WIP] Make the completion support mp and dp+mp

* [WIP] Refactor auto completion unit test for MLP

* [WIP] Refactor the implementation of DistributedOperatorImpl

* [WIP] Improve dims_mapping update rule and fix a bug

* [WIP] Support auto completion for one transformer decoder layer

* [WIP] Add a minor change

* [WIP] Fix a bug within the uint test

* Shard XShape tensor, add embedding completion and refactor code

* Add the distributed_operators dir to setup.py.in

* Improve the completion process and add the unittest for gpt

* fix process_mesh ut

* fix process_mesh ut

* update

* update, test=develop

* Add support for automatically completing distributed attrs of special ops

* update

* update

* update

* fix doc sample codes, test=develop

* improve coverage, test=develop

* add static_mode check, test=develop

* Model the cluster for cost model and physical mapping

* update, test=develop

* add set_placement, test=develop

* Add the check to make sure the candidate tensors' size is great than zero

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update, test=develop

* Auto mark dist attrs annotated by user

* update ndarray to nested list, test=develop

* update, test=develop

* Add auto-completion module for auto-parallel (based on PR#33804)

* Remove unnecessary files

* Remove unrelated files for the auto completion pr

* Update the unit test to improve the coverage

* Modify codes based on reviews

* Minor changes for CI

* Improve some codes based on new comments

* Fix bugs caused by shallow copy in attributes.py
* Imporve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments
Co-authored-by: Nsandyhouse <lilong12@baidu.com>


Aug 23, 2021 · 7 commits
- [CPU] Enable barrier op upon gloo (#34671) · e8f146a9
  Committed by Bo Liu on Aug 23, 2021
- trt convert ut add dynamic_shape and int8, etc. (#35061) · 17188e8d
  Committed by Wilber on Aug 23, 2021
- remove old data check (#35077) · 5b814fd5
  Committed by wenbin on Aug 23, 2021
- [oneDNN] disable caching for interpolate and batch Norm (#35030) · 673bf719
  Committed by Jacek Czaja on Aug 23, 2021
```
* - disabled interpolate onednn

* - compilation fix

* - draft of batch_norm cache disabling

* - fixes to UT
```
- support infer_ut on windows nightly build (#35049) · 4f86aae0
  Committed by Peihan on Aug 23, 2021
```
* enable infer_ut on windows

* remove lib calculation & time

* unset http_proxy when download bos file on windows
```
- Refactor the organization of layer_norm cuda impl. (#34883) · 7f5eb533
  Committed by Li Min on Aug 23, 2021
```
Refactor the organization of layer_norm cuda impl so that it can be reused in fused attention op.

    Extract the layer_norm cuda impl form layer_norm_op.cu to layer_norm_kernel.cu.h.
    Define fused/attention_layer_norm.h, which can be used in fused attention op in next PR.
```
- Support gettiem by Bool index (#35026) · b6dc16cb
  Committed by zyfncg on Aug 23, 2021
```
* Support getitem by Bool index

* delete some debug info of bool index

* support the case that the shape of bool index is different from indexed tensor
```

机器未来 / Paddle is up to date with its upstream (fork source) project.
