提交 · 5981bee2af22e58b48fafb2d69b1c4e243bb1227 · PaddlePaddle / Paddle

01 6月, 2021 2 次提交
- Z
  
  Fix duplicate download when incremental compilation (#33230) · e939236e
  由 Zhou Wei 提交于 6月 01, 2021
  
  e939236e
- C
  replace and remove complex64/128 types in custom OP and other files (#33195) · 06c63ca0
  由 chentianyu03 提交于 6月 01, 2021
```
* replace and remove complex64/128 types in custom OP and other files

* fix custom_tensor_test fail bug

* fix custom_conj_test fail bug

* fix dispatch_test_op build fail bug
```
  06c63ca0
28 5月, 2021 1 次提交
- Z
  
  fix ninja compile bug of warpctc and mkldnn (#33155) · 5b910f95
  由 Zhou Wei 提交于 5月 28, 2021
  
  5b910f95
27 5月, 2021 2 次提交

[PsCore] support ssd (#33031) · 988b5fe1

由 Thunderbrook 提交于 5月 27, 2021

* support ssd in PsCore

* remove log

* remove bz2

* defalut value

* code style

* parse table class

* code style

* add define

988b5fe1

Z
Unify all external API error message mechanism and enhance third-party API error msg (#33003) · b425215a
由 Zhou Wei 提交于 5月 27, 2021
```
* Unify all external API error message mechanism and enhance third-party API error msg

* fix some comment

* fix some comment
```
b425215a

26 5月, 2021 1 次提交
- Z
  Fix ninja compilation bug and warning on windows (#32987) · accf284b
  由 Zhou Wei 提交于 5月 26, 2021
```
* fix ninja compilation bug on windows

* polish windows ci

* polish windows ci
```
  accf284b
24 5月, 2021 1 次提交

[oneDNN] bump up oneDNN to 2.2.2 (#32685) · b8e4ec7d

由 Jacek Czaja 提交于 5月 24, 2021

* - bump up oneDNN to 2.2.2 (should reduce perf drops of mobilenet)

* - more recnet onednn 2.2.2 (some more bugfixes)

b8e4ec7d

20 5月, 2021 1 次提交

fix gather op and add logsumexp op on kunlun (#32931) · a96e8bc9

由 TTerror 提交于 5月 20, 2021

* fix gather op and add logsumexp op on kunlun

* update xpu depence

* update tests and fix elementwise_add

a96e8bc9

19 5月, 2021 1 次提交

CI skip inference test if only python files modified (#32962) · 7896b51a

由 wuhuanzhou 提交于 5月 19, 2021

* CI skip inference test if only python files modified, test=develop

* fix compilation error on ROCM, test=develop

* fix cmake error on PR-CI-ROCM-Compile, test=develop

7896b51a

18 5月, 2021 1 次提交
- Q
  
  update kunlun bkcl to support multi-machine (#32577) · 59b74ee7
  由 QingshuChen 提交于 5月 18, 2021
  
  59b74ee7
12 5月, 2021 1 次提交
- Z
  Polish Windows CI and open the normal GPU unittest on CI (#32794) · eeca9639
  由 Zhou Wei 提交于 5月 12, 2021
```
* fix windows CI

* fix windows CI
```
  eeca9639
11 5月, 2021 1 次提交
- W
  
  fix cmake expressions error, test=develop (#32815) · 84eca16d
  由 wuhuanzhou 提交于 5月 11, 2021
  
  84eca16d
10 5月, 2021 1 次提交
- T
  [pslib] pslib with cmake (#32800) · fbbc3394
  由 Thunderbrook 提交于 5月 10, 2021
```
* pslib with cmake

* heter util

* vlog

* heter server test

* add dtor

* cmake
```
  fbbc3394
07 5月, 2021 1 次提交
- L
  Fix compile error on jetson platform (#32748) · 8ce6b393
  由 LielinJiang 提交于 5月 07, 2021
```
* fix compile error on jetson platform
```
  8ce6b393
06 5月, 2021 2 次提交

[Rocm] fix expand as (#32704) · 2fe45806

由 zhulei 提交于 5月 06, 2021

* [Rocm] fix test_expand_as_op

* [Rocm] fix test_expand_as_op

* [Rocm] fix test_expand_as_op

* [Rocm] fix test_expand_as_op

* [Rocm] fix test_expand_as_op

* [Rocm] fix test_expand_as_op

2fe45806

[ROCM] bugfix for unittest (#32392) · 31392627

由 ronnywang 提交于 5月 06, 2021

* fix test_unpool_op

* fix test_inplace_addto_strategy

* fix test_conv2d_fusion_op

* fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor

* fix test_dot_op

* fix test_correlation_op

* fix tracer

* fix test_memcpy_op

31392627

29 4月, 2021 1 次提交
- L
  Add op read_file and decode_jpeg (#32564) · b22f6d69
  由 LielinJiang 提交于 4月 29, 2021
```
* add op read_file and decode_jpeg
```
  b22f6d69
24 4月, 2021 1 次提交
- W
  
  refator paddle inference c api.test=develop (#32225) · 18d3e2ca
  由 winter-wang 提交于 4月 24, 2021
  
  18d3e2ca
23 4月, 2021 1 次提交

fix Windows CI MP compile and environment install script and openblas CI (#32378) · 7a681f0b

由 Zhou Wei 提交于 4月 23, 2021

* fix Windows CI MP compile and environment install script

* clear Windows CI environment

* clear Windows CI environment

* clear Windows CI environment

7a681f0b

22 4月, 2021 1 次提交
- T
  
  Delete WITH_GRPC flag and Distributed old code (#32383) · e58c705b
  由 tianshuo78520a 提交于 4月 22, 2021
  
  e58c705b
21 4月, 2021 1 次提交

【NPU】Merge NPU ccl code (#32381) · c3158527

由 zhang wenhui 提交于 4月 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: Nvoid-main <voidmain1313113@gmail.com>
Co-authored-by: Nf2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: Nlw921014 <liuwei921014@yeah.net>
Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
Co-authored-by: Nf2hkop <f2huestc@outlook.com>
Co-authored-by: Nxiayanming <41795079@qq.com>

c3158527

19 4月, 2021 1 次提交

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

由 Leo Chen 提交于 4月 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* 【NPU】fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and  add wait before sync operation (#31956)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: Npangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: Npangyoki <pangyoki@126.com>

cbe5c9f8

15 4月, 2021 2 次提交

heterps support pscore (#32093) · 9f8c8f96

由 Thunderbrook 提交于 4月 15, 2021

* pscore support heterps

* fleet cmake

* fleet wrapper

* macro

* solve conflict

* solve conflict

* add unitest

* paddle enforce

* unitest

* unitest

* unitest

9f8c8f96

【NPU】Cherry-pick ascendrc ops code by 0325 to develop (#32197) · e6bc358d

由 zhang wenhui 提交于 4月 15, 2021

* merge 31065

* Fix typo of selected_npus (#31230)

* merge 31249

* [NPU] Support npu op pow and pow grad (#31247)

* [NPU] Support npu op: (1) pow (2) pow_grad

* Support fp16

* Fix pow npu fp16 test (#31256)

* support list of list attribute for NPU (#31299)

* support list of list attribute for NPU

* fix compile problem

* fix reference

* [NPU] Support npu op: (1) slice (2) slice_grad (#31275)

* fix reading flags from env (#31329)

* merge 31347

* [NPU] Support npu op layer_norm and layer_norm_grad (#31310)

* init commit, add layer_norm npu kernel

* fix typo

* add unittest

* add unittest

* fix bug

* fix bug

* refine ut

* [NPU] add npu kernel for equal op (#31393)

* add npu kernel for equal op

* refine code

* add more ut

* update year

* [NPU] Support npu kernel for shape op  (#31427)

* add shape npu

* fix

* fix

* fix endif (#31431)

* Fix pow, use fillD instead of broadcast (#31433)

* Fix pow, refine code (#31440)

* fix cmake of cryptopp to avoid downloading every time (#31451)

* [NPU] squeeze and unsqueeze op for ascend (#31452)
Co-authored-by: Nroot <xiayanming@baidu.com>

* Support npu kernel for gather op (#31458)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* 【NPU】add scale op for npu (#31499)

* add scale npu

* fix

* fix

* Support TensorFormVector, TensorToVector of bool type (#31518)

* support TensorFormVector, TensorToVector of bool type

* add ut

* fix compile problem

* 【NPU】support npu kernel for fill_constant op (#31521)

* add fill_constant npu

* add fill_constant npu

* fix

* cherry-pick 31422, solve conflict

* 【NPU】Support npu kernel for matmul op (#31544)

* add matmulv2_npu

* add matmul

* add matmul

* [NPU] Support npu op elementwise_mul and elementwise_mul_grad (#31571)

* [NPU] Support npu op elementwise_max (#31574)

* 【NPU】add relu op for  npu (#31515)

* add relu npu

* fixed

* fix

* 【NPU】Suppert npu kernel for reshape2 op (#31524)

* add reshape2 npu

* add reshpe2

* [NPU] Support npu kernel for gather op fix bug (#31541)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* update gather_grad

* fix bug

* fix bug

* [NPU] Support npu kernel for amp_check_finite_and_unscale_npu op (#31457)

* Support npu kernel for amp_check_finite_and_unscale_npu op

* support EnforceNotMet exception

* fix exception bug

* modify python unittest

* precommit

* update c++ unittest

* fix review

* fix review

* [NPU] accuracy op (#31492)

* accuracy op

* fix license

* fix

* add test and fix bug

* [NPU] add Assign OP (#31561)

* add assign op

* add test assign npu test

* dele if def
Co-authored-by: Noyjxer <1728722986@qq.com>

* [NPU] fix npu op elementwise_mul_grad (#31592)

* 【NPU】Support npu op gelu and gelu_grad (#31530)

* Support npu op gelu and gelu_grad

* Support npu op gelu and gelu_grad

* [NPU] fix assgin cmake (#31595)

* fix gather_grad bug (#31607)

* [NPU] add range op (#31560)

* add range op

* fix codestyle; call GetSize directly
Co-authored-by: Noyjxer <1728722986@qq.com>

* 【NPU】Support npu op elementwise_div and elementwise_div_grad (#31573)

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* [NPU] Support npu op log, log_grad, sqrt, sqrt_grad, square, tanh and tanh_grad (#31600)

* [NPU] Support npu op logicalnot_op (#31534)

* [NPU] Support npu op elementwise_min (#31575)

* [NPU] Support npu op elementwise_pow (#31576)

* [NPU] Support npu op table_lookup_v2 and table_lookup_v2_grad (#31399)

* [npu] support npu kernel `table_lookup_v2`

* clean up

* +python test

* +cmake

* clean up

* remove int8 kernel
+ python unitest for fp16

* clean up

* [NPU] support npu kernel for `less_than` (#31327)

* [npu] support npu kernel for `less than`

* remove int* kernel

* cleanup

* [NPU] Support npu kernel scatter op (#31624)

* Support npu kernel scatter op

* Add more test

* [NPU] fix allocator min chunk size (#31632)

* [NPU] Support NPU kernel cast op (#31635)
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for sgd (#31639)

* 【NPU】Support NPU kernel for reduce_sum op v2 (#31620)

* add reduce_sum

* fix broadcastd

* fix test

* fix

* add unsqueeze in reduce_sum

* add template

* add unittest for keep_dim

* test reduce_all
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for adam (#31644)

* add npu kernel for adam

* refine code

* disable test

* modify atol

* 【NPU】Support npu kernel for mul op (#31584)

* add mul

* add test mul

* [NPU] add npu kernel for softmax_with_cross_entropy (#31656)

* init

* fix bugs

* [NPU] add npu kernel for mean Op (#31562)

* update mean op

* update mean op

* give a better test activation
Co-authored-by: Noyjxer <1728722986@qq.com>

* Revert "[NPU] add npu kernel for mean Op (#31562)" (#31665)

This reverts commit 468ac699.

* 【NPU】Add TensorCopy to NPU kernel for reduce_sum op  (#31667)

* update unittest

* add TensorCopy in npu grad kernel

* [NPU] Support npu op `expand` (#31405)

* [npu] support npu kernel  for `expand`

* [NPU] fix shape of dx in mul_grad (#31675)

* fix shape of dx

* refine code

* [NPU] add Increment op (#31563)

* add increment

* fix

* update test increment op inplace

* update increment op

* increment b = 2
Co-authored-by: Noyjxer <1728722986@qq.com>

* [NPU] add NPU add topk  (#31596)

* add topk op

* add cmake

* update topk npu op

* refactor func

* fix test not go npu TopKD bug

* NPUPlace(4) to NPUPlace(0)

* update comment
Co-authored-by: Noyjxer <1728722986@qq.com>

* [NPU] Support NPU kernel sum op (#31671)

* [NPU] npu support `transpose` (#31486)

* cherry-pick 31564, solve conflict

* [NPU] Fix bug: Fix calculation errors of pow grad npu kernel (#31699)

* [NPU] Support testing grad of NPU ops in OpTest (#31697)

* [NPU] Support NPU kernel of stack op (#31711)

* [NPU] Remove redundant ctest of top_k_op_npu_test (#31718)

* [NPU] fix reshape npu op kernel (#31726)

* rename npu op file

* fix reshape

* [NPU] change transpose to transpose2 (#31734)

* change transpose to transpose2

* fix bug

* [NPU] Support  mean npu kernel (#31729)

* [NPU] fix some bugs of npu op (#31739)

* fix softmax

* fix mean

* fix lookup_table_v2

* 【NPU】Fix npu kernel elementwise_div_grad  (#31753)

* [NPU] fix the grad kernel diff bug of gather op (#31757)

* fix gather grad kernel diff

* fix gather grad kernel diff

* fix gather review bug

* 【NPU】Fix reshape test & add grad test (#31776)

* fix

* fix

* [NPU] support fp16 for npu accuracy op (#31797)

* [NPU] support list of tensor input (#31801)

* support list of tensor as npu input

* add comment

* fix typo

* fix typo

* [NPU] add npu kernel for concat op (#31695)

* add npu kernel for concat op

* add npu kernel for concat op

* refine code

* update

* refine concat_grad

* [NPU] Support npu kernel for op elementwise_floordiv (#31822)

* [NPU] fix bug of lookup_table_v2_grad (#31834)

* [NPU] support default stream (#31510)

* [NPU] support mixed precision input for npu layer norm (#31847)

* support mixed precision input for npu layer norm

* fix layer_norm npu kernel
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

* 【NPU】Support npu kernel for update_loss_scaling op (#31830)

* add update_loss_scaling_npu NPU kernel

* change TensorFromVec to Memset

* fix compile problem (#31850)

* [NPU] support npu for conditional_block op (#31854)

* 【NPU】Add int dtype kernel for reshape2 op (#31864)

* fix

* fix

* [NPU] fix some op bugs (#31855)

* fix some op bugs

* fix some bugs

* follow comments

* fix log level

* add ut

* [NPU] support fp16 of input for api pow (#31871)

* [NPU] add npu kernel for truncated_gaussian_random op (#31654)

* init

* add todo

* add npu kernel for truncated_gaussian_random

* add sync

* fix concat_grad

* fix typo

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix code style

* fix code style

* fix code

* Fix op test (#32231)

* fix conditional block (#32243)

* fix style code
Co-authored-by: Nxiayanming <41795079@qq.com>
Co-authored-by: NLeo Chen <chenqiuliang@baidu.com>
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: NReventon_L <luyuxiang1994@qq.com>
Co-authored-by: Nroot <xiayanming@baidu.com>
Co-authored-by: Noyjxer <1728722986@qq.com>
Co-authored-by: Nyinhaofeng <66763551+yinhaofeng@users.noreply.github.com>
Co-authored-by: NOleNet <olenet@126.com>
Co-authored-by: NMeiyim <chen_xuyi@outlook.com>
Co-authored-by: Noyxuan-11 <963650125@qq.com>
Co-authored-by: Npangyoki <pangyoki@126.com>

e6bc358d

14 4月, 2021 1 次提交

Delete grpc.cmake/distribeted/distributed_ops (#32166) · 22ea4c30

由 tianshuo78520a 提交于 4月 14, 2021

* Delete grpc.cmake/distribeted/distributed_ops

* reset operators/CMakeLists.txt

* rm test_transpiler_ops.py

* del test_transpiler_ops.py

22ea4c30

13 4月, 2021 1 次提交
- L
  
  upgrade to oneDNN2.2.1 (fix when prim descriptor or attr contain NaN) (#32227) · b9e543f8
  由 lidanqing 提交于 4月 13, 2021
  
  b9e543f8
12 4月, 2021 1 次提交
- T
  fix concat_grad on kunlun (#32151) · a2387ef2
  由 TTerror 提交于 4月 12, 2021
```
* fix concat_grad on kunlun

* fix concat_grad on kunlun
```
  a2387ef2
10 4月, 2021 1 次提交
- T
  
  Ci py3 gcc5.4 (#32045) · afa3720c
  由 tianshuo78520a 提交于 4月 10, 2021
  
  afa3720c
09 4月, 2021 1 次提交

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

由 Leo Chen 提交于 4月 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocatot test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code stype

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: Nxiayanming <41795079@qq.com>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

07 4月, 2021 1 次提交

【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

由 zhang wenhui 提交于 4月 07, 2021

* Ascend rc (#30483)

* Fix compilcation on CANN20.1 and older (#30494)

Fix compilcation on CANN20.1 and older

* Add distribution supported (#30578)

Add distribution supported

* Build praser for Hcom* operators (#30627)

Build praser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distribution training supported (#30796)

Add paddle ascend distribution training supported

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop
Co-authored-by: Nhutuxian <hutuxian2011@sina.cn>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
Co-authored-by: NLeo Chen <chenqiuliang@baidu.com>
Co-authored-by: Ndingsiyu <18369187719@163.com>
Co-authored-by: NOleNet <olenet@126.com>

8c7c53b3

06 4月, 2021 1 次提交
- T
  
  Del cudnn6 code2 (#31986) · b8b82b72
  由 tianshuo78520a 提交于 4月 06, 2021
  
  b8b82b72
01 4月, 2021 2 次提交
- T
  LOG CLEAN (#31819) · 0589ed21
  由 tangwei12 提交于 4月 01, 2021
```
* upgrade vlog

* train from dataset fetch optimize
```
  0589ed21
- W
  
  fix compilation error on rocm, test=develop (#31991) · eb3199fc
  由 wuhuanzhou 提交于 4月 01, 2021
  
  eb3199fc
31 3月, 2021 5 次提交

T

delete cuda9 code (#31883) · ea738dda
由 tianshuo78520a 提交于 3月 31, 2021

ea738dda

Update eigen version to f612df27 (#31832) · 495e7f9c

由 wuhuanzhou 提交于 3月 31, 2021

* update eigen version to f612df27, test=develop

* fix compilation error, test=develop

* remove patch command in eigen, test=develop

* fix compilation error caused by call Eigen function with float16 and bfloat16, test=develop

* fix unittest error, test=develop

* fix unittest error caused by precision, test=develop

* remove patch files used by old version eigen, test=develop

495e7f9c

update compilation with C++14 (#31815) · 587d99ae

由 wuhuanzhou 提交于 3月 31, 2021

* update compilation with C++14, test=develop

* fix compilation error in eigen, test=develop

587d99ae

T

fix some bug in transformer training in xpu (#31918) · 52b05bac
由 taixiurong 提交于 3月 31, 2021

52b05bac

[ROCM] Add ROCm support for warpctc op (#31817) · ef8323d4

由 furnace 提交于 3月 31, 2021

* bugfix for warpctc

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix WARPCTC_WITH_HIP invalid

* Add logs to find out why can not dlopen libwarpctc.so

* fix warpctc commit id

* fix unit test test_warpctc_op

* Optime failed log for dlopen

* Optime failed log for dlopen

* Delete extra changes

* fix warpctc commit id

* fix warpctc commit id

* Add is_compiled_with_rocm for test_warpctc_op

* fix warpctc commit id

* Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed

* Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed

* Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed

* fix code style problems

ef8323d4

30 3月, 2021 1 次提交
- Y
  
  Enhance cmake to support specifying CUDA_ARCH_NAME to Ampere. (#31923) · e50bc2c2
  由 Yiqun Liu 提交于 3月 30, 2021
  
  e50bc2c2
23 3月, 2021 1 次提交
- Z
  Update windows compiler and CI from VS2015 to VS2017 (#31652) · a70de87d
  由 Zhou Wei 提交于 3月 23, 2021
```
* modify windows CI to VS2017

* modify windows CI to VS2017

* modify windows CI to VS2017
```
  a70de87d

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功