Commits · 5b910f9527cba7af896b349c15e4597929391d0c · PaddlePaddle / Paddle

28 May, 2021 1 commit
- Z
  
  fix ninja compile bug of warpctc and mkldnn (#33155) · 5b910f95
  Committed by Zhou Wei on May 28, 2021
  
  5b910f95
26 May, 2021 1 commit
- Z
  Fix ninja compilation bug and warning on windows (#32987) · accf284b
  Committed by Zhou Wei on May 26, 2021
```
* fix ninja compilation bug on windows

* polish windows ci

* polish windows ci
```
  accf284b
11 May, 2021 1 commit
- W
  
  fix cmake expressions error, test=develop (#32815) · 84eca16d
  Committed by wuhuanzhou on May 11, 2021
  
  84eca16d
15 April, 2021 1 commit

【NPU】Cherry-pick ascendrc ops code by 0325 to develop (#32197) · e6bc358d

Committed by zhang wenhui on April 15, 2021

* merge 31065

* Fix typo of selected_npus (#31230)

* merge 31249

* [NPU] Support npu op pow and pow grad (#31247)

* [NPU] Support npu op: (1) pow (2) pow_grad

* Support fp16

* Fix pow npu fp16 test (#31256)

* support list of list attribute for NPU (#31299)

* support list of list attribute for NPU

* fix compile problem

* fix reference

* [NPU] Support npu op: (1) slice (2) slice_grad (#31275)

* fix reading flags from env (#31329)

* merge 31347

* [NPU] Support npu op layer_norm and layer_norm_grad (#31310)

* init commit, add layer_norm npu kernel

* fix typo

* add unittest

* add unittest

* fix bug

* fix bug

* refine ut

* [NPU] add npu kernel for equal op (#31393)

* add npu kernel for equal op

* refine code

* add more ut

* update year

* [NPU] Support npu kernel for shape op  (#31427)

* add shape npu

* fix

* fix

* fix endif (#31431)

* Fix pow, use fillD instead of broadcast (#31433)

* Fix pow, refine code (#31440)

* fix cmake of cryptopp to avoid downloading every time (#31451)

* [NPU] squeeze and unsqueeze op for ascend (#31452)
Co-authored-by: root <xiayanming@baidu.com>

* Support npu kernel for gather op (#31458)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* 【NPU】add scale op for npu (#31499)

* add scale npu

* fix

* fix

* Support TensorFormVector, TensorToVector of bool type (#31518)

* support TensorFormVector, TensorToVector of bool type

* add ut

* fix compile problem

* 【NPU】support npu kernel for fill_constant op (#31521)

* add fill_constant npu

* add fill_constant npu

* fix

* cherry-pick 31422, solve conflict

* 【NPU】Support npu kernel for matmul op (#31544)

* add matmulv2_npu

* add matmul

* add matmul

* [NPU] Support npu op elementwise_mul and elementwise_mul_grad (#31571)

* [NPU] Support npu op elementwise_max (#31574)

* 【NPU】add relu op for  npu (#31515)

* add relu npu

* fixed

* fix

* 【NPU】Support npu kernel for reshape2 op (#31524)

* add reshape2 npu

* add reshape2

* [NPU] Support npu kernel for gather op fix bug (#31541)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* update gather_grad

* fix bug

* fix bug

* [NPU] Support npu kernel for amp_check_finite_and_unscale_npu op (#31457)

* Support npu kernel for amp_check_finite_and_unscale_npu op

* support EnforceNotMet exception

* fix exception bug

* modify python unittest

* precommit

* update c++ unittest

* fix review

* fix review

* [NPU] accuracy op (#31492)

* accuracy op

* fix license

* fix

* add test and fix bug

* [NPU] add Assign OP (#31561)

* add assign op

* add test assign npu test

* dele if def
Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] fix npu op elementwise_mul_grad (#31592)

* 【NPU】Support npu op gelu and gelu_grad (#31530)

* Support npu op gelu and gelu_grad

* Support npu op gelu and gelu_grad

* [NPU] fix assign cmake (#31595)

* fix gather_grad bug (#31607)

* [NPU] add range op (#31560)

* add range op

* fix codestyle; call GetSize directly
Co-authored-by: oyjxer <1728722986@qq.com>

* 【NPU】Support npu op elementwise_div and elementwise_div_grad (#31573)

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* [NPU] Support npu op log, log_grad, sqrt, sqrt_grad, square, tanh and tanh_grad (#31600)

* [NPU] Support npu op logicalnot_op (#31534)

* [NPU] Support npu op elementwise_min (#31575)

* [NPU] Support npu op elementwise_pow (#31576)

* [NPU] Support npu op table_lookup_v2 and table_lookup_v2_grad (#31399)

* [npu] support npu kernel `table_lookup_v2`

* clean up

* +python test

* +cmake

* clean up

* remove int8 kernel
+ python unitest for fp16

* clean up

* [NPU] support npu kernel for `less_than` (#31327)

* [npu] support npu kernel for `less than`

* remove int* kernel

* cleanup

* [NPU] Support npu kernel scatter op (#31624)

* Support npu kernel scatter op

* Add more test

* [NPU] fix allocator min chunk size (#31632)

* [NPU] Support NPU kernel cast op (#31635)
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for sgd (#31639)

* 【NPU】Support NPU kernel for reduce_sum op v2 (#31620)

* add reduce_sum

* fix broadcastd

* fix test

* fix

* add unsqueeze in reduce_sum

* add template

* add unittest for keep_dim

* test reduce_all
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* [NPU] add npu kernel for adam (#31644)

* add npu kernel for adam

* refine code

* disable test

* modify atol

* 【NPU】Support npu kernel for mul op (#31584)

* add mul

* add test mul

* [NPU] add npu kernel for softmax_with_cross_entropy (#31656)

* init

* fix bugs

* [NPU] add npu kernel for mean Op (#31562)

* update mean op

* update mean op

* give a better test activation
Co-authored-by: oyjxer <1728722986@qq.com>

* Revert "[NPU] add npu kernel for mean Op (#31562)" (#31665)

This reverts commit 468ac699.

* 【NPU】Add TensorCopy to NPU kernel for reduce_sum op  (#31667)

* update unittest

* add TensorCopy in npu grad kernel

* [NPU] Support npu op `expand` (#31405)

* [npu] support npu kernel  for `expand`

* [NPU] fix shape of dx in mul_grad (#31675)

* fix shape of dx

* refine code

* [NPU] add Increment op (#31563)

* add increment

* fix

* update test increment op inplace

* update increment op

* increment b = 2
Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] add NPU add topk  (#31596)

* add topk op

* add cmake

* update topk npu op

* refactor func

* fix test not go npu TopKD bug

* NPUPlace(4) to NPUPlace(0)

* update comment
Co-authored-by: oyjxer <1728722986@qq.com>

* [NPU] Support NPU kernel sum op (#31671)

* [NPU] npu support `transpose` (#31486)

* cherry-pick 31564, solve conflict

* [NPU] Fix bug: Fix calculation errors of pow grad npu kernel (#31699)

* [NPU] Support testing grad of NPU ops in OpTest (#31697)

* [NPU] Support NPU kernel of stack op (#31711)

* [NPU] Remove redundant ctest of top_k_op_npu_test (#31718)

* [NPU] fix reshape npu op kernel (#31726)

* rename npu op file

* fix reshape

* [NPU] change transpose to transpose2 (#31734)

* change transpose to transpose2

* fix bug

* [NPU] Support  mean npu kernel (#31729)

* [NPU] fix some bugs of npu op (#31739)

* fix softmax

* fix mean

* fix lookup_table_v2

* 【NPU】Fix npu kernel elementwise_div_grad  (#31753)

* [NPU] fix the grad kernel diff bug of gather op (#31757)

* fix gather grad kernel diff

* fix gather grad kernel diff

* fix gather review bug

* 【NPU】Fix reshape test & add grad test (#31776)

* fix

* fix

* [NPU] support fp16 for npu accuracy op (#31797)

* [NPU] support list of tensor input (#31801)

* support list of tensor as npu input

* add comment

* fix typo

* fix typo

* [NPU] add npu kernel for concat op (#31695)

* add npu kernel for concat op

* add npu kernel for concat op

* refine code

* update

* refine concat_grad

* [NPU] Support npu kernel for op elementwise_floordiv (#31822)

* [NPU] fix bug of lookup_table_v2_grad (#31834)

* [NPU] support default stream (#31510)

* [NPU] support mixed precision input for npu layer norm (#31847)

* support mixed precision input for npu layer norm

* fix layer_norm npu kernel
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* 【NPU】Support npu kernel for update_loss_scaling op (#31830)

* add update_loss_scaling_npu NPU kernel

* change TensorFromVec to Memset

* fix compile problem (#31850)

* [NPU] support npu for conditional_block op (#31854)

* 【NPU】Add int dtype kernel for reshape2 op (#31864)

* fix

* fix

* [NPU] fix some op bugs (#31855)

* fix some op bugs

* fix some bugs

* follow comments

* fix log level

* add ut

* [NPU] support fp16 of input for api pow (#31871)

* [NPU] add npu kernel for truncated_gaussian_random op (#31654)

* init

* add todo

* add npu kernel for truncated_gaussian_random

* add sync

* fix concat_grad

* fix typo

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix code style

* fix code style

* fix code

* Fix op test (#32231)

* fix conditional block (#32243)

* fix style code
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: Reventon_L <luyuxiang1994@qq.com>
Co-authored-by: root <xiayanming@baidu.com>
Co-authored-by: oyjxer <1728722986@qq.com>
Co-authored-by: yinhaofeng <66763551+yinhaofeng@users.noreply.github.com>
Co-authored-by: OleNet <olenet@126.com>
Co-authored-by: Meiyim <chen_xuyi@outlook.com>
Co-authored-by: oyxuan-11 <963650125@qq.com>
Co-authored-by: pangyoki <pangyoki@126.com>

e6bc358d

07 April, 2021 1 commit

【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

Committed by zhang wenhui on April 07, 2021

* Ascend rc (#30483)

* Fix compilation on CANN20.1 and older (#30494)

Fix compilation on CANN20.1 and older

* Add distribution supported (#30578)

Add distribution supported

* Build parser for Hcom* operators (#30627)

Build parser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distribution training supported (#30796)

Add paddle ascend distribution training supported

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop
Co-authored-by: hutuxian <hutuxian2011@sina.cn>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: dingsiyu <18369187719@163.com>
Co-authored-by: OleNet <olenet@126.com>

8c7c53b3

31 March, 2021 1 commit

[ROCM] Add ROCm support for warpctc op (#31817) · ef8323d4

Committed by furnace on March 31, 2021

* bugfix for warpctc

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix WARPCTC_WITH_HIP invalid

* Add logs to find out why can not dlopen libwarpctc.so

* fix warpctc commit id

* fix unit test test_warpctc_op

* Optimize failed log for dlopen

* Optimize failed log for dlopen

* Delete extra changes

* fix warpctc commit id

* fix warpctc commit id

* Add is_compiled_with_rocm for test_warpctc_op

* fix warpctc commit id

* Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed

* Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed

* Cancel optimize dlopen failed reason, move to next pr, due to it makes windows ci failed

* fix code style problems

ef8323d4

23 March, 2021 1 commit
- Z
  Update windows compiler and CI from VS2015 to VS2017 (#31652) · a70de87d
  Committed by Zhou Wei on March 23, 2021
```
* modify windows CI to VS2017

* modify windows CI to VS2017

* modify windows CI to VS2017
```
  a70de87d
16 March, 2021 1 commit

Optimize compilation with Ninja (#31449) · 41e9ecfd

Committed by wuhuanzhou on March 16, 2021

* Optimize compilation with Ninja, notest, test=windows_ci, test=windows_op

* no cache on windows ci, notest, test=windows_ci, test=windows_op

* delete /Zc:inline compiled in NVCC, notest, test=windows_ci, test=windows_op

* fix test_warpctc_op, notest, test=windows_ci

* remove test code, test=develop

41e9ecfd

26 October, 2020 1 commit
- X
  
  add git mirror url to speed up clone (#28241) · d2522197
  Committed by XiaoguangHu on October 26, 2020
  
  d2522197
27 September, 2020 1 commit

add support to float64 input of warpctc op. (#27399) · 1501a80f

Committed by Li Fuchen on September 27, 2020

* add float64 input to ctc_loss

* modified error message of  warpctc

* update repo and tag of warpctc

* add test for warpctc with float64 input

* modified warpctc.cmake to make sure build always

* resolved sample code bug of warpctc

* add core.ops in warpctc dygraph

* fix a bug of test

1501a80f

09 September, 2020 1 commit
- W
  
  [cuda11 support] change the CMakeLists to support the cuda11 (#27124) · c71d79b1
  Committed by wangchaochaohu on September 09, 2020
  
  c71d79b1
11 April, 2020 1 commit
- Z
  
  modify cmake/external/*.cmake (#23710) · faf284a9
  Committed by zhangchunle on April 11, 2020
  
  faf284a9
09 January, 2020 1 commit
- Z
  tweak the interface of cache_third_party function - expose the SOURCE_DIR for... · 4f7a2bd0
  Committed by zhouwei25 on January 09, 2020
```
tweak the interface of cache_third_party function - expose the SOURCE_DIR for each external library (#21899)
```
  4f7a2bd0
26 December, 2019 1 commit
- Z
  
  remove patch command and file of warpctc to Improved quality of Paddle Repo (#21929) · 8b15acd7
  Committed by zhouwei25 on December 26, 2019
  
  8b15acd7
24 December, 2019 1 commit
- Z
  
  fix cp bug of warpctc repository,test=develop (#21901) · 3e1404d2
  Committed by zhouwei25 on December 24, 2019
  
  3e1404d2
16 December, 2019 1 commit
- Z
  
  fix wrong commitID with patch file of warpctc (#21755) · 34dc7106
  Committed by zhouwei25 on December 16, 2019
  
  34dc7106
12 December, 2019 1 commit
- Z
  
  fix the bug that cannot patch command for the second time (#21596) · 03133c2c
  Committed by zhouwei25 on December 12, 2019
  
  03133c2c
04 December, 2019 1 commit

modify the personal repo address of eigen and warpctc (#21445) · 46401786

Committed by silingtong123 on December 04, 2019

* modify the repo address of eigen and warpctc

* fix the eigen not work on windows

* fix the eigen and warpctc can't recompile

46401786

25 November, 2019 1 commit
- Z
  
  Cache 3rd source code, improve stability, reduce the compilation time (#21190) · 341dee06
  Committed by zhouwei25 on November 25, 2019
  
  341dee06
11 November, 2019 1 commit
- M
  Add Shallow clone to ExternalProjects (#21060) · 6cc544aa
  Committed by Michał Gallus on November 11, 2019
```
test=develop
```
  6cc544aa
18 October, 2019 1 commit
- W
  add support to gcc8, add docker env test=develop (#19807) · 9e594823
  Committed by wopeizl on October 18, 2019
```
* add support to gcc8, add docker env test=develop
```
  9e594823
20 May, 2019 1 commit
- W
  fix the random compilation failure on windows test=develop (#17475) · ca3ba378
  Committed by wopeizl on May 20, 2019
```
* fix the random compilation failure on windows 
```
  ca3ba378
07 May, 2019 1 commit
- T
  remove unused FLAGS_warpctc_dir (#17162) · ff1661f1
  Committed by Tao Luo on May 07, 2019
```
* remove unused FLAGS_warpctc_dir

test=develop

* remove FLAGS_warpctc_dir

test=develop
```
  ff1661f1
20 February, 2019 1 commit
- T
  remove legacy $external_project_dependencies variable · 60cb0b97
  Committed by Tao Luo on February 20, 2019
```
test=develop
```
  60cb0b97
21 January, 2019 1 commit
- T
  
  remove legacy MOBILE_INFERENCE option · 9353bc58
  Committed by Tao Luo on January 21, 2019
  
  9353bc58
18 December, 2018 2 commits
- P
  
  add ctc support for windows · 19ebd8b4
  Committed by peizhilin on December 18, 2018
  
  19ebd8b4
- P
  
  add mkl,ctc support for windows · 5a6d7fe2
  Committed by peizhilin on December 18, 2018
  
  5a6d7fe2
01 November, 2018 1 commit
- D
  
  clean cmake. test=develop · 0a180584
  Committed by dzhwinter on November 01, 2018
  
  0a180584
14 October, 2018 1 commit
- W
  
  compile in linux · 3ae96450
  Committed by wanghaoshuang on October 14, 2018
  
  3ae96450
29 April, 2018 1 commit
- D
  "fix cuda9 error" (#10271) · c2620402
  Committed by dzhwinter on April 29, 2018
```
* "fix cuda9 error"

* "change commit id"

* "remote git tag"
```
  c2620402
08 April, 2018 1 commit
- Y
  Fix cpplint errors with paddle/fluid/platform/dynload (#9715) · e185502e
  Committed by Yi Wang on April 07, 2018
```
* Update source files.

* Update headers

* Update

* Update

* Update

* Update

* Fix a CMake dependency
```
  e185502e
12 February, 2018 1 commit
- Q
  
  Fix the grammar in copyright. (#8403) · 24509f4a
  Committed by qingqing01 on February 12, 2018
  
  24509f4a
07 February, 2018 1 commit
- Y
  Disable BUILD_TESTS for warpctc (#8210) · b41205d9
  Committed by Yu Yang on February 07, 2018
```
* It compiles slightly faster and makes warpctc compile
well on CUDA 9 and GCC 5.5
```
  b41205d9
09 January, 2018 1 commit

Port WarpCTC Operator (#5107) · b5fda272

Committed by Yiqun Liu on January 09, 2018

* Add Seq2BatchFunctor, which will be used in WarpCTCOp.

* Implement WrapCTCFunctor and WrapCTCKernel.

* Add unittest of warpctc_op.

* Modify the check_output interface in python unittest framework to allow checking a subset of outputs.

* Use absolute offset lod in warpctc_op and related functors.

* Refine the comments of warpctc_op.

* The new python unittest supports checking a subset of the outputs, so revoke the previous change.

* Rename the transform from LoDTensor to Tensor with shape [max_sequence_length, num_sequences, sequence_width] to PaddingSequenceFunctor.

* Update to the newest codes.

* Rename the PaddingSequenceFunctor to PaddingLoDTensorFunctor and remove the computation of dimensions out of the functors.

b5fda272

17 November, 2017 1 commit
- Y
  Support the build for multiple architectures at one cmake command (iOS). (#5677) · c808fbbf
  Committed by Yiqun Liu on November 17, 2017
```
* Support the build for multiple architectures at one cmake command (iOS).

* Update the documentations.
```
  c808fbbf
13 October, 2017 1 commit
- H
  
  Add GIT tag for all cmake dependencies. (#4776) · ce91f85e
  Committed by helinwang on October 12, 2017
  
  ce91f85e
12 October, 2017 1 commit
- H
  
  Use MinSizeRel compile third_party library when build for mobile inference. · 773d064a
  Committed by hedaoyuan on October 12, 2017
  
  773d064a
04 September, 2017 1 commit
- L
  
  Refine the toolchain file of Android to use clang as default compiler. · 8e5f5432
  Committed by Liu Yiqun on September 04, 2017
  
  8e5f5432
30 August, 2017 2 commits
- L
  Remove the linking of train-related libraries when cross-compiling for Android and iOS. · aeea8ab1
  Committed by Liu Yiqun on August 30, 2017
```
Recover the mistakenly deleted WARPCTC variable in cmake.
```
  aeea8ab1
- L
  
  Deliver the cross-compiling platform-specific args to external libraries. · d57ffc45
  Committed by Liu Yiqun on August 30, 2017
  
  d57ffc45

PaddlePaddle / Paddle synced successfully about 1 year ago