Commits · dc62a227d4b22a1f81e6476b8caaf7ddb8850daa · PaddlePaddle / Paddle

03 Aug, 2021 1 commit
- support Kunlun2 (#34459) · 2d0f3d9b
  Committed by QingshuChen on Aug 03, 2021
```
* support Kunlun2

* support KL2

* support KL2
```
  2d0f3d9b
09 Apr, 2021 1 commit

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code style

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

22 Feb, 2021 1 commit
- [ROCM] update fluid platform for rocm39 (part4), test=develop (#30936) · 33429630
  Committed by Qi Li on Feb 22, 2021
  
  33429630
04 Feb, 2021 1 commit
- use iwyu clean include second time, test=develop (#30829) · 35c5b23f
  Committed by wanghuancoder on Feb 04, 2021
```
* use iwyu clean include second time, test=develop
```
  35c5b23f
15 Jan, 2021 1 commit
- export global google flags to users, test=develop (#30448) · 715d8628
  Committed by 石晓伟 on Jan 15, 2021
  
  715d8628
17 Dec, 2020 1 commit
- Windows generate pdb and dump, for debug (#29628) · 0c59ad2a
  Committed by wanghuancoder on Dec 17, 2020
```
* Windows generate pdb and dump, for debug

* fix code style, test=develop

* modify cmakelist
```
  0c59ad2a
20 Nov, 2020 1 commit
- Fix gpu memory allocation bug. (#28703) · 1dad8cea
  Committed by gongweibao on Nov 20, 2020
  
  1dad8cea
04 Nov, 2020 1 commit
- show cpp stack when catch signal (#28415) · 23439b16
  Committed by Chen Weihang on Nov 04, 2020
  
  23439b16
30 Oct, 2020 1 commit
- hide some logs of p2p (#28307) · 18c86fb2
  Committed by Leo Chen on Oct 30, 2020
  
  18c86fb2
21 Aug, 2020 1 commit

support Baidu Kunlun AI Accelerator (#25959) · 138ecf24

Committed by QingshuChen on Aug 21, 2020

* support Baidu AI Accelerator
  * test=kunlun

* minor
 * test=kunlun

* support xpu op in separate file
 * test=kunlun

* update XPU error message and remove duplicated code

 * test=kunlun

* minor
 * test=kunlun

* minor
 * test=kunlun

138ecf24

04 Aug, 2020 1 commit
- refine init signal handler meg dumper (#25911) · 9b5a65b8
  Committed by Chen Weihang on Aug 04, 2020
  
  9b5a65b8
29 Jul, 2020 2 commits

Unified paddle error format when catch system signal (#25765) · 2469b578
Committed by Chen Weihang on Jul 29, 2020
```
* unified signal error format

* refine signal error message
```
2469b578

Simplify BufferedReader to improve DataLoader performance (#25648) · 1b3081b1

Committed by Chen Weihang on Jul 29, 2020

* simplify buffered reader to improve DataLoader performance

* fix 22 failed unittests

* fix cuda pinned context condition

* fix test_reader_reset failed

* fix two failed unittests

* change unittest place

* polish error message

* polish cast op GetExpectedKernelType

* remove debug info in unittest

1b3081b1

15 Jul, 2020 1 commit
- refine PADDLE_ENFORCE (#25456) · c10dcff1
  Committed by GaoWei8 on Jul 15, 2020
```
* Refine PADDLE_ENFORCE in paddle/fluid/platform
test=develop
```
  c10dcff1
07 Jul, 2020 1 commit
- Refine PADDLE_ENFORCE (#25369) · ea7e5325
  Committed by GaoWei8 on Jul 07, 2020
```
* refine PADDLE_ENFORCE
test=develop
```
  ea7e5325
03 Jun, 2020 1 commit

Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759) · d1062d52

Committed by Chen Weihang on Jun 03, 2020

* remove REPLACE_ENFORCE_GLOG compile option & add ci rule prohibit LOG(FATAL) using, test=develop

* remove ci test case, test=develop

* replace all LOG(FATAL) & polish message, test=develop

* fix typo, test=develop

* polish error info detail, test=develop

d1062d52

01 Jun, 2020 1 commit
- [Inference] [unittest] Inference unit tests rely on dynamic libraries (#24743) · f8e370ac
  Committed by Wilber on Jun 01, 2020
  
  f8e370ac
19 May, 2020 1 commit
- use vector instead of pointer, test=develop (#24620) · 1d034696
  Committed by Leo Chen on May 19, 2020
  
  1d034696
29 Apr, 2020 1 commit

update the analysis predictor for multi-stream support, test=develop (#24046) · 17ac6e25

Committed by 石晓伟 on Apr 29, 2020

* update the analysis predictor, test=develop

* update the unit test, test=develop

* no priority set before the interface determined, test=develop

* interface name generalization, test=develop

17ac6e25

04 Apr, 2020 1 commit

Dev/fix init flags (#23465) · f297a332

Committed by Leo Chen on Apr 04, 2020

* fix init_gflags with 'python -c', test=develop

* add test, test=develop

* use sys.executable instead of python, test=develop

* keep dummy, test=develop

f297a332
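
The `sys.executable` bullet in the commit above reflects a general pattern for tests that spawn a child interpreter. A minimal sketch follows, assuming a hypothetical helper named `run_snippet`. It is not taken from the Paddle test suite and only illustrates launching `-c` code with the same interpreter that runs the parent process.

```python
import subprocess
import sys


def run_snippet(code: str) -> str:
    """Run `code` in a child interpreter and return its stripped stdout."""
    # sys.executable is the interpreter running this script, so the child
    # shares its environment. A bare "python" is resolved through PATH and
    # may point at a different interpreter, or at none at all.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # With `-c`, the child sees "-c" as its sys.argv[0].
    print(run_snippet("import sys; print(sys.argv[0])"))
```

Using `sys.executable` keeps the child process on the interpreter and site-packages the test itself was started with, which matters inside virtual environments and CI images.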

05 Dec, 2019 1 commit
- Refine a Warning Which Can Occur Not Only During Init (#21546) · b241c732
  Committed by Huihuang Zheng on Dec 05, 2019
```
As the title
```
  b241c732
04 Dec, 2019 1 commit
- make config option DisableGlogInfo() able to mute all inference logs (#21318) · 122b37ce
  Committed by Pei Yang on Dec 04, 2019
```
* make DisableGlogInfo able to mute all logs in inference. 
```
  122b37ce
03 Dec, 2019 1 commit
- Add warning message when initialize GLOG failed. (#21487) · a71f53d7
  Committed by Huihuang Zheng on Dec 03, 2019
```
Add warning message when initialize GLOG failed
```
  a71f53d7
18 Oct, 2019 1 commit
- Fix dgc nan by stripping nccl from sparseReduce. (#20630) · 507afa8a
  Committed by WangXi on Oct 17, 2019
  
  507afa8a
11 Sep, 2019 1 commit

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

Committed by Huihuang Zheng on Sep 11, 2019

TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory.

We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton.

Also added data_feed_proto to operator to fix CI in CPU compilation

12542320
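
The commit above describes releasing stream-local memory through a stream callback. The toy model below sketches that ordering idea in plain Python. `ToyStream`, `ToyPool`, and `run_with_scratch` are hypothetical names invented for illustration and do not correspond to Paddle or CUDA APIs. The only point shown is that a release queued after the work that uses a buffer cannot run before that work.

```python
from collections import deque
from typing import Callable, Deque, List, Set


class ToyStream:
    """Hypothetical stand-in for a device stream: runs tasks in FIFO order."""

    def __init__(self) -> None:
        self._pending: Deque[Callable[[], None]] = deque()

    def enqueue(self, task: Callable[[], None]) -> None:
        self._pending.append(task)

    def synchronize(self) -> None:
        # Drain the queue in submission order, like waiting on a real stream.
        while self._pending:
            self._pending.popleft()()


class ToyPool:
    """Hypothetical memory pool that only tracks which handles are live."""

    def __init__(self) -> None:
        self.live: Set[int] = set()
        self._next_handle = 0

    def allocate(self) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self.live.add(handle)
        return handle

    def free(self, handle: int) -> None:
        self.live.remove(handle)


def run_with_scratch(stream: ToyStream, pool: ToyPool, log: List[str]) -> int:
    """Queue a kernel that uses scratch memory, then queue its release."""
    handle = pool.allocate()

    def kernel() -> None:
        # The scratch buffer must still be live while the kernel runs.
        assert handle in pool.live
        log.append("kernel")

    def release() -> None:
        pool.free(handle)
        log.append("free")

    stream.enqueue(kernel)
    # Queued after the kernel, so the free cannot run before it.
    stream.enqueue(release)
    return handle


if __name__ == "__main__":
    stream, pool, log = ToyStream(), ToyPool(), []
    handle = run_with_scratch(stream, pool, log)
    print("live before sync:", handle in pool.live)
    stream.synchronize()
    print("order:", log, "live after sync:", handle in pool.live)
```

Tying the release to the stream rather than to a process-wide singleton means each buffer is returned as soon as its own stream has passed the point where it was needed.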

30 Aug, 2019 2 commits
- add dynamic C runtime support on windows, test=develop (#19502) · d6cb1a41
  Committed by liuwei1031 on Aug 30, 2019
  
  d6cb1a41
- remove signal raise msg, test=develop (#19527) · c2c5b1b9
  Committed by Zeng Jinle on Aug 30, 2019
  
  c2c5b1b9
28 Aug, 2019 1 commit

Add signal message to stderr (#19421) · caf59d0f

Committed by Zeng Jinle on Aug 28, 2019

* add signal message to stderr, test=develop

* add unittests for ugly SignalHandle, test=develop

caf59d0f

16 Aug, 2019 1 commit
- move_flags_to_unified_files_for_management, test=develop (#19224) · 708bd979
  Committed by Zeng Jinle on Aug 16, 2019
  
  708bd979
04 Jul, 2019 1 commit
- Enhance execution error info (#18482) · 55baeced
  Committed by chengduo on Jul 04, 2019
```
* enhance execution error info
test=develop
```
  55baeced
05 Jun, 2019 1 commit
- remove InstallFailureSignalHandler (#17828) · d1169afa
  Committed by chengduo on Jun 05, 2019
```
test=develop
```
  d1169afa
18 Apr, 2019 1 commit
- Polish DGC code (#16818) · cbdb8a17
  Committed by gongweibao on Apr 18, 2019
  
  cbdb8a17
28 Mar, 2019 1 commit
- Add DGC(Deep Gradient Compression) interface. (#15841) · eb83abea
  Committed by gongweibao on Mar 28, 2019
  
  eb83abea
15 Mar, 2019 1 commit

Support sync batch norm. (#16121) · 8ad672a2

Committed by qingqing01 on Mar 15, 2019

* Support Sync Batch Norm.
* Note, do not enable it in one device.

Usage:

build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)

8ad672a2
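
For readers unfamiliar with the feature in the commit above, synchronized batch normalization generally normalizes with statistics gathered across all devices rather than per device. The sketch below shows one common way to combine per-device partial sums into a global mean and variance. The function names are hypothetical, the loop stands in for an all-reduce, and Paddle's actual kernel may be implemented differently.

```python
from typing import List, Sequence, Tuple


def local_stats(batch: Sequence[float]) -> Tuple[float, float, int]:
    """Per-device partial statistics: sum, sum of squares, element count."""
    return sum(batch), sum(x * x for x in batch), len(batch)


def synced_mean_var(per_device: List[Sequence[float]]) -> Tuple[float, float]:
    """Combine partial statistics from all devices into a global mean/variance."""
    total = 0.0
    total_sq = 0.0
    count = 0
    for batch in per_device:
        s, sq, n = local_stats(batch)
        # This accumulation stands in for an all-reduce across devices.
        total += s
        total_sq += sq
        count += n
    mean = total / count
    # Biased variance via E[x^2] - E[x]^2.
    return mean, total_sq / count - mean * mean


if __name__ == "__main__":
    # Two hypothetical devices holding different slices of one global batch.
    print(synced_mean_var([[1.0, 2.0], [3.0, 4.0, 5.0]]))
```

With a single device the combined statistics are the same as the local ones, so synchronization adds communication without changing the result, which is consistent with the commit's note not to enable it on one device.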

21 Feb, 2019 1 commit

Profiler refine and add CUDA runtime api tracer (#15301) · a83e4704

Committed by Dun on Feb 21, 2019

* refine profiler && add runtime tracer

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* fix bug && test=develop

* add thread id map && test=develop

* test=develop

* testing

* bug fix

* remove cuda event && refine code && test=develop

* test=develop

* test=develop

* test=develop

* fix windows temp file && test=develop

* test=develop

* fix windows bug && test=develop

* fix start up issue && test=develop

* code polish &&  test=develop

* remove unused code && test=develop

* add some cupti cbid && test=develop

* add FLAGS_multiple_of_cupti_buffer_size && test=develop

* fix compile error && test=develop

* add keyword && test=develop

* fix && test=develop

* code polish && test=develop

a83e4704

21 Dec, 2018 1 commit

[Feature] Add Temporary Allocator (#14875) · 79bd6dfa

Committed by chengduo on Dec 21, 2018

* Add Temporal Allocator

* add Temporary Allocator to DeviceContext
test=develop

* code refine
test=develop

* fix mean_iou
test=develop

* Add DeviceTemporaryAllocator
test=develop

* fix conv_op bug
test=develop

* small fix
test=develop

* code refine
test=develop

* log refine
test=develop

* fix unit test
test=develop

* move double check

* refine concat_and_split
test=develop

* add limit_of_temporary_allocation
test=develop

* fix name
test=develop

79bd6dfa

05 Dec, 2018 2 commits
- remove jit namespace · b523787f
  Committed by tensor-tang on Dec 05, 2018
```
test=develop
```
  b523787f
- remove jit namespace · 4a93db92
  Committed by tensor-tang on Dec 05, 2018
```
test=develop
```
  4a93db92
04 Dec, 2018 1 commit

[Feature] multi process multi gpu dist training, boost v100 performance by 20% (#14661) · 29d9fb53

Committed by Wu Yi on Dec 04, 2018

* wip multi process multi gpu dist training

* workable for p2p

* update test=develop

* change back env name test=develop

* fix alloc init

* fix cpu build test=devlop

* fix mac tests test=develop

* refine code

* refine test=develop

29d9fb53

26 Nov, 2018 1 commit
- Revert the changes of VLOG · 53433d7f
  Committed by minqiyang on Nov 26, 2018
```
test=develop
```
  53433d7f

PaddlePaddle / Paddle synced successfully about 1 year ago
