Commits · cc9c6196795735328ac2c7c70f703b815b474fa0 · PaddlePaddle / Paddle

26 Nov, 2020 · 1 commit

optimize fast graph executor (#28962) · 173c22ae

Committed by WangXi on Nov 26, 2020

173c22ae

27 Oct, 2020 · 1 commit

add Fuse bn add act pass (#28196) · fdc06f21

Committed by Zhang Ting on Oct 27, 2020

* add fuse_bn_add_act pass

fdc06f21

22 Oct, 2020 · 1 commit

Fix bug of fetch_async_op_handle when fetching the feed variable (#28194) · 1f3be859

Committed by Leo Chen on Oct 22, 2020

* fix bug of fetch_async_op_handle

* revert some changes of test_buffer_shared_memory_reuse_pass

* revert some changes of test_buffer_shared_memory_reuse_pass

1f3be859

27 Sep, 2020 · 1 commit

Refine error msg in paddle/fluid/framework/details [part 2] (#27429) · 35074963

Committed by Leo Chen on Sep 27, 2020

* refine broadcast_op_handle

* refine some error messages

* refine some files

* fix bug

* fix bug

* fix bug

* follow comments

* follow comments

35074963

24 Sep, 2020 · 1 commit

use iwyu clean include (#27267) · df43905f

Committed by wanghuancoder on Sep 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

df43905f

21 Sep, 2020 · 2 commits

[Feature] Enhance inplace addto strategy for gradient accumulation in static graph (#27112) · aba759ba

Committed by Leo Chen on Sep 21, 2020

* support use add instead of sum to do gradient accumulation

* add inplace addto pass

* add grad_add op and inplace addto pass

* remove debug code

* code refine

* fix bug when several sum ops are inserted at the same op_idx

* fix Flags type

* add addto attribute for conv3d

* fix ut

* code clean

* fix type

aba759ba

Refine error msg in paddle/fluid/framework/details [part 1] (#25631) · bbc84e0f

Committed by Leo Chen on Sep 21, 2020

* refine error msg in var_handle.h, test=develop

* refine all_reduce_op_handle

* fix some error msg

* refine variable_visitor

* refine threaded_ssa_graph_executor

* refine inplace related files

* refine executor related files

* refine fetch_op_handle.cc

* fix bug

* follow comments

bbc84e0f

03 Sep, 2020 · 2 commits

add template specialization for bfloat16 for gcc 4.8 compatability (#26985) · c8cc0945

Committed by Feiyu Chan on Sep 03, 2020

c8cc0945

Add bfloat16 data type (#25402) · 95e1434b

Committed by joanna.wozna.intel on Sep 03, 2020

95e1434b

02 Sep, 2020 · 1 commit

Add FetchAsyncOpHandle, and use it in FastThreadedExecutor (#26643) · 2d2c31a6

Committed by wanghuancoder on Sep 02, 2020

* optimized transformation form tensor to numpy, test=develop

* Modify fetch op handle, from memcpy Sync to memcpy Async, test=develop

* modify CUDAPinnedPlace to CPUPlace, test=develop

* modify CPUPlace to CUDAPinnedPlace, and set default inplace to false, test=develop

* revert fetch_op_handle, add fetch_async_op_handle, test=develop

* revert fetch_op_handle, add fetch_async_op_handle, test=develop

* fix error msg report, test=develop

* fix bug in cpuplace, test=develop

* fix bug in unmerge and tensorarray modle, test=develop

* fix bug, double copy gpu memory, test=develop

* fix chenweihang's review advice, test=develop

2d2c31a6

25 Aug, 2020 · 1 commit

optimized transformation form tensor to numpy (#26447) · c1f5df52

Committed by wanghuancoder on Aug 25, 2020

* optimized transformation form tensor to numpy, test=develop

* optimized transformation form tensor to numpy, pass pre-commit, test=develop

* modify fetchophandle zerocopy to deepcopy in PE&CUP, test=develop

* modify py:array construct, test=develop

* fix _fetch_var to use deep copy, test=develop

c1f5df52

07 Aug, 2020 · 1 commit

Fix/large scale fix (#25999) · 3755564a

Committed by tangwei12 on Aug 07, 2020

* fix large scale KV

* fix single training using async ssa graph

3755564a

30 Jul, 2020 · 1 commit

Integrated Trainer of Parameter Server (API add... · caa90a65

Committed by tangwei12 on Jul 30, 2020

Integrated Trainer of Parameter Server (API add `fluid.contrib.layers.sparse_embedding` only) (#22957)

* Integrated Trainer of Parameter Server

caa90a65

10 Jul, 2020 · 1 commit

Polish ParallelExecutor exception process logic (#25449) · 4061aa64

Committed by Chen Weihang on Jul 10, 2020

* polish pe exception process logic, test=develop

* fix unittest, test=develop

* add unittests, test=develop

4061aa64

07 Jul, 2020 · 1 commit

catch bad alloc exception (#25140) · 70d7d07f

Committed by hong on Jul 07, 2020

* catch bad alloc exception; test=develop

* add unittest; test=develop

* move bad alloc catch to the first place; test=develop

* polish error message; test=develop

* polish error message; test=develop

* add mutex header; test=develop

70d7d07f

03 Jun, 2020 · 1 commit

Replace all errors thrown by LOG(FATAL) with PADDLE_THROW (#24759) · d1062d52

Committed by Chen Weihang on Jun 03, 2020

* remove REPLACE_ENFORCE_GLOG compile option & add ci rule prohibit LOG(FATAL) using, test=develop

* remove ci test case, test=develop

* replace all LOG(FATAL) & polish message, test=develop

* fix typo, test=develop

* polish error info detail, test=develop

d1062d52

11 May, 2020 · 1 commit

Add macro BOOST_GET to enrich the error information of boost :: get (#24175) · aa0f254f

Committed by Chen Weihang on May 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

aa0f254f

23 Apr, 2020 · 1 commit

fix isolated var fetch bug, test=develop (#24070) · acef55df

Committed by Zeng Jinle on Apr 23, 2020

acef55df

20 Apr, 2020 · 1 commit

Optimize the error messages of paddle CUDA API (#23816) · 78170037

Committed by Zhou Wei on Apr 20, 2020

* Optimize the error messages of paddle CUDA API, test=develop

* fix the error messages of paddle CUDA API, test=develop

* Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL,test=develop

* remove build_ex_string,test=develop

* merge conflict,test=develop

78170037

19 Apr, 2020 · 1 commit

Support LoDTensorArray in fetch (#23645) · 2b896c1f

Committed by guofei on Apr 19, 2020

* Support LoDTEnsorArray in fetch op

test=develop

* Support LoDTensorArray in fetch

test=develop

* Support LoDTensorArray in fetch

test=develop

* Support LoDTensorArray in fetch

test=develop

* Support LoDTensorArray in fetch

test=develop

* Support LoDTensorArray in fetch

test=develop

* Support LoDTensorArray in fetch

test=develop

* Support LoDTensorArray in fetch

test=develop

* Support LoDTensorArray in fetch

test=develop

* Support LoDTensorArray in fetch

test=develop

2b896c1f

14 Apr, 2020 · 1 commit

Correct reader device index (#23802) · c4979136

Committed by Zeng Jinle on Apr 14, 2020

* correct reader device index, test=develop

* fix async executor scope var initialization, test=develop

c4979136

10 Apr, 2020 · 2 commits

API (BuildStrategy) error message enhancement. (#23462) · 06d4aa4e

Committed by liym27 on Apr 10, 2020

06d4aa4e

Solve the conflict of ops with the same name, test for CI. (#23573) · 84cd45f6

Committed by Zhen Wang on Apr 10, 2020

* solve the conflict of ops with the same name. test=develop

84cd45f6

09 Apr, 2020 · 1 commit

Remove: NGraph engine from PDPD repository (#23545) · 3baaee9a

Committed by mozga-intel on Apr 09, 2020

* Remove the NGraph engine from PDPD repository
1. Each operator was removed from the operator's directory
2. Each test was removed from the unittest directory
3. The parallel executor support was removed from the PDPD
4. The CMake file was removed from the PDPD
5. The NG flags were removed from the repository
test=develop

* Remove ngraph from:
1. Cmake file
2. Python file
test=develop

3baaee9a

07 Apr, 2020 · 1 commit

Make optimizer consistent in dygraph and static-graph and remove some LOG-INFO. (#23426) · 6162cf2f

Committed by qingqing01 on Apr 07, 2020

* Make optimizer consistent in dygraph and static-graph and remove some LOG-INFO

6162cf2f

05 Apr, 2020 · 1 commit

Revert "Solve the conflict of ops with the same name. (#23199)" (#23494) · 0b583235

Committed by Tao Luo on Apr 05, 2020

This reverts commit abe3e690.
test=develop

0b583235

04 Apr, 2020 · 1 commit

Solve the conflict of ops with the same name. (#23199) · abe3e690

Committed by Zhen Wang on Apr 04, 2020

* solve the conflict of ops with the same name. test=develop

abe3e690

03 Apr, 2020 · 1 commit

fix conflict of inferne partial feed with gpu parallel ssa graph executor, test=develop (#23400) · 29337f4e

Committed by Zeng Jinle on Apr 02, 2020

29337f4e

01 Apr, 2020 · 1 commit

add reader dependency pass, test=develop (#23301) · 3a21980b

Committed by Zeng Jinle on Apr 01, 2020

3a21980b

25 Mar, 2020 · 2 commits

add Tensor::IsSharedBufferWith method, test=develop (#23175) · 7ca77a90

Committed by Zeng Jinle on Mar 25, 2020

7ca77a90

fix graph attr copy issues, test=develop (#23191) · bae5930b

Committed by Zeng Jinle on Mar 24, 2020

bae5930b

20 Mar, 2020 · 1 commit

Reader sequential and inference partial feed (#22699) · acfc9b8a

Committed by Zeng Jinle on Mar 20, 2020

* sequential reader stage 1, test=develop

* fix ut, test=develop

* fix iterable=False reset bug, add some logs and polish code, test=develop

* inference feed partial data, test=develop

* Turn on keep_order=True for test, test=develop

* enhance ut to test more cases, test=develop

* test commit for reverting

* Revert "test commit for reverting", test=develop

This reverts commit 80aef42e.

* add ut of merged and unmerged results, test=develop

* add more uts for coverages and add en doc of api, test=develop

* follow comments, test=develop

* change note style, test=develop

acfc9b8a

09 Mar, 2020 · 1 commit

Imperative tracer refactoring (#22457) · d33c4343

Committed by Zeng Jinle on Mar 09, 2020

* refine grad maker, test=develop

* refactor tracer stage 1, test=develop

* merge develop to solve conflict third times, test=develop

d33c4343

02 Mar, 2020 · 1 commit

Unmerged fetch list (#22635) · 89cfa491

Committed by Zhen Wang on Mar 02, 2020

* update ScopeBufferedSSAGraphExecutor&AsyncSSAGraphExecutor&ThreadedSSAGraphExecutor&FastThreadedSSAGraphExecutor&ParallelSSAGraphExecutor&ParallelExecutor for fetching unmerged results.

* add the unit test for fetch_unmerged.

* update ut for multi-card and multi-cpu.

* add the error message and the user suggestion in FetchOpHandle. test=develop

89cfa491

23 Feb, 2020 · 1 commit

fix typo words (#22653) · d2ba91aa

Committed by tianshuo78520a on Feb 23, 2020

d2ba91aa

22 Feb, 2020 · 1 commit

SYNC with communicaotor (#22344) · 66a31501

Committed by tangwei12 on Feb 22, 2020

* add sync communicator and implement

66a31501

13 Feb, 2020 · 1 commit

Disable fusion_group for windows and mac in build_strategy. (#22549) · 96770f51

Committed by Yiqun Liu on Feb 13, 2020

test=develop

96770f51

12 Feb, 2020 · 1 commit

fix bug with compiledProgram (#22495) · b0675c81

Committed by tangwei12 on Feb 12, 2020

* add thread barrier for the compiled program

b0675c81

11 Feb, 2020 · 1 commit

Compile without nccl deps. [1/2] (#22509) · a90fa540

Committed by Wilber on Feb 11, 2020

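Support compiling without the nccl dependency. [1/2]

With multiple GPU cards, if Paddle is compiled without the WITH_NCCL switch turned on, the cards cannot communicate with each other, so only a single card can be used.

Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>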

a90fa540

07 Feb, 2020 · 1 commit

Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038

Committed by Yiqun Liu on Feb 07, 2020

* Add the first implememtation of fusion_group op #19621 (#3)

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Enable generating code for a given subgraph. #21126 (#4)

* Enable generating code for a given subgraph.

* Support sorting the subgraph.

* Remove the rearange of expressions because we use the sorted subgraph directly.

* Enable generating code for a subgraph which is composed of grad ops.

* Use expression information to check the accuracy in unittest.

* Separate load and store from computation expressions.
test=develop

* Improve the loading statements in generated codes.
test=develop

* Remove unused arguments from formal list.
test=develop

* Enable the detection of subgraph of grad ops.

* Generate code for detected subgraph in fusion_group_pass.

* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop

* Fix a bug when checking whether the shape of all inputs are the same.

* Add debug information.

* Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5)

test=develop

* Call subgraph_detector in fusion_group pass.
test=develop

* Disable fusion_group when WITH_GPU is OFF.
test=develop

* Refine all PADDLE_ENFORCE message.
test=develop

* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop

* Follow review comments.
test=develop

dcfb6038

PaddlePaddle / Paddle · synced successfully about 1 year ago
