Commits · 4e8bc02461826b7b62919e5e9ba0833027b82859 · PaddlePaddle / Paddle

03 Mar, 2020 1 commit
- Z
  add fluid.device_guard to specify the device type for Op (#22254) · 4e8bc024
  Committed by Zhang Ting on Mar 03, 2020
```
* add fluid.device_guard to specify the device type for Op
```
  4e8bc024
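
The commit above adds `fluid.device_guard`, a guard that sets the device type for the ops created inside it. As a rough illustration of that pattern only (this is not Paddle's implementation), the sketch below uses a hypothetical `device_guard` and `append_op` written in plain Python; every name and the recording list are assumptions made for the example.

```python
from contextlib import contextmanager

# Hypothetical stand-ins, not Paddle code: a module-level "current device"
# and a list recording each op together with the device it was created under.
_current_device = None
ops = []


@contextmanager
def device_guard(device=None):
    """Tag every op appended inside the block with `device`."""
    global _current_device
    previous = _current_device
    _current_device = device
    try:
        yield
    finally:
        # Restore the outer setting so that guards can nest.
        _current_device = previous


def append_op(op_type):
    """Record an op and the device that was active when it was created."""
    ops.append((op_type, _current_device))


append_op("fill_constant")        # created outside any guard
with device_guard("cpu"):
    append_op("shape")            # tagged "cpu"
    with device_guard("gpu"):
        append_op("matmul")       # tagged "gpu"
    append_op("reshape")          # back to "cpu"
append_op("softmax")              # no device specified again
```

Restoring the previous value in `finally` keeps the tag correct even if the body of the `with` block raises.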
02 Mar, 2020 2 commits

Committed by Zhen Wang on Mar 02, 2020

* update ScopeBufferedSSAGraphExecutor&AsyncSSAGraphExecutor&ThreadedSSAGraphExecutor&FastThreadedSSAGraphExecutor&ParallelSSAGraphExecutor&ParallelExecutor for fetching unmerged results.

* add the unit test for fetch_unmerged.

* update ut for multi-card and multi-cpu.

* add the error message and the user suggestion in FetchOpHandle. test=develop

89cfa491
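
The entry above lets the executors return unmerged fetch results. One way to picture the difference, assuming each device produces its own list of values for a fetched variable, is the sketch below. The function `fetch`, its `return_merged` flag and the data layout are hypothetical and chosen only for illustration.

```python
def fetch(per_device_results, return_merged=True):
    """Combine or keep apart the values fetched from several devices.

    per_device_results: one list of values per device (assumed layout).
    """
    if return_merged:
        # Merged: concatenate the per-device values into one flat result.
        merged = []
        for device_values in per_device_results:
            merged.extend(device_values)
        return merged
    # Unmerged: keep one entry per device so the caller can tell them apart.
    return [list(device_values) for device_values in per_device_results]


results = [[0.5, 0.7], [0.1, 0.3]]          # two devices, two values each
merged = fetch(results)
unmerged = fetch(results, return_merged=False)
```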

H
support customized download command in dataset (#22782) · 53a2b68f
Committed by hutuxian on Mar 02, 2020
```
* user can call dataset.set_download_cmd to set its customized download cmd
* add UT to cover this scenario
```
53a2b68f
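
The commit message names `dataset.set_download_cmd` but does not show how the command is used. The sketch below is a guess at the general idea, using a hypothetical `Dataset` class: the default command, the `build_command` helper and the tool name `my_fetch_tool` are all assumptions, not Paddle's API.

```python
import shlex


class Dataset:
    """Hypothetical stand-in; only `set_download_cmd` is named in the commit."""

    def __init__(self):
        self.download_cmd = "cat"  # assumed default for this sketch

    def set_download_cmd(self, cmd):
        """Replace the command used to fetch each data file."""
        self.download_cmd = cmd

    def build_command(self, path):
        """Return the argv list for fetching one file (not executed here)."""
        return shlex.split(self.download_cmd) + [path]


dataset = Dataset()
default_argv = dataset.build_command("part-000")
dataset.set_download_cmd("my_fetch_tool --retries 3")
custom_argv = dataset.build_command("part-000")
```

A caller could pass the returned list to `subprocess.run`; the sketch stops at building it so that nothing is executed.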

01 Mar, 2020 1 commit
- W
  add sum op support for fusion group (#22771) · ca9e77a8
  Committed by wangchaochaohu on Mar 01, 2020
```
* Add the codegen and auto fusion for sum Op  in fusion group
```
  ca9e77a8
28 Feb, 2020 1 commit
- T
  
  fix typo word (#22784) · 433cef03
  Committed by tianshuo78520a on Feb 28, 2020
  
  433cef03
26 Feb, 2020 1 commit

support cond in clone, test=develop (#22657) · b2c1be85

Committed by Leo Chen on Feb 26, 2020

* support cond in clone, test=develop

* refine code, test=develop

* refine code, test=develop

* follow comments, test=develop

* refine code, test=develop

b2c1be85

25 Feb, 2020 1 commit

PaddleBox Framework Part2 (#22466) · 175954d8

Committed by hutuxian on Feb 25, 2020

* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for the DynamicAdjustChannelNum function to indicate whether the remaining instances are discarded when they cannot be distributed evenly.
* Remove CPU code in Pull/PushSparse and we will add it back when testing it fully.
* Fix some known issues: such as copying persistable vars after one epoch running.

175954d8

24 Feb, 2020 1 commit

Add an inference interface to disable FC padding (#22097) · cdf5f6fb

Committed by GaoWei8 on Feb 24, 2020

* Add an interface of disabling FC padding
* fix bert regression
* polish fc padding interface
* recover pass function
* fix argument error
* fix mkldnn error

cdf5f6fb

23 Feb, 2020 1 commit
- T
  
  fix typo words (#22653) · d2ba91aa
  Committed by tianshuo78520a on Feb 23, 2020
  
  d2ba91aa
22 Feb, 2020 1 commit
- T
  SYNC with communicator (#22344) · 66a31501
  Committed by tangwei12 on Feb 22, 2020
```
* add sync communicator and implement
```
  66a31501
21 Feb, 2020 1 commit
- Y
  
  Add the support of fp16 in fusion_group (#22239) · 22bbd547
  Committed by Yiqun Liu on Feb 21, 2020
  
  22bbd547
18 Feb, 2020 1 commit
- W
  add flag to control profile level in python API (#22319) · c65c6ae5
  Committed by wangchaochaohu on Feb 18, 2020
```
* add python flag to control profile level test=develop
```
  c65c6ae5
17 Feb, 2020 1 commit
- 1
  
  support dumping params/grads in transpiler mode (#22490) · 00594c1c
  Committed by 123malin on Feb 17, 2020
  
  00594c1c
15 Feb, 2020 1 commit
- F
  
  remove python inference warning (#22602) · f7eafca8
  Committed by flame on Feb 15, 2020
  
  f7eafca8
14 Feb, 2020 1 commit

fix fc_lstm_fuse when multi sub-graph use same fc_bias. test=develop (#22551) · 9a8203aa

Committed by Wilber on Feb 14, 2020

When a model contains multiple fc_lstm sub-graphs whose fc ops share the same persistable bias, the bias node must not be deleted; only the non-persistable nodes should be removed.

9a8203aa
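
The rule in this fix (delete only non-persistable nodes, so that a bias shared by several fc_lstm sub-graphs survives) can be sketched as a simple filter. The function name and the `(name, persistable)` pair layout are hypothetical; the real pass works on Paddle's IR graph nodes.

```python
def nodes_to_remove(subgraph_nodes):
    """Return the names of nodes that are safe to delete after fusing.

    subgraph_nodes: list of (name, persistable) pairs (assumed layout).
    Persistable nodes, such as a shared fc bias, are kept because another
    fc_lstm sub-graph may still read them.
    """
    return [name for name, persistable in subgraph_nodes if not persistable]


subgraph = [("fc_out", False), ("fc_bias", True), ("lstm_in", False)]
removed = nodes_to_remove(subgraph)
```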

13 Feb, 2020 2 commits
- Z
  [Ernie GPU Optim]: Fuse three fc to multihead matmul (#22486) · 8acd745c
  Committed by Zhaolong Xing on Feb 13, 2020
```
* 1. optim multihead matmul: fuse three fc to multihead matmul

test=develop

* fix conflict
test=develop

* fix comments
test=develop
```
  8acd745c
- Y
  Disable fusion_group for windows and mac in build_strategy. (#22549) · 96770f51
  Committed by Yiqun Liu on Feb 13, 2020
```
test=develop
```
  96770f51
12 Feb, 2020 1 commit
- T
  fix bug with compiledProgram (#22495) · b0675c81
  Committed by tangwei12 on Feb 12, 2020
```
* add thread barrier for the compiled program
```
  b0675c81
11 Feb, 2020 5 commits

Paddlebox about box_wrapper (#22497) · 1a7962be

Committed by hutuxian on Feb 11, 2020

Refine PaddleBox Framework, Main functions: 
* Add MetricMsg util class, which can calculate metrics like AUC, bucket_error, COPC.
* Replace FeedPass with new interface: BeginFeedPass & EndFeedPass
* Refactor Pull/Push Sparse Function in box_wrapper.
* Use CUDA Kernel to copy keys and copy feasign between tensor and boxps struct.
* Cache copied keys in pull sparse in order to reuse it in push period.

1a7962be
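
The entry lists AUC and COPC among the metrics computed by the MetricMsg util class. As a minimal sketch of what those two numbers mean, the code below assumes the usual definitions: AUC as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half, and COPC as observed clicks divided by the sum of predicted click probabilities. It is not PaddleBox's implementation, and the function names are made up for the example.

```python
def auc(labels, preds):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [p for label, p in zip(labels, preds) if label == 1]
    neg = [p for label, p in zip(labels, preds) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct pair
    return wins / (len(pos) * len(neg))


def copc(labels, preds):
    """Observed clicks divided by the sum of predicted click probabilities."""
    return sum(labels) / sum(preds)


labels = [1, 0, 1, 0]
preds = [0.9, 0.2, 0.4, 0.5]
auc_value = auc(labels, preds)
copc_value = copc(labels, preds)
```

The quadratic pair loop is only suitable for small inputs; production code typically sorts or buckets the predictions instead.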

multi-loss optimization by adding a DownpourOpt worker (#22025) · 2235ee1a

Committed by yaoxuefeng on Feb 11, 2020

* update

* update test=develop

* update compile set test=develop

* update compile set test=develop

* update test=develop

* update test=develop

* update test=develop

* update compile setting test=develop

* update compile setting test=develop

* update run demo test=develop

* update test=develop

* update test=develop

* fix test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update format test=develop

* update format test=develop

* update style test=develop

* update style test=develop

* change style test=develop

* change style test=develop

* change style test=develop

* add dataset unittest test=develop

* update test=develop

* update for record test=develop

* update style for record test=develop

* update for record test=develop

* update for record test=develop

* update for record test=develop

* fix format test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

2235ee1a

Improve transpose performance with tile sm copy, test=develop (#22311) · 54970444

Committed by zhaoyuchen2018 on Feb 11, 2020


* Refine code, fix select tile error,test=develop

* Refine element type and some comments, test=develop

* Refine comments and gpu utils, test=develop

* Remove some useless condition

* Refine floor and ceil, test=develop

* refine for loop. test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>

54970444
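
The commit title refers to a tiled transpose that copies through shared memory ("sm") on the GPU. The CPU-side sketch below only illustrates the tiling idea: each tile is staged in a small buffer, which plays the role shared memory would play in a CUDA kernel, and is then written back transposed. The function name and the tile size are assumptions.

```python
def tiled_transpose(matrix, tile=2):
    """Transpose a rectangular list-of-lists one tile at a time."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r1, c1 = min(r0 + tile, rows), min(c0 + tile, cols)
            # Stage the tile in a small buffer (the shared-memory stand-in).
            buf = [matrix[r][c0:c1] for r in range(r0, r1)]
            # Write the staged tile back with row and column swapped.
            for i, row in enumerate(buf):
                for j, value in enumerate(row):
                    out[c0 + j][r0 + i] = value
    return out


m = [[1, 2, 3],
     [4, 5, 6]]
t = tiled_transpose(m)
```

The `min` calls handle edge tiles when the matrix dimensions are not multiples of the tile size.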

Compile without nccl deps. [1/2] (#22509) · a90fa540

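Committed by Wilber on Feb 11, 2020

Support compiling without the NCCL dependency. [1/2]

In a multi-card setup, if Paddle is compiled without the WITH_NCCL switch enabled, the cards cannot communicate with each other, so only one card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>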

a90fa540

G
Make assign op support LoDTensorArray and modify while_loop API (#22309) · 3a59a7a1
Committed by guofei on Feb 11, 2020
```
This PR makes assign op support LoDTensorArray and enable the loop_vars in
while_loop to support tuple or list.
```
3a59a7a1
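
The message says `loop_vars` in `while_loop` may now be a tuple or a list. The plain-Python sketch below shows that calling convention only; it runs eagerly and is not how Paddle builds a while op. The checks and error messages are assumptions made for the example.

```python
def while_loop(cond, body, loop_vars):
    """Eager illustration: loop_vars may be a list or a tuple."""
    if not isinstance(loop_vars, (list, tuple)):
        raise TypeError("loop_vars must be a list or a tuple")
    container = type(loop_vars)   # hand back the same container type
    current = list(loop_vars)
    while cond(*current):
        result = body(*current)
        if not isinstance(result, (list, tuple)):
            result = [result]     # allow a single returned value
        if len(result) != len(current):
            raise ValueError("body must return one value per loop variable")
        current = list(result)
    return container(current)


# i counts to 4 while s accumulates 0 + 1 + 2 + 3.
out_list = while_loop(lambda i, s: i < 4, lambda i, s: [i + 1, s + i], [0, 0])
out_tuple = while_loop(lambda i, s: i < 4, lambda i, s: (i + 1, s + i), (0, 0))
```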

07 Feb, 2020 1 commit

Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038

Committed by Yiqun Liu on Feb 07, 2020

* Add the first implementation of fusion_group op #19621 (#3)

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Enable generating code for a given subgraph. #21126 (#4)

* Enable generating code for a given subgraph.

* Support sorting the subgraph.

* Remove the rearrangement of expressions because we use the sorted subgraph directly.

* Enable generating code for a subgraph which is composed of grad ops.

* Use expression information to check the accuracy in unittest.

* Separate load and store from computation expressions.
test=develop

* Improve the loading statements in generated codes.
test=develop

* Remove unused arguments from formal list.
test=develop

* Enable the detection of subgraph of grad ops.

* Generate code for detected subgraph in fusion_group_pass.

* Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
test=develop

* Fix a bug when checking whether the shape of all inputs are the same.

* Add debug information.

* Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5)

test=develop

* Call subgraph_detector in fusion_group pass.
test=develop

* Disable fusion_group when WITH_GPU is OFF.
test=develop

* Refine all PADDLE_ENFORCE message.
test=develop

* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
test=develop

* Follow review comments.
test=develop

dcfb6038
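
Several bullets in this entry describe sorting a detected subgraph and then generating code for it. A toy version of those two steps is sketched below: the ops are ordered so that every input is computed before it is used, and one C-like statement is emitted per op. The templates, the `(output, op_type, inputs)` layout and the function names are hypothetical and far simpler than the real fusion_group code generator.

```python
# Hypothetical expression templates for a few elementwise ops.
TEMPLATES = {
    "elementwise_add": "{0} + {1}",
    "elementwise_mul": "{0} * {1}",
    "relu": "fmaxf({0}, 0.0f)",
}


def sort_ops(ops):
    """Order (output, op_type, inputs) triples so inputs come first.

    Assumes the subgraph is acyclic.
    """
    produced_by = {out: (out, op, ins) for out, op, ins in ops}
    ordered, done = [], set()

    def visit(name):
        if name in done or name not in produced_by:
            return  # already handled, or an external input of the subgraph
        done.add(name)
        for dep in produced_by[name][2]:
            visit(dep)
        ordered.append(produced_by[name])

    for out, _, _ in ops:
        visit(out)
    return ordered


def generate_body(ops):
    """Emit one statement per op, in dependency order."""
    return [
        "float {} = {};".format(out, TEMPLATES[op].format(*ins))
        for out, op, ins in sort_ops(ops)
    ]


# Deliberately listed out of order: t2 depends on t1, which depends on t0.
subgraph = [
    ("t2", "relu", ["t1"]),
    ("t1", "elementwise_mul", ["t0", "y"]),
    ("t0", "elementwise_add", ["x", "y"]),
]
body = generate_body(subgraph)
```

Because the statements come out in dependency order, they can be emitted directly without rearranging expressions afterwards.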

06 Feb, 2020 1 commit
- J
  Add dequant-scale squash (#22409) · 17f2c089
  Committed by joanna.wozna.intel on Feb 06, 2020
```
* Add dequant scale squash

test=develop

* Correct dequant-scale squash test

test=develop
```
  17f2c089
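05 Feb, 2020 1 commit

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

Committed by Wilber on Feb 05, 2020

Added WITH_NCCL to the cmake options to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned off if WITH_GPU is OFF.

Added the PADDLE_WITH_NCCL definition.

A single-machine, single-card build can disable NCCL compilation; multi-card use needs NCCL, which is on by default. If NCCL is turned off, only a single card can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>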

7bc4b095
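
The option behaviour described in this entry can be modelled in a few lines. The sketch below is only a Python illustration of that decision logic, not the cmake code itself; `resolve_with_nccl` and `usable_devices` are hypothetical helpers.

```python
def resolve_with_nccl(with_gpu, with_nccl=True):
    """WITH_NCCL defaults to ON but is forced OFF when WITH_GPU is OFF."""
    return with_nccl and with_gpu


def usable_devices(requested, with_nccl):
    """Without NCCL the cards cannot communicate, so only one card is used."""
    requested = list(requested)
    return requested if with_nccl else requested[:1]


gpu_default = resolve_with_nccl(with_gpu=True)
cpu_only = resolve_with_nccl(with_gpu=False)
gpu_no_nccl = resolve_with_nccl(with_gpu=True, with_nccl=False)
cards = usable_devices([0, 1, 2, 3], gpu_no_nccl)
```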

04 Feb, 2020 2 commits
- X
  fix copy table bug (#22432) · d51ffe86
  Committed by xujiaqi01 on Feb 04, 2020
```
* fix copy table bug of lost some feasign
* test=develop
```
  d51ffe86
- 石
  
  remove anakin from code, test=develop (#22420) · e1b0d7cb
  Committed by 石晓伟 on Feb 04, 2020
  
  e1b0d7cb
02 Feb, 2020 1 commit
- X
  add GeneralRoleMaker (#22295) · 371f377b
  Committed by xujiaqi01 on Feb 02, 2020
```
* add GeneralRoleMaker which is for general usage
* test=develop
```
  371f377b
31 Jan, 2020 1 commit

[DNNL] Fix accuracy in INT8 FC (#22404) · 269db0d1

Committed by Michał Gallus on Jan 31, 2020

* Enable quantize to reorder to nchw as well

* Correct FC MKL-DNN input dim requirements to accept 3D

* Improve DNNL FC format, error and 3D input handling

test=develop

* Improve error checking in FC

test=develop

* Improve PADDLE_ENFORCE messages in fc-related files

* Remove data layout attribute from obligatory pass args

test=develop

* Fix message in fc_mkldnn_pass to be logically correct

test=develop

269db0d1

25 Jan, 2020 1 commit
- J
  
  Restore requantize squash (#22399) · 3099d9d4
  Committed by joanna.wozna.intel on Jan 25, 2020
  
  3099d9d4
19 Jan, 2020 1 commit
- A
  
  [Bugfix] Preserve shape in in-place operators (#22360) · e7a9f6bb
  Committed by Adam on Jan 19, 2020
  
  e7a9f6bb
17 Jan, 2020 2 commits

Implement a common python unittest to test the ir passes. (#22209) · b7cac50b

Committed by Yiqun Liu on Jan 17, 2020

* Implement a common python unittest to test the ir passes.
test=develop

* Save the results in np.array and support to startup on CPU.
test=develop

* Fix the unittest.
test=develop

* Add check_program to check whether the optimized program is different from the origin one.
test=develop

* Remove the interface all_ops.
test=develop

* Add exception test in pass_test.
test=develop

b7cac50b

T
integrated HALF_ASYNC to communicator (#21869) · 82bc814a
Committed by tangwei12 on Jan 17, 2020
```
* add half_async in the communicator
* fix DistributedStrategy
```
82bc814a

16 Jan, 2020 2 commits

Remove unused inputs for some operators (#22284) · 3e5744aa

Committed by Leo Chen on Jan 16, 2020

* remove unused inputs, test=develop

* remove unused inputs, test=develop

* update dtype, test=develop

* remove unused inputs, test=develop

* update op_use_default_grad_op_maker, test=develop

* resolve conflicts, test=develop

* follow comments, test=develop

* update center_loss_grad, test=develop

3e5744aa

L

change std::cout to log(INFO), vlog (#22316) · 895f8da7
Committed by lidanqing on Jan 16, 2020

895f8da7

15 Jan, 2020 1 commit
- Z
  
  fix the bug of assert_is_op_output. test=develop (#22262) · e40cfb10
  Committed by Zhen Wang on Jan 15, 2020
  
  e40cfb10
14 Jan, 2020 3 commits
- W
  
  improve placement pass tests code coverage (#22197) · d3a66473
  Committed by Wojciech Uss on Jan 14, 2020
  
  d3a66473
- Z
  faster build by reduce by-product, reduce linking library and fix compile... · 549e6de7
  Committed by zhouwei25 on Jan 14, 2020
```
faster build by reduce by-product, reduce linking library and fix compile warning of std=c++11 (#22164)
```
  549e6de7
- X
  add collective communication library in fleet (#22211) · e3a457d3
  Committed by xujiaqi01 on Jan 14, 2020
```
* add collective communication library in fleet to replace mpi
* test=develop
```
  e3a457d3

PaddlePaddle / Paddle synced successfully about 1 year ago
