Commits · 0a4002f5dcd901cf799952153fa7c0eb418a3a56 · 机器未来 / Paddle

06 Dec, 2019 · 1 commit

CHERRY_PICK: Better TensorRT support (#20858) (#21578) · 0a4002f5

Committed by Zhaolong Xing on Dec 06, 2019

* Fix TensorRT detection bug

1. Add a new search path for TensorRT in tensorrt.cmake
2. Add a better debug message
3. Fix the TensorRT version detection bug

In the official NVIDIA Docker image, TensorRT headers are located at
`/usr/include/x86_64-linux-gnu` and TensorRT libraries are located
at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` will
fail to detect TensorRT.

No debug or warning message tells the developer that TensorRT
detection failed.

In later versions of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is
defined in `NvInferVersion.h` instead of `NvInfer.h`, so add a
compatibility fix.

* Fix TensorRT variables in CMake

1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}`
2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}`

A manually typed path may point to the wrong TensorRT location. Use the
paths detected by the system instead.

* Fix TensorRT library path

1. Add new variable - `${TENSORRT_LIBRARY_DIR}`
2. Fix TensorRT library path

inference_lib.cmake and setup.py.in need the directory that contains the
TensorRT library rather than the library file itself, so add a new variable for it.

* Add more general search rule for TensorRT

Let the system detect the architecture instead of assigning it manually, so
replace `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`.

* Add more general search rule for TensorRT

Remove duplicate search rules for TensorRT libraries. Use
`${TENSORRT_LIBRARY_DIR}` to get the full path of libnvinfer.so

test=release/1.6
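
The version-detection fallback described in this commit can be illustrated outside CMake. The following is only a rough Python sketch of the idea, not the logic in tensorrt.cmake: the helper name `detect_tensorrt_major` and the `demo` function are hypothetical, and the header contents written by the demo are stand-ins for the real TensorRT headers.

```python
import os
import re
import tempfile

# Hypothetical helper for illustration; Paddle does this in CMake, not Python.
_MAJOR_RE = re.compile(r"^\s*#\s*define\s+NV_TENSORRT_MAJOR\s+(\d+)", re.MULTILINE)


def detect_tensorrt_major(include_dir):
    """Return the TensorRT major version found in include_dir, or None."""
    # Later releases (e.g. v6) define the macro in NvInferVersion.h,
    # earlier ones in NvInfer.h, so try both headers in that order.
    for header in ("NvInferVersion.h", "NvInfer.h"):
        path = os.path.join(include_dir, header)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="ignore") as f:
            match = _MAJOR_RE.search(f.read())
        if match:
            return int(match.group(1))
    return None


def demo():
    """Build two fake include directories and detect the version in each."""
    with tempfile.TemporaryDirectory() as old_dir, \
            tempfile.TemporaryDirectory() as new_dir:
        # Older layout: the macro lives directly in NvInfer.h.
        with open(os.path.join(old_dir, "NvInfer.h"), "w") as f:
            f.write("#define NV_TENSORRT_MAJOR 5\n")
        # Newer layout: NvInfer.h only includes NvInferVersion.h.
        with open(os.path.join(new_dir, "NvInfer.h"), "w") as f:
            f.write('#include "NvInferVersion.h"\n')
        with open(os.path.join(new_dir, "NvInferVersion.h"), "w") as f:
            f.write("#define NV_TENSORRT_MAJOR 6\n")
        return detect_tensorrt_major(old_dir), detect_tensorrt_major(new_dir)


if __name__ == "__main__":
    print(demo())
```

Returning `None` when neither header yields a match leaves room for the caller to print the kind of warning this commit adds, instead of failing silently.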

05 Dec, 2019 · 2 commits

construct a DistributedStrategy instance if the passed one is None (#21545) (#21567) · 0dfb5c94

Committed by lilong12 on Dec 05, 2019

[Cherry-pick] fix the computation for dx (grad for x) for prelu operation. (#20949) (#21514) · 40549473

Committed by lilong12 on Dec 05, 2019

* fix the computation for dx (grad for x) for prelu operation. (#20949)

* set the default value of alpha for prelu to 0.25, test=develop

* add the call to __syncthreads(), test=develop

* fix the implementation of cpu prelu, test=develop

* repair the implementation of element mode prelu, test=develop

* modify test_prelu_op.py, test=develop
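
For reference, the quantity this commit corrects (dx, the gradient with respect to x) can be written out for the simplest case. This is a minimal pure-Python sketch assuming a single shared alpha and the textbook PReLU definition; the function names are hypothetical, and it is not the CPU or CUDA kernel touched by the commit.

```python
def prelu_forward(xs, alpha=0.25):
    """y = x for x > 0, alpha * x otherwise (alpha defaults to 0.25)."""
    return [x if x > 0 else alpha * x for x in xs]


def prelu_backward(xs, douts, alpha=0.25):
    """Return (dx, dalpha) for upstream gradients douts."""
    # dx passes the gradient through unchanged where x > 0 and scales
    # it by alpha elsewhere.
    dx = [dout if x > 0 else alpha * dout for x, dout in zip(xs, douts)]
    # alpha only influences outputs where x <= 0, so only those
    # positions contribute to its gradient.
    dalpha = sum(dout * x for x, dout in zip(xs, douts) if x <= 0)
    return dx, dalpha


if __name__ == "__main__":
    print(prelu_forward([-2.0, 3.0]))
    print(prelu_backward([-2.0, 3.0], [1.0, 1.0]))
```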

04 Dec, 2019 · 4 commits

Refactor fetch handler (#21264) (#21537) · 87a8caa8

Committed by tangwei12 on Dec 04, 2019

* fix fetch handler problem and refactor
When users define a FetchHandler class, they should initialize the handler
with a variable dict. Each key of the variable dict is a user-defined name,
and each value is a Variable generated from the Python API.

For each fetch, users should implement the handler function, in which
fetched_result_dict is available, and can access the fetched values
with the user-defined keys.
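
The calling pattern described above might look roughly like the following. This is a self-contained sketch of the pattern only, not Paddle's FetchHandler class: `FetchHandlerSketch`, `CollectLoss`, and `run_fetch` are hypothetical names, and plain strings and floats stand in for Variables and fetched tensors.

```python
class FetchHandlerSketch:
    """Holds a dict mapping user-defined names to variables to fetch."""

    def __init__(self, var_dict):
        self.var_dict = var_dict

    def handler(self, fetched_result_dict):
        # Subclasses decide what to do with each batch of fetched values.
        raise NotImplementedError


class CollectLoss(FetchHandlerSketch):
    """Example subclass that records the value stored under the key 'loss'."""

    def __init__(self, var_dict):
        super().__init__(var_dict)
        self.seen = []

    def handler(self, fetched_result_dict):
        # The keys are the same user-defined names passed to __init__.
        self.seen.append(fetched_result_dict["loss"])


def run_fetch(fetch_handler, read_value):
    """Simulate one fetch: resolve every variable, then call the handler."""
    fetched = {name: read_value(var)
               for name, var in fetch_handler.var_dict.items()}
    fetch_handler.handler(fetched)


if __name__ == "__main__":
    h = CollectLoss({"loss": "mean_0.tmp_0"})
    run_fetch(h, {"mean_0.tmp_0": 0.5}.get)
    print(h.seen)
```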

Fix dgc clip & rampup step, test=release/1.6 (#21519) · 3f1169fe

Committed by WangXi on Dec 04, 2019

[cherry pick] Conv2d and Conv2d transpose MKL-DNN NHWC support (#21525) · 0e63746b

Committed by bingyanghuang on Dec 04, 2019

Pick disable reshape inplace in dygraph (#21486) · 32a0eb50

Committed by hong on Dec 04, 2019

* disable reshape inplace in dygraph model; test=develop (#21157)

* fix ExecutionContext::HasInput and ExecutionContext::HasOutput depend on the scope structure, test=develop (#20721)

03 Dec, 2019 · 7 commits
- set dim[0] to -1 if dim[0] < 0 during compiling for c_allgather op (#21402) (#21512) · df2b4002
  Committed by lilong12 on Dec 03, 2019
```
* set dim[0] to -1 if dim[0] < 0 and remove assertion to runtime, test=develop
```
- [cherry-pick] Improve argsort performance. (#21267) (#21442) · 66c18f4a
  Committed by zhaoyuchen2018 on Dec 03, 2019
```
* Improve argsort performance.

- argsort over 200000 elements on a V100
speeds up ~190x
cost before optimization: 0.53s
cost after optimization: 0.0027s

- Add fp16 support

* Refine error message
* Refine code
* Add descending sort

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
```
- [cherry-pick] add Adam beta1/beta2 support Variable (#21433) · 735a2db0
  Committed by Kaipeng Deng on Dec 03, 2019
```
* add Adam beta1/beta2 support Variable. test=develop
```
- [cherry-pick] Add Asypadding for conv fusion. (#21041) (#21439) · 2660107c
  Committed by zhaoyuchen2018 on Dec 03, 2019
```
* Add Asypadding for conv fusion.

test=develop

reference: pr/20042

* Fix eigen build link error

* Change back file mode

* Use math function & add more checks.
```
- add the framework support for distfc (#21197) (#21463) · e06f4439
  Committed by lilong12 on Dec 03, 2019
```
* add the framework support for distfc and ut, test=develop
* fix the implementation of shard_index_op, test=develop
```
- [cherry-pick] add bn momentum variable (#21435) · 9c63b7c1
  Committed by Kaipeng Deng on Dec 03, 2019
```
* batch_norm momentum support variable. test=develop
```
- cherry-pick LRN and Pool2d (FWD) NHWC support (#21476) · ccb508dc
  Committed by bingyanghuang on Dec 03, 2019

02 Dec, 2019 · 1 commit

[cherry-pick] find lookup table in order & support dump param (#21347) · 893ea7e0

Committed by Thunderbrook on Dec 02, 2019

* support dump param of model into afs (#20302)

* support dump param to afs
test=develop

* code style
test=develop

* code style
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

* find lookup table in order (#20932)

test=develop

* cherry-pick
test=develop

* solve pslib core in stop worker
test=develop

* print table stat info for pslib
test=develop

29 Nov, 2019 · 2 commits
- Fix dgc accuracy by mv regularization to local, test=release/1.6 (#21390) · 6ce49eea
  Committed by WangXi on Nov 29, 2019
- Fp32 vs int8 qat C++ performance (#21244) (#21432) · 06545fcf
  Committed by Wojciech Uss on Nov 29, 2019

28 Nov, 2019 · 1 commit

cherry-pick1.6 fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21339) · 072eb5b6

Committed by xujiaqi01 on Nov 28, 2019

* fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052)

* fix cache table bug
* add save_paddle_inference_model
* fix hdfs util bug
* test=develop

* fix several sparse table issues (#20686)

* no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto.
* add find_distributed_lookup_table_grads instead of hard code GRAD
* support embedding stop gradient. push sparse raised an error before this fix.
* fix fill sparse, skip slots which do not have embedding. before this fix, each slot's embedding in a sparse table had to be used in all training programs.
* fix pull sparse, skip slots which do not have embedding.
* fix collect feasign label info, skip slots which do not have embedding.
* support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables.
* test=develop

* add copy table (#21086)

* copy some feasigns and corresponding embeddings from one sparse table to another
* copy all feasigns and corresponding embeddings from one sparse table to another
* copy all dense params from one table to another
* copy some local vars to other local vars

* fix fs_client_param bug (#21212)

* fix fs_client_param bug, user can set this config through fleet_desc_file or fleet config
* test=develop

* fix fleet util bug (#21254)

* fix fleet util bug in save paddle inference model
* test=develop

26 Nov, 2019 · 4 commits
- [Cherry pick] instance_norm, gradients and batch_norm (#21301) · 97bbab47
  Committed by Lv Mengsi on Nov 26, 2019
```
* Fix gradients (#20857)

* fix_gradients

* fix_gradients, test=develop

* fix instance norm (#21042)

* fix instance norm

* update unittest, test=develop

* fix_bn

* revert unittest,test=develop
```
- [cherry-pick] Refactor mkldnn eletwise_mul and error message for NHWC in mkldnn (#21361) · 03dda317
  Committed by bingyanghuang on Nov 26, 2019
- [Cherry-pick 1.6] Fix dgc buffer illegal & reuse velocity & fix fuse (#21281) · 93c7f058
  Committed by WangXi on Nov 26, 2019
- Fix INF bug of softmax_cross_entropy_op, test=release/1.6 (#21283) · 3423f0b6
  Committed by WangXi on Nov 26, 2019

25 Nov, 2019 · 3 commits

cherry-pick error info check of Print_op for release1.6 (#21349) · 9a98d11e

Committed by lijianshe02 on Nov 25, 2019

* add input type and input data type check for Print_op test=develop (#21250)

* add input type and input data type check for Print_op test=develop

* cherry-pick error info check of Print_op for release1.6 test=develop

* cherry-pick error info check of Print_op for release1.6 test=develop

fix bug of issue #21259 (#21331) · da9752fe

Committed by Yi Liu on Nov 25, 2019

* fix bug of issue #21259 (#21287)
Pass the `allow_out_of_range` argument of the one_hot op through to the C++ back end.
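
To show why a dropped flag matters, here is a small pure-Python sketch of one-hot encoding with such a switch. It assumes that `allow_out_of_range=True` turns an out-of-range index into an all-zero row instead of an error; the function below is a hypothetical stand-in and not the Paddle operator.

```python
def one_hot(indices, depth, allow_out_of_range=False):
    """Encode each index as a row of length depth with a single 1.0."""
    rows = []
    for idx in indices:
        row = [0.0] * depth
        if 0 <= idx < depth:
            row[idx] = 1.0
        elif not allow_out_of_range:
            # Without the flag, an out-of-range index is treated as an error.
            raise ValueError(
                "index %d is out of range for depth %d" % (idx, depth))
        # With the flag set, the row simply stays all zeros.
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(one_hot([1, 3], 3, allow_out_of_range=True))
```

If the Python layer accepted the argument but never forwarded it, the second call style would behave like the first and raise, which matches the kind of bug the commit title points at.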

[cherry-pick] fix crop_tensor, maxout and lrn (#21302) · 3848f720

Committed by Zhang Ting on Nov 25, 2019

* [cherry-pick] All elements in attr(shape) of crop_tensor can be -1 and int32/64 kernel registered (#20756)

* All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview

* fix the bug that attr(offsets) should be initialized, test=develop

* [cherry-pick] maxout supports channel_last input (#20846)

* maxout support channel_last input, test=develop

* modified details of Input(X) and Attr(groups, axis) in doc, test=develop

* [cherry-pick] lrn supports channel_last input, test=develop (#20954)

23 Nov, 2019 · 2 commits
- add mkldnn include. test=develop (#21314) · f9cbe3bd
  Committed by Kaipeng Deng on Nov 23, 2019
- [cherry-pick] fix elementwise mod (#21315) · 5e35e5ea
  Committed by Kaipeng Deng on Nov 23, 2019
```
* fix elementwise_mod FP kernel. test=develop

* fix unittest. test=develop
```
21 Nov, 2019 · 1 commit

[cherry-pick]fix bug in pool/conv/conv_transpose: UpdatePaddingAndDilation,... · 7ab85396

Committed by liym27 on Nov 21, 2019

[cherry-pick]fix bug in pool/conv/conv_transpose: UpdatePaddingAndDilation, _get_padding_with_SAME and conv2dtranspose_forward_naive. (#20997) (#21225)

* fix bug in pool/conv/conv_transpose:
    1. It should be stride[i] not stride[0] in UpdatePaddingAndDilation;
    2. fix bug of func  _get_padding_with_SAME in test_conv/conv_transpose_op.py;
    3. fix bug of the computation process in function conv2dtranspose_forward_naive.
    test=release/1.6
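
The first item above (stride[i] versus stride[0]) is easiest to see with a per-dimension padding computation. The sketch below assumes the common "SAME" rule (output size is ceil(input / stride), dilation 1) and uses a hypothetical function name; it is not the body of UpdatePaddingAndDilation.

```python
def same_padding(in_dims, strides, ksize):
    """Return a (before, after) padding pair for every spatial dimension."""
    pads = []
    for i, size in enumerate(in_dims):
        # Each dimension must use its own stride: strides[i], not strides[0].
        out = (size + strides[i] - 1) // strides[i]  # ceil(size / stride)
        pad_sum = max((out - 1) * strides[i] + ksize[i] - size, 0)
        before = pad_sum // 2
        pads.append((before, pad_sum - before))
    return pads


if __name__ == "__main__":
    # A 5x8 input, strides (1, 2), and a 3x3 kernel.
    print(same_padding([5, 8], [1, 2], [3, 3]))
```

With strides (1, 2) the second dimension needs one padding element in total. Reusing strides[0] there would compute two, which only goes unnoticed when all strides are equal.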

14 Nov, 2019 · 1 commit
- fix error message in expand API, and fix two error unit-tests (#21180) · cdb81264
  Committed by Tao Luo on Nov 14, 2019
```
test=release/1.6
```
11 Nov, 2019 · 1 commit
- Disable cudnn_conv in Parallel Executor unit tests. (#21083) · e7d5e0ea
  Committed by Huihuang Zheng on Nov 11, 2019
```
TODO: fix cudnn_conv and re-enable it

test=develop
test=release/1.6
```
07 Nov, 2019 · 2 commits

[cherry-pick] Add support for asymmetric padding in MKLDNN pool, conv and conv_transpose (#21072) · e8890031

Committed by Adam on Nov 07, 2019

* Add asymmetric padding support for mkldnn pooling
test=develop

* Add asymmetric padding support for mkldnn conv
test=develop

* Add asymmetric padding support for mkldnn conv_transpose
test=develop

fix uniform random (#21009) (#21057) · e112ea2b
Committed by hong on Nov 07, 2019
```
* fix uniform random; test=develop

* add uniform random test; test=develop
```
06 Nov, 2019 · 1 commit
- [Cherry-pick] 21028: Remove fuse_with_relu argument from batch_norm constructor (#21049) · 4c56586a
  Committed by bingyanghuang on Nov 06, 2019

01 Nov, 2019 · 7 commits

Cherry pick bug fix for Ops: reshape,concat, split and squeeze (#20929) · 33d7aae1

Committed by liym27 on Nov 01, 2019

* [cherry-pick]fix bug in reshape: (#20781)

consider the situation that shape of input can contain more than one -1.

* [cherry-pick]support Tensor for split and concat, support -1 in num_or_sections, add check num_or_sections (#20780)

* improve split and concat op:
1. support Tensor for argument 'dim' in split op.
2. support Tensor for argument 'axis' in concat op.
* redefine function GetDataFromTensor and set unknown output shape to -1.
* add check: Attr(sections) match Input(X).
* support Tensor for attr(sections) and attr(sections) can contain -1.
* modify error message and fix bug for concat and call Resize only when necessary.
test=release/1.6

* [cherry-pick]improve unsqueeze op to support int, Tensor for argument axes (#20824)

* improve unsqueeze op to support int, Tensor and Tensor list for argument axes.
* call Resize only when necessary. test=release/1.6

* [cherry-pick]Compatible int32 and int64 for attr in concat/split/unsqueeze. test=release/1.6 (#20912)

[Cherry-pick 1.6] Print the rank of trainer & remove nccl sync in launch.py (#20937) · de130e95

Committed by WangXi on Nov 01, 2019

cherry-pick1.6 simplify master+patch, remove ins when size != merge_size or has... · 3db61dc0

Committed by xujiaqi01 on Nov 01, 2019

cherry-pick1.6 simplify master+patch, remove ins when size != merge_size or has conflict slot (#20941)

* simplify master+patch, remove ins when size != merge_size or has conflict slot
* test=develop

add check nan / inf in downpour worker (#20694) (#20925) · 5c3656bb
Committed by xujiaqi01 on Nov 01, 2019
```
* add check nan / inf in downpour worker during training
* test=develop
```
Optimize decay (#20816) (#20952) · 781d2844
Committed by 123malin on Nov 01, 2019
```
* update pserver decay blocks

* update distributed notify handler
```
[cherry-pick] keep the size of symmetric padding is 2 for 2d and 3 for 3d.... · 55c2329a
Committed by liym27 on Nov 01, 2019
```
[cherry-pick] keep the size of symmetric padding is 2 for 2d and 3 for 3d. test=release/1.6 (#20903) (#20939)
```
[Cherry-pick]Cherry pick paddle cloud role maker (#20947) · 0b429a22
Committed by Chengmo on Nov 01, 2019
```
* Fix Paddle Cloud role maker (#20860)
```