Commits · 0a4002f5dcd901cf799952153fa7c0eb418a3a56 · 机器未来 / Paddle

06 Dec, 2019 · 1 commit

CHERRY_PICK: Better TensorRT support (#20858) (#21578) · 0a4002f5

Committed by Zhaolong Xing on Dec 06, 2019

* Fix TensorRT detection bug

1. Add new search path for TensorRT at tensorrt.cmake
2. Add better debug message
3. Fix the bug of detection of TensorRT version

In the official NVIDIA Docker image, TensorRT headers are located at
`/usr/include/x86_64-linux-gnu` and TensorRT libraries are located
at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` will
fail to detect TensorRT.

There is no debug/warning message to tell the developer that TensorRT
detection failed.

In later versions of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is
defined in `NvInferVersion.h` instead of `NvInfer.h`, so add a
compatibility fix.
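
The version-detection fallback described above can be sketched in Python. This is only an illustration of the idea, not the logic in tensorrt.cmake: the function name `detect_tensorrt_major`, the regular expression, and the order in which the two headers are tried are all assumptions made for this example.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a line such as: #define NV_TENSORRT_MAJOR 6
_MAJOR_RE = re.compile(
    r"^\s*#\s*define\s+NV_TENSORRT_MAJOR\s+(\d+)", re.MULTILINE
)


def detect_tensorrt_major(include_dir: str) -> Optional[int]:
    """Return the TensorRT major version found under include_dir, or None.

    Hypothetical helper: tries NvInferVersion.h first (where newer releases
    define the macro), then falls back to NvInfer.h (older releases).
    """
    for header in ("NvInferVersion.h", "NvInfer.h"):
        path = Path(include_dir) / header
        if not path.is_file():
            continue
        match = _MAJOR_RE.search(path.read_text(errors="ignore"))
        if match:
            return int(match.group(1))
    # Neither header defines the macro: report failure explicitly so the
    # caller can print a warning instead of silently continuing.
    return None
```

Returning `None` rather than raising leaves it to the caller to emit the kind of warning message the commit adds.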

* Fix TensorRT variables in CMake

1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}`
2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}`

A manually typed path may point to the wrong TensorRT location. Use the
paths detected by the system instead.

* Fix TensorRT library path

1. Add new variable - `${TENSORRT_LIBRARY_DIR}`
2. Fix TensorRT library path

inference_lib.cmake and setup.py.in need the directory of the TensorRT library
rather than the library file itself, so add a new variable to fix it.

* Add more general search rule for TensorRT

Let the system detect the architecture instead of assigning it manually, so
replace `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`.

* Add more general search rule for TensorRT

Remove duplicate search rules for TensorRT libraries. Use
`${TENSORRT_LIBRARY_DIR}` to get full path of libnvinfer.so

test=release/1.6

0a4002f5

05 Dec, 2019 · 5 commits
- cherry-pick fix muting glog warning message, test=release/1.6 (#21576) · a6433f8b
  Committed by Pei Yang on Dec 05, 2019
  
  a6433f8b
- construct a DistributedStrategy instance if the passed one is None (#21545) (#21567) · 0dfb5c94
  Committed by lilong12 on Dec 05, 2019
  
  0dfb5c94
- fix xbyak control by -DWITH_XBYAK,test=develop (#21560) · 2afe928a
  Committed by zhouwei25 on Dec 05, 2019
  
  2afe928a
- [Cherry-pick] fix the computation for dx (grad for x) for prelu operation. (#20949) (#21514) · 40549473
  Committed by lilong12 on Dec 05, 2019
```
* fix the computation for dx (grad for x) for prelu operation. (#20949)

* set the default value of alpha for prelu to 0.25, test=develop

* add the call to __syncthreads(), test=develop

* fix the implementation of cpu prelu, test=develop

* repair the implementation of element mode prelu, test=develop

* modify test_prelu_op.py, test=develop
```
  40549473
- let WITH_XBYAK be adjustable by -D when running cmake,test=develop (#21538) · e3dd13b1
  Committed by zhouwei25 on Dec 05, 2019
  
  e3dd13b1
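
For the prelu fix listed above (40549473), the standard PReLU definition gives y = x for x > 0 and y = alpha * x otherwise, so the gradient with respect to x is dout for x > 0 and alpha * dout otherwise. The sketch below is a plain-Python illustration of that formula, not Paddle's CPU or CUDA kernel. The function names are hypothetical, a single shared alpha is assumed, and treating x == 0 as the negative branch is a choice made for this example. The default alpha = 0.25 follows the commit message.

```python
from typing import List


def prelu_forward(x: List[float], alpha: float = 0.25) -> List[float]:
    """PReLU with one shared alpha: y = x if x > 0 else alpha * x."""
    return [v if v > 0 else alpha * v for v in x]


def prelu_grad_x(
    x: List[float], dout: List[float], alpha: float = 0.25
) -> List[float]:
    """Gradient w.r.t. x: dx = dout if x > 0 else alpha * dout.

    Hypothetical reference helper; x == 0 is routed to the alpha branch
    here, which is an assumption of this sketch.
    """
    if len(x) != len(dout):
        raise ValueError("x and dout must have the same length")
    return [g if v > 0 else alpha * g for v, g in zip(x, dout)]
```

A reference like this is mainly useful for checking a kernel's output in a unit test.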
04 Dec, 2019 · 6 commits

make config option DisableGlogInfo() able to mute all inference logs (#21544) · 857cd9f8
Committed by Pei Yang on Dec 04, 2019
```
make config option DisableGlogInfo() able to mute all inference logs
```
857cd9f8

Refactor fetch handler (#21264) (#21537) · 87a8caa8

Committed by tangwei12 on Dec 04, 2019

* fix fetch handler problem and refactor
When a user defines a FetchHandler class, they should initialize the handler
with a variable dict. Each key of the variable dict is a user-defined name,
and each value is a Variable generated from the Python API.

For each fetch, the user should implement a handler function in which
fetched_result_dict is available, so the fetched values can be accessed
with the user-defined keys.
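
The pattern described above can be sketched without Paddle. Everything here is a hypothetical stand-in written for illustration: `FetchHandlerSketch`, `LossCollector`, and `run_fetch` are not Paddle classes or functions, and the real FetchHandler interface may differ in its constructor arguments and scheduling.

```python
from typing import Any, Callable, Dict, List


class FetchHandlerSketch:
    """Hypothetical stand-in for a fetch handler base class."""

    def __init__(self, var_dict: Dict[str, Any]) -> None:
        # Keys are user-defined names; values stand in for framework variables.
        self.var_dict = dict(var_dict)

    def handler(self, fetched_result_dict: Dict[str, Any]) -> None:
        raise NotImplementedError("subclasses implement handler()")


class LossCollector(FetchHandlerSketch):
    """User-defined handler that records the value fetched under 'loss'."""

    def __init__(self, var_dict: Dict[str, Any]) -> None:
        super().__init__(var_dict)
        self.seen: List[Any] = []

    def handler(self, fetched_result_dict: Dict[str, Any]) -> None:
        # Values are looked up with the same keys used in var_dict.
        self.seen.append(fetched_result_dict["loss"])


def run_fetch(h: FetchHandlerSketch, fetch_fn: Callable[[Any], Any]) -> None:
    """Stand-in for the framework side: fetch each variable, then call back."""
    results = {name: fetch_fn(var) for name, var in h.var_dict.items()}
    h.handler(results)
```

For example, `run_fetch(LossCollector({"loss": "loss_var"}), lambda var: 0.5)` would hand the handler a dict keyed by `"loss"`.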

87a8caa8

[cherry-pick] NV JETSON support and auto_growth strategy for inference. (#21500) · 20a09375
Committed by Zhaolong Xing on Dec 04, 2019
```
* ADD NV JETSON SUPPORT
test=release/1.6

* CHERRY_PICK: specify the auto growth allocator for inference.
test=release/1.6
```
20a09375

Fix dgc clip & rampup step, test=release/1.6 (#21519) · 3f1169fe
Committed by WangXi on Dec 04, 2019

3f1169fe

[cherry pick] Conv2d and Conv2d transpose MKL-DNN NHWC support (#21525) · 0e63746b
Committed by bingyanghuang on Dec 04, 2019

0e63746b

Pick disable reshape inplace in dygraph (#21486) · 32a0eb50

Committed by hong on Dec 04, 2019

* disable reshape inplace in dygraph model; test=develop (#21157)

* fix ExecutionContext::HasInput and ExecutionContext::HasOutput depend on the scope structure, test=develop (#20721)

32a0eb50

03 Dec, 2019 · 11 commits
- set dim[0] to -1 if dim[0] < 0 during compiling for c_allgather op (#21402) (#21512) · df2b4002
  Committed by lilong12 on Dec 03, 2019
```
* set dim[0] to -1 if dim[0] < 0 and remove assertion to runtime, test=develop
```
  df2b4002
- Fix transpose conv (#21406), test=release/1.6 (#21510) · 1fbc45b7
  Committed by Lv Mengsi on Dec 03, 2019
```
* fix transpose conv,test=develop

* fix comments
test=develop
```
  1fbc45b7
- [cherry-pick] Improve argsort performance. (#21267) (#21442) · 66c18f4a
  Committed by zhaoyuchen2018 on Dec 03, 2019
```
* Improve argsort performance.

- Give 200000 data to compute argsort on v100,
can speed up ~190x
before opt cost: 0.53s
after opt cost:0.0027s

- Add fp16 support

* Refine error message
* Refine code
* Add descending sort

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
```
  66c18f4a
- [cherry-pick] add Adam beta1/beta2 support Variable (#21433) · 735a2db0
  Committed by Kaipeng Deng on Dec 03, 2019
```
* add Adam beta1/beta2 support Variable. test=develop
```
  735a2db0
- [cherry-pick] Add Asypadding for conv fusion. (#21041) (#21439) · 2660107c
  Committed by zhaoyuchen2018 on Dec 03, 2019
```
* Add Asypadding for conv fusion.

test=develop

reference: pr/20042

* Fix eigen build link error

* Change back file mode

* Use math function & add more checks.
```
  2660107c
- add the framework support for distfc (#21197) (#21463) · e06f4439
  Committed by lilong12 on Dec 03, 2019
```
* add the framework support for distfc and ut, test=develop
* fix the implementation of shard_index_op, test=develop
```
  e06f4439
- [cherry-pick] add bn momentum variable (#21435) · 9c63b7c1
  Committed by Kaipeng Deng on Dec 03, 2019
```
* batch_norm momentum support variable. test=develop
```
  9c63b7c1
- revert ProgOptimUnsupported check, test=release/1.6 (#21475) · 5c7c6b1e
  Committed by 石晓伟 on Dec 03, 2019
  
  5c7c6b1e
- show shape diff in wrong trt input shape errmsg, test=develop (#21451) (#21470) · badaaee6
  Committed by Pei Yang on Dec 03, 2019
  
  badaaee6
- cherry-pick LRN and Pool2d (FWD) NHWC support (#21476) · ccb508dc
  Committed by bingyanghuang on Dec 03, 2019
  
  ccb508dc
- cherry-pick fix shape check in density_prior_box, test=release/1.6 (#21474) · 9ab738aa
  Committed by wangguanzhong on Dec 03, 2019
  
  9ab738aa
02 Dec, 2019 · 5 commits

[cherry-pick] find lookup table in order & support dump param (#21347) · 893ea7e0

Committed by Thunderbrook on Dec 02, 2019

* support dump param of model into afs (#20302)

* support dump param to afs
test=develop

* code style
test=develop

* code style
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

* find lookup table in order (#20932)

test=develop

* cherry-pick
test=develop

* solve pslib core in stop worker
test=develop

* print table stat info for pslib
test=develop

893ea7e0

[cherry-pick] Improve topk performance. (#21087) (#21441) · 5dbe9e59

Committed by zhaoyuchen2018 on Dec 02, 2019

* Improve topk performance.

give 200000 data to compute topk,
before opt: cost 1s
after opt: cost 0.0028s.

* Refine return value.
* Add cuda util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

5dbe9e59

[cherry-pick] Fix multihead op bug. (#20783) (#21438) · 2f0f10b3

Committed by zhaoyuchen2018 on Dec 02, 2019

The op should handle k=1024
Fix seq_len < warpsize error.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

2f0f10b3

[cherry-pick] Fix gru as small frame_size has error. (#20922) (#21440) · 873b32de
Committed by zhaoyuchen2018 on Dec 02, 2019
```
seems shuffle_sync cannot handle small size

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
```
873b32de

CHERRY_PICK: TRT int8: refine trt int8 for dynamic range set (#21112) (#21449) · 0473cdb8
Committed by Zhaolong Xing on Dec 02, 2019

0473cdb8

30 Nov, 2019 · 1 commit
- Fix the crash issue when scale or bias was null-pointer. (#21284) (#21444) · 408e638c
  Committed by Yihua Xu on Nov 30, 2019
```
* Fix the crash issue when scale or bias was null-pointer.

* Add the error message for passing CI.

test=release/1.6
```
  408e638c
29 Nov, 2019 · 3 commits
- fix trt weight bug (#21231) (#21443) · 77268831
  Committed by Pei Yang on Nov 29, 2019
```
added splitter "__" between weight name and suffix number to avoid conflicts.
```
  77268831
- Fix dgc accuracy by mv regularization to local, test=release/1.6 (#21390) · 6ce49eea
  Committed by WangXi on Nov 29, 2019
  
  6ce49eea
- Fp32 vs int8 qat C++ performance (#21244) (#21432) · 06545fcf
  Committed by Wojciech Uss on Nov 29, 2019
  
  06545fcf
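
The trt weight fix listed above (77268831) adds a "__" splitter between a weight name and its suffix number to avoid conflicts. A minimal sketch of why plain concatenation can collide follows. The function names and the example weight names are hypothetical and are not taken from the Paddle source.

```python
def join_plain(name: str, suffix: int) -> str:
    """Concatenate name and suffix number directly (collision-prone)."""
    return f"{name}{suffix}"


def join_with_splitter(name: str, suffix: int, splitter: str = "__") -> str:
    """Concatenate with a splitter so the name/suffix boundary stays visible."""
    return f"{name}{splitter}{suffix}"


# Two different (name, suffix) pairs chosen to show the ambiguity.
pair_a = ("fc_1", 12)
pair_b = ("fc_11", 2)

plain_collides = join_plain(*pair_a) == join_plain(*pair_b)
split_collides = join_with_splitter(*pair_a) == join_with_splitter(*pair_b)
```

With direct concatenation both pairs map to the same string, while the splitter keeps them distinct, provided weight names themselves do not end in a digit run preceded by the splitter.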
28 Nov, 2019 · 1 commit

cherry-pick1.6 fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21339) · 072eb5b6

Committed by xujiaqi01 on Nov 28, 2019

* fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052)

* fix cache table bug
* add save_paddle_inference_model
* fix hdfs util bug
* test=develop

* fix several sparse table issues (#20686)

* no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto.
* add find_distributed_lookup_table_grads instead of hard code GRAD
* support embedding stop gradient. push sparse raised an error before this fix.
* fix fill sparse, skip slots which do not have embedding. each slot's embedding in a sparse table should be used in all training programs before fix this.
* fix pull sparse, skip slots which do not have embedding.
* fix collect feasign label info, skip slots which do not have embedding.
* support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables.
* test=develop

* add copy table (#21086)

* copy some feasigns and corresponding embeddings from one sparse table to another
* copy all feasigns and corresponding embeddings from one sparse table to another
* copy all dense params from one table to another
* copy some local vars to other local vars

* fix fs_client_param bug (#21212)

* fix fs_client_param bug, user can set this config through fleet_desc_file or fleet config
* test=develop

* fix fleet util bug (#21254)

* fix fleet util bug in save paddle inference model
* test=develop

072eb5b6

26 Nov, 2019 · 4 commits
- [Cherry pick] instance_norm, gradients and batch_norm (#21301) · 97bbab47
  Committed by Lv Mengsi on Nov 26, 2019
```
* Fix gradients (#20857)

* fix_gradients

* fix_gradients, test=develop

* fix instance norm (#21042)

* fix instance norm

* update unitest,test=develop

* fix_bn

* revert unittest,test=develop
```
  97bbab47
- [cherry-pick] Refactor mkldnn eletwise_mul and error message for NHWC in mkldnn (#21361) · 03dda317
  Committed by bingyanghuang on Nov 26, 2019
  
  03dda317
- [Cherry-pick 1.6] Fix dgc buffer illegal & reuse velocity & fix fuse (#21281) · 93c7f058
  Committed by WangXi on Nov 26, 2019
  
  93c7f058
- Fix INF bug of softmax_cross_entropy_op, test=release/1.6 (#21283) · 3423f0b6
  Committed by WangXi on Nov 26, 2019
  
  3423f0b6
25 Nov, 2019 · 3 commits

cherry-pick error info check of Print_op for release1.6 (#21349) · 9a98d11e

Committed by lijianshe02 on Nov 25, 2019

* add input type and input data type check for Print_op test=develop (#21250)

* add input type and input data type check for Print_op test=develop

* cherry-pick error info check of Print_op for release1.6 test=develop

* cherry-pick error info check of Print_op for release1.6 test=develop

9a98d11e

Fix the CAPI ZeroCopy shape error and reuse the code to get output (#21240) (#21345) · c75b162a

Committed by liu zhengxi on Nov 25, 2019

* fix the CAPI ZeroCopy shape error and refactor how the output is obtained

* use an anonymous namespace to cover the functor

* fix unit tests because the output of typeid(T).name() differs between Linux and Windows, test=develop

c75b162a

fix bug of issue (#21331) · da9752fe

Committed by Yi Liu on Nov 25, 2019

* fix bug of issue #21259 (#21287)
pass the argument `allow_out_of_range` of one_hot op to c++ back end.
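
To illustrate what an `allow_out_of_range` flag on a one-hot encoder typically controls, here is a plain-Python sketch. It assumes that out-of-range indices become all-zero rows when the flag is set and raise an error otherwise. The function below is a hypothetical reference, not Paddle's one_hot op or its C++ backend.

```python
from typing import List


def one_hot(
    indices: List[int], depth: int, allow_out_of_range: bool = False
) -> List[List[float]]:
    """One-hot encode indices into rows of length depth.

    Assumed semantics for this sketch: an index outside [0, depth) yields
    an all-zero row when allow_out_of_range is True, and raises otherwise.
    """
    rows = []
    for idx in indices:
        row = [0.0] * depth
        if 0 <= idx < depth:
            row[idx] = 1.0
        elif not allow_out_of_range:
            raise ValueError(f"index {idx} is outside [0, {depth})")
        rows.append(row)
    return rows
```

If the flag were dropped before reaching the backend, a call with `allow_out_of_range=True` would behave like the default and fail on out-of-range input, which is the kind of mismatch the commit describes fixing.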

da9752fe

机器未来 / Paddle · in sync with the fork's source project
