- 10 Jan, 2020 (3 commits)
Committed by liu zhengxi
Committed by GaoWei8
* Optimize the kernel implementation of layernorm with openmp (#20895)
* Add ernie c++ inference test (#21015)
  - remove ngraph
  - optimize gpu test
  - optimize codes
* fix cmake fails on inference_download_and_uncompress (#21185)
* Add fc padding to improve mkl GEMM's performance when N and K are multiples of 128 (#20972)
  - fix gpu pass and error information
  - fix fc_fuse_pass_test
  - fix name and add fc op padding test
  - fix attributes
  - optimize fc padding
* Polish the codes of fc when padding is needed (#21378)
* Add ernie large c++ inference test (#21365)
* Modify padding strategy: remove weight copy in fc padding (#21650)
* optimize fc jit (#21878)

Co-authored-by: Yihua Xu <yihuaxu@hotmail.com>
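The fc padding trick above can be sketched in NumPy: zero-pad the K and N dimensions of the GEMM operands up to multiples of 128 so the math library hits its fastest kernels, then slice out the valid part of the result. This is an illustrative sketch under that assumption, not Paddle's actual implementation; the helper names are invented.

```python
import numpy as np

def pad_to_multiple(n, multiple=128):
    # Round n up to the next multiple (e.g. 100 -> 128, 128 -> 128).
    return ((n + multiple - 1) // multiple) * multiple

def fc_with_padding(x, w):
    # Hypothetical sketch: pad K and N with zeros, run the GEMM on the
    # padded operands, then slice the valid columns. The zero rows and
    # columns contribute nothing, so the result equals the unpadded x @ w.
    m, k = x.shape
    _, n = w.shape
    k_pad, n_pad = pad_to_multiple(k), pad_to_multiple(n)
    x_padded = np.zeros((m, k_pad), dtype=x.dtype)
    x_padded[:, :k] = x
    w_padded = np.zeros((k_pad, n_pad), dtype=w.dtype)
    w_padded[:k, :n] = w
    return (x_padded @ w_padded)[:, :n]
```

The later commit (#21650) notes the weight copy was removed from this strategy; the sketch keeps it for clarity.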
Committed by 石晓伟
* fix multi-thread error of fc_gru_fuse_pass.cc, test=develop
* export FLAGS and GLOG symbols, test=develop
- 09 Jan, 2020 (3 commits)
Committed by zhaoyuchen2018
Windows conv_fusion failed because there was no kernel; explicitly declare the lambda. Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
Committed by Chen Weihang
Committed by WangXi
[Cherry-pick 1.6] fix batch_norm_grad shape=0 & allreduce shape enforce & sync_batch_norm hang in fleet (#22157)
- 08 Jan, 2020 (2 commits)
Committed by zhaoyuchen2018
* Fix softmax cuda bug
* Refine multihead log and softmax logic
* Align block to 32
Committed by liu zhengxi
* fix seqconv_eltadd_relu pass during multi-threads predictor, test=develop
* fix attention_lstm_fuse_pass during multi-threads inference, test=develop
* fix embedding_fc_lstm_fuse_pass during multi-threads inference, test=develop
* fix fc_lstm_fuse_pass during multi-threads inference, test=develop
* fix seq_concat_fc_fuse_pass during multi-threads inference, test=develop
- 07 Jan, 2020 (4 commits)
Committed by Pei Yang
Committed by Chen Weihang
* add param & grad shape check for sgd op
* add _reshape_inplece interface for dygraph parallel
* refine unittest based paddle/models scripts, test=develop
* add unittest for parallel grad fuse, test=develop
Committed by Aurelius84
* fix decay param in DecayAdagrad, test=develop (#22026)
* fix integer overflow in match_matrix (#22036)
  - fix typo, test=develop
Committed by Yibing Liu
* Fix the global_step & continuous applying error in EMA
* Fix for step 0 & add unit test, test=release/1.6
- 16 Dec, 2019 (1 commit)
Committed by 石晓伟
- 09 Dec, 2019 (2 commits)
Committed by xiegegege
* fix logger problem, test=develop
* refine logger, test=develop
Committed by Zhaolong Xing
This reverts commit 0473cdb8.
- 08 Dec, 2019 (1 commit)
Committed by Zhaolong Xing
test=release/1.6
- 06 Dec, 2019 (4 commits)
Committed by bingyanghuang
Committed by Aurelius84
Committed by 石晓伟
Committed by Zhaolong Xing
* Fix TensorRT detection bug
  1. Add a new search path for TensorRT in tensorrt.cmake
  2. Add a better debug message
  3. Fix the detection of the TensorRT version

  In NVIDIA's official docker images, TensorRT headers are located at `/usr/include/x86_64-linux-gnu` and TensorRT libraries at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` fails to detect TensorRT, and there was no debug/warning message to tell developers that detection failed. In later versions of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is defined in `NvInferVersion.h` instead of `NvInfer.h`, so add a compatibility fix.
* Fix TensorRT variables in CMake
  1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}`
  2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}`

  A manually typed path may point to the wrong location; use the paths detected by the system instead.
* Fix TensorRT library path
  1. Add a new variable, `${TENSORRT_LIBRARY_DIR}`
  2. inference_lib.cmake and setup.py.in need the directory of the TensorRT library rather than the library file itself, so add the new variable to fix it.
* Add a more general search rule for TensorRT: let the system detect the architecture instead of assigning it manually, replacing `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`.
* Remove duplicate search rules for the TensorRT libraries; use `${TENSORRT_LIBRARY_DIR}` to get the full path of libnvinfer.so.

test=release/1.6
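The version-detection compatibility fix described above boils down to probing two headers: look for `NV_TENSORRT_MAJOR` in `NvInferVersion.h` first and fall back to `NvInfer.h`. The sketch below is an illustrative Python rendering of that logic, not the actual tensorrt.cmake code; the function name and default path are assumptions.

```python
import os
import re

def find_trt_major(include_dir):
    # Probe NvInferVersion.h first (TensorRT >= 6 defines the version
    # macros there), then fall back to NvInfer.h (older releases).
    for header in ("NvInferVersion.h", "NvInfer.h"):
        path = os.path.join(include_dir, header)
        if os.path.isfile(path):
            with open(path) as f:
                match = re.search(r"#define\s+NV_TENSORRT_MAJOR\s+(\d+)", f.read())
            if match:
                return int(match.group(1))
    # No header found: caller should emit a warning instead of failing silently.
    return None
```

Returning None (rather than raising) mirrors the commit's point that detection failure should produce a visible warning for the developer.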
- 05 Dec, 2019 (5 commits)
Committed by Pei Yang
Committed by lilong12
Committed by zhouwei25
Committed by lilong12
* fix the computation for dx (grad for x) for the prelu operation (#20949)
* set the default value of alpha for prelu to 0.25, test=develop
* add the call to __syncthreads(), test=develop
* fix the implementation of cpu prelu, test=develop
* repair the implementation of element mode prelu, test=develop
* modify test_prelu_op.py, test=develop
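The PReLU math these fixes touch is compact enough to sketch. With the default alpha of 0.25 mentioned in the commit, the forward pass and the gradient with respect to x (the dx computation fixed above) can be written as the following illustrative NumPy sketch; it is not the operator's CPU/CUDA code, and the function names are invented.

```python
import numpy as np

def prelu_forward(x, alpha=0.25):
    # PReLU: identity for positive inputs, alpha * x otherwise.
    return np.where(x > 0, x, alpha * x)

def prelu_grad_x(x, dout, alpha=0.25):
    # dL/dx: pass the upstream gradient through where x > 0,
    # scale it by alpha elsewhere (the dx fix in #20949).
    return np.where(x > 0, dout, alpha * dout)
```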
Committed by zhouwei25
- 04 Dec, 2019 (6 commits)
Committed by Pei Yang
make config option DisableGlogInfo() able to mute all inference logs
Committed by tangwei12
* fix fetch handler problem and refactor

  When a user defines a FetchHandler class, he or she should initialize the handler with a variable dict. The key of the variable dict is a user-defined name; the value is a Variable generated from the Python API. On each fetch, the user-implemented handler function is called, fetched_result_dict is available inside it, and the user can access the fetched values with the user-defined keys.
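The protocol described above can be sketched with stand-in classes. These are illustrative only, not Paddle's real FetchHandler API; the class and attribute names are assumptions made for the example.

```python
class FetchHandler:
    """Stand-in for the handler base class described in the commit."""

    def __init__(self, var_dict):
        # user-defined name -> Variable (any object stands in for a Variable here)
        self.var_dict = var_dict

    def handler(self, fetched_result_dict):
        # Called on each fetch; keyed by the user-defined names.
        raise NotImplementedError

class LossPrinter(FetchHandler):
    """Example user subclass: remembers the latest fetched loss."""

    def handler(self, fetched_result_dict):
        self.last_loss = fetched_result_dict["loss"]
```

A runner would build fetched_result_dict from the variables in var_dict and invoke handler with it after each step.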
Committed by Zhaolong Xing
* ADD NV JETSON SUPPORT, test=release/1.6
* CHERRY_PICK: specify the auto growth allocator for inference, test=release/1.6
Committed by WangXi
Committed by bingyanghuang
Committed by hong
* disable reshape inplace in dygraph model, test=develop (#21157)
* fix ExecutionContext::HasInput and ExecutionContext::HasOutput depending on the scope structure, test=develop (#20721)
- 03 Dec, 2019 (9 commits)
Committed by lilong12
* set dim[0] to -1 if dim[0] < 0 and defer the assertion to runtime, test=develop
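The shape rule stated above can be sketched as a small helper: a negative leading (batch) dimension is normalized to -1, the conventional marker for "unknown at compile time". This is an illustrative sketch, not the actual operator code, and the function name is invented.

```python
def normalize_batch_dim(shape):
    # Per the commit: if the leading dim is negative, treat it as the
    # unknown batch size (-1) instead of asserting at compile time.
    shape = list(shape)
    if shape[0] < 0:
        shape[0] = -1
    return shape
```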
Committed by Lv Mengsi
* fix transpose conv, test=develop
* fix comments, test=develop
Committed by zhaoyuchen2018
* Improve argsort performance.
  - Given 200000 elements to argsort on a V100, this speeds it up ~190x: 0.53s before optimization, 0.0027s after.
  - Add fp16 support
* Refine error message
* Refine code
* Add descending sort, test=develop

Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
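The descending option added above can be built on an ascending argsort by negating the keys. The NumPy sketch below is illustrative only, not the optimized CUDA kernel this commit tunes.

```python
import numpy as np

def argsort(x, descending=False):
    # Ascending argsort of -x gives the descending order of x.
    return np.argsort(-x) if descending else np.argsort(x)
```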
Committed by Kaipeng Deng
* add Adam beta1/beta2 support for Variable. test=develop
Committed by zhaoyuchen2018
* Add asymmetric padding for conv fusion, test=develop (reference: pr/20042)
* Fix eigen build link error
* Change back file mode
* Use math function & add more checks
Committed by lilong12
* add the framework support for distfc and ut, test=develop
* fix the implementation of shard_index_op, test=develop
Committed by Kaipeng Deng
* batch_norm momentum support variable. test=develop
Committed by 石晓伟
Committed by Pei Yang