Commits · fa7ace7cf2859f927c26f1970bbc2f5551532df1 · 机器未来 / Paddle

10 Jan, 2020 · 4 commits

Cherry pick from #21862 (#22194) · fa7ace7c

Committed by Guo Sheng on Jan 10, 2020

* Fix default label dim of label_smooth_op. test=develop (#21862)

* Fix unit tests of label_smooth_op's data size.

fa7ace7c

fix xception precision problem (#22188) · c7248cda
Committed by liu zhengxi on Jan 10, 2020

c7248cda

[cherry-pick] Add FC padding, ernie test unit and layernorm parallel (#22198) · 3df38f5c

Committed by GaoWei8 on Jan 10, 2020

* Optimize the kernel implementation of layernorm with openmp (#20895)

* Add ernie c++ inference test (#21015)

* Add ernie unit test
test=develop

* Add ernie unit test
test=develop

* Add ernie unit test
test=develop

* remove ngraph

* optimize gpu test
test=develop

* optimize codes
test=develop

* fix cmake fails on inference_download_and_uncompress (#21185)

* solve cmake fails on inference_download_and_uncompress
test=develop

* solve cmake fails on inference_download_and_uncompress
test=develop

* Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972)

* Add fc padding to solve mkl performance
test=develop

* fix gpu pass and error information
test=develop

* fix fc_fuse_pass_test
test=develop

* fix error information
test=develop

* fix error information
test=develop

* fix name and add fc op padding test
test=develop

* fix attributes
test=develop

* optimize fc padding
test=develop

* fix test
test=develop

* Polish the codes of fc when needs padding (#21378)

test=develop

* Add ernie large c++ inference test (#21365)

* add ernie-large test
test=develop

* add ernie large c++ inference test
test=develop

* Modify padding strategy: remove weight copy in fc padding (#21650)

test=develop

* optimize fc jit (#21878)

test=develop
Co-authored-by: Yihua Xu <yihuaxu@hotmail.com>

3df38f5c

fix multi-thread error of fc_gru_fuse_pass.cc, test=develop (#21841) (#22185) · e8e12499
Committed by 石晓伟 on Jan 10, 2020
```
* fix multi-thread error of fc_gru_fuse_pass.cc, test=develop

* export FLAGS and GLOG symbols, test=develop
```
e8e12499

09 Jan, 2020 · 3 commits
- [cherry-pick] Fix windows build no kernel issue, test=develop (#22105) (#22184) · 91706d3b
  Committed by zhaoyuchen2018 on Jan 09, 2020
```
windows conv_fusion failed as no kernel, explicitly declare lambda
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
```
  91706d3b
- fix softmax_with_cross_entropy_fix bug, test=develop (#21810) (#22183) · bc385a29
  Committed by Chen Weihang on Jan 09, 2020
  
  bc385a29
- [Cherry-pick 1.6] fix batch_norm_grad shape=0 & allreduce shape enforce &... · 515b206d
  Committed by WangXi on Jan 09, 2020
```
[Cherry-pick 1.6] fix batch_norm_grad shape=0 & allreduce shape enforce & sync_batch_norm hang in fleet (#22157)
```
  515b206d
08 Jan, 2020 · 2 commits

[cherry-pick] Fix softmax cuda bug (#21720) (#22160) · b9a1d954
Committed by zhaoyuchen2018 on Jan 08, 2020
```
* Fix softmax cuda bug

* Refine multihead log and softmax logic

* Align block to 32
```
b9a1d954

Fix multi-threads memory out of bounds error for passes (#21920) (#22132) · 835201bf

Committed by liu zhengxi on Jan 08, 2020

* fix seqconv_eltadd_relu pass during multi-threads predictor, test=develop

* fix attention_lstm_fuse_pass during multi-threads inference, test=develop

* fix embedding_fc_lstm_fuse_pass during multi-threads inference, test=develop

* fix fc_lstm_fuse_pass during multi-threads inference, test=develop

* fix seq_concat_fc_fuse_pass during multi-threads inference, test=develop

835201bf

07 Jan, 2020 · 3 commits

fix trt calib not working bug, test=develop (#21934) (#22110) · 5a611afd
Committed by Pei Yang on Jan 07, 2020

5a611afd

Fix optimizer op infershape failed in dygraph multi-cards mode (#21374) (#22112) · 34ef38c8

Committed by Chen Weihang on Jan 07, 2020

* add param & grad shape check for sgd op

* add _reshape_inplace interface for dygraph parallel

* refine unittest based paddle/models scripts, test=develop

* add unittest for parallel grad fuse, test=develop

34ef38c8

[cherry-pick] fix decay param and overflow in match_matrix (#22107) · eb6d3396

Committed by Aurelius84 on Jan 07, 2020

* fix decay param in DecayAdagrad test=develop (#22026)

* fix integer overflow in match_matrix (#22036)

* fix integer overflow in match_matrix test=develop

* fix integer overflow in match_matrix test=develop

* fix typo test=develop
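
The commit message does not say which expression in match_matrix overflowed. Purely as a generic illustration of the failure class, the sketch below uses `ctypes.c_int32` to imitate signed 32-bit arithmetic and shows an element-count product wrapping negative, while the same product held in 64 bits stays exact. The helper names and the shape are hypothetical and are not taken from Paddle's code.

```python
import ctypes


def element_count_int32(dims):
    """Multiply dims, truncating to a signed 32-bit value after each step.

    Hypothetical helper that imitates C `int` arithmetic; not Paddle's code.
    """
    total = 1
    for d in dims:
        total = ctypes.c_int32(total * d).value
    return total


def element_count_int64(dims):
    """Same product, truncated to a signed 64-bit value instead."""
    total = 1
    for d in dims:
        total = ctypes.c_int64(total * d).value
    return total


# Made-up shape whose element count exceeds 2**31 - 1.
dims = [64, 1024, 1024, 40]
wrapped = element_count_int32(dims)  # wraps around to a negative number
exact = element_count_int64(dims)    # fits comfortably in 64 bits
```

A negative count like `wrapped` is what typically turns into an out-of-bounds index or a failed allocation, which is why widening the index type is the usual remedy for this kind of bug.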

eb6d3396

16 Dec, 2019 · 1 commit
- fix analysis_predictor when func is called multiple times, test=release/1.6 (#21663) · 70c073a0
  Committed by 石晓伟 on Dec 16, 2019
  
  70c073a0
09 Dec, 2019 · 1 commit
- Revert "CHERRY_PICK: TRT int8: refine trt int8 for dynamic range set (#21112) (#21449)" (#21619) · f7c629d9
  Committed by Zhaolong Xing on Dec 09, 2019
```
This reverts commit 0473cdb8.
```
  f7c629d9
08 Dec, 2019 · 1 commit
- CHERRY_PICK: Fix the bug for inference when using auto growth allocator (#21623) · d0943dbe
  Committed by Zhaolong Xing on Dec 08, 2019
```
test=release/1.6
```
  d0943dbe
06 Dec, 2019 · 3 commits
- cherry-pick MKL-DNN NHWC FWD support fix (#21593) · 1f598dfa
  Committed by bingyanghuang on Dec 06, 2019
  
  1f598dfa
- cherry-pick pyramid_hash op test=develop (#20779)(#18525) (#21562) · f83254d6
  Committed by Aurelius84 on Dec 06, 2019
  
  f83254d6
- fix ZeroCopyTensor::mutable_data(), test=release/1.6 (#21581) · e228e707
  Committed by 石晓伟 on Dec 06, 2019
  
  e228e707
05 Dec, 2019 · 2 commits

cherry-pick fix muting glog warning message, test=release/1.6 (#21576) · a6433f8b
Committed by Pei Yang on Dec 05, 2019

a6433f8b

[Cherry-pick] fix the computation for dx (grad for x) for prelu operation. (#20949) (#21514) · 40549473

Committed by lilong12 on Dec 05, 2019

* fix the computation for dx (grad for x) for prelu operation. (#20949)

* set the default value of alpha for prelu to 0.25, test=develop

* add the call to __syncthreads(), test=develop

* fix the implementation of cpu prelu, test=develop

* repair the implementation of element mode prelu, test=develop

* modify test_prelu_op.py, test=develop
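
For reference, the gradient this commit concerns can be written down in a few lines. The following is a scalar sketch of the standard PReLU definition, not Paddle's CPU or CUDA kernel; only the 0.25 default for alpha comes from the commit message, and the function names are illustrative.

```python
def prelu(x, alpha=0.25):
    """PReLU forward: x for positive inputs, alpha * x otherwise."""
    return x if x > 0 else alpha * x


def prelu_grad_x(x, dout, alpha=0.25):
    """Gradient w.r.t. x: dout where x > 0, alpha * dout elsewhere."""
    return dout if x > 0 else alpha * dout


def numeric_grad(x, alpha=0.25, eps=1e-6):
    """Central finite difference, used only to cross-check the formula."""
    return (prelu(x + eps, alpha) - prelu(x - eps, alpha)) / (2 * eps)


# Sample points away from the kink at x = 0.
xs = [-2.0, -0.5, 0.5, 3.0]
analytic = [prelu_grad_x(x, 1.0) for x in xs]
numeric = [numeric_grad(x) for x in xs]
```

Comparing `analytic` against `numeric` is the same kind of check an operator unit test performs, here reduced to scalars.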

40549473

04 Dec, 2019 · 6 commits

make config option DisableGlogInfo() able to mute all inference logs (#21544) · 857cd9f8
Committed by Pei Yang on Dec 04, 2019
```
make config option DisableGlogInfo() able to mute all inference logs
```
857cd9f8

Refactor fetch handler (#21264) (#21537) · 87a8caa8

Committed by tangwei12 on Dec 04, 2019

* fix fetch handler problem and refactor
When a user defines a FetchHandler class, they should initialize the handler
with a variable dict. Each key of the variable dict is a user-defined name, and
each value is a Variable generated from the Python API.

For each fetch, the user should implement a handler function in which
fetched_result_dict is available; the fetched values can be accessed
with the user-defined keys.
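
The usage pattern in this commit message can be sketched in plain Python. Everything below is a hypothetical stand-in: `SketchFetchHandler`, `CollectLoss`, `fake_fetch`, and the variable name `mean_0.tmp_0` are invented for illustration and are not Paddle's actual classes or signatures. Only the shape of the contract (a variable dict keyed by user-defined names, and a handler that receives `fetched_result_dict`) follows the message.

```python
class SketchFetchHandler:
    """Hypothetical stand-in for the FetchHandler pattern in the message."""

    def __init__(self, var_dict):
        # Keys are user-defined names; values stand in for Variables.
        self.var_dict = var_dict

    def handler(self, fetched_result_dict):
        raise NotImplementedError("subclasses implement handler()")


class CollectLoss(SketchFetchHandler):
    """Example subclass that records one fetched value per fetch."""

    def __init__(self, var_dict):
        super().__init__(var_dict)
        self.seen = []

    def handler(self, fetched_result_dict):
        # Access the fetched value through the user-defined key.
        self.seen.append(fetched_result_dict["loss"])


def fake_fetch(handler, lookup):
    """Pretend runtime: resolve each variable, then call the handler."""
    results = {name: lookup(var) for name, var in handler.var_dict.items()}
    handler.handler(results)


h = CollectLoss({"loss": "mean_0.tmp_0"})
fake_fetch(h, lambda var: 0.5)
fake_fetch(h, lambda var: 0.25)
```

After the two simulated fetches, `h.seen` holds the values in the order they were delivered.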

87a8caa8

[cherry-pick] NV JETSON support and auto_growth strategy for inference. (#21500) · 20a09375
Committed by Zhaolong Xing on Dec 04, 2019
```
* ADD NV JETSON SUPPORT
test=release/1.6

* CHERRY_PICK: specify the auto growth allocator for inference.
test=release/1.6
```
20a09375

Fix dgc clip & rampup step, test=release/1.6 (#21519) · 3f1169fe
Committed by WangXi on Dec 04, 2019

3f1169fe

[cherry pick] Conv2d and Conv2d transpose MKL-DNN NHWC support (#21525) · 0e63746b
Committed by bingyanghuang on Dec 04, 2019

0e63746b

Pick disable reshape inplace in dygraph (#21486) · 32a0eb50

Committed by hong on Dec 04, 2019

* disable reshape inplace in dygraph model; test=develop (#21157)

* fix ExecutionContext::HasInput and ExecutionContext::HasOutput depend on the scope structure, test=develop (#20721)

32a0eb50

03 Dec, 2019 · 11 commits
- set dim[0] to -1 if dim[0] < 0 during compiling for c_allgather op (#21402) (#21512) · df2b4002
  Committed by lilong12 on Dec 03, 2019
```
* set dim[0] to -1 if dim[0] < 0 and remove assertion to runtime, test=develop
```
  df2b4002
- Fix transpose conv (#21406), test=release/1.6 (#21510) · 1fbc45b7
  Committed by Lv Mengsi on Dec 03, 2019
```
* fix transpose conv,test=develop

* fix comments
test=develop
```
  1fbc45b7
- [cherry-pick] Improve argsort performance. (#21267) (#21442) · 66c18f4a
  Committed by zhaoyuchen2018 on Dec 03, 2019
```
* Improve argsort performance.

- Running argsort on 200000 elements on a V100
is ~190x faster:
before optimization: 0.53s
after optimization: 0.0027s

- Add fp16 support

* Refine error message
* Refine code
* Add descending sort

test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
```
  66c18f4a
- [cherry-pick] add Adam beta1/beta2 support Variable (#21433) · 735a2db0
  Committed by Kaipeng Deng on Dec 03, 2019
```
* add Adam beta1/beta2 support Variable. test=develop
```
  735a2db0
- [cherry-pick] Add Asypadding for conv fusion. (#21041) (#21439) · 2660107c
  Committed by zhaoyuchen2018 on Dec 03, 2019
```
* Add Asypadding for conv fusion.

test=develop

reference: pr/20042

* Fix eigen build link error

* Change back file mode

* Use math function & add more checks.
```
  2660107c
- add the framework support for distfc (#21197) (#21463) · e06f4439
  Committed by lilong12 on Dec 03, 2019
```
* add the framework support for distfc and ut, test=develop
* fix the implementation of shard_index_op, test=develop
```
  e06f4439
- [cherry-pick] add bn momentum variable (#21435) · 9c63b7c1
  Committed by Kaipeng Deng on Dec 03, 2019
```
* batch_norm momentum support variable. test=develop
```
  9c63b7c1
- revert ProgOptimUnsupported check, test=release/1.6 (#21475) · 5c7c6b1e
  Committed by 石晓伟 on Dec 03, 2019
  
  5c7c6b1e
- show shape diff in wrong trt input shape errmsg, test=develop (#21451) (#21470) · badaaee6
  Committed by Pei Yang on Dec 03, 2019
  
  badaaee6
- cherry-pick LRN and Pool2d (FWD) NHWC support (#21476) · ccb508dc
  Committed by bingyanghuang on Dec 03, 2019
  
  ccb508dc
- cherry-pick fix shape check in density_prior_box, test=release/1.6 (#21474) · 9ab738aa
  Committed by wangguanzhong on Dec 03, 2019
  
  9ab738aa
02 Dec, 2019 · 3 commits

[cherry-pick] find lookup table in order & support dump param (#21347) · 893ea7e0

Committed by Thunderbrook on Dec 02, 2019

* support dump param of model into afs (#20302)

* support dump param to afs
test=develop

* code style
test=develop

* code style
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

* find lookup table in order (#20932)

test=develop

* cherry-pick
test=develop

* solve pslib core in stop worker
test=develop

* print table stat info for pslib
test=develop

893ea7e0

[cherry-pick] Improve topk performance. (#21087) (#21441) · 5dbe9e59

Committed by zhaoyuchen2018 on Dec 02, 2019

* Improve topk performance.

Computing topk on 200000 elements:
before optimization: 1s
after optimization: 0.0028s.

* Refine return value.
* Add cuda util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>

5dbe9e59

[cherry-pick] Fix multihead op bug. (#20783) (#21438) · 2f0f10b3

Committed by zhaoyuchen2018 on Dec 02, 2019

The op should handle k=1024
Fix seq_len < warpsize error.

test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>

2f0f10b3

机器未来 / Paddle · in sync with the fork's source project