Commits · 4054947347c385723054444538d4b58adb6c9bc9 · Crayon鑫 / Paddle

05 Dec, 2019 2 commits

[Cherry-pick] fix the computation for dx (grad for x) for prelu operation. (#20949) (#21514) · 40549473

Committed by lilong12 on Dec 05, 2019

* fix the computation for dx (grad for x) for prelu operation. (#20949)

* set the default value of alpha for prelu to 0.25, test=develop

* add the call to __syncthreads(), test=develop

* fix the implementation of cpu prelu, test=develop

* repair the implementation of element mode prelu, test=develop

* modify test_prelu_op.py, test=develop

40549473
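
As an illustration of the prelu commit above (#20949): a minimal sketch of the input gradient the fix concerns, assuming the standard PReLU definition (y = x for x > 0, y = alpha * x otherwise) with a single shared alpha. The function names are hypothetical and this is plain Python, not Paddle's CPU or CUDA kernel; only the default alpha of 0.25 comes from the commit message.

```python
def prelu_forward(x, alpha=0.25):
    """Forward pass: keep positive inputs, scale the rest by alpha."""
    return [v if v > 0 else alpha * v for v in x]


def prelu_grad_x(x, dout, alpha=0.25):
    """Gradient w.r.t. x: dout where x > 0, alpha * dout elsewhere."""
    return [g if v > 0 else alpha * g for v, g in zip(x, dout)]


if __name__ == "__main__":
    x = [-2.0, 3.0]
    dout = [1.0, 1.0]
    print(prelu_forward(x))       # forward values
    print(prelu_grad_x(x, dout))  # gradient flowing back to x
```

An element-mode variant would use one alpha per input element instead of the shared scalar assumed here.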

let WITH_XBYAK be adjustable via -D in cmake, test=develop (#21538) · e3dd13b1
Committed by zhouwei25 on Dec 05, 2019

e3dd13b1

04 Dec, 2019 6 commits

make config option DisableGlogInfo() able to mute all inference logs (#21544) · 857cd9f8
Committed by Pei Yang on Dec 04, 2019
```
make config option DisableGlogInfo() able to mute all inference logs
```
857cd9f8

Refactor fetch handler (#21264) (#21537) · 87a8caa8

Committed by tangwei12 on Dec 04, 2019

* fix fetch handler problem and refactor
When a user defines a FetchHandler class, they should initialize the handler
with a variable dict. Each key of the variable dict is a user-defined name,
and each value is a Variable generated from the Python API.

For each fetch, the user should implement the handler function, in which
fetched_result_dict is available, and can access the fetched values
with the user-defined keys.

87a8caa8
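
The usage pattern described in the commit above can be sketched as follows. This is a self-contained mock and not Paddle's actual FetchHandler: the base class `FetchHandlerSketch`, the subclass `CollectLoss`, the driver `run_fetch`, and the string standing in for a Variable are all hypothetical. Only the ideas of a variable dict keyed by user-defined names and a `handler` method receiving `fetched_result_dict` come from the commit message.

```python
class FetchHandlerSketch:
    """Hypothetical stand-in for the handler base class (not Paddle's API)."""

    def __init__(self, var_dict):
        # var_dict maps a user-defined name to a variable to be fetched.
        self.var_dict = var_dict

    def handler(self, fetched_result_dict):
        raise NotImplementedError


class CollectLoss(FetchHandlerSketch):
    """User-defined handler that records the value fetched under "loss"."""

    def __init__(self, var_dict):
        super().__init__(var_dict)
        self.seen = []

    def handler(self, fetched_result_dict):
        # Fetched values are looked up with the same user-defined keys.
        self.seen.append(fetched_result_dict["loss"])


def run_fetch(handler, fetch_fn):
    """Hypothetical driver: fetch every variable, then call the handler."""
    results = {key: fetch_fn(var) for key, var in handler.var_dict.items()}
    handler.handler(results)


if __name__ == "__main__":
    h = CollectLoss({"loss": "loss_var"})  # "loss_var" stands in for a Variable
    run_fetch(h, lambda var: 0.5)          # pretend every fetch returns 0.5
    print(h.seen)
```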

[cherry-pick] NV JETSON support and auto_growth strategy for inference. (#21500) · 20a09375
Committed by Zhaolong Xing on Dec 04, 2019
```
* ADD NV JETSON SUPPORT
test=release/1.6

* CHERRY_PICK: specify the auto growth allocator for inference.
test=release/1.6
```
20a09375

Fix dgc clip & rampup step, test=release/1.6 (#21519) · 3f1169fe
Committed by WangXi on Dec 04, 2019

3f1169fe

[cherry pick] Conv2d and Conv2d transpose MKL-DNN NHWC support (#21525) · 0e63746b
Committed by bingyanghuang on Dec 04, 2019

0e63746b

Pick disable reshape inplace in dygraph (#21486) · 32a0eb50

Committed by hong on Dec 04, 2019

* disable reshape inplace in dygraph model; test=develop (#21157)

* fix ExecutionContext::HasInput and ExecutionContext::HasOutput depend on the scope structure, test=develop (#20721)

32a0eb50

03 Dec, 2019 11 commits
- set dim[0] to -1 if dim[0] < 0 during compiling for c_allgather op (#21402) (#21512) · df2b4002
  Committed by lilong12 on Dec 03, 2019
```
* set dim[0] to -1 if dim[0] < 0 and remove assertion to runtime, test=develop
```
  df2b4002
- Fix transpose conv (#21406), test=release/1.6 (#21510) · 1fbc45b7
  Committed by Lv Mengsi on Dec 03, 2019
```
* fix transpose conv,test=develop

* fix comments
test=develop
```
  1fbc45b7
- [cherry-pick] Improve argsort performance. (#21267) (#21442) · 66c18f4a
  Committed by zhaoyuchen2018 on Dec 03, 2019
```
* Improve argsort performance.

- Give 200000 data to compute argsort on v100,
can speed up ~190x
before opt cost: 0.53s
after opt cost:0.0027s

- Add fp16 support

* Refine error message
* Refine code
* Add descending sort

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
```
  66c18f4a
- [cherry-pick] add Adam beta1/beta2 support Variable (#21433) · 735a2db0
  Committed by Kaipeng Deng on Dec 03, 2019
```
* add Adam beta1/beta2 support Variable. test=develop
```
  735a2db0
- [cherry-pick] Add Asypadding for conv fusion. (#21041) (#21439) · 2660107c
  Committed by zhaoyuchen2018 on Dec 03, 2019
```
* Add Asypadding for conv fusion.

test=develop

reference: pr/20042

* Fix eigen build link error

* Change back file mode

* Use math function & add more checks.
```
  2660107c
- add the framework support for distfc (#21197) (#21463) · e06f4439
  Committed by lilong12 on Dec 03, 2019
```
* add the framework support for distfc and ut, test=develop
* fix the implementation of shard_index_op, test=develop
```
  e06f4439
- [cherry-pick] add bn momentum variable (#21435) · 9c63b7c1
  Committed by Kaipeng Deng on Dec 03, 2019
```
* batch_norm momentum support variable. test=develop
```
  9c63b7c1
- revert ProgOptimUnsupported check, test=release/1.6 (#21475) · 5c7c6b1e
  Committed by 石晓伟 on Dec 03, 2019
  
  5c7c6b1e
- show shape diff in wrong trt input shape errmsg, test=develop (#21451) (#21470) · badaaee6
  Committed by Pei Yang on Dec 03, 2019
  
  badaaee6
- cherry-pick LRN and Pool2d (FWD) NHWC support (#21476) · ccb508dc
  Committed by bingyanghuang on Dec 03, 2019
  
  ccb508dc
- cherry-pick fix shape check in density_prior_box, test=release/1.6 (#21474) · 9ab738aa
  Committed by wangguanzhong on Dec 03, 2019
  
  9ab738aa
02 Dec, 2019 5 commits

[cherry-pick] find lookup table in order & support dump param (#21347) · 893ea7e0

Committed by Thunderbrook on Dec 02, 2019

* support dump param of model into afs (#20302)

* support dump param to afs
test=develop

* code style
test=develop

* code style
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

* find lookup table in order (#20932)

test=develop

* cherry-pick
test=develop

* solve pslib core in stop worker
test=develop

* print table stat info for pslib
test=develop

893ea7e0

[cherry-pick] Improve topk performance. (#21087) (#21441) · 5dbe9e59

Committed by zhaoyuchen2018 on Dec 02, 2019

* Improve topk performance.

give 200000 data to compute topk,
before opt: cost 1s
after opt: cost 0.0028s.

* Refine return value.
* Add cuda util functions.
* Fix ComputeBlockSize bug & refine comments.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

5dbe9e59

[cherry-pick] Fix multihead op bug. (#20783) (#21438) · 2f0f10b3

Committed by zhaoyuchen2018 on Dec 02, 2019

The op should handle k=1024
Fix seq_len < warpsize error.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

2f0f10b3

[cherry-pick] Fix gru as small frame_size has error. (#20922) (#21440) · 873b32de
Committed by zhaoyuchen2018 on Dec 02, 2019
```
seems shuffle_sync cannot handle small size

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
```
873b32de

CHERRY_PICK: TRT int8: refine trt int8 for dynamic range set (#21112) (#21449) · 0473cdb8
Committed by Zhaolong Xing on Dec 02, 2019

0473cdb8

30 Nov, 2019 1 commit
- Fix the crash issue when scale or bias was null-pointer. (#21284) (#21444) · 408e638c
  Committed by Yihua Xu on Nov 30, 2019
```
* Fix the crash issue when scale or bias was null-pointer.

* Add the error message for passing CI.

test=release/1.6
```
  408e638c
29 Nov, 2019 3 commits
- fix trt weight bug (#21231) (#21443) · 77268831
  Committed by Pei Yang on Nov 29, 2019
```
added splitter "__" between weight name and suffix number to avoid conflicts.
```
  77268831
- Fix dgc accuracy by mv regularization to local, test=release/1.6 (#21390) · 6ce49eea
  Committed by WangXi on Nov 29, 2019
  
  6ce49eea
- Fp32 vs int8 qat C++ performance (#21244) (#21432) · 06545fcf
  Committed by Wojciech Uss on Nov 29, 2019
  
  06545fcf
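
Regarding the trt weight fix listed above (#21231): a small sketch of why joining a weight name and a numeric suffix without a separator can produce conflicting names, and how a "__" splitter avoids it. The helper names and the example weight names are hypothetical; this illustrates the naming idea only and is not the TensorRT converter code.

```python
def name_without_splitter(weight_name, suffix):
    """Join name and numeric suffix directly (the conflict-prone scheme)."""
    return f"{weight_name}{suffix}"


def name_with_splitter(weight_name, suffix):
    """Join name and numeric suffix with a "__" splitter in between."""
    return f"{weight_name}__{suffix}"


if __name__ == "__main__":
    # Two different (name, suffix) pairs collapse to the same string.
    print(name_without_splitter("conv1", 2), name_without_splitter("conv", 12))
    # With the splitter the two pairs stay distinct.
    print(name_with_splitter("conv1", 2), name_with_splitter("conv", 12))
```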
28 Nov, 2019 1 commit

cherry-pick1.6 fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21339) · 072eb5b6

Committed by xujiaqi01 on Nov 28, 2019

* fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052)

* fix cache table bug
* add save_paddle_inference_model
* fix hdfs util bug
* test=develop

* fix several sparse table issues (#20686)

* no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto.
* add find_distributed_lookup_table_grads instead of hard code GRAD
* support embedding stop gradient. push sparse had an error before this fix.
* fix fill sparse, skip slots which do not have embedding. Before this fix, each slot's embedding in a sparse table had to be used in all training programs.
* fix pull sparse, skip slots which do not have embedding.
* fix collect feasign label info, skip slots which do not have embedding.
* support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables.
* test=develop

* add copy table (#21086)

* copy some feasigns and corresponding embeddings from one sparse table to another
* copy all feasigns and corresponding embeddings from one sparse table to another
* copy all dense params from one table to another
* copy some local vars to other local vars

* fix fs_client_param bug (#21212)

* fix fs_client_param bug, user can set this config through fleet_desc_file or fleet config
* test=develop

* fix fleet util bug (#21254)

* fix fleet util bug in save paddle inference model
* test=develop

072eb5b6

26 Nov, 2019 4 commits
- [Cherry pick] instance_norm, gradients and batch_norm (#21301) · 97bbab47
  Committed by Lv Mengsi on Nov 26, 2019
```
* Fix gradients (#20857)

* fix_gradients

* fix_gradients, test=develop

* fix instance norm (#21042)

* fix instance norm

* update unittest,test=develop

* fix_bn

* revert unittest,test=develop
```
  97bbab47
- [cherry-pick] Refactor mkldnn eletwise_mul and error message for NHWC in mkldnn (#21361) · 03dda317
  Committed by bingyanghuang on Nov 26, 2019
  
  03dda317
- [Cherry-pick 1.6] Fix dgc buffer illegal & reuse velocity & fix fuse (#21281) · 93c7f058
  Committed by WangXi on Nov 26, 2019
  
  93c7f058
- Fix INF bug of softmax_cross_entropy_op, test=release/1.6 (#21283) · 3423f0b6
  Committed by WangXi on Nov 26, 2019
  
  3423f0b6
25 Nov, 2019 6 commits

cherry-pick error info check of Print_op for release1.6 (#21349) · 9a98d11e

Committed by lijianshe02 on Nov 25, 2019

* add input type and input data type check for Print_op test=develop (#21250)

* add input type and input data type check for Print_op test=develop

* cherry-pick error info check of Print_op for release1.6 test=develop

* cherry-pick error info check of Print_op for release1.6 test=develop

9a98d11e

Fix the CAPI ZeroCopy shape error and reuse the code to get output (#21240) (#21345) · c75b162a

Committed by liu zhengxi on Nov 25, 2019

* fix the CAPI ZeroCopy shape error and reconstruct the output obtain

* use an anonymous namespace to cover the functor

* fix unit tests because the output of typeid(T).name() differs between Linux and Windows, test=develop

c75b162a

fix bug of issue #21259 (#21331) · da9752fe

Committed by Yi Liu on Nov 25, 2019

* fix bug of issue #21259 (#21287)
pass the argument `allow_out_of_range` of one_hot op to c++ back end.

da9752fe

cherry-pick (#21201) to release/1.6 (#21306) · a91b8014

Committed by liuwei1031 on Nov 25, 2019

cudaStreamSynchronize randomly hangs when used in a multi-threaded environment; replace it with the cudaStreamQuery API on Windows

a91b8014

[cherry-pick] fix crop_tensor, maxout and lrn (#21302) · 3848f720

Committed by Zhang Ting on Nov 25, 2019

* [cherry-pick] All elements in attr(shape) of crop_tensor can be -1 and int32/64 kernel registered (#20756)

* All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview

* fix the bug that attr(offsets) should be initialized, test=develop

* [cherry-pick] maxout supports channel_last input (#20846)

* maxout support channel_last input, test=develop

* modified details of Input(X) and Attr(groups, axis) in doc, test=develop

* [cherry-pick] lrn supports channel_last input, test=develop (#20954)

3848f720

Add pre-condition check for fuse optimizer op pass (#21005) (#21305) · 9f004548

Committed by Chen Weihang on Nov 25, 2019

* add pre condition check for fuse optimizer op pass, test=develop

* add log & set init to zero, test=develop

* fix test_fuse_all_reduce_pass failed, test=develop

* polish details, test=develop

* refine PADDLE_ENFORCE & remove needless VLOG, test=develop

* refactor op check method, test=develop

9f004548

24 Nov, 2019 1 commit
- Further simplify the C++ error info stack (#21093) (#21304) · 9110c896
  Committed by Chen Weihang on Nov 24, 2019
```
* simplify C++ error stack by rewrite Place, test=develop

* polish assignment overload func, test=develop
```
  9110c896

Crayon鑫 / Paddle is in sync with the source project it was forked from