Commits · fa7ace7cf2859f927c26f1970bbc2f5551532df1 · 机器未来 / Paddle

10 Jan, 2020 · 2 commits

Cherry pick from #21862 (#22194) · fa7ace7c

Committed by Guo Sheng on Jan 10, 2020

* Fix default label dim of label_smooth_op. test=develop (#21862)

* Fix unit tests of label_smooth_op's data size.

fa7ace7c

[cherry-pick] Add FC padding, ernie test unit and layernorm parallel (#22198) · 3df38f5c

Committed by GaoWei8 on Jan 10, 2020

* Optimize the kernel implementation of layernorm with openmp (#20895)

* Add ernie c++ inference test (#21015)

* Add ernie unit test
test=develop

* Add ernie unit test
test=develop

* Add ernie unit test
test=develop

* remove ngraph

* optimize gpu test
test=develop

* optimize codes
test=develop

* fix cmake fails on inference_download_and_uncompress (#21185)

* solve cmake fails on inference_download_and_uncompress
test=develop

* solve cmake fails on inference_download_and_uncompress
test=develop

* Add fc padding to improve mkl GEMM's performance when N and K are multiples of 128. (#20972)

* Add fc padding to solve mkl performance
test=develop

* fix gpu pass and error information
test=develop

* fix fc_fuse_pass_test
test=develop

* fix error information
test=develop

* fix error information
test=develop

* fix name and add fc op padding test
test=develop

* fix attributes
test=develop

* optimize fc padding
test=develop

* fix test
test=develop

* Polish the codes of fc when needs padding (#21378)

test=develop

* Add ernie large c++ inference test (#21365)

* add ernie-large test
test=develop

* add ernie large c++ inference test
test=develop

* Modify padding strategy: remove weight copy in fc padding (#21650)

test=develop

* optimize fc jit (#21878)

test=develop
Co-authored-by: Yihua Xu <yihuaxu@hotmail.com>

3df38f5c
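The log states only the trigger for fc padding (N and K both multiples of 128), not the mechanics. A minimal sketch of the idea, assuming a pad width of 4 elements (a hypothetical value; the log does not give one) and hypothetical helper names. The matrix product reads only the logical K x N region, so the padding changes the memory layout but not the result.

```python
# Hypothetical pad width; the commit log does not state the value used.
PAD = 4


def needs_padding(k, n):
    """Condition from the commit title: both K and N are multiples of 128."""
    return k % 128 == 0 and n % 128 == 0


def pad_weight(weight, pad=PAD):
    """Return a zero-padded copy of a K x N weight, shaped (K+pad) x (N+pad)."""
    n = len(weight[0])
    padded = [list(row) + [0.0] * pad for row in weight]
    padded += [[0.0] * (n + pad) for _ in range(pad)]
    return padded


def fc(x, weight, k, n):
    """Compute x @ W over the logical K x N region only, ignoring any padding."""
    return [
        [sum(row[i] * weight[i][j] for i in range(k)) for j in range(n)]
        for row in x
    ]
```

Note that `pad_weight` copies the weight; a later item in the same entry (#21650) removes the weight copy, which this sketch does not attempt to model.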

09 Jan, 2020 · 2 commits
- fix softmax_with_cross_entropy_fix bug, test=develop (#21810) (#22183) · bc385a29
  Committed by Chen Weihang on Jan 09, 2020
  
  bc385a29
- [Cherry-pick 1.6] fix batch_norm_grad shape=0 & allreduce shape enforce &... · 515b206d
  Committed by WangXi on Jan 09, 2020
```
[Cherry-pick 1.6] fix batch_norm_grad shape=0 & allreduce shape enforce & sync_batch_norm hang in fleet (#22157)
```
  515b206d
08 Jan, 2020 · 1 commit
- [cherry-pick] Fix softmax cuda bug (#21720) (#22160) · b9a1d954
  Committed by zhaoyuchen2018 on Jan 08, 2020
```
* Fix softmax cuda bug

* Refine multihead log and softmax logic

* Align block to 32
```
  b9a1d954
07 Jan, 2020 · 3 commits

Fix optimizer op infershape failed in dygraph multi-cards mode (#21374) (#22112) · 34ef38c8

Committed by Chen Weihang on Jan 07, 2020

* add param & grad shape check for sgd op

* add _reshape_inplece interface for dygraph parallel

* refine unittest based paddle/models scripts, test=develop

* add unittest for parallel grad fuse, test=develop

34ef38c8

【cherry-pick】fix decay param and overflow in match_matrix (#22107) · eb6d3396

Committed by Aurelius84 on Jan 07, 2020

* fix decay param in DecayAdagrad test=develop (#22026)

* fix integer overflow in match_matrix (#22036)

* fix integer overflow in match_matrix test=develop

* fix integer overflow in match_matrix test=develop

* fix typo test=develop

eb6d3396
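For context on the `decay` parameter mentioned above, here is a minimal scalar sketch of the standard decayed Adagrad update rule. The function name and default values are illustrative assumptions, and the log does not say which part of the decay handling was wrong.

```python
import math


def decayed_adagrad_step(param, grad, moment, lr, decay=0.95, epsilon=1e-6):
    """One scalar decayed Adagrad step (illustrative defaults, not taken from the log).

    moment_out = decay * moment + (1 - decay) * grad^2
    param_out  = param - lr * grad / (sqrt(moment_out) + epsilon)
    """
    moment_out = decay * moment + (1.0 - decay) * grad * grad
    param_out = param - lr * grad / (math.sqrt(moment_out) + epsilon)
    return param_out, moment_out
```

With `decay` close to 1 the squared-gradient history changes slowly; with a smaller `decay` recent gradients dominate the step size.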

Fix the global_step & continuous applying error in EMA (#22090) (#22130) · 9b64d636
Committed by Yibing Liu on Jan 07, 2020
```
* Fix the global_step & continuous applying error in EMA

* Fix for step 0 & add unit test

test=release/1.6
```
9b64d636
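The entry names two failure modes (step 0 and repeated application) without showing the arithmetic. A minimal sketch of a bias-corrected exponential moving average, assuming the common zero-initialized form; the class and method names are hypothetical and are not Paddle's API.

```python
class EmaSketch:
    """Hypothetical scalar EMA with bias correction (not Paddle's implementation)."""

    def __init__(self, decay=0.999):
        self.decay = decay
        self.step = 0       # number of updates seen so far
        self.shadow = 0.0   # zero-initialized running average

    def update(self, param):
        """Fold the current parameter value into the running average."""
        self.step += 1
        self.shadow = self.decay * self.shadow + (1.0 - self.decay) * param

    def averaged(self, param):
        """Return the bias-corrected average without mutating any state.

        At step 0 the correction factor 1 - decay**0 is zero, so the raw
        parameter is returned instead of dividing by zero.
        """
        if self.step == 0:
            return param
        return self.shadow / (1.0 - self.decay ** self.step)
```

Because `averaged` is read-only, calling it several times in a row gives the same value, which is one simple way to avoid errors from applying the average repeatedly.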

09 Dec, 2019 · 1 commit
- fix logger problem (#21342) (#21635) · 2de10293
  Committed by xiegegege on Dec 09, 2019
```
* fix logger problem
test=develop

* refine logger
test=develop
```
  2de10293
06 Dec, 2019 · 3 commits

cherry-pick MKL-DNN NHWC FWD support fix (#21593) · 1f598dfa
Committed by bingyanghuang on Dec 06, 2019

1f598dfa

cherry-pick pyramid_hash op test=develop (#20779)(#18525) (#21562) · f83254d6
Committed by Aurelius84 on Dec 06, 2019

f83254d6

CHERRY_PICK: Better TensorRT support (#20858) (#21578) · 0a4002f5

Committed by Zhaolong Xing on Dec 06, 2019

* Fix TensorRT detection bug

1. Add new search path for TensorRT at tensorrt.cmake
2. Add better debug message
3. Fix the bug of detection of TensorRT version

In the official NVIDIA Docker image, TensorRT headers are located at
`/usr/include/x86_64-linux-gnu` and TensorRT libraries are located
at `/usr/lib/x86_64-linux-gnu`, so using `-DTENSORRT_ROOT` fails
to detect TensorRT.

There was no debug/warning message to tell the developer that TensorRT
detection had failed.

In later versions of TensorRT (e.g. v6), `NV_TENSORRT_MAJOR` is
defined in `NvInferVersion.h` instead of `NvInfer.h`, so add a
compatibility fix.

* Fix TensorRT variables in CMake

1. Replace `${TENSORRT_ROOT}/include` with `${TENSORRT_INCLUDE_DIR}`
2. Replace `${TENSORRT_ROOT}/lib` with `${TENSORRT_LIBRARY}`

A manually typed path may point to the wrong TensorRT location. Use the
paths detected by the system instead.

* Fix TensorRT library path

1. Add new variable - `${TENSORRT_LIBRARY_DIR}`
2. Fix TensorRT library path

inference_lib.cmake and setup.py.in need the directory that contains the
TensorRT library rather than the library file itself, so add a new variable for it.

* Add more general search rule for TensorRT

Let the system detect the architecture instead of assigning it manually, so
replace `x86_64-linux-gnu` with `${CMAKE_LIBRARY_ARCHITECTURE}`.

* Add more general search rule for TensorRT

Remove duplicate search rules for TensorRT libraries. Use
`${TENSORRT_LIBRARY_DIR}` to get full path of libnvinfer.so

test=release/1.6

0a4002f5
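The version-detection fallback described in this entry (`NV_TENSORRT_MAJOR` defined in `NvInferVersion.h` for newer releases, in `NvInfer.h` for older ones) can be sketched outside CMake as follows. The function names are hypothetical, and header contents are passed in as a plain dict so the logic stays independent of the filesystem; the actual fix lives in the CMake scripts.

```python
import re

# Matches a line such as: #define NV_TENSORRT_MAJOR 6
_MAJOR_RE = re.compile(r"^\s*#\s*define\s+NV_TENSORRT_MAJOR\s+(\d+)", re.MULTILINE)


def parse_tensorrt_major(header_text):
    """Return the major version defined in the header text, or None if absent."""
    match = _MAJOR_RE.search(header_text)
    return int(match.group(1)) if match else None


def detect_tensorrt_major(headers):
    """Look for the macro in NvInferVersion.h first, then fall back to NvInfer.h.

    `headers` maps a header file name to its text content.
    """
    for name in ("NvInferVersion.h", "NvInfer.h"):
        text = headers.get(name)
        if text is None:
            continue
        major = parse_tensorrt_major(text)
        if major is not None:
            return major
    # Callers should report this case loudly; the entry notes that silent
    # detection failure was part of the original problem.
    return None
```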

05 Dec, 2019 · 2 commits

construct a DistributedStrategy instance if the passed one is None (#21545) (#21567) · 0dfb5c94
Committed by lilong12 on Dec 05, 2019

0dfb5c94

[Cherry-pick] fix the computation for dx (grad for x) for prelu operation. (#20949) (#21514) · 40549473

Committed by lilong12 on Dec 05, 2019

* fix the computation for dx (grad for x) for prelu operation. (#20949)

* set the default value of alpha for prelu to 0.25, test=develop

* add the call to __syncthreads(), test=develop

* fix the implementation of cpu prelu, test=develop

* repair the implementation of element mode prelu, test=develop

* modify test_prelu_op.py, test=develop

40549473
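As a reference for what "dx (grad for x)" means here, this is a minimal sketch of the textbook PReLU forward and backward passes with a single shared alpha, using the default of 0.25 that the entry mentions. It is not Paddle's kernel, does not cover element mode, and the function names are hypothetical.

```python
def prelu_forward(x, alpha=0.25):
    """y = x for x > 0, otherwise alpha * x."""
    return [v if v > 0 else alpha * v for v in x]


def prelu_backward(x, dout, alpha=0.25):
    """Return (dx, dalpha) for a single shared alpha.

    dx     = dout          where x > 0
           = alpha * dout  otherwise
    dalpha = sum(dout * x) over the elements where x <= 0
    """
    dx = [g if v > 0 else alpha * g for v, g in zip(x, dout)]
    dalpha = sum(g * v for v, g in zip(x, dout) if v <= 0)
    return dx, dalpha
```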

04 Dec, 2019 · 4 commits

Refactor fetch handler (#21264) (#21537) · 87a8caa8

Committed by tangwei12 on Dec 04, 2019

* fix fetch handler problem and refactor
When a user defines a FetchHandler class, they should initialize the handler
with a variable dict. Each key of the variable dict is a user-defined name,
and each value is a Variable generated from the Python API.

For each fetch, the user should implement a handler function in which
fetched_result_dict is available, so that the fetched values can be accessed
with the user-defined keys.

87a8caa8
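The usage pattern described in this entry can be illustrated with a small stand-in. Everything below is a hypothetical sketch of the pattern only: the class, the `run_fetch` driver, and the variable name `mean_0.tmp_0` are assumptions, not Paddle's actual FetchHandler API.

```python
class FetchHandlerSketch:
    """Hypothetical base class: holds a dict of user-defined name -> variable."""

    def __init__(self, var_dict):
        self.var_dict = dict(var_dict)

    def handler(self, fetched_result_dict):
        """Called once per fetch; subclasses override this."""
        raise NotImplementedError


class LossRecorder(FetchHandlerSketch):
    """Example subclass that records the value fetched under the key 'loss'."""

    def __init__(self, var_dict):
        super().__init__(var_dict)
        self.history = []

    def handler(self, fetched_result_dict):
        # Values are accessed with the same user-defined keys used in var_dict.
        self.history.append(fetched_result_dict["loss"])


def run_fetch(handler, read_value):
    """Hypothetical driver: read every variable, then invoke the handler."""
    results = {name: read_value(var) for name, var in handler.var_dict.items()}
    handler.handler(results)
```

In this sketch `read_value` stands in for whatever mechanism the framework uses to read a variable's current value.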

Fix dgc clip & rampup step, test=release/1.6 (#21519) · 3f1169fe
Committed by WangXi on Dec 04, 2019

3f1169fe

[cherry pick] Conv2d and Conv2d transpose MKL-DNN NHWC support (#21525) · 0e63746b
Committed by bingyanghuang on Dec 04, 2019

0e63746b

Pick disable reshape inplace in dygraph (#21486) · 32a0eb50

Committed by hong on Dec 04, 2019

* disable reshape inplace in dygraph model; test=develop (#21157)

* fix ExecutionContext::HasInput and ExecutionContext::HasOutput depend on the scope structure, test=develop (#20721)

32a0eb50

03 Dec, 2019 · 7 commits
- set dim[0] to -1 if dim[0] < 0 during compiling for c_allgather op (#21402) (#21512) · df2b4002
  Committed by lilong12 on Dec 03, 2019
```
* set dim[0] to -1 if dim[0] < 0 and remove assertion to runtime, test=develop
```
  df2b4002
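A minimal sketch of the compile-time shape rule this commit describes, assuming that allgather concatenates the inputs of `nranks` workers along dimension 0. The function name is hypothetical.

```python
def allgather_out_shape(in_shape, nranks):
    """Infer the output shape of an allgather along dimension 0.

    A negative dim[0] means the size is unknown at compile time, so the
    output keeps -1 instead of multiplying a placeholder by nranks.
    """
    out = list(in_shape)
    out[0] = -1 if out[0] < 0 else out[0] * nranks
    return out
```

Without the guard, an unknown batch size of -1 gathered across 4 ranks would be inferred as -4, which is not a meaningful placeholder.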
- [cherry-pick] Improve argsort performance. (#21267) (#21442) · 66c18f4a
  Committed by zhaoyuchen2018 on Dec 03, 2019
```
* Improve argsort performance.

- Sorting 200000 elements on a V100 is ~190x faster:
  before optimization: 0.53s
  after optimization: 0.0027s

- Add fp16 support

* Refine error message
* Refine code
* Add descending sort

test=develop
Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
```
  66c18f4a
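For readers unfamiliar with the operation, argsort returns the sorted values together with the indices that produce them, and the descending option added here reverses the order. A minimal CPU reference sketch in plain Python follows; the function name and return layout are assumptions, and the GPU kernel in the commit is of course different.

```python
def argsort(values, descending=False):
    """Return (sorted_values, indices) for a 1-D list.

    indices[i] is the position in `values` of the i-th sorted element.
    """
    indices = sorted(range(len(values)), key=values.__getitem__, reverse=descending)
    return [values[i] for i in indices], indices
```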
- [cherry-pick] add Adam beta1/beta2 support Variable (#21433) · 735a2db0
  Committed by Kaipeng Deng on Dec 03, 2019
```
* add Adam beta1/beta2 support Variable. test=develop
```
  735a2db0
- [cherry-pick] Add Asypadding for conv fusion. (#21041) (#21439) · 2660107c
  Committed by zhaoyuchen2018 on Dec 03, 2019
```
* Add Asypadding for conv fusion.

test=develop

reference: pr/20042

* Fix eigen build link error

* Change back file mode

* Use math function & add more checks.
```
  2660107c
- add the framework support for distfc (#21197) (#21463) · e06f4439
  Committed by lilong12 on Dec 03, 2019
```
* add the framework support for distfc and ut, test=develop
* fix the implementation of shard_index_op, test=develop
```
  e06f4439
- [cherry-pick] add bn momentum variable (#21435) · 9c63b7c1
  Committed by Kaipeng Deng on Dec 03, 2019
```
* batch_norm momentum support variable. test=develop
```
  9c63b7c1
- cherry-pick LRN and Pool2d (FWD) NHWC support (#21476) · ccb508dc
  Committed by bingyanghuang on Dec 03, 2019
  
  ccb508dc
02 Dec, 2019 · 1 commit

[cherry-pick] find lookup table in order & support dump param (#21347) · 893ea7e0

Committed by Thunderbrook on Dec 02, 2019

* support dump param of model into afs (#20302)

* support dump param to afs
test=develop

* code style
test=develop

* code style
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

* dump param
test=develop

* find lookup table in order (#20932)

test=develop

* cherry-pick
test=develop

* solve pslib core in stop worker
test=develop

* print table stat info for pslib
test=develop

893ea7e0

29 Nov, 2019 · 2 commits
- Fix dgc accuracy by mv regularization to local, test=release/1.6 (#21390) · 6ce49eea
  Committed by WangXi on Nov 29, 2019
  
  6ce49eea
- Fp32 vs int8 qat C++ performance (#21244) (#21432) · 06545fcf
  Committed by Wojciech Uss on Nov 29, 2019
  
  06545fcf
28 Nov, 2019 · 1 commit

cherry-pick1.6 fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21339) · 072eb5b6

Committed by xujiaqi01 on Nov 28, 2019

* fix cache table bug, add save_paddle_inference_model, fix hdfs util bug (#21052)

* fix cache table bug
* add save_paddle_inference_model
* fix hdfs util bug
* test=develop

* fix several sparse table issues (#20686)

* it is no longer necessary to define the embedding layers of every slot in each program; trainer_param is now a repeated field in ps.proto.
* add find_distributed_lookup_table_grads instead of hard-coding GRAD
* support stop gradient on embeddings; push sparse raised an error before this fix.
* fix fill sparse: skip slots that do not have an embedding. Before this fix, each slot's embedding in a sparse table had to be used in all training programs.
* fix pull sparse: skip slots that do not have an embedding.
* fix collect feasign label info: skip slots that do not have an embedding.
* when there are multiple sparse tables in one or more training programs, each program can pull/push only its own related sparse tables instead of all sparse tables.
* test=develop

* add copy table (#21086)

* copy some feasigns and corresponding embeddings from one sparse table to another
* copy all feasigns and corresponding embeddings from one sparse table to another
* copy all dense params from one table to another
* copy some local vars to other local vars

* fix fs_client_param bug (#21212)

* fix fs_client_param bug; the user can set this config through fleet_desc_file or the fleet config
* test=develop

* fix fleet util bug (#21254)

* fix fleet util bug in save paddle inference model
* test=develop

072eb5b6

26 Nov, 2019 · 4 commits
- [Cherry pick] instance_norm, gradients and batch_norm (#21301) · 97bbab47
  Committed by Lv Mengsi on Nov 26, 2019
```
* Fix gradients (#20857)

* fix_gradients

* fix_gradients, test=develop

* fix instance norm (#21042)

* fix instance norm

* update unittest, test=develop

* fix_bn

* revert unittest,test=develop
```
  97bbab47
- [cherry-pick] Refactor mkldnn eletwise_mul and error message for NHWC in mkldnn (#21361) · 03dda317
  Committed by bingyanghuang on Nov 26, 2019
  
  03dda317
- [Cherry-pick 1.6] Fix dgc buffer illegal & reuse velocity & fix fuse (#21281) · 93c7f058
  Committed by WangXi on Nov 26, 2019
  
  93c7f058
- Fix INF bug of softmax_cross_entropy_op, test=release/1.6 (#21283) · 3423f0b6
  Committed by WangXi on Nov 26, 2019
  
  3423f0b6
25 Nov, 2019 · 3 commits

cherry-pick error info check of Print_op for release1.6 (#21349) · 9a98d11e

Committed by lijianshe02 on Nov 25, 2019

* add input type and input data type check for Print_op test=develop (#21250)

* add input type and input data type check for Print_op test=develop

* cherry-pick error info check of Print_op for release1.6 test=develop

* cherry-pick error info check of Print_op for release1.6 test=develop

9a98d11e

fix bug of issue (#21331) · da9752fe

Committed by Yi Liu on Nov 25, 2019

* fix bug of issue #21259 (#21287)
Pass the `allow_out_of_range` argument of the one_hot op to the C++ backend.

da9752fe
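To show why it matters that `allow_out_of_range` reaches the backend, here is a minimal pure-Python sketch of the behavior usually associated with that flag, assuming that an out-of-range index yields an all-zero row when the flag is set and an error otherwise. The function is a hypothetical reference, not the Paddle op.

```python
def one_hot(indices, depth, allow_out_of_range=False):
    """Encode each index as a one-hot row of length `depth`.

    Assumed semantics: an index outside [0, depth) produces an all-zero row
    when allow_out_of_range is True, and raises ValueError otherwise.
    """
    rows = []
    for idx in indices:
        row = [0.0] * depth
        if 0 <= idx < depth:
            row[idx] = 1.0
        elif not allow_out_of_range:
            raise ValueError(f"index {idx} is outside [0, {depth})")
        rows.append(row)
    return rows
```

If the flag were dropped before reaching the kernel, a call with `allow_out_of_range=True` would behave like the default and fail on the first illegal index.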

[cherry-pick] fix crop_tensor, maxout and lrn (#21302) · 3848f720

Committed by Zhang Ting on Nov 25, 2019

* [cherry-pick] All elements in attr(shape) of crop_tensor can be -1 and int32/64 kernel registered (#20756)

* All elements in attr(shape) of crop_tensor can be -1, test=develop, test=document_preview

* fix the bug that attr(offsets) should be initialized, test=develop

* [cherry-pick] maxout supports channel_last input (#20846)

* maxout support channel_last input, test=develop

* modified details of Input(X) and Attr(groups, axis) in doc, test=develop

* [cherry-pick] lrn supports channel_last input, test=develop (#20954)

3848f720
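The crop_tensor item above allows every element of attr(shape) to be -1. A minimal sketch of output-shape inference under the assumption that -1 means "take everything from the offset to the end of that dimension"; the function name and the validation rules are hypothetical.

```python
def crop_out_shape(in_shape, shape, offsets):
    """Infer the cropped shape; -1 in `shape` is resolved as in_dim - offset (assumed)."""
    out = []
    for dim, want, off in zip(in_shape, shape, offsets):
        size = dim - off if want == -1 else want
        if off < 0 or size <= 0 or off + size > dim:
            raise ValueError(f"invalid crop: dim={dim}, shape={want}, offset={off}")
        out.append(size)
    return out
```

The second fix in that item (offsets must be initialized) corresponds to `offsets` always being a full-length list here.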

23 Nov, 2019 · 2 commits
- add mkldnn include. test=develop (#21314) · f9cbe3bd
  Committed by Kaipeng Deng on Nov 23, 2019
  
  f9cbe3bd
- [cherry-pick] fix elementwise mod (#21315) · 5e35e5ea
  Committed by Kaipeng Deng on Nov 23, 2019
```
* fix elementwise_mod FP kernel. test=develop

* fix unittest. test=develop
```
  5e35e5ea
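The log does not say what was wrong with the floating-point kernel. One common pitfall with floating-point modulo is that C's `fmod` takes the sign of the dividend, while Python's `%` takes the sign of the divisor. The sketch below shows the usual adjustment, purely as an assumption about the kind of issue involved; the function name is hypothetical.

```python
import math


def elementwise_mod(a, b):
    """Element-wise floating-point modulo with Python's sign convention."""
    out = []
    for x, y in zip(a, b):
        r = math.fmod(x, y)  # result has the sign of x, like C's fmod
        if r != 0 and (r < 0) != (y < 0):
            r += y           # shift so the result takes the sign of y
        out.append(r)
    return out
```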
21 Nov, 2019 · 1 commit

[cherry-pick]fix bug in pool/conv/conv_transpose: UpdatePaddingAndDilation,... · 7ab85396

Committed by liym27 on Nov 21, 2019

[cherry-pick]fix bug in pool/conv/conv_transpose: UpdatePaddingAndDilation, _get_padding_with_SAME and conv2dtranspose_forward_naive. (#20997) (#21225)

* fix bug in pool/conv/conv_transpose:
    1. It should be stride[i] not stride[0] in UpdatePaddingAndDilation;
    2. fix bug of func  _get_padding_with_SAME in test_conv/conv_transpose_op.py;
    3. fix bug of the computation process in function conv2dtranspose_forward_naive.
    test=release/1.6

7ab85396
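Item 1 above (stride[i] rather than stride[0]) matters whenever the strides differ per dimension. A minimal sketch of "SAME" padding computed per spatial dimension, using the common formula and ignoring dilation; the function name is hypothetical and this is not the body of UpdatePaddingAndDilation.

```python
def same_padding(in_size, ksize, strides):
    """Return a (pad_before, pad_after) pair for each spatial dimension."""
    paddings = []
    for i, size in enumerate(in_size):
        # Each dimension must use its own stride, strides[i].
        out_size = (size + strides[i] - 1) // strides[i]  # ceil(size / stride)
        pad_sum = max((out_size - 1) * strides[i] + ksize[i] - size, 0)
        pad_before = pad_sum // 2
        paddings.append((pad_before, pad_sum - pad_before))
    return paddings
```

For a 6x6 input, a 3x3 kernel and strides (1, 2), the second dimension needs one padded element in total; reusing the first stride for it would give two.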

14 Nov, 2019 · 1 commit
- fix error message in expand API, and fix two error unit-tests (#21180) · cdb81264
  Committed by Tao Luo on Nov 14, 2019
```
test=release/1.6
```
  cdb81264
