Commits · afc3fcd5095e8238f9c8a7a25debe20eb9c23270 · BaiXuePrincess / Paddle

04 Mar, 2019 · 31 commits
- anakin subgraph engine (#15774) · afc3fcd5
  Committed by flame on Feb 27, 2019
```
* add anakin subgraph engine

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* add initial op converter

* update

* update

* fix op register compile error

* update
test=develop

* update
```
- Polish code · 212242c4
  Committed by minqiyang on Feb 27, 2019
```
test=develop
```
- Optimize while_op when is_test is true. (#15811) · 1b10a784
  Committed by Yiqun Liu on Feb 27, 2019
```
test=develop
```
- Optimize Quantize Op with primitive reuse. (#15929) · 91838c32
  Committed by xiaolil1 on Feb 27, 2019
```
test=develop
```
- refine infershape of sequence_enumerate, hash and fuse_emb_seq_pool · 1c58eee9
  Committed by luotao1 on Feb 27, 2019
```
test=develop
```
- Polish code · 3f4aeed5
  Committed by minqiyang on Feb 27, 2019
```
test=develop
```
- Reset output var's pre_op pointer when op was destructed · b754bf30
  Committed by minqiyang on Feb 27, 2019
- Added adam op test=develop (#15710) · ac72bcd0
  Committed by baojun on Feb 26, 2019
- Register sum operator (#15889) · b29acec8
  Committed by mozga-intel on Feb 27, 2019
```
test=develop
```
- polish cudnn related code and fix bug. (#15164) · 4449e855
  Committed by dzhwinter on Feb 27, 2019
```
* staged.

* polish code

* polish code. test=develop

* polish code. test=develop

* api change. test=develop

* fix default value. test=develop

* fix default value. test=develop
```
- The flag of mkldnn is enabled iff it is necessary · 06a7f741
  Committed by mozga-intel on Feb 27, 2019
```
test=develop
```
- added concat op test=develop · 320b2798
  Committed by baojun-nervana on Feb 26, 2019
- Remove var op deps in imperative mode · b71af29f
  Committed by minqiyang on Feb 26, 2019
```
test=develop
```
- fix cpplint error of async_executor.h · 690be0bb
  Committed by Tao Luo on Feb 26, 2019
```
test=develop
```
- enable cpplint, remove go_fmt · 6e87843e
  Committed by Tao Luo on Feb 26, 2019
- fix jitcodekey and refine test · 0eefad0a
  Committed by tensor-tang on Feb 26, 2019
```
test=develop
```
- add sgd jitcode and op test · ce4cc482
  Committed by tensor-tang on Feb 25, 2019
```
test=develop
```
- add benchmark and mkl sgd implement · 1bfc565f
  Committed by tensor-tang on Feb 25, 2019
```
test=develop
```
- add API.spec. test=develop · a0834044
  Committed by shippingwang on Feb 26, 2019
- fix api.spec, test=develop · 7d4feb2f
  Committed by shippingwang on Feb 26, 2019
- Add gperftools into imperative tracer · 9035887b
  Committed by minqiyang on Feb 26, 2019
```
test=develop
```
- Optimize gelu operation with mkl erf. · b48d56e8
  Committed by Yihua Xu on Feb 26, 2019
```
test=develop
```
- Optimize INT8 DeQuantize Op with primitive reuse. · f8cbc4f3
  Committed by xiaoli.liu@intel.com on Feb 26, 2019
```
test=develop
```
- Fix bugs · 701af439
  Committed by minqiyang on Feb 26, 2019
```
test=develop
```
- Update ngraph version to v0.14 test=develop · dea34134
  Committed by baojun-nervana on Feb 25, 2019
- invoke backward_hooks after reduce op's depcounts map · f1a2d204
  Committed by minqiyang on Feb 25, 2019
```
test=develop
```
- Move ClearBlock into OpBase and VarBase's destructor · e0a2b472
  Committed by minqiyang on Feb 25, 2019
```
test=develop
```
- Add imperative python tracer · 9abf40c9
  Committed by minqiyang on Feb 22, 2019
- enable sgd jitkernel refer code and test · 92f3cf42
  Committed by tensor-tang on Feb 22, 2019
```
test=develop
```
- add cosine decay op, test=develop · 13e89151
  Committed by shippingwang on Feb 22, 2019
- change default option related to softmax, test=develop · b2ce8320
  Committed by jerrywgz on Feb 12, 2019
27 Feb, 2019 · 3 commits

test=develop · 4b7bf06e
Committed by ceci3 on Feb 27, 2019

Rewrite is_empty op to avoid unnecessary data transform. (#15509) · 454f4f21

Committed by Yiqun Liu on Feb 27, 2019

* Rewrite is_empty op to avoid unnecessary data transform.
test=develop

* Add the implementation of InferShape and InferVarType for is_empty op.
test=develop

* Rewrite is_empty op to avoid directly inherit OperatorBase.
test=develop

INT8 Pool kernel Key Creation Optimization. (#15883) · 6724be2b

Committed by xiaolil1 on Feb 27, 2019

* Optimize key creation of INT8 pool kernel to improve the performance of ResNet-50 and MobileNet, especially for latency.
test=develop

* Optimize key creation of pool fp32 grad.
test=develop

26 Feb, 2019 · 6 commits

Add MKL-DNN placement pass tester · 72253391
Committed by Krzysztof Binias on Feb 26, 2019
```
test=develop
```

- MKL-DNN pooling updated to set_prim_desc · c63f6b20

Committed by Jacek Czaja on Feb 04, 2019

- MKLDNN ops revisited

- disabled softmax modifications

- disabled elementwise_add

- reverted LRN modifications

- reverted SUM primitive

- Partial reviving of softmax

- Enable softmax

- Softmax changes

- LRN is back

- LRN partially disabled

- LRN is back

- LRN fix

- compilation fixes

- Sum fixed(hopefully)

- Enabling (partially) elementwise_add

- Fixes to elementwise_add

- Lint fixes

quantize fix

- compilation fix

test=develop

Disabling pooling

- Disabled quantize op

test=develop

Fix bug in fake_quantize_op and add more unit testing (#15912) · 8e439ccf
Committed by qingqing01 on Feb 26, 2019

loosely check in the InferShape of cross_entropy_op. (#15863) · f4846bf3

Committed by qingqing01 on Feb 26, 2019

* loosely check in cross_entropy_op when soft_label is True
* Add Runtime assertion in backward infer_shape check.
* Skip InferShape check when the input dimensions are unknown

Optimize the CUDA implementation of sequence_expand op by reduce the times of... · f4634d76

Committed by Yiqun Liu on Feb 26, 2019

Optimize the CUDA implementation of sequence_expand op by reduce the times of copying lod data from CPU to GPU. (#15493)

* Optimize the CUDA implementation of sequence_expand op by reduce the times of copying lod data from CPU to GPU.
test=develop

* Refine the op benchmark to support setting lod in config.
test=develop

This PR improve performance of prior_box op about 1.25x faster on CPU. (#15909) · 630c1e83

Committed by guomingz on Feb 26, 2019

* This PR improves the performance of the prior_box op, making it about 1.25x faster on CPU.

* Test env: SKX 8180 with fake data on 28 threads (bs=1).
* The table below shows the ~25% improvement; it was generated by [eval_tp_fake_data.py](https://github.com/PaddlePaddle/Paddle/issues/15618#issuecomment-464613976).

| Type | Event | Calls | Total | Min. | Max. | Ave. | Ratio |
| ---------------- | ------------------ | ---- | ------- | -------- | -------- | ------------ | -------- |
| w/ optimization  | thread0::prior_box | 6000 | 921.201 | 0.110572 | 0.383402 | **0.153533** | 0.084585 |
| w/o optimization | thread0::prior_box | 6000 | 1151.85 | 0.102276 | 0.426702 | **0.191976** | 0.103337 |

test=develop

* Fix the style issue.

test=develop
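
The ~1.25x figure quoted in the prior_box commit message can be rechecked from the Total and Ave. columns of its table. The following is a minimal sketch in Python; the variable and function names are illustrative and not part of the Paddle code base, and the timing units are whatever the profiler reported (the commit message does not state them).

```python
# Values copied from the prior_box profiling table in the commit message.
# Units are not stated in the source; only the ratios matter here.
with_opt = {"total": 921.201, "ave": 0.153533}
without_opt = {"total": 1151.85, "ave": 0.191976}


def speedup(baseline: float, optimized: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline / optimized


total_speedup = speedup(without_opt["total"], with_opt["total"])
ave_speedup = speedup(without_opt["ave"], with_opt["ave"])

# Fraction of the baseline total time that the optimization removes.
time_saved = 1.0 - with_opt["total"] / without_opt["total"]

print(f"total: {total_speedup:.2f}x, average: {ave_speedup:.2f}x")
print(f"time saved: {time_saved:.1%}")
```

Both ratios come out at roughly 1.25, which matches the claim in the commit title. Note that a 1.25x speedup corresponds to about a 20% reduction in elapsed time, so the "~25%" in the message is best read as a throughput gain rather than a time saving.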

BaiXuePrincess / Paddle is in sync with the fork source project