提交 · ecfddebbef8331d2503384c15f8398030a7a73e8 · PaddlePaddle / Paddle

27 4月, 2020 1 次提交
- Y
  
  Add the implementation of inverse (#23310) · ecfddebb
  由 Yiqun Liu 提交于 4月 27, 2020
  
  ecfddebb
09 4月, 2020 1 次提交

Remove: NGraph engine from PDPD repository (#23545) · 3baaee9a

由 mozga-intel 提交于 4月 09, 2020

* Remove the NGraph engine from PDPD repository
1. Each operator was removed from the operator's directory
2. Each test was removed from the unittest directory
3. The parallel executor support was removed from the PDPD
4. The CMake file was removed from the PDPD
5. The NG flags were removed from the repository
test=develop

* Remove ngraph from:
1. Cmake file
2. Python file
test=develop

3baaee9a

05 4月, 2020 1 次提交
- K
  Fix inplace_abn compile error on Windows (#23464) · d223a249
  由 Kaipeng Deng 提交于 4月 05, 2020
```
* fix inplace_abn windows compile error. test=develop
```
  d223a249
02 4月, 2020 1 次提交
- K
  Add inplace abn op (#22806) · 21d95be0
  由 Kaipeng Deng 提交于 4月 02, 2020
```
* add inplace_abn_op. test=develop
```
  21d95be0
30 3月, 2020 1 次提交
- J
  
  [DNNL] Softmax mkldnn op inplace support (#23197) · 012886df
  由 Jacek Czaja 提交于 3月 30, 2020
  
  012886df
26 3月, 2020 1 次提交

[Paddle-TRT]: Ernie Dynamic shape support. (#23138) · 430b0099

由 Zhaolong Xing 提交于 3月 26, 2020

* add dynamic plugin support.
test=develop

* change emb eltwise layernorm to math function
test=develop

* add emb eltwise layernorm
test=develop

* can run dynamic shape ernie
test=develop

* fix ci
test=develop

* add ut for trt ernie dynamic

test=develop

* refine dynamic shape c++ interface.
test=develop

* fix comments
test=develop

* fix comments
test=develop

430b0099

05 2月, 2020 1 次提交

add WITH_NCCL option for cmake. (#22384) · 7bc4b095

由 Wilber 提交于 2月 05, 2020

cmake选项中添加了WITH_NCCL，显示指定是否编译NCCL的部分代码，WITH_NCCL默认打开，但如果WITH_GPU为OFF，则关闭WITH_NCCL

添加了PADDLE_WITH_NCCL定义

单机单卡能够关闭NCCL编译，多卡的话需要默认打开NCCL，如果关闭NCCL，则只能使用单卡
Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

7bc4b095

04 2月, 2020 1 次提交
- 石
  
  remove anakin from code, test=develop (#22420) · e1b0d7cb
  由石晓伟提交于 2月 04, 2020
  
  e1b0d7cb
09 1月, 2020 1 次提交
- 石
  
  [Feature] Lite subgraph (#22114) · ad0dfb17
  由石晓伟提交于 1月 09, 2020
  
  ad0dfb17
28 11月, 2019 1 次提交

Polish reference count pass (#21324) · 89966525

由 Zeng Jinle 提交于 11月 28, 2019

* fix ref_cnt pass, test=develop

* add cpp unittests to reference_count_pass, test=develop

* follow comments, test=develop

89966525

07 11月, 2019 1 次提交
- H
  Add select_input_op and select_output_op (#21016) · 1957192f
  由 Huihuang Zheng 提交于 11月 07, 2019
```
These ops are useful in control flow.
```
  1957192f
05 11月, 2019 1 次提交

Support NoNeedBufferVarsInference in dygraph backward (#20868) · 878a40f5

由 Zeng Jinle 提交于 11月 05, 2019

* support no need buffer vars in dygraph, test=develop

* fix inference compilation error, test=develop

* update no_need_buffer_vars_inference, test=develop

* add unittests for no_need_buffer_vars_context, test=develop

* refine no_need_buffer_vars by return ref, test=develop

* polish some codes, test=develop

878a40f5

30 10月, 2019 1 次提交

Move the codes of fused operators to operators/fused directory. (#20881) · 03ba0fda

由 Yiqun Liu 提交于 10月 30, 2019

* Move the codes of fused operators to operators/fused directory.
test=develop

* Correct the op name in cmake.

* Change the use of PADDLE_ENFORCE.
test=develop

03ba0fda

28 10月, 2019 1 次提交
- A
  
  add pyramid_hash_op (#20698) · aacd16db
  由 Aurelius84 提交于 10月 28, 2019
  
  aacd16db
24 10月, 2019 1 次提交

make search_compute support avx default (#20779) · efbdad05

由 Tao Luo 提交于 10月 24, 2019

* make search_compute support avx only

* clean search_compute.h

* rename sse_axpy to avx_axpy

test=develop

* update CMakeLists.txt

test=develop

efbdad05

18 10月, 2019 1 次提交
- Z
  
  Fix op run log when memory optimization strategy is enabled (#20695) · ab575de7
  由 Zeng Jinle 提交于 10月 18, 2019
  
  ab575de7
02 10月, 2019 1 次提交

Add multihead op for ernie opt (#19933) · e8673668

由 zhaoyuchen2018 提交于 10月 02, 2019

* Add multihead op for ernie opt

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine softmax

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine kernel.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine cuda kernel

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine cuda version

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine cmake

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

e8673668

30 9月, 2019 1 次提交

fix compile paddle with anakin bug · 276b5e34

由 Wilber 提交于 9月 30, 2019

* fix compile with anakin bug

* remove useless deps test=develop

- 修复了联编anakin时，遇到的bug.
- 编译test_anakin_activate 不通过
- 编译test_anakin_engine 不通过

276b5e34

17 9月, 2019 1 次提交
- C
  add deformable conv v1 op and cpu version of deformable conv v2 (#18500) · 00efd1d8
  由 chengjuntao 提交于 9月 17, 2019
```
* add deformable conv v1 op, test=develop
```
  00efd1d8
11 9月, 2019 2 次提交

Make leaky relu inplacable (#19676) · 0daa5c97

由 Zeng Jinle 提交于 9月 11, 2019

* make leaky relu inplacable, test=develop

* force add unittests to pass coverage, test=develop

0daa5c97

Implement the GPU kernel of fc operator (#19687) · a65c728e

由 Yiqun Liu 提交于 9月 11, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Remove the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop

a65c728e

08 9月, 2019 1 次提交
- H
  fix cmakelist deps (#19668) · 1ca6ea03
  由 hutuxian 提交于 9月 08, 2019
```
fix cmakelist deps: remove unnecessary deps and add proper op deps
```
  1ca6ea03
19 8月, 2019 1 次提交

Add match_matrix_tensor op (#18525) · 78a3d837

由 Aurelius84 提交于 8月 19, 2019

* add matrch_matrix_tensor op test=develop

* fix ignore unittest if with_mkl=off test=develop

* clean code and rm is_test param test=develop

* modify API.spec test=develop

* rm useless code in search_compute.h test=develop

* modify api.spec test=develop

* modify default_grad.spec test=develop

* Add API test code test=develop

* clean code in search_computer.h

* modify PADDLE_ENFORCE and clean search_compute.h test=develop

* fix code style test=develop

78a3d837

06 8月, 2019 1 次提交

Add var_conv_2d op (#18518) · e681d655

由 Kevin 提交于 8月 06, 2019

* fix overflow by int32 mul test=develop

* fix reference nullptr

* fix codestyle test=develop

* modify to point in ContextProjectFunctor test=develop

* modify to point in ContextProjectFunctor test=develop

* modify . to -> test=develop

* add var_conv_2d op test=develop

* edit api.spec test=develop

* ignore unittest if with_mkl=off test=develop

* fix python3 division test=develop

* fix ignore unittest bug test=develop

* remove useless code test=develop

* modify api.spec test=develop

* modify default_grad.spec test=develop

e681d655

23 7月, 2019 1 次提交
- C
  Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664) · fd3aad6c
  由 chengduo 提交于 7月 23, 2019
```
* support sparse gradients
test=develop
```
  fd3aad6c
27 6月, 2019 1 次提交

supports collective communicated training (#18175) · b7128bac

由 HaoRen 提交于 6月 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* marcro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

11 6月, 2019 1 次提交

石

Update the Anakin interfaces for content-dnn and MLU (#17890) · bce259e5

由石晓伟提交于 6月 11, 2019

* update anakin-engine interfaces for content-dnn

test=develop

* support only-gpu mode of Anakin

modify eltwise parse

test=develop

* modification for thread-safe

test=develop

* Integrated template instance

test=develop

* increase template parameters

test=develop

* support MLU predictor

test=develop

* update anakin cmake files

test=develop

* update TargetWrapper::set_device

* update the initialization of anakin subgraph

test=develop

* use the default constructor of base class

test=develop

bce259e5

30 5月, 2019 1 次提交
- B
  Add deformable conv v2 op,test=develop (#17145) · bba57cdd
  由 Bai Yifan 提交于 5月 30, 2019
```
* unit commits, test=develop

* update API.spec, test=develop
```
  bba57cdd
17 5月, 2019 1 次提交
- C
  Add record event And remove CSP (#17447) · 5a6ab380
  由 chengduo 提交于 5月 17, 2019
```
* add record_event
test=develop

* remove csp
test=develop
```
  5a6ab380
18 4月, 2019 1 次提交
- G
  
  Polish DGC code (#16818) · cbdb8a17
  由 gongweibao 提交于 4月 18, 2019
  
  cbdb8a17
28 3月, 2019 1 次提交
- G
  
  Add DGC(Deep Gradient Compression) interface. (#15841) · eb83abea
  由 gongweibao 提交于 3月 28, 2019
  
  eb83abea
22 3月, 2019 1 次提交
- N
  1. Add ANAKIN_ROOT compile option · f3a2e4b3
  由 nhzlx 提交于 3月 22, 2019
```
2. refine trt code
test=develop
```
  f3a2e4b3
20 3月, 2019 1 次提交
- N
  
  cherry-pick from feature/anakin-engine: Add subgraph fuse support and anakin engine #16018 · b21770a2
  由 nhzlx 提交于 3月 20, 2019
  
  b21770a2
19 3月, 2019 1 次提交
- Z
  add allocator flags · 22715487
  由 zhhsplendid 提交于 3月 19, 2019
```
test=develop
```
  22715487
16 3月, 2019 1 次提交
- Q
  Fix windows compiling (#16230) · 86e912c5
  由 qingqing01 提交于 3月 16, 2019
```
test=develop
```
  86e912c5
15 3月, 2019 1 次提交

Support sync batch norm. (#16121) · 8ad672a2

由 qingqing01 提交于 3月 15, 2019

* Support Sync Batch Norm.
* Note, do not enable it in one device.

Usage:

build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)

8ad672a2

22 2月, 2019 1 次提交
- Y
  Initialize the benchmark tester for operator. (#15772) · 7d96c74a
  由 Yiqun Liu 提交于 2月 22, 2019
```
* Initialize the benchmark tester for operator.
test=develop

* Rearrange the codes.
test=develop
```
  7d96c74a
30 1月, 2019 1 次提交
- X
  
  add sample_logits op · 58ad40cc
  由 xuezhong 提交于 1月 30, 2019
  
  58ad40cc
25 1月, 2019 1 次提交

Adding ngraph_engine_op (#14948) · efce2567

由 baojun 提交于 1月 24, 2019

* enable ngraph_engine_op
test=develop

* merge develop test=develop

* avoid const_cast test=develop

* rm ngraph_operator test=develop

* Added TODO to move EnableNgraph test=develop

* Add TODO to remove const_cast test=develop

efce2567

24 1月, 2019 1 次提交

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

由 Yiqun Liu 提交于 1月 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the python api of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the seletced items of beam_search_op's CPU kernel sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debuging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the except of is_empty op in PrepareData.
test=develop

3008fa12

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功