Commits · 81fe02c3fec82c5249bd143f96d0f4c98bf247ce · BaiXuePrincess / Paddle

23 Jul, 2019 1 commit
- C
  Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664) · fd3aad6c
  Committed by chengduo on Jul 23, 2019
```
* support sparse gradients
test=develop
```
  fd3aad6c
27 Jun, 2019 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

11 Jun, 2019 1 commit

Update the Anakin interfaces for content-dnn and MLU (#17890) · bce259e5

Committed by 石晓伟 on Jun 11, 2019

* update anakin-engine interfaces for content-dnn

test=develop

* support only-gpu mode of Anakin

modify eltwise parse

test=develop

* modification for thread-safe

test=develop

* Integrated template instance

test=develop

* increase template parameters

test=develop

* support MLU predictor

test=develop

* update anakin cmake files

test=develop

* update TargetWrapper::set_device

* update the initialization of anakin subgraph

test=develop

* use the default constructor of base class

test=develop

bce259e5

30 May, 2019 1 commit
- B
  Add deformable conv v2 op,test=develop (#17145) · bba57cdd
  Committed by Bai Yifan on May 30, 2019
```
* unit commits, test=develop

* update API.spec, test=develop
```
  bba57cdd
17 May, 2019 1 commit
- C
  Add record event And remove CSP (#17447) · 5a6ab380
  Committed by chengduo on May 17, 2019
```
* add record_event
test=develop

* remove csp
test=develop
```
  5a6ab380
18 Apr, 2019 1 commit
- G
  
  Polish DGC code (#16818) · cbdb8a17
  Committed by gongweibao on Apr 18, 2019
  
  cbdb8a17
28 Mar, 2019 1 commit
- G
  
  Add DGC (Deep Gradient Compression) interface. (#15841) · eb83abea
  Committed by gongweibao on Mar 28, 2019
  
  eb83abea
22 Mar, 2019 1 commit
- N
  1. Add ANAKIN_ROOT compile option · f3a2e4b3
  Committed by nhzlx on Mar 22, 2019
```
2. refine trt code
test=develop
```
  f3a2e4b3
20 Mar, 2019 1 commit
- N
  
  cherry-pick from feature/anakin-engine: Add subgraph fuse support and anakin engine #16018 · b21770a2
  Committed by nhzlx on Mar 20, 2019
  
  b21770a2
19 Mar, 2019 1 commit
- Z
  add allocator flags · 22715487
  Committed by zhhsplendid on Mar 19, 2019
```
test=develop
```
  22715487
16 Mar, 2019 1 commit
- Q
  Fix windows compiling (#16230) · 86e912c5
  Committed by qingqing01 on Mar 16, 2019
```
test=develop
```
  86e912c5
15 Mar, 2019 1 commit

Support sync batch norm. (#16121) · 8ad672a2

Committed by qingqing01 on Mar 15, 2019

* Support Sync Batch Norm.
* Note: do not enable it on a single device.

Usage:

build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)
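
The idea behind the option can be illustrated with a small plain-Python sketch. This is not Paddle's kernel; the function names `global_batch_stats` and `normalize` are hypothetical, and the sketch assumes each device holds a flat list of values for one channel. It shows why the flag only matters with more than one device: the mean and variance are computed over the combined batch from all devices, which on a single device is the same as ordinary batch norm.

```python
# Illustrative sketch only (hypothetical helper names, not Paddle's API).
# Each inner list stands for the values one device holds for a single channel.

def global_batch_stats(per_device_batches):
    """Return (mean, biased variance) over the values from all devices."""
    count = 0
    total = 0.0
    total_sq = 0.0
    for batch in per_device_batches:
        # Each device contributes only its local count, sum and sum of squares;
        # in a real system these partial sums would be exchanged between devices.
        count += len(batch)
        total += sum(batch)
        total_sq += sum(x * x for x in batch)
    mean = total / count
    var = total_sq / count - mean * mean
    return mean, var


def normalize(batch, mean, var, eps=1e-5):
    """Normalize one device's values with the shared statistics."""
    scale = (var + eps) ** 0.5
    return [(x - mean) / scale for x in batch]


if __name__ == "__main__":
    device_batches = [[1.0, 2.0], [3.0, 6.0]]
    mean, var = global_batch_stats(device_batches)
    for batch in device_batches:
        print(normalize(batch, mean, var))
```

With a single device the list of batches has one entry, so the result matches per-device statistics and synchronization adds nothing.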

8ad672a2

22 Feb, 2019 1 commit
- Y
  Initialize the benchmark tester for operator. (#15772) · 7d96c74a
  Committed by Yiqun Liu on Feb 22, 2019
```
* Initialize the benchmark tester for operator.
test=develop

* Rearrange the codes.
test=develop
```
  7d96c74a
30 Jan, 2019 1 commit
- X
  
  add sample_logits op · 58ad40cc
  Committed by xuezhong on Jan 30, 2019
  
  58ad40cc
25 Jan, 2019 1 commit

Adding ngraph_engine_op (#14948) · efce2567

Committed by baojun on Jan 24, 2019

* enable ngraph_engine_op
test=develop

* merge develop test=develop

* avoid const_cast test=develop

* rm ngraph_operator test=develop

* Added TODO to move EnableNgraph test=develop

* Add TODO to remove const_cast test=develop

efce2567

24 Jan, 2019 1 commit

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

Committed by Yiqun Liu on Jan 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the python api of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debugging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the except of is_empty op in PrepareData.
test=develop

3008fa12

18 Jan, 2019 1 commit

Tree conv op (#15217) · e2ba9668

Committed by zhaozhehao on Jan 18, 2019

* refactor tree2col operator with new memory mechanism test=develop

* test=develop

* test=develop

* Modified API according to panyx0718 test=develop

* fix API change according to heavengate test=develop

* Modify API comment test=develop

e2ba9668

29 Dec, 2018 1 commit
- P
  fix script issue · dba009db
  Committed by peizhilin on Dec 29, 2018
```
test=develop
```
  dba009db
26 Dec, 2018 1 commit
- P
  fix test issues on windows · 01c00b07
  Committed by peizhilin on Dec 26, 2018
```
test=develop
```
  01c00b07
18 Dec, 2018 3 commits
- P
  
  add ctc support for windows · 19ebd8b4
  Committed by peizhilin on Dec 18, 2018
  
  19ebd8b4
- P
  include the mkl fix only · b601f2de
  Committed by peizhilin on Dec 18, 2018
```
test=develop
```
  b601f2de
- P
  
  add mkl,ctc support for windows · 5a6d7fe2
  Committed by peizhilin on Dec 18, 2018
  
  5a6d7fe2
13 Dec, 2018 1 commit
- S
  fix cmake · deb0d41c
  Committed by sneaxiy on Dec 12, 2018
```
fix cmake again
test=develop
```
  deb0d41c
10 Dec, 2018 2 commits
- S
  
  featue/py_func · 8760d23c
  Committed by sneaxiy on Dec 10, 2018
  
  8760d23c
- T
  
  refine names · 53709e7e
  Committed by tensor-tang on Dec 06, 2018
  
  53709e7e
05 Dec, 2018 1 commit
- T
  
  init jitkernel · 77236e33
  Committed by tensor-tang on Nov 26, 2018
  
  77236e33
03 Dec, 2018 1 commit
- N
  
  add prelu gpu inference · f75815b7
  Committed by nhzlx on Dec 03, 2018
  
  f75815b7
28 Nov, 2018 1 commit
- Q
  
  fix prefetch dependency test=develop · b9d3d75f
  Committed by Qiao Longfei on Nov 28, 2018
  
  b9d3d75f
25 Nov, 2018 1 commit
- Q
  
  lookup table op support prefetch · 47280ef8
  Committed by Qiao Longfei on Nov 25, 2018
  
  47280ef8
22 Nov, 2018 1 commit

Windows/online (#14474) · d9a1f3e5

Committed by wopeizl on Nov 22, 2018

* add recordio support

* disable the openblas multi-thread on windows since no support
adjust the python script

* code style

* code style
test=develop

* add create_recordio_file_reader back

* fix code style
test=develop

* fix the gtest.cmake on windows

* fix cc_test on windows

* fix the win build
test=develop

* remove fused compile support on windows
test=develop

* add the jit support
test=develop

* add the jit support, test=develop

* add the jit support, test=develop

* add the jit back
fix compile error on windows

* rollback test=develop

* test case fix

* disable DSO by default on windows

* exclude warpctc_op on windows

* exclude the dynload_warpctc out on windows
test=develop

* fix the scripts error
test=develop

* disable avx on windows by default
test=develop

* re-organize the cmake file

* disable mkl on windows by default

* add warp_ctc back

* fix the dependency

* fix the dependency

* fix the build issue on windows

* remove unsupported flag on windows

* code style

* code style
test=develop

* fix issue

* add profiler, parallel_executor back

* clean up the pre-definitions on windows

* fix build issue

* test=develop

d9a1f3e5

21 Nov, 2018 2 commits
- Y
  fix(Compile): fix depends error when compile op using cub · 3edd32d0
  Committed by Yu Yang on Nov 21, 2018
```
some operators depend on cub and xxhash by header. The dependency should be declared explicitly rather than declared to pybind.

test=develop
```
  3edd32d0
- D
  Fix compiling with cuDNN v5 · cda60311
  Committed by Dang Qingqing on Nov 20, 2018
```
test=develop
```
  cda60311
19 Nov, 2018 7 commits
- P
  
  add warp_ctc back · 8443961a
  Committed by peizhilin on Nov 19, 2018
  
  8443961a
- Q
  Convolution fusion operator. (#14449) · fd7e6431
  Committed by qingqing01 on Nov 19, 2018
```
* Convolution fusion operator.
* Clean code
test=develop
```
  fd7e6431
- P
  
  re-organize the cmake file · 4a6769da
  Committed by peizhilin on Nov 19, 2018
  
  4a6769da
- P
  fix the scripts error · 44940643
  Committed by peizhilin on Nov 19, 2018
```
test=develop
```
  44940643
- P
  exclude the dynload_warpctc out on windows · 8cf63475
  Committed by peizhilin on Nov 19, 2018
```
test=develop
```
  8cf63475
- P
  
  exclude warpctc_op on windows · 1aff40a4
  Committed by peizhilin on Nov 19, 2018
  
  1aff40a4
- P
  
  disable DSO by default on windows · 7d51a0e8
  Committed by peizhilin on Nov 19, 2018
  
  7d51a0e8
18 Nov, 2018 1 commit
- P
  add the jit back · a3e952f4
  Committed by peizhilin on Nov 18, 2018
```
fix compile error on windows
```
  a3e952f4
