Commits · 9ba585f5bfbd7740a7042ff4883c2ceb3436d87e · Crayon鑫 / Paddle

16 Jun, 2020 · 1 commit

Committed by hutuxian on Jun 16, 2020

* Add a StatValue class in the backend to represent a stat.
* Add a singleton StatRegistry to maintain the collection of stats.
* To keep the code simple, only the int and float types are supported, which covers most scenarios.

5822862d
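
As a rough illustration of the design this commit message describes, the sketch below models a stat value and a singleton registry in Python. The real classes live in Paddle's C++ backend; the Python form and the method names used here (`increase`, `reset`, `get`, `instance`, `names`) are assumptions made for illustration, not Paddle's actual API.

```python
import threading
from typing import Dict, List, Union

Number = Union[int, float]


class StatValue:
    """A single stat holding an int or a float (hypothetical sketch)."""

    def __init__(self, initial: Number = 0) -> None:
        # Mirror the "int and float only" restriction from the commit message.
        if isinstance(initial, bool) or not isinstance(initial, (int, float)):
            raise TypeError("only int and float stats are supported")
        self._value = initial
        self._lock = threading.Lock()

    def increase(self, delta: Number) -> Number:
        with self._lock:
            self._value += delta
            return self._value

    def reset(self) -> None:
        with self._lock:
            self._value = type(self._value)(0)

    def get(self) -> Number:
        with self._lock:
            return self._value


class StatRegistry:
    """Process-wide collection of stats, created once on first use."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self) -> None:
        self._stats: Dict[str, StatValue] = {}
        self._lock = threading.Lock()

    @classmethod
    def instance(cls) -> "StatRegistry":
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def get(self, name: str, initial: Number = 0) -> StatValue:
        # Return the existing stat, or register a new one under this name.
        with self._lock:
            if name not in self._stats:
                self._stats[name] = StatValue(initial)
            return self._stats[name]

    def names(self) -> List[str]:
        with self._lock:
            return sorted(self._stats)


# Usage: every caller reaches the same registry object.
registry = StatRegistry.instance()
registry.get("batches_seen").increase(3)
registry.get("total_loss", 0.0).increase(0.25)
```

Keeping the registry behind a single `instance()` accessor means any module can update a stat by name without passing the collection around, which is the usual motivation for a singleton here.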

08 Jun, 2020 · 1 commit
- temporarily disable these unittests failed on windows (#24942) · 4058e736
  Committed by Zhou Wei on Jun 08, 2020
  
  4058e736
28 May, 2020 · 1 commit
- add WITH_GPU for cudaerror download (#24056) · d1047d0a
  Committed by Zhou Wei on May 28, 2020
  
  d1047d0a
11 May, 2020 · 1 commit

Add macro BOOST_GET to enrich the error information of boost :: get (#24175) · aa0f254f

Committed by Chen Weihang on May 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

aa0f254f

20 Apr, 2020 · 1 commit

Optimize the error messages of paddle CUDA API (#23816) · 78170037

Committed by Zhou Wei on Apr 20, 2020

* Optimize the error messages of paddle CUDA API, test=develop

* fix the error messages of paddle CUDA API, test=develop

* Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL,test=develop

* remove build_ex_string,test=develop

* merge conflict,test=develop

78170037

17 Apr, 2020 · 1 commit

DeviceContext Split, test=develop (#23737) · 2d01cc85

Committed by 石晓伟 on Apr 17, 2020

* supports thread-binding stream, test=develop

* avoid using thread_local variables in dtor, test=develop

* modify the stream priority enum, test=develop

2d01cc85

01 Apr, 2020 · 1 commit
- reverts the commit 23177, test=develop (#23363) · 5c59d213
  Committed by 石晓伟 on Apr 01, 2020
  
  5c59d213
30 Mar, 2020 · 1 commit
- supports thread-binding stream, test=develop (#23177) · 75ebb48a
  Committed by 石晓伟 on Mar 30, 2020
  
  75ebb48a
25 Mar, 2020 · 1 commit
- add cuda resource pool for BufferedReader, test=develop (#23152) · bba74071
  Committed by Zeng Jinle on Mar 25, 2020
  
  bba74071
18 Mar, 2020 · 1 commit
- initialize global nccl context in dygraph (#23037) · 121b2aed
  Committed by Yi Liu on Mar 18, 2020
```
initialize global nccl context in dygraph
test=develop
```
  121b2aed
04 Mar, 2020 · 1 commit

Add flags to limit gpu memory (#22793) · d41d802b

Committed by Zeng Jinle on Mar 04, 2020

* add recorded cuda memory apis, fix typo, test=develop

* add more ut, test=develop

* follow comments, test=develop

* fix py35 incompatible issues, test=develop

d41d802b

03 Dec, 2019 · 1 commit
- NV jetson(nano, tx2, xavier) inference compile support (#21393) · c5f0293c
  Committed by Zhaolong Xing on Dec 03, 2019
```
* add jetson compile support
test=develop

* refine the cmake
test=develop
```
  c5f0293c
25 Nov, 2019 · 1 commit
- remove warning LNK4006 and warning LNK4221 (#21226) · 345b67b5
  Committed by zhouwei25 on Nov 25, 2019
  
  345b67b5
08 Nov, 2019 · 1 commit

Enrich the type of error and declare the error type interfaces (#21024) · 7ee25189

Committed by Chen Weihang on Nov 08, 2019

* Enrich the type of error and declare the error type interfaces, test=develop

* adjust tests to adapt new form, test=develop

* add inference deps with error_codes.pb.h, test=develop

* restore stack iter start pos, test=develop

* polish code based on review comments, test=develop

7ee25189

16 Oct, 2019 · 1 commit
- make_conv_workspace_size_configurable, test=develop (#20662) · 4922eb6d
  Committed by Zeng Jinle on Oct 16, 2019
  
  4922eb6d
14 Oct, 2019 · 1 commit

Dlpack support (#20039) · 12e4be03

Committed by 633WHU on Oct 14, 2019

* support dlpack to tensor and implement python interface test=develop

* add unittest for _to_dlpack and from_dlpack test=develop

12e4be03

24 Sep, 2019 · 1 commit
- fix cuda dev_ctx allocator cmake deps, test=develop (#19953) · 37f76407
  Committed by Zeng Jinle on Sep 24, 2019
  
  37f76407
11 Sep, 2019 · 1 commit

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

Committed by Huihuang Zheng on Sep 11, 2019

TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, removing it can improve memory performance.

We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so no singleton is needed.

Also adds data_feed_proto to operator to fix the CI for CPU compilation.

12542320
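
The idea of freeing stream-scoped memory from a stream callback, rather than through a global singleton, can be sketched with a toy CPU-only model. Everything below is hypothetical: `FakeStream`, `StreamAllocator`, and their methods are illustrative stand-ins, make no CUDA calls, and are not Paddle's implementation of CUDADeviceContextAllocator.

```python
from collections import deque
from typing import Callable, Deque, List


class FakeStream:
    """Toy stand-in for a CUDA stream: runs queued tasks in FIFO order."""

    def __init__(self) -> None:
        self._tasks: Deque[Callable[[], None]] = deque()

    def enqueue(self, task: Callable[[], None]) -> None:
        self._tasks.append(task)

    def add_callback(self, callback: Callable[[], None]) -> None:
        # A callback runs only after all work queued before it has finished.
        self._tasks.append(callback)

    def synchronize(self) -> None:
        while self._tasks:
            self._tasks.popleft()()


class StreamAllocator:
    """Hands out buffers tied to one stream; no global singleton involved."""

    def __init__(self, stream: FakeStream) -> None:
        self._stream = stream
        self._pending: List[bytearray] = []
        self.live_bytes = 0

    def allocate(self, size: int) -> bytearray:
        buf = bytearray(size)
        self._pending.append(buf)
        self.live_bytes += size
        return buf

    def release_after_queued_work(self) -> None:
        # Move the current buffers into a callback that frees them once the
        # stream has finished the work queued so far.
        to_free, self._pending = self._pending, []

        def free() -> None:
            self.live_bytes -= sum(len(b) for b in to_free)
            to_free.clear()

        self._stream.add_callback(free)


# Usage: the workspace stays alive until the queued "kernel" has run.
log = []
stream = FakeStream()
allocator = StreamAllocator(stream)
workspace = allocator.allocate(1024)
stream.enqueue(lambda: log.append(("kernel", len(workspace))))
allocator.release_after_queued_work()
log.append(("before sync", allocator.live_bytes))
stream.synchronize()
log.append(("after sync", allocator.live_bytes))
```

Because the free step is queued on the same stream as the work that uses the buffer, ordering alone guarantees the buffer outlives its last use, and each allocator can be owned by its device context instead of being shared globally.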

05 Sep, 2019 · 1 commit

Integrate NVRTC to support compiling CUDA kernel at runtime (#19422) · 42b5bec6

Committed by Yiqun Liu on Sep 05, 2019

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

42b5bec6

16 Aug, 2019 · 1 commit
- move_flags_to_unified_files_for_management, test=develop (#19224) · 708bd979
  Committed by Zeng Jinle on Aug 16, 2019
  
  708bd979
23 Jul, 2019 · 1 commit
- Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664) · fd3aad6c
  Committed by chengduo on Jul 23, 2019
```
* support sparse gradients
test=develop
```
  fd3aad6c
27 Jun, 2019 · 2 commits

add dependency of collective_helper (#18365) · 9931bc64
Committed by HaoRen on Jun 27, 2019
```
* add dependency of collective_helper

* test=develop
fix dependency of collective_helper
```
9931bc64

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

18 Apr, 2019 · 1 commit
- Polish DGC code (#16818) · cbdb8a17
  Committed by gongweibao on Apr 18, 2019
  
  cbdb8a17
30 Mar, 2019 · 1 commit
- Fix windows compilation error! (#16546) · fea91164
  Committed by gongweibao on Mar 30, 2019
```
* fix compiled
test=develop

* follow comments test=develop
```
  fea91164
29 Mar, 2019 · 3 commits
- fix windows compile problem · e3107a6a
  Committed by dongdaxiang on Mar 26, 2019
```
test=develop
```
  e3107a6a
- add more example on datagenerator · dc8cf36e
  Committed by dongdaxiang on Mar 23, 2019
```
test=develop
```
  dc8cf36e
- add printer for fetch variable · cf136064
  Committed by dongdaxiang on Feb 18, 2019
  
  cf136064
28 Mar, 2019 · 1 commit
- Add DGC(Deep Gradient Compression) interface. (#15841) · eb83abea
  Committed by gongweibao on Mar 28, 2019
  
  eb83abea
04 Mar, 2019 · 1 commit

Committed by dzhwinter on Feb 27, 2019

* staged.

* polish code

* polish code. test=develop

* polish code. test=develop

* api change. test=develop

* fix default value. test=develop

* fix default value. test=develop

4449e855

27 Feb, 2019 · 1 commit

Committed by dzhwinter on Feb 27, 2019

* staged.

* polish code

* polish code. test=develop

* polish code. test=develop

* api change. test=develop

* fix default value. test=develop

* fix default value. test=develop

225c11a9

25 Feb, 2019 · 3 commits
- Remove unnecessary dependence for profiler (#15899) · 8e904d32
  Committed by chengduo on Feb 25, 2019
```
* refile profiler
test=develop

* follow comment
test=develop
```
  8e904d32
- update with develop. test=develop · 9261cf39
  Committed by Zhen Wang on Feb 25, 2019
  
  9261cf39
- add set_attr for IrOpNode. test=develop · 0bf809c9
  Committed by Zhen Wang on Feb 25, 2019
  
  0bf809c9
21 Feb, 2019 · 2 commits

disable dam temporarily (#15860) · e3dd6970
Committed by Tao Luo on Feb 21, 2019
```
test=develop
```
e3dd6970

Profiler refine and add CUDA runtime api tracer (#15301) · a83e4704

Committed by Dun on Feb 21, 2019

* refine profiler && add runtime tracer

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* fix bug && test=develop

* add thread id map && test=develop

* test=develop

* testing

* bug fix

* remove cuda event && refine code && test=develop

* test=develop

* test=develop

* test=develop

* fix windows temp file && test=develop

* test=develop

* fix windows bug && test=develop

* fix start up issue && test=develop

* code polish &&  test=develop

* remove unused code && test=develop

* add some cupti cbid && test=develop

* add FLAGS_multiple_of_cupti_buffer_size && test=develop

* fix compile error && test=develop

* add keyword && test=develop

* fix && test=develop

* code polish && test=develop

a83e4704

20 Feb, 2019 · 1 commit
- remove legacy any.cmake · c797a1f0
  Committed by Tao Luo on Feb 20, 2019
  
  c797a1f0
03 Feb, 2019 · 1 commit
- fix the lib_any dependency · 883d2209
  Committed by peizhilin on Feb 03, 2019
```
test=develop
```
  883d2209
02 Feb, 2019 · 1 commit
- fix dependency · 061299be
  Committed by peizhilin on Feb 02, 2019
```
test=develop
```
  061299be
14 Jan, 2019 · 1 commit
- fix issue when type is invalid · eea75a1d
  Committed by peizhilin on Jan 14, 2019
```
test=develop
```
  eea75a1d
