Commits · b93870e696bd097968adcbd28cac09f891bfde64 · 机器未来 / Paddle

14 Nov, 2019 · 1 commit

Improve topk performance. (#21087) · b93870e6

Committed by zhaoyuchen2018 on Nov 13, 2019

* Improve topk performance.

Computing topk over 200000 elements took 1 s before the optimization and 0.0028 s after it.

* Refine return value.
* Add cuda util functions.
* Fix ComputeBlockSize bug & refine comments.

Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
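
The commit message gives timings but not the algorithm, and the optimized code is a CUDA kernel that is not shown here. As a rough CPU-side illustration of why selecting only the top k elements can be much cheaper than ordering everything, the following sketch compares a full sort with a bounded heap. The function names are hypothetical and this is not the method used in the commit.

```python
import heapq
import random


def topk_full_sort(values, k):
    """Baseline: sort everything in descending order, then keep the first k."""
    return sorted(values, reverse=True)[:k]


def topk_heap(values, k):
    """Track only k candidates in a heap: O(n log k) instead of O(n log n)."""
    return heapq.nlargest(k, values)


# Same input size as the commit message mentions (200000 elements).
random.seed(0)
data = [random.random() for _ in range(200000)]

# Both strategies must agree on the result; only the cost differs.
assert topk_heap(data, 5) == topk_full_sort(data, 5)
```

Both functions return the k largest values in descending order, so they can be swapped freely when checking a faster implementation against a simple reference.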

b93870e6

28 Sep, 2019 · 1 commit

Enable users to create custom cpp op outside framework. (#19256) · 1a3eef02

Committed by qingqing01 on Sep 28, 2019

* Custom ops must be written according to the framework's OP spec.
* Package fluid_framework.so and headers into whl.
* Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir.
* Export some C-APIs to merge OpInfo between core.so and custom_op.so.
* Add unit testing.
* Update API.spec.
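
As a hedged sketch of how the two `paddle.sysconfig` helpers named above might be used, the snippet below assembles a `g++` command line for building a custom op as a shared library. The helper `build_custom_op_command`, the file names, the placeholder directories, and the linker name `fluid_framework` (derived from the `fluid_framework.so` mentioned in the commit) are all assumptions for illustration; the exact flags required by a given Paddle release may differ.

```python
def build_custom_op_command(source, output, include_dir, lib_dir, framework_lib):
    """Return a g++ argument list that builds `source` into a shared library.

    `include_dir` and `lib_dir` are meant to come from
    paddle.sysconfig.get_include() and paddle.sysconfig.get_lib().
    `framework_lib` is the library name passed to -l (an assumption here).
    """
    return [
        "g++", source, "-o", output,
        "-shared", "-fPIC", "-std=c++11",
        "-I", include_dir,
        "-L", lib_dir,
        "-l" + framework_lib,
    ]


# Placeholder paths keep the sketch runnable without Paddle installed.
# With Paddle available, one would instead pass:
#   paddle.sysconfig.get_include() and paddle.sysconfig.get_lib()
command = build_custom_op_command(
    "my_op.cc", "my_op.so",
    "/path/to/paddle/include", "/path/to/paddle/libs",
    "fluid_framework",  # hypothetical linker name
)
print(" ".join(command))
```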

1a3eef02

24 Sep, 2019 · 1 commit

fix cuda dev_ctx allocator cmake deps, test=develop (#19953) · 37f76407

Committed by Zeng Jinle on Sep 24, 2019

37f76407

22 Sep, 2019 · 1 commit

Add lock to cudnn handle calls (#19845) · c7f36e7c

Committed by Zeng Jinle on Sep 22, 2019

* refine reallocate of workspace size, test=develop

* add lock to cudnn handle calls, test=develop
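
The general idea of guarding calls on a shared handle with a lock can be sketched as follows. This is a minimal illustration in Python, not the C++ code from the commit; `LockedHandle` and `FakeHandle` are hypothetical names standing in for a wrapper around a cuDNN handle.

```python
import threading


class LockedHandle:
    """Wraps a shared handle so that only one thread uses it at a time."""

    def __init__(self, handle):
        self._handle = handle
        self._lock = threading.Lock()

    def run(self, callback):
        # Every call that touches the handle goes through the same lock.
        with self._lock:
            return callback(self._handle)


class FakeHandle:
    """Stand-in for a library handle that is not safe for concurrent use."""

    def __init__(self):
        self.calls = 0


def use(handle):
    # A read-modify-write sequence that could lose updates without the lock.
    current = handle.calls
    handle.calls = current + 1


shared = LockedHandle(FakeHandle())


def worker():
    for _ in range(1000):
        shared.run(use)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = shared.run(lambda h: h.calls)
assert total == 4000
```

Routing every access through `run` keeps the locking in one place, so callers cannot forget to take the lock.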

c7f36e7c

18 Sep, 2019 · 1 commit

refine reallocate of workspace size, test=develop (#19843) · 5eb381a3

Committed by Zeng Jinle on Sep 18, 2019

5eb381a3

11 Sep, 2019 · 1 commit

Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320

Committed by Huihuang Zheng on Sep 11, 2019

TemporaryAllocator is a singleton used to allocate memory for cuDNN. Removing the singleton improves memory performance.

We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so no singleton is needed.

Also added data_feed_proto to operator to fix CI in CPU compilation.

12542320

03 Sep, 2019 · 1 commit

refine PADDLE_ENFORCE codes for unify PADDLE_ASSERT_MSG (#19603) · 75d15719

Committed by Tao Luo on Sep 03, 2019

```
test=develop
```

75d15719

08 Jul, 2019 · 1 commit

add mkldnn shapeblob cache clear strategy (#18513) · fe32879d

Committed by Tao Luo on Jul 08, 2019

* add mkldnn shapeblob cache clear strategy

test=develop

* refine with comments

test=develop

* make cache clear strategy safer

test=develop

* add lock for GetShapeBlobSize

test=develop
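
The commit message does not spell out the clearing policy, so the following is only a sketch under a stated assumption: the cache is keyed by input shape, and once the number of cached shapes reaches a capacity, the whole cache is dropped before a new shape is stored. The class `ShapeBlobCache` and its method names are hypothetical; the lock mirrors the "add lock for GetShapeBlobSize" step above.

```python
import threading


class ShapeBlobCache:
    """Per-shape cache that is cleared when too many shapes accumulate."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._blobs = {}  # shape key -> {blob name -> blob}
        self._lock = threading.Lock()

    def set_blob(self, shape, name, blob):
        with self._lock:
            # Assumed policy: drop everything when a new shape would
            # push the cache past its capacity.
            if shape not in self._blobs and len(self._blobs) >= self._capacity:
                self._blobs.clear()
            self._blobs.setdefault(shape, {})[name] = blob

    def get_blob(self, shape, name):
        with self._lock:
            return self._blobs.get(shape, {}).get(name)

    def shape_blob_size(self):
        # Reading the size takes the same lock as writers do.
        with self._lock:
            return len(self._blobs)


cache = ShapeBlobCache(capacity=2)
cache.set_blob((1, 3, 224, 224), "conv_primitive", "blob-a")
cache.set_blob((2, 3, 224, 224), "conv_primitive", "blob-b")
assert cache.shape_blob_size() == 2

# A third distinct shape triggers the clear, then gets stored.
cache.set_blob((4, 3, 224, 224), "conv_primitive", "blob-c")
assert cache.shape_blob_size() == 1
assert cache.get_blob((1, 3, 224, 224), "conv_primitive") is None
```

Bounding the cache this way trades some recomputation for a hard limit on memory when inputs arrive with many different shapes.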

fe32879d

03 Jul, 2019 · 1 commit

add shape_blob for cache mkldnn primitive (#18454) · 3f3112ce

Committed by Tao Luo on Jul 03, 2019

```
test=develop
```

3f3112ce

02 Jul, 2019 · 1 commit

rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453) · 8f5fffca

Committed by Leo Zhao on Jul 02, 2019

* rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id()

test=develop

* update session id definition and adjust logic for default behavior

test=develop

* reset logic in mkldnn reuse, as most cases work with the default.

test=develop

8f5fffca

27 Jun, 2019 · 1 commit

Reset DeviceContext after quantization warmup (#18182) · 84096932

Committed by Michał Gallus on Jun 27, 2019

```
test=develop
```

84096932

28 Apr, 2019 · 1 commit

Use CudnnWorkspaceHandle in exhaustive search (#17082) · b9494058

Committed by Huihuang Zheng on Apr 28, 2019

1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn.
2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search.

test=develop

b9494058

21 Apr, 2019 · 1 commit

Refine model gpu memory (#16993) · 1202d3fc

Committed by Zeng Jinle on Apr 21, 2019

* speedup gc and inplace softmax_with_cross_entropy_grad
test=develop

* refine models gpu mem
Merge skip vars and warning messages of mem opt
remove relu mem opt
test=develop

* follow comments
test=develop

1202d3fc

25 Mar, 2019 · 1 commit

fix ci bug: cudnn handler in multi card · a1d11bb1

Committed by nhzlx on Mar 25, 2019

```
test=develop
```

a1d11bb1

21 Mar, 2019 · 1 commit

fix win gpu build test=develop (#16334) · b7baeed7

Committed by Wu Yi on Mar 21, 2019

b7baeed7

20 Mar, 2019 · 2 commits

git cherry-pick from feature/anakin-engine: update anakin subgraph #16278 · 07dcf285

Committed by nhzlx on Mar 20, 2019

07dcf285

Collective ops (#15572) · 6382b62f

Committed by Wu Yi on Mar 20, 2019

* wip allreduce in op

* wip

* wip

* wip

* wip adding test

* wip for conflict with mp mode

* fix tests test=develop

* fix cpu build test=develop

* fix travis clang format test=develop

* fix cpu build test=develop

* update api.spec test=develop

* delete comment test=develop

* fix cpplint test=develop

* fix test=develop

* follow comment test=develop

* add file test=develop

* fix build test=develop

* update test=develop

* to be compatible with sync_bn, and fix mp mode in develop test=develop

6382b62f

19 Mar, 2019 · 1 commit

add allocator flags · 22715487

Committed by zhhsplendid on Mar 19, 2019

```
test=develop
```

22715487

16 Mar, 2019 · 1 commit

Fix windows compiling (#16230) · 86e912c5

Committed by qingqing01 on Mar 16, 2019

```
test=develop
```

86e912c5

15 Mar, 2019 · 1 commit

Support sync batch norm. (#16121) · 8ad672a2

Committed by qingqing01 on Mar 15, 2019

* Support Sync Batch Norm.
* Note: do not enable it when running on a single device.

Usage:

build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)
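
The commit message shows how to switch the feature on but not what it computes. A common formulation of synchronized batch norm, assumed here, is that each device contributes its element count, sum, and sum of squares, and the mean and variance are then computed over the combined batch. The function `sync_batch_stats` below is a hypothetical pure-Python sketch of that reduction, not Paddle's implementation.

```python
def sync_batch_stats(per_device_batches):
    """Compute mean and (biased) variance over all devices' samples.

    Each inner list holds the values of one feature on one device.
    Only three numbers per device need to be exchanged:
    the count, the sum, and the sum of squares.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    for batch in per_device_batches:
        count += len(batch)
        total += sum(batch)
        total_sq += sum(x * x for x in batch)
    mean = total / count
    variance = total_sq / count - mean * mean
    return mean, variance


# Two devices, two samples each.
mean, variance = sync_batch_stats([[1.0, 2.0], [3.0, 4.0]])
assert abs(mean - 2.5) < 1e-12
assert abs(variance - 1.25) < 1e-12
```

With a single device the reduction covers only that device's own batch, so the result matches ordinary batch norm and synchronization adds nothing.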

8ad672a2

14 Jan, 2019 · 1 commit

Revert "Revert "Remove workspace_handle in conv_cudnn (#15186)"" (#15290) · 46d01d79

Committed by chengduo on Jan 13, 2019

```
test=develop
This reverts commit 358e657f.
```

46d01d79

11 Jan, 2019 · 2 commits

Revert "Remove workspace_handle in conv_cudnn (#15186)" · 358e657f

Committed by chengduozh on Jan 11, 2019

```
test=develop
This reverts commit 064512aa.
```

358e657f

Remove workspace_handle in conv_cudnn (#15186) · 064512aa

Committed by chengduo on Jan 10, 2019

* remove workspace_handle in conv2d_cudnn
test=develop

* remove workspace_handle
test=develop

* fix bug
test=develop

* make test_conv2d_op SERIAL
test=develop

* save memory in conv_cudnn
test=develop

* enhance thread safety
test=develop

* enhance temporary allocator
test=develop

* Add excess fraction
test=develop

* follow comments
test=develop

* fix bug and code refine
test=develop

* fix memory size check
test=develop

* rename reuse_tmp_allocation_excess_fraction
test=develop

064512aa

08 Jan, 2019 · 2 commits

Revert "Revert "Remove op handle lock"" · ed409ac9

Committed by sneaxiy on Jan 08, 2019

```
test=develop
```

ed409ac9

Revert "Remove op handle lock" · dacfaaa9

Committed by Zeng Jinle on Jan 08, 2019

```
test=develop
```

dacfaaa9

02 Jan, 2019 · 1 commit

remove_op_handle_lock · d0a8a1e9

Committed by sneaxiy on Jan 02, 2019

```
test=develop
```

d0a8a1e9

29 Dec, 2018 · 1 commit

remove tensor core lock · d25395fc

Committed by sneaxiy on Dec 29, 2018

```
test=develop
```

d25395fc

25 Dec, 2018 · 1 commit

Move GetTensor to tensor_util (#15011) · b9fb03cf

Committed by chengduo on Dec 25, 2018

* refine tensor
test=develop

* refine tensor
test=develop

* fix device_context log
test=develop

b9fb03cf

21 Dec, 2018 · 1 commit

[Feature] Add Temporary Allocator (#14875) · 79bd6dfa

Committed by chengduo on Dec 21, 2018

* Add Temporary Allocator

* add Temporary Allocator to DeviceContext
test=develop

* code refine
test=develop

* fix mean_iou
test=develop

* Add DeviceTemporaryAllocator
test=develop

* fix conv_op bug
test=develop

* small fix
test=develop

* code refine
test=develop

* log refine
test=develop

* fix unit test
test=develop

* move double check

* refine concat_and_split
test=develop

* add limit_of_temporary_allocation
test=develop

* fix name
test=develop

79bd6dfa

11 Dec, 2018 · 1 commit

Fix Eigen macro when using GPU · 7604b1ad

Committed by Yu Yang on Dec 11, 2018

```
The macro should be defined by compiler rather than by source.

test=develop
```

7604b1ad

03 Dec, 2018 · 1 commit

fix bug · c47c451a

Committed by sneaxiy on Dec 03, 2018

c47c451a

22 Nov, 2018 · 1 commit

Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) · 00b9e9a1

Committed by chengduo on Nov 22, 2018

* refine cublas
test=develop

* code refine

* refine cublas

* add GEMM_EX

* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop

* fix CublasCall for cuda version
test=develop

* fix error
test=develop

* fix GEMM_EX to be compatible with gcc 4.8
test=develop

* add GEMM_EX
test=develop

* to be compatible with gcc 4.8
test=develop

00b9e9a1

15 Nov, 2018 · 1 commit

Pass compile · 0d6718fc

Committed by Yu Yang on Nov 15, 2018

0d6718fc

08 Nov, 2018 · 2 commits

fix share library issue · 45125ba5

Committed by peizhilin on Nov 08, 2018

45125ba5

Revert "cherry picked windows patches." · ba8b5619

Committed by Zhaolong Xing on Nov 08, 2018

ba8b5619

07 Nov, 2018 · 1 commit

Merge device_context · c774bcbd

Committed by Yu Yang on Nov 07, 2018

```
test=develop
```

c774bcbd

06 Nov, 2018 · 1 commit

remove unnecessary codes · faac8a76

Committed by sneaxiy on Nov 05, 2018

```
test=develop
```

faac8a76

31 Oct, 2018 · 1 commit

feat(platform): lazy initialization of devicecontext in pool (#14067) · 90d9e5ae

Committed by Yu Yang on Oct 31, 2018

* feat(platform): lazy initialization of devicecontext in pool

Use std::async(std::launch::deferred, []{...}) to lazily initialize DeviceContext in Pool

test=develop

* Add future includes

test=develop
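
In C++, a future created with `std::async` and the `std::launch::deferred` policy runs its function only when the result is first requested. The same idea can be sketched in Python as follows; `LazyValue`, `ContextPool`, and `make_context` are hypothetical names for illustration and do not reflect the actual classes in the commit.

```python
import threading


class LazyValue:
    """Runs `factory` at most once, on the first call to get()."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._ready = False
        self._value = None

    def get(self):
        with self._lock:
            if not self._ready:
                self._value = self._factory()
                self._ready = True
        return self._value


class ContextPool:
    """Registers one lazy context per place without creating any of them."""

    def __init__(self, places, create):
        # `p=p` binds the current place to each factory.
        self._contexts = {p: LazyValue(lambda p=p: create(p)) for p in places}

    def get(self, place):
        return self._contexts[place].get()


created = []


def make_context(place):
    created.append(place)  # record when the expensive setup really happens
    return "context for " + place


pool = ContextPool(["cpu", "gpu:0", "gpu:1"], make_context)
assert created == []  # nothing is initialized up front

pool.get("gpu:0")
pool.get("gpu:0")
assert created == ["gpu:0"]  # created once, and only for the place used
```

Deferring construction this way means a process that never touches a given device does not pay for setting up its context.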

90d9e5ae

30 Oct, 2018 · 1 commit

cleard. staged · bf2e4cb1

Committed by dzhwinter on Oct 30, 2018

bf2e4cb1

26 Oct, 2018 · 1 commit

review fixes (Teamcity fails) · 2098b425

Committed by Sylwester Fraczek on Oct 24, 2018

```
test=develop
```

2098b425
