Commits · d2e59e155aa86b426b3cb0feb5990be77f74fc37 · 机器未来 / Paddle

15 Jul, 2022 · 1 commit

Remove boost library (#44092) · d2e59e15

Committed by Ruibiao Chen on Jul 15, 2022

04 Jun, 2022 · 1 commit

[code format check upgrade] step2: cmake-format (#43057) · 92568edb

Committed by Sing_chan on Jun 04, 2022

15 Apr, 2022 · 1 commit

Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda

Committed by limingshu on Apr 15, 2022

* Change cuDNN helper for auto-tune.

* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.

* Fix the bug in calculating and printing current step cache hit rate.

* Improve the autotune cache and fix unittest.

* Change the key from AlgorithmType to int64_t.

* Fix unittest for cpu-only env.

* Change ChooseAlgoByWorkspace for heuristic mode.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

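The bullets above mention an autotune cache keyed by `int64_t` and a cache hit rate. The following is a minimal sketch of that idea, not Paddle's actual implementation: the class name `AlgoCacheSketch` and its methods are hypothetical, and the hit rate here is cumulative over all lookups rather than per step.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical illustration only; this is not Paddle's AutoTuneCache.
class AlgoCacheSketch {
 public:
  // Looks up `key`. On a hit, writes the cached algorithm id to `*algo`
  // and returns true; on a miss, leaves `*algo` untouched and returns false.
  bool Find(int64_t key, int64_t* algo) {
    auto it = map_.find(key);
    if (it == map_.end()) {
      ++misses_;
      return false;
    }
    ++hits_;
    *algo = it->second;
    return true;
  }

  // Stores (or overwrites) the algorithm id chosen for `key`.
  void Set(int64_t key, int64_t algo) { map_[key] = algo; }

  // Fraction of lookups that hit; 0.0 when no lookup has happened yet.
  double HitRate() const {
    const int64_t total = hits_ + misses_;
    return total == 0 ? 0.0 : static_cast<double>(hits_) / total;
  }

 private:
  std::unordered_map<int64_t, int64_t> map_;
  int64_t hits_ = 0;
  int64_t misses_ = 0;
};
```

Using a plain integer key, as the commit message describes, keeps such a cache independent of any particular algorithm enum type.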

09 Apr, 2022 · 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Use the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switching between the two kinds of workspace setting methods.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

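The first bullet of this commit describes bounding the search workspace by the largest workspace any candidate algorithm asks for. A rough sketch of that computation follows. `AlgoCandidate` and `MaxWorkspaceBytes` are hypothetical names, and the sizes are passed in directly here, whereas real code would obtain them from cuDNN queries.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record of one candidate algorithm and the workspace it needs.
struct AlgoCandidate {
  int algo_id;
  std::size_t workspace_bytes;
};

// Returns the largest workspace requirement among all candidates,
// or 0 if the candidate list is empty.
std::size_t MaxWorkspaceBytes(const std::vector<AlgoCandidate>& candidates) {
  std::size_t max_bytes = 0;
  for (const auto& c : candidates) {
    max_bytes = std::max(max_bytes, c.workspace_bytes);
  }
  return max_bytes;
}
```

A buffer of this size is large enough for every candidate, so no algorithm is excluded from the exhaustive search just because of the workspace limit.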

06 Apr, 2022 · 1 commit

Fix bug of missing boost when compiling cache.cc (#41430) · 5c6e4bff

Committed by Sing_chan on Apr 06, 2022

05 Apr, 2022 · 1 commit

Implement AutoTuneStatus class for Kernel Auto Tune (#41218) · b0f8000e

Committed by Zhang Ting on Apr 05, 2022

* switch autotune

* implement AutoTuneCache

* implement AutoTuneCache class

* add pybind api

* add dygraph test

* support static mode and eager mode and improve unittests

* rename the SwitchAutoTune Class and improve tests

* improve AutoTuneStatus and reduce the cost of tests


31 Mar, 2022 · 1 commit

add_autotune_kernel_tool (#40658) · 7c5dca9f

Committed by limingshu on Mar 31, 2022

* for 1st time interface combine.

* modification with kernel factory

* first auto_tune version.

* first version.

* basic version

* add warm up step.

* a debug version.

* optimize the functionality of class auto_tuner.

* add some quotes for optimized auto_tuner class.

* add some quotes for optimized auto_tuner class.

* add namespace.

* modification according to the advices

* replace fluid header with phi header.

* replace fluid header with phi header.


25 Mar, 2022 · 1 commit

Implement a common AlgorithmsCache for kernel auto-tune (#40793) · 01b688c0

Committed by Zhang Ting on Mar 25, 2022

23 Mar, 2022 · 1 commit

Add Gpu Timer Tool (#40642) · 291d8941

Committed by Zhang Ting on Mar 23, 2022

* add kernel profiler

* add gpu timer tool

* remove warmup

* fix ROCm compilation error


机器未来 / Paddle is up to date with its fork source project
