- 15 4月, 2022 1 次提交
-
-
由 limingshu 提交于
* change cudnn helper for auto-tune * Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm. * Fix the bug in calculating and printing current step cache hit rate. * Improve the autotune cache and fix unittest. * Change the key from AlgorithmType to int64_t. * Fix unittest for cpu-only env. * change ChooseAlgoByWorkspace for heuristic mode Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
-
- 09 4月, 2022 1 次提交
-
-
由 limingshu 提交于
* Using the maximum workspace_size of all alogirhms to limit the workspace size in exhaustive search mode. * Use the system cudaMalloc and cudaFree to allocate workspace during searching. * Enable switch of two kind of workspace setting methods. Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
-
- 06 4月, 2022 1 次提交
-
-
由 Sing_chan 提交于
-
- 05 4月, 2022 1 次提交
-
-
由 Zhang Ting 提交于
* switch autotune * implement AutoTuneCache * implement AutoTuneCache class * add pybind api * add dygraph test * support static mode and eager mode and improve unittests * rename the SwitchAutoTune Class and improve tests * improve AutoTuneStatus and reduce the cost of tests
-
- 31 3月, 2022 1 次提交
-
-
由 limingshu 提交于
* for 1st time interface combine. * modification with kernel factory * first auto_tune version. * first version. * basic version * add warm up step. * a debug version. * optimize the functionality of class auto_tuner. * add some quotes for optimized auto_tuner class. * add some quotes for optimized auto_tuner class. * add namespace. * modification according to the advices * replace fluid header with phi header. * replace fluid header with phi header.
-
- 25 3月, 2022 1 次提交
-
-
由 Zhang Ting 提交于
-
- 23 3月, 2022 1 次提交
-
-
由 Zhang Ting 提交于
* add kernel profiler * add gpu timer tool * remove warmup * fix rocm complilation error
-