- 15 Apr 2022: 1 commit
Committed by limingshu
* Change the cuDNN helper for autotune.
* Add FLAGS_use_autotune to set the global status of autotune, and change the order in which algorithms are chosen.
* Fix the bug in calculating and printing the current step's cache hit rate.
* Improve the autotune cache and fix the unit tests.
* Change the key from AlgorithmType to int64_t.
* Fix the unit tests for CPU-only environments.
* Change ChooseAlgoByWorkspace for heuristic mode.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
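The commit message does not include the code itself. As a rough illustration of the cache behavior it describes (one integer key per operator configuration, and a hit rate that is computed correctly even before any lookup), here is a minimal Python sketch. `AutotuneCache`, `make_key`, `hash_combine`, and `get_or_search` are hypothetical names, not Paddle's actual API, and the boost-style hash mixing is an assumption.

```python
from typing import Callable, Dict, Tuple

MASK64 = (1 << 64) - 1


def hash_combine(seed: int, value: int) -> int:
    """Hypothetical boost::hash_combine-style mixing, kept within 64 bits."""
    value &= MASK64
    seed ^= (value + 0x9E3779B97F4A7C15 + ((seed << 6) & MASK64) + (seed >> 2)) & MASK64
    return seed & MASK64


def make_key(params: Tuple[int, ...]) -> int:
    """Fold operator parameters into one signed 64-bit (int64_t-like) key."""
    seed = 0
    for p in params:
        seed = hash_combine(seed, p)
    # Reinterpret the unsigned 64-bit pattern as a signed int64 value.
    return seed - (1 << 64) if seed >= (1 << 63) else seed


class AutotuneCache:
    """Hypothetical autotune cache mapping an int64 key to a chosen algorithm id."""

    def __init__(self) -> None:
        self._cache: Dict[int, int] = {}
        self.hits = 0
        self.misses = 0

    def get_or_search(self, key: int, search: Callable[[], int]) -> int:
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = search()  # run the expensive search once per key
        return self._cache[key]

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        # Float division, and 0.0 instead of ZeroDivisionError before any lookup.
        return self.hits / total if total else 0.0


cache = AutotuneCache()
key = make_key((1, 64, 56, 56, 3, 3))
cache.get_or_search(key, lambda: 2)  # miss: search runs and returns 2
cache.get_or_search(key, lambda: 0)  # hit: cached 2 is returned, search not run
print(cache.hit_rate())  # 0.5
```

Using a plain integer key keeps lookups cheap and lets any combination of shapes, strides, and data types share one cache type, at the cost of relying on the hash to avoid collisions.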
- 09 Apr 2022: 1 commit
Committed by limingshu
* Use the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.
* Use the system cudaMalloc and cudaFree to allocate the workspace during searching.
* Enable switching between the two kinds of workspace-setting methods.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
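A rough Python sketch of the first point in this commit: size the search workspace by the largest request among all candidate algorithms, then benchmark only the algorithms that fit. `exhaustive_search`, `workspace_of`, `benchmark`, and the optional `workspace_cap` are hypothetical names, not Paddle's API. The sketch only computes the buffer size; it does not model the cudaMalloc/cudaFree allocation.

```python
from typing import Callable, List, Optional, Tuple


def exhaustive_search(
    algos: List[int],
    workspace_of: Callable[[int], int],
    benchmark: Callable[[int], float],
    workspace_cap: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical: return (fastest algorithm, workspace bytes reserved for the search)."""
    # Size the single search buffer by the largest request among all candidates,
    # optionally clipped by a hypothetical device-memory cap.
    limit = max(workspace_of(a) for a in algos)
    if workspace_cap is not None:
        limit = min(limit, workspace_cap)
    best_algo, best_time = None, float("inf")
    for a in algos:
        if workspace_of(a) > limit:
            continue  # would not fit in the buffer reserved for the search
        t = benchmark(a)
        if t < best_time:
            best_algo, best_time = a, t
    if best_algo is None:
        raise RuntimeError("no algorithm fits within the workspace limit")
    return best_algo, limit


ws = {0: 0, 1: 8 << 20, 2: 64 << 20}  # workspace bytes per algorithm id
times = {0: 3.1, 1: 1.4, 2: 0.9}      # measured milliseconds per algorithm id
print(exhaustive_search(list(ws), ws.get, times.get))                          # (2, 67108864)
print(exhaustive_search(list(ws), ws.get, times.get, workspace_cap=16 << 20))  # (1, 16777216)
```

Reserving the maximum up front means every candidate can be timed against one buffer; the cap shows how a second workspace-setting method could trade the fastest algorithm for lower memory use.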