Change cuDNN Conv kernel for auto tune feature (#41313)
* Change cuDNN helper for auto-tune.
* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.
* Fix the bug in calculating and printing current step cache hit rate.
* Improve the autotune cache and fix unittest.
* Change the key from AlgorithmType to int64_t.
* Fix unittest for cpu-only env.
* Change ChooseAlgoByWorkspace for heuristic mode.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>