- 25 8月, 2022 1 次提交
-
-
由 hong 提交于
* optimizer conv alog speed * code polish * remove useless code * fix compile error * fix cpu compile error * not use cudnn alog t * add search cache max number * polish code * fix cache test bug * add groups data format to conv args * fix cache test bug * fix cudnn_deterministic bug * fix test switch auto tune bug * fix test swith autotune bug; * fix conv cache bug * fix cache test error * fix cache test bug * fix windows mac compile error * fix workspace search error * update cudnn cache * fix cache test bug; test=develop * fix autotune swith test error * polish code * oplish code
-
- 15 7月, 2022 1 次提交
-
-
由 Ruibiao Chen 提交于
-
- 01 7月, 2022 1 次提交
-
-
由 limingshu 提交于
* 2nd part of transpose update * add switch_auto_tune option. * add some changes according to Ci * refine the structure of auto_tune_base. * merge develop changes * reset the switch_set_range and change unittest of transpose auto-tune * change the kernel auto-tune logits
-
- 24 6月, 2022 1 次提交
-
-
由 YuanRisheng 提交于
* perfect copy * deal with conflict * deal with conflict * fix compile bugs * fix unittest bugs * change code format * deal with conflict * modify code by review * fix ce bugs * fix ce bugs * add lo * perfect code format * deal with conflicts
-
- 07 6月, 2022 1 次提交
-
-
由 limingshu 提交于
Transpose optimization with assitant of Chengdu Supercomputing Center and auto_tune operation (#42704)
-
- 05 6月, 2022 1 次提交
-
-
由 Sing_chan 提交于
-
- 04 6月, 2022 1 次提交
-
-
由 Sing_chan 提交于
-
- 15 4月, 2022 1 次提交
-
-
由 limingshu 提交于
* change cudnn helper for auto-tune * Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm. * Fix the bug in calculating and printing current step cache hit rate. * Improve the autotune cache and fix unittest. * Change the key from AlgorithmType to int64_t. * Fix unittest for cpu-only env. * change ChooseAlgoByWorkspace for heuristic mode Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
-
- 09 4月, 2022 1 次提交
-
-
由 limingshu 提交于
* Using the maximum workspace_size of all alogirhms to limit the workspace size in exhaustive search mode. * Use the system cudaMalloc and cudaFree to allocate workspace during searching. * Enable switch of two kind of workspace setting methods. Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
-
- 06 4月, 2022 1 次提交
-
-
由 Sing_chan 提交于
-
- 05 4月, 2022 1 次提交
-
-
由 Zhang Ting 提交于
* switch autotune * implement AutoTuneCache * implement AutoTuneCache class * add pybind api * add dygraph test * support static mode and eager mode and improve unittests * rename the SwitchAutoTune Class and improve tests * improve AutoTuneStatus and reduce the cost of tests
-
- 31 3月, 2022 2 次提交
-
-
由 Zhang Ting 提交于
-
由 limingshu 提交于
* for 1st time interface combine. * modification with kernel factory * first auto_tune version. * first version. * basic version * add warm up step. * a debug version. * optimize the functionality of class auto_tuner. * add some quotes for optimized auto_tuner class. * add some quotes for optimized auto_tuner class. * add namespace. * modification according to the advices * replace fluid header with phi header. * replace fluid header with phi header.
-
- 25 3月, 2022 1 次提交
-
-
由 Zhang Ting 提交于
-
- 23 3月, 2022 1 次提交
-
-
由 Zhang Ting 提交于
* add kernel profiler * add gpu timer tool * remove warmup * fix rocm complilation error
-