- 15 4月, 2022 1 次提交
-
-
由 limingshu 提交于
* change cudnn helper for auto-tune * Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm. * Fix the bug in calculating and printing current step cache hit rate. * Improve the autotune cache and fix unittest. * Change the key from AlgorithmType to int64_t. * Fix unittest for cpu-only env. * change ChooseAlgoByWorkspace for heuristic mode Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
-
- 14 4月, 2022 1 次提交
-
-
由 Yiqun Liu 提交于
-
- 09 4月, 2022 1 次提交
-
-
由 limingshu 提交于
* Using the maximum workspace_size of all alogirhms to limit the workspace size in exhaustive search mode. * Use the system cudaMalloc and cudaFree to allocate workspace during searching. * Enable switch of two kind of workspace setting methods. Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
-
- 04 3月, 2022 1 次提交
-
-
由 hong 提交于
* move conv to pten * move conv to pten; test=develop * fix bug; * add conv cudnn impl; test=develop * update * update operator; test=develop * fix bug; test=develop * move operator and prepared_operator to develop; test=develop * resolve conflict; test=develop * remove useless code;test=develop * add depency ; test=develop * fix bug; * add sig.cc ; test=develop * fix use_op error; test=develop * fix bug; test=develop * fix bug; test=develop * add conv3d register; test=develop * fix star gan and conv_nn_grad test failed; test=develop * add header; test=develop * manul to recover to develop; * resolve confilct; test=develop * remove useless code * fix bug; * remove conv2d_cudnn; test=develop * fix bugs; test=develop * fix cpu rocm compile bugs; test=develop * fix blas error; test=develop * fix compile bug; test=develop * fix windows compile error; test=develop * fix windows error; test=develop * resolve confilct; test=develop
-
- 20 2月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* rename pten dir to phi * rename namespace to phi * rename infrt pten dir to phi * resolve conflict * rename pten to phi in cmake * revert all infrt change * change needed files * fix infrt failed * fix inference failed
-
- 19 2月, 2022 1 次提交
-
-
由 Aurelius84 提交于
* Unify paddle/pten::framework::ddim into pten::ddim * fix paddle namespace * compile sucessfully * fix npu src file * fix conflict * fix conflict * fix tensorrt compiler error * fix conflict * fix conflict * fix tesst file conflict * fix conflict * fix mlu file conflict * fix mlu file conflict * fix cinn header file conflict * fix conflict * fix conflict * fix conflict * fix conflict
-
- 08 2月, 2022 1 次提交
-
-
由 Wilber 提交于
* gpu_context.. * update * update * update
-
- 25 1月, 2022 1 次提交
-
-
由 limingshu 提交于
* first commit * add more changes
-
- 30 12月, 2021 1 次提交
-
-
由 limingshu 提交于
-
- 03 12月, 2021 1 次提交
-
-
由 ronnywang 提交于
* refine structure for cuda and rocm * update * update * update * update
-
- 08 10月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
* support CUDA Graph on PE * add ut, fix CI compile * reduce memory consumption * fix CUDA 10 CI * improve coverage * improve python coverage
-
- 04 8月, 2021 1 次提交
-
-
由 Lijunhui 提交于
-
- 02 6月, 2021 1 次提交
-
-
由 wuhuanzhou 提交于
-
- 26 5月, 2021 1 次提交
-
-
由 wuhuanzhou 提交于
* optimize OP's compilation time, test=develop * add more op and run ci test, test=develop * CUDA Kernel register in cc file, test=develop * fix macros, test=develop * fix undefined symbol error, test=develop * fix compilation error and undefined symbol, test=develop * fix compilation error on Windows, test=develop * fix compilation error on Windows, test=develop
-
- 18 2月, 2021 1 次提交
-
-
由 Zhang Ting 提交于
* enable exhaustive_search for input_grad when dtype is float16 * enable exhaustive_search for forward algos
-
- 11 1月, 2021 1 次提交
-
-
由 AshburnLee 提交于
-
- 20 11月, 2020 1 次提交
-
-
由 wangchaochaohu 提交于
-
- 16 11月, 2020 1 次提交
-
-
由 Leo Chen 提交于
-
- 14 10月, 2020 1 次提交
-
-
由 Zhang Ting 提交于
* use exhaustive_search for float16 * tune algo only when dtype is float16
-
- 23 9月, 2020 1 次提交
-
-
由 Shang Zhizhou 提交于
* [bug fix]:Memory increases after adapting the cudnn version to 8 * [bug fix]cudnnGetConvolutionForwardAlgorithm not defined
-
- 05 8月, 2020 1 次提交
-
-
由 Zhaolong Xing 提交于
* cunn8 support test=develop * fix ci error test=develop
-
- 27 5月, 2020 1 次提交
-
-
由 wangchaochaohu 提交于
-
- 12 4月, 2020 1 次提交
-
-
由 zhongpu 提交于
-
- 03 4月, 2020 1 次提交
-
-
由 zhongpu 提交于
* use global conv cache; test=develop * use singleton cache; test=develop * fix format error; test=develop * add cudnn helper header; test=develop * fix header error; test=develop * fix mac unitest; test=develop * fix mac unitest; test=develop * fix file format; test=develop * fix include file error, test=develop * remove kernel_configs_ in class ExecutionContext and kernel_configs_map_ in class OperatorWithKernel, test=develop * fix test_elementwise_mul_op_dim, test=develop * fix compile error, test=develop Co-authored-by: Nphlrain <phliuhongyu@126.com>
-
- 02 4月, 2020 2 次提交
-
-
由 zhongpu 提交于
* use global conv cache; test=develop * use singleton cache; test=develop * fix format error; test=develop * add cudnn helper header; test=develop * fix header error; test=develop * fix mac unitest; test=develop * fix mac unitest; test=develop * fix file format; test=develop * fix include file error, test=develop * remove kernel_configs_ in class ExecutionContext and kernel_configs_map_ in class OperatorWithKernel, test=develop * fix test_elementwise_mul_op_dim, test=develop Co-authored-by: Nphlrain <phliuhongyu@126.com>
- 07 1月, 2020 1 次提交
-
-
由 Chen Weihang 提交于
-
- 04 11月, 2019 1 次提交
-
-
由 wangchaochaohu 提交于
-
- 09 10月, 2019 1 次提交
-
-
由 liuwei1031 提交于
-
- 03 9月, 2019 1 次提交
-
-
由 gongweibao 提交于
Change backward_guard to optimize_guard to maximize the allreduce overlap
-
- 23 7月, 2019 1 次提交
-
-
由 wangchaochaohu 提交于
* rewrite the conv_op using cudnn_conv_helper * add workspace limit for v7 test=develop * fix test=develop * add half float test=develop * fix test=develop * fix test=develop * revise code style test=develop * fix test=develop
-
- 10 5月, 2019 1 次提交
-
-
由 qingqing01 提交于
* Add conv2d_grad_grad_op * Extracte the cuDNN conv algo searching code in conv_cudnn_helper.h. - Now use it in conv2d_grad_grad. - Will simply the searching code in conv2d and conv2d_grad in next PR. * Enhance and fix bug in unit testing of gradient_checker. * Support to fetch empty variables,return None in Python.
-