- 31 Aug, 2023 1 commit
-
-
Committed by Tian Zheng
* Add fused_scale_bias_relu_conv_bnstats op
* Review changes
* Fix no CUDNN Frontend build
* Fix PADDLE_ENFORCE format
* Fix PADDLE_ENFORCE CI error
* Rename kernel filename
* Refactor unittest to use paddle eager_op_test
* Fix padding bugs
* Review changes
* test=cuda117
* test=cuda117
-
- 19 Apr, 2023 1 commit
-
-
Committed by limingshu
* first commit
* restruct c++ interface to divide linear from matmulwithcublaslt
* finish building in cublaslt impl
* fix code bugs
* fix host cost
* add some changes
-
- 21 Feb, 2023 1 commit
-
-
Committed by limingshu
-
- 18 Nov, 2022 1 commit
-
-
Committed by Tian Zheng
* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation
* Fix macro
* Add implementation for conv_kernel and conv_grad_kernel
* Modification after rebase onto latest develop
* Modify plan cache to comply with the API of phi::autotune
* Refactor to reduce duplicate code
* Review fix:
  - move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernelk.cu
  - add const specifier for input tensor
  - add logging when plans fail to execute
  - move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h
* move plan building outside of cache
* Fix ROCM build
-
- 11 Nov, 2022 1 commit
-
-
Committed by Yiqun Liu
-
- 25 Aug, 2022 1 commit
-
-
Committed by hong
* optimizer conv algo speed
* code polish
* remove useless code
* fix compile error
* fix cpu compile error
* not use cudnn algo t
* add search cache max number
* polish code
* fix cache test bug
* add groups data format to conv args
* fix cache test bug
* fix cudnn_deterministic bug
* fix test switch auto tune bug
* fix test switch autotune bug;
* fix conv cache bug
* fix cache test error
* fix cache test bug
* fix windows mac compile error
* fix workspace search error
* update cudnn cache
* fix cache test bug; test=develop
* fix autotune switch test error
* polish code
* polish code
-
- 01 Jul, 2022 1 commit
-
-
Committed by limingshu
* 2nd part of transpose update
* add switch_auto_tune option.
* add some changes according to Ci
* refine the structure of auto_tune_base.
* merge develop changes
* reset the switch_set_range and change unittest of transpose auto-tune
* change the kernel auto-tune logits
-
- 05 Jun, 2022 1 commit
-
-
Committed by Sing_chan
-
- 15 Apr, 2022 1 commit
-
-
Committed by limingshu
* change cudnn helper for auto-tune
* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.
* Fix the bug in calculating and printing current step cache hit rate.
* Improve the autotune cache and fix unittest.
* Change the key from AlgorithmType to int64_t.
* Fix unittest for cpu-only env.
* change ChooseAlgoByWorkspace for heuristic mode
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
-
- 05 Apr, 2022 1 commit
-
-
Committed by Zhang Ting
* switch autotune
* implement AutoTuneCache
* implement AutoTuneCache class
* add pybind api
* add dygraph test
* support static mode and eager mode and improve unittests
* rename the SwitchAutoTune Class and improve tests
* improve AutoTuneStatus and reduce the cost of tests
-
- 03 Mar, 2022 1 commit
-
-
Committed by xiongkun
* add pad forward
* fix error
* transfer pad and pass the test_pad_op
-
- 25 Feb, 2022 1 commit
-
-
Committed by 0x45f
* move eye OP to pten
* move size OP to pten
* merge develop
* fix merge
* move files
* move erfinv OP to phi
* remove comment
* move pixel_shuffle OP to phi
* remove comment
* fix PT_REGISTER
* fix NPU
* fix CR
* remove size_sig.cc for PR-CI-Coverage
-
- 20 Feb, 2022 1 commit
-
-
Committed by Chen Weihang
* rename pten dir to phi
* rename namespace to phi
* rename infrt pten dir to phi
* resolve conflict
* rename pten to phi in cmake
* revert all infrt change
* change needed files
* fix infrt failed
* fix inference failed
-
- 15 Feb, 2022 1 commit
-
-
Committed by From00
* Move Abs op to pten
* Fix NPU compilation error
* Fix CI error
* Use LaunchSameDimsElementwiseCudaKernel in pten
-
- 28 Jan, 2022 1 commit
-
-
Committed by hong
* move digamma to pten; test=develop
* fix mutable_data bugs; test=develop
* remove useless code; test=develop
* remove kernel compute; test=develop
* fix bug; test=develop
-
- 17 Jan, 2022 1 commit
-
-
Committed by Allen Guo
* update ipu_backend
* sync with paddle internal
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>
* apply comments 01
* update error message
* restore ipu_executor and ipu_optimizer
* add clang-format on
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>
-
- 12 Jan, 2022 1 commit
-
-
Committed by ziyoujiyi
* delete gloo connect retry
* the_one_ps dirs reconstruct
* .
* .
* create the_one_ps dirs
* create the_one_ps dirs
* create the_one_ps dirs
* create the_one_ps dirs
* create the_one_ps dirs
* create the_one_ps dirs
* the one ps dirs modify
* the one ps dirs modify
* the one ps dirs modify
* the one ps dirs modify
-
- 03 Nov, 2021 1 commit
-
-
Committed by LiYuRio
-
- 18 Sep, 2021 1 commit
-
-
Committed by Huihuang Zheng
Add a basic Cost Model that uses the executor to run a program and profile it to get op time. This is an early basic version; we will add more functions in the future.
-
- 15 Sep, 2020 1 commit
-
-
Committed by Wilber
-
- 03 Jun, 2020 1 commit
-
-
Committed by Yanghello
* add crypto helper for paddle, test=develop
* cryptopp.cmake bug fixed, test=develop
* remove debug build type, test=develop
* fixed CMakeLists for new target, test=develop
* fix CI bug, test=develop
* add cmake option flag DWITH_CRYPTO, test=develop
* add crypto api for python, test=develop
* Revert "add crypto api for python, test=develop"
  This reverts commit 3a1cfa9d.
* Revert "Add crypto api (#24694)"
  This reverts commit 5a7a517c.
* Revert "Revert "Add crypto api (#24694)""
  This reverts commit f952b19f.
* fixed cryptopp cmake building error, test=develop
* change WITH_CRYPTO building option to OFF, test=develop
* fixed cipher test failed, test=develop
* "add crypto api for python, test=develop"
  This reverts commit 83fb55c0.
* travis CI bug fixed, test=develop
* fixed test in python3
* test=develop
* fixed unittest, test=develop
-
- 21 Jan, 2019 1 commit
-
-
Committed by flame
add python inference api
-
- 10 Jan, 2019 1 commit
-
-
Committed by flame
-
- 13 Dec, 2018 1 commit
-
-
Committed by sneaxiy
fix cmake again test=develop
-
- 10 Dec, 2018 1 commit
-
-
Committed by sneaxiy
-
- 10 Sep, 2018 1 commit
-
-
Committed by Yan Chunwei
-
- 18 Jun, 2018 1 commit
-
-
Committed by Yan Chunwei
-
- 24 May, 2018 1 commit
-
-
Committed by Yan Chunwei
-
- 23 May, 2018 1 commit
-
-
Committed by Yan Chunwei
Add the demo of subgraph splitter
-
- 22 Mar, 2018 1 commit
-
-
Committed by Yu Yang
-
- 07 Mar, 2018 2 commits
- 06 Mar, 2018 2 commits
- 15 Feb, 2018 1 commit
-
-
Committed by Yi Wang
* Update tensor_util.h
* Update with moved TensorDesc
* Fix tensur_utils.cu
* Update
* Update
* Update
* Update
* Make tensor_util.cu a symbolic link
-
- 10 Feb, 2018 2 commits
- 07 Feb, 2018 1 commit
-
-
Committed by fengjiayi
-
- 06 Feb, 2018 2 commits