- 01 2月, 2023 1 次提交
-
-
由 limingshu 提交于
* profile reduce kernel for fp16 and reduceHigherdim * use reinterpret_cast * fix for CI on ROCm * add Macro for ROCm * ROCm CI config * ROCm CI config * unit test repair * pull * add common_funcs.h * reduceType * Update reduce_function.h * not higher * rename * implement of matmul using cublasLt instead of cublas * cublasLt bugfix * Update matmul_kernel_impl.h * Update matmul_kernel_impl_via_blasLt.h * for-loop-algo * PR comments changes * add macro * ci unused variable isCublasLt * ci unused variable isCublasLt macro * split matmul to autotune * rewrite the split kernel with segmented_array * rewrite the split kernel with segmented_array * rewrite the split kernel with segmented_array * add some method for cuda_graph * fix bugs for rocm * change for ci-error * i dont know why ci-model-benchmark gives a shit error, so i recover codes with original one to see if original codes work. * add some changes for passing mode_benchmark and coverage ci * fix ci error * fix ci-rocm error * add some changes for header --------- Co-authored-by: Nzhangbopd <1299246947@qq.com> Co-authored-by: NBo Zhang <105368690+zhangbopd@users.noreply.github.com>
-
- 31 1月, 2023 32 次提交
-
-
由 RedContritio 提交于
-
由 wenbin 提交于
* gn_silu * add ut * set TIMEOUT * correct comments * comments * disable windows ut * rename parameter
-
由 wangshengxiang 提交于
-
由 wenbin 提交于
* disable integer * disable integer * add cast layer
-
由 Zhang Jun 提交于
-
由 niuliling123 提交于
-
由 Charles-hit 提交于
* polish static grad op maker gen * fix some bugs * fix static code gen * solve conflict * modify composite grad maker name * integrate phi and fluid info in static code gen * rename some composite maker * modify static code gen format
-
由 Zhang Jun 提交于
-
由 PuQing 提交于
* add FP16 dtype for CastNumpy2Scalar * fix throw message * add test * fix SyntaxWarning * test skip for float16 * fix dtype mistakes
-
由 ronnywang 提交于
* [CustomDevice] add custom device api * update * update * test=document_fix * update * update * add examples
-
由 RedContritio 提交于
* fix incorrect output shape of broadcast * add unittest
-
由 Roc 提交于
-
由 zhangkaihuo 提交于
-
由 zhangbo9674 提交于
-
由 MarDino 提交于
-
由 张春乔 提交于
* fix mod 0 error * fix div 0 error in floormod
-
由 Yuanle Liu 提交于
-
由 201716010711 提交于
-
由 xiaoting 提交于
* support 0d tensor for interpolate * support 0d tensor for interpolate * add xpu unittest for interp * update unittest for interpolate * fix coverage * fix code style * fix for coverage * fix coverage
-
由 TeFeng Chen 提交于
* support inplaced variable in cinn_launch * fix error hint when compiling * fix inplaced output variable of the subgraph * skip CinnCompiler check * using existed definition * fix namespace reference error * modify error message * update cinn tage * fix namespace * skip enforce check * fix unittest attribute throw
-
由 pangyoki 提交于
-
由 HongyuJia 提交于
* decouple phi custom_op * decouple phi custom_op, remove codes * delete custom symbol of inference
-
由 张春乔 提交于
-
由 RedContritio 提交于
* add axis check in UniqueRawInferMeta * add unittest for negative axis * simplify check for unique
-
由 RedContritio 提交于
* add elements count check in atan2 * add unittest and pre-check in inferMeta * add dimension check
-
由 张春乔 提交于
* add zero size check in matrix_power_kernel_impl.h * add zero size check in matrix_power_kernel_impl.h * add zero size check in unittest * bug_fix * bug_fix * bug_fix * bug_fix * bug_fix * bug fix * bug_fix * bug_fix * add static check * delete the dy codes
-
由 RedContritio 提交于
* support negative index in repeat_interleave * add unittest
-
由 张春乔 提交于
-
由 RedContritio 提交于
-
由 Yiqun Liu 提交于
* Unify the gpu implementation of stack and unstack to reuse the optimization. * Optimize the cuda implementation of unstack. * Use GpuMemcpyAsync instead of memory::Copy. * Fix error of calculating the index. * Use FastDivMod to further imporve the performance of unstack.
-
由 LiYuRio 提交于
-
由 姜永久 提交于
* rm flags_retain grad in pybind * retain grads for xpu test * set retain grad for xpu * rm flag * lint --------- Co-authored-by: Nwanghuancoder <wanghuan29@baidu.com>
-
- 30 1月, 2023 7 次提交
-
-
由 jiangcheng 提交于
-
由 RedContritio 提交于
* add pivots type check and fix batchsize error * add unittest for batchsize = 0 * fix nullptr in lu_unpack fix batchsize error in LU_Unpack add nullptr check in OneFunctor * remove exception in device code
-
由 Ryan 提交于
* add pinv check * add unitest * update unitest * roll back * fix not call stupid bug * use context
-
由 engineer1109 提交于
replace all TensorFromVector & TensorToVector AssignKernel async copy
-
由 Ruibiao Chen 提交于
* Support stream priority for standalone executor * Fix compile error * Fix compile error * Fix compile error * Fix compile error * Fix compile error
-
由 zmxdream 提交于
* add set slot_num for psgpuwraper (#177) * add set slot_num_for_pull_feature for psgpuwarper * Add get_epoch_finish python interface (#182) * add get_epoch_finish interface * add return * delete return * add unzip op (#183) * fix miss key for error dataset (#186) * fix miss key for error dataset * fix miss key for error dataset Co-authored-by: Nyangjunchao <yangjunchao@baidu.com> * add excluded_train_pair and infer_node_type (#187) * support return of degree (#188) * fix task stuck in barrier (#189) Co-authored-by: Nyangjunchao <yangjunchao@baidu.com> * check node/feature format when loading (#190) * check node&feature format when loading * check node&feature format when loading (2£ (2) * degrade log (#191) * [PGLBOX]fix conflict * [PGLBOX]fix conflict * [PGLBOX]replace LodTensor with phi::DenseTensor * [PGLBOX]fix gpu_primitives.h include path * [PGLBOX]from platform::PADDLE_CUDA_NUM_THREADS to phi::PADDLE_CUDA_NUM_THREADS * [PGLBOX]fix unzip example code * [PGLBOX]fix unzip example code * [PGLBOX]fix unzip example code * [PGLBOX]fix unzip example code * [PGLBOX]fix unzip ut * [PGLBOX]fix unzip ut * [PGLBOX]fix code style * [PGLBOX]fix code style * [PGLBOX]fix code style * fix code style * fix code style * fix unzip ut * fix unzip ut * fix unzip ut * fix unzip * fix code stype * add ut * add c++ ut & fix train_mode_ set * fix load into memory * fix c++ ut * fix c++ ut * fix c++ ut * fix c++ ut * fix code style * fix collective * fix unzip_op.cc * fix barrier * fix code style * fix barrier * fix barrier * fix code styple * fix unzip * add unzip.py * add unzip.py * fix unzip.py --------- Co-authored-by: Nchao9527 <33347532+chao9527@users.noreply.github.com> Co-authored-by: NSiming Dai <908660116@qq.com> Co-authored-by: Nhuwei02 <53012141+huwei02@users.noreply.github.com> Co-authored-by: Nyangjunchao <yangjunchao@baidu.com>
-
由 gem5 提交于
-