- 13 12月, 2022 1 次提交
-
-
由 sneaxiy 提交于
* save fused_attention memory when dropout_rate = 0.0 * add ut * fix ut bug * fix fused_layernorm_residual_dropout_bias_test.cu
-
- 07 12月, 2022 1 次提交
-
-
由 张春乔 提交于
-
- 05 12月, 2022 1 次提交
-
-
由 limingshu 提交于
* first commit * fix bugs according to ci * add some changes * change file name into function.cu.h * remove const_cast
-
- 30 11月, 2022 1 次提交
-
-
由 Netpunk 提交于
* migrate transpose_op.cu.h and gpu_utils.h * format code style * fix some problems * format code * reset tranpose_op.cc * test commit * recover transpose_op.h * delete transpose_op.h * adjust header files order in transpose_op.cc
-
- 18 11月, 2022 1 次提交
-
-
由 MarDino 提交于
* fused qkvBiasAdd and transpose with split qkv * fix typo * fix format * fix name * add annotation * fix comment
-
- 28 9月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* remove needless using tensor * remove needless using tensor * resolve conflict * replace tensor using * fix format error * revert needless changing * fix rocm and npu compile error * fix cinn compile error * fix format error * fix mkldnn format error * fix mkldnn format error * fix cinn compile error * fix cinn compile error * fix cinn compile error * resolve conflict
-
- 01 8月, 2022 1 次提交
-
-
由 Leo Chen 提交于
* remove cudaDeviceContext * remove more template * fix rocm compile * remove alias name CUDADeviceContext * fix compile * fix tests * revert changes
-
- 01 7月, 2022 1 次提交
-
-
由 limingshu 提交于
* 2nd part of transpose update * add switch_auto_tune option. * add some changes according to Ci * refine the structure of auto_tune_base. * merge develop changes * reset the switch_set_range and change unittest of transpose auto-tune * change the kernel auto-tune logits
-
- 26 6月, 2022 1 次提交
-
-
由 Sing_chan 提交于
-
- 09 6月, 2022 1 次提交
-
-
由 crystal 提交于
Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
-
- 05 6月, 2022 1 次提交
-
-
由 Sing_chan 提交于
-
- 30 5月, 2022 1 次提交
-
-
由 crystal 提交于
-
- 24 5月, 2022 1 次提交
-
-
由 YuanRisheng 提交于
* move grad_add * fix unittest bugs * fix compile bugs
-
- 16 5月, 2022 1 次提交
-
-
由 WangXi 提交于
-
- 19 4月, 2022 1 次提交
-
-
由 WangXi 提交于
-
- 11 3月, 2022 1 次提交
-
-
由 Yuang Liu 提交于
-
- 10 3月, 2022 1 次提交
-
-
由 hong 提交于
* move dropout to phi; test=develop * fix xpu, npu compile error; test=develop
-
- 25 2月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* support cudnn kernel moving * polish cmake rules * add unittest for coverage * remove orig kernel * remove softmax cudnn kernel * fix softmax test failed * fix npu func error * resolve conflict * rename gpu dnn kernels * fix name rule error * fix compile error * update fp16 namespace
-
- 20 2月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* rename pten dir to phi * rename namespace to phi * rename infrt pten dir to phi * resolve conflict * rename pten to phi in cmake * revert all infrt change * change needed files * fix infrt failed * fix inference failed
-
- 18 2月, 2022 1 次提交
-
-
由 Feiyu Chan 提交于
* move blas related files * move lapack related files
-
- 18 1月, 2022 1 次提交
-
-
由 Zhanlue Yang 提交于
* Merged LoDTensor with Tensor,test=allcases * Patched python level LoDTensor * Patched python level LoDTensor * Merge Tensor into DenseTensor * Fixed namespace issues,test=allcases * Fixed merge issues * Fixed inference issues * Fixed NPU test issues * Fixed merge issues
-
- 08 11月, 2021 1 次提交
-
-
由 Li Min 提交于
目前的fused_attention_op不支持attn_mask=None的输入,本PR对此进行了补充,并补充了相应的单测逻辑。
-
- 23 9月, 2021 1 次提交
-
-
由 Li Min 提交于
-