- 30 11月, 2022 1 次提交
-
-
由 Netpunk 提交于
* migrate transpose_op.cu.h and gpu_utils.h * format code style * fix some problems * format code * reset tranpose_op.cc * test commit * recover transpose_op.h * delete transpose_op.h * adjust header files order in transpose_op.cc
-
- 18 11月, 2022 1 次提交
-
-
由 Wang Xin 提交于
* remove "gpu_primitives.h" in fluid namespace * fix PR-CI-GpuPS fail * fix PR-CI-GpuPS fail
-
- 01 11月, 2022 1 次提交
-
-
由 limingshu 提交于
* first commit * transpose_kernel_optimization * first complishment of transpose op * second commit * refine code logics of tranpose_kernel * refine transpose kernel * first commit * fix DtoD copy bugs for hip * refine code according to the PR advice * change dim to int64_t type. * fix some type error
-
- 28 9月, 2022 2 次提交
-
-
由 Chen Weihang 提交于
* remove needless using tensor * remove needless using tensor * resolve conflict * replace tensor using * fix format error * revert needless changing * fix rocm and npu compile error * fix cinn compile error * fix format error * fix mkldnn format error * fix mkldnn format error * fix cinn compile error * fix cinn compile error * fix cinn compile error * resolve conflict
-
由 limingshu 提交于
-
- 07 9月, 2022 2 次提交
- 01 7月, 2022 1 次提交
-
-
由 limingshu 提交于
* 2nd part of transpose update * add switch_auto_tune option. * add some changes according to Ci * refine the structure of auto_tune_base. * merge develop changes * reset the switch_set_range and change unittest of transpose auto-tune * change the kernel auto-tune logits
-
- 24 6月, 2022 1 次提交
-
-
由 YuanRisheng 提交于
* perfect copy * deal with conflict * deal with conflict * fix compile bugs * fix unittest bugs * change code format * deal with conflict * modify code by review * fix ce bugs * fix ce bugs * add lo * perfect code format * deal with conflicts
-
- 07 6月, 2022 1 次提交
-
-
由 limingshu 提交于
Transpose optimization with assitant of Chengdu Supercomputing Center and auto_tune operation (#42704)
-
- 05 6月, 2022 1 次提交
-
-
由 Sing_chan 提交于
-
- 02 3月, 2022 1 次提交
-
-
由 hong 提交于
* immigrate_transpose_to_pten cpu kernel only; test=develop * fix bug; test=develop * add transpose cuda api * bug fix; * fix bugs * fix bugs; test=develop * bug fix; * move transepose to pten; test=develop * fix bug; test=develop * fix bugs; test=develop * add transpose grad fp16 support; test=develop * fix bug; test=develop * fix npu bug; test=develop * fix nemul = 0 bug; test=develop * add fp16 support; test=develop * fix data type register bug; test=develop * fix transpose bug; test=develop * update transpose * fix transpose bug; test=develop * remove useless code; test=develop * remove useless code; test=develop * fix transpose alias bug; test=develop * polish code; test=develop * resolve confict; test=develop * resolve confilct; test=develop * recover prepared operator; test=develop * fix bug; test=develop * polish code; test=develop * fix bug; test=develop * fix bug; test=develop
-
- 20 2月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* rename pten dir to phi * rename namespace to phi * rename infrt pten dir to phi * resolve conflict * rename pten to phi in cmake * revert all infrt change * change needed files * fix infrt failed * fix inference failed
-
- 19 2月, 2022 1 次提交
-
-
由 Aurelius84 提交于
* Unify paddle/pten::framework::ddim into pten::ddim * fix paddle namespace * compile sucessfully * fix npu src file * fix conflict * fix conflict * fix tensorrt compiler error * fix conflict * fix conflict * fix tesst file conflict * fix conflict * fix mlu file conflict * fix mlu file conflict * fix cinn header file conflict * fix conflict * fix conflict * fix conflict * fix conflict
-
- 03 12月, 2021 1 次提交
-
-
由 ronnywang 提交于
* refine structure for cuda and rocm * update * update * update * update
-
- 02 9月, 2021 1 次提交
-
-
由 Li Min 提交于
-
- 27 5月, 2021 1 次提交
-
-
由 chentianyu03 提交于
* modify kron OP to complex template types * modify reshape, slice, trace, transpose OPs to complex template types * modify to complex template types in eigen slice files * change to complex template types for pad.cc and pac.cu * format code style
-
- 04 12月, 2020 1 次提交
-
-
由 chentianyu03 提交于
* add complex64 and complex128 type; add +-*/@ and slice opreator for complex types * add test cases for complex elementwise, matmul and getitem unittest * add test cases for complex types * add test cases for complex matmul unittest * kron, reshape, transpose support complex types * sum and trace op support complex types * add test case of sum and trace op * fix the bug of imag part of complex not initialized * format file * format code style * kron support type promotion; modify test cases
-
- 20 10月, 2020 1 次提交
-
-
由 wangchaochaohu 提交于
-
- 16 7月, 2020 1 次提交
-
-
由 Huihuang Zheng 提交于
Add Support for SelectedRows for Transpose OP and Fix a Bug That SelectedRows Cannot be Supported in SimNet (#25536) This PR fixes a bug that SelectedRows cannot be supported in SimNet. The reason of this bug is that dygraph basic_engine didn't copy var's type when the var needs to be accumulated during backward. So when a var is SelectedRows and needs to be accumulated, like SimNet which calls net for two times, the var's type will be changed to default LoDTensor thus bug happens. To fix it, we just also copy the type. Without this PR, the accumulated SelectedRows parameters in dygraph will be changed into LoDTensor. So when we fixed the bug of supporting SelectedRows in SimNet, we found `test_imperative_lod_tensor_to_selected_rows` failed and threw the error that SelectedRows was not supported for Transpose OP. To fix it, too, this PR also added support for SelectedRows for Transpose OP.
-
- 11 7月, 2020 1 次提交
-
-
由 Chen Weihang 提交于
* fix softmax_with_cross_entropy cuda kernel overflow bug, test=develop * replace old macro & for condition, test=develop * polish details, test=develop
-
- 02 4月, 2020 1 次提交
-
-
由 liym27 提交于
* Add unittest for transformer prediction in dygraph_to_static. * fix bug in fill_constant api. * Make transpose support size 0. test=develop
-
- 11 2月, 2020 1 次提交
-
-
由 zhaoyuchen2018 提交于
* Refine code, fix select tile error,test=develop * Refine element type and some comments, test=develop * Refine comments and gpu utils, test=develop * Remove some useless condition * Refine floor and ceil, test=develop * refine for loop. test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-