- 10 Feb, 2022 1 commit
-
-
Committed by zhangbo9674
* add squeeze unsqueeze stack
* add unittest
* add cpu kernel
-
- 17 Jan, 2022 1 commit
-
-
Committed by Wilber
* add pten::Place data structure.
* update ci problem
* fix ci problem
* update
* using platform::Place=pten::Place
* remove BOOST_GET_CONST for CPUPlace and GPUPlace
* compile pass 25%.
* compile pass 45%
* compile pass 60%
* remove boost_get for xpu npu mlu and ipu
* compile pass on cpu and gpu.
* fix compile problem
* fix compile error.
* update
* fix ci problem
* update
* ci approve
* fix ci problem
* fix ci eager test problem
* remove BOOST_GET_CONST
* fix npu compile
-
- 03 Dec, 2021 1 commit
-
-
Committed by ronnywang
* refine structure for cuda and rocm
* update
* update
* update
* update
-
- 13 May, 2021 1 commit
-
-
Committed by Jiawei Wang
-
- 20 Oct, 2020 1 commit
-
-
Committed by wangchaochaohu
-
- 11 May, 2020 1 commit
-
-
Committed by Chen Weihang
* add new macro BOOST_GET_SAFELY & unittests, test=develop
* add different macro type, test=develop
* fix get macro type in executor, test=develop
* four macro part change backup
* using one macro for all case, test=develop
* revert attribute change, test=develop
* change to three func to solve gcc4.8 bug, test=develop
* polish some details, test=develop
-
- 08 Jan, 2020 1 commit
-
-
Committed by zhaoyuchen2018
stack's wait costs a lot of CPU time; using a CUDA kernel to do the memory copy reduces CPU time. Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 30 Jan, 2019 1 commit
-
-
Committed by Yibing Liu
* Some improvements to support bert mixed precision training test=develop
* Revert the cast in layer_norm test=develop
-
- 12 Nov, 2018 1 commit
-
-
Committed by Yibing Liu
* Add int type support for stack_op
* Improve gather op to support index with shape N x 1 test=develop
* Fix stack_op kernel's registry test=develop
-