- 17 Nov, 2021 12 commits
-
-
Committed by YUNSHEN XIE
* remove test_hapi_hub from mac
* fix format error
-
Committed by Chen Weihang
* add slice api impl of Tensor
* fix test slice error
-
Committed by zhaocaibei123
-
Committed by zmx
* fix. test=develop
* fix. test=develop
* fix. test=develop
* fix. test=develop
* fix. test=develop
* fix ut. test=develop
* fix ut. test=develop
* fix ut. test=develop
* refactor heter trainer. test=develop
* fix. test=develop
* fix ut. test=develop
* fix ut. test=develop
* fix ut. test=develop
* fix ut. test=develop
* fix ut. test=develop
* fix ut. test=develop
* fix ut. test=develop
* fix ut. test=develop
* fix. test=develop
* fix. test=develop
* fix. test=develop
* fix. test=develop
* fix ut. test=develop
* fix ut. test=develop
* fix ut. test=develop
-
Committed by danleifeng
-
Committed by zhangchunle
-
Committed by Leo Chen
* copy beta pow to same place when skip_update=1
* fix xpu
-
Committed by zyfncg
-
Committed by LiYuRio
-
Committed by WangXi
-
Committed by Tongxin Bai
* [Einsum] correct output dimension errors due to single element tensors.
* [Einsum] format polish.
-
Committed by xiongkun
* add
* add BuildOperatorDependences
* fix bug
* add unittest for write after write
* fix merge bug
* fix
-
- 16 Nov, 2021 16 commits
-
-
Committed by Chen Weihang
-
Committed by arlesniak
* Added BF16 Pool2d grad
* upstream pulled
* fix for CI
* fixes after review
-
Committed by danleifeng
-
Committed by Weilong Wu
-
Committed by Zeng Jinle
-
Committed by Weilong Wu
-
Committed by YuanRisheng
* reshape kernel refactor
* fix compile bugs when running ci
* support xpu for reshape
* fix bugs when running unittest in kunlun ci
* fix compile bugs when running kunlun
* polish code according to suggestion
* add api and unit test for reshape
-
Committed by zhangkaihuo
Add pure fp16 support for fused transformer.
-
Committed by tianshuo78520a
-
Committed by Zeng Jinle
* make pass ut timeout smaller
* increase ut timeout
-
Committed by Yiqun Liu
* Make FLAGS_determinstic effective in conv2d forward.
* Add call of SetCinnCudnnDeterministic in cinn_launch op.
-
Committed by Sing_chan
-
Committed by jakpiase
-
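Committed by Li Min
The implementation of fused_attention_op uses bias_add, which is built on the kernel primitives. The kernel primitive WriteData API and its internal implementation were later changed so that the out-of-bounds check moved into a template parameter. As a result, the wrong branch was called, producing out-of-bounds writes that corrupted the contents of other GPU memory. Symptom: a single run of test_fused_attention_op_api.py almost never fails, but running it repeatedly in a loop with inputs of different shapes yields incorrect results. The failure is intermittent, so the bug is hard to notice.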
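The failure mode described above can be illustrated with a minimal host-side sketch. This is not Paddle's kernel primitive code: `write_data`, `copy_tiles`, and the `IsBoundary` template flag are hypothetical names chosen for illustration, and the sketch runs on the CPU rather than in a CUDA kernel. It only shows how a bounds check that lives in a template parameter disappears entirely when the caller instantiates the unchecked variant for a partial tile.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel-primitive style write helper.
// IsBoundary is a compile-time flag: when it is false, the bounds
// check below is compiled out and all VecSize elements are written.
template <typename T, int VecSize, bool IsBoundary>
void write_data(T* dst, const T* src, std::size_t remaining) {
  for (int i = 0; i < VecSize; ++i) {
    if (IsBoundary && static_cast<std::size_t>(i) >= remaining) {
      return;  // stop at the end of the valid range
    }
    dst[i] = src[i];
  }
}

// Copies n elements in tiles of VecSize. Full tiles use the unchecked
// variant; only the final partial tile uses the checked variant.
// Instantiating write_data<T, VecSize, false> for the tail instead
// would write past the n valid elements.
template <typename T, int VecSize>
void copy_tiles(T* dst, const T* src, std::size_t n) {
  assert(dst != nullptr && src != nullptr);
  std::size_t offset = 0;
  for (; offset + VecSize <= n; offset += VecSize) {
    write_data<T, VecSize, false>(dst + offset, src + offset, VecSize);
  }
  if (offset < n) {
    write_data<T, VecSize, true>(dst + offset, src + offset, n - offset);
  }
}
```

Calling `copy_tiles<int, 4>` on 6 elements writes one full tile and a checked tail of 2, leaving anything after the sixth destination element untouched. Whether a shape triggers the problem depends on whether its size is a multiple of the tile width, which is consistent with the report that failures only appeared across varied input shapes.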
-
Committed by 石晓伟
-
Committed by Yuang Liu
-
- 15 Nov, 2021 12 commits
-
-
Committed by Chen Weihang
* move extension into pten [no-verify]
* append tensor methods by ext_tensor [no-verify]
* append other tensor methods [no-verify]
* ext related files tidy [no-verify]
* include relation tidy [no-verify]
* add pten tensor test [no-verify]
* replace tensor in custom op & compile success
* refine tensor constructor for unittest
* custom relu jit run success
* fix all custom op unittests
* add inference cmake adapt [no-verify]
* fix failed unittests
* fix windows failed unittests
* try to fix kunlun and inference failed
* fix test_elementwise_api error
* try to fix win compile failed
* fix kunlun fp16 type error
* remove useless handle error macro
* add custom linear op test
* fix compile failed & add win symbols
* fix non pten kernel cast failed
* add dll decl for api
* polish several details
* polish details by review comment
* add dll_decl for register
-
Committed by Leo Chen
* fix record_event
* refine class Instruction
* refine Instruction and InterpreterCore
* make instruction and operator_base consistent
* support NoNeedBufferVar in stream_analyzer
* fix place of event
* add vlog before continue
-
Committed by Chen Weihang
-
Committed by baoachun
* remove input dim check of activation in op_teller
* remove input dim check of concat in op_teller
* remove input dim check of clip in op_teller
* remove input dim check of scale in op_teller
* remove input dim check in op_teller
* update attr check of slice in op_teller
-
Committed by Yuang Liu
-
Committed by wanghuancoder
* fix 3 bugs, test=develop
* refine, test=develop
-
Committed by feng_shuai
-
Committed by arlesniak
* Added BF16 to mean op
* fix for CI
* fix for CI
* fix for CI
-
Committed by jiangcheng
-
Committed by Weilong Wu
* Add elementwise_mul triple grad kernel
* Removed InplaceInferer and polished code
-
Committed by zhaocaibei123
-
Committed by Zeng Jinle
* add split_program
* make ut faster
* increase ut timeout
* make result deterministic
* add fuse_all_reduce pass
* add ut framework, update
* fix ut framework
* remove useless code
* add coverage support
* update
* fix CI
* fix some bugs and fix ci coverage
* fix conflict
-