- 08 Dec, 2021 1 commit
-
-
Committed by crystal
* add boardcast_sub * add boardcast_sub
-
- 03 Dec, 2021 1 commit
-
-
Committed by ronnywang
* refine structure for cuda and rocm * update * update * update * update
-
- 29 Nov, 2021 2 commits
-
-
Committed by chentianyu03
* add pten reduce kernel * add reduce_sum kernel * update attribute args and order * make out dtype undefined * fix empty input error * merge develop branch * rename sum as reduce function * rename sum as reduce function * fix reducekernelImpl args error * add reduce cuda kernel * modify dims type to const & * remove unsed log * fix reduce_all out eigen function error * remove unused codes * add the missing sum api define and testcase * merge develop branch * fix sum test axis value error * replace pten mean kernel with reduce_mean * revcover meam cuda to original implement
-
Committed by piotrekobiIntel
-
- 27 Nov, 2021 1 commit
-
-
Committed by Aganlengzi
* [NPU] reorganization for device API abstraction * [NPU] delete old files * [NPU] fix npu_collective_helper * [NPU] fix collective_helper * [NPU] fix ut * [NPU] mod memory allocation and hccl_helper * [NPU] fix place_type * [NPU] split enfoce.h * move acl* call into npu_info * merge conflict * fix merge * merge conflict * merge conflict
-
- 23 Nov, 2021 1 commit
-
-
Committed by Qi Li
* [XPU] Reorganize xpu device codes in platform, test=develop * fix xpu_header.h, test=develop
-
- 17 Nov, 2021 1 commit
-
-
Committed by niuliling123
* Modify reduce_op.op.h for xpu2 with kernel primitive api
-
- 28 Oct, 2021 1 commit
-
-
Committed by ronnywang
* add TypeAdapter method for npu_op_runner * add int64 supporting for elementwise_mul and reduce_sum * add int64 supporting and UT for expand_v2, scale and reduce_max * fix bug
-
- 26 Oct, 2021 1 commit
-
-
Committed by Qi Li
* [NPU] fix argsort op, test=develop * remove debug files, test=develop * fix typo, test=develop * address review comments, test=develop
-
- 21 Oct, 2021 1 commit
-
-
Committed by niuliling123
* Update the implement of reduceAnyKernel according to kernel primitive api * Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1
-
- 18 Oct, 2021 1 commit
-
-
Committed by taixiurong
[XPU AMP] 1. xpu support gradient acc 2. xpu support create tensor in dygraph 3. xpu support update weight params in amp (#36439)
-
- 28 Sep, 2021 1 commit
-
-
Committed by Guoxia Wang
-
- 18 Sep, 2021 1 commit
-
-
Committed by Jacek Czaja
* - REorder disabling caching * - compilation fix * - another compilation fix * - another compilation fix * - compilation fix * - Fix * - yet another compilation fix * - suppresingly another compilation fix * - lint * - fix after review * - fix
-
- 08 Sep, 2021 2 commits
-
-
Committed by niuliling123
-
Committed by Zhong Hui
-
- 26 Aug, 2021 1 commit
-
-
Committed by Jacek Czaja
[oneDNN] disable caching oneDNN primitives in matmul v2, Reduce grad and elementwise_add grad, expand_v2 (#35132) * - grad caching disabled of matmul_v1 - compilation fix - compilation fix * - reduction removed * - Matmul v2 disabled caching * Draft of further changes * - workaround for reducegrad * - fixes to UT * - fix to compilation * - another fix * - fix
-
- 17 Aug, 2021 1 commit
-
-
Committed by niuliling123
fix a bug in nlp: text_matching/sentence_transformers when last dim is 1 and reduce mid dim (#34941)
-
- 11 Aug, 2021 2 commits
-
-
Committed by ronnywang
* add reduce_mean_op_npu and test * remove skip.If * update
-
Committed by niuliling123
-
- 06 Aug, 2021 1 commit
-
-
Committed by furnace
* [NPU] add reduce_prod * [NPU] delete check_dygraph=False * [NPU] delete skipIf * add attrs support or check * [NPU] delete extra codes for test_reduce_max_op_npu * [NPU] add attr out_dtype
-
- 05 Aug, 2021 2 commits
-
-
Committed by hong
* first test version * add test exec; * add data transfer; test=develop * add new exec head; * add memcpy; test=develop * add python fetch * add new test * add graph node; test=develop * remove useless new executor test; test=develop * remove gperf dependency; test=develop * fix compile bugs; test=develop * remove useless code; test=develop * remove useless code; test=develop * add uni test; test=develop * polish code; test=develop * polish code; test=develop * add interpreter cmakefile; test=develop * remove useless code; test=develop
-
Committed by limingshu
-
- 03 Aug, 2021 1 commit
-
-
Committed by QingshuChen
* support Kunlun2 * support KL2 * support KL2
-
- 02 Aug, 2021 2 commits
-
-
Committed by Zhang Zheng
-
Committed by furnace
* [NPU] add reduce_max * [NPU] delete skipIf * [NPU] add atrrs support or check * [NPU] add attr out_dtype * [NPU] delete debug codes
-
- 30 Jul, 2021 1 commit
-
-
Committed by jakpiase
* added expand_v2 bf16/fp32 kernel * minor change * CI fix * added missing test file * added formatting * reduced binary size * CI fix
-
- 12 Jul, 2021 1 commit
-
-
Committed by Zhang Zheng
-
- 05 Jul, 2021 1 commit
-
-
Committed by Zhang Zheng
-
- 02 Jul, 2021 1 commit
-
-
Committed by niuliling123
-
- 22 Jun, 2021 1 commit
-
-
Committed by niuliling123
-
- 15 Jun, 2021 1 commit
-
-
Committed by jiangcheng
* add reduce_sum_op by add self-kernel * set all ReduceKernel MPType for accuracy * add float16 test script which input is integer number * solve reduce sum float16 check_grad problem * solve conflict and change test script for CI * change kernel register for CI * remove all useless template
-
- 28 May, 2021 2 commits
-
-
Committed by chentianyu03
* modify to complex template types for fill_constant op * modify to complex template types for py_layer, strided_slice and reduce_sum_op.part
-
Committed by chentianyu03
-
- 26 May, 2021 1 commit
-
-
Committed by Leo Chen
* refine ~npuOpRunner * implement destructor and forbid copy * use reference to avoid copy * use const reference * relax adam precision * fix top_k
-
- 25 May, 2021 1 commit
-
-
Committed by niuliling123
-
- 20 May, 2021 1 commit
-
-
Committed by TTerror
* fix gather op and add logsumexp op on kunlun * update xpu depence * update tests and fix elementwise_add
-
- 18 May, 2021 1 commit
-
-
Committed by liuyuhui
-
- 30 Apr, 2021 1 commit
-
-
Committed by jakpiase
-
- 21 Apr, 2021 1 commit
-
-
Committed by jakpiase
-
- 19 Apr, 2021 1 commit
-
-
Committed by Leo Chen
* [NPU] support GarbageCollector for npu (#31874) * support GarbageCollector for npu * fix typo * fix gather_grad * disable NPUDefaultStreamGarbageCollector on NPU * [NPU] support npu for memcpy op (#31808) * support npu for memcpy op * add ut * fix ut * fix typo * 【NPU】fix bug of using temp vector (#31963) * fix bug when beta1_pow on cpu (#31995) * [NPU] support npu profiler (#31684) * support npu profiler * add python api * fix bugs * add wrapper for incomplete type * update profile proto * record npu wait * add xpu placeholder * fix adam (#32016) * [NPU] enable async copy and add wait before sync operation (#31956) * enable async copy and add wait before sync operation * remove unneccessary wait * add FillNpuTensorWithConstant * refine * fix fill_constant * make TensorFromVector/TensorToVector sync * [NPU] Support dataloader on npu place. (#31867) * [NPU] Wait on NPUPlace (#32086) * [NPU] fix cast op (#32121) * fix npu kernel of cast op to handle casting to same dtype * add comments * [NPU] support cann 20.3 (#32044) * fix compile problem on cann 20.3 * fix ut * fix test_mul * fix check_finite_and_scale * fix lookup_table_v2_grad * fix cmake * support print op * [NPU] Support npu save load (#31893) * support save load for NPU * add save load npu unittest * support np.array transform in NPU * fix errors * delete dygraph in unittest * add Wait * fix unittest * fix review comment * fix unittest problem * fix little problem * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196) * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace * refine code * fix NPUDeviceContext in all c++ unittest (#32198) * fix NPUDeviceContext in all c++ unittest * refine log Co-authored-by: pangyoki <pangyoki@126.com> * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994) * enable async copy and add wait before sync operation * remove unneccessary wait * add FillNpuTensorWithConstant 
* refine * fix fill_constant * change TensorFromVector to FillNpuTensorWithConstant * fix ignored api * delete extra unittest * fix little error * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu * change TensorCopySync to TensorCopy * delete useless Wait and add StreamWait * fix npu_stream error * fix check_finite_and_unscale_op_npu TensorCopy * only save stream wait * fix NPUDeviceContext in all c++ unittest * delete wait Co-authored-by: zhiqiu <chenqiuliang@baidu.com> * delete useless unittest file (#32206) * Fix op test (#32231) * fix conditional block (#32243) * fix adam bug again (#32246) * fix compile * fix ut * fix ut Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com> Co-authored-by: pangyoki <pangyoki@126.com>
-