- 14 Sep, 2021 1 commit
Committed by Yiqun Liu
Implement FunctionTraits to support two kinds of elementwise functors and remove some old code for broadcast. (#35688)
- 13 Sep, 2021 2 commits
- 08 Sep, 2021 1 commit
Committed by will-jl944
multiply supports bool
- 07 Sep, 2021 1 commit
Committed by niuliling123
- 06 Sep, 2021 1 commit
Committed by wawltor
* Add the extra flag for some ops
* fix the compile problem in matmul extra
- 03 Sep, 2021 2 commits
- 02 Sep, 2021 1 commit
Committed by wangxinxin08
add axis check for elementwise op when the dimension of x is equal to the dimension of tensor (#35340)
- 31 Aug, 2021 1 commit
Committed by Aganlengzi
- 27 Aug, 2021 1 commit
Committed by baoachun
* add elementwise max grad op for npu
* add elementwise max grad op for npu
* add elementwise max grad op for npu
* add elementwise max grad op for npu
* add elementwise max grad op for npu
- 26 Aug, 2021 1 commit
Committed by Jacek Czaja
[oneDNN] disable caching oneDNN primitives in matmul v2, Reduce grad and elementwise_add grad, expand_v2 (#35132)
* - grad caching disabled of matmul_v1 - compilation fix - compilation fix
* - reduction removed
* - Matmul v2 disabled caching
* Draft of further changes
* - workaround for reducegrad
* - fixes to UT
* - fix to compilation
* - another fix
* - fix
- 25 Aug, 2021 2 commits
Committed by ronnywang
Committed by taixiurong
- 22 Aug, 2021 1 commit
Committed by Zhang Zheng
- 16 Aug, 2021 1 commit
Committed by Jacek Czaja
* - Added softmax without caching
* - Binary is no longer manually cached
* - Activation onednn caching removed
* - Removed manual caching of activation
* - modified UT
* - fix
* - fix
* - fixes to building
* - fix
* - fix
* - fix to UT
* - Faulty UT workaround
* - approval workaround
* - Fixes after review
* - compilation fixes
* - more lint fixes
* - more fixes after review
* - fixes after another round of review
* - hopefully compilation fix - compilation fix
- 12 Aug, 2021 1 commit
Committed by Chen Weihang
This reverts commit 0a5c99e8.
- 11 Aug, 2021 2 commits
Committed by Jacek Czaja
* - Added softmax without caching
* - Binary is no longer manually cached
* - Activation onednn caching removed
* - Removed manual caching of activation
* - modified UT
* - fix
* - fix
* - fixes to building
* - fix
* - fix
* - fix to UT
* - Faulty UT workaround
* - approval workaround
* - Fixes after review
* - compilation fixes
* - more lint fixes
* - more fixes after review
* - fixes after another round of review
Committed by andyjpaddle
- 09 Aug, 2021 1 commit
Committed by ronnywang
* add broadcast support for elementwise_add
* add broadcast support for elementwise_add
* add more tests
* remove the redundant code
* update
* fix place error in unittest
* remove skip.If
- 05 Aug, 2021 1 commit
Committed by limingshu
- 07 Jul, 2021 1 commit
Committed by taixiurong
- 05 Jul, 2021 2 commits
- 24 Jun, 2021 1 commit
Committed by Jacek Czaja
* - fix to #33282
* - Increased threshold for elementwise_mul_bf16 grad
* - disabled faulty UT
* - fix to approval
- 23 Jun, 2021 1 commit
Committed by limingshu
- 12 Jun, 2021 1 commit
Committed by limingshu
- 04 Jun, 2021 1 commit
Committed by limingshu
- 02 Jun, 2021 2 commits
- 26 May, 2021 1 commit
Committed by Leo Chen
* refine ~npuOpRunner
* implement destructor and forbid copy
* use reference to avoid copy
* use const reference
* relax adam precision
* fix top_k
- 25 May, 2021 1 commit
Committed by chentianyu03
* modify complex template for elementwise ops
* modify mul, div grad struct
* add complex template for CudaShuffleDownSync CudaShuffleXorSync funcs and fix the bug when delete cuda<9000
* fix shuffle func args bug
* fix shuffle func args bug
* fix shuffle func args bug
- 24 May, 2021 1 commit
Committed by limingshu
- 20 May, 2021 2 commits
- 14 May, 2021 1 commit
Committed by limingshu
- 10 May, 2021 1 commit
Committed by Zhang Zheng
- 22 Apr, 2021 1 commit
Committed by Zhang Zheng
- 19 Apr, 2021 1 commit
Committed by Leo Chen
* [NPU] support GarbageCollector for npu (#31874)
* support GarbageCollector for npu
* fix typo
* fix gather_grad
* disable NPUDefaultStreamGarbageCollector on NPU
* [NPU] support npu for memcpy op (#31808)
* support npu for memcpy op
* add ut
* fix ut
* fix typo
* 【NPU】fix bug of using temp vector (#31963)
* fix bug when beta1_pow on cpu (#31995)
* [NPU] support npu profiler (#31684)
* support npu profiler
* add python api
* fix bugs
* add wrapper for incomplete type
* update profile proto
* record npu wait
* add xpu placeholder
* fix adam (#32016)
* [NPU] enable async copy and add wait before sync operation (#31956)
* enable async copy and add wait before sync operation
* remove unnecessary wait
* add FillNpuTensorWithConstant
* refine
* fix fill_constant
* make TensorFromVector/TensorToVector sync
* [NPU] Support dataloader on npu place. (#31867)
* [NPU] Wait on NPUPlace (#32086)
* [NPU] fix cast op (#32121)
* fix npu kernel of cast op to handle casting to same dtype
* add comments
* [NPU] support cann 20.3 (#32044)
* fix compile problem on cann 20.3
* fix ut
* fix test_mul
* fix check_finite_and_scale
* fix lookup_table_v2_grad
* fix cmake
* support print op
* [NPU] Support npu save load (#31893)
* support save load for NPU
* add save load npu unittest
* support np.array transform in NPU
* fix errors
* delete dygraph in unittest
* add Wait
* fix unittest
* fix review comment
* fix unittest problem
* fix little problem
* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance
* refine code
* fix NPUDeviceContext in all c++ unittest (#32198)
* fix NPUDeviceContext in all c++ unittest
* refine log
Co-authored-by: pangyoki <pangyoki@126.com>
* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
* enable async copy and add wait before sync operation
* remove unnecessary wait
* add FillNpuTensorWithConstant
* refine
* fix fill_constant
* change TensorFromVector to FillNpuTensorWithConstant
* fix ignored api
* delete extra unittest
* fix little error
* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
* change TensorCopySync to TensorCopy
* delete useless Wait and add StreamWait
* fix npu_stream error
* fix check_finite_and_unscale_op_npu TensorCopy
* only save stream wait
* fix NPUDeviceContext in all c++ unittest
* delete wait
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
* delete useless unittest file (#32206)
* Fix op test (#32231)
* fix conditional block (#32243)
* fix adam bug again (#32246)
* fix compile
* fix ut
* fix ut
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>
- 18 Apr, 2021 1 commit
Committed by Zhang Zheng