- 03 Sep, 2021 3 commits
- 02 Sep, 2021 7 commits
-
-
Committed by JingZhuangzhuang
* [NPU] Support npu kernel for gather_nd op
* [NPU] Support npu kernel for gather_nd op
* [NPU] Support npu kernel for gather_nd and gather_nd_grad op
* update py format error.
* modify gather_nd_op_npu
* modify gather_nd 910 test
* modify gather_nd 910 test
Co-authored-by: Nxiaoxiaohehe001 <hiteezsf@163.com>
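For readers unfamiliar with the op, the gather_nd semantics can be sketched in plain Python. This is an illustration only, not the NPU kernel from this commit: it works on nested lists, handles only a 2-D list of coordinates, and the function name simply mirrors the op name.

```python
def gather_nd(x, index):
    """Gather entries of nested list `x`; each row of `index` is a coordinate prefix."""
    out = []
    for coord in index:
        item = x
        for i in coord:  # walk one dimension per coordinate component
            item = item[i]
        out.append(item)
    return out


x = [[1, 2], [3, 4], [5, 6]]
print(gather_nd(x, [[0, 1], [2, 0]]))  # full coordinates select scalars
print(gather_nd(x, [[1]]))             # a shorter coordinate selects a whole row
```

A coordinate shorter than the rank of `x` returns the remaining slice, which is what distinguishes gather_nd from a plain element lookup.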
-
Committed by xiongkun
* Add SVD Op and its GPU and CPU kernel
* Remove CUDAPlace in test_svd_op, make the test available in CPU package
* modify the file
* fix windows bug/ fix ROCM / fix test timeout
* to pass the CIs
* improve error report
* for code review
* some modification to test_svd_op
* change python code style
* expose the svd interface for document
-
Committed by zhulei
* [NPU] Add label_smooth_op
* [NPU] Add label_smooth_op
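As a reminder of what label smoothing computes, here is a minimal pure-Python sketch of the usual formula, `(1 - epsilon) * label + epsilon * prior`, with a uniform prior when none is given. The function signature and default `epsilon` are assumptions for illustration and are not taken from this commit.

```python
def label_smooth(one_hot, epsilon=0.1, prior=None):
    """Blend a one-hot label with a prior distribution (uniform by default)."""
    num_classes = len(one_hot)
    if prior is None:
        prior = [1.0 / num_classes] * num_classes
    return [(1.0 - epsilon) * y + epsilon * p for y, p in zip(one_hot, prior)]


smoothed = label_smooth([0.0, 0.0, 1.0, 0.0], epsilon=0.1)
print(smoothed)       # the true class keeps most of the mass
print(sum(smoothed))  # the result still sums to (approximately) 1
```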
-
Committed by Yuang Liu
-
Committed by wangna11BD
-
Committed by JZ-LIANG
* support shard reader
* support shard reader
* add parallel mode
* update process mesh
* add method to compute comm_group
* implement dist_embedding forward func
* implement dist matmul forward func
* implement dist reshape forward func
* add transpiler framework
* add transpiler forward
* implement transpiler forward
* implement transpiler backward & update
* add process
* add unittest
* chmod
* chmod
* chmod
* update unittest
* add unittest for gpt
* remove unused print
* rename transpiler --> partitioner
* rename transpiler --> partitioner
* chmod
* chmod
* bug fixed
* remove amp function
* update case for dp mode
* update case for dp mode
-
Committed by Baibaifan
-
- 01 Sep, 2021 15 commits
-
-
Committed by jakpiase
* added slice FWD FP32
* added tests for slice FWD FP32
* added slice bwd
* added bf16 tests
* CI fix
* CI fix
* added reason to skip_if
* minor change
* temporary fix for failing test
* temporary fix
* changes after review
* CI rerun
-
Committed by Thunderbrook
* merge dense
* log level
* tensor copy sync
* format
-
Committed by ShenLiang
* add cache for send_recv
* add eval_batch for pipeline
* add eval batch for pipelineparallel
* add style code
-
Committed by baoachun
* add strided_slice_grad op for npu
-
Committed by Leo Chen
* support setting linewidth when printing tensor
* fix ut
* refine code
* update comments
* use small precision since Windows and Linux produce different random values
* fix typo
* adjust parameter order for consistency
-
Committed by LielinJiang
* add input and output docs for vision transform
-
Committed by JZ-LIANG
-
Committed by 0x45f
* modify dy2stat error message in compile time
* fix variable name
-
Committed by WeiXin
* fix bug: an error occurs when axes in paddle.slice is a tuple.
* polish code.
-
Committed by QingshuChen
* support KL label smooth
* update UT for KL label_smooth
-
Committed by cc
-
Committed by Roc
-
Committed by Aurelius84
* Support append method and initialized value for List in ControlFlow
* polish error msg and en doc
* fix code style
-
Committed by zyfncg
* Support getitem by Bool index
* delete some debug info of bool index
* support the case that the shape of bool index is different from indexed tensor
* support setitem by bool index
* add the unittest for throwing exception
* merge conflict
* add check for int tensor when index is bool
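The behaviour described by these bullets resembles mask-based indexing. The sketch below models it on plain Python lists, assuming a 1-D boolean mask applied along the first dimension; the helper names `getitem_bool` and `setitem_bool` are hypothetical and are not the tensor methods touched by this commit.

```python
def _check_mask(x, mask):
    """Reject masks that do not cover the first dimension or are not boolean."""
    if len(mask) != len(x):
        raise IndexError("bool index length must match the first dimension")
    if not all(isinstance(m, bool) for m in mask):
        raise TypeError("index must contain only bool values")


def getitem_bool(x, mask):
    """Return the rows of `x` whose mask entry is True."""
    _check_mask(x, mask)
    return [row for row, keep in zip(x, mask) if keep]


def setitem_bool(x, mask, value):
    """Overwrite, in place, the rows of `x` whose mask entry is True."""
    _check_mask(x, mask)
    for i, keep in enumerate(mask):
        if keep:
            x[i] = value


data = [[1, 2], [3, 4], [5, 6]]
print(getitem_bool(data, [True, False, True]))
setitem_bool(data, [False, True, False], [0, 0])
print(data)
```

Because the mask only spans the first dimension, its shape differs from that of the indexed data, and each True entry selects a whole row.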
-
Committed by zhaoyingli
-
- 31 Aug, 2021 10 commits
-
-
Committed by Aurelius84
* polish code
* fix unittest on windows
* refine pybind interface
* support statistic MemSize of AllocatorPool
* Replace mutex with atomic
-
Committed by Feng Xing
This PR adds the Python files for the fused transformer and defines its interface. The fused transformer implements an optimized version of the transformer layer (in python/paddle/nn/layer/transformer.py). Four layers (functions) are defined in this PR:
(1) FusedMultiHeadAttention: multi-head attention layer
(2) FusedFeedForward: feed forward layer
(3) FusedTransformerEncoderLayer: transformer encoder layer
(4) FusedTransformer: transformer layer
-
Committed by Aurelius84
* Add model for ResNet50 for Dy2stat AMP training
* fix timeout
* fix dataloader
-
Committed by Qi Li
* [NPU] fix cmake for ascend ci, test=develop
* update paddle_build.sh scripts, test=allcase
-
Committed by Shang Zhizhou
* Revert "Revert "Add copy from tensor (#34406)" (#35173)"
  This reverts commit 32c1ec42.
* add template instantiation
-
Committed by Zhanlue Yang
[Background] Growth in code size can be irreversible in the long run and leads to huge release packages, which not only hamper user experience but also exceed a hard limit of PyPI. In particular, the NV_FATBIN section takes up 86% of the compiled dylib size, owing to the vast number of GPU arches supported. This PR aims to prune NV_FATBIN.
[Solution] The new release strategy involves two types of whl packages:
* Cubin PIP package: maintains a smaller window of GPU arch support, containing sm_60, sm_70, sm_75, and sm_80 cubins, covering the Pascal through Ampere arches.
* JIT release package: a backup for the Cubin PIP package, containing compute_35, compute_50, compute_60, compute_70, compute_75, and compute_80, with the best performance and GPU arch coverage. However, it takes around 10 min to install due to the JIT compilation.
[How to use] The new release strategy is disabled by default.
* To compile the Cubin PIP package, add this to cmake: -DCUBIN_RELEASE_PIP
* To compile the JIT release package, add this to cmake: -DJIT_RELEASE_WHL
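To make the coverage difference between the two packages concrete, the following Python sketch checks which one can serve a given device. The arch lists come from the commit message above. The compatibility rules are an assumption based on general CUDA behaviour (a cubin runs on devices with the same major version and an equal or higher minor version; PTX can be JIT-compiled for any equal or newer compute capability), and the helper names are hypothetical.

```python
# Arch lists taken from the commit message, as (major, minor) compute capabilities.
CUBIN_ARCHS = [(6, 0), (7, 0), (7, 5), (8, 0)]
PTX_ARCHS = [(3, 5), (5, 0), (6, 0), (7, 0), (7, 5), (8, 0)]


def runs_cubin(device_cc, archs=CUBIN_ARCHS):
    """Assumed rule: a cubin needs the same major and an equal or lower minor version."""
    major, minor = device_cc
    return any(a_major == major and a_minor <= minor for a_major, a_minor in archs)


def runs_ptx(device_cc, archs=PTX_ARCHS):
    """Assumed rule: PTX can be JIT-compiled for any equal or newer compute capability."""
    return any(arch <= device_cc for arch in archs)


for cc in [(3, 5), (5, 2), (6, 1), (8, 6)]:
    print(cc, "cubin:", runs_cubin(cc), "jit:", runs_ptx(cc))
```

Under these assumptions, a Maxwell device such as (5, 2) is reachable only through the JIT release package, which matches its description as the backup with wider coverage.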
-
Committed by Wilber
-
Committed by XGZhang
-
Committed by Aganlengzi
-
Committed by Aganlengzi
-
- 30 Aug, 2021 3 commits
-
-
Committed by xiaoxiaohehe001
* add_op_unittest
-
Committed by zhulei
* [NPU] Add log_loss op
* [NPU] Add log_loss op
* [NPU] Add log_loss op
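For context, the negative log loss that an op of this name typically computes can be sketched for a single prediction as follows. The formula `-label * log(p + epsilon) - (1 - label) * log(1 - p + epsilon)` and the default `epsilon` are assumptions for illustration and are not taken from the NPU kernel in this commit.

```python
import math


def log_loss(prob, label, epsilon=1e-4):
    """Negative log likelihood of a binary label; epsilon keeps the logs finite."""
    return (-label * math.log(prob + epsilon)
            - (1.0 - label) * math.log(1.0 - prob + epsilon))


print(log_loss(0.9, 1.0))  # confident and correct: small loss
print(log_loss(0.1, 1.0))  # confident and wrong: large loss
print(log_loss(0.0, 1.0))  # still finite thanks to epsilon
```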
-
Committed by xiongkun
* tmp
* Tile - Assign - Crop
* Finish the set value npu kernel and test case in npu
* improve the error message
* Modify according to zhangliujie
* code review
-
- 29 Aug, 2021 1 commit
-
-
Committed by Guoxia Wang
-
- 27 Aug, 2021 1 commit
-
-
Committed by Guoxia Wang
-