- 31 Aug, 2021 (13 commits)
-
-
Committed by zhouweiwei2014
-
Committed by Zhanlue Yang
[Background] Growth in code size can be irreversible in the long run, leading to huge release packages that not only hamper user experience but also exceed PyPI's hard size limit. In particular, the NV_FATBIN section takes up 86% of the compiled dylib size, owing to the large number of GPU architectures supported. This PR aims to prune NV_FATBIN.
[Solution] The new release strategy involves two types of whl packages:
* Cubin PIP package: maintains a smaller window of GPU architecture support, containing sm_60, sm_70, sm_75, and sm_80 cubins, covering the Pascal through Ampere architectures.
* JIT release package: a backup for the Cubin PIP package, containing compute_35, compute_50, compute_60, compute_70, compute_75, and compute_80, with the best performance and GPU architecture coverage. However, it takes around 10 min to install due to JIT compilation.
[How to use] The new release strategy is disabled by default. To compile for the Cubin PIP package, add -DCUBIN_RELEASE_PIP to the cmake command. To compile for the JIT release package, add -DJIT_RELEASE_WHL to the cmake command.
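The difference between the two package types can be illustrated with a minimal Python sketch that builds the corresponding nvcc `-gencode` flags from the architecture lists in the commit message. This is an illustration under assumptions, not the PR's actual cmake logic: the function name `gencode_flags` and the package labels `"cubin"` and `"jit"` are hypothetical, and it assumes the Cubin package embeds only machine code (`code=sm_XX`) while the JIT package embeds only PTX (`code=compute_XX`).

```python
# Architecture lists as stated in the commit message.
CUBIN_ARCHS = ["60", "70", "75", "80"]
JIT_ARCHS = ["35", "50", "60", "70", "75", "80"]


def gencode_flags(package):
    """Return nvcc -gencode flags for a package type (hypothetical helper).

    "cubin" and "jit" are illustrative labels, not names used by the PR.
    """
    if package == "cubin":
        # code=sm_XX embeds precompiled machine code for that architecture.
        return [f"-gencode arch=compute_{a},code=sm_{a}" for a in CUBIN_ARCHS]
    if package == "jit":
        # code=compute_XX embeds PTX, which the driver JIT-compiles on the
        # target GPU; this is assumed to be the source of the long install.
        return [f"-gencode arch=compute_{a},code=compute_{a}" for a in JIT_ARCHS]
    raise ValueError(f"unknown package type: {package}")


if __name__ == "__main__":
    for kind in ("cubin", "jit"):
        print(kind, " ".join(gencode_flags(kind)))
```

Embedding fewer entries in the fat binary is what shrinks NV_FATBIN: the Cubin variant carries four machine-code images, while the JIT variant carries six PTX images that work on a wider range of GPUs at the cost of compilation on the user's machine.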
-
Committed by tianshuo78520a
* notest;test=cpu * test * test=document_fix
-
Committed by Wilber
-
Committed by wenbin
* add trt error information. * rerun ci
-
Committed by wuhuanzhou
* fix CI skip cc test error, test=develop * remove test code, test=develop
-
Committed by Huihuang Zheng
As the title says; see the PR description for details.
-
Committed by zhouweiwei2014
-
Committed by 王明冬
-
Committed by XGZhang
-
Committed by Aganlengzi
-
Committed by Aganlengzi
-
Committed by Peihan
* change exit code and summary style * disable test_ernie_text_cls on windows
-
- 30 Aug, 2021 (10 commits)
-
-
Committed by xiaoxiaohehe001
* add_op_unittest
-
Committed by zhulei
* [NPU] Add log_loss op * [NPU] Add log_loss op * [NPU] Add log_loss op
-
Committed by chentianyu03
-
Committed by Jacek Czaja
-
Committed by zhulei
* [Op Def] Add extra def of linear_interp & linear_interp_v2 * [Op Def] Add extra def of linear_interp & linear_interp_v2 & addmm
-
Committed by ceci3
* update ernie int8
-
Committed by tianshuo78520a
-
Committed by xiongkun
* tmp * Tile - Assign - Crop * Finish the set value npu kernel and test case in npu * improve the error message * Modify according to zhangliujie * code review
-
Committed by tianshuo78520a
* notest;test=cpu_gpu * notest;test=cpu_gpu * notest;test=cpu_gpu * notest;test=cpu_gpu * notest;test=cpu_gpu * notest;test=cpu_gpu * notest;test=cpu_gpu * fix * fix
-
Committed by Aurelius84
* Abstract GenerateDeviceEventFlag to shield platforms * Remove get_cuda_flags
-
- 29 Aug, 2021 (1 commit)
-
-
Committed by Guoxia Wang
-
- 27 Aug, 2021 (16 commits)
-
-
Committed by Guoxia Wang
-
Committed by JYChen
-
Committed by xiegegege
-
Committed by xiaoting
* add maxunppol2d op, test=develop * fix typo, test=develop * fix unpool unittest, test=develop * fix unpool code-example, test=develop * fix for unpool_op_unittest,test=develop * fix example code, test=develop * add noqa:F401, test=develop * fix coverage, test=develop * fix unittest for unpool, test=develop * rename unpool2d to unpool, test=develop * rename unpool2d to unpool, test=develop
-
Committed by Guoxia Wang
* sparse_momentum_op is used to save w@GRAD memory for gather_op when gathering from a large parameter
-
Committed by WangXi
-
Committed by joanna.wozna.intel
* Add calculation for gru op * Correct the types * Remove mkldnn only * Correct mkldnn ifdef * Remove mkldnn ifdef * Separate mkldnn quantizer test * Correct Windows test * Check different cmake fix * Revert cmake change * Cmake change 2 * Cmake change 3
-
Committed by Aurelius84
* add CPUDeiveEvent * Polish DeviceEvent code * Add DEVICE_EVENT_LIBS
-
Committed by wanghuancoder
* fix count_api_without_core_ops, test=develop * fix count_api_without_core_ops, test=develop * refine, test=develop * remove test code, test=develop * remove test, test=develop * modify check_api_approvals.sh, test=develop
-
Committed by zhupengyang
-
Committed by 王明冬
-
Committed by HydrogenSulfate
-
Committed by HydrogenSulfate
-
Committed by HydrogenSulfate
-
Committed by HydrogenSulfate
-
Committed by HydrogenSulfate
-