diff --git a/docs/source_en/benchmark.md b/docs/source_en/benchmark.md
index 446ddac3bbf78f04ee7de7c4ac58c227a8a679b9..bd1b0becf3bb60a93eb5562eed374e6b22f55063 100644
--- a/docs/source_en/benchmark.md
+++ b/docs/source_en/benchmark.md
@@ -27,3 +27,25 @@ For details about the MindSpore pre-trained model, see [Model Zoo](https://gitee
 
 1. The preceding performance is obtained based on ModelArts, the HUAWEI CLOUD AI development platform. The network contains 24 hidden layers, the sequence length is 128 tokens, and the vocabulary contains 21128 tokens.
 2. For details about other open source frameworks, see [BERT For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT).
+
+### Wide & Deep (data parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource | Precision | Batch Size | Throughput | Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommend | Criteo | 0.6.0 | Ascend: 1 * Ascend 910<br/>CPU:24 Cores | Mixed | 16000 | 796892 samples/sec | - |
+| | | | | Ascend: 8 * Ascend 910<br/>CPU:192 Cores | Mixed | 16000*8 | 4872849 samples/sec | 0.76 |
+
+1. The preceding performance was measured on Atlas 800, with the model running in data-parallel mode.
+2. For details about other open source frameworks, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
+
+### Wide & Deep (Host-Device model parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource | Precision | Batch Size | Throughput | Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommend | Criteo | 0.6.0 | Ascend: 1 * Ascend 910<br/>CPU:24 Cores | Mixed | 1000 | 68715 samples/sec | - |
+| | | | | Ascend: 8 * Ascend 910<br/>CPU:192 Cores | Mixed | 8000*8 | 283830 samples/sec | 0.51 |
+| | | | | Ascend: 16 * Ascend 910<br/>CPU:384 Cores | Mixed | 8000*16 | 377848 samples/sec | 0.34 |
+| | | | | Ascend: 32 * Ascend 910<br/>CPU:768 Cores | Mixed | 8000*32 | 433423 samples/sec | 0.20 |
+
+1. The preceding performance was measured on Atlas 800, with the model running in model-parallel mode.
+2. For details about other open source frameworks, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
diff --git a/docs/source_zh_cn/benchmark.md b/docs/source_zh_cn/benchmark.md
index 2da80e81d965bf69f532e01dc62aadc14ff017d5..1f4833f0c63fc274323bf0d5483d6690b38cf1b4 100644
--- a/docs/source_zh_cn/benchmark.md
+++ b/docs/source_zh_cn/benchmark.md
@@ -26,3 +26,25 @@
 
 1. The preceding data was measured on ModelArts, the HUAWEI CLOUD AI development platform. The network contains 24 hidden layers, the sequence length is 128 tokens, and the vocabulary contains 21128 tokens.
 2. For data on other open source frameworks, see [BERT For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT).
+
+### Wide & Deep (data parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource | Precision | Batch Size | Throughput | Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommend | Criteo | 0.6.0 | Ascend: 1 * Ascend 910<br/>CPU:24 Cores | Mixed | 16000 | 796892 samples/sec | - |
+| | | | | Ascend: 8 * Ascend 910<br/>CPU:192 Cores | Mixed | 16000*8 | 4872849 samples/sec | 0.76 |
+
+1. The preceding data was measured on Atlas 800, and the network model is data parallel.
+2. For data on other open source frameworks, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
+
+### Wide & Deep (Host-Device hybrid computing, model parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource | Precision | Batch Size | Throughput | Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommend | Criteo | 0.6.0 | Ascend: 1 * Ascend 910<br/>CPU:24 Cores | Mixed | 8000 | 68715 samples/sec | - |
+| | | | | Ascend: 8 * Ascend 910<br/>CPU:192 Cores | Mixed | 8000*8 | 283830 samples/sec | 0.51 |
+| | | | | Ascend: 16 * Ascend 910<br/>CPU:384 Cores | Mixed | 8000*16 | 377848 samples/sec | 0.34 |
+| | | | | Ascend: 32 * Ascend 910<br/>CPU:768 Cores | Mixed | 8000*32 | 433423 samples/sec | 0.20 |
+
+1. The preceding data was measured on Atlas 800, and the network model is model parallel.
+2. For data on other open source frameworks, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
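
The tables do not define the Speedup column. One reading that matches every published value to within about 0.01 is per-device scaling efficiency: multi-device throughput divided by the device count times the single-device throughput. The sketch below works through that arithmetic with the throughput figures from the tables. The formula and the helper names (`scaling_efficiency`, `efficiencies`) are assumptions made for illustration and are not taken from the MindSpore documentation.

```python
def scaling_efficiency(throughput_n, throughput_1, n_devices):
    """Measured throughput divided by the ideal linear-scaling throughput.

    Assumed interpretation of the Speedup column; the source does not define it.
    """
    return throughput_n / (throughput_1 * n_devices)


# (device count, throughput in samples/sec, published Speedup) copied from the tables.
DATA_PARALLEL = [(1, 796892, None), (8, 4872849, 0.76)]
HOST_DEVICE = [
    (1, 68715, None),
    (8, 283830, 0.51),
    (16, 377848, 0.34),
    (32, 433423, 0.20),
]


def efficiencies(rows):
    """Return (device count, efficiency) for every multi-device row.

    The first row is treated as the single-device baseline.
    """
    baseline = rows[0][1]
    return [(n, scaling_efficiency(t, baseline, n)) for n, t, _ in rows[1:]]


if __name__ == "__main__":
    for name, rows in (("data parallel", DATA_PARALLEL), ("Host-Device", HOST_DEVICE)):
        for n, eff in efficiencies(rows):
            print(f"{name}, {n} devices: efficiency {eff:.4f}")
```

Under this reading, a value of 1.0 would mean perfectly linear scaling, so the Host-Device figures indicate that each added device contributes progressively less throughput as the cluster grows from 8 to 32 devices.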