!523 add wide and deep benchmark

Merge pull request !523 from lirongzhen1/master

df24e04c · mindspore-ci-bot · Gitee · 14eeb129 · 3f44b193 · df24e04c

Showing 2 changed files with 44 additions and 0 deletions

docs/source_en/benchmark.md +22 -0

docs/source_zh_cn/benchmark.md +22 -0

--- a/docs/source_en/benchmark.md
+++ b/docs/source_en/benchmark.md
@@ -27,3 +27,25 @@ For details about the MindSpore pre-trained model, see [Model Zoo](https://gitee

 1. The preceding performance is obtained based on ModelArts, the HUAWEI CLOUD AI development platform. The network contains 24 hidden layers, the sequence length is 128 tokens, and the vocabulary contains 21128 tokens.   
 2. For details about other open source frameworks, see [BERT For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT).
+
+### Wide & Deep (data parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | Precision | Batch Size | Throughput |  Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommendation | Criteo | 0.6.0 | Ascend: 1 * Ascend 910 <br/> CPU: 24 Cores | Mixed | 16000 | 796892 samples/sec | - |
+|  |  |  |  | Ascend: 8 * Ascend 910 <br/> CPU: 192 Cores | Mixed | 16000*8 | 4872849 samples/sec | 0.76 |
+
+1. The preceding performance is obtained on Atlas 800, and the model is trained in data parallel mode.
+2. For details about other open source frameworks, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
+
+### Wide & Deep (Host-Device model parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | Precision | Batch Size | Throughput |  Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommendation | Criteo | 0.6.0 | Ascend: 1 * Ascend 910 <br/> CPU: 24 Cores | Mixed | 8000 | 68715 samples/sec | - |
+|  |  |  |  | Ascend: 8 * Ascend 910 <br/> CPU: 192 Cores | Mixed | 8000*8 | 283830 samples/sec | 0.51 |
+|  |  |  |  | Ascend: 16 * Ascend 910 <br/> CPU: 384 Cores | Mixed | 8000*16 | 377848 samples/sec | 0.34 |
+|  |  |  |  | Ascend: 32 * Ascend 910 <br/> CPU: 768 Cores | Mixed | 8000*32 | 433423 samples/sec | 0.20 |
+
+1. The preceding performance is obtained on Atlas 800, and the model is trained in model parallel mode.
+2. For details about other open source frameworks, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).
--- a/docs/source_zh_cn/benchmark.md
+++ b/docs/source_zh_cn/benchmark.md
@@ -26,3 +26,25 @@

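 1. The preceding performance is obtained based on ModelArts, the HUAWEI CLOUD AI development platform. The network contains 24 hidden layers, the sequence length is 128 tokens, and the vocabulary contains 21128 tokens.
 2. For details about other open source frameworks, see [BERT For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT).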
+
+### Wide & Deep (data parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | Precision | Batch Size | Throughput |  Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommendation | Criteo | 0.6.0 | Ascend: 1 * Ascend 910 <br/> CPU: 24 Cores | Mixed | 16000 | 796892 samples/sec | - |
+|  |  |  |  | Ascend: 8 * Ascend 910 <br/> CPU: 192 Cores | Mixed | 16000*8 | 4872849 samples/sec | 0.76 |
+
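+1. The preceding performance is obtained on Atlas 800, and the model is trained in data parallel mode.
+2. For details about other open source frameworks, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).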
+
+### Wide & Deep (Host-Device hybrid computing, model parallel)
+
+| Network | Network Type | Dataset | MindSpore Version | Resource &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | Precision | Batch Size | Throughput |  Speedup |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| Wide & Deep | Recommendation | Criteo | 0.6.0 | Ascend: 1 * Ascend 910 <br/> CPU: 24 Cores | Mixed | 8000 | 68715 samples/sec | - |
+|  |  |  |  | Ascend: 8 * Ascend 910 <br/> CPU: 192 Cores | Mixed | 8000*8 | 283830 samples/sec | 0.51 |
+|  |  |  |  | Ascend: 16 * Ascend 910 <br/> CPU: 384 Cores | Mixed | 8000*16 | 377848 samples/sec | 0.34 |
+|  |  |  |  | Ascend: 32 * Ascend 910 <br/> CPU: 768 Cores | Mixed | 8000*32 | 433423 samples/sec | 0.20 |
+
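+1. The preceding performance is obtained on Atlas 800, and the model is trained in model parallel mode.
+2. For details about other open source frameworks, see [Wide & Deep For TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Recommendation/WideAndDeep).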
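
The diff does not say how the Speedup column is derived. The values are numerically consistent with a scaling efficiency, that is, measured throughput divided by the ideal linear throughput (device count times the single-device throughput). That reading is an assumption, not something the PR states. A minimal Python sketch for checking the tables under that assumption follows; the function and variable names are hypothetical, and the throughput figures are copied from the tables in the diff.

```python
def scaling_efficiency(throughput, base_throughput, devices):
    """Measured throughput divided by ideal linear throughput.

    Assumed definition of the "Speedup" column; the PR does not state it.
    """
    return throughput / (base_throughput * devices)


# (device count, samples/sec) rows copied from the benchmark tables.
DATA_PARALLEL = [(1, 796892), (8, 4872849)]
HOST_DEVICE = [(1, 68715), (8, 283830), (16, 377848), (32, 433423)]


def efficiencies(rows):
    """Return {device count: efficiency} for every multi-device row."""
    base = rows[0][1]  # the first row is the single-device baseline
    return {n: scaling_efficiency(t, base, n) for n, t in rows[1:]}


if __name__ == "__main__":
    for name, rows in (("data parallel", DATA_PARALLEL),
                       ("host-device", HOST_DEVICE)):
        for n, eff in efficiencies(rows).items():
            print(f"{name}: {n} devices, efficiency {eff:.4f}")
```

Under this assumption the 8-device Host-Device row works out to roughly 0.516, which is listed as 0.51 in the table, so that entry looks truncated rather than rounded; the other rows match the listed values to two decimal places.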