Update cnn benchmark

8b1af84e · hjchen2 · f9ac5f3c · 8b1af84e

Showing 18 additions and 0 deletions

cnn_benchmark/benchmark.md +18 -0

--- a/cnn_benchmark/benchmark.md
+++ b/cnn_benchmark/benchmark.md
@@ -40,6 +40,16 @@ commit: 726c3a12b9d97b57f9fb7e3d212b63564e20e755
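  2. OneFlow TensorRT does not support batch normalization, so the whole graph is split into multiple TensorRT subgraphs. Supporting batch normalization in OneFlow TensorRT would save about 4 ms. However, we also found that once batch normalization was supported, the gap between batches grew from almost 0 ms to 5.32 ms, so throughput did not change noticeably even with batch normalization support.
  3. Summary: if both 1 and 2 are optimized, the time per batch could theoretically drop to (10 - 4 =) 6 ms.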

+- Update 2019.12.24: all source ops are now proxied to the cpu tick via the device tick, reducing the number of events.
+
+>| -            | Oneflow(fp32) | Oneflow(fp16) | TensorRT(fp32) | TensorRT(fp16) | TensorRT(int8) | TensorRT official(fp32) | TensorRT official(fp16) | TensorRT official(int8) |
+>| ------------ | ------------- | ------------- | -------------- | -------------- | -------------- | ----------------------- | ----------------------- | ----------------------- |
+>| alexnet      | 2692          | 2022          | 2679           | 4060           |                |                         |                         |                         |
+>| vgg16        | 398           | 346           | 425            | 1200           |                | 470                     | 1629                    |                         |
+>| resnet50     | 735           | 570           | 945            | 2120           |                | 1025                    | 2500                    |                         |
+>| inception-v3 | 538           | 510           | 572            | 1356           |                |                         |                         |                         |
+
+
 2. Batch size of 50

 >| -            | Oneflow(fp32) | Oneflow(fp16) | TensorRT(fp32) | TensorRT(fp16) | TensorRT(int8) | TensorRT official(fp32) | TensorRT official(fp16) | TensorRT official(int8) |
@@ -49,6 +59,14 @@ commit: 726c3a12b9d97b57f9fb7e3d212b63564e20e755
 >| resnet50     | 810           | 619           | 830            | 1285           |                | 1302                    | 3843                    |                         |
 >| inception-v3 | 544           | 531           | 717            | 1839           |                |                         |                         |                         |

+- Update 2019.12.24: all source ops are now proxied to the cpu tick via the device tick, reducing the number of events.
+
+>| -            | Oneflow(fp32) | Oneflow(fp16) | TensorRT(fp32) | TensorRT(fp16) | TensorRT(int8) | TensorRT official(fp32) | TensorRT official(fp16) | TensorRT official(int8) |
+>| ------------ | ------------- | ------------- | -------------- | -------------- | -------------- | ----------------------- | ----------------------- | ----------------------- |
+>| alexnet      | 6568          | 3341          | 5030           | 9076           |                |                         |                         |                         |
+>| vgg16        | 528           | 498           | 459            | 1638           |                | 498                     | 1907                    |                         |
+>| resnet50     | 888           | 685           | 1262           | 3989           |                | 1302                    | 3843                    |                         |
+>| inception-v3 | 698           | 589           | 797            | 2363           |                |                         |                         |                         |
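
To quantify the effect of the 2019.12.24 change at batch size 50, the pre-update and post-update rows can be compared directly. The sketch below does this for the only two models whose pre-update rows are visible in this diff (resnet50 and inception-v3), using the four columns that have values in both tables. It assumes the table values are throughput where higher is better; the unit is not stated in this excerpt. The names `BEFORE`, `AFTER`, `relative_change`, and `compare` are hypothetical and not part of the benchmark scripts.

```python
# Batch size 50 results copied from the tables in this diff.
# Assumption: values are throughput (higher is better); the unit is not stated here.
COLUMNS = ["Oneflow(fp32)", "Oneflow(fp16)", "TensorRT(fp32)", "TensorRT(fp16)"]

# Pre-update rows (unchanged context lines of the existing batch size 50 table).
BEFORE = {
    "resnet50": [810, 619, 830, 1285],
    "inception-v3": [544, 531, 717, 1839],
}

# Post-update rows (the "Update 2019.12.24" table added in this commit).
AFTER = {
    "resnet50": [888, 685, 1262, 3989],
    "inception-v3": [698, 589, 797, 2363],
}


def relative_change(before, after):
    """Percentage change from before to after; positive means higher throughput."""
    return (after - before) / before * 100.0


def compare(before, after):
    """Map (model, column) to the percentage change, rounded to one decimal."""
    result = {}
    for model, old_row in before.items():
        new_row = after[model]
        for name, old, new in zip(COLUMNS, old_row, new_row):
            result[(model, name)] = round(relative_change(old, new), 1)
    return result


if __name__ == "__main__":
    for (model, column), pct in compare(BEFORE, AFTER).items():
        print(f"{model:13s} {column:15s} {pct:+6.1f}%")
```

Running it prints one signed percentage per model and column; for example, resnet50 TensorRT(fp16) goes from 1285 to 3989, which is +210.4%.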

 #### Precision