# Anakin GPU Benchmark

## Environment

> CPU: `12-core Intel(R) Xeon(R) CPU E5-2620 v2 @2.10GHz`
>
> GPU: `Tesla P4`
>
> cuDNN: `v7`

## Counterpart of Anakin

**`Anakin`** is compared against the high-performance inference engine **`NVIDIA TensorRT 3`**.

## Benchmark Model

> Before benchmarking, convert each test model to an Anakin model with the `External Converter` tool.
>
> For these models, this document reports single-thread, single-GPU-card performance on the GPU.

- [Vgg16](#1) *the caffe model can be downloaded [here](https://gist.github.com/jimmie33/27c1c0a7736ba66c2395)*
- [Yolo](#2) *the caffe model can be downloaded [here](https://github.com/hojel/caffe-yolo-model)*
- [Resnet50](#3) *the caffe model can be downloaded [here](https://github.com/KaimingHe/deep-residual-networks#models)*
- [Resnet101](#4) *the caffe model can be downloaded [here](https://github.com/KaimingHe/deep-residual-networks#models)*
- [Mobilenet v1](#5) *the caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
- [Mobilenet v2](#6) *the caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
- [RNN](#7) *not supported yet*

### VGG16

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 8.53945 | 8.18737 |
| 2 | 14.2269 | 13.8976 |
| 4 | 24.2803 | 21.7976 |
| 8 | 45.6003 | 40.319 |

- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 1053.88 | 762.73 |
| 2 | 1055.71 | 762.41 |
| 4 | 1003.22 | 832.75 |
| 8 | 1108.77 | 926.9 |

### Yolo

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 8.41606 | 7.07977 |
| 2 | 16.6588 | 15.2216 |
| 4 | 31.9955 | 30.5102 |
| 8 | 66.1107 | 64.3658 |

- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 1054.71 | 299.8 |
| 2 | 951.51 | 347.47 |
| 4 | 846.9 | 438.47 |
| 8 | 1042.31 | 515.15 |

### Resnet50

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 4.10063 | 3.33845 |
| 2 | 6.10941 | 5.54814 |
| 4 | 9.90233 | 10.2763 |
| 8 | 17.3287 | 20.0783 |

- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 1059.15 | 299.86 |
| 2 | 1077.8 | 340.78 |
| 4 | 903.04 | 395 |
| 8 | 832.53 | 508.86 |

### Resnet101

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 7.29828 | 5.672 |
| 2 | 11.2037 | 9.42352 |
| 4 | 17.9306 | 18.0936 |
| 8 | 31.4804 | 35.7439 |

- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 1161.94 | 429.22 |
| 2 | 1190.92 | 531.92 |
| 4 | 994.11 | 549.7 |
| 8 | 945.47 | 653.06 |

### MobileNet V1

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 1.52692 | 1.39282 |
| 2 | 1.98091 | 2.05788 |
| 4 | 3.2705 | 4.03476 |
| 8 | 5.15652 | 7.06651 |

- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 1144.35 | 99.6 |
| 2 | 1160.03 | 199.75 |
| 4 | 1098 | 184.33 |
| 8 | 990.71 | 232.11 |

### MobileNet V2

- Latency (`ms`) of different batch

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 1.95961 | 1.78249 |
| 2 | 2.8709 | 3.01144 |
| 4 | 4.46131 | 5.43946 |
| 8 | 7.161 | 10.2081 |

- GPU Memory Used (`MB`)

| BatchSize | TensorRT | Anakin |
| --- | --- | --- |
| 1 | 1154.69 | 195.25 |
| 2 | 1187.25 | 227.6 |
| 4 | 1053 | 241.75 |
| 8 | 1062.48 | 352.18 |

## How to run those Benchmark models

1. First, convert the caffe model with the [External Converter](./convert_paddle_to_anakin.html).
2. Then go to the *source_root/benchmark/CNN* directory, create a directory for the models with `mkdir ./models`, and put the converted Anakin models in it.
3. Run the script with `sh run.sh`. When it finishes, the run time of each model is printed to the terminal.
4. To get the run time of each OP layer, just set `ENABLE_OP_TIMER` to `YES` in CMakeLists.txt.
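
Once `run.sh` has printed the timings, it can help to reduce each pair of latencies to a single ratio. The sketch below is not part of the Anakin benchmark scripts; it is a minimal illustration that copies a subset of the latency tables in this document (VGG16, Resnet50, MobileNet V1) and divides the TensorRT latency by the Anakin latency. The names `LATENCY_MS`, `latency_ratio`, and `summarize` are hypothetical helpers introduced here.

```python
# Latency values (ms) copied from the tables in this document.
# Layout: {model: {batch_size: (tensorrt_ms, anakin_ms)}}
# Only a subset of the models is listed; the others can be added the same way.
LATENCY_MS = {
    "VGG16": {
        1: (8.53945, 8.18737),
        2: (14.2269, 13.8976),
        4: (24.2803, 21.7976),
        8: (45.6003, 40.319),
    },
    "Resnet50": {
        1: (4.10063, 3.33845),
        2: (6.10941, 5.54814),
        4: (9.90233, 10.2763),
        8: (17.3287, 20.0783),
    },
    "MobileNet V1": {
        1: (1.52692, 1.39282),
        2: (1.98091, 2.05788),
        4: (3.2705, 4.03476),
        8: (5.15652, 7.06651),
    },
}


def latency_ratio(tensorrt_ms, anakin_ms):
    """Return TensorRT latency divided by Anakin latency.

    A value above 1 means Anakin was faster for that measurement;
    a value below 1 means TensorRT was faster.
    """
    return tensorrt_ms / anakin_ms


def summarize(table):
    """Map every (model, batch size) entry to its latency ratio."""
    return {
        model: {batch: latency_ratio(*pair) for batch, pair in rows.items()}
        for model, rows in table.items()
    }


if __name__ == "__main__":
    for model, rows in summarize(LATENCY_MS).items():
        for batch, ratio in rows.items():
            faster = "Anakin" if ratio > 1 else "TensorRT"
            print(f"{model:12s} batch={batch}: ratio={ratio:.3f} ({faster} faster)")
```

The same helper could be applied to the memory tables by swapping in the `MB` values, in which case a ratio above 1 would mean Anakin used less GPU memory.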