Opened November 12, 2018 by saxon_zh (Guest)

ResNet50 latency test based on CAPI is very slow

Created by: chuanqi129

Hi, I used the cmake command below to build Paddle:

cmake3 .. -DWITH_DOC=OFF -DWITH_GPU=OFF -DWITH_DISTRIBUTE=OFF -DWITH_MKLDNN=ON -DWITH_GOLANG=OFF -DWITH_SWIG_PY=ON -DWITH_TESTING=ON -DWITH_INFERENCE_API_TEST=ON -DCMAKE_BUILD_TYPE=Release -DWITH_TIMER=OFF -DWITH_PROFILER=OFF -DWITH_FLUID_ONLY=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

and then ran the command below for the ResNet50 latency test based on the C-API:

FLAGS_use_mkldnn=true OMP_NUM_THREADS=40 KMP_AFFINITY=compact,granularity=fine taskset -c 0-40 numactl -l ./paddle/fluid/inference/tests/api/test_analyzer_resnet50 --infer_model=third_party/inference_demo/resnet50/model --repeat=100 --gtest_filter=Analyzer_resnet50.profile_mkldnn --batch_size=1 --num_threads=40
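For reference, below is a reduced-thread variant of the same command that could serve as a sanity check (my own sketch, not something from the run reported here): with --num_threads=40 predictor threads and OMP_NUM_THREADS=40 per thread, up to 40 x 40 OpenMP threads may end up competing for the cores pinned by taskset, so limiting OpenMP to one thread per predictor isolates that effect.

# Hypothetical sanity-check run (not measured): same test binary and model,
# but each of the 40 predictor threads gets a single OpenMP thread, so at
# most ~40 OpenMP threads compete for the pinned cores instead of up to 1600.
# Note: "taskset -c 0-40" in the original command pins 41 logical cores
# (0 through 40 inclusive); "0-39" covers exactly 40.
FLAGS_use_mkldnn=true OMP_NUM_THREADS=1 KMP_AFFINITY=compact,granularity=fine \
  taskset -c 0-39 numactl -l \
  ./paddle/fluid/inference/tests/api/test_analyzer_resnet50 \
  --infer_model=third_party/inference_demo/resnet50/model \
  --repeat=100 --gtest_filter=Analyzer_resnet50.profile_mkldnn \
  --batch_size=1 --num_threads=40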

But the final results are very slow, as shown in the log output below:

Running Analysis pass [DFG to fluid]
W1112 14:13:24.766306 393684 data_flow_graph_to_fluid_pass.cc:70] parameter changes in the scope takes effect
Running Analysis pass [dfg-to-fluid-debuger-pass]
sh: dot: command not found
I1112 14:13:24.786814 393684 analysis_predictor.cc:257] == optimize end ==
I1112 15:25:30.706657 393769 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 37, latency: 43259.1ms, fps: 0.0231165 ======
I1112 15:25:31.645990 393766 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 34, latency: 43268.4ms, fps: 0.0231115 ======
I1112 15:25:33.218940 393756 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 24, latency: 43284.2ms, fps: 0.0231031 ======
I1112 15:25:42.488104 393737 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 5, latency: 43376.9ms, fps: 0.0230537 ======
I1112 15:25:46.173593 393753 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 21, latency: 43413.7ms, fps: 0.0230342 ======
I1112 15:25:49.869393 393744 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 12, latency: 43450.7ms, fps: 0.0230146 ======
I1112 15:25:52.919145 393751 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 19, latency: 43481.2ms, fps: 0.0229984 ======
I1112 15:25:53.417090 393745 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 13, latency: 43486.2ms, fps: 0.0229958 ======
I1112 15:25:54.113740 393761 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 29, latency: 43493.1ms, fps: 0.0229921 ======
I1112 15:25:54.355783 393768 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 36, latency: 43495.5ms, fps: 0.0229909 ======
I1112 15:25:56.391198 393767 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 35, latency: 43515.9ms, fps: 0.0229801 ======
I1112 15:25:57.218600 393743 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 11, latency: 43524.2ms, fps: 0.0229757 ======
I1112 15:25:57.219207 393738 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 6, latency: 43524.2ms, fps: 0.0229757 ======
I1112 15:25:57.253047 393742 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 10, latency: 43524.5ms, fps: 0.0229755 ======
I1112 15:25:57.371704 393760 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 28, latency: 43525.7ms, fps: 0.0229749 ======
I1112 15:25:57.457932 393763 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 31, latency: 43526.6ms, fps: 0.0229745 ======
I1112 15:25:57.460876 393733 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 1, latency: 43526.6ms, fps: 0.0229744 ======
I1112 15:25:57.491451 393764 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 32, latency: 43526.9ms, fps: 0.0229743 ======
I1112 15:25:57.588882 393740 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 8, latency: 43527.8ms, fps: 0.0229738 ======
I1112 15:25:57.596644 393770 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 38, latency: 43527.8ms, fps: 0.0229738 ======
I1112 15:25:57.621752 393752 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 20, latency: 43528.1ms, fps: 0.0229737 ======
I1112 15:25:57.621911 393754 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 22, latency: 43528.1ms, fps: 0.0229737 ======
I1112 15:25:57.628367 393757 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 25, latency: 43528.3ms, fps: 0.0229736 ======
I1112 15:25:57.668081 393755 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 23, latency: 43528.7ms, fps: 0.0229734 ======
I1112 15:25:57.689358 393762 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 30, latency: 43528.9ms, fps: 0.0229733 ======
I1112 15:25:57.690102 393746 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 14, latency: 43528.9ms, fps: 0.0229732 ======
I1112 15:25:57.723378 393759 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 27, latency: 43529.2ms, fps: 0.0229731 ======
I1112 15:25:57.747210 393765 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 33, latency: 43529.5ms, fps: 0.0229729 ======
I1112 15:25:57.790575 393735 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 3, latency: 43529.8ms, fps: 0.0229728 ======
I1112 15:25:57.790573 393736 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 4, latency: 43529.8ms, fps: 0.0229728 ======
I1112 15:25:57.792763 393749 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 17, latency: 43529.8ms, fps: 0.0229728 ======
I1112 15:25:57.811703 393732 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 0, latency: 43530.1ms, fps: 0.0229726 ======
I1112 15:25:57.853924 393771 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 39, latency: 43530.5ms, fps: 0.0229724 ======
I1112 15:25:57.857890 393758 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 26, latency: 43530.6ms, fps: 0.0229724 ======
I1112 15:25:57.873034 393748 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 16, latency: 43530.7ms, fps: 0.0229723 ======
I1112 15:25:57.888835 393739 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 7, latency: 43530.9ms, fps: 0.0229722 ======
I1112 15:25:57.900784 393750 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 18, latency: 43531ms, fps: 0.0229721 ======
I1112 15:25:57.914562 393741 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 9, latency: 43531.2ms, fps: 0.0229721 ======
I1112 15:25:58.040720 393747 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 15, latency: 43532.4ms, fps: 0.0229714 ======
I1112 15:25:58.094480 393734 helper.h:162] ====== batch_size: 1, repeat: 100, threads: 40, thread id: 2, latency: 43533ms, fps: 0.0229711 ======
[ OK ] Analyzer_resnet50.profile_mkldnn (4519964 ms)
-------- 1 test from Analyzer_resnet50 (4519964 ms total)

-------- Global test environment tear-down
[==========] 1 test from 1 test case ran. (4519964 ms total)
[ PASSED ] 1 test.
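For reference, a rough consistency check of these numbers (my own arithmetic, not part of the test output): the reported fps is simply 1000 divided by the per-inference latency in milliseconds, and the total wall time matches 100 repeats at roughly 43.5 s each, so every single batch-1 inference really is taking around 43 seconds.

# Quick arithmetic check of the log values (my own calculation):
echo "scale=6; 1000/43259.1" | bc   # .023116 -> matches the reported fps for thread 37
echo "100 * 43.5" | bc              # 4350 (s) -> close to the reported total of 4519964 ms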

Does this make sense? As far as I know, the result should be better than this.

Reference: paddlepaddle/Paddle#14366