Opened Mar 20, 2018 by saxon_zh

[Speed] Some operation benchmark on Eigen Tensor

Created by: dzhwinter

AFAIK, some operations on Eigen Tensor are quite slow. Please be careful when you use the broadcast and chip operations. Here is a detailed list of benchmark results for many operators.

benchmark            iterations      ns/op    throughput
BM_algebraicFunc/10       500000       3310    30.21 MFlops/s
BM_algebraicFunc/80       500000       3764  1700.24 MFlops/s
BM_algebraicFunc/640      200000      14676 27908.66 MFlops/s
BM_algebraicFunc/4K         2000     845890 29554.65 MFlops/s
BM_broadcasting/10        500000       3357    29.79 MFlops/s
BM_broadcasting/80        500000       3375  1895.82 MFlops/s
BM_broadcasting/640       500000       4420 92663.99 MFlops/s
BM_broadcasting/4K         10000     267284 93533.46 MFlops/s
BM_coeffWiseOp/10         500000       3266    91.83 MFlops/s
BM_coeffWiseOp/80         500000       3321  5780.22 MFlops/s
BM_coeffWiseOp/640        200000      14623 84028.95 MFlops/s
BM_coeffWiseOp/4K           2000     850300 88204.08 MFlops/s
BM_colChip/10             500000       4011     2.49 MFlops/s
BM_colChip/80             500000       3919    20.41 MFlops/s
BM_colChip/640            500000       3197   200.15 MFlops/s
BM_colChip/4K             500000       3233  1546.26 MFlops/s
BM_colReduction/10        500000       3478    28.75 MFlops/s
BM_colReduction/80        200000      11161   573.42 MFlops/s
BM_colReduction/640       200000       7870 52042.19 MFlops/s
BM_colReduction/4K          5000     308427 81056.28 MFlops/s
BM_contraction_64xNxN/10     500000       7090  1805.14 MFlops/s
BM_contraction_64xNxN/80     100000      15501 52845.46 MFlops/s
BM_contraction_64xNxN/640      50000      46078 1137805.85 MFlops/s
BM_contraction_64xNxN/4K       2000     897852 3564061.12 MFlops/s
BM_contraction_Nx64xN/10     200000      13725   932.55 MFlops/s
BM_contraction_Nx64xN/80     200000      14890 55013.27 MFlops/s
BM_contraction_Nx64xN/640     100000      21537 2434306.21 MFlops/s
BM_contraction_Nx64xN/4K       2000    1181087 2709366.40 MFlops/s
BM_contraction_NxNx64/10     200000       8660  1477.96 MFlops/s
BM_contraction_NxNx64/80     100000      15477 52927.86 MFlops/s
BM_contraction_NxNx64/640      50000      46470 1128214.28 MFlops/s
BM_contraction_NxNx64/4K       5000     564143 5672317.32 MFlops/s
BM_contraction_NxNxN/10     200000       8654   231.08 MFlops/s
BM_contraction_NxNxN/80     100000      15638 65479.29 MFlops/s
BM_contraction_NxNxN/640      20000      98106 5344083.13 MFlops/s
BM_contraction_NxNxN/4K         50   42695967 5855353.87 MFlops/s
BM_convolution_1x7/128     100000      16215 13482.31 MFlops/s
BM_convolution_1x7/1K      20000      91143 160121.78 MFlops/s
BM_convolution_1x7/4K       1000    2009086 173999.44 MFlops/s
BM_convolution_4x7/128    1000000       1633 522961.93 MFlops/s
BM_convolution_4x7/1K    1000000       1565 37174331.84 MFlops/s
BM_convolution_4x7/4K    1000000       1614 865610908.25 MFlops/s
BM_convolution_64x7/128      10000     106848 66498.74 MFlops/s
BM_convolution_64x7/1K       5000     730637 1199713.33 MFlops/s
BM_convolution_64x7/4K        100   16002777 1380461.57 MFlops/s
BM_convolution_7x1/128     200000      11202 19515.29 MFlops/s
BM_convolution_7x1/1K      50000      44060 331224.30 MFlops/s
BM_convolution_7x1/4K       2000     861338 405856.86 MFlops/s
BM_convolution_7x4/128     100000      18575 45973.54 MFlops/s
BM_convolution_7x4/1K      20000      84771 686610.84 MFlops/s
BM_convolution_7x4/4K       1000    1775696 787004.51 MFlops/s
BM_convolution_7x64/128      50000      40714 174516.31 MFlops/s
BM_convolution_7x64/1K       5000     625639 1401054.53 MFlops/s
BM_convolution_7x64/4K        100   14568089 1516411.54 MFlops/s
BM_fullReduction/10       500000       3428    29.17 MFlops/s
BM_fullReduction/80       500000       3428  1866.83 MFlops/s
BM_fullReduction/640      200000       7429 55135.02 MFlops/s
BM_fullReduction/4K        10000     276669 90360.50 MFlops/s
BM_memcpy/10              500000       3919    25.52 MFlops/s
BM_memcpy/80              500000       3847  1663.29 MFlops/s
BM_memcpy/640             500000       6807 60169.97 MFlops/s
BM_memcpy/4K                5000     564178 44312.24 MFlops/s
BM_padding/10             500000       3332    30.01 MFlops/s
BM_padding/80             500000       3349  1910.98 MFlops/s
BM_padding/640            500000       6592 62132.65 MFlops/s
BM_padding/4K               5000     577463 43292.77 MFlops/s
BM_random/10              500000       3236    30.90 MFlops/s
BM_random/80              500000       3269  1957.55 MFlops/s
BM_random/640             500000       3994 102551.78 MFlops/s
BM_random/4K               10000     266284 93884.58 MFlops/s
BM_rowChip/10             500000       3328     3.00 MFlops/s
BM_rowChip/80             500000       3220    24.84 MFlops/s
BM_rowChip/640            500000       3217   198.89 MFlops/s
BM_rowChip/4K             500000       3269  1529.47 MFlops/s
BM_rowReduction/10        500000       3317    30.14 MFlops/s
BM_rowReduction/80        500000       4868  1314.62 MFlops/s
BM_rowReduction/640        50000      50880  8050.24 MFlops/s
BM_rowReduction/4K          5000     352852 70851.06 MFlops/s
BM_shuffling/10           500000       3224    31.01 MFlops/s
BM_shuffling/80           500000       3223  1985.13 MFlops/s
BM_shuffling/640          200000      13123 31211.57 MFlops/s
BM_shuffling/4K              500    3140372  7960.84 MFlops/s
BM_slicing/10             200000      13020     7.68 MFlops/s
BM_slicing/80             200000      13710   466.81 MFlops/s
BM_slicing/640            100000      16364 25030.39 MFlops/s
BM_slicing/4K               2000     717139 34860.74 MFlops/s
BM_striding/10            500000       3483    28.70 MFlops/s
BM_striding/80            500000       4039  1584.26 MFlops/s
BM_striding/640           500000       3593 113968.07 MFlops/s
BM_striding/4K              5000     320700 77954.47 MFlops/s
BM_transcendentalFunc/10     500000       3175    31.49 MFlops/s
BM_transcendentalFunc/80     500000       3211  1992.60 MFlops/s
BM_transcendentalFunc/640     200000      14442 28359.91 MFlops/s
BM_transcendentalFunc/4K       2000     846544 29531.83 MFlops/s
BM_typeCasting/10         500000       3192    31.33 MFlops/s
BM_typeCasting/80         500000       3289  1945.38 MFlops/s
BM_typeCasting/640        500000       6875 59571.46 MFlops/s
BM_typeCasting/4K           5000     604207 41376.54 MFlops/s
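
When comparing rows, it helps to know how much work each benchmark does per op. The sketch below assumes the throughput column is simply (work items per op) / (time per op), which is consistent with the rows checked here but is not stated in the issue. The function name `implied_items_per_op` and the choice of rows are illustrative only; the numbers are copied from the table above.

```python
def implied_items_per_op(ns_per_op: float, mflops_per_s: float) -> float:
    """Work items per op implied by the two reported columns.

    Assumption: throughput = items_per_op / seconds_per_op.
    """
    items_per_second = mflops_per_s * 1e6
    seconds_per_op = ns_per_op * 1e-9
    return items_per_second * seconds_per_op


# (ns/op, MFlops/s) pairs copied from the benchmark table.
ROWS = {
    "BM_algebraicFunc/10": (3310, 30.21),
    "BM_broadcasting/10": (3357, 29.79),
    "BM_rowChip/10": (3328, 3.00),
    "BM_rowChip/640": (3217, 198.89),
}

for name, (ns, mflops) in ROWS.items():
    items = implied_items_per_op(ns, mflops)
    print(f"{name:22s} ~{items:.0f} items per op")
```

Under that assumption, the size-10 algebraic and broadcasting rows come out at roughly N*N = 100 items per op, while the chip rows come out at roughly N items (10 and 640). That would mean a chip benchmark touches far fewer elements per op than the element-wise benchmarks at the same N, yet still takes about 3 µs per op, which is one plausible reason its MFlops/s figure is so much lower.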
Reference: paddlepaddle/Paddle#9256