Commit 5bde8dc4 · Author: russelltao

add 148,684,923 cache-references (83.32%)

Parent b67b32a6
## 1. Test environment
* OS: CentOS 7.0
* CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
* GCC-C++: 4.8.5
* Java: 1.8.0
* Python: 2.7.5
## 2. C++ program traverse_1d_array.cpp
### a. Build
#### Install build dependencies
On Linux you need the GNU C++ compiler: on CentOS install it with `yum install gcc-c++`, on Ubuntu with `apt-get install g++`.
#### Compile
`g++ traverse_1d_array.cpp -o traverse_1d_array`
### b. Run
#### Traverse the array with stride 1
`./traverse_1d_array -s 1`
Elapsed time (ms): 20
#### Traverse the array with stride 128
`./traverse_1d_array -s 128`
Elapsed time (ms): 280
#### Traverse the array with stride 1024
`./traverse_1d_array -s 1024`
Elapsed time (ms): 1850
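
One way to read these timings: the program performs the same number of accesses for every stride, so a larger stride spreads those accesses over more cache lines and memory pages. The Python 3 sketch below counts both. It assumes 64-byte cache lines and 4 KiB pages (usual for this class of Xeon, but not stated in this README); `touched_units` and the access count are hypothetical, chosen only for illustration.

```python
def touched_units(accesses, stride, unit_bytes):
    """Count distinct units (cache lines or pages) hit by `accesses`
    byte reads at offsets 0, stride, 2*stride, ..."""
    return len({(k * stride) // unit_bytes for k in range(accesses)})

CACHE_LINE = 64    # assumed cache-line size in bytes
PAGE = 4096        # assumed page size in bytes
ACCESSES = 65536   # hypothetical access count, identical for every stride

for stride in (1, 128, 1024):
    lines = touched_units(ACCESSES, stride, CACHE_LINE)
    pages = touched_units(ACCESSES, stride, PAGE)
    print(f"stride {stride:>4}: {lines:>6} cache lines, {pages:>6} pages")
```

Under these assumptions, stride 1 shares each cache line among 64 accesses, while strides 128 and 1024 put every access on its own line. Stride 1024 additionally touches eight times as many pages as stride 128, which may explain part of the extra time if those pages are touched for the first time inside the loop.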
### c. Verify the cache hit rate with perf
#### Traverse the array with stride 1
`perf stat -e cache-references,cache-misses,instructions,cycles,L1-dcache-load-misses,L1-dcache-loads ./traverse_1d_array -s 1`
* Output:
```
Performance counter stats for './traverse_1d_array -s 1':
332,787 cache-references (82.95%)
26,230 cache-misses # 7.882 % of all cache refs (67.17%)
111,702,471 instructions # 1.61 insn per cycle (83.59%)
69,498,357 cycles (83.57%)
250,109 L1-dcache-load-misses # 0.43% of all L1-dcache hits (83.58%)
58,115,659 L1-dcache-loads (82.72%)
0.030938059 seconds time elapsed
0.026916000 seconds user
0.004098000 seconds sys
```
#### Traverse the array with stride 128
`perf stat -e cache-references,cache-misses,instructions,cycles,L1-dcache-load-misses,L1-dcache-loads ./traverse_1d_array -s 128`
* Output:
```
34,246,770 cache-references (83.16%)
912,881 cache-misses # 2.666 % of all cache refs (66.44%)
137,729,629 instructions # 0.16 insn per cycle (83.27%)
844,462,327 cycles (83.51%)
25,917,035 L1-dcache-load-misses # 38.92% of all L1-dcache hits (83.51%)
66,593,669 L1-dcache-loads (83.39%)
0.291569229 seconds time elapsed
0.066179000 seconds user
0.225442000 seconds sys
```
#### Traverse the array with stride 1024
`perf stat -e cache-references,cache-misses,instructions,cycles,L1-dcache-load-misses,L1-dcache-loads ./traverse_1d_array -s 1024`
* Output:
```
148,684,923 cache-references (83.32%)
8,213,600 cache-misses # 5.524 % of all cache refs (66.64%)
312,534,826 instructions # 0.06 insn per cycle (83.32%)
5,593,728,896 cycles (83.32%)
148,953,141 L1-dcache-load-misses # 133.42% of all L1-dcache hits (83.37%)
111,642,681 L1-dcache-loads (83.35%)
1.894789074 seconds time elapsed
0.158064000 seconds user
1.736704000 seconds sys
```
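
The percentages that perf prints can be recomputed from the raw counters, which makes the three runs easier to compare side by side. The Python 3 sketch below copies the counter values from the outputs above; the `runs` dictionary and the helper names are illustrative only.

```python
# Raw counters copied from the three perf runs above:
# stride -> (L1-dcache-load-misses, L1-dcache-loads, instructions, cycles)
runs = {
    1:    (250_109,     58_115_659,  111_702_471, 69_498_357),
    128:  (25_917_035,  66_593_669,  137_729_629, 844_462_327),
    1024: (148_953_141, 111_642_681, 312_534_826, 5_593_728_896),
}

def l1_miss_percent(misses, loads):
    """L1 data-cache load misses as a percentage of L1 loads."""
    return 100.0 * misses / loads

def ipc(instructions, cycles):
    """Instructions retired per cycle."""
    return instructions / cycles

for stride, (misses, loads, insns, cycles) in runs.items():
    print(f"stride {stride:>4}: "
          f"L1 miss {l1_miss_percent(misses, loads):6.2f}%, "
          f"IPC {ipc(insns, cycles):.2f}")
```

A ratio above 100%, as for stride 1024, is most likely a side effect of counter multiplexing: the trailing percentages in the perf output show that each event was only measured for part of the run and then scaled.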
## 3. Java program
### a. Build
`javac traverse_1d_array.java`
### b. Run
#### Traverse the array as array[i][j]
`java traverse_1d_array -f`
Elapsed time (ms): 20
#### Traverse the array as array[j][i]
`java traverse_1d_array -s`
Elapsed time (ms): 100
### c. Verify the cache hit rate with perf
#### Traverse the array as array[i][j]
`perf stat -e cache-references,cache-misses,instructions,cycles,L1-dcache-load-misses,L1-dcache-loads java traverse_2d_array -f`
* Output:
```
Performance counter stats for 'java traverse_2d_array -f':
6,379,138 cache-references (80.62%)
866,578 cache-misses # 13.585 % of all cache refs (68.93%)
459,726,039 instructions # 1.51 insn per cycle (85.22%)
303,673,757 cycles (85.69%)
5,270,707 L1-dcache-load-misses # 3.96% of all L1-dcache hits (81.64%)
133,211,743 L1-dcache-loads (83.13%)
0.126089887 seconds time elapsed
0.122353000 seconds user
0.047877000 seconds sys
```
#### Traverse the array as array[j][i]
`perf stat -e cache-references,cache-misses,instructions,cycles,L1-dcache-load-misses,L1-dcache-loads java traverse_2d_array -s`
* Output:
```
Performance counter stats for 'java traverse_2d_array -s':
42,441,956 cache-references (80.21%)
872,336 cache-misses # 2.055 % of all cache refs (66.61%)
386,326,280 instructions # 0.71 insn per cycle (84.29%)
544,411,061 cycles (85.01%)
38,884,991 L1-dcache-load-misses # 32.48% of all L1-dcache hits (85.24%)
119,711,464 L1-dcache-loads (82.94%)
0.192838747 seconds time elapsed
0.200693000 seconds user
0.052919000 seconds sys
```
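
The gap between the two traversal orders comes down to how far apart consecutive accesses land in memory. As a simplified model, the Python 3 sketch below treats the 2D array as one flat row-major buffer and lists the byte offsets of the first few accesses in each order. This is an approximation: a Java `char[][]` is really an array of separate row arrays, so the exact addresses differ. The function name is made up for illustration.

```python
def first_offsets(n, elem_bytes, row_major_walk, count):
    """Byte offsets of the first `count` accesses into an n x n array
    stored row by row."""
    offsets = []
    for i in range(n):
        for j in range(n):
            # array[i][j] walks along a row; array[j][i] walks down a column
            row, col = (i, j) if row_major_walk else (j, i)
            offsets.append((row * n + col) * elem_bytes)
            if len(offsets) == count:
                return offsets
    return offsets

N = 4096          # same size as TESTN in the Java program
CHAR_BYTES = 2    # a Java char occupies 2 bytes

print(first_offsets(N, CHAR_BYTES, True, 4))   # array[i][j] order
print(first_offsets(N, CHAR_BYTES, False, 4))  # array[j][i] order
```

In this model the array[i][j] order advances 2 bytes per access, so 32 consecutive accesses share one 64-byte cache line (assuming that line size), whereas the array[j][i] order jumps a full row of 8,192 bytes each time.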
......@@ -19,14 +19,18 @@ int main(int argc, char** argv) {
    while ((ch = getopt(argc, argv, "s:")) != -1) {
        switch (ch)
        {
            // the stride s is capped at 1024
            case 's':
                step = atoi(optarg);
                if (step > 1024) step = 1024;
                break;
        }
    }
    char* arr = new char[TESTN];
    // clock() shows how much CPU time was consumed more accurately than reading the system time
    clock_t start, end;
    // every stride performs exactly `total` operations, so the runs can be compared directly
    long total = TESTN/1024, cnt = 0;
    long i = 0;
    start = clock();
......
import java.util.Date;

public class traverse_1d_array {
    public static void main(String args[]) {
        int TESTN = 4096;
        // -f selects the fast (row-wise) traversal, -s the slow (column-wise) one
        boolean slowMode = false;
        for (String arg : args) {
            if ("-f".equals(arg)) {
                slowMode = false;
                break;
            } else if ("-s".equals(arg)) {
                slowMode = true;
                break;
            }
        }
        char[][] arr = new char[TESTN][TESTN];
        Date start = new Date();
        if (!slowMode) {
            for (int i = 0; i < TESTN; i++) {
                for (int j = 0; j < TESTN; j++) {
                    // arr[i][j] accesses memory contiguously
                    arr[i][j] = 0;
                }
            }
        } else {
            for (int i = 0; i < TESTN; i++) {
                for (int j = 0; j < TESTN; j++) {
                    // arr[j][i] accesses memory non-contiguously
                    arr[j][i] = 0;
                }
            }
        }
        // elapsed time in milliseconds
        System.out.println(new Date().getTime() - start.getTime());
    }
}
......
import time
import sys, getopt
import numpy as np
try:
    # getopt expects the argument list without the program name
    opts, args = getopt.getopt(sys.argv[1:], "fs")
......@@ -15,14 +15,16 @@ for opt, arg in opts:
    elif opt in ("-s",):
        slowMode = True
TESTN = 1024*10
arr = np.empty((TESTN, TESTN))
t1 = time.time()
if slowMode:
    sum = np.sum(arr,axis=1)
    # arr[j][i]: the inner loop moves down a column (non-contiguous)
    for i in range(TESTN):
        for j in range(TESTN):
            arr[j][i] = 1
else:
    sum = np.sum(arr,axis=0)
    # arr[i][j]: the inner loop moves along a row (contiguous)
    for i in range(TESTN):
        for j in range(TESTN):
            arr[i][j] = 1
......
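
On the option handling: `getopt.getopt` expects the argument list without the program name, that is `sys.argv[1:]`, and stops parsing at the first non-option argument. A minimal Python 3 sketch of parsing the `-f`/`-s` switches follows; `parse_mode` is a hypothetical helper and not part of the script in this commit.

```python
import getopt

def parse_mode(argv):
    """Return True for slow mode (-s), False for fast mode (-f or no flag).
    `argv` is the argument list WITHOUT the program name."""
    opts, _ = getopt.getopt(argv, "fs")
    slow_mode = False
    for opt, _value in opts:
        if opt == "-s":
            slow_mode = True
        elif opt == "-f":
            slow_mode = False
    return slow_mode

print(parse_mode(["-s"]))  # slow mode requested
print(parse_mode(["-f"]))  # fast mode requested
```

In a script this would be called as `parse_mode(sys.argv[1:])`.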