# How can awk and sed be used to measure the file size distribution on a Unix/Linux system?


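**Question**:

[SoftwareTeacher](https://blog.csdn.net/SoftwareTeacher?type=blog) read the book 《Unix 传奇》 and recalled that, when he studied the design of the Unix file system long ago, the material noted that many files on Unix are smaller than 4K. He posed this question:

"How can Unix's awk, sed, and other shell commands be used to compute the size distribution of all files on your current Unix/Linux system? (Under 1K, 1K-4K, 4K-1M, 1M-10M, 10M-1G, over 1G)"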
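
The bucket boundaries in the question map to byte counts. The sketch below works them out with shell arithmetic, assuming binary units (1K = 1024 bytes); the variable names `K`, `M`, `G` and the helper `thresholds` are illustrative only.

```bash
# Byte thresholds for the size buckets, assuming binary units (1K = 1024 bytes).
K=1024
M=$((K * K))
G=$((M * K))

# Hypothetical helper: print each boundary with its value in bytes.
thresholds() {
    echo "1K=$K 4K=$((4 * K)) 1M=$M 10M=$((10 * M)) 1G=$G"
}

thresholds
# prints: 1K=1024 4K=4096 1M=1048576 10M=10485760 1G=1073741824
```

These are the same constants that appear in the awk comparisons in the implementations on this page.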

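**Task for this section**:

Practice the commands in the [online Linux environment](https://edu.csdn.net/lab/36675?targetLesson=2692).

The basic approach is:

1. Use a command to list information for every file under a directory.
2. Use a command to extract the file name and file size columns.
3. Use a command to count the files in each size range, compute the percentages, and print them.
4. Use pipes to combine steps 1, 2, and 3.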
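
Steps 1 and 2 can be tried on their own before building the full pipeline. The sketch below is one possible way to do that. It assumes the usual `ls -l` long format, where field 5 is the size in bytes and field 9 is the file name (so names must not contain spaces), and it sets `LC_ALL=C` so the date columns keep a fixed layout. The function name `list_name_size` and the scratch directory are illustrative.

```bash
# Steps 1 and 2 in isolation: list files recursively, then keep name and size.
# Assumes the usual long format: field 5 = size in bytes, field 9 = file name.
list_name_size() {
    # "$1" is the directory to scan (hypothetical parameter, defaults to ".").
    LC_ALL=C ls -Rla "${1:-.}" | awk '{print $9, $5}'
}

# Example: a scratch directory containing one 5-byte file.
tmp=$(mktemp -d)
printf 'hello' > "$tmp/sample.txt"
list_name_size "$tmp"
rm -rf "$tmp"
```

Besides the `sample.txt 5` line, the output also contains entries for `.` and `..` and near-empty lines produced by the directory header and the `total` line, because `ls -Rla` prints those too. Step 3 sees every one of these lines.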

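The code below is adapted from an implementation by one of the respondents, [Brentbin](https://blog.csdn.net/Brentbin?type=ask). Which of the following implementations is correct?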

## Answer

```bash
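# Steps 1-2: list everything recursively, keep name ($9) and size in bytes ($5).
ls -Rla | awk '{print $9, $5}' | awk '
BEGIN{
    # Labels for the six size buckets.
    size[0] = " 0K-1K"
    size[1] = " 1K-4K"
    size[2] = " 4K-1M"
    size[3] = " 1M-10M"
    size[4] = "10M-1G"
    size[5] = " 1G+  "
    total = 0
}

# Step 3: count each line into the bucket that matches its size in bytes ($2).
($2 <= 1024) {a[0]++}
(1024 < $2 && $2 <= 4096) {a[1]++}
(4096 < $2 && $2 <= 1048576) {a[2]++}
(1048576 < $2 && $2 <= 10485760) {a[3]++}
(10485760 < $2 && $2 <= 1073741824) {a[4]++}
(1073741824 < $2 ) {a[5]++}

# Count every input line; this is the denominator for the percentages.
{total++}

END {
    for(i=0;i<length(a);++i)
        print size[i], "files:", a[i], "percent:", (a[i]/total)*100, "%"
}'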
```
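
The answer above counts every line that `ls -Rla` emits, which includes directory headers, `total` lines, and the `.` and `..` entries, and its output loop relies on `length(a)`, so a bucket that never received a count may be left out of the report. The following is a stricter sketch, not the original author's method: a hypothetical helper `bucket_sizes` that reads one size in bytes per line, ignores anything that is not a plain integer, and always prints all six buckets.

```bash
# Reads one file size (in bytes) per line on stdin and prints the distribution.
# bucket_sizes is a hypothetical helper name, not part of the original answer.
bucket_sizes() {
    awk '
    BEGIN {
        # Six bucket labels, indexed 1..6 by split().
        n = split("0K-1K 1K-4K 4K-1M 1M-10M 10M-1G 1G+", label, " ")
    }
    # Skip anything that is not a plain non-negative integer.
    $1 !~ /^[0-9]+$/ { next }
    {
        total++
        if      ($1 <= 1024)       count[1]++
        else if ($1 <= 4096)       count[2]++
        else if ($1 <= 1048576)    count[3]++
        else if ($1 <= 10485760)   count[4]++
        else if ($1 <= 1073741824) count[5]++
        else                       count[6]++
    }
    END {
        # Always report all six buckets, even the empty ones.
        for (i = 1; i <= n; i++) {
            pct = (total > 0) ? count[i] * 100 / total : 0
            printf "%-7s files: %d  percent: %.1f%%\n", label[i], count[i], pct
        }
    }'
}

# Example with three made-up sizes: 100 B, 2 KiB, and about 5 MB.
printf '%s\n' 100 2048 5000000 | bucket_sizes
```

To feed it real data, one option is `find . -type f -printf '%s\n' | bucket_sizes`. This assumes GNU find, since `-printf` is a GNU extension, and it restricts the count to regular files.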

## Options

### A

```bash
ls -Rla | awk '{print $9, $5}' | awk '
BEGIN{
    size[0] = " 0K-1K"
    size[1] = " 1K-4K"
    size[2] = " 4K-1M"
    size[3] = " 1M-10M"
    size[4] = "10M-1G"
    size[5] = " 1G+  "
    total = 0
}

($2 <= 1024) {a[0]++} 
(1024 < $2 && $2 <= 4096) {a[1]++} 
(4096 < $2 && $2 <= 1048576) {a[2]++} 
(1048576 < $2 && $2 <= 10485760) {a[3]++} 
(10485760 < $2 && $2 <= 1073741824) {a[4]++} 
(1073741824 < $2 ) {a[5]++} 

END {
    for(i=0;i<length(a);++i) 
        print size[i], "files:", a[i], "percent:", (a[i]/total)*100, "%"
}'
```

### B

```bash
ls -Rl | awk '{print $9, $5}' | awk '
BEGIN{
    size[0] = " 0K-1K"
    size[1] = " 1K-4K"
    size[2] = " 4K-1M"
    size[3] = " 1M-10M"
    size[4] = "10M-1G"
    size[5] = " 1G+  "
    total = 0
}

($2 <= 1024) {a[0]++} 
(1024 < $2 && $2 <= 4096) {a[1]++} 
(4096 < $2 && $2 <= 1048576) {a[2]++} 
(1048576 < $2 && $2 <= 10485760) {a[3]++} 
(10485760 < $2 && $2 <= 1073741824) {a[4]++} 
(1073741824 < $2 ) {a[5]++} 

{total++} 

END {
    for(i=0;i<length(a);++i) 
        print size[i], "files:", a[i], "percent:", (a[i]/total)*100, "%"
}'
```

### C

```bash
ls -Rla | awk '{print $9, $5}' | awk '
BEGIN{
    size[0] = " 0K-1K"
    size[1] = " 1K-4K"
    size[2] = " 4K-1M"
    size[3] = " 1M-10M"
    size[4] = "10M-1G"
    size[5] = " 1G+  "
    total = 0
}

($2 <= 1024) {a[0]++} 
(1024 < $2 && $2 <= 4096) {a[1]++} 
(4096 < $2 && $2 <= 1048576) {a[2]++} 
(1048576 < $2 && $2 <= 10485760) {a[3]++} 
(10485760 < $2 && $2 <= 1073741824) {a[4]++} 

{total++} 

END {
    for(i=0;i<length(a);++i) 
        print size[i], "files:", a[i], "percent:", (a[i]/total)*100, "%"
}'
```

### D

```bash
ls -Rla | awk '{print $9, $4}' | awk '
BEGIN{
    size[0] = " 0K-1K"
    size[1] = " 1K-4K"
    size[2] = " 4K-1M"
    size[3] = " 1M-10M"
    size[4] = "10M-1G"
    size[5] = " 1G+  "
    total = 0
}

($2 <= 1024) {a[0]++} 
(1024 < $2 && $2 <= 4096) {a[1]++} 
(4096 < $2 && $2 <= 1048576) {a[2]++} 
(1048576 < $2 && $2 <= 10485760) {a[3]++} 
(10485760 < $2 && $2 <= 1073741824) {a[4]++} 
(1073741824 < $2 ) {a[5]++} 

{total++} 

END {
    for(i=0;i<length(a);++i) 
        print size[i], "files:", a[i], "percent:", (a[i]/total)*100, "%"
}'
```
