Commit 5bda0e7a, authored by Zhang Zekun

iommu/iova: increase the iova_rcache depot max size

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7ASVH
CVE: NA

---------------------------------------

In an fio test with iodepth=256 and the allowed CPUs set to 0-255, we
observe a severe performance decrease. The cache hit rates are
relatively low. Here are some statistics about the iova_cpu_rcache of
all CPUs:

iova alloc order		0	1	2	3	4	5
----------------------------------------------------------------------
average cpu_rcache hit rate	0.9941	0.7408	0.8109	0.8854	0.9082	0.8887

Jobs: 12 (f=12): [R(12)][20.0%][r=1091MiB/s][r=279k IOPS][eta 00m:28s]
Jobs: 12 (f=12): [R(12)][22.2%][r=1426MiB/s][r=365k IOPS][eta 00m:28s]
Jobs: 12 (f=12): [R(12)][25.0%][r=1607MiB/s][r=411k IOPS][eta 00m:27s]
Jobs: 12 (f=12): [R(12)][27.8%][r=1501MiB/s][r=384k IOPS][eta 00m:26s]
Jobs: 12 (f=12): [R(12)][30.6%][r=1486MiB/s][r=380k IOPS][eta 00m:25s]
Jobs: 12 (f=12): [R(12)][33.3%][r=1393MiB/s][r=357k IOPS][eta 00m:24s]
Jobs: 12 (f=12): [R(12)][36.1%][r=1550MiB/s][r=397k IOPS][eta 00m:23s]
Jobs: 12 (f=12): [R(12)][38.9%][r=1485MiB/s][r=380k IOPS][eta 00m:22s]

The underlying hisi_sas driver has 16 threaded IRQs that free IOVAs, but
these IRQ callback functions only free IOVAs on 16 specific CPUs (cpu{0,
16, 32, ..., 240}). For example, the threaded IRQ whose SMP affinity is
0-15 only frees IOVAs on CPU 0. However, the driver allocates IOVAs on
all CPUs (cpu{0-255}), so CPUs with no free IOVAs in their local
cpu_rcache must get them from iova_rcache->depot. The current maximum
size of iova_rcache->depot is 32, which seems to be too small for 256
users (16 CPUs put IOVAs into iova_rcache->depot and 240 CPUs try to
get IOVAs from it). Setting the iova_rcache->depot maximum size to 128
fixes the performance issue, and performance returns to normal.
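
The effect of the depot cap described above can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not the kernel's implementation: the names `toy_depot`, `toy_depot_put`, `toy_depot_get` and `depot_misses` are hypothetical, the depot is reduced to a bounded counter of full magazines, and the workload is assumed to be bursty (a batch of magazines is freed, then the same number is requested).

```c
/* Toy model only: a bounded counter standing in for the shared depot. */
struct toy_depot {
	unsigned int max_mags;   /* cap, playing the role of MAX_GLOBAL_MAGS */
	unsigned int size;       /* full magazines currently parked */
	unsigned long misses;    /* gets that found the depot empty */
	unsigned long overflows; /* puts that found the depot full */
};

/* A freeing CPU hands over one full magazine. */
static void toy_depot_put(struct toy_depot *d)
{
	if (d->size < d->max_mags)
		d->size++;
	else
		d->overflows++;  /* magazine does not fit, it is lost to the cache */
}

/* An allocating CPU asks for one full magazine. */
static void toy_depot_get(struct toy_depot *d)
{
	if (d->size > 0)
		d->size--;
	else
		d->misses++;     /* caller would have to take the slow path */
}

/*
 * Assumed workload: in each round, `burst` magazines are freed and then
 * `burst` magazines are requested. Returns the total number of misses.
 */
unsigned long depot_misses(unsigned int max_mags, unsigned int burst,
			   unsigned int rounds)
{
	struct toy_depot d = { .max_mags = max_mags };
	unsigned int r, i;

	for (r = 0; r < rounds; r++) {
		for (i = 0; i < burst; i++)
			toy_depot_put(&d);
		for (i = 0; i < burst; i++)
			toy_depot_get(&d);
	}
	return d.misses;
}
```

With an assumed burst of 64 magazines, a cap of 32 drops half of every burst and misses on half of the requests, while a cap of 128 absorbs the whole burst. The burst size is illustrative and was not measured on the test machine.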

iova alloc order		0	1	2	3	4	5
----------------------------------------------------------------------
average cpu_rcache hit rate	0.9925	0.9736	0.9789	0.9867	0.9889	0.9906

Jobs: 12 (f=12): [R(12)][12.9%][r=7526MiB/s][r=1927k IOPS][eta 04m:30s]
Jobs: 12 (f=12): [R(12)][13.2%][r=7527MiB/s][r=1927k IOPS][eta 04m:29s]
Jobs: 12 (f=12): [R(12)][13.5%][r=7529MiB/s][r=1927k IOPS][eta 04m:28s]
Jobs: 12 (f=12): [R(12)][13.9%][r=7531MiB/s][r=1928k IOPS][eta 04m:27s]
Jobs: 12 (f=12): [R(12)][14.2%][r=7529MiB/s][r=1928k IOPS][eta 04m:26s]
Jobs: 12 (f=12): [R(12)][14.5%][r=7528MiB/s][r=1927k IOPS][eta 04m:25s]
Jobs: 12 (f=12): [R(12)][14.8%][r=7527MiB/s][r=1927k IOPS][eta 04m:24s]
Jobs: 12 (f=12): [R(12)][15.2%][r=7525MiB/s][r=1926k IOPS][eta 04m:23s]
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Parent 673b97e8
@@ -437,5 +437,15 @@ config SMMU_BYPASS_DEV
 	  This feature will be replaced by ACPI IORT RMR node, which will be
 	  upstreamed in mainline.
+config IOVA_MAX_GLOBAL_MAGS
+	int "Set the max iova global magzines in iova rcache"
+	range 16 2048
+	default "32"
+	help
+	  Iova rcache global magizine is shared among every cpu. The size of
+	  it can be a bottle neck when lots of cpus are contending to use it.
+	  If you are suffering from the speed of allocing iova with more than
+	  128 cpus, try to tune this config larger.
 endif # IOMMU_SUPPORT
@@ -26,7 +26,7 @@ struct iova_magazine;
 struct iova_cpu_rcache;
 #define IOVA_RANGE_CACHE_MAX_SIZE 6	/* log of max cached IOVA range size (in pages) */
-#define MAX_GLOBAL_MAGS 32	/* magazines per bin */
+#define MAX_GLOBAL_MAGS CONFIG_IOVA_MAX_GLOBAL_MAGS	/* magazines per bin */
 struct iova_rcache {
 	spinlock_t lock;
......
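
As a rough illustration of what the new macro definition implies, the sketch below mirrors the pattern in user space. It assumes the depot is stored as a fixed array of magazine pointers sized by MAX_GLOBAL_MAGS, which the truncated struct above does not show. `toy_magazine`, `toy_rcache` and `toy_depot_bytes` are hypothetical names, and the fallback `#define` only stands in for the value Kconfig would normally provide.

```c
#include <stddef.h>

/* Stand-in for the Kconfig-generated value; 32 matches the Kconfig default. */
#ifndef CONFIG_IOVA_MAX_GLOBAL_MAGS
#define CONFIG_IOVA_MAX_GLOBAL_MAGS 32
#endif

#define MAX_GLOBAL_MAGS CONFIG_IOVA_MAX_GLOBAL_MAGS	/* magazines per bin */

/* Mirror the Kconfig "range 16 2048" constraint at compile time. */
_Static_assert(MAX_GLOBAL_MAGS >= 16 && MAX_GLOBAL_MAGS <= 2048,
	       "MAX_GLOBAL_MAGS outside the Kconfig range 16..2048");

struct toy_magazine;	/* opaque, only pointers to it are stored */

/* Assumed layout: a counter plus a fixed-size array of magazine pointers. */
struct toy_rcache {
	unsigned long depot_size;
	struct toy_magazine *depot[MAX_GLOBAL_MAGS];
};

/* Bytes occupied by the depot pointer array of one toy_rcache. */
size_t toy_depot_bytes(void)
{
	return sizeof(((struct toy_rcache *)0)->depot);
}
```

Under this assumed layout, the array costs one pointer per magazine slot, so on a 64-bit build the upper limit of 2048 would take 16 KiB per rcache for the pointer array alone.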