Commit 3fb5c298, author: Christian Ehrhardt, committer: Linus Torvalds

swap: allow swap readahead to be merged

Swap readahead works fine, but the I/O to disk is almost always done in
page-sized requests, despite the fact that readahead submits
1<<page-cluster pages at a time.
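
To make the "1<<page-cluster pages" window concrete, here is a minimal userspace sketch of the offset arithmetic that swapin_readahead() performs. The struct and the function name readahead_window are hypothetical, and the line computing the end of the window (offset | mask) is an assumption, since the diff in this commit does not show it.

```c
#include <assert.h>

/* Hypothetical helper type: inclusive range of swap slots to read ahead. */
struct swap_window {
    unsigned long start;
    unsigned long end;
};

/*
 * Compute the page_cluster-sized, aligned window around `offset`.
 * The `offset | mask` end bound is an assumption; the diff elides it.
 */
static struct swap_window readahead_window(unsigned long offset,
                                           unsigned int page_cluster)
{
    struct swap_window w;
    unsigned long mask;

    assert(page_cluster < 8 * sizeof(unsigned long));
    mask = (1UL << page_cluster) - 1;

    w.start = offset & ~mask;   /* round down to the cluster boundary */
    w.end = offset | mask;      /* last slot of the same cluster */
    if (!w.start)               /* slot 0 holds the swap header */
        w.start++;
    return w;
}
```

With page_cluster set to 3, a fault on slot 21 yields the window 16..23, that is, eight adjacent slots that a block layer could in principle serve with one request.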

On older kernels the old per-device plugging behavior might have captured
this and merged the requests, but currently this results in many more
I/Os than required.

On a single device this might not be an issue, but as soon as a server
runs on shared SAN resources, saving I/Os not only improves swap-in
throughput but also lowers resource utilization.

With a load running KVM under heavy memory overcommitment (the hot memory
is 1.5 times the host memory), swapping throughput improves significantly;
the load feels more responsive and achieves more throughput.

In a test setup with 16 swap disks, running blktrace on one of those disks
shows the improved merging:
Prior:
Reads Queued:     560,888,    2,243MiB  Writes Queued:     226,242,  904,968KiB
Read Dispatches:  544,701,    2,243MiB  Write Dispatches:  159,318,  904,968KiB
Reads Requeued:         0               Writes Requeued:         0
Reads Completed:  544,716,    2,243MiB  Writes Completed:  159,321,  904,980KiB
Read Merges:       16,187,   64,748KiB  Write Merges:       61,744,  246,976KiB
IO unplugs:       149,614               Timer unplugs:       2,940

With the patch:
Reads Queued:     734,315,    2,937MiB  Writes Queued:     300,188,    1,200MiB
Read Dispatches:  214,972,    2,937MiB  Write Dispatches:  215,176,    1,200MiB
Reads Requeued:         0               Writes Requeued:         0
Reads Completed:  214,971,    2,937MiB  Writes Completed:  215,177,    1,200MiB
Read Merges:      519,343,    2,077MiB  Write Merges:       73,325,  293,300KiB
IO unplugs:       337,130               Timer unplugs:      11,184
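
The read-side figures in the two summaries above can be reduced to a merge share and an average dispatch size. The sketch below only does that arithmetic; the struct and function names are hypothetical, and because the MiB totals in the summaries are rounded, the averages are approximate.

```c
#include <assert.h>

/* Read-side numbers copied from one of the summaries above. */
struct read_stats {
    unsigned long queued;      /* "Reads Queued" request count */
    unsigned long dispatches;  /* "Read Dispatches" request count */
    unsigned long merges;      /* "Read Merges" request count */
    unsigned long mib;         /* read volume in MiB (rounded in the output) */
};

static const struct read_stats prior   = { 560888, 544701, 16187, 2243 };
static const struct read_stats patched = { 734315, 214972, 519343, 2937 };

/* Share of queued reads that were merged, in percent, rounded down. */
static unsigned long merge_percent(const struct read_stats *s)
{
    assert(s->queued > 0);
    return s->merges * 100 / s->queued;
}

/* Average dispatched read size in tenths of a KiB, rounded down. */
static unsigned long avg_dispatch_tenth_kib(const struct read_stats *s)
{
    assert(s->dispatches > 0);
    return s->mib * 10240 / s->dispatches;
}
```

In both runs, queued minus merged equals dispatched. The merged share of reads rises from about 2% to about 70%, and the average dispatched read grows from roughly 4.2 KiB to roughly 13.9 KiB.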

I got ~10% to ~40% more throughput in my cases and at the same time much
lower CPU consumption when broken down per transferred kilobyte (the
majority of that due to saved interrupts and better cache handling).  In a
shared SAN others might get an additional benefit as well, because this
now causes less protocol overhead.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Parent a7d6f529

@@ -14,6 +14,7 @@
 #include <linux/init.h>
 #include <linux/pagemap.h>
 #include <linux/backing-dev.h>
+#include <linux/blkdev.h>
 #include <linux/pagevec.h>
 #include <linux/migrate.h>
 #include <linux/page_cgroup.h>
@@ -376,6 +377,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	unsigned long offset = swp_offset(entry);
 	unsigned long start_offset, end_offset;
 	unsigned long mask = (1UL << page_cluster) - 1;
+	struct blk_plug plug;
 
 	/* Read a page_cluster sized and aligned cluster around offset. */
 	start_offset = offset & ~mask;
@@ -383,6 +385,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	if (!start_offset)	/* First page is swap header. */
 		start_offset++;
 
+	blk_start_plug(&plug);
 	for (offset = start_offset; offset <= end_offset ; offset++) {
 		/* Ok, do the async read-ahead now */
 		page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
@@ -391,6 +394,8 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 			continue;
 		page_cache_release(page);
 	}
+	blk_finish_plug(&plug);
+
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 	return read_swap_cache_async(entry, gfp_mask, vma, addr);
 }
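
To illustrate why bracketing the loop with blk_start_plug() and blk_finish_plug() helps, here is a toy userspace model of the idea. It is not the kernel's implementation: every toy_* name is hypothetical, and the only merge rule modelled is "a request that directly follows the previous one is appended to it". Requests are held back while the plug is open, so adjacent single-page reads can be combined before anything is dispatched.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PLUG_MAX 64

/* Toy request: `count` consecutive swap slots starting at `start`. */
struct toy_req {
    unsigned long start;
    unsigned long count;
};

/* Toy stand-in for a plug: requests wait here instead of being sent. */
struct toy_plug {
    struct toy_req reqs[TOY_PLUG_MAX];
    size_t nr;
};

static void toy_start_plug(struct toy_plug *plug)
{
    plug->nr = 0;
}

/* Queue a single-slot read; merge it if it extends the last request. */
static void toy_submit(struct toy_plug *plug, unsigned long slot)
{
    if (plug->nr > 0) {
        struct toy_req *last = &plug->reqs[plug->nr - 1];

        if (last->start + last->count == slot) {
            last->count++;
            return;
        }
    }
    assert(plug->nr < TOY_PLUG_MAX);
    plug->reqs[plug->nr].start = slot;
    plug->reqs[plug->nr].count = 1;
    plug->nr++;
}

/* "Unplug": report how many requests would go to the device. */
static size_t toy_finish_plug(struct toy_plug *plug)
{
    size_t dispatched = plug->nr;

    plug->nr = 0;
    return dispatched;
}

/* Read ahead slots [start, end], skipping slots flagged as already cached. */
static size_t toy_readahead(unsigned long start, unsigned long end,
                            const unsigned char *cached)
{
    struct toy_plug plug;
    unsigned long slot;

    toy_start_plug(&plug);
    for (slot = start; slot <= end; slot++) {
        if (cached && cached[slot - start])
            continue;   /* mirrors the `continue` in the kernel loop */
        toy_submit(&plug, slot);
    }
    return toy_finish_plug(&plug);
}
```

In this model, reading slots 16..23 with nothing cached produces one dispatched request instead of eight. If one slot in the middle is already cached, the window splits into two requests.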