Commit 5c3ad2eb · Author: Vlastimil Babka · Committer: Linus Torvalds

mm, page_alloc: simplify pageset_update()

pageset_update() attempts to update pcplist's high and batch values in a
way that readers don't observe batch > high.  It uses smp_wmb() to order
the updates in a way to achieve this.  However, without proper pairing
read barriers in readers this guarantee doesn't hold, and there are no
such barriers in e.g.  free_unref_page_commit().

Commit 88e8ac11 ("mm, page_alloc: fix core hung in
free_pcppages_bulk()") already showed this is problematic, and solved this
by ultimately only trusting pcp->count of the current cpu with interrupts
disabled.

The update dance with unpaired write barriers thus makes no sense.
Replace them with plain WRITE_ONCE to prevent store tearing, and document
that the values can change asynchronously and should not be trusted for
correctness.
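
As a rough userspace illustration of the simplified updater described above, the sketch below mirrors the two plain stores. The `WRITE_ONCE`/`READ_ONCE` macros here are simplified stand-ins built on a volatile cast (an assumption; the kernel's own definitions are more elaborate, and `__typeof__` is a GCC/Clang extension). `struct pcp_sketch`, `pageset_update_sketch()` and `demo_update()` are hypothetical names, not kernel code.

```c
#include <assert.h>

/* Simplified userspace stand-ins for the kernel macros (assumption). */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, reduced model of the per-cpu page list bookkeeping. */
struct pcp_sketch {
	int count;	/* pages currently on the list */
	int high;	/* flush threshold */
	int batch;	/* chunk size for refill/flush */
};

/*
 * Two independent single-copy stores: no ordering between them is
 * promised, only that neither store is torn or duplicated by the compiler.
 */
static void pageset_update_sketch(struct pcp_sketch *pcp, int high, int batch)
{
	WRITE_ONCE(pcp->batch, batch);
	WRITE_ONCE(pcp->high, high);
}

/* Single-threaded usage check; the values 186 and 31 are arbitrary. */
static int demo_update(void)
{
	struct pcp_sketch pcp = { .count = 0, .high = 0, .batch = 1 };

	pageset_update_sketch(&pcp, 186, 31);
	assert(READ_ONCE(pcp.batch) <= READ_ONCE(pcp.high));
	return READ_ONCE(pcp.high);
}
```

In a single thread the relation batch <= high trivially holds after the update. The point of the commit is that a concurrent reader gets no such promise between the two stores, so it must not depend on it.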

All current readers appear to be OK after 88e8ac11.  Convert them to
READ_ONCE to prevent unnecessary read tearing, but mainly to alert anybody
making future changes to the code that special care is needed.
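
The reader-side pattern can be sketched in userspace as follows: take one snapshot of `pcp->batch` and count it down locally, so a concurrent update cannot change the loop's budget halfway through. This is a hypothetical model of the reworked prefetch logic in free_pcppages_bulk(), not the kernel function itself. The macros are simplified volatile-cast stand-ins (an assumption, using the GCC/Clang `__typeof__` extension), and `racing_batch` merely simulates another CPU rewriting the field.

```c
#include <assert.h>

/* Simplified userspace stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, reduced model of the per-cpu page list bookkeeping. */
struct pcp_sketch {
	int count;
	int high;
	int batch;
};

/*
 * Returns how many "prefetches" are issued while freeing `count` pages.
 * The budget is read once up front; later changes to pcp->batch are
 * deliberately ignored by this loop.
 */
static int prefetches_done(struct pcp_sketch *pcp, int count, int racing_batch)
{
	int prefetch_nr = READ_ONCE(pcp->batch);
	int done = 0;

	while (count-- > 0) {
		/* Simulated asynchronous update of the shared field. */
		WRITE_ONCE(pcp->batch, racing_batch);

		if (prefetch_nr) {
			done++;		/* stands in for prefetch_buddy(page) */
			prefetch_nr--;
		}
	}
	return done;
}

/* Single-threaded usage check with arbitrary values. */
static int demo_prefetch(void)
{
	struct pcp_sketch pcp = { .count = 8, .high = 10, .batch = 3 };
	int done = prefetches_done(&pcp, 8, 100);

	assert(done == 3);	/* bounded by the initial snapshot, not by 100 */
	return done;
}
```

With the old form, `prefetch_nr++ < pcp->batch`, the field was re-read on every iteration, so the bound could move while the loop ran. The snapshot makes the behaviour depend on a single observed value.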

Link: https://lkml.kernel.org/r/20201111092812.11329-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Parent 69a8396a
@@ -1344,7 +1344,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 {
 	int migratetype = 0;
 	int batch_free = 0;
-	int prefetch_nr = 0;
+	int prefetch_nr = READ_ONCE(pcp->batch);
 	bool isolated_pageblocks;
 	struct page *page, *tmp;
 	LIST_HEAD(head);
@@ -1395,8 +1395,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			 * avoid excessive prefetching due to large count, only
 			 * prefetch buddy for the first pcp->batch nr of pages.
 			 */
-			if (prefetch_nr++ < pcp->batch)
+			if (prefetch_nr) {
 				prefetch_buddy(page);
+				prefetch_nr--;
+			}
 		} while (--count && --batch_free && !list_empty(list));
 	}
@@ -3197,10 +3199,8 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
 	list_add(&page->lru, &pcp->lists[migratetype]);
 	pcp->count++;
-	if (pcp->count >= pcp->high) {
-		unsigned long batch = READ_ONCE(pcp->batch);
-		free_pcppages_bulk(zone, batch, pcp);
-	}
+	if (pcp->count >= READ_ONCE(pcp->high))
+		free_pcppages_bulk(zone, READ_ONCE(pcp->batch), pcp);
 }

 /*
@@ -3385,7 +3385,7 @@ static struct page *__rmqueue_pcplist(struct zone *zone, int migratetype,
 	do {
 		if (list_empty(list)) {
 			pcp->count += rmqueue_bulk(zone, 0,
-					pcp->batch, list,
+					READ_ONCE(pcp->batch), list,
 					migratetype, alloc_flags);
 			if (unlikely(list_empty(list)))
 				return NULL;
@@ -6270,13 +6270,16 @@ static int zone_batchsize(struct zone *zone)
 }

 /*
- * pcp->high and pcp->batch values are related and dependent on one another:
- * ->batch must never be higher then ->high.
- * The following function updates them in a safe manner without read side
- * locking.
+ * pcp->high and pcp->batch values are related and generally batch is lower
+ * than high. They are also related to pcp->count such that count is lower
+ * than high, and as soon as it reaches high, the pcplist is flushed.
  *
- * Any new users of pcp->batch and pcp->high should ensure they can cope with
- * those fields changing asynchronously (acording to the above rule).
+ * However, guaranteeing these relations at all times would require e.g. write
+ * barriers here but also careful usage of read barriers at the read side, and
+ * thus be prone to error and bad for performance. Thus the update only prevents
+ * store tearing. Any new users of pcp->batch and pcp->high should ensure they
+ * can cope with those fields changing asynchronously, and fully trust only the
+ * pcp->count field on the local CPU with interrupts disabled.
  *
  * mutex_is_locked(&pcp_batch_high_lock) required when calling this function
  * outside of boot time (or some other assurance that no concurrent updaters
@@ -6285,15 +6288,8 @@ static int zone_batchsize(struct zone *zone)
 static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
 		unsigned long batch)
 {
-	/* start with a fail safe value for batch */
-	pcp->batch = 1;
-	smp_wmb();
-
-	/* Update high, then batch, in order */
-	pcp->high = high;
-	smp_wmb();
-
-	pcp->batch = batch;
+	WRITE_ONCE(pcp->batch, batch);
+	WRITE_ONCE(pcp->high, high);
 }

 static void pageset_init(struct per_cpu_pageset *p)
......