- 23 April 2020, 15 commits
-
-
Submitted by Mel Gorman

to #26255339

commit cb2dcaf023c2cf12d45289c82d4030d33f7df73e upstream

Async migration aborts on spinlock contention, but contention can be high when there are multiple compaction attempts and kswapd is active. The consequence is that the migration scanners move forward uselessly while still contending on locks for longer, leaving suitable migration sources behind.

This patch will acquire the lock but track when contention occurs. When it does, the current pageblock will finish, as compaction may succeed for that block, and then abort. This will have a variable impact on latency: in some cases useless scanning is avoided (reducing latency), but a lock will be contended (increasing latency), or a single contended pageblock is scanned that would otherwise have been skipped (increasing latency).

                               5.0.0-rc1             5.0.0-rc1
                          norescan-v3r16   finishcontend-v3r16
    Amean fault-both-1     0.00 ( 0.00%)        0.00 *  0.00%*
    Amean fault-both-3   3002.07 ( 0.00%)    3153.17 ( -5.03%)
    Amean fault-both-5   4684.47 ( 0.00%)    4280.52 (  8.62%)
    Amean fault-both-7   6815.54 ( 0.00%)    5811.50 * 14.73%*
    Amean fault-both-12 10864.02 ( 0.00%)    9276.85 ( 14.61%)
    Amean fault-both-18 12247.52 ( 0.00%)   11032.67 (  9.92%)
    Amean fault-both-24 15683.99 ( 0.00%)   14285.70 (  8.92%)
    Amean fault-both-30 18620.02 ( 0.00%)   16293.76 * 12.49%*
    Amean fault-both-32 19250.28 ( 0.00%)   16721.02 * 13.14%*

                               5.0.0-rc1             5.0.0-rc1
                          norescan-v3r16   finishcontend-v3r16
    Percentage huge-1      0.00 ( 0.00%)        0.00 (  0.00%)
    Percentage huge-3     95.00 ( 0.00%)       96.82 (  1.92%)
    Percentage huge-5     94.22 ( 0.00%)       95.40 (  1.26%)
    Percentage huge-7     92.35 ( 0.00%)       95.92 (  3.86%)
    Percentage huge-12    91.90 ( 0.00%)       96.73 (  5.25%)
    Percentage huge-18    89.58 ( 0.00%)       96.77 (  8.03%)
    Percentage huge-24    90.03 ( 0.00%)       96.05 (  6.69%)
    Percentage huge-30    89.14 ( 0.00%)       96.81 (  8.60%)
    Percentage huge-32    90.58 ( 0.00%)       97.41 (  7.54%)

The impact on latency is variable but mostly positive, while allocation success rates are slightly higher. System CPU usage is reduced by about 10%, but the scan rate impact is mixed:

    Compaction migrate scanned    27997659.00    20148867
    Compaction free scanned      120782791.00   118324914

Migration scan rates are reduced by 28%, which is expected as a pageblock is now used by the async scanner instead of skipped. The impact on the free scanner is known to be variable. Overall, the primary justification for this patch is that completing the scan of a pageblock is very important for later patches.

[yuehaibing@huawei.com: fix unused variable warning]
Link: http://lkml.kernel.org/r/20190118175136.31341-14-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
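The contention-tracking lock helper at the heart of this change can be sketched as below; this mirrors the upstream compact_lock_irqsave() but is an illustration, not necessarily the exact backported code:

    static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
                                     struct compact_control *cc)
    {
            /* In async mode, try the lock first and record contention */
            if (cc->mode == MIGRATE_ASYNC && !cc->contended) {
                    if (spin_trylock_irqsave(lock, *flags))
                            return true;

                    cc->contended = true;
            }

            /* Acquire anyway so the current pageblock can be finished */
            spin_lock_irqsave(lock, *flags);
            return true;
    }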
-
Submitted by Mel Gorman

to #26255339

commit 804d3121ba5f03af0ab225e2f688ee3ee669c0d2 upstream

Pageblocks are marked for skip when no pages are isolated after a scan. However, it's possible to hit corner cases where the migration scanner gets stuck near the boundary between the source and target scanners. Because pages are migrated in blocks of COMPACT_CLUSTER_MAX, pages that are migrated can be reallocated before the pageblock is complete. The pageblock is not necessarily skipped, so it can be rescanned multiple times. Similarly, a pageblock with some dirty/writeback pages may fail to migrate and be rescanned until writeback completes, which is wasteful.

This patch tracks whether a pageblock is being rescanned. If so, then the entire pageblock will be migrated as one operation. This narrows the race window during which pages can be reallocated during migration. Secondly, if there are pages that cannot be isolated, then the pageblock will still be fully scanned and marked for skipping. On the second rescan, the pageblock skip is set and the migration scanner makes progress.

                               5.0.0-rc1          5.0.0-rc1
                          findfree-v3r16     norescan-v3r16
    Amean fault-both-1     0.00 ( 0.00%)     0.00 *  0.00%*
    Amean fault-both-3   3200.68 ( 0.00%)  3002.07 (  6.21%)
    Amean fault-both-5   4847.75 ( 0.00%)  4684.47 (  3.37%)
    Amean fault-both-7   6658.92 ( 0.00%)  6815.54 ( -2.35%)
    Amean fault-both-12 11077.62 ( 0.00%) 10864.02 (  1.93%)
    Amean fault-both-18 12403.97 ( 0.00%) 12247.52 (  1.26%)
    Amean fault-both-24 15607.10 ( 0.00%) 15683.99 ( -0.49%)
    Amean fault-both-30 18752.27 ( 0.00%) 18620.02 (  0.71%)
    Amean fault-both-32 21207.54 ( 0.00%) 19250.28 *  9.23%*

                               5.0.0-rc1          5.0.0-rc1
                          findfree-v3r16     norescan-v3r16
    Percentage huge-3     96.86 ( 0.00%)    95.00 ( -1.91%)
    Percentage huge-5     93.72 ( 0.00%)    94.22 (  0.53%)
    Percentage huge-7     94.31 ( 0.00%)    92.35 ( -2.08%)
    Percentage huge-12    92.66 ( 0.00%)    91.90 ( -0.82%)
    Percentage huge-18    91.51 ( 0.00%)    89.58 ( -2.11%)
    Percentage huge-24    90.50 ( 0.00%)    90.03 ( -0.52%)
    Percentage huge-30    91.57 ( 0.00%)    89.14 ( -2.65%)
    Percentage huge-32    91.00 ( 0.00%)    90.58 ( -0.46%)

Negligible difference, but this was likely a run where the specific corner case was not hit. A previous run of the same patch based on an earlier iteration of the series showed large differences where migration rates could be halved when the corner case was hit. The specific corner case, where migration scan rates go through the roof because a dirty/writeback pageblock sits at the boundary of the migration/free scanners, did not happen in this case. When it does happen, the scan rates multiply by massive margins.

Link: http://lkml.kernel.org/r/20190118175136.31341-13-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
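The rescan tracking can be pictured with a short sketch; the field name follows the upstream patch (cc->rescan), but the surrounding code is simplified for illustration:

    /* In isolate_migratepages(): note if we restart inside the same block */
    if (pageblock_start_pfn(low_pfn) ==
        pageblock_start_pfn(cc->migrate_pfn))
            cc->rescan = true;      /* second pass: migrate the full block */
    else
            cc->rescan = false;

    /* At the end of isolate_migratepages_block(): mark the block for
     * skipping if this was a rescan or nothing could be isolated. */
    if (low_pfn == end_pfn && (!nr_isolated || cc->rescan))
            set_pageblock_skip(valid_page);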
-
Submitted by Mel Gorman

to #26255339

commit 5a811889de10f1ebb8e03a2744be006e909c405c upstream

Similar to the migration scanner, this patch uses the free lists to quickly locate a migration target. The search is different in that lower orders will be searched for a suitable high PFN if necessary, but the search is still bound. This is justified on the grounds that the free scanner typically scans linearly much more than the migration scanner. If a free page is found, it is isolated and compaction continues if enough pages were isolated. For SYNC* scanning, the full pageblock is scanned for any remaining free pages so that it can be marked for skipping in the near future.

    1-socket thpfioscale
                               5.0.0-rc1          5.0.0-rc1
                           isolmig-v3r15     findfree-v3r16
    Amean fault-both-3   3024.41 ( 0.00%)  3200.68 ( -5.83%)
    Amean fault-both-5   4749.30 ( 0.00%)  4847.75 ( -2.07%)
    Amean fault-both-7   6454.95 ( 0.00%)  6658.92 ( -3.16%)
    Amean fault-both-12 10324.83 ( 0.00%) 11077.62 ( -7.29%)
    Amean fault-both-18 12896.82 ( 0.00%) 12403.97 (  3.82%)
    Amean fault-both-24 13470.60 ( 0.00%) 15607.10 * -15.86%*
    Amean fault-both-30 17143.99 ( 0.00%) 18752.27 ( -9.38%)
    Amean fault-both-32 17743.91 ( 0.00%) 21207.54 * -19.52%*

The impact on latency is variable, but the search is optimistic and sensitive to the exact system state. Success rates are similar; the major impact is on the rate of scanning:

                                   5.0.0-rc1      5.0.0-rc1
                               isolmig-v3r15 findfree-v3r16
    Compaction migrate scanned      25646769       29507205
    Compaction free scanned        201558184      100359571

The free scan rates are reduced by 50%. The 2-socket reductions for the free scanner are more dramatic, which likely reflects that the machine has more memory.

[dan.carpenter@oracle.com: fix static checker warning]
[vbabka@suse.cz: correct number of pages scanned for lower orders]
Link: http://lkml.kernel.org/r/20190118175136.31341-12-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Mel Gorman

to #26255339

commit e380bebe4771548df9bece8b7ad9dab07d9158a6 upstream

Due to either a fast search of the free list or a linear scan, it is possible for multiple compaction instances to pick the same pageblock for migration. This is lucky for one scanner and increases scanning for all the others. It also allows a race between requests over which one first allocates the resulting free block.

This patch tests and updates the pageblock skip for the migration scanner carefully. When isolating a block, it will check and skip it if the block is already in use. Once the zone lock is acquired, it will be rechecked so that only one scanner can set the pageblock skip for exclusive use. Any scanner contending will continue with a linear scan. The skip bit is still set if no pages can be isolated in a range. While this may result in redundant scanning, it avoids unnecessarily acquiring the zone lock when there are no suitable migration sources.

    1-socket thpscale
    Amean fault-both-1     0.00 ( 0.00%)     0.00 *  0.00%*
    Amean fault-both-3   3390.40 ( 0.00%)  3024.41 ( 10.80%)
    Amean fault-both-5   5082.28 ( 0.00%)  4749.30 (  6.55%)
    Amean fault-both-7   7012.51 ( 0.00%)  6454.95 (  7.95%)
    Amean fault-both-12 11346.63 ( 0.00%) 10324.83 (  9.01%)
    Amean fault-both-18 15324.19 ( 0.00%) 12896.82 * 15.84%*
    Amean fault-both-24 16088.50 ( 0.00%) 13470.60 * 16.27%*
    Amean fault-both-30 18723.42 ( 0.00%) 17143.99 (  8.44%)
    Amean fault-both-32 18612.01 ( 0.00%) 17743.91 (  4.66%)

                               5.0.0-rc1          5.0.0-rc1
                           findmig-v3r15      isolmig-v3r15
    Percentage huge-3     89.83 ( 0.00%)    92.96 (  3.48%)
    Percentage huge-5     91.96 ( 0.00%)    93.26 (  1.41%)
    Percentage huge-7     92.85 ( 0.00%)    93.63 (  0.84%)
    Percentage huge-12    92.74 ( 0.00%)    92.80 (  0.07%)
    Percentage huge-18    91.71 ( 0.00%)    91.62 ( -0.10%)
    Percentage huge-24    92.13 ( 0.00%)    91.50 ( -0.69%)
    Percentage huge-30    93.79 ( 0.00%)    92.73 ( -1.13%)
    Percentage huge-32    91.27 ( 0.00%)    91.94 (  0.74%)

This shows a reasonable reduction in latency, as multiple compaction scanners do not operate on the same blocks, with a similar allocation success rate.

    Compaction migrate scanned    41093126    25646769

Migration scan rates are reduced by 38%.

Link: http://lkml.kernel.org/r/20190118175136.31341-11-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
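The careful test-and-set of the skip bit looks roughly like the following; this mirrors the upstream test_and_set_skip() helper and is shown here as a sketch:

    static bool test_and_set_skip(struct compact_control *cc, struct page *page,
                                  unsigned long pfn)
    {
            bool skip;

            /* Do not update the skip hint if it is being ignored */
            if (cc->ignore_skip_hint)
                    return false;

            /* Only the first page of a pageblock carries the skip bit */
            if (!IS_ALIGNED(pfn, pageblock_nr_pages))
                    return false;

            /* Called under zone->lock: only one scanner wins the block */
            skip = get_pageblock_skip(page);
            if (!skip && !cc->no_set_skip_hint)
                    set_pageblock_skip(page);

            return skip;
    }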
-
Submitted by Mel Gorman

to #26255339

commit 70b44595eafe9c7c235f076d653a268ca1ab9fdb upstream

The migration scanner is a linear scan of a zone with a potentially large search space. Furthermore, many pageblocks are unusable, such as those filled with reserved pages or partially filled with pages that cannot migrate. These still get scanned in the common case of allocating a THP, and the cost accumulates.

The patch uses a partial search of the free lists to locate a migration source candidate that is marked as MOVABLE when allocating a THP. It prefers picking a block with a larger number of free pages already, on the basis that there are fewer pages to migrate to free the entire block. The lowest PFN found during searches is tracked as the basis of the start for the linear search after the first search of the free list fails. After the search, the free list is shuffled so that the next search will not encounter the same page, as shown in the sketch below. If the search fails, then subsequent searches will be shorter and the linear scanner is used.

If this search fails, or if the request is for a small or unmovable/reclaimable allocation, then the linear scanner is still used. It is somewhat pointless to use the list search in those cases: small free pages must be used for the search, and there is no guarantee that movable pages located within that block are contiguous.

                               5.0.0-rc1          5.0.0-rc1
                           noboost-v3r10      findmig-v3r15
    Amean fault-both-3   3771.41 ( 0.00%)  3390.40 ( 10.10%)
    Amean fault-both-5   5409.05 ( 0.00%)  5082.28 (  6.04%)
    Amean fault-both-7   7040.74 ( 0.00%)  7012.51 (  0.40%)
    Amean fault-both-12 11887.35 ( 0.00%) 11346.63 (  4.55%)
    Amean fault-both-18 16718.19 ( 0.00%) 15324.19 (  8.34%)
    Amean fault-both-24 21157.19 ( 0.00%) 16088.50 * 23.96%*
    Amean fault-both-30 21175.92 ( 0.00%) 18723.42 * 11.58%*
    Amean fault-both-32 21339.03 ( 0.00%) 18612.01 * 12.78%*

                               5.0.0-rc1          5.0.0-rc1
                           noboost-v3r10      findmig-v3r15
    Percentage huge-3     86.50 ( 0.00%)    89.83 (  3.85%)
    Percentage huge-5     92.52 ( 0.00%)    91.96 ( -0.61%)
    Percentage huge-7     92.44 ( 0.00%)    92.85 (  0.44%)
    Percentage huge-12    92.98 ( 0.00%)    92.74 ( -0.25%)
    Percentage huge-18    91.70 ( 0.00%)    91.71 (  0.02%)
    Percentage huge-24    91.59 ( 0.00%)    92.13 (  0.60%)
    Percentage huge-30    90.14 ( 0.00%)    93.79 (  4.04%)
    Percentage huge-32    90.03 ( 0.00%)    91.27 (  1.37%)

This shows an improvement in allocation latencies with similar allocation success rates. While not presented, there was a 31% reduction in migration scanning and an 8% reduction in system CPU usage. A 2-socket machine showed similar benefits.

[mgorman@techsingularity.net: several fixes]
Link: http://lkml.kernel.org/r/20190204120111.GL9565@techsingularity.net
[vbabka@suse.cz: migrate block that was found-fast, some optimisations]
Link: http://lkml.kernel.org/r/20190118175136.31341-10-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <Vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
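A minimal sketch of the post-search shuffle (the upstream helper splices a whole sublist to the tail; this simplified version just moves the one page, which conveys the same idea):

    /* Illustrative only: rotate @freepage to the list tail so the next
     * fast search does not return the same page first. */
    static void move_freelist_tail(struct list_head *freelist,
                                   struct page *freepage)
    {
            list_move_tail(&freepage->lru, freelist);
    }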
-
Submitted by Mel Gorman

to #26255339

commit fd1444b2729289ea3ef6b6096be604f8983e9f9f upstream

When pageblocks get fragmented, watermarks are artificially boosted to reclaim pages and avoid further fragmentation events. However, compaction is often either fragmentation-neutral or moving movable pages away from unmovable/reclaimable pages. As the true watermarks are preserved, allow compaction to ignore the boost factor.

The expected impact is very slight, as the main benefit is that compaction is slightly more likely to succeed when the system has been fragmented very recently. On both 1-socket and 2-socket machines, for THP-intensive allocation during fragmentation, the success rate was increased by less than 1%, which is marginal. However, detailed tracing indicated that migration failures due to a premature ENOMEM triggered by watermark checks were eliminated.

Link: http://lkml.kernel.org/r/20190118175136.31341-9-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
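The idea reduces to comparing against the unboosted watermark. A sketch, assuming the watermark-boost fields from the related series (zone->_watermark[] and zone->watermark_boost) are present in this tree:

    /*
     * min_wmark_pages(zone) includes zone->watermark_boost; compaction
     * wants the true minimum watermark, so read the raw value instead.
     * Sketch only, not the exact backported hunk.
     */
    static unsigned long compaction_min_wmark(struct zone *zone)
    {
            return zone->_watermark[WMARK_MIN];
    }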
-
Submitted by Mel Gorman

to #26255339

commit efe771c7603bc524425070d651e70e9c56c57f28 upstream

When compaction is finishing, it uses a flag to ensure the pageblock is complete, but it makes sense to always complete migration of a pageblock. Minimally, skip information is based on a pageblock and partially scanned pageblocks may incur more scanning in the future. The pageblock skip handling also becomes more strict later in the series, and the hint is more useful if a complete pageblock was always scanned.

This potentially impacts latency as more scanning is done, but it's not a consistent win or loss: the scanning is not always a high percentage of the pageblock, and sometimes it is offset by future reductions in scanning. Hence, the results are not presented this time, as they are a misleading mix of gains/losses without any clear pattern. However, full scanning of the pageblock is important for later patches.

Link: http://lkml.kernel.org/r/20190118175136.31341-8-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Mel Gorman

to #26255339

commit 806031bb5ec36ed879d64249d5a5cf9c6657f89d upstream

Pages with no migration handler use a fallback handler, which sometimes works and sometimes persistently retries. A historical example was blockdev pages, but there are others, such as odd refcounting when page->private is used. These are retried multiple times, which is wasteful during compaction, so this patch will fail migration faster unless the caller specifies MIGRATE_SYNC.

This is not expected to help THP allocation success rates, but it did reduce latencies very slightly in some cases.

    1-socket thpfioscale
                                  4.20.0             4.20.0
                        noreserved-v2r15     failfast-v2r15
    Amean fault-both-1     0.00 ( 0.00%)     0.00 *  0.00%*
    Amean fault-both-3   3839.67 ( 0.00%)  3833.72 (  0.15%)
    Amean fault-both-5   5177.47 ( 0.00%)  4967.15 (  4.06%)
    Amean fault-both-7   7245.03 ( 0.00%)  7139.19 (  1.46%)
    Amean fault-both-12 11534.89 ( 0.00%) 11326.30 (  1.81%)
    Amean fault-both-18 16241.10 ( 0.00%) 16270.70 ( -0.18%)
    Amean fault-both-24 19075.91 ( 0.00%) 19839.65 ( -4.00%)
    Amean fault-both-30 22712.11 ( 0.00%) 21707.05 (  4.43%)
    Amean fault-both-32 21692.92 ( 0.00%) 21968.16 ( -1.27%)

The 2-socket results are not materially different. Scan rates are similar, as expected.

Link: http://lkml.kernel.org/r/20190118175136.31341-7-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
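The change itself is tiny; in the fallback migration path, a page whose private data cannot be dropped now fails immediately unless the caller requested MIGRATE_SYNC. Roughly:

    /* In fallback_migrate_page(): fail fast for async/light modes */
    if (page_has_private(page) &&
        !try_to_release_page(page, GFP_KERNEL))
            return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;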
-
Submitted by Mel Gorman

to #26255339

commit 4469ab98477b290f6728b79f8d225d9d88ce16e3 upstream

It's non-obvious from the function name that high-order free pages are split into order-0 pages. Fix it.

Link: http://lkml.kernel.org/r/20190118175136.31341-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Mel Gorman

to #26255339

commit 40cacbcb324036233a927418441323459d28d19b upstream

A zone parameter is passed into a number of top-level compaction functions despite the fact that it's already in compact_control. This is harmless, but it did need an audit to check whether zone ever changes meaningfully. This patch removes the parameter from a number of top-level functions. The change could go much deeper, but this was enough to briefly clarify the flow.

No functional change.

Link: http://lkml.kernel.org/r/20190118175136.31341-5-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Mel Gorman

to #26255339

commit 566e54e113eb2b669f9300db2c2df400cbb06646 upstream

The last_migrated_pfn field is a bit dubious as to whether it really helps, but either way, the information it carries can be inferred without increasing the size of compact_control, so remove the field.

Link: http://lkml.kernel.org/r/20190118175136.31341-4-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Mel Gorman

to #26255339

commit c5943b9c5312d4fa23175ff146e901b865e4a60a upstream

compact_control spans two cache lines with write-intensive fields on both. Rearrange it so the most write-intensive fields are in the same cache line. This has a negligible impact on the overall performance of compaction and is more a tidying exercise than anything.

Link: http://lkml.kernel.org/r/20190118175136.31341-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Mel Gorman

to #26255339

commit c5fbd937b603885f1db3280ca212ed28add895bc upstream

Patch series "Increase success rates and reduce latency of compaction", v3.

This series reduces scan rates and increases success rates of compaction, primarily by using the free lists to shorten scans, better controlling skip information and whether multiple scanners can target the same block, and capturing pageblocks before they are stolen by parallel requests. The series is based on mmotm from January 9th, 2019, with the previous compaction series reverted.

I'm mostly using thpscale to measure the impact of the series. The benchmark creates a large file, maps it, faults it, punches holes in the mapping so that the virtual address space is fragmented, and then tries to allocate THP. It re-executes for different numbers of threads. From a fragmentation perspective, the workload is relatively benign, but it does stress compaction.

The overall impact on latencies for a 1-socket machine is:

                                baseline            patches
    Amean fault-both-3   3832.09 ( 0.00%)  2748.56 * 28.28%*
    Amean fault-both-5   4933.06 ( 0.00%)  4255.52 ( 13.73%)
    Amean fault-both-7   7017.75 ( 0.00%)  6586.93 (  6.14%)
    Amean fault-both-12 11610.51 ( 0.00%)  9162.34 * 21.09%*
    Amean fault-both-18 17055.85 ( 0.00%) 11530.06 * 32.40%*
    Amean fault-both-24 19306.27 ( 0.00%) 17956.13 (  6.99%)
    Amean fault-both-30 22516.49 ( 0.00%) 15686.47 * 30.33%*
    Amean fault-both-32 23442.93 ( 0.00%) 16564.83 * 29.34%*

The allocation success rates are much improved:

                                baseline            patches
    Percentage huge-3     85.99 ( 0.00%)    97.96 ( 13.92%)
    Percentage huge-5     88.27 ( 0.00%)    96.87 (  9.74%)
    Percentage huge-7     85.87 ( 0.00%)    94.53 ( 10.09%)
    Percentage huge-12    82.38 ( 0.00%)    98.44 ( 19.49%)
    Percentage huge-18    83.29 ( 0.00%)    99.14 ( 19.04%)
    Percentage huge-24    81.41 ( 0.00%)    97.35 ( 19.57%)
    Percentage huge-30    80.98 ( 0.00%)    98.05 ( 21.08%)
    Percentage huge-32    80.53 ( 0.00%)    97.06 ( 20.53%)

That's a nearly perfect allocation success rate. The biggest impact is on the scan rates:

    Compaction migrate scanned    55893379    19341254
    Compaction free scanned      474739990    11903963

The number of pages scanned for migration was reduced by 65% and the free scanner was reduced by 97.5%. So much less work in exchange for lower latency and better success rates.

The series was also evaluated using a workload that heavily fragments memory; the benefits there are also significant, albeit not presented. It was commented that we should be rethinking scanning entirely, and to a large extent I agree. However, to achieve that you need a lot of this series in place first, so it's best to make the linear scanners as good as possible before ripping them out.

This patch (of 22):

The isolate and migrate scanners should never isolate more than a pageblock of pages, so unsigned int is sufficient, saving 8 bytes on a 64-bit build.

Link: http://lkml.kernel.org/r/20190118175136.31341-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
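An excerpt of the struct change (other fields omitted; shown for illustration):

    struct compact_control {
            struct list_head freepages;     /* List of free pages to migrate to */
            struct list_head migratepages;  /* List of pages being migrated */
            unsigned int nr_freepages;      /* was unsigned long */
            unsigned int nr_migratepages;   /* was unsigned long */
            /* ... remaining fields unchanged ... */
    };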
-
Submitted by Mel Gorman

to #26255339

commit a921444382b49cc7fdeca3fba3e278bc09484a27 upstream

This is a preparation patch only, no functional change.

Link: http://lkml.kernel.org/r/20181123114528.28802-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Yang Shi

to #26255339

This reverts commit 4d8bdf7f.

The commit was backported from v5.4 to the stable tree, but it breaks the context relied on by the compaction optimizations from v5.1 being backported here. So revert this commit for now; it will be re-applied after the compaction optimization series.

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
- 22 April 2020, 25 commits
-
-
Submitted by Eric Dumazet

fix #26220884

commit 6cd1ed50efd88261298577cd92a14f2768eddeeb upstream

We need to make sure vc_cons[i].d is not NULL after grabbing console_lock(), or risk a crash.

    general protection fault, probably for non-canonical address 0xdffffc0000000068: 0000 [#1] PREEMPT SMP KASAN
    KASAN: null-ptr-deref in range [0x0000000000000340-0x0000000000000347]
    CPU: 1 PID: 19462 Comm: syz-executor.5 Not tainted 5.5.0-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:vt_ioctl+0x1f96/0x26d0 drivers/tty/vt/vt_ioctl.c:883
    Code: 74 41 e8 bd a6 84 fd 48 89 d8 48 c1 e8 03 42 80 3c 28 00 0f 85 e4 04 00 00 48 8b 03 48 8d b8 40 03 00 00 48 89 fa 48 c1 ea 03 <42> 0f b6 14 2a 84 d2 74 09 80 fa 03 0f 8e b1 05 00 00 44 89 b8 40
    RSP: 0018:ffffc900086d7bb0 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: ffffffff8c34ee88 RCX: ffffc9001415c000
    RDX: 0000000000000068 RSI: ffffffff83f0e6e3 RDI: 0000000000000340
    RBP: ffffc900086d7cd0 R08: ffff888054ce0100 R09: fffffbfff16a2f6d
    R10: ffff888054ce0998 R11: ffff888054ce0100 R12: 000000000000001d
    R13: dffffc0000000000 R14: 1ffff920010daf79 R15: 000000000000ff7f
    FS:  00007f7d13c12700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007ffd477e3c38 CR3: 0000000095d0a000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     tty_ioctl+0xa37/0x14f0 drivers/tty/tty_io.c:2660
     vfs_ioctl fs/ioctl.c:47 [inline]
     ksys_ioctl+0x123/0x180 fs/ioctl.c:763
     __do_sys_ioctl fs/ioctl.c:772 [inline]
     __se_sys_ioctl fs/ioctl.c:770 [inline]
     __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:770
     do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x45b399
    Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
    RSP: 002b:00007f7d13c11c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    RAX: ffffffffffffffda RBX: 00007f7d13c126d4 RCX: 000000000045b399
    RDX: 0000000020000080 RSI: 000000000000560a RDI: 0000000000000003
    RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
    R13: 0000000000000666 R14: 00000000004c7f04 R15: 000000000075bf2c
    Modules linked in:
    ---[ end trace 80970faf7a67eb77 ]---
    RIP: 0010:vt_ioctl+0x1f96/0x26d0 drivers/tty/vt/vt_ioctl.c:883
    Code: 74 41 e8 bd a6 84 fd 48 89 d8 48 c1 e8 03 42 80 3c 28 00 0f 85 e4 04 00 00 48 8b 03 48 8d b8 40 03 00 00 48 89 fa 48 c1 ea 03 <42> 0f b6 14 2a 84 d2 74 09 80 fa 03 0f 8e b1 05 00 00 44 89 b8 40
    RSP: 0018:ffffc900086d7bb0 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: ffffffff8c34ee88 RCX: ffffc9001415c000
    RDX: 0000000000000068 RSI: ffffffff83f0e6e3 RDI: 0000000000000340
    RBP: ffffc900086d7cd0 R08: ffff888054ce0100 R09: fffffbfff16a2f6d
    R10: ffff888054ce0998 R11: ffff888054ce0100 R12: 000000000000001d
    R13: dffffc0000000000 R14: 1ffff920010daf79 R15: 000000000000ff7f
    FS:  00007f7d13c12700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007ffd477e3c38 CR3: 0000000095d0a000 CR4: 00000000001406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable <stable@vger.kernel.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20200210190721.200418-1-edumazet@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
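The fix is a NULL re-check under console_lock; the VT_RESIZEX loop after the change looks approximately like the following (cc and ll are the columns/lines requested by the ioctl):

    for (i = 0; i < MAX_NR_CONSOLES; i++) {
            struct vc_data *vcp;

            if (!vc_cons[i].d)
                    continue;
            console_lock();
            vcp = vc_cons[i].d;     /* re-read under the lock */
            if (vcp) {
                    vcp->vc_resize_user = 1;
                    vc_resize(vcp, cc, ll);
            }
            console_unlock();
    }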
-
Submitted by Ard Biesheuvel

fix #26081752

commit 0a1213fa7432778b71a1c0166bf56660a3aab030 upstream

This enables the use of per-task stack canary values if GCC has support for emitting the stack canary reference relative to the value of sp_el0, which holds the task struct pointer in the arm64 kernel.

The $(eval) extends KBUILD_CFLAGS at the moment the make rule is applied, which means asm-offsets.o (which we rely on for the offset value) is built without the arguments, and everything built afterwards has the options set.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Reviewed-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Acked-by: Zou Cao <zoucao@linux.alibaba.com>
-
Submitted by wuxu.wu

fix #25872428

commit 19b61392c5a852b4e8a0bf35aecb969983c5932d upstream

dw_spi_irq() and dw_spi_transfer_one() can run concurrently. I found a panic in dw_writer(), at txw = *(u8 *)(dws->tx), with dws->tx == NULL, dws->len == 4, and dws->tx_end == 1.

When the TPM driver's message times out, dw_spi_irq() and dw_spi_transfer_one() may access the dw_spi structure concurrently, so I think the dw_spi structure lacks protection. Additionally, dw_spi_transfer_one() sets the dw rx/tx buffers and then enables the IRQ; without serialization, the stores to the rx/tx fields and the loads of them in the IRQ handler on other cores may be observed out of order.

    [ 1025.321302] Call trace:
    ...
    [ 1025.321319] __crash_kexec+0x98/0x148
    [ 1025.321323] panic+0x17c/0x314
    [ 1025.321329] die+0x29c/0x2e8
    [ 1025.321334] die_kernel_fault+0x68/0x78
    [ 1025.321337] __do_kernel_fault+0x90/0xb0
    [ 1025.321346] do_page_fault+0x88/0x500
    [ 1025.321347] do_translation_fault+0xa8/0xb8
    [ 1025.321349] do_mem_abort+0x68/0x118
    [ 1025.321351] el1_da+0x20/0x8c
    [ 1025.321362] dw_writer+0xc8/0xd0
    [ 1025.321364] interrupt_transfer+0x60/0x110
    [ 1025.321365] dw_spi_irq+0x48/0x70
    ...

Signed-off-by: wuxu.wu <wuxu.wu@huawei.com>
Link: https://lore.kernel.org/r/1577849981-31489-1-git-send-email-wuxu.wu@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
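The fix serializes the buffer state with a new lock in struct dw_spi (buf_lock in the upstream patch); a sketch of the publishing side in dw_spi_transfer_one():

    /* Publish tx/rx pointers atomically w.r.t. the IRQ handler, which
     * takes the same lock in dw_reader()/dw_writer(). Sketch only. */
    spin_lock_irqsave(&dws->buf_lock, flags);
    dws->tx = (void *)transfer->tx_buf;
    dws->tx_end = dws->tx + transfer->len;
    dws->rx = transfer->rx_buf;
    dws->rx_end = dws->rx + transfer->len;
    spin_unlock_irqrestore(&dws->buf_lock, flags);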
-
Submitted by Suravee Suthikulpanit

fix #26319040

commit 730ad0ede130015a773229573559e97ba0943065 upstream.

Commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") accidentally left out the ir_data pointer when calling modify_irte_ga(), which causes amd_iommu_update_ga() to return prematurely because struct amd_ir_data.ref is NULL, so the "is_run" bit of the IRTE does not get updated properly.

This results in bad I/O performance, since IOMMU AVIC always generates a GA log entry and notifies the IOMMU driver and KVM when it receives an interrupt from the PCI pass-through device, instead of directly injecting the interrupt into the vCPU.

Fix by passing ir_data when calling modify_irte_ga(), as done previously.

Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>
-
Submitted by Suthikulpanit, Suravee

fix #26319040

commit b9c6ff94e43a0ee053e0c1d983fba1ac4953b762 upstream.

Re-factor the logic for activating/deactivating guest virtual APIC mode (GAM) into helper functions, and export them for other drivers (e.g. SVM) to support run-time activation/deactivation of SVM AVIC.

Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>
-
Submitted by Joerg Roedel

fix #26319040

commit 2a78f9962565e53b78363eaf516eb052009e8020 upstream.

Traversing this list requires protection_domain->lock to be taken to avoid nasty races with attach/detach code. Make sure the lock is held on all code paths traversing this list.

Reported-by: Filippo Sironi <sironi@amazon.de>
Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>
-
Submitted by Joerg Roedel

fix #26319040

commit ab7b2577f0d119052b98b8d913bad369ac2760eb upstream.

Make sure that attaching and detaching a device can't race against each other, and protect the iommu_dev_data with a spin_lock on these code paths.

Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>
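Taken together with the neighbouring patches in this series, the locking shape of attach_device() becomes roughly the following; helper names (get_dev_data, do_attach) follow the driver but the body is illustrative, not the exact backported code:

    static int attach_device(struct device *dev,
                             struct protection_domain *domain)
    {
            struct iommu_dev_data *dev_data = get_dev_data(dev);
            unsigned long flags;
            int ret = -EBUSY;

            spin_lock_irqsave(&domain->lock, flags);
            spin_lock(&dev_data->lock);

            /* Bail out early if the device is already attached */
            if (dev_data->domain != NULL)
                    goto out;

            ret = do_attach(dev_data, domain);

    out:
            spin_unlock(&dev_data->lock);
            spin_unlock_irqrestore(&domain->lock, flags);
            return ret;
    }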
-
Submitted by tianyi

fix #26319040

commit 45e528d9c479aeef2d3d1db1e619b243f91e324f upstream.

Check early in attach_device() whether the device is already attached to a domain. This also simplifies the code path so that __attach_device() can be removed.

Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>
-
Submitted by Joerg Roedel

fix #26319040

commit f6c0bfce271b2dd613e8b8e009eefe89c1f788e8 upstream.

The code paths from which __attach_device() and __detach_device() are called also access and modify domain state, so take the domain lock there too. This allows us to get rid of the __detach_device() function.

Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>
-
Submitted by Joerg Roedel

fix #26319040

commit 3a11905b69eb026402448c750f97a0eadfa76b08 upstream.

The lock is not necessary because the device table does not contain shared state that needs protection. Locking is only needed on an individual-entry basis, and that needs to happen at the iommu_dev_data level.

Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>
-
Submitted by Joerg Roedel

fix #26319040

commit f15d9a992f901d4f22db868adf800844d1cac9f2 upstream.

iommu/amd: Remove domain->updated

This struct member was used to track whether a domain change requires updates to the device table and IOMMU cache flushes. The problem is that access to this field is racy, since locking in the common mapping code paths has been eliminated. Move the updated flag to the stack to get rid of all potential races, and remove the field from the struct.

Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>
-
Submitted by Tian Tao

to #25688970

commit 643956e61ced913a2bbdcf2c95f3d03026b39d1c upstream

The fourth parameter 'level' of acpi_find_cache_level() is a signed integer, but its caller acpi_find_cache_node() passes an unsigned integer for that parameter. Make the parameter type inconsistency go away.

Signed-off-by: Tian Tao <tiantao6@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
[ rjw: Subject/changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Submitted by Jeremy Linton

to #25688970

commit 56855a99f3d0d1e9f1f4e24f5851f9bf14c83296 upstream

ACPI 6.3 adds a flag to indicate that child nodes are all identical cores. This is useful to authoritatively determine whether a set of (possibly offline) cores is identical or not. Since the flag doesn't give us a unique id, we can generate one and use it to create bitmaps of sibling nodes, or simply use it in a loop to determine whether a subset of cores is identical.

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Submitted by Jeremy Linton

to #25688970

commit ed2b664fcc8073c09394393756df3fc86977bbac upstream

The ACPI specification implies that the IDENTICAL flag should be set on all non-leaf nodes whose children are identical. This means that we need to search for the last node with the IDENTICAL flag set, rather than the first one. Since this flag also depends on the table revision, we need a bit of extra code to verify the table revision and the next node's state during traversal. Since we want to avoid function pointers here, let's just special-case the IDENTICAL flag.

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Submitted by Bjorn Helgaas

to #25688970

commit 603fadf33604a2e170eb833f99f569d3597f1f09 upstream

Fix some misspellings in comments. No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Submitted by Jeremy Linton

to #25688970

commit 4909e6df213a7c3e5e282538356f31ab68828793 upstream

ACPI 6.3 bumps the PPTT table revision and adds a LEAF_NODE flag. This allows us to avoid a second pass through the table to ensure that the node in question is a leaf.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Submitted by John Garry

to #25688970

commit 6cafe700b08cfd261a279b9e5ed99f3a346fe3b0 upstream

For a system using ACPI-based FW without a PPTT, we may get many warnings about the lack of a PPTT, as shown:

    root@(none)$ dmesg | grep -i pptt
    [    0.010125] ACPI PPTT: No PPTT table found, cpu topology may be inaccurate
    [    7.138339] ACPI PPTT: No PPTT table found, cache topology may be inaccurate
    [    7.145368] ACPI PPTT: No PPTT table found, cache topology may be inaccurate

These logs are generated with pr_warn_once(), so the intention was a single log, but the messages overlap; consolidate them.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Submitted by Christian König

to #25447038

commit b4ff0f8a85f3c523942e57b716e8722e7f6799cc upstream.

This allows invalidating VM entries without taking the reservation lock.

v3: use -EBUSY

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: mfkang <mfkang@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Submitted by Christian König

to #25447038

commit 6ceeb144b1d6952a36afa6c29718beac575f2a3f upstream.

When a page table needs to be evicted, the VM code should decide whether that is possible or not.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: mfkang <mfkang@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Submitted by Christian König

to #25447038

commit 1bd4e4ca7bb8f681ff4e2b05c97ce975ccd781d6 upstream.

Otherwise we won't be able to cleanly handle page faults.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: mfkang <mfkang@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Submitted by Christian Brauner

fix #27124689

commit 7f2923c4f73f21cfd714d12a2d48de8c21f11cfe upstream.

proc_get_long() is a funny function. It uses simple_strtoul(), and for a good reason: proc_get_long() wants to always succeed the parse and return the possibly incorrect value along with the trailing characters, to check against a pre-defined list of acceptable trailing values. However, simple_strtoul() explicitly ignores overflows, which can cause funny things like the following to happen:

    echo 18446744073709551616 > /proc/sys/fs/file-max
    cat /proc/sys/fs/file-max
    0

(Which will cause your system to silently die behind your back.)

On the other hand, kstrtoul() does do overflow detection, but it does not return the trailing characters, and it also fails the parse when anything other than '\n' is a trailing character, whereas proc_get_long() wants to be more lenient.

Now, before adding another kstrtoul() variant, let's simply add a static parser strtoul_lenient() which:
- fails on overflow with -ERANGE
- returns the trailing characters to the caller

The reason we should fail on ERANGE is that we already partially fail on overflow right now, namely when TMPBUFLEN is exceeded. So we already reject values such as 184467440737095516160 (21 chars) but accept values such as 18446744073709551616 (20 chars), even though both are overflows. So we should just always reject 64-bit overflows and not special-case this based on the number of chars.

Link: http://lkml.kernel.org/r/20190107222700.15954-2-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
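The helper reads approximately as follows upstream; it reuses the kernel's internal _parse_integer() machinery from lib/kstrtox.c (via the private "kstrtox.h" header):

    static int strtoul_lenient(const char *cp, char **endp, unsigned int base,
                               unsigned long *res)
    {
            unsigned long long result;
            unsigned int rv;

            cp = _parse_integer_fixup_radix(cp, &base);
            rv = _parse_integer(cp, base, &result);
            /* Unlike simple_strtoul(), overflow is a hard error */
            if ((rv & KSTRTOX_OVERFLOW) || (result != (unsigned long)result))
                    return -ERANGE;

            cp += rv;

            /* Hand the trailing characters back to the caller */
            if (endp)
                    *endp = (char *)cp;

            *res = result;
            return 0;
    }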
-
Submitted by Shile Zhang

to #26536261

Keep the common configs the same between x86_64 and aarch64.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Shile Zhang

to #24582903

Update the aarch64 configs for the new gcc version, plus other minor changes.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Yihao Wu

fix #25707555

commit 43e33924c38e8faeb0c12035481cb150e602e39d linux-next

Deleting a list entry within hlist_for_each_entry_safe() is not safe unless the next pointer (tmp) is protected too. It's not, because once hash_lock is released, cache_clean() may delete the entry that tmp points to. cache_purge() can then walk to a deleted entry and try to double-free it.

Fix this bug by holding only the deleted entry's reference.

Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: NeilBrown <neilb@suse.de>
[ cel: removed unused variable ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
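The fixed pattern in cache_purge() repeatedly pins and removes the first entry rather than trusting a saved next pointer across the unlock; roughly (helper names as in net/sunrpc/cache.c):

    spin_lock(&detail->hash_lock);
    for (i = 0; i < detail->hash_size; i++) {
            head = &detail->hash_table[i];
            while (!hlist_empty(head)) {
                    ch = hlist_entry(head->first, struct cache_head,
                                     cache_list);
                    sunrpc_begin_cache_remove_entry(ch, detail);
                    /* Safe to drop the lock: we hold a reference on ch */
                    spin_unlock(&detail->hash_lock);
                    sunrpc_end_cache_remove_entry(ch, detail);
                    spin_lock(&detail->hash_lock);
            }
    }
    spin_unlock(&detail->hash_lock);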
-
Submitted by Michael Wang

fix #26198889

commit 26cf52229efc87e2effa9d788f9b33c40fb3358a linux-next

During our testing, we found a case where shares no longer work correctly. The cgroup topology is like:

    /sys/fs/cgroup/cpu/A        (shares=102400)
    /sys/fs/cgroup/cpu/A/B      (shares=2)
    /sys/fs/cgroup/cpu/A/B/C    (shares=1024)

    /sys/fs/cgroup/cpu/D        (shares=1024)
    /sys/fs/cgroup/cpu/D/E      (shares=1024)
    /sys/fs/cgroup/cpu/D/E/F    (shares=1024)

The same benchmark is running in groups C and F; no other tasks are running, and the benchmark is capable of consuming all the CPUs. We would expect group C to win more CPU resources, since it could enjoy all the shares of group A, but it is F that wins much more.

The reason is that we have group B with shares of 2. Since A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus, A->cfs_rq.load.weight becomes very small. And in calc_group_shares() we calculate shares as:

    load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
    shares = (tg_shares * load) / tg_weight;

Since cfs_rq->load.weight is so small, the load becomes 0 after scale-down; although tg_shares is 102400, the shares of the se which stands for group A on the root cfs_rq become 2. Meanwhile, the se of D on the root cfs_rq is far bigger than 2, so it wins the battle.

Thus, when scale_load_down() scales a real weight down to 0, it no longer tells the real story: the caller gets wrong information and the calculation goes buggy. This patch adds a check in scale_load_down() so the real weight will be >= MIN_SHARES after scaling; with it applied, group C wins as expected.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
Acked-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
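The fix clamps the scaled-down weight to MIN_SHARES (2) whenever the input weight is non-zero; the 64-bit definition in kernel/sched/sched.h becomes roughly:

    # define scale_load_down(w)                                     \
    ({                                                              \
            unsigned long __w = (w);                                \
            if (__w)                                                \
                    __w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT);  \
            __w;                                                    \
    })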
-