Commits · d3a528a25601e710c8e67b6503128954d9bd75dd · openanolis / cloud-kernel

23 Apr, 2020 · 9 commits

mm, compaction: always finish scanning of a full pageblock · d3a528a2

Committed by Mel Gorman on Mar 05, 2019

to #26255339

commit efe771c7603bc524425070d651e70e9c56c57f28 upstream

When compaction is finishing, it uses a flag to ensure the pageblock is
complete but it makes sense to always complete migration of a pageblock.
Minimally, skip information is based on a pageblock and partially
scanned pageblocks may incur more scanning in the future.  The pageblock
skip handling also becomes more strict later in the series and the hint
is more useful if a complete pageblock was always scanned.

This potentially impacts latency as more scanning is done but it's not a
consistent win or loss as the scanning is not always a high percentage
of the pageblock and sometimes it is offset by future reductions in
scanning.  Hence, the results are not presented this time due to a
misleading mix of gains/losses without any clear pattern.  However, full
scanning of the pageblock is important for later patches.

Link: http://lkml.kernel.org/r/20190118175136.31341-8-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

mm, migrate: immediately fail migration of a page with no migration handler · bed49b98

Committed by Mel Gorman on Mar 05, 2019

to #26255339

commit 806031bb5ec36ed879d64249d5a5cf9c6657f89d upstream

Pages with no migration handler use a fallback handler which sometimes
works and sometimes persistently retries.  A historical example was
blockdev pages but there are others such as odd refcounting when
page->private is used.  These are retried multiple times which is
wasteful during compaction so this patch will fail migration faster
unless the caller specifies MIGRATE_SYNC.
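The effect of failing fast can be pictured with a small userspace sketch. It assumes the fallback path signals "retry" with -EAGAIN and "give up" with -EBUSY; the enum is a simplified stand-in for the kernel's migrate modes, and `fallback_failure_code()` and `migrate_with_retries()` are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-in for the kernel's migrate modes (assumption). */
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC };

/*
 * Hypothetical helper: error code used when a page with no migration
 * handler cannot be handled by the fallback path.
 * -EAGAIN asks the caller to retry, -EBUSY tells it to give up at once.
 */
static int fallback_failure_code(enum migrate_mode mode)
{
	return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;
}

/*
 * Hypothetical retry loop: returns how many passes were spent on a page
 * that never becomes migratable. Only -EAGAIN consumes further passes.
 */
static int migrate_with_retries(enum migrate_mode mode, int max_passes)
{
	int passes = 0;
	int rc;

	assert(max_passes > 0);
	do {
		rc = fallback_failure_code(mode);
		passes++;
	} while (rc == -EAGAIN && passes < max_passes);

	return passes;
}
```

With a budget of 10 passes, the asynchronous modes stop after a single pass in this sketch, while MIGRATE_SYNC still uses the whole budget.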

This is not expected to help THP allocation success rates but it did
reduce latencies very slightly in some cases.

1-socket thpfioscale
                                        4.20.0                 4.20.0
                              noreserved-v2r15         failfast-v2r15
Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean     fault-both-3      3839.67 (   0.00%)     3833.72 (   0.15%)
Amean     fault-both-5      5177.47 (   0.00%)     4967.15 (   4.06%)
Amean     fault-both-7      7245.03 (   0.00%)     7139.19 (   1.46%)
Amean     fault-both-12    11534.89 (   0.00%)    11326.30 (   1.81%)
Amean     fault-both-18    16241.10 (   0.00%)    16270.70 (  -0.18%)
Amean     fault-both-24    19075.91 (   0.00%)    19839.65 (  -4.00%)
Amean     fault-both-30    22712.11 (   0.00%)    21707.05 (   4.43%)
Amean     fault-both-32    21692.92 (   0.00%)    21968.16 (  -1.27%)

The 2-socket results are not materially different.  Scan rates are
similar as expected.

Link: http://lkml.kernel.org/r/20190118175136.31341-7-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

mm, compaction: rename map_pages to split_map_pages · 47ee435b

Committed by Mel Gorman on Mar 05, 2019

to #26255339

commit 4469ab98477b290f6728b79f8d225d9d88ce16e3 upstream

It's non-obvious that high-order free pages are split into order-0 pages
from the function name.  Fix it.

Link: http://lkml.kernel.org/r/20190118175136.31341-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

mm, compaction: remove unnecessary zone parameter in some instances · 66dfb67e

Committed by Mel Gorman on Mar 05, 2019

to #26255339

commit 40cacbcb324036233a927418441323459d28d19b upstream

A zone parameter is passed into a number of top-level compaction
functions despite the fact that it's already in compact_control.  This
is harmless but it did need an audit to check if zone actually ever
changes meaningfully.  This patch removes the parameter in a number of
top-level functions.  The change could be much deeper but this was
enough to briefly clarify the flow.

No functional change.

Link: http://lkml.kernel.org/r/20190118175136.31341-5-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

mm, compaction: remove last_migrated_pfn from compact_control · 3e1a250c

Committed by Mel Gorman on Apr 04, 2020

to #26255339

commit 566e54e113eb2b669f9300db2c2df400cbb06646 upstream

The last_migrated_pfn field is a bit dubious as to whether it really
helps but either way, the information from it can be inferred without
increasing the size of compact_control so remove the field.

Link: http://lkml.kernel.org/r/20190118175136.31341-4-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

mm, compaction: rearrange compact_control · 7f7d6863

Committed by Mel Gorman on Mar 05, 2019

to #26255339

commit c5943b9c5312d4fa23175ff146e901b865e4a60a upstream

compact_control spans two cache lines with write-intensive lines on
both.  Rearrange so the most write-intensive fields are in the same
cache line.  This has a negligible impact on the overall performance of
compaction and is more a tidying exercise than anything.

Link: http://lkml.kernel.org/r/20190118175136.31341-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

mm, compaction: shrink compact_control · 42fbe240

Committed by Mel Gorman on Mar 05, 2019

to #26255339

commit c5fbd937b603885f1db3280ca212ed28add895bc upstream

Patch series "Increase success rates and reduce latency of compaction", v3.

This series reduces scan rates and improves success rates of compaction,
primarily by using the free lists to shorten scans, better controlling
of skip information and whether multiple scanners can target the same
block and capturing pageblocks before being stolen by parallel requests.
The series is based on mmotm from January 9th, 2019 with the previous
compaction series reverted.

I'm mostly using thpscale to measure the impact of the series.  The
benchmark creates a large file, maps it, faults it, punches holes in the
mapping so that the virtual address space is fragmented and then tries
to allocate THP.  It re-executes for different numbers of threads.  From
a fragmentation perspective, the workload is relatively benign but it
does stress compaction.

The overall impact on latencies for a 1-socket machine is

                            baseline               patches
Amean     fault-both-3      3832.09 (   0.00%)     2748.56 *  28.28%*
Amean     fault-both-5      4933.06 (   0.00%)     4255.52 (  13.73%)
Amean     fault-both-7      7017.75 (   0.00%)     6586.93 (   6.14%)
Amean     fault-both-12    11610.51 (   0.00%)     9162.34 *  21.09%*
Amean     fault-both-18    17055.85 (   0.00%)    11530.06 *  32.40%*
Amean     fault-both-24    19306.27 (   0.00%)    17956.13 (   6.99%)
Amean     fault-both-30    22516.49 (   0.00%)    15686.47 *  30.33%*
Amean     fault-both-32    23442.93 (   0.00%)    16564.83 *  29.34%*

The allocation success rates are much improved

                         baseline               patches
Percentage huge-3        85.99 (   0.00%)       97.96 (  13.92%)
Percentage huge-5        88.27 (   0.00%)       96.87 (   9.74%)
Percentage huge-7        85.87 (   0.00%)       94.53 (  10.09%)
Percentage huge-12       82.38 (   0.00%)       98.44 (  19.49%)
Percentage huge-18       83.29 (   0.00%)       99.14 (  19.04%)
Percentage huge-24       81.41 (   0.00%)       97.35 (  19.57%)
Percentage huge-30       80.98 (   0.00%)       98.05 (  21.08%)
Percentage huge-32       80.53 (   0.00%)       97.06 (  20.53%)

That's a nearly perfect allocation success rate.

The biggest impact is on the scan rates

Compaction migrate scanned    55893379    19341254
Compaction free scanned      474739990    11903963

The number of pages scanned for migration was reduced by 65% and the
free scanner was reduced by 97.5%.  So much less work in exchange for
lower latency and better success rates.
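The two percentages follow directly from the scan counters quoted above. A minimal sketch of the arithmetic, where `percent_reduction()` is a hypothetical helper introduced only for illustration:

```c
#include <assert.h>

/*
 * Percentage reduction going from 'before' to 'after',
 * e.g. 100 -> 35 is a 65% reduction.
 */
static double percent_reduction(unsigned long long before,
				unsigned long long after)
{
	assert(before > 0 && after <= before);
	return 100.0 * (double)(before - after) / (double)before;
}
```

Calling it with the migrate-scanner counters (55893379 and 19341254) gives roughly 65.4, and with the free-scanner counters (474739990 and 11903963) roughly 97.5, matching the figures in the text.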

The series was also evaluated using a workload that heavily fragments
memory and the benefits there are also significant, albeit not
presented.

It was commented that we should be rethinking scanning entirely and to a
large extent I agree.  However, to achieve that you need a lot of this
series in place first so it's best to make the linear scanners as good
as possible before ripping them out.

This patch (of 22):

The isolate and migrate scanners should never isolate more than a
pageblock of pages so unsigned int is sufficient, saving 8 bytes on a
64-bit build.
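Where the 8 bytes come from can be checked with a small standalone sketch. It assumes that two page counters are narrowed from unsigned long to unsigned int; the struct names are hypothetical, and the field names are borrowed for illustration rather than copied from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "before" layout: both counters are unsigned long. */
struct counters_before {
	unsigned long nr_freepages;
	unsigned long nr_migratepages;
};

/* Hypothetical "after" layout: unsigned int is enough for one pageblock. */
struct counters_after {
	unsigned int nr_freepages;
	unsigned int nr_migratepages;
};

/* Bytes saved by narrowing the two counters. */
static size_t bytes_saved(void)
{
	assert(sizeof(struct counters_before) >= sizeof(struct counters_after));
	return sizeof(struct counters_before) - sizeof(struct counters_after);
}
```

On an LP64 build, where unsigned long is 8 bytes and unsigned int is 4, `bytes_saved()` evaluates to 8.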

Link: http://lkml.kernel.org/r/20190118175136.31341-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

mm: move zone watermark accesses behind an accessor · 1bb2b85e

Committed by Mel Gorman on Apr 04, 2020

to #26255339

commit a921444382b49cc7fdeca3fba3e278bc09484a27 upstream

This is a preparation patch only, no functional change.
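As an illustration of what "behind an accessor" means in general, the sketch below hides a watermark array behind one function so callers never index the array directly. The struct, the `_watermark` field name and `wmark_pages()` are assumptions made for this example and are not taken from the patch itself.

```c
#include <assert.h>

enum zone_watermarks { WMARK_MIN, WMARK_LOW, WMARK_HIGH, NR_WMARK };

/* Hypothetical, heavily reduced zone: only the watermark array is kept. */
struct zone_sketch {
	unsigned long _watermark[NR_WMARK];
};

/*
 * Hypothetical accessor: every reader goes through this function, so a
 * later change (for example adding a boost value) touches one place only.
 */
static unsigned long wmark_pages(const struct zone_sketch *z,
				 enum zone_watermarks i)
{
	assert(z && i < NR_WMARK);
	return z->_watermark[i];
}
```

Because behavior is unchanged as long as the accessor returns the stored value, a conversion of this kind can be a pure preparation step.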

Link: http://lkml.kernel.org/r/20181123114528.28802-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

alinux: Revert "mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone" · b725e258

Committed by Yang Shi on Apr 04, 2020

to #26255339

This reverts commit 4d8bdf7f.

The commit was backported from v5.4 to the stable tree, but it breaks the
context that the backport of the compaction optimizations made in v5.1
depends on. So revert this commit for now; it will be re-applied after the
compaction optimization series.

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

22 Apr, 2020 · 27 commits

vt: vt_ioctl: fix race in VT_RESIZEX · 9002dc1a

Committed by Eric Dumazet on Feb 10, 2020

fix #26220884

commit 6cd1ed50efd88261298577cd92a14f2768eddeeb upstream

We need to make sure vc_cons[i].d is not NULL after grabbing
console_lock(), or risk a crash.
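The pattern described here, re-checking a pointer only after the lock is held, can be sketched in userspace with a pthread mutex standing in for console_lock(). All names below (`consoles`, `console_mutex`, `resize_all()`) are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_CONSOLES 4

struct console_data {
	unsigned int rows, cols;
};

/* Hypothetical stand-ins for console_lock() and the vc_cons[] slots. */
static pthread_mutex_t console_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct console_data *consoles[NR_CONSOLES];

/*
 * Resize every allocated console. A slot may be freed by another thread
 * at any time before the lock is taken, so the pointer is read and
 * checked for NULL only while the lock is held.
 * Returns the number of consoles that were actually resized.
 */
static int resize_all(unsigned int rows, unsigned int cols)
{
	int resized = 0;

	assert(rows > 0 && cols > 0);
	for (size_t i = 0; i < NR_CONSOLES; i++) {
		pthread_mutex_lock(&console_mutex);
		struct console_data *d = consoles[i]; /* read under the lock */
		if (d) {
			d->rows = rows;
			d->cols = cols;
			resized++;
		}
		pthread_mutex_unlock(&console_mutex);
	}
	return resized;
}
```

Empty slots are skipped instead of dereferenced, which is the failure the trace below shows.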

general protection fault, probably for non-canonical address 0xdffffc0000000068: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000340-0x0000000000000347]
CPU: 1 PID: 19462 Comm: syz-executor.5 Not tainted 5.5.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:vt_ioctl+0x1f96/0x26d0 drivers/tty/vt/vt_ioctl.c:883
Code: 74 41 e8 bd a6 84 fd 48 89 d8 48 c1 e8 03 42 80 3c 28 00 0f 85 e4 04 00 00 48 8b 03 48 8d b8 40 03 00 00 48 89 fa 48 c1 ea 03 <42> 0f b6 14 2a 84 d2 74 09 80 fa 03 0f 8e b1 05 00 00 44 89 b8 40
RSP: 0018:ffffc900086d7bb0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffffff8c34ee88 RCX: ffffc9001415c000
RDX: 0000000000000068 RSI: ffffffff83f0e6e3 RDI: 0000000000000340
RBP: ffffc900086d7cd0 R08: ffff888054ce0100 R09: fffffbfff16a2f6d
R10: ffff888054ce0998 R11: ffff888054ce0100 R12: 000000000000001d
R13: dffffc0000000000 R14: 1ffff920010daf79 R15: 000000000000ff7f
FS:  00007f7d13c12700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd477e3c38 CR3: 0000000095d0a000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 tty_ioctl+0xa37/0x14f0 drivers/tty/tty_io.c:2660
 vfs_ioctl fs/ioctl.c:47 [inline]
 ksys_ioctl+0x123/0x180 fs/ioctl.c:763
 __do_sys_ioctl fs/ioctl.c:772 [inline]
 __se_sys_ioctl fs/ioctl.c:770 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:770
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45b399
Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f7d13c11c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f7d13c126d4 RCX: 000000000045b399
RDX: 0000000020000080 RSI: 000000000000560a RDI: 0000000000000003
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000666 R14: 00000000004c7f04 R15: 000000000075bf2c
Modules linked in:
---[ end trace 80970faf7a67eb77 ]---
RIP: 0010:vt_ioctl+0x1f96/0x26d0 drivers/tty/vt/vt_ioctl.c:883
Code: 74 41 e8 bd a6 84 fd 48 89 d8 48 c1 e8 03 42 80 3c 28 00 0f 85 e4 04 00 00 48 8b 03 48 8d b8 40 03 00 00 48 89 fa 48 c1 ea 03 <42> 0f b6 14 2a 84 d2 74 09 80 fa 03 0f 8e b1 05 00 00 44 89 b8 40
RSP: 0018:ffffc900086d7bb0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffffff8c34ee88 RCX: ffffc9001415c000
RDX: 0000000000000068 RSI: ffffffff83f0e6e3 RDI: 0000000000000340
RBP: ffffc900086d7cd0 R08: ffff888054ce0100 R09: fffffbfff16a2f6d
R10: ffff888054ce0998 R11: ffff888054ce0100 R12: 000000000000001d
R13: dffffc0000000000 R14: 1ffff920010daf79 R15: 000000000000ff7f
FS:  00007f7d13c12700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd477e3c38 CR3: 0000000095d0a000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable <stable@vger.kernel.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20200210190721.200418-1-edumazet@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>

arm64: enable per-task stack canaries · 81b3f7e6

Committed by Ard Biesheuvel on Dec 12, 2018

fix #26081752

commit 0a1213fa7432778b71a1c0166bf56660a3aab030 upstream

This enables the use of per-task stack canary values if GCC has
support for emitting the stack canary reference relative to the
value of sp_el0, which holds the task struct pointer in the arm64
kernel.

The $(eval) extends KBUILD_CFLAGS at the moment the make rule is
applied, which means asm-offsets.o (which we rely on for the offset
value) is built without the arguments, and everything built afterwards
has the options set.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Reviewed-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Acked-by: Zou Cao <zoucao@linux.alibaba.com>

spi: spi-dw: Add lock protect dw_spi rx/tx to prevent concurrent calls · 7deb8737

Committed by wuxu.wu on Jan 01, 2020

fix #25872428

commit 19b61392c5a852b4e8a0bf35aecb969983c5932d upstream

dw_spi_irq() and dw_spi_transfer_one() can be called concurrently.

I found a panic in dw_writer(): txw = *(u8 *)(dws->tx), when dw->tx==null,
dw->len==4, and dw->tx_end==1.

When the tpm driver's message times out, dw_spi_irq() and
dw_spi_transfer_one() may access dw_spi concurrently, so I think the dw_spi
structure lacks protection.

In addition, dw_spi_transfer_one() sets the dw rx/tx buffers and then
enables the irq, so its stores to dw rx/tx and the loads of dw rx/tx done
by other cores handling the irq may be observed out of order.

	[ 1025.321302] Call trace:
	...
	[ 1025.321319]  __crash_kexec+0x98/0x148
	[ 1025.321323]  panic+0x17c/0x314
	[ 1025.321329]  die+0x29c/0x2e8
	[ 1025.321334]  die_kernel_fault+0x68/0x78
	[ 1025.321337]  __do_kernel_fault+0x90/0xb0
	[ 1025.321346]  do_page_fault+0x88/0x500
	[ 1025.321347]  do_translation_fault+0xa8/0xb8
	[ 1025.321349]  do_mem_abort+0x68/0x118
	[ 1025.321351]  el1_da+0x20/0x8c
	[ 1025.321362]  dw_writer+0xc8/0xd0
	[ 1025.321364]  interrupt_transfer+0x60/0x110
	[ 1025.321365]  dw_spi_irq+0x48/0x70
	...

Signed-off-by: wuxu.wu <wuxu.wu@huawei.com>
Link: https://lore.kernel.org/r/1577849981-31489-1-git-send-email-wuxu.wu@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>

iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTE · 649821ec

Committed by Suravee Suthikulpanit on Mar 30, 2020

fix #26319040

commit 730ad0ede130015a773229573559e97ba0943065 upstream.

Commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC
(de-)activation code") accidentally left out the ir_data pointer when
calling modify_irte_ga(), which causes the function amd_iommu_update_ga()
to return prematurely because struct amd_ir_data.ref is NULL, so
the "is_run" bit of the IRTE does not get updated properly.

This results in bad I/O performance since the IOMMU AVIC always generates a
GA Log entry and notifies the IOMMU driver and KVM when it receives an
interrupt from the PCI pass-through device, instead of directly injecting
the interrupt into the vCPU.

Fix this by passing ir_data when calling modify_irte_ga() as done previously.

Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>

iommu/amd: Re-factor guest virtual APIC (de-)activation code · d5a28aba

Committed by Suthikulpanit, Suravee on Mar 30, 2020

fix #26319040

commit b9c6ff94e43a0ee053e0c1d983fba1ac4953b762 upstream.

Refactor the logic for activating/deactivating guest virtual APIC mode
(GAM) into helper functions, and export them for other drivers (e.g. SVM)
to support run-time activation/deactivation of SVM AVIC.

Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>

iommu/amd: Lock code paths traversing protection_domain->dev_list · fd130609

Committed by Joerg Roedel on Mar 30, 2020

fix #26319040

commit 2a78f9962565e53b78363eaf516eb052009e8020 upstream.

The traversing of this list requires protection_domain->lock to be taken
to avoid nasty races with attach/detach code. Make sure the lock is held
on all code-paths traversing this list.

Reported-by: Filippo Sironi <sironi@amazon.de>
Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>

iommu/amd: Lock dev_data in attach/detach code paths · 30061d95

Committed by Joerg Roedel on Mar 30, 2020

fix #26319040

commit ab7b2577f0d119052b98b8d913bad369ac2760eb upstream.

Make sure that attaching and detaching a device can't race against each
other and protect the iommu_dev_data with a spin_lock in these code
paths.

Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>

iommu/amd: Check for busy devices earlier in attach_device() · 9fcb8428

Committed by tianyi on Mar 30, 2020

fix #26319040

commit 45e528d9c479aeef2d3d1db1e619b243f91e324f upstream.

Check early in attach_device whether the device is already attached to a
domain. This also simplifies the code path so that __attach_device() can
be removed.

Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>

iommu/amd: Take domain->lock for complete attach/detach path · 19082e2a

Committed by Joerg Roedel on Mar 30, 2020

fix #26319040

commit f6c0bfce271b2dd613e8b8e009eefe89c1f788e8 upstream.

The code-paths before __attach_device() and __detach_device() are called
also access and modify domain state, so take the domain lock there too.
This allows getting rid of the __detach_device() function.

Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>

iommu/amd: Remove amd_iommu_devtable_lock · 7ba34a94

Committed by Joerg Roedel on Mar 30, 2020

fix #26319040

commit 3a11905b69eb026402448c750f97a0eadfa76b08 upstream.

The lock is not necessary because the device table does not
contain shared state that needs protection. Locking is only
needed on an individual entry basis, and that needs to
happen on the iommu_dev_data level.

Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>

iommu/amd: Remove domain->updated · 1eec40e2

Committed by Joerg Roedel on Mar 30, 2020

fix #26319040

commit f15d9a992f901d4f22db868adf800844d1cac9f2 upstream.

iommu/amd: Remove domain->updated

This struct member was used to track whether a domain
change requires updates to the device-table and IOMMU cache
flushes. The problem is that access to this field is racy
since locking in the common mapping code-paths has been
eliminated.

Move the updated field to the stack to get rid of all
potential races and remove the field from the struct.
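A minimal sketch of the idea, assuming a per-call flag is enough to decide whether a flush is needed: the flag lives in the caller's stack frame and is passed down by pointer, so concurrent callers cannot overwrite each other's state. The types and functions here (`domain_sketch`, `set_entry()`, `map_range()`) are hypothetical and not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ENTRIES 8

/* Hypothetical domain: note there is no shared 'updated' member. */
struct domain_sketch {
	unsigned long entries[NR_ENTRIES];
};

static int flush_count; /* counts simulated flushes, for illustration */

/* Hypothetical helper: reports a change through the caller's flag. */
static void set_entry(struct domain_sketch *dom, size_t idx,
		      unsigned long val, bool *updated)
{
	assert(dom && updated && idx < NR_ENTRIES);
	if (dom->entries[idx] != val) {
		dom->entries[idx] = val;
		*updated = true;
	}
}

/* Each call tracks its own changes in a stack variable. */
static void map_range(struct domain_sketch *dom, size_t first, size_t count,
		      unsigned long val)
{
	bool updated = false; /* private to this call */

	for (size_t i = 0; i < count; i++)
		set_entry(dom, first + i, val, &updated);

	if (updated)
		flush_count++; /* stands in for the table update and flush */
}
```

Mapping the same range twice with the same value triggers only one simulated flush, because the second call never sets its local flag.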

Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: tianyi <fujunkang@linux.alibaba.com>
Reviewed-by: zhangliguang <zhangliguang@linux.alibaba.com>
Acked-by: zhangliguang <zhangliguang@linux.alibaba.com>

ACPI: PPTT: Consistently use unsigned int as parameter type · dfedf6da

Committed by Tian Tao on Dec 30, 2019

to #25688970

commit 643956e61ced913a2bbdcf2c95f3d03026b39d1c upstream

The fourth parameter 'level' of function 'acpi_find_cache_level()' is
a signed integer, but its caller 'acpi_find_cache_node()' passes that
parameter an unsigned integer.

Make the parameter type inconsistency go away.

Signed-off-by: Tian Tao <tiantao6@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
[ rjw: Subject/changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens · 428adfc0

Committed by Jeremy Linton on Jun 26, 2019

to #25688970

commit 56855a99f3d0d1e9f1f4e24f5851f9bf14c83296 upstream

ACPI 6.3 adds a flag to indicate that child nodes are all
identical cores. This is useful to authoritatively determine
if a set of (possibly offline) cores are identical or not.

Since the flag doesn't give us a unique id we can generate
one and use it to create bitmaps of sibling nodes, or simply
in a loop to determine if a subset of cores are identical.

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

ACPI/PPTT: Modify node flag detection to find last IDENTICAL · 03bca653

Committed by Jeremy Linton on Jun 26, 2019

to #25688970

commit ed2b664fcc8073c09394393756df3fc86977bbac upstream

The ACPI specification implies that the IDENTICAL flag should be
set on all non leaf nodes where the children are identical.
This means that we need to be searching for the last node with
the identical flag set rather than the first one.

Since this flag is also dependent on the table revision, we
need to add a bit of extra code to verify the table revision,
and the next node's state in the traversal. Since we want to
avoid function pointers here, let's just special case
the IDENTICAL flag.

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

ACPI: Fix comment typos · ef551233

Committed by Bjorn Helgaas on Mar 25, 2019

to #25688970

commit 603fadf33604a2e170eb833f99f569d3597f1f09 upstream

Fix some misspellings in comments.  No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

ACPI: tables: Simplify PPTT leaf node detection · e3ff3ded

Committed by Jeremy Linton on Mar 01, 2019

to #25688970

commit 4909e6df213a7c3e5e282538356f31ab68828793 upstream

ACPI 6.3 bumps the PPTT table revision and adds a LEAF_NODE flag.

This allows us to avoid a second pass through the table to ensure
that the node in question is a leaf.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

ACPI/PPTT: Add acpi_pptt_warn_missing() to consolidate logs · 92cceebb

Committed by John Garry on Feb 08, 2019

to #25688970

commit 6cafe700b08cfd261a279b9e5ed99f3a346fe3b0 upstream

For a system using ACPI-based FW without a PPTT, we may get many warnings
about the lack of a PPTT, as shown:

root@(none)$ dmesg | grep -i pptt
[    0.010125] ACPI PPTT: No PPTT table found, cpu topology may be inaccurate
[    7.138339] ACPI PPTT: No PPTT table found, cache topology may be inaccurate
[    7.145368] ACPI PPTT: No PPTT table found, cache topology may be inaccurate

These logs are generated with pr_warn_once(), so the intention was for a
single log, but the logs overlap, so consolidate them.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: zou cao <zoucao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

drm/amdgpu: add VM eviction lock v3 · d3dc53f4

Committed by Christian König on Mar 18, 2020

to #25447038

commit b4ff0f8a85f3c523942e57b716e8722e7f6799cc upstream.

This allows invalidating VM entries without taking the reservation
lock.

v3: use -EBUSY

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: mfkang <mfkang@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>

drm/amdgpu: move VM eviction decision into amdgpu_vm.c · 50638c21

Committed by Christian König on Mar 19, 2020

to #25447038

commit 6ceeb144b1d6952a36afa6c29718beac575f2a3f upstream.

When a page table needs to be evicted, the VM code should
decide if that is possible or not.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: mfkang <mfkang@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>

drm/amdgpu: stop evicting busy PDs/PTs · c7723fa2

Committed by Christian König on Nov 07, 2018

to #25447038

commit 1bd4e4ca7bb8f681ff4e2b05c97ce975ccd781d6 upstream.

Otherwise we won't be able to cleanly handle page faults.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: mfkang <mfkang@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>

sysctl: handle overflow in proc_get_long · 662ef34f

Committed by Christian Brauner on Mar 07, 2019

fix #27124689

commit 7f2923c4f73f21cfd714d12a2d48de8c21f11cfe upstream.

proc_get_long() is a funny function.  It uses simple_strtoul() and for a
good reason.  proc_get_long() wants to always succeed the parse and
return the maybe incorrect value and the trailing characters to check
against a pre-defined list of acceptable trailing values.  However,
simple_strtoul() explicitly ignores overflows which can cause funny
things like the following to happen:

  echo 18446744073709551616 > /proc/sys/fs/file-max
  cat /proc/sys/fs/file-max
  0

(Which will cause your system to silently die behind your back.)

On the other hand kstrtoul() does do overflow detection but does not
return the trailing characters, and also fails the parse when anything
other than '\n' is a trailing character whereas proc_get_long() wants to
be more lenient.

Now, before adding another kstrtoul() function let's simply add a static
parsing helper, strtoul_lenient(), which:
 - fails on overflow with -ERANGE
 - returns the trailing characters to the caller

The reason why we should fail on ERANGE is that we already do a partial
fail on overflow right now.  Namely, when the TMPBUFLEN is exceeded.  So
we already reject values such as 184467440737095516160 (21 chars) but
accept values such as 18446744073709551616 (20 chars) but both are
overflows.  So we should just always reject 64bit overflows and not
special-case this based on the number of chars.
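The two properties listed above can be sketched in userspace as follows. This is only an approximation under stated assumptions: it is built on the C library's strtoull() rather than the kernel's internal integer parser, the name strtoul_lenient_sketch() is hypothetical, and, unlike the kernel code path, strtoull() also accepts leading whitespace and a sign.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical userspace approximation of a "lenient" unsigned parse:
 *  - returns -ERANGE when the digits do not fit in an unsigned long
 *  - otherwise stores the value and hands the trailing characters back
 *    through endp so the caller can validate them itself
 */
static int strtoul_lenient_sketch(const char *cp, char **endp,
				  unsigned int base, unsigned long *res)
{
	unsigned long long result;
	char *end;

	errno = 0;
	result = strtoull(cp, &end, (int)base);
	if (errno == ERANGE || result > ULONG_MAX)
		return -ERANGE;

	if (endp)
		*endp = end;
	*res = (unsigned long)result;
	return 0;
}
```

With this shape, both the 20-character and the 21-character overflowing inputs from the message are rejected the same way, while an input such as "4096\n" still parses and leaves the newline for the caller to inspect.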

Link: http://lkml.kernel.org/r/20190107222700.15954-2-christian@brauner.io
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

configs: align configs of aarch64 to x86_64 · d56f1adb

Committed by Shile Zhang on Apr 14, 2020

to #26536261

Keep the common configs the same between x86_64 and aarch64.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

configs: update aarch64 config · 753594c9

Committed by Shile Zhang on Apr 14, 2020

to #24582903

Update the aarch64 configs to reflect the gcc version and other minor changes.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge · a10b7c40

Committed by Yihao Wu on Apr 06, 2020

fix #25707555

commit 43e33924c38e8faeb0c12035481cb150e602e39d linux-next

Deleting a list entry within hlist_for_each_entry_safe is not safe unless
the next pointer (tmp) is protected too. It is not, because once hash_lock
is released, cache_clean may delete the entry that tmp points to.
cache_purge can then walk to a deleted entry and try to free it a second time.

Fix this bug by holding only the deleted entry's reference.
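One way to picture the safer traversal is a purge loop that never caches a next pointer across an unlock: it re-reads the list head each time the lock is retaken and touches only the entry it has just unlinked. The sketch below is a userspace approximation, not the SUNRPC code; struct cache_sketch, cache_insert() and cache_purge_sketch() are hypothetical names, a pthread mutex stands in for hash_lock, and free() stands in for dropping the entry's reference.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct entry {
	struct entry *next;
	int key;
};

struct cache_sketch {
	pthread_mutex_t lock;	/* stands in for hash_lock */
	struct entry *head;
};

/* Push a new entry at the head of the list; returns 0 on success. */
static int cache_insert(struct cache_sketch *c, int key)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->key = key;
	pthread_mutex_lock(&c->lock);
	e->next = c->head;
	c->head = e;
	pthread_mutex_unlock(&c->lock);
	return 0;
}

/* Remove every entry; returns how many were released. */
static int cache_purge_sketch(struct cache_sketch *c)
{
	int freed = 0;

	pthread_mutex_lock(&c->lock);
	while (c->head) {
		/* Re-read the head on every pass: no saved "tmp" pointer. */
		struct entry *e = c->head;

		c->head = e->next;	/* unlink while the lock is held */
		pthread_mutex_unlock(&c->lock);
		free(e);		/* release work done without the lock */
		freed++;
		pthread_mutex_lock(&c->lock);
	}
	pthread_mutex_unlock(&c->lock);
	return freed;
}
```

Because the only pointer that survives the unlocked window is the entry already removed from the list, a concurrent cleaner deleting other entries cannot leave the loop holding a stale pointer.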

Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: NeilBrown <neilb@suse.de>
[ cel: removed unused variable ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

sched: Avoid scale real weight down to zero · 9b83fd88

Committed by Michael Wang on Mar 27, 2020

fix #26198889

commit 26cf52229efc87e2effa9d788f9b33c40fb3358a linux-next

During our testing, we found a case where shares no longer work
correctly. The cgroup topology is:

  /sys/fs/cgroup/cpu/A		(shares=102400)
  /sys/fs/cgroup/cpu/A/B	(shares=2)
  /sys/fs/cgroup/cpu/A/B/C	(shares=1024)

  /sys/fs/cgroup/cpu/D		(shares=1024)
  /sys/fs/cgroup/cpu/D/E	(shares=1024)
  /sys/fs/cgroup/cpu/D/E/F	(shares=1024)

The same benchmark runs in groups C and F, no other tasks are
running, and the benchmark is capable of consuming all the CPUs.

We expected group C to win more CPU resources since it could
enjoy all the shares of group A, but it is F that wins much more.

The reason is that group B has its shares set to 2. Since
A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
A->cfs_rq.load.weight becomes very small.

And in calc_group_shares() we calculate shares as:

  load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
  shares = (tg_shares * load) / tg_weight;

Since 'cfs_rq->load.weight' is too small, the load becomes 0
after scaling down. Although 'tg_shares' is 102400, the shares of the se
that stands for group A on the root cfs_rq become 2.

Meanwhile, the se of D on the root cfs_rq is far bigger than 2, so it
wins the battle.

Thus when scale_load_down() scales the real weight down to 0, it no
longer tells the real story; the caller gets the wrong
information and the calculation becomes buggy.

This patch adds a check in scale_load_down() so that the real weight will
be >= MIN_SHARES after scaling. With the patch applied, group C wins as
expected.
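The arithmetic behind the problem and the clamp can be sketched in standalone C. The constants and helper names here are assumptions for illustration: a 10-bit fixed-point shift (as used for load weights on 64-bit kernels), MIN_SHARES of 2, and a hypothetical 96-CPU machine; scale_load_down_old() and scale_load_down_clamped() are illustrative names, not the kernel macros themselves.

```c
#define SCHED_FIXEDPOINT_SHIFT	10	/* assumed fixed-point resolution */
#define MIN_SHARES		2UL	/* assumed minimum share value */

/* Scale a user-visible weight up to the internal high-resolution form. */
static unsigned long scale_load(unsigned long w)
{
	return w << SCHED_FIXEDPOINT_SHIFT;
}

/* Plain shift: small non-zero weights collapse to 0. */
static unsigned long scale_load_down_old(unsigned long w)
{
	return w >> SCHED_FIXEDPOINT_SHIFT;
}

/*
 * Clamped variant: a non-zero weight never scales below MIN_SHARES,
 * while a genuinely zero weight stays zero.
 */
static unsigned long scale_load_down_clamped(unsigned long w)
{
	if (w) {
		unsigned long scaled = w >> SCHED_FIXEDPOINT_SHIFT;

		w = scaled > MIN_SHARES ? scaled : MIN_SHARES;
	}
	return w;
}
```

For shares of 2 spread over the assumed 96 CPUs, the internal weight is scale_load(2) / 96, which is small enough that the plain shift yields 0 while the clamped version yields MIN_SHARES, so the later shares calculation still sees a non-zero load.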

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
Acked-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>

sched/fair: Fix race between runtime distribution and assignment · 70a23044

Committed by Huaixin Chang on Mar 24, 2020

fix #25892693

commit 26a8b12747c975b33b4a82d62e4a307e1c07f31b upstream

Currently, there is a potential race between distribute_cfs_runtime()
and assign_cfs_rq_runtime(). The race happens when cfs_b->runtime is
read, runtime is distributed without holding the lock, and afterwards
there turns out to be not enough runtime left to charge against,
because assign_cfs_rq_runtime() might be called during distribution and
use cfs_b->runtime at the same time.

Fibtest is a tool to test this race. Assume all gcfs_rqs are throttled
and the cfs period timer runs. Slow threads might run and sleep, returning
unused cfs_rq runtime while keeping min_cfs_rq_runtime in their local
pool. If all this happens sufficiently quickly, cfs_b->runtime will drop
a lot. If the runtime distributed is large too, runtime is over-used.

Runtime over-use of about 70 percent of quota is seen when we
run fibtest on a 96-core machine. We run fibtest with 1 fast thread and
95 slow threads in the test group, configure a 10ms quota for this group,
and see that the CPU usage of fibtest is 17.0%, which is far more than the
expected 10%.

On a smaller machine with 32 cores, we also run fibtest with 96
threads. CPU usage is more than 12%, which is also more than the expected
10%. This shows that on similar workloads, this race does affect CPU
bandwidth control.

Solve this by holding the lock inside distribute_cfs_runtime().
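The idea of the fix can be sketched in userspace as follows: every withdrawal from the shared pool, whether it comes from the distribution loop or from an assignment path, happens under the same lock, so the total handed out can never exceed what the pool held. The names struct bandwidth_sketch, take_runtime() and distribute_runtime_sketch() are hypothetical, and a pthread mutex stands in for the kernel lock; this is not the scheduler code.

```c
#include <pthread.h>

struct bandwidth_sketch {
	pthread_mutex_t lock;	/* protects runtime */
	long runtime;		/* runtime left in the shared pool */
};

/*
 * Take up to `want` units from the pool. The read, the clamp and the
 * subtraction all happen under the lock, so two callers cannot both
 * act on the same stale value of b->runtime.
 */
static long take_runtime(struct bandwidth_sketch *b, long want)
{
	long got;

	pthread_mutex_lock(&b->lock);
	got = want < b->runtime ? want : b->runtime;
	b->runtime -= got;
	pthread_mutex_unlock(&b->lock);
	return got;
}

/*
 * Hand runtime to n throttled queues. Each grant is charged against the
 * pool at the moment it is made, instead of against a snapshot taken
 * before the loop started. Returns the total amount granted.
 */
static long distribute_runtime_sketch(struct bandwidth_sketch *b,
				      const long *deficit, long *granted,
				      int n)
{
	long total = 0;

	for (int i = 0; i < n; i++) {
		granted[i] = take_runtime(b, deficit[i]);
		total += granted[i];
	}
	return total;
}
```

If another path withdraws runtime while the loop is running, later queues simply receive less (possibly nothing), rather than the pool being overdrawn.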

Fixes: c06f04c7 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
Link: https://lore.kernel.org/lkml/20200325092602.22471-1-changhuaixin@linux.alibaba.com/
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>

alinux: cgroup: Fix task_css_check rcu warnings · 798cfa76

Committed by Xunlei Pang on Mar 23, 2020

to #26424323

task_css() should be protected by RCU; fix several callers.

Fixes: 1f49a738 ("alinux: psi: Support PSI under cgroup v1")
Acked-by: Michael Wang <yun.wany@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>

21 Apr, 2020 1 commit

alinux: config: disable CONFIG_NFS_V3_ACL and CONFIG_NFSD_V3_ACL · 29846134

Committed by Chunmei Xu on Apr 20, 2020

to #26616987

Disable CONFIG_NFS_V3_ACL and CONFIG_NFSD_V3_ACL for aarch64
to match x86.

Signed-off-by: Chunmei Xu <xuchunmei@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

17 Apr, 2020 3 commits

alinux: kernel: reap zombie process by specified pid · ac2b5c94

Committed by zhongjiang-ali on Feb 26, 2020

to #26788859

We have met several real-world issues in which the child reaper
(i.e. systemd) gets stuck in some aborted state and cannot
reap its zombie children, so we provide an interface to reap
a zombie process by specifying its pid.

Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>

alinux: Fix an potential null pointer reference in dump_header · e483e6eb

Committed by zhongjiang-ali on Feb 25, 2020

to #26424311

Commit 5028e358 ("alinux: mm: oom_kill: show killed task's cgroup
info in global oom") introduces a potential null pointer dereference,
because the task 'p' may be a null pointer in some code paths.

Fixes: 5028e358 ("alinux: mm: oom_kill: show killed task's cgroup
info in global oom")
Signed-off-by: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>

mm: do not allow MADV_PAGEOUT for CoW pages · 7e691477

Committed by Michal Hocko on Mar 31, 2020

task #25182720

commit 12e967fd8e4e6c3d275b4c69c890adc838891300 upstream

Jann has brought up a very interesting point [1].  While shared pages
are excluded from MADV_PAGEOUT normally, CoW pages can be easily
reclaimed that way.  This can lead to all sorts of hard to debug
problems.  E.g.  performance problems outlined by Daniel [2].

There are runtime environments where a substantial amount of memory is
shared among security domains via CoW memory, and an easy way to reclaim
that memory, which MADV_{COLD,PAGEOUT} offers, can lead to either
performance degradation for the parent process, which might be more
privileged, or even open side channel attacks.

The feasibility of the latter is not really clear to me TBH but there is
no real reason for exposure at this stage.  It seems there is no real
use case to depend on reclaiming CoW memory via madvise at this stage so
it is much easier to simply disallow it and this is what this patch
does.  Put simply, MADV_{PAGEOUT,COLD} can operate only on
exclusively owned memory, which is a straightforward semantic.

[1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com
[2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com

Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200312082248.GS23944@dhcp22.suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
