提交 · 0c5c16c4533c9d27b97b9f947623b9c35e7c9377 · openeuler / Kernel

03 11月, 2022 29 次提交

netfilter: nfnetlink_osf: fix possible bogus match in nf_osf_find() · 0c5c16c4

由 Pablo Neira Ayuso 提交于 11月 03, 2022

stable inclusion
from stable-v5.10.146
commit 5d75fef3e61e797fab5c3fbba88caa74ab92ad47
category: bugfix
bugzilla: 187890, https://gitee.com/src-openeuler/kernel/issues/I5X2IJ
CVE: CVE-2022-42432

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5d75fef3e61e797fab5c3fbba88caa74ab92ad47

--------------------------------

[ Upstream commit 559c36c5 ]

nf_osf_find() incorrectly returns true on mismatch, this leads to
copying uninitialized memory area in nft_osf which can be used to leak
stale kernel stack data to userspace.

Fixes: 22c7652c ("netfilter: nft_osf: Add version option support")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NLu Wei <luwei32@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0c5c16c4

nilfs2: fix leak of nilfs_root in case of writer thread creation failure · 6c736224

由 Ryusuke Konishi 提交于 11月 03, 2022

mainline inclusion
from mainline-v6.1-rc1
commit d0d51a97
category: bugfix
bugzilla: 187884,https://gitee.com/src-openeuler/kernel/issues/I5X2OB
CVE: CVE-2022-3646

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0d51a97063db4704a5ef6bc978dddab1636a306

--------------------------------

If nilfs_attach_log_writer() failed to create a log writer thread, it
frees a data structure of the log writer without any cleanup.  After
commit e912a5b6 ("nilfs2: use root object to get ifile"), this causes
a leak of struct nilfs_root, which started to leak an ifile metadata inode
and a kobject on that struct.

In addition, if the kernel is booted with panic_on_warn, the above
ifile metadata inode leak will cause the following panic when the
nilfs2 kernel module is removed:

  kmem_cache_destroy nilfs2_inode_cache: Slab cache still has objects when
  called from nilfs_destroy_cachep+0x16/0x3a [nilfs2]
  WARNING: CPU: 8 PID: 1464 at mm/slab_common.c:494 kmem_cache_destroy+0x138/0x140
  ...
  RIP: 0010:kmem_cache_destroy+0x138/0x140
  Code: 00 20 00 00 e8 a9 55 d8 ff e9 76 ff ff ff 48 8b 53 60 48 c7 c6 20 70 65 86 48 c7 c7 d8 69 9c 86 48 8b 4c 24 28 e8 ef 71 c7 00 <0f> 0b e9 53 ff ff ff c3 48 81 ff ff 0f 00 00 77 03 31 c0 c3 53 48
  ...
  Call Trace:
   <TASK>
   ? nilfs_palloc_freev.cold.24+0x58/0x58 [nilfs2]
   nilfs_destroy_cachep+0x16/0x3a [nilfs2]
   exit_nilfs_fs+0xa/0x1b [nilfs2]
    __x64_sys_delete_module+0x1d9/0x3a0
   ? __sanitizer_cov_trace_pc+0x1a/0x50
   ? syscall_trace_enter.isra.19+0x119/0x190
   do_syscall_64+0x34/0x80
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
   ...
   </TASK>
  Kernel panic - not syncing: panic_on_warn set ...

This patch fixes these issues by calling nilfs_detach_log_writer() cleanup
function if spawning the log writer thread fails.

Link: https://lkml.kernel.org/r/20221007085226.57667-1-konishi.ryusuke@gmail.com
Fixes: e912a5b6 ("nilfs2: use root object to get ifile")
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+7381dc4ad60658ca4c05@syzkaller.appspotmail.com
Tested-by: NRyusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6c736224

net: mvpp2: fix mvpp2 debugfs leak · 510d05d8

由 Russell King (Oracle) 提交于 11月 03, 2022

mainline inclusion
from mainline-v6.0-rc7
commit 0152dfee
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5W7AX
CVE: CVE-2022-3535

--------------------------------

When mvpp2 is unloaded, the driver specific debugfs directory is not
removed, which technically leads to a memory leak. However, this
directory is only created when the first device is probed, so the
hardware is present. Removing the module is only something a developer
would to when e.g. testing out changes, so the module would be
reloaded. So this memory leak is minor.

The original attempt in commit fe2c9c61 ("net: mvpp2: debugfs: fix
memory leak when using debugfs_lookup()") that was labelled as a memory
leak fix was not, it fixed a refcount leak, but in doing so created a
problem when the module is reloaded - the directory already exists, but
mvpp2_root is NULL, so we lose all debugfs entries. This fix has been
reverted.

This is the alternative fix, where we remove the offending directory
whenever the driver is unloaded.

Fixes: 21da57a2 ("net: mvpp2: add a debugfs interface for the Header Parser")
Signed-off-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NMarcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/r/E1ofOAB-00CzkG-UO@rmk-PC.armlinux.org.ukSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NHui Tang <tanghui20@huawei.com>
Reviewed-by: Nzhangqiao <zhangqiao22@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

510d05d8

x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry · e930096f

由 Chen Zhongjin 提交于 11月 03, 2022

mainline inclusion
from mainline-v6.0-rc3
commit fc2e426b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q4RA
CVE: NA

--------------------------------

When meeting ftrace trampolines in ORC unwinding, unwinder uses address
of ftrace_{regs_}call address to find the ORC entry, which gets next frame at
sp+176.

If there is an IRQ hitting at sub $0xa8,%rsp, the next frame should be
sp+8 instead of 176. It makes unwinder skip correct frame and throw
warnings such as "wrong direction" or "can't access registers", etc,
depending on the content of the incorrect frame address.

By adding the base address ftrace_{regs_}caller with the offset
*ip - ops->trampoline*, we can get the correct address to find the ORC entry.

Also change "caller" to "tramp_addr" to make variable name conform to
its content.

[ mingo: Clarified the changelog a bit. ]

Fixes: 6be7fa3c ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines")
Signed-off-by: NChen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Reviewed-by: NSteven Rostedt (Google) <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.comSigned-off-by: NChen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e930096f

kprobes: don't call disarm_kprobe() for disabled kprobes · fe227102

由 Kuniyuki Iwashima 提交于 11月 03, 2022

mainline inclusion
from mainline-v6.0-rc3
commit 9c80e799
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q4Z0
CVE: NA

--------------------------------

The assumption in __disable_kprobe() is wrong, and it could try to disarm
an already disarmed kprobe and fire the WARN_ONCE() below. [0]  We can
easily reproduce this issue.

1. Write 0 to /sys/kernel/debug/kprobes/enabled.

  # echo 0 > /sys/kernel/debug/kprobes/enabled

2. Run execsnoop.  At this time, one kprobe is disabled.

  # /usr/share/bcc/tools/execsnoop &
  [1] 2460
  PCOMM            PID    PPID   RET ARGS

  # cat /sys/kernel/debug/kprobes/list
  ffffffff91345650  r  __x64_sys_execve+0x0    [FTRACE]
  ffffffff91345650  k  __x64_sys_execve+0x0    [DISABLED][FTRACE]

3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes
   kprobes_all_disarmed to false but does not arm the disabled kprobe.

  # echo 1 > /sys/kernel/debug/kprobes/enabled

  # cat /sys/kernel/debug/kprobes/list
  ffffffff91345650  r  __x64_sys_execve+0x0    [FTRACE]
  ffffffff91345650  k  __x64_sys_execve+0x0    [DISABLED][FTRACE]

4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the
   disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace().

  # fg
  /usr/share/bcc/tools/execsnoop
  ^C

Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses
some cleanups and leaves the aggregated kprobe in the hash table.  Then,
__unregister_trace_kprobe() initialises tk->rp.kp.list and creates an
infinite loop like this.

  aggregated kprobe.list -> kprobe.list -.
                                     ^    |
                                     '.__.'

In this situation, these commands fall into the infinite loop and result
in RCU stall or soft lockup.

  cat /sys/kernel/debug/kprobes/list : show_kprobe_addr() enters into the
                                       infinite loop with RCU.

  /usr/share/bcc/tools/execsnoop : warn_kprobe_rereg() holds kprobe_mutex,
                                   and __get_valid_kprobe() is stuck in
				   the loop.

To avoid the issue, make sure we don't call disarm_kprobe() for disabled
kprobes.

[0]
Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2)
WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
Modules linked in: ena
CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ #28
Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94
RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001
RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff
RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff
R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40
R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000
FS:  00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
 __disable_kprobe (kernel/kprobes.c:1716)
 disable_kprobe (kernel/kprobes.c:2392)
 __disable_trace_kprobe (kernel/trace/trace_kprobe.c:340)
 disable_trace_kprobe (kernel/trace/trace_kprobe.c:429)
 perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168)
 perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295)
 _free_event (kernel/events/core.c:4971)
 perf_event_release_kernel (kernel/events/core.c:5176)
 perf_release (kernel/events/core.c:5186)
 __fput (fs/file_table.c:321)
 task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1))
 exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201)
 syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296)
 do_syscall_64 (arch/x86/entry/common.c:87)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7fe7ff210654
Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc
RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654
RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008
RBP: 0000000000000000 R08: 94ae31d6fda838a4 R0900007fe8001c9d30
R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600
R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560
</TASK>

Link: https://lkml.kernel.org/r/20220813020509.90805-1-kuniyu@amazon.com
Fixes: 69d54b91 ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Reported-by: NAyushman Dutta <ayudutta@amazon.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Kuniyuki Iwashima <kuni1840@gmail.com>
Cc: Ayushman Dutta <ayudutta@amazon.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NChen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fe227102

block: fix inaccurate io_ticks by set 'precise_iostat' · 05b2399b

由 Zhang Wensheng 提交于 11月 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5X6RT
CVE: NA

--------------------------------

After introducing commit 5b18b5a7 ("block: delete part_round_stats
and switch to less precise counting"), '%util' accounted by iostat
will be over reality data. In fact, the device is quite idle, but
iostat may show '%util' as a big number (e.g. 50%). It can produce by fio:

fio --name=1 --direct=1 --bs=4k --rw=read --filename=/dev/sda \
	   --thinktime=4ms --runtime=180
We fix this by using a switch(precise_iostat=1) to control whether or not
acconut ioticks precisely.

fixes: 5b18b5a7 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: NZhang Wensheng <zhangwensheng5@huawei.com>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

05b2399b

arm64: fix rodata=full · e45f1890

由 Mark Rutland 提交于 11月 03, 2022

mainline inclusion
from mainline-v6.0-rc3
commit 2e8cff0a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PPS3?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2e8cff0a0eee87b27f0cf87ad8310eb41b5886ab

--------------------------------

On arm64, "rodata=full" has been suppored (but not documented) since
commit:

  c55191e9 ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well")

As it's necessary to determine the rodata configuration early during
boot, arm64 has an early_param() handler for this, whereas init/main.c
has a __setup() handler which is run later.

Unfortunately, this split meant that since commit:

  f9a40b08 ("init/main.c: return 1 from handled __setup() functions")

... passing "rodata=full" would result in a spurious warning from the
__setup() handler (though RO permissions would be configured
appropriately).

Further, "rodata=full" has been broken since commit:

  0d6ea3ac ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")

... which caused strtobool() to parse "full" as false (in addition to
many other values not documented for the "rodata=" kernel parameter.

This patch fixes this breakage by:

* Moving the core parameter parser to an __early_param(), such that it
  is available early.

* Adding an (optional) arch hook which arm64 can use to parse "full".

* Updating the documentation to mention that "full" is valid for arm64.

* Having the core parameter parser handle "on" and "off" explicitly,
  such that any undocumented values (e.g. typos such as "ful") are
  reported as errors rather than being silently accepted.

Note that __setup() and early_param() have opposite conventions for
their return values, where __setup() uses 1 to indicate a parameter was
handled and early_param() uses 0 to indicate a parameter was handled.

Fixes: f9a40b08 ("init/main.c: return 1 from handled __setup() functions")
Fixes: 0d6ea3ac ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jagdish Gediya <jvgediya@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: NArd Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220817154022.3974645-1-mark.rutland@arm.comSigned-off-by: NWill Deacon <will@kernel.org>

conflicts:
	Documentation/admin-guide/kernel-parameters.txt
	arch/arm64/include/asm/setup.h
	arch/arm64/mm/mmu.c
	init/main.c
Signed-off-by: NXia Longlong <xialonglong1@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e45f1890

block: fix kabi broken in request_queue · f732aa57

由 Yu Kuai 提交于 11月 03, 2022

hulk inclusion
category: bugfix
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

--------------------------------
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f732aa57

blk-mq: fix kabi broken in blk_mq_tags · 55d9222b

由 Yu Kuai 提交于 11月 03, 2022

hulk inclusion
category: bugfix
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

--------------------------------
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

55d9222b

blk-mq: fix kabi broken in blk_mq_tag_set · bd9180c9

由 Yu Kuai 提交于 11月 03, 2022

hulk inclusion
category: bugfix
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

--------------------------------
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

bd9180c9

blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags · 9a02a415

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 0994c64e
category: bugfix
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0994c64eb4159ba019e7fedc7ba0dd6a69235b40

--------------------------------

Since it is now possible for a tagset to share a single set of tags, the
iter function should not re-iter the tags for the count of #hw queues in
that case. Rather it should just iter once.

Fixes: e155b0c2 ("blk-mq: Use shared tags for shared sbitmap support")
Reported-by: NKashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Tested-by: NKashyap Desai <kashyap.desai@broadcom.com>
Link: https://lore.kernel.org/r/1634550083-202815-1-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NZhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9a02a415

blk-mq-sched: Don't reference queue tagset in blk_mq_sched_tags_teardown() · d55cc591

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 8bdf7b3f
category: bugfix
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8bdf7b3fe1f48a2c1c212d4685903bba01409c0e

--------------------------------

We should not reference the queue tagset in blk_mq_sched_tags_teardown()
(see function comment) for the blk-mq flags, so use the passed flags
instead.

This solves a use-after-free, similarly fixed earlier (and since broken
again) in commit f0c1c4d2 ("blk-mq: fix use-after-free in
blk_mq_exit_sched").
Reported-by: NLinux Kernel Functional Testing <lkft@linaro.org>
Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: NAnders Roxell <anders.roxell@linaro.org>
Fixes: e155b0c2 ("blk-mq: Use shared tags for shared sbitmap support")
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1634890340-15432-1-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

Conflict:
	block/blk-mq-sched.c
Signed-off-by: NZhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d55cc591

blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags · c42786d6

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.17-rc1
commit fea9f92f
category: bugfix
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fea9f92f1748083cb82049ed503be30c3d3a9b69

--------------------------------

Kashyap reports high CPU usage in blk_mq_queue_tag_busy_iter() and callees
using megaraid SAS RAID card since moving to shared tags [0].

Previously, when shared tags was shared sbitmap, this function was less
than optimum since we would iter through all tags for all hctx's,
yet only ever match upto tagset depth number of rqs.

Since the change to shared tags, things are even less efficient if we have
parallel callers of blk_mq_queue_tag_busy_iter(). This is because in
bt_iter() -> blk_mq_find_and_get_req() there would be more contention on
accessing each request ref and tags->lock since they are now shared among
all HW queues.

Optimise by having separate calls to bt_for_each() for when we're using
shared tags. In this case no longer pass a hctx, as it is no longer
relevant, and teach bt_iter() about this.

Ming suggested something along the lines of this change, apart from a
different implementation.

[0] https://lore.kernel.org/linux-block/e4e92abbe9d52bcba6b8cc6c91c442cc@mail.gmail.com/Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Reported-and-tested-by: NKashyap Desai <kashyap.desai@broadcom.com>
Fixes: e155b0c2 ("blk-mq: Use shared tags for shared sbitmap support")
Link: https://lore.kernel.org/r/1638794990-137490-4-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

Conflict:
	block/blk-mq-tag.c
Signed-off-by: NZhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c42786d6

blk-mq: Stop using pointers for blk_mq_tags bitmap tags · 46efdd72

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit ae0f1a73
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae0f1a732f4a5db284e2af02c305255734efd19c

--------------------------------

Now that we use shared tags for shared sbitmap support, we don't require
the tags sbitmap pointers, so drop them.

This essentially reverts commit 222a5ae0 ("blk-mq: Use pointers for
blk_mq_tags bitmap tags").

Function blk_mq_init_bitmap_tags() is removed also, since it would be only
a wrappper for blk_mq_init_bitmaps().
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1633429419-228500-14-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

46efdd72

blk-mq: Use shared tags for shared sbitmap support · bb59b765

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit e155b0c2
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e155b0c238b20f0a866f4334d292656665836c8a

--------------------------------

Currently we use separate sbitmap pairs and active_queues atomic_t for
shared sbitmap support.

However a full sets of static requests are used per HW queue, which is
quite wasteful, considering that the total number of requests usable at
any given time across all HW queues is limited by the shared sbitmap depth.

As such, it is considerably more memory efficient in the case of shared
sbitmap to allocate a set of static rqs per tag set or request queue, and
not per HW queue.

So replace the sbitmap pairs and active_queues atomic_t with a shared
tags per tagset and request queue, which will hold a set of shared static
rqs.

Since there is now no valid HW queue index to be passed to the blk_mq_ops
.init and .exit_request callbacks, pass an invalid index token. This
changes the semantics of the APIs, such that the callback would need to
validate the HW queue index before using it. Currently no user of shared
sbitmap actually uses the HW queue index (as would be expected).
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Conflict: c6f9c0e2 ("blk-mq: allow hardware queue to get more tag
while sharing a tag set") is merged, which will cause lots of conflicts
for this patch, and in the meantime, the functionality will need to be
adapted in that patch.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

bb59b765

blk-mq: Always use blk_mq_is_sbitmap_shared · 3494cec3

由 Nikolay Borisov 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.13-rc1
commit 39aa56db
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39aa56db50b9ca5cad597e561b4b160b6cbbb65b

--------------------------------
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20210311081713.2763171-1-nborisov@suse.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3494cec3

blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}() · b3260e04

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 645db34e
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=645db34e50501aac141713fb47a315e5202ff890

--------------------------------

Refactor blk_mq_free_map_and_requests() such that it can be used at many
sites at which the tag map and rqs are freed.

Also rename to blk_mq_free_map_and_rqs(), which is shorter and matches the
alloc equivalent.
Suggested-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-12-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

Conflict: commit a846a8e6 ("blk-mq: don't free tags if the tag_set is
used by other device in queue initialztion") is already backported,
blk_mq_free_map_and_rqs() is moved to __blk_mq_update_nr_hw_queues()
instead of blk_mq_realloc_hw_ctxs().
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b3260e04

blk-mq: Add blk_mq_alloc_map_and_rqs() · 625089f4

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 63064be1
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=63064be150e4b1ba1e4af594ef5aa81adf21a52d

--------------------------------

Add a function to combine allocating tags and the associated requests,
and factor out common patterns to use this new function.

Some function only call blk_mq_alloc_map_and_rqs() now, but more
functionality will be added later.

Also make blk_mq_alloc_rq_map() and blk_mq_alloc_rqs() static since they
are only used in blk-mq.c, and finally rename some functions for
conciseness and consistency with other function names:
- __blk_mq_alloc_map_and_{request -> rqs}()
- blk_mq_alloc_{map_and_requests -> set_map_and_rqs}()
Suggested-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-11-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

625089f4

blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap() · 61034a65

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit a7e7388d
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a7e7388dced47a10ca13ae95ca975ea2830f196b

--------------------------------

Put the functionality to update the sched shared sbitmap size in a common
function.

Since the same formula is always used to resize, and it can be got from
the request queue argument, so just pass the request queue pointer.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-10-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

61034a65

blk-mq: Don't clear driver tags own mapping · 55419a90

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 4f245d5b
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4f245d5bf0f7432c881e22a77066160a6cba8e03

--------------------------------

Function blk_mq_clear_rq_mapping() is required to clear the sched tags
mappings in driver tags rqs[].

But there is no need for a driver tags to clear its own mapping, so skip
clearing the mapping in this scenario.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-9-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

55419a90

blk-mq: Pass driver tags to blk_mq_clear_rq_mapping() · dcaefd5d

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit f32e4eaf
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f32e4eafaf29aa493bdea3123dac09ad135b599b

--------------------------------

Function blk_mq_clear_rq_mapping() will be used for shared sbitmap tags
in future, so pass a driver tags pointer instead of the tagset container
and HW queue index.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-8-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

dcaefd5d

blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}() · 456e4454

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 1820f4f0
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1820f4f0a5e75633ff384167027bf45ed1b10234

--------------------------------

To be more concise and consistent in naming, rename
blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs().
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-7-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

456e4454

blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}() · 49bf36b8

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit d99a6bb3
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d99a6bb337677d812d5bef7795c9fcf17f1ccebe

--------------------------------

Function blk_mq_sched_alloc_tags() does same as
__blk_mq_alloc_map_and_request(), so give a similar name to be consistent.

Similarly rename label err_free_tags -> err_free_map_and_rqs.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-6-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

49bf36b8

blk-mq: Invert check in blk_mq_update_nr_requests() · dad99ea7

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.19-rc1
commit f6adcef5
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f6adcef5f317c78a899ca9766e90e47026834158

--------------------------------

It's easier to read:

if (x)
	X;
else
	Y;

over:

if (!x)
	Y;
else
	X;

No functional change intended.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-5-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

dad99ea7

blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests() · 76bea5f8

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 8fa04464
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8fa044640f128c0cd015f72cda42989a664889fc

--------------------------------

For shared sbitmap, if the call to blk_mq_tag_update_depth() was
successful for any hctx when hctx->sched_tags is not set, then it would be
successful for all (due to nature in which blk_mq_tag_update_depth()
fails).

As such, there is no need to call blk_mq_tag_resize_shared_sbitmap() for
each hctx. So relocate the call until after the hctx iteration under the
!q->elevator check, which is equivalent (to !hctx->sched_tags).
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-4-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

76bea5f8

blk-mq: Change rqs check in blk_mq_free_rqs() · 1bc13e6d

由 John Garry 提交于 11月 03, 2022

mainline inclusion
from mainline-v5.16-rc1
commit 65de57bb
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=65de57bb2e66f1fbede166c1307570ebd09eae83

--------------------------------

The original code in commit 24d2f903 ("blk-mq: split out tag
initialization, support shared tags") would check tags->rqs is non-NULL and
then dereference tags->rqs[].

Then in commit 2af8cbe3 ("blk-mq: split tag ->rqs[] into two"), we
started to dereference tags->static_rqs[], but continued to check non-NULL
tags->rqs.

Check tags->static_rqs as non-NULL instead, which is more logical.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-2-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1bc13e6d

Revert "blk-mq: fix kabi broken by "blk-mq: Use request queue-wide tags for tagset-wide sbitmap"" · 15a3940b

由 Zhang Wensheng 提交于 11月 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

--------------------------------

This reverts commit 26eda960.

The related feilds will be modified in later patches which are
backported from mainline.
Signed-off-by: NZhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

15a3940b

block: fix null-deref in percpu_ref_put · 51e35e67

由 Zhang Wensheng 提交于 11月 03, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5N162
CVE: NA

--------------------------------

In the use of q_usage_counter of request_queue, blk_cleanup_queue using
"wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter))"
to wait q_usage_counter becoming zero. however, if the q_usage_counter
becoming zero quickly, and percpu_ref_exit will execute and ref->data
will be freed, maybe another process will cause a null-defef problem
like below:

	CPU0                             CPU1
blk_cleanup_queue
 blk_freeze_queue
  blk_mq_freeze_queue_wait
				scsi_end_request
				 percpu_ref_get
				 ...
				 percpu_ref_put
				  atomic_long_sub_and_test
  percpu_ref_exit
   ref->data -> NULL
   				   ref->data->release(ref) -> null-deref

Fix it by setting flag(QUEUE_FLAG_USAGE_COUNT_SYNC) to add synchronization
mechanism, when ref->data->release is called, the flag will be setted,
and the "wait_event" in blk_mq_freeze_queue_wait must wait flag becoming
true as well, which will limit percpu_ref_exit to execute ahead of time.
Signed-off-by: NZhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

51e35e67

!161 SPR: IPI Virtualization Support · 3eb8e5e6

由 openeuler-ci-bot 提交于 11月 03, 2022

Merge Pull Request from: @x56Jason 
 
This PR is to enable IPI virtualization support for Intel SPR, meanwhile fix kabi changes.

## Intel-Kernel Issue
#I5ODSC

## Test

1, build and boot succeed with CONFIG_SMP enabled or disabled.
2, Use IPI benchmark (https://lore.kernel.org/kvm/20171219085010.4081-1-ynorov@caviumnetworks.com) to do unicast IPI testing ("Normal IPI" case in the benchmark):
 - With host disabled IPI virtualization, and guest enable x2apic mode, we can see a lot of MSR_WR_VMEXITs
 - With host enabled IPI virtualization, and guest enable x2apic mode, MSR_WR_VMEXITs caused by IPI disappears.

## Known Issue
N/A

## Default Config Change
N/A
 
 
Link:https://gitee.com/openeuler/kernel/pulls/161 
Reviewed-by: Chen Wei <chenwei@xfusion.com> 
Reviewed-by: Kevin Zhu <zhukeqian1@huawei.com> 
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>

3eb8e5e6

02 11月, 2022 11 次提交

x86: Use -mindirect-branch-cs-prefix for RETPOLINE builds · 0d3b47a1

由 Peter Zijlstra 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.133
commit 6849ed81a33ae616bfadf40e5c68af265d106dff
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6849ed81a33ae616bfadf40e5c68af265d106dff

--------------------------------

commit 68cf4f2a upstream.

In order to further enable commit:

  bbe2df3f ("x86/alternative: Try inline spectre_v2=retpoline,amd")

add the new GCC flag -mindirect-branch-cs-prefix:

  https://gcc.gnu.org/g:2196a681d7810ad8b227bf983f38ba716620545e
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
  https://bugs.llvm.org/show_bug.cgi?id=52323

to RETPOLINE=y builds. This should allow fully inlining retpoline,amd
for GCC builds.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NKees Cook <keescook@chromium.org>
Acked-by: NNick Desaulniers <ndesaulniers@google.com>
Link: https://lkml.kernel.org/r/20211119165630.276205624@infradead.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

0d3b47a1

x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit · 428bc200

由 Jiri Slaby 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.133
commit ecc0d92a9f6cc3f74b67d2c9887d0c800018e661
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ecc0d92a9f6cc3f74b67d2c9887d0c800018e661

--------------------------------

commit 3131ef39 upstream.

The build on x86_32 currently fails after commit

  9bb2ec60 (objtool: Update Retpoline validation)

with:

  arch/x86/kernel/../../x86/xen/xen-head.S:35: Error: no such instruction: `annotate_unret_safe'

ANNOTATE_UNRET_SAFE is defined in nospec-branch.h. And head_32.S is
missing this include. Fix this.

Fixes: 9bb2ec60 ("objtool: Update Retpoline validation")
Signed-off-by: NJiri Slaby <jslaby@suse.cz>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/63e23f80-033f-f64e-7522-2816debbc367@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

428bc200

objtool: Fix objtool regression on x32 systems · c752c887

由 Mikulas Patocka 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.133
commit 236b959da9d145c311f9daa65e3fbbe1a3c9c9af
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=236b959da9d145c311f9daa65e3fbbe1a3c9c9af

--------------------------------

commit 22682a07 upstream.

Commit c087c6e7 ("objtool: Fix type of reloc::addend") failed to
appreciate cross building from ILP32 hosts, where 'int' == 'long' and
the issue persists.

As such, use s64/int64_t/Elf64_Sxword for this field and suffer the
pain that is ISO C99 printf formats for it.

Fixes: c087c6e7 ("objtool: Fix type of reloc::addend")
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
[peterz: reword changelog, s/long long/s64/]
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2205161041260.11556@file01.intranet.prod.int.rdu2.redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

c752c887

objtool: Fix symbol creation · 32553103

由 Peter Zijlstra 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.133
commit e1db6c8a69ec74aa0ecff19857895d10f39c6d98
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e1db6c8a69ec74aa0ecff19857895d10f39c6d98

--------------------------------

commit ead165fa upstream.

Nathan reported objtool failing with the following messages:

  warning: objtool: no non-local symbols !?
  warning: objtool: gelf_update_symshndx: invalid section index

The problem is due to commit 4abff6d4 ("objtool: Fix code relocs
vs weak symbols") failing to consider the case where an object would
have no non-local symbols.

The problem that commit tries to address is adding a STB_LOCAL symbol
to the symbol table in light of the ELF spec's requirement that:

  In each symbol table, all symbols with STB_LOCAL binding preced the
  weak and global symbols.  As ``Sections'' above describes, a symbol
  table section's sh_info section header member holds the symbol table
  index for the first non-local symbol.

The approach taken is to find this first non-local symbol, move that
to the end and then re-use the freed spot to insert a new local symbol
and increment sh_info.

Except it never considered the case of object files without global
symbols and got a whole bunch of details wrong -- so many in fact that
it is a wonder it ever worked :/

Specifically:

 - It failed to re-hash the symbol on the new index, so a subsequent
   find_symbol_by_index() would not find it at the new location and a
   query for the old location would now return a non-deterministic
   choice between the old and new symbol.

 - It failed to appreciate that the GElf wrappers are not a valid disk
   format (it works because GElf is basically Elf64 and we only
   support x86_64 atm.)

 - It failed to fully appreciate how horrible the libelf API really is
   and got the gelf_update_symshndx() call pretty much completely
   wrong; with the direct consequence that if inserting a second
   STB_LOCAL symbol would require moving the same STB_GLOBAL symbol
   again it would completely come unstuck.

Write a new elf_update_symbol() function that wraps all the magic
required to update or create a new symbol at a given index.

Specifically, gelf_update_sym*() require an @ndx argument that is
relative to the @data argument; this means you have to manually
iterate the section data descriptor list and update @ndx.

Fixes: 4abff6d4 ("objtool: Fix code relocs vs weak symbols")
Reported-by: NNathan Chancellor <nathan@kernel.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NJosh Poimboeuf <jpoimboe@kernel.org>
Tested-by: NNathan Chancellor <nathan@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/YoPCTEYjoPqE4ZxB@hirez.programming.kicks-ass.netSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 5.10: elf_hash_add() takes a hash table pointer,
 not just a name]
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

32553103

objtool: Fix type of reloc::addend · 732668c6

由 Peter Zijlstra 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.133
commit 3e8afd072d098958a507fb10251a201c9899150c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3e8afd072d098958a507fb10251a201c9899150c

--------------------------------

commit c087c6e7 upstream.

Elf{32,64}_Rela::r_addend is of type: Elf{32,64}_Sword, that means
that our reloc::addend needs to be long or face tuncation issues when
we do elf_rebuild_reloc_section():

  - 107:  48 b8 00 00 00 00 00 00 00 00   movabs $0x0,%rax        109: R_X86_64_64        level4_kernel_pgt+0x80000067
  + 107:  48 b8 00 00 00 00 00 00 00 00   movabs $0x0,%rax        109: R_X86_64_64        level4_kernel_pgt-0x7fffff99

Fixes: 627fce14 ("objtool: Add ORC unwind table generation")
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20220419203807.596871927@infradead.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

732668c6

objtool: Fix code relocs vs weak symbols · a300c54a

由 Peter Zijlstra 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.133
commit 42ec4d71353f4c6da1d94baf64acbb1badc16b11
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=42ec4d71353f4c6da1d94baf64acbb1badc16b11

--------------------------------

commit 4abff6d4 upstream.

Occasionally objtool driven code patching (think .static_call_sites
.retpoline_sites etc..) goes sideways and it tries to patch an
instruction that doesn't match.

Much head-scatching and cursing later the problem is as outlined below
and affects every section that objtool generates for us, very much
including the ORC data. The below uses .static_call_sites because it's
convenient for demonstration purposes, but as mentioned the ORC
sections, .retpoline_sites and __mount_loc are all similarly affected.

Consider:

foo-weak.c:

  extern void __SCT__foo(void);

  __attribute__((weak)) void foo(void)
  {
	  return __SCT__foo();
  }

foo.c:

  extern void __SCT__foo(void);
  extern void my_foo(void);

  void foo(void)
  {
	  my_foo();
	  return __SCT__foo();
  }

These generate the obvious code
(gcc -O2 -fcf-protection=none -fno-asynchronous-unwind-tables -c foo*.c):

foo-weak.o:
0000000000000000 <foo>:
   0:   e9 00 00 00 00          jmpq   5 <foo+0x5>      1: R_X86_64_PLT32       __SCT__foo-0x4

foo.o:
0000000000000000 <foo>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   e8 00 00 00 00          callq  9 <foo+0x9>      5: R_X86_64_PLT32       my_foo-0x4
   9:   48 83 c4 08             add    $0x8,%rsp
   d:   e9 00 00 00 00          jmpq   12 <foo+0x12>    e: R_X86_64_PLT32       __SCT__foo-0x4

Now, when we link these two files together, you get something like
(ld -r -o foos.o foo-weak.o foo.o):

foos.o:
0000000000000000 <foo-0x10>:
   0:   e9 00 00 00 00          jmpq   5 <foo-0xb>      1: R_X86_64_PLT32       __SCT__foo-0x4
   5:   66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
   f:   90                      nop

0000000000000010 <foo>:
  10:   48 83 ec 08             sub    $0x8,%rsp
  14:   e8 00 00 00 00          callq  19 <foo+0x9>     15: R_X86_64_PLT32      my_foo-0x4
  19:   48 83 c4 08             add    $0x8,%rsp
  1d:   e9 00 00 00 00          jmpq   22 <foo+0x12>    1e: R_X86_64_PLT32      __SCT__foo-0x4

Noting that ld preserves the weak function text, but strips the symbol
off of it (hence objdump doing that funny negative offset thing). This
does lead to 'interesting' unused code issues with objtool when ran on
linked objects, but that seems to be working (fingers crossed).

So far so good.. Now lets consider the objtool static_call output
section (readelf output, old binutils):

foo-weak.o:

Relocation section '.rela.static_call_sites' at offset 0x2c8 contains 1 entry:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000200000002 R_X86_64_PC32          0000000000000000 .text + 0
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

foo.o:

Relocation section '.rela.static_call_sites' at offset 0x310 contains 2 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000200000002 R_X86_64_PC32          0000000000000000 .text + d
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

foos.o:

Relocation section '.rela.static_call_sites' at offset 0x430 contains 4 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000100000002 R_X86_64_PC32          0000000000000000 .text + 0
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1
0000000000000008  0000000100000002 R_X86_64_PC32          0000000000000000 .text + 1d
000000000000000c  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

So we have two patch sites, one in the dead code of the weak foo and one
in the real foo. All is well.

*HOWEVER*, when the toolchain strips unused section symbols it
generates things like this (using new enough binutils):

foo-weak.o:

Relocation section '.rela.static_call_sites' at offset 0x2c8 contains 1 entry:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000200000002 R_X86_64_PC32          0000000000000000 foo + 0
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

foo.o:

Relocation section '.rela.static_call_sites' at offset 0x310 contains 2 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000200000002 R_X86_64_PC32          0000000000000000 foo + d
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

foos.o:

Relocation section '.rela.static_call_sites' at offset 0x430 contains 4 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000100000002 R_X86_64_PC32          0000000000000000 foo + 0
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1
0000000000000008  0000000100000002 R_X86_64_PC32          0000000000000000 foo + d
000000000000000c  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

And now we can see how that foos.o .static_call_sites goes side-ways, we
now have _two_ patch sites in foo. One for the weak symbol at foo+0
(which is no longer a static_call site!) and one at foo+d which is in
fact the right location.

This seems to happen when objtool cannot find a section symbol, in which
case it falls back to any other symbol to key off of, however in this
case that goes terribly wrong!

As such, teach objtool to create a section symbol when there isn't
one.

Fixes: 44f6a7c0 ("objtool: Fix seg fault with Clang non-section symbols")
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20220419203807.655552918@infradead.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

a300c54a

x86/alternative: Add debug prints to apply_retpolines() · d11b3bef

由 Peter Zijlstra 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.133
commit 38a80a3ca2cb069dd5608703b015a206a672aae5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=38a80a3ca2cb069dd5608703b015a206a672aae5

--------------------------------

commit d4b5a5c9 upstream.

Make sure we can see the text changes when booting with
'debug-alternative'.

Example output:

 [ ] SMP alternatives: retpoline at: __traceiter_initcall_level+0x1f/0x30 (ffffffff8100066f) len: 5 to: __x86_indirect_thunk_rax+0x0/0x20
 [ ] SMP alternatives: ffffffff82603e58: [2:5) optimized NOPs: ff d0 0f 1f 00
 [ ] SMP alternatives: ffffffff8100066f: orig: e8 cc 30 00 01
 [ ] SMP alternatives: ffffffff8100066f: repl: ff d0 0f 1f 00
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Tested-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.422273830@infradead.orgSigned-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

d11b3bef

x86/alternative: Try inline spectre_v2=retpoline,amd · 35148712

由 Peter Zijlstra 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.133
commit 3d13ee0d411a078ca1538d823c2c759b8b266fb1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3d13ee0d411a078ca1538d823c2c759b8b266fb1

--------------------------------

commit bbe2df3f upstream.

Try and replace retpoline thunk calls with:

  LFENCE
  CALL    *%\reg

for spectre_v2=retpoline,amd.

Specifically, the sequence above is 5 bytes for the low 8 registers,
but 6 bytes for the high 8 registers. This means that unless the
compilers prefix stuff the call with higher registers this replacement
will fail.

Luckily GCC strongly favours RAX for the indirect calls and most (95%+
for defconfig-x86_64) will be converted. OTOH clang strongly favours
R11 and almost nothing gets converted.

Note: it will also generate a correct replacement for the Jcc.d32
case, except unless the compilers start to prefix stuff that, it'll
never fit. Specifically:

  Jncc.d8 1f
  LFENCE
  JMP     *%\reg
1:

is 7-8 bytes long, where the original instruction in unpadded form is
only 6 bytes.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Tested-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.359986601@infradead.org
[cascardo: RETPOLINE_AMD was renamed to RETPOLINE_LFENCE]
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

35148712

x86/alternative: Handle Jcc __x86_indirect_thunk_\reg · 1efa9bcc

由 Peter Zijlstra 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.133
commit b0e2dc950654162bc68cec530156251e7ad3f03a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b0e2dc950654162bc68cec530156251e7ad3f03a

--------------------------------

commit 2f0cbb2a upstream.

Handle the rare cases where the compiler (clang) does an indirect
conditional tail-call using:

  Jcc __x86_indirect_thunk_\reg

For the !RETPOLINE case this can be rewritten to fit the original (6
byte) instruction like:

  Jncc.d8	1f
  JMP		*%\reg
  NOP
1:
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Tested-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.296470217@infradead.orgSigned-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

1efa9bcc

x86/insn-eval: Handle return values from the decoder · f5f8f3fc

由 Borislav Petkov 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.133
commit e6f8dc86a1c15b862486a61abcb54b88e8c177e3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YVKO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e6f8dc86a1c15b862486a61abcb54b88e8c177e3

--------------------------------

commit 6e8c83d2 upstream.

Now that the different instruction-inspecting functions return a value,
test that and return early from callers if error has been encountered.

While at it, do not call insn_get_modrm() when calling
insn_get_displacement() because latter will make sure to call
insn_get_modrm() if ModRM hasn't been parsed yet.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210304174237.31945-6-bp@alien8.deSigned-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

f5f8f3fc

x86/pat: Fix x86_has_pat_wp() · 866f85a9

由 Juergen Gross 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.132
commit 06a5dc3911a3b29acefd53470bdeccb88deb155e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YS3T

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=06a5dc3911a3b29acefd53470bdeccb88deb155e

--------------------------------

commit 230ec83d upstream.

x86_has_pat_wp() is using a wrong test, as it relies on the normal
PAT configuration used by the kernel. In case the PAT MSR has been
setup by another entity (e.g. Xen hypervisor) it might return false
even if the PAT configuration is allowing WP mappings. This due to the
fact that when running as Xen PV guest the PAT MSR is setup by the
hypervisor and cannot be changed by the guest. This results in the WP
related entry to be at a different position when running as Xen PV
guest compared to the bare metal or fully virtualized case.

The correct way to test for WP support is:

1. Get the PTE protection bits needed to select WP mode by reading
   __cachemode2pte_tbl[_PAGE_CACHE_MODE_WP] (depending on the PAT MSR
   setting this might return protection bits for a stronger mode, e.g.
   UC-)
2. Translate those bits back into the real cache mode selected by those
   PTE bits by reading __pte2cachemode_tbl[__pte2cm_idx(prot)]
3. Test for the cache mode to be _PAGE_CACHE_MODE_WP

Fixes: f88a68fa ("x86/mm: Extend early_memremap() support with additional attrs")
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.14
Link: https://lore.kernel.org/r/20220503132207.17234-1-jgross@suse.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

866f85a9

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功