提交 · d364232d409ae8f73dda1f1f9199d6bea8dfcb3c · openeuler / Kernel

08 7月, 2021 2 次提交

memcg: Add static key for memcg priority · d364232d

由 Jing Xiangfeng 提交于 7月 06, 2021

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZN3O
CVE: NA

--------------------------------------

This patch adds a default-false static key to disable memcg priority
feature. If you want to enable it by writing 1:

echo 1 > /proc/sys/vm/memcg_qos_enable
Signed-off-by: NJing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: NLiu Shixin <liushixin2@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d364232d

memcg: support priority for oom · e9373c8b

由 Jing Xiangfeng 提交于 7月 06, 2021

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZN3O
CVE: NA

--------------------------------------

We first kill the process from the low priority memcg if OOM occurs.
If the process is not found, then fallback to normal handle.
Signed-off-by: NJing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: NLiu Shixin <liushixin2@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e9373c8b

06 7月, 2021 9 次提交

iommu/vt-d: Fix general protection fault in aux_detach_device() · 172b3700

由 Liu Yi L 提交于 7月 02, 2021

mainline inclusion
from mainline-v5.11-rc3
commit 18abda7a
category: bugfix
bugzilla: 108082
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18abda7a2d555783d28ea1701f3ec95e96237a86

-------------------------------------------------------------------------

The aux-domain attach/detach are not tracked, some data structures might
be used after free. This causes general protection faults when multiple
subdevices are created and assigned to a same guest machine:

  | general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
  | RIP: 0010:intel_iommu_aux_detach_device+0x12a/0x1f0
  | [...]
  | Call Trace:
  |  iommu_aux_detach_device+0x24/0x70
  |  vfio_mdev_detach_domain+0x3b/0x60
  |  ? vfio_mdev_set_domain+0x50/0x50
  |  iommu_group_for_each_dev+0x4f/0x80
  |  vfio_iommu_detach_group.isra.0+0x22/0x30
  |  vfio_iommu_type1_detach_group.cold+0x71/0x211
  |  ? find_exported_symbol_in_section+0x4a/0xd0
  |  ? each_symbol_section+0x28/0x50
  |  __vfio_group_unset_container+0x4d/0x150
  |  vfio_group_try_dissolve_container+0x25/0x30
  |  vfio_group_put_external_user+0x13/0x20
  |  kvm_vfio_group_put_external_user+0x27/0x40 [kvm]
  |  kvm_vfio_destroy+0x45/0xb0 [kvm]
  |  kvm_put_kvm+0x1bb/0x2e0 [kvm]
  |  kvm_vm_release+0x22/0x30 [kvm]
  |  __fput+0xcc/0x260
  |  ____fput+0xe/0x10
  |  task_work_run+0x8f/0xb0
  |  do_exit+0x358/0xaf0
  |  ? wake_up_state+0x10/0x20
  |  ? signal_wake_up_state+0x1a/0x30
  |  do_group_exit+0x47/0xb0
  |  __x64_sys_exit_group+0x18/0x20
  |  do_syscall_64+0x57/0x1d0
  |  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix the crash by tracking the subdevices when attaching and detaching
aux-domains.

Fixes: 67b8e02b ("iommu/vt-d: Aux-domain specific domain attach/detach")
Co-developed-by: NXin Zeng <xin.zeng@intel.com>
Signed-off-by: NXin Zeng <xin.zeng@intel.com>
Signed-off-by: NLiu Yi L <yi.l.liu@intel.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1609949037-25291-3-git-send-email-yi.l.liu@intel.comSigned-off-by: NWill Deacon <will@kernel.org>
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

172b3700

seccomp/cache: Report cache data through /proc/pid/seccomp_cache · 39be1ac0

由 YiFei Zhu 提交于 6月 30, 2021

stable inclusion
from stable-5.11-rc1
commit 0d8315dd
bugzilla: 167382
CVE: N/A

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0d8315dddd2899f519fe1ca3d4d5cdaf44ea421e

-------------------------------------------------

Currently the kernel does not provide an infrastructure to translate
architecture numbers to a human-readable name. Translating syscall
numbers to syscall names is possible through FTRACE_SYSCALL
infrastructure but it does not provide support for compat syscalls.

This will create a file for each PID as /proc/pid/seccomp_cache.
The file will be empty when no seccomp filters are loaded, or be
in the format of:
<arch name> <decimal syscall number> <ALLOW | FILTER>
where ALLOW means the cache is guaranteed to allow the syscall,
and filter means the cache will pass the syscall to the BPF filter.

For the docker default profile on x86_64 it looks like:
x86_64 0 ALLOW
x86_64 1 ALLOW
x86_64 2 ALLOW
x86_64 3 ALLOW
[...]
x86_64 132 ALLOW
x86_64 133 ALLOW
x86_64 134 FILTER
x86_64 135 FILTER
x86_64 136 FILTER
x86_64 137 ALLOW
x86_64 138 ALLOW
x86_64 139 FILTER
x86_64 140 ALLOW
x86_64 141 ALLOW
[...]

This file is guarded by CONFIG_SECCOMP_CACHE_DEBUG with a default
of N because I think certain users of seccomp might not want the
application to know which syscalls are definitely usable. For
the same reason, it is also guarded by CAP_SYS_ADMIN.
Suggested-by: NJann Horn <jannh@google.com>
Link: https://lore.kernel.org/lkml/CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@mail.gmail.com/Signed-off-by: NYiFei Zhu <yifeifz2@illinois.edu>
Signed-off-by: NKees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/94e663fa53136f5a11f432c661794d1ee7060779.1605101222.git.yifeifz2@illinois.eduSigned-off-by: NGONG, Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

39be1ac0

mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare · d568aeab

由 Peter Xu 提交于 6月 30, 2021

stable inclusion
from stable-5.10.46
commit 12eb3c2c1a4f6e7c30de2aa0a09cb1b9e19fa9c0
bugzilla: 168323
CVE: NA

--------------------------------

commit 099dd687 upstream.

I found it by pure code review, that pte_same_as_swp() of unuse_vma()
didn't take uffd-wp bit into account when comparing ptes.
pte_same_as_swp() returning false negative could cause failure to
swapoff swap ptes that was wr-protected by userfaultfd.

Link: https://lkml.kernel.org/r/20210603180546.9083-1-peterx@redhat.com
Fixes: f45ec5ff ("userfaultfd: wp: support swap and page migration")
Signed-off-by: NPeter Xu <peterx@redhat.com>
Acked-by: NHugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>	[5.7+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d568aeab

mm: relocate 'write_protect_seq' in struct mm_struct · c6746072

由 Feng Tang 提交于 6月 30, 2021

stable inclusion
from stable-5.10.46
commit 103c4a08baec6723cf2d4999c873a1634f8d6bc0
bugzilla: 168323
CVE: NA

--------------------------------

[ Upstream commit 2e302543 ]

0day robot reported a 9.2% regression for will-it-scale mmap1 test
case[1], caused by commit 57efa1fe ("mm/gup: prevent gup_fast from
racing with COW during fork").

Further debug shows the regression is due to that commit changes the
offset of hot fields 'mmap_lock' inside structure 'mm_struct', thus some
cache alignment changes.

From the perf data, the contention for 'mmap_lock' is very severe and
takes around 95% cpu cycles, and it is a rw_semaphore

        struct rw_semaphore {
                atomic_long_t count;	/* 8 bytes */
                atomic_long_t owner;	/* 8 bytes */
                struct optimistic_spin_queue osq; /* spinner MCS lock */
                ...

Before commit 57efa1fe adds the 'write_protect_seq', it happens to
have a very optimal cache alignment layout, as Linus explained:

 "and before the addition of the 'write_protect_seq' field, the
  mmap_sem was at offset 120 in 'struct mm_struct'.

  Which meant that count and owner were in two different cachelines,
  and then when you have contention and spend time in
  rwsem_down_write_slowpath(), this is probably *exactly* the kind
  of layout you want.

  Because first the rwsem_write_trylock() will do a cmpxchg on the
  first cacheline (for the optimistic fast-path), and then in the
  case of contention, rwsem_down_write_slowpath() will just access
  the second cacheline.

  Which is probably just optimal for a load that spends a lot of
  time contended - new waiters touch that first cacheline, and then
  they queue themselves up on the second cacheline."

After the commit, the rw_semaphore is at offset 128, which means the
'count' and 'owner' fields are now in the same cacheline, and causes
more cache bouncing.

Currently there are 3 "#ifdef CONFIG_XXX" before 'mmap_lock' which will
affect its offset:

  CONFIG_MMU
  CONFIG_MEMBARRIER
  CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES

The layout above is on 64 bits system with 0day's default kernel config
(similar to RHEL-8.3's config), in which all these 3 options are 'y'.
And the layout can vary with different kernel configs.

Relayouting a structure is usually a double-edged sword, as sometimes it
can helps one case, but hurt other cases.  For this case, one solution
is, as the newly added 'write_protect_seq' is a 4 bytes long seqcount_t
(when CONFIG_DEBUG_LOCK_ALLOC=n), placing it into an existing 4 bytes
hole in 'mm_struct' will not change other fields' alignment, while
restoring the regression.

Link: https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/ [1]
Reported-by: Nkernel test robot <oliver.sang@intel.com>
Signed-off-by: NFeng Tang <feng.tang@intel.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Reviewed-by: NJason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c6746072

regulator: bd70528: Fix off-by-one for buck123 .n_voltages setting · f66b9197

由 Axel Lin 提交于 6月 30, 2021

stable inclusion
from stable-5.10.46
commit 0609c36696e7668d265c29ee88bad079201f700f
bugzilla: 168323
CVE: NA

--------------------------------

[ Upstream commit 0514582a ]

The valid selectors for bd70528 bucks are 0 ~ 0xf, so the .n_voltages
should be 16 (0x10). Use 0x10 to make it consistent with BD70528_LDO_VOLTS.
Also remove redundant defines for BD70528_BUCK_VOLTS.
Signed-off-by: NAxel Lin <axel.lin@ingics.com>
Acked-by: NMatti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Link: https://lore.kernel.org/r/20210523071045.2168904-1-axel.lin@ingics.comSigned-off-by: NMark Brown <broonie@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f66b9197

ptp: improve max_adj check against unreasonable values · c8b68d15

由 Jakub Kicinski 提交于 6月 30, 2021

stable inclusion
from stable-5.10.46
commit 9a479495629246c5dcfec55f7f425f5149f29ac0
bugzilla: 168323
CVE: NA

--------------------------------

[ Upstream commit 475b92f9 ]

Scaled PPM conversion to PPB may (on 64bit systems) result
in a value larger than s32 can hold (freq/scaled_ppm is a long).
This means the kernel will not correctly reject unreasonably
high ->freq values (e.g. > 4294967295ppb, 281474976645 scaled PPM).

The conversion is equivalent to a division by ~66 (65.536),
so the value of ppb is always smaller than ppm, but not small
enough to assume narrowing the type from long -> s32 is okay.

Note that reasonable user space (e.g. ptp4l) will not use such
high values, anyway, 4289046510ppb ~= 4.3x, so the fix is
somewhat pedantic.

Fixes: d39a7435 ("ptp: validate the requested frequency adjustment.")
Fixes: d94ba80e ("ptp: Added a brand new class driver for ptp clocks.")
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c8b68d15

net: make get_net_ns return error if NET_NS is disabled · 991bf60b

由 Changbin Du 提交于 6月 30, 2021

stable inclusion
from stable-5.10.46
commit 4abfd597fe60bfa677bfe177e3a6a551e3a3f792
bugzilla: 168323
CVE: NA

--------------------------------

[ Upstream commit ea6932d7 ]

There is a panic in socket ioctl cmd SIOCGSKNS when NET_NS is not enabled.
The reason is that nsfs tries to access ns->ops but the proc_ns_operations
is not implemented in this case.

[7.670023] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[7.670268] pgd = 32b54000
[7.670544] [00000010] *pgd=00000000
[7.671861] Internal error: Oops: 5 [#1] SMP ARM
[7.672315] Modules linked in:
[7.672918] CPU: 0 PID: 1 Comm: systemd Not tainted 5.13.0-rc3-00375-g6799d4f2 #16
[7.673309] Hardware name: Generic DT based system
[7.673642] PC is at nsfs_evict+0x24/0x30
[7.674486] LR is at clear_inode+0x20/0x9c

The same to tun SIOCGSKNS command.

To fix this problem, we make get_net_ns() return -EINVAL when NET_NS is
disabled. Meanwhile move it to right place net/core/net_namespace.c.
Signed-off-by: NChangbin Du <changbin.du@gmail.com>
Fixes: c62cce2c ("net: add an ioctl to get a socket network namespace")
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Suggested-by: NJakub Kicinski <kuba@kernel.org>
Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

991bf60b

net/mlx5e: Fix page reclaim for dead peer hairpin · c17cc0e6

由 Dima Chumak 提交于 6月 30, 2021

stable inclusion
from stable-5.10.46
commit be7f3f401d224e1efe8112b2fa8b837eeb8c5e52
bugzilla: 168323
CVE: NA

--------------------------------

[ Upstream commit a3e5fd93 ]

When adding a hairpin flow, a firmware-side send queue is created for
the peer net device, which claims some host memory pages for its
internal ring buffer. If the peer net device is removed/unbound before
the hairpin flow is deleted, then the send queue is not destroyed which
leads to a stack trace on pci device remove:

[ 748.005230] mlx5_core 0000:08:00.2: wait_func:1094:(pid 12985): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 748.005231] mlx5_core 0000:08:00.2: reclaim_pages:514:(pid 12985): failed reclaiming pages: err -110
[ 748.001835] mlx5_core 0000:08:00.2: mlx5_reclaim_root_pages:653:(pid 12985): failed reclaiming pages (-110) for func id 0x0
[ 748.002171] ------------[ cut here ]------------
[ 748.001177] FW pages counter is 4 after reclaiming all pages
[ 748.001186] WARNING: CPU: 1 PID: 12985 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:685 mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]                      [  +0.002771] Modules linked in: cls_flower mlx5_ib mlx5_core ptp pps_core act_mirred sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay fuse [last unloaded: pps_core]
[ 748.007225] CPU: 1 PID: 12985 Comm: tee Not tainted 5.12.0+ #1
[ 748.001376] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 748.002315] RIP: 0010:mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
[ 748.001679] Code: 28 00 00 00 0f 85 22 01 00 00 48 81 c4 b0 00 00 00 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 40 cc 19 a1 e8 9f 71 0e e2 <0f> 0b e9 30 ff ff ff 48 c7 c7 a0 cc 19 a1 e8 8c 71 0e e2 0f 0b e9
[ 748.003781] RSP: 0018:ffff88815220faf8 EFLAGS: 00010286
[ 748.001149] RAX: 0000000000000000 RBX: ffff8881b4900280 RCX: 0000000000000000
[ 748.001445] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102a441f51
[ 748.001614] RBP: 00000000000032b9 R08: 0000000000000001 R09: ffffed1054a15ee8
[ 748.001446] R10: ffff8882a50af73b R11: ffffed1054a15ee7 R12: fffffbfff07c1e30
[ 748.001447] R13: dffffc0000000000 R14: ffff8881b492cba8 R15: 0000000000000000
[ 748.001429] FS:  00007f58bd08b580(0000) GS:ffff8882a5080000(0000) knlGS:0000000000000000
[ 748.001695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 748.001309] CR2: 000055a026351740 CR3: 00000001d3b48006 CR4: 0000000000370ea0
[ 748.001506] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 748.001483] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 748.001654] Call Trace:
[ 748.000576]  ? mlx5_satisfy_startup_pages+0x290/0x290 [mlx5_core]
[ 748.001416]  ? mlx5_cmd_teardown_hca+0xa2/0xd0 [mlx5_core]
[ 748.001354]  ? mlx5_cmd_init_hca+0x280/0x280 [mlx5_core]
[ 748.001203]  mlx5_function_teardown+0x30/0x60 [mlx5_core]
[ 748.001275]  mlx5_uninit_one+0xa7/0xc0 [mlx5_core]
[ 748.001200]  remove_one+0x5f/0xc0 [mlx5_core]
[ 748.001075]  pci_device_remove+0x9f/0x1d0
[ 748.000833]  device_release_driver_internal+0x1e0/0x490
[ 748.001207]  unbind_store+0x19f/0x200
[ 748.000942]  ? sysfs_file_ops+0x170/0x170
[ 748.001000]  kernfs_fop_write_iter+0x2bc/0x450
[ 748.000970]  new_sync_write+0x373/0x610
[ 748.001124]  ? new_sync_read+0x600/0x600
[ 748.001057]  ? lock_acquire+0x4d6/0x700
[ 748.000908]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[ 748.001126]  ? fd_install+0x1c9/0x4d0
[ 748.000951]  vfs_write+0x4d0/0x800
[ 748.000804]  ksys_write+0xf9/0x1d0
[ 748.000868]  ? __x64_sys_read+0xb0/0xb0
[ 748.000811]  ? filp_open+0x50/0x50
[ 748.000919]  ? syscall_enter_from_user_mode+0x1d/0x50
[ 748.001223]  do_syscall_64+0x3f/0x80
[ 748.000892]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 748.001026] RIP: 0033:0x7f58bcfb22f7
[ 748.000944] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 748.003925] RSP: 002b:00007fffd7f2aaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 748.001732] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f58bcfb22f7
[ 748.001426] RDX: 000000000000000d RSI: 00007fffd7f2abc0 RDI: 0000000000000003
[ 748.001746] RBP: 00007fffd7f2abc0 R08: 0000000000000000 R09: 0000000000000001
[ 748.001631] R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000d
[ 748.001537] R13: 00005597ac2c24a0 R14: 000000000000000d R15: 00007f58bd084700
[ 748.001564] irq event stamp: 0
[ 748.000787] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[ 748.001399] hardirqs last disabled at (0): [<ffffffff813132cf>] copy_process+0x146f/0x5eb0
[ 748.001854] softirqs last  enabled at (0): [<ffffffff8131330e>] copy_process+0x14ae/0x5eb0
[ 748.013431] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 748.001492] ---[ end trace a6fabd773d1c51ae ]---

Fix by destroying the send queue of a hairpin peer net device that is
being removed/unbound, which returns the allocated ring buffer pages to
the host.

Fixes: 4d8fcf21 ("net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules")
Signed-off-by: NDima Chumak <dchumak@nvidia.com>
Reviewed-by: NRoi Dayan <roid@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c17cc0e6

blk-mq: add new API of blk_mq_hctx_set_fq_lock_class · 5476d7ce

由 Ming Lei 提交于 6月 28, 2021

mainline inclusion
from mainline-5.11-rc1
commit fb01a293
category: bugfix
bugzilla: 108493
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fb01a2932e81a1fb2273f87ff92dc8172b8880ee

---------------------------

flush_end_io() may be called recursively from some driver, such as
nvme-loop, so lockdep may complain 'possible recursive locking'.
Commit b3c6a599("block: Fix a lockdep complaint triggered by
request queue flushing") tried to address this issue by assigning
dynamically allocated per-flush-queue lock class. This solution
adds synchronize_rcu() for each hctx's release handler, and causes
horrible SCSI MQ probe delay(more than half an hour on megaraid sas).

Add new API of blk_mq_hctx_set_fq_lock_class() for these drivers, so
we just need to use driver specific lock class for avoiding the
lockdep warning of 'possible recursive locking'.
Tested-by: NKashyap Desai <kashyap.desai@broadcom.com>
Reported-by: NQian Cai <cai@redhat.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: Nyangerkun <yangerkun@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

5476d7ce

03 7月, 2021 13 次提交

scsi: remove unused kobj map for sd devie to avoid memleak · ecb923f0

由 Yufen Yu 提交于 7月 01, 2021

hulk inclusion
category: bugfix
bugzilla: 168625
CVE: NA

-------------------------------------------------

After calling add_disk, we have register new kobj map for sd device,
then we can remove old unused kobj map which probed by sd_remove.
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ecb923f0

mm/zswap: add the flag can_sleep_mapped · 1a1d0a85

由 Tian Tao 提交于 6月 17, 2021

mainline inclusion
from mainline-5.12-rc1
commit fc6697a8
category: bugfix
bugzilla: 107205
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fc6697a89f56d9773b2fbff718d4cf2a6d63379d

-------------------------------------------------

Patch series "Fix the compatibility of zsmalloc and zswap".

Patch #1 adds a flag to zpool, then zswap used to determine if zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.

The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context that since its map function holds a preempt-disabled,
but zbud/z3fold don't require an atomic context.  So patch #2 sets flag
sleep_mapped to true indicating that zbud/z3fold can sleep after mapping.
zsmalloc didn't support sleep after mapping, so don't set that flag to
true.

This patch (of 2):

Add a flag to zpool, named is "can_sleep_mapped", and have it set true for
zbud/z3fold, not set this flag for zsmalloc, so its default value is
false.  Then zswap could go the current path if the flag is true; and if
it's false, copy data from src to a temporary buffer, then unmap the
handle, take the mutex, process the buffer instead of src to avoid
sleeping function called from atomic context.

[natechancellor@gmail.com: add return value in zswap_frontswap_load]
  Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
  Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
  Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
  Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.comLink: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com

Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.comSigned-off-by: NTian Tao <tiantao6@hisilicon.com>
Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Reviewed-by: NVitaly Wool <vitaly.wool@konsulko.com>
Acked-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: NMike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NChen Huang <chenhuang5@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1a1d0a85

bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper · a9c81664

由 Yonghong Song 提交于 6月 17, 2021

mainline inclusion
from mainline-v5.13-rc1
commit b910eaaa
category: bugfix
bugzilla: 106537
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b910eaaaa4b89976ef02e5d6448f3f73dc671d91

-------------------------------------------------

Jiri Olsa reported a bug ([1]) in kernel where cgroup local
storage pointer may be NULL in bpf_get_local_storage() helper.
There are two issues uncovered by this bug:
  (1). kprobe or tracepoint prog incorrectly sets cgroup local storage
       before prog run,
  (2). due to change from preempt_disable to migrate_disable,
       preemption is possible and percpu storage might be overwritten
       by other tasks.

This issue (1) is fixed in [2]. This patch tried to address issue (2).
The following shows how things can go wrong:
  task 1:   bpf_cgroup_storage_set() for percpu local storage
         preemption happens
  task 2:   bpf_cgroup_storage_set() for percpu local storage
         preemption happens
  task 1:   run bpf program

task 1 will effectively use the percpu local storage setting by task 2
which will be either NULL or incorrect ones.

Instead of just one common local storage per cpu, this patch fixed
the issue by permitting 8 local storages per cpu and each local
storage is identified by a task_struct pointer. This way, we
allow at most 8 nested preemption between bpf_cgroup_storage_set()
and bpf_cgroup_storage_unset(). The percpu local storage slot
is released (calling bpf_cgroup_storage_unset()) by the same task
after bpf program finished running.
bpf_test_run() is also fixed to use the new bpf_cgroup_storage_set()
interface.

The patch is tested on top of [2] with reproducer in [1].
Without this patch, kernel will emit error in 2-3 minutes.
With this patch, after one hour, still no error.

 [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T
 [2] https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.comSigned-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NRoman Gushchin <guro@fb.com>
Link: https://lore.kernel.org/bpf/20210323055146.3334476-1-yhs@fb.com

Conflicts:
    include/linux/bpf.h
    net/bpf/test_run.c
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a9c81664

gpu: host1x: Split up client initalization and registration · f18fdb2b

由 Thierry Reding 提交于 6月 22, 2021

stable inclusion
from stable-5.10.45
commit 9c1d492baa9128b970524b9baa4b9baa7dc01c64
bugzilla: 109305
CVE: NA

--------------------------------

[ Upstream commit 0cfe5a6e ]

In some cases we may need to initialize the host1x client first before
registering it. This commit adds a new helper that will do nothing but
the initialization of the data structure.

At the same time, the initialization is removed from the registration
function. Note, however, that for simplicity we explicitly initialize
the client when the host1x_client_register() function is called, as
opposed to the low-level __host1x_client_register() function. This
allows existing callers to remain unchanged.
Signed-off-by: NThierry Reding <treding@nvidia.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f18fdb2b

HID: usbhid: fix info leak in hid_submit_ctrl · 2e859113

由 Anirudh Rayabharam 提交于 6月 22, 2021

stable inclusion
from stable-5.10.45
commit b1e3596416d74ce95cc0b7b38472329a3818f8a9
bugzilla: 109305
CVE: NA

--------------------------------

[ Upstream commit 6be388f4 ]

In hid_submit_ctrl(), the way of calculating the report length doesn't
take into account that report->size can be zero. When running the
syzkaller reproducer, a report of size 0 causes hid_submit_ctrl) to
calculate transfer_buffer_length as 16384. When this urb is passed to
the usb core layer, KMSAN reports an info leak of 16384 bytes.

To fix this, first modify hid_report_len() to account for the zero
report size case by using DIV_ROUND_UP for the division. Then, call it
from hid_submit_ctrl().

Reported-by: syzbot+7c2bb71996f95a82524c@syzkaller.appspotmail.com
Signed-off-by: NAnirudh Rayabharam <mail@anirudhrb.com>
Acked-by: NBenjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2e859113

livepatch: fix unload hook could not be excuted · 9c222ac0

由 Ye Weihua 提交于 6月 23, 2021

hulk inclusion
category: bugfix
bugzilla: 110621
CVE: NA

--------------------------------

Livepatch can add some hook functions when inserting and disabling the
patch. The hook functions called during inserting is named load hooks,
and the hook functions called during disabling is named unload hooks.

During the test, it is found that unload hooks is not executed. The
reason is that the __klp_free_objects() is called before
klp_free_patch_finish() is executed. This function deletes obj from the
patch list. Therefore, klp_for_each_object in klp_free_patch_finish()
cannot fund obj. As a result, the klp_unload_hook() is not executed.
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9c222ac0

iommu: sva: Fix compile error in iommu_sva_bind_group · 27abd90f

由 Lijun Fang 提交于 6月 22, 2021

hulk inclusion
category: bugfix
bugzilla: 51855
CVE: NA

---------------------------------------------

iommu_sva_bind_group should return pointer rather than an integer,
so it will cause compile error, if not defined CONFIG_IOMMU_API

Fixes: 45e52e3fc545 ("iommu: Add group variant for SVA bind/unbind")
Signed-off-by: NLijun Fang <fanglijun3@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

27abd90f

kvm: fix previous commit for 32-bit builds · 688b677d

由 Paolo Bonzini 提交于 6月 18, 2021

stable inclusion
from stable-5.10.44
commit 9064c9d544b906a63e359db5f908594bc07580e6
bugzilla: 109295
CVE: NA

--------------------------------

commit 4422829e upstream.

array_index_nospec does not work for uint64_t on 32-bit builds.
However, the size of a memory slot must be less than 20 bits wide
on those system, since the memory slot must fit in the user
address space.  So just store it in an unsigned long.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

688b677d

sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling · ba635552

由 Dietmar Eggemann 提交于 6月 18, 2021

stable inclusion
from stable-5.10.44
commit 190a7f908993cc7f5e8bbe67aa88eeb0bdbbbaa5
bugzilla: 109295
CVE: NA

--------------------------------

commit 68d7a190 upstream.

The util_est internal UTIL_AVG_UNCHANGED flag which is used to prevent
unnecessary util_est updates uses the LSB of util_est.enqueued. It is
exposed via _task_util_est() (and task_util_est()).

Commit 92a801e5 ("sched/fair: Mask UTIL_AVG_UNCHANGED usages")
mentions that the LSB is lost for util_est resolution but
find_energy_efficient_cpu() checks if task_util_est() returns 0 to
return prev_cpu early.

_task_util_est() returns the max value of util_est.ewma and
util_est.enqueued or'ed w/ UTIL_AVG_UNCHANGED.
So task_util_est() returning the max of task_util() and
_task_util_est() will never return 0 under the default
SCHED_FEAT(UTIL_EST, true).

To fix this use the MSB of util_est.enqueued instead and keep the flag
util_est internal, i.e. don't export it via _task_util_est().

The maximal possible util_avg value for a task is 1024 so the MSB of
'unsigned int util_est.enqueued' isn't used to store a util value.

As a caveat the code behind the util_est_se trace point has to filter
UTIL_AVG_UNCHANGED to see the real util_est.enqueued value which should
be easy to do.

This also fixes an issue report by Xuewen Yan that util_est_update()
only used UTIL_AVG_UNCHANGED for the subtrahend of the equation:

  last_enqueued_diff = ue.enqueued - (task_util() | UTIL_AVG_UNCHANGED)

Fixes: b89997aa sched/pelt: Fix task util_est update filtering
Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NXuewen Yan <xuewen.yan@unisoc.com>
Reviewed-by: NVincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: NVincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20210602145808.1562603-1-dietmar.eggemann@arm.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ba635552

RDMA/mlx4: Do not map the core_clock page to user space unless enabled · e4f418ff

由 Shay Drory 提交于 6月 18, 2021

stable inclusion
from stable-5.10.44
commit cb1aa1da04882d1860f733e24aeebdbbc85724d7
bugzilla: 109295
CVE: NA

--------------------------------

commit 404e5a12 upstream.

Currently when mlx4 maps the hca_core_clock page to the user space there
are read-modifiable registers, one of which is semaphore, on this page as
well as the clock counter. If user reads the wrong offset, it can modify
the semaphore and hang the device.

Do not map the hca_core_clock page to the user space unless the device has
been put in a backwards compatibility mode to support this feature.

After this patch, mlx4 core_clock won't be mapped to user space on the
majority of existing devices and the uverbs device time feature in
ibv_query_rt_values_ex() will be disabled.

Fixes: 52033cfb ("IB/mlx4: Add mmap call to map the hardware clock")
Link: https://lore.kernel.org/r/9632304e0d6790af84b3b706d8c18732bc0d5e27.1622726305.git.leonro@nvidia.comSigned-off-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e4f418ff

regulator: bd71828: Fix .n_voltages settings · e304bce2

由 Axel Lin 提交于 6月 18, 2021

stable inclusion
from stable-5.10.44
commit 4579f65176792a54e842d28605330ef0f4916df2
bugzilla: 109295
CVE: NA

--------------------------------

commit 4c668630 upstream.

Current .n_voltages settings do not cover the latest 2 valid selectors,
so it fails to set voltage for the hightest voltage support.
The latest linear range has step_uV = 0, so it does not matter if we
count the .n_voltages to maximum selector + 1 or the first selector of
latest linear range + 1.
To simplify calculating the n_voltages, let's just set the
.n_voltages to maximum selector + 1.

Fixes: 522498f8 ("regulator: bd71828: Basic support for ROHM bd71828 PMIC regulators")
Signed-off-by: NAxel Lin <axel.lin@ingics.com>
Reviewed-by: NMatti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Link: https://lore.kernel.org/r/20210523071045.2168904-2-axel.lin@ingics.comSigned-off-by: NMark Brown <broonie@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e304bce2

usb: pd: Set PD_T_SINK_WAIT_CAP to 310ms · c43985c2

由 Kyle Tso 提交于 6月 18, 2021

stable inclusion
from stable-5.10.44
commit b452e8bb7c525fbecc69aef44b0721d2ece032bc
bugzilla: 109295
CVE: NA

--------------------------------

commit 6490fa56 upstream.

Current timer PD_T_SINK_WAIT_CAP is set to 240ms which will violate the
SinkWaitCapTimer (tTypeCSinkWaitCap 310 - 620 ms) defined in the PD
Spec if the port is faster enough when running the state machine. Set it
to the lower bound 310ms to ensure the timeout is in Spec.

Fixes: f0690a25 ("staging: typec: USB Type-C Port Manager (tcpm)")
Cc: stable <stable@vger.kernel.org>
Reviewed-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NKyle Tso <kyletso@google.com>
Link: https://lore.kernel.org/r/20210528081613.730661-1-kyletso@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c43985c2

kvm: avoid speculation-based attacks from out-of-range memslot accesses · 0f841f09

由 Paolo Bonzini 提交于 6月 18, 2021

stable inclusion
from stable-5.10.44
commit 7af299b97734c7e7f465b42a2139ce4d77246975
bugzilla: 109295
CVE: NA

--------------------------------

commit da27a83f upstream.

KVM's mechanism for accessing guest memory translates a guest physical
address (gpa) to a host virtual address using the right-shifted gpa
(also known as gfn) and a struct kvm_memory_slot.  The translation is
performed in __gfn_to_hva_memslot using the following formula:

      hva = slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE

It is expected that gfn falls within the boundaries of the guest's
physical memory.  However, a guest can access invalid physical addresses
in such a way that the gfn is invalid.

__gfn_to_hva_memslot is called from kvm_vcpu_gfn_to_hva_prot, which first
retrieves a memslot through __gfn_to_memslot.  While __gfn_to_memslot
does check that the gfn falls within the boundaries of the guest's
physical memory or not, a CPU can speculate the result of the check and
continue execution speculatively using an illegal gfn. The speculation
can result in calculating an out-of-bounds hva.  If the resulting host
virtual address is used to load another guest physical address, this
is effectively a Spectre gadget consisting of two consecutive reads,
the second of which is data dependent on the first.

Right now it's not clear if there are any cases in which this is
exploitable.  One interesting case was reported by the original author
of this patch, and involves visiting guest page tables on x86.  Right
now these are not vulnerable because the hva read goes through get_user(),
which contains an LFENCE speculation barrier.  However, there are
patches in progress for x86 uaccess.h to mask kernel addresses instead of
using LFENCE; once these land, a guest could use speculation to read
from the VMM's ring 3 address space.  Other architectures such as ARM
already use the address masking method, and would be susceptible to
this same kind of data-dependent access gadgets.  Therefore, this patch
proactively protects from these attacks by masking out-of-bounds gfns
in __gfn_to_hva_memslot, which blocks speculation of invalid hvas.

Sean Christopherson noted that this patch does not cover
kvm_read_guest_offset_cached.  This however is limited to a few bytes
past the end of the cache, and therefore it is unlikely to be useful in
the context of building a chain of data dependent accesses.
Reported-by: NArtemiy Margaritov <artemiy.margaritov@gmail.com>
Co-developed-by: NArtemiy Margaritov <artemiy.margaritov@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0f841f09

15 6月, 2021 7 次提交

bus: ti-sysc: Fix am335x resume hang for usb otg module · 9f181b28

由 Tony Lindgren 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit d551b8e857775a6ea48f365d9611fe5c470008a3
bugzilla: 109284
CVE: NA

--------------------------------

[ Upstream commit 4d7b324e ]

On am335x, suspend and resume only works once, and the system hangs if
suspend is attempted again. However, turns out suspend and resume works
fine multiple times if the USB OTG driver for musb controller is loaded.

The issue is caused my the interconnect target module losing context
during suspend, and it needs a restore on resume to be reconfigure again
as debugged earlier by Dave Gerlach <d-gerlach@ti.com>.

There are also other modules that need a restore on resume, like gpmc as
noted by Dave. So let's add a common way to restore an interconnect
target module based on a quirk flag. For now, let's enable the quirk for
am335x otg only to fix the suspend and resume issue.

As gpmc is not causing hangs based on tests with BeagleBone, let's patch
gpmc separately. For gpmc, we also need a hardware reset done before
restore according to Dave.

To reinit the modules, we decouple system suspend from PM runtime. We
replace calls to pm_runtime_force_suspend() and pm_runtime_force_resume()
with direct calls to internal functions and rely on the driver internal
state. There no point trying to handle complex system suspend and resume
quirks via PM runtime.

This is issue should have already been noticed with commit 1819ef2e
("bus: ti-sysc: Use swsup quirks also for am335x musb") when quirk
handling was added for am335x otg for swsup. But the issue went unnoticed
as having musb driver loaded hides the issue, and suspend and resume works
once without the driver loaded.

Fixes: 1819ef2e ("bus: ti-sysc: Use swsup quirks also for am335x musb")
Suggested-by: NDave Gerlach <d-gerlach@ti.com>
Signed-off-by: NTony Lindgren <tony@atomide.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9f181b28

net/mlx5: DR, Create multi-destination flow table with level less than 64 · c912d5a8

由 Yevgeny Kliteynik 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 2a8cda3867cd06fbc3f414a78e1c692f973d21e4
bugzilla: 109284
CVE: NA

--------------------------------

[ Upstream commit 216214c6 ]

Flow table that contains flow pointing to multiple flow tables or multiple
TIRs must have a level lower than 64. In our case it applies to muli-
destination flow table.
Fix the level of the created table to comply with HW Spec definitions, and
still make sure that its level lower than SW-owned tables, so that it
would be possible to point from the multi-destination FW table to SW
tables.

Fixes: 34583bee ("net/mlx5: DR, Create multi-destination table for SW-steering use")
Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: NAlex Vesker <valex@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c912d5a8

net: usb: cdc_ncm: don't spew notifications · 5146c41f

由 Grant Grundler 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 70df000fb8808e2ea63e6beb2ff570c083592614
bugzilla: 109284
CVE: NA

--------------------------------

[ Upstream commit de658a19 ]

RTL8156 sends notifications about every 32ms.
Only display/log notifications when something changes.

This issue has been reported by others:
	https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832472
	https://lkml.org/lkml/2020/8/27/1083

...
[785962.779840] usb 1-1: new high-speed USB device number 5 using xhci_hcd
[785962.929944] usb 1-1: New USB device found, idVendor=0bda, idProduct=8156, bcdDevice=30.00
[785962.929949] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=6
[785962.929952] usb 1-1: Product: USB 10/100/1G/2.5G LAN
[785962.929954] usb 1-1: Manufacturer: Realtek
[785962.929956] usb 1-1: SerialNumber: 000000001
[785962.991755] usbcore: registered new interface driver cdc_ether
[785963.017068] cdc_ncm 1-1:2.0: MAC-Address: 00:24:27:88:08:15
[785963.017072] cdc_ncm 1-1:2.0: setting rx_max = 16384
[785963.017169] cdc_ncm 1-1:2.0: setting tx_max = 16384
[785963.017682] cdc_ncm 1-1:2.0 usb0: register 'cdc_ncm' at usb-0000:00:14.0-1, CDC NCM, 00:24:27:88:08:15
[785963.019211] usbcore: registered new interface driver cdc_ncm
[785963.023856] usbcore: registered new interface driver cdc_wdm
[785963.025461] usbcore: registered new interface driver cdc_mbim
[785963.038824] cdc_ncm 1-1:2.0 enx002427880815: renamed from usb0
[785963.089586] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
[785963.121673] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
[785963.153682] cdc_ncm 1-1:2.0 enx002427880815: network connection: disconnected
...

This is about 2KB per second and will overwrite all contents of a 1MB
dmesg buffer in under 10 minutes rendering them useless for debugging
many kernel problems.

This is also an extra 180 MB/day in /var/logs (or 1GB per week) rendering
the majority of those logs useless too.

When the link is up (expected state), spew amount is >2x higher:
...
[786139.600992] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.632997] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
[786139.665097] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.697100] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
[786139.729094] cdc_ncm 2-1:2.0 enx002427880815: network connection: connected
[786139.761108] cdc_ncm 2-1:2.0 enx002427880815: 2500 mbit/s downlink 2500 mbit/s uplink
...

Chrome OS cannot support RTL8156 until this is fixed.
Signed-off-by: NGrant Grundler <grundler@chromium.org>
Reviewed-by: NHayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20210120011208.3768105-1-grundler@chromium.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

5146c41f

SUNRPC: More fixes for backlog congestion · b73c5591

由 Trond Myklebust 提交于 6月 07, 2021

stable inclusion
from stable-5.10.42
commit 899b5131e74cd09ddd204addf763deaf6870ab57
bugzilla: 55093
CVE: NA

--------------------------------

commit e86be3a0 upstream.

Ensure that we fix the XPRT_CONGESTED starvation issue for RDMA as well
as socket based transports.
Ensure we always initialise the request after waking up from the backlog
list.

Fixes: e877a88d ("SUNRPC in case of backlog, hand free slots directly to waiting task")
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b73c5591

linux/bits.h: fix compilation error with GENMASK · cffb222b

由 Rikard Falkeborn 提交于 6月 07, 2021

stable inclusion
from stable-5.10.42
commit 1354ec840899e87259286cc844d4c161ea86fae7
bugzilla: 55093
CVE: NA

--------------------------------

[ Upstream commit f747e666 ]

GENMASK() has an input check which uses __builtin_choose_expr() to
enable a compile time sanity check of its inputs if they are known at
compile time.

However, it turns out that __builtin_constant_p() does not always return
a compile time constant [0].  It was thought this problem was fixed with
gcc 4.9 [1], but apparently this is not the case [2].

Switch to use __is_constexpr() instead which always returns a compile time
constant, regardless of its inputs.

Link: https://lore.kernel.org/lkml/42b4342b-aefc-a16a-0d43-9f9c0d63ba7a@rasmusvillemoes.dk [0]
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19449 [1]
Link: https://lore.kernel.org/lkml/1ac7bbc2-45d9-26ed-0b33-bf382b8d858b@I-love.SAKURA.ne.jp [2]
Link: https://lkml.kernel.org/r/20210511203716.117010-1-rikard.falkeborn@gmail.comSigned-off-by: NRikard Falkeborn <rikard.falkeborn@gmail.com>
Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

cffb222b

{net,vdpa}/mlx5: Configure interface MAC into mpfs L2 table · b4bb871d

由 Eli Cohen 提交于 6月 07, 2021

stable inclusion
from stable-5.10.42
commit 89a0e388c6f2ea73434727ca4780150874e225a7
bugzilla: 55093
CVE: NA

--------------------------------

commit 7c9f131f upstream.

net/mlx5: Expose MPFS configuration API

MPFS is the multi physical function switch that bridges traffic between
the physical port and any physical functions associated with it. The
driver is required to add or remove MAC entries to properly forward
incoming traffic to the correct physical function.

We export the API to control MPFS so that other drivers, such as
mlx5_vdpa are able to add MAC addresses of their network interfaces.

The MAC address of the vdpa interface must be configured into the MPFS L2
address. Failing to do so could cause, in some NIC configurations, failure
to forward packets to the vdpa network device instance.

Fix this by adding calls to update the MPFS table.

CC: <mst@redhat.com>
CC: <jasowang@redhat.com>
CC: <virtualization@lists.linux-foundation.org>
Fixes: 1a86b377 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: NEli Cohen <elic@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b4bb871d

drivers: base: Fix device link removal · b9dbe96f

由 Rafael J. Wysocki 提交于 6月 07, 2021

stable inclusion
from stable-5.10.42
commit d007150b4e15bfcb8d36cfd88a5645d42e44d383
bugzilla: 55093
CVE: NA

--------------------------------

commit 80dd33cf upstream.

When device_link_free() drops references to the supplier and
consumer devices of the device link going away and the reference
being dropped turns out to be the last one for any of those
device objects, its ->release callback will be invoked and it
may sleep which goes against the SRCU callback execution
requirements.

To address this issue, make the device link removal code carry out
the device_link_free() actions preceded by SRCU synchronization from
a separate work item (the "long" workqueue is used for that, because
it does not matter when the device link memory is released and it may
take time to get to that point) instead of using SRCU callbacks.

While at it, make the code work analogously when SRCU is not enabled
to reduce the differences between the SRCU and non-SRCU cases.

Fixes: 843e600b ("driver core: Fix sleeping in invalid context during device link deletion")
Cc: stable <stable@vger.kernel.org>
Reported-by: Nchenxiang (M) <chenxiang66@hisilicon.com>
Tested-by: Nchenxiang (M) <chenxiang66@hisilicon.com>
Reviewed-by: NSaravana Kannan <saravanak@google.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/5722787.lOV4Wx5bFT@kreacherSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b9dbe96f

04 6月, 2021 9 次提交

livepatch: put memory alloc and free out stop machine · 000c0197

由 Ye Weihua 提交于 5月 31, 2021

hulk inclusion
category: feature
bugzilla: 51924
CVE: NA

---------------------------

When a livepatch is insmod, stop machine will stop other cores, which
interrupts services. Therefore, the shorter the stop machine duration,
the better. The application and release of memory from the stop machine
can shorten the time for stopping the machine.

Especially, module_alloc and module_memfree is a kind of vmalloc, that
may sleep when called. So it is not permitted to use them in stop machine
context.
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

000c0197

livepatch/core: Support function force patched/unpatched · 673d37a2

由 Cheng Jian 提交于 5月 29, 2021

euler inclusion
category: feature
bugzilla: 51921
CVE: N/A

----------------------------------------

Some functions in the kernel are always on the stack of some
thread in the system. Attempts to patch these function will
currently always fail the activeness safety check.

However, through human inspection, it can be determined that,
for a particular function, consistency is maintained even if
the old and new versions of the function run concurrently.

commit 2e93c5e1e3dc ("support forced patching") in kpatch-build
introduces a KPATCH_FORCE_UNSAFE() macro to define patched
functions that such be exempted from the activeness safety
check. now kernel implement this feature.
Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
Reviewed-by: NLi Bin <huawei.libin@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: NDong Kai <dongkai11@huawei.com>
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

673d37a2

livepatch/ppc64: Support use func_descr for new_func · 8ef169ed

由 Cheng Jian 提交于 5月 29, 2021

hulk inclusion
category: feature
bugzilla: 51924
CVE: NA

---------------------------

The ppc64 ABI V1 function pointer points to the function descriptor,
which we use in the sample demo.

        $cat /proc/kallsyms | grep  livepatch_cmdline_proc_show
        80000000000d4830 d livepatch_cmdline_proc_show  [livepatch_sample]      -=> func descr
        80000000000d40c0 t .livepatch_cmdline_proc_show [livepatch_sample]      -=> func addr

However, the livepatch module made by kpatch just passes the address of
the function to kernel(saved in func->new_func), so the kernel needs to
obtain the toc address and combine the function descriptors to implement
long jump.
Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
Reviewed-By: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Nyangerkun <yangerkun@huawei.com>
Signed-off-by: NDong Kai <dongkai11@huawei.com>
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8ef169ed

livepatch/ppc64: Implement livepatch without ftrace for ppc64be · ac81d625

由 Cheng Jian 提交于 5月 29, 2021

hulk inclusion
category: feature
bugzilla: 51924
CVE: NA

---------------------------

Initially completed the livepatch for ppc64be, the call from
the old function to the new function, using stub space, this
is actually problematic, because we cannot effectively recover
R2. This problem will be fixed later.
Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
Reviewed-By: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Nyangerkun <yangerkun@huawei.com>
Signed-off-by: NDong Kai <dongkai11@huawei.com>
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ac81d625

livepatch/core: Revert module_enable_ro and module_disable_ro · f440d90f

由 Dong Kai 提交于 5月 29, 2021

hulk inclusion
category: feature
bugzilla: 51921
CVE: NA

---------------------------

After commit d556e1be ("livepatch: Remove module_disable_ro() usage")
and commit 0d9fbf78 ("module: Remove module_disable_ro()") and
commit e6eff437 ("module: Make module_enable_ro() static again") merged,
the module_disable_ro is removed and module_enable_ro is make static.

It's ok for x86/ppc platform because the livepatch module relocation is
done by text poke func which internally modify the text addr by remap
to high virtaddr which has write permission.

However for arm/arm64 platform, it's apply_relocate[_add] still directly
modify the text code so we should change the module text permission before
relocation. Otherwise it will lead to following problem:

  Unable to handle kernel write to read-only memory at virtual address ffff800008a95288
  Mem abort info:
  ESR = 0x9600004f
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  Data abort info:
  ISV = 0, ISS = 0x0000004f
  CM = 0, WnR = 1
  swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000004133c000
  [ffff800008a95288] pgd=00000000bdfff003, p4d=00000000bdfff003, pud=00000000bdffe003,
		     pmd=0000000080ce7003, pte=0040000080d5d783
  Internal error: Oops: 9600004f [#1] PREEMPT SMP
  Modules linked in: livepatch_testmod_drv(OK+) testmod_drv(O)
  CPU: 0 PID: 139 Comm: insmod Tainted: G           O  K   5.10.0-01131-gf6b4602e09b2-dirty #35
  Hardware name: linux,dummy-virt (DT)
  pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
  pc : reloc_insn_imm+0x54/0x78
  lr : reloc_insn_imm+0x50/0x78
  sp : ffff800011cf3910
  ...
  Call trace:
   reloc_insn_imm+0x54/0x78
   apply_relocate_add+0x464/0x680
   klp_apply_section_relocs+0x11c/0x148
   klp_enable_patch+0x338/0x998
   patch_init+0x338/0x1000 [livepatch_testmod_drv]
   do_one_initcall+0x60/0x1d8
   do_init_module+0x58/0x1e0
   load_module+0x1fb4/0x2688
   __do_sys_finit_module+0xc0/0x128
   __arm64_sys_finit_module+0x20/0x30
   do_el0_svc+0x84/0x1b0
   el0_svc+0x14/0x20
   el0_sync_handler+0x90/0xc8
   el0_sync+0x158/0x180
   Code: 2a0503e0 9ad42a73 97d6a499 91000673 (b90002a0)
   ---[ end trace 67dd2ef1203ed335 ]---

Though the permission change is not necessary to x86/ppc platform, consider
that the jump_label_register api may modify the text code either, we just
put the change handle here instead of putting it in arch-specific relocate.

Besides, the jump_label_module_nb callback called in jump_label_register
also maybe need motify the module code either, it sort and swap the jump
entries if necessary. So just disable ro before jump_label handling and
restore it back.
Signed-off-by: NDong Kai <dongkai11@huawei.com>
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f440d90f

livepatch/core: Support jump_label · cd79d861

由 Cheng Jian 提交于 5月 29, 2021

hulk inclusion
category: feature
bugzilla: 51921
CVE: NA

-----------------------------------------------

The kpatch-build processes the __jump_table special section,
and only the jump_lable used by the changed functions will be
included in __jump_table section, and the livepatch should
process the tracepoint again after the dynamic relocation.

NOTE: adding new tracepoints definition is not supported.
Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: NDong Kai <dongkai11@huawei.com>
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

cd79d861

livepatch/core: Supprt load and unload hooks · 4773e57d

由 Cheng Jian 提交于 5月 29, 2021

euler inclusion
category: feature
bugzilla: 51921
CVE: N/A

----------------------------------------

The front-tools kpatch-build support load and unload hooks
in the older version and already changed to use pre/post
callbacks after commit 93862e38 ("livepatch: add (un)patch
callbacks").

However, for livepatch based on stop machine consistency,
this callbacks will be called within stop_machine context if
we using it. This is dangerous because we can't known what
the user will do in the callbacks. It may trigger system
crash if using any function which internally might sleep.

Here we use the old load/unload hooks to allow user-defined
hooks. Although it's not good enough compared to pre/post
callbacks, it can meets user needs to some extent.
Of cource, this requires cooperation of kpatch-build tools.
Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: NDong Kai <dongkai11@huawei.com>
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

4773e57d

livepatch/core: Split livepatch consistency · 086c4b46

由 Cheng Jian 提交于 5月 29, 2021

euler inclusion
category: feature
bugzilla: 51921
CVE: N/A

----------------------------------------

In the previous version we forced the association between
livepatch wo_ftrace and stop_machine. This is unwise and
obviously confusing.

commit d83a7cb3 ("livepatch: change to a per-task
consistency model") introduce a PER-TASK consistency model.
It's a hybrid of kGraft and kpatch: it uses kGraft's per-task
consistency and syscall barrier switching combined with
kpatch's stack trace switching. There are also a number of
fallback options which make it quite flexible.

So we split livepatch consistency for without ftrace to two model:
[1] PER-TASK consistency model.
per-task consistency and syscall barrier switching combined with
kpatch's stack trace switching.

[2] STOP-MACHINE consistency model.
stop-machine consistency and kpatch's stack trace switching.
Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
Reviewed-by: NLi Bin <huawei.libin@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: NDong Kai <dongkai11@huawei.com>
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

086c4b46

livepatch/core: Allow implementation without ftrace · c33e4283

由 Cheng Jian 提交于 5月 29, 2021

euler inclusion
category: feature
bugzilla: 51921
CVE: NA

----------------------------------------

support for livepatch without ftrace mode

new config for WO_FTRACE
	CONFIG_LIVEPATCH_WO_FTRACE=y
	CONFIG_LIVEPATCH_STACK=y

Implements livepatch without ftrace by direct jump, we
directly modify the first few instructions(usually one,
but four for long jumps under ARM64) of the old function
as jump instructions by stop_machine, so it will jump to
the first address of the new function when livepatch enable

KERNEL/MODULE
call/bl A---------------old_A------------
                        | jump new_A----+--------|
                        |               |        |
                        |               |        |
                        -----------------        |
                                                 |
                                                 |
                                                 |
livepatch_module-------------                    |
|                           |                    |
|new_A <--------------------+--------------------|
|                           |
|                           |
|---------------------------|
| .plt                      |
| ......PLTS for livepatch  |
-----------------------------

something we need to consider under different architectures:

1. jump instruction
2. partial relocation in new function requires for livepatch.
3. long jumps may be required if the jump address exceeds the
   offset. both for livepatch relocation and livepatch enable.
Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
Reviewed-by: NLi Bin <huawei.libin@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: NDong Kai <dongkai11@huawei.com>
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c33e4283

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功