Commits · b302dc7f215f227677897d53260e2c14257ee3db · openeuler / Kernel

17 Jan, 2022 1 commit

NFC: reorganize the functions in nci_request · b302dc7f

Committed by Lin Ma on Jan 17, 2022

stable inclusion
from linux-4.19.218
commit 62be2b1e7914b7340281f09412a7bbb62e6c8b67
CVE: CVE-2021-4202

--------------------------------

[ Upstream commit 86cdf8e3 ]

There is a possible data race as shown below:

thread-A in nci_request()       | thread-B in nci_close_device()
                                | mutex_lock(&ndev->req_lock);
test_bit(NCI_UP, &ndev->flags); |
...                             | test_and_clear_bit(NCI_UP, &ndev->flags)
mutex_lock(&ndev->req_lock);    |
                                |

This race allows __nci_request() to be woken up while the device is
being removed.

Similar to commit e2cb6b89 ("bluetooth: eliminate the potential race
condition when removing the HCI controller"), this patch alters the
function sequence in nci_request() to prevent the data race with
nci_close_device().

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: 6a2968aa ("NFC: basic NCI protocol implementation")
Link: https://lore.kernel.org/r/20211115145600.8320-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
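
The reordering described in this commit message can be sketched in user space. This is only an illustration under stated assumptions: `struct demo_dev`, `demo_close()` and `demo_request()` are hypothetical stand-ins for `struct nci_dev`, `nci_close_device()` and `nci_request()`, a `bool` replaces the NCI_UP bit, and a pthread mutex replaces `req_lock`. The point is that the state check happens only after the lock is taken, so a concurrent close cannot slip in between the check and the request.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct nci_dev. */
struct demo_dev {
    pthread_mutex_t req_lock;
    bool up;                /* models the NCI_UP bit in ndev->flags */
};

static void demo_dev_init(struct demo_dev *dev)
{
    pthread_mutex_init(&dev->req_lock, NULL);
    dev->up = true;
}

/* Models nci_close_device(): clears the flag while holding req_lock. */
static void demo_close(struct demo_dev *dev)
{
    pthread_mutex_lock(&dev->req_lock);
    dev->up = false;
    pthread_mutex_unlock(&dev->req_lock);
}

/* Models the reordered request path: lock first, then test the flag. */
static int demo_request(struct demo_dev *dev)
{
    int rc;

    pthread_mutex_lock(&dev->req_lock);
    /* 0 stands for "the request was issued"; the real code would call
     * __nci_request() here. */
    rc = dev->up ? 0 : -ENETDOWN;
    pthread_mutex_unlock(&dev->req_lock);
    return rc;
}
```

Once `demo_close()` has run, every later `demo_request()` observes the cleared flag under the lock and fails instead of issuing a request against a device that is going away.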


13 Jan, 2022 3 commits

ext4: Fix BUG_ON in ext4_bread when write quota data · c113ae0d

Committed by Ye Bin on Jan 13, 2022

mainline inclusion
from mainline-v5.17
commit ce85548ab4295234b4f8e63a0eea0c157d2f6b25
category: bugfix
bugzilla: 185930
CVE: NA

-----------------------------------------------

We hit the following issue when running syzkaller:
[  167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user
[  167.938306] EXT4-fs (loop0): Remounting filesystem read-only
[  167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0'
[  167.983601] ------------[ cut here ]------------
[  167.984245] kernel BUG at fs/ext4/inode.c:847!
[  167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[  167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G    B             5.16.0-rc5-next-20211217+ #123
[  167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  167.988590] RIP: 0010:ext4_getblk+0x17e/0x504
[  167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244
[  167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282
[  167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000
[  167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66
[  167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09
[  167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8
[  167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001
[  167.997158] FS:  00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000
[  167.998238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0
[  167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  168.001899] Call Trace:
[  168.002235]  <TASK>
[  168.007167]  ext4_bread+0xd/0x53
[  168.007612]  ext4_quota_write+0x20c/0x5c0
[  168.010457]  write_blk+0x100/0x220
[  168.010944]  remove_free_dqentry+0x1c6/0x440
[  168.011525]  free_dqentry.isra.0+0x565/0x830
[  168.012133]  remove_tree+0x318/0x6d0
[  168.014744]  remove_tree+0x1eb/0x6d0
[  168.017346]  remove_tree+0x1eb/0x6d0
[  168.019969]  remove_tree+0x1eb/0x6d0
[  168.022128]  qtree_release_dquot+0x291/0x340
[  168.023297]  v2_release_dquot+0xce/0x120
[  168.023847]  dquot_release+0x197/0x3e0
[  168.024358]  ext4_release_dquot+0x22a/0x2d0
[  168.024932]  dqput.part.0+0x1c9/0x900
[  168.025430]  __dquot_drop+0x120/0x190
[  168.025942]  ext4_clear_inode+0x86/0x220
[  168.026472]  ext4_evict_inode+0x9e8/0xa22
[  168.028200]  evict+0x29e/0x4f0
[  168.028625]  dispose_list+0x102/0x1f0
[  168.029148]  evict_inodes+0x2c1/0x3e0
[  168.030188]  generic_shutdown_super+0xa4/0x3b0
[  168.030817]  kill_block_super+0x95/0xd0
[  168.031360]  deactivate_locked_super+0x85/0xd0
[  168.031977]  cleanup_mnt+0x2bc/0x480
[  168.033062]  task_work_run+0xd1/0x170
[  168.033565]  do_exit+0xa4f/0x2b50
[  168.037155]  do_group_exit+0xef/0x2d0
[  168.037666]  __x64_sys_exit_group+0x3a/0x50
[  168.038237]  do_syscall_64+0x3b/0x90
[  168.038751]  entry_SYSCALL_64_after_hwframe+0x44/0xae

In order to reproduce this problem, the following conditions need to be met:
1. Ext4 filesystem with no journal;
2. Filesystem image with incorrect quota data;
3. Abort filesystem forced by user;
4. umount filesystem;

As in ext4_quota_write:
...
         if (EXT4_SB(sb)->s_journal && !handle) {
                 ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
                         " cancelled because transaction is not started",
                         (unsigned long long)off, (unsigned long long)len);
                 return -EIO;
         }
...
The handle is only checked for NULL when the filesystem has a journal. It
needs to be checked for NULL even when the filesystem has no journal.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
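
The difference between the two checks can be shown with a small user-space sketch. All names here (`struct demo_sb`, `struct demo_handle`, `demo_quota_write_old()`, `demo_quota_write_new()`) are hypothetical stand-ins, and the second function only reflects what the message says is needed; the exact form of the patch is not shown in this log.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's types. */
struct demo_sb {
    bool has_journal;       /* models EXT4_SB(sb)->s_journal != NULL */
};

struct demo_handle {
    int unused;
};

/* Old check: a missing handle is rejected only on journalled filesystems. */
static int demo_quota_write_old(const struct demo_sb *sb,
                                const struct demo_handle *handle)
{
    if (sb->has_journal && !handle)
        return -EIO;
    return 0;               /* would go on to ext4_bread() */
}

/* Check described in the message: reject whenever there is no handle. */
static int demo_quota_write_new(const struct demo_sb *sb,
                                const struct demo_handle *handle)
{
    (void)sb;               /* journal presence no longer matters */
    if (!handle)
        return -EIO;
    return 0;
}
```

On a filesystem without a journal and with a NULL handle, the old check falls through to the block read that trips the assertion in the trace, while the new check returns -EIO first.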


PM: hibernate: use correct mode for swsusp_close() · 2ae13e7a

Committed by Thomas Zeitlhofer on Jan 13, 2022

stable inclusion
from linux-v4.19.219
commit 68945e943519df1532e598fafab16ac54488933f

---------------------------------------------------

[ Upstream commit cefcf24b ]

Commit 39fbef4b ("PM: hibernate: Get block device exclusively in
swsusp_check()") changed the opening mode of the block device to
(FMODE_READ | FMODE_EXCL).

In the corresponding calls to swsusp_close(), the mode is still just
FMODE_READ which triggers the warning in blkdev_flush_mapping() on
resume from hibernate.

So, use the mode (FMODE_READ | FMODE_EXCL) also when closing the
device.

Fixes: 39fbef4b ("PM: hibernate: Get block device exclusively in swsusp_check()")
Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


Revert "watchdog: Fix check_preemption_disabled() error" · c8f15bf5

Committed by Yang Yingliang on Jan 13, 2022

hulk inclusion
category: bugfix
bugzilla: 173968, https://gitee.com/openeuler/kernel/issues/I3J87Y
CVE: NA

---------------------------

This reverts commit b2e484e9.

When CONFIG_LOCKDEP and CONFIG_DEBUG_LOCKDEP are enabled, it detects the following error:

[   10.145007] BUG: sleeping function called from invalid context at mm/slab.h:418
[   10.145394] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
[   10.145765] Preemption disabled at:
[   10.145978] [<ffff000008f8e7b4>] hardlockup_detector_perf_init+0x20/0x100
[   10.146770] CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.90+ #3
[   10.148242] Hardware name: linux,dummy-virt (DT)
[   10.148572] Call trace:
[   10.148667]  dump_backtrace+0x0/0x190
[   10.148765]  show_stack+0x24/0x30
[   10.148875]  dump_stack+0xa4/0xf8
[   10.148964]  ___might_sleep+0x150/0x180
[   10.149065]  __might_sleep+0x58/0x90
[   10.149199]  kmem_cache_alloc_trace+0x244/0x2b0
[   10.149308]  perf_event_alloc+0x74/0x680
[   10.149402]  perf_event_create_kernel_counter+0x2c/0x190
[   10.149516]  arch_probe_cpu_freq+0x84/0x1ac
[   10.149611]  hw_nmi_get_sample_period+0xb8/0x180
[   10.149713]  hardlockup_detector_event_create+0x28/0xfc
[   10.149827]  hardlockup_detector_perf_init+0x24/0x100
[   10.149943]  watchdog_nmi_probe+0x14/0x1c
[   10.150037]  lockup_detector_init+0x58/0x98
[   10.150173]  kernel_init_freeable+0x10c/0x1c4
[   10.150298]  kernel_init+0x18/0x110
[   10.150422]  ret_from_fork+0x10/0x18

In 'b2e484e9 ("watchdog: Fix check_preemption_disabled() error")', we
tried to fix a check_preemption_disabled() error by disabling preemption in
hardlockup_detector_perf_init(), but missed that
perf_event_create_kernel_counter() may sleep.

Preemption is always disabled there, so the problem that commit was meant
to fix does not exist; just revert it.

Fixes: b2e484e9 ("watchdog: Fix check_preemption_disabled() error")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


06 Jan, 2022 2 commits

arm64/mpam: fix mpam dts init arm_mpam_of_device_ids error · 7ea0c3fe

Committed by Xingang Wang on Jan 06, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA

---------------------------------------------------

[    0.596145] BUG: KASAN: global-out-of-bounds in __of_match_node.part.0+0xe0/0x110
[    0.596731] Read of size 1 at addr ffff2000099a8288 by task swapper/0/1
[    0.597247]
[    0.597372] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.90+ #34
[    0.597858] Hardware name: linux,dummy-virt (DT)
[    0.598243] Call trace:
[    0.598443]  dump_backtrace+0x0/0x360
[    0.598734]  show_stack+0x24/0x30
[    0.599004]  dump_stack+0xdc/0x128
[    0.599323]  print_address_description+0x184/0x278
[    0.599771]  kasan_report+0x204/0x330
[    0.600117]  __asan_report_load1_noabort+0x30/0x40
[    0.600566]  __of_match_node.part.0+0xe0/0x110
[    0.600980]  of_match_node+0x6c/0xa8
[    0.601316]  of_match_device+0x48/0x70
[    0.601669]  platform_match+0xa4/0x260
[    0.602037]  __driver_attach+0x68/0x128
[    0.602397]  bus_for_each_dev+0x118/0x198
[    0.602773]  driver_attach+0x48/0x60
[    0.603112]  bus_add_driver+0x330/0x658
[    0.603472]  driver_register+0x148/0x398
[    0.603839]  __platform_driver_register+0xd4/0x108
[    0.604288]  arm_mpam_driver_init+0x64/0x78
[    0.604680]  do_one_initcall+0xbc/0x488
[    0.605039]  kernel_init_freeable+0x604/0x6f8
[    0.605447]  kernel_init+0x18/0x130
[    0.605775]  ret_from_fork+0x10/0x18
[    0.606130]
[    0.606274] The buggy address belongs to the variable:
[    0.606754]  arm_mpam_of_device_ids+0xc8/0x380
[    0.607168]
[    0.607314] Memory state around the buggy address:
[    0.607762]  ffff2000099a8180: 00 00 00 fa fa fa fa fa 00 00 00 00 00 00 00 00
[    0.608429]  ffff2000099a8200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.609095] >ffff2000099a8280: 00 fa fa fa fa fa fa fa 05 fa fa fa fa fa fa fa
[    0.609760]                       ^
[    0.610101]  ffff2000099a8300: 00 00 07 fa fa fa fa fa 00 04 fa fa fa fa fa fa
[    0.610771]  ffff2000099a8380: 00 00 00 06 fa fa fa fa 00 01 fa fa fa fa fa fa

The arm_mpam_of_device_ids array has no end item, so accesses to the array
may run out of bounds. With KASAN enabled, the out-of-bounds call trace
above occurs. Add an empty end item to the arm_mpam_of_device_ids array to
fix this issue.

Fixes: b45bdb5a ("arm64/mpam: add device tree support for mpam initialization")
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
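
The role of the empty end item can be illustrated with a small user-space sketch. `struct demo_device_id`, `demo_ids` and `demo_match()` are hypothetical stand-ins, not the kernel's `struct of_device_id` or its matching code, and the compatible strings are invented. The scan loop has no length to go by, so it relies entirely on the empty last entry to stop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct of_device_id. */
struct demo_device_id {
    char compatible[32];
};

/* The empty last entry is the sentinel that terminates the scan. */
static const struct demo_device_id demo_ids[] = {
    { .compatible = "vendor,demo-a" },
    { .compatible = "vendor,demo-b" },
    { .compatible = "" },   /* end item */
};

/* Walks the table until the empty entry, the way a match loop would. */
static const struct demo_device_id *
demo_match(const struct demo_device_id *ids, const char *compatible)
{
    for (; ids->compatible[0] != '\0'; ids++) {
        if (strcmp(ids->compatible, compatible) == 0)
            return ids;
    }
    return NULL;
}
```

Without the end item, a lookup for a string that is not in the table would keep reading past the array, which is the kind of access KASAN reports in the trace.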


arm64/mpam: fix mpam probe error for wrong init order · 82e2f45f

Committed by Xingang Wang on Jan 06, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA

---------------------------------------------------

The mpam init procedure fails when probing with ACPI:
[    1.148657 ] ACPI MPAM: No CPU has cache with PPTT reference 0x72
[    1.148658 ] ACPI MPAM: All CPUs must be online to probe mpam.
[    1.148660 ] ACPI MPAM: discovery failed: -19

This is because mpam needs to be probed after all CPUs are online, and
arm_mpam_driver_init must be called after cacheinfo_sysfs_init, so
device_initcall should be replaced with device_initcall_sync.

Fixes: b45bdb5a ("arm64/mpam: add device tree support for mpam initialization")
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


31 Dec, 2021 10 commits

mm: export collect_procs() · bb784b81

Committed by Zhang Jian on Dec 31, 2021

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OXH9
CVE: NA

-------------------------------------------------

Collect the processes that have the page mapped via collect_procs().

@page: if the page is part of a hugepage/compound page, we must use
compound_head() to find its head page to prevent a kernel panic, and the
page must be locked.

@to_kill: the function returns a linked list; once we are done with the
list, we must kfree it.

@force_early: to find all processes, this must be true; if it is false,
the function only returns the processes that have the PF_MCE_PROCESS
or PF_MCE_EARLY flag.

Limits: if force_early is true, sysctl_memory_failure_early_kill has no
effect. If it is false, no process has the PF_MCE_PROCESS and PF_MCE_EARLY
flags, and sysctl_memory_failure_early_kill is enabled, the function returns
all tasks regardless of whether they have the PF_MCE_PROCESS and
PF_MCE_EARLY flags.

Signed-off-by: Zhang Jian <zhangjian210@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


net: hns: update hns version to 21.12.1 · 5dd1df36

Committed by Yonglong Liu on Dec 31, 2021

driver inclusion
category: other
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OSUK
CVE: NA

----------------------------
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: Kangfenglong <kangfenglong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


net: hns: fix bug when two ports opened promisc mode both · 12601a9b

Committed by Yonglong Liu on Dec 31, 2021

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OSUK
CVE: NA

----------------------------

When only eth1 is added to an OVS network and both eth1 and eth0 have
promisc mode enabled, the icmp6 neighbor solicitation packets from
OVS to eth1 are sent back to the OVS network, causing incorrect
learning of arp.

The hns driver uses a TCAM table to handle the promisc settings.
When setting the TCAM table, the port mask for multicast should be
'0xf' (exact match), not the 'port number' (fuzzy match). So when both
ports have the wrong port mask, the icmp6 neighbor solicitation
packets are incorrectly sent back to eth1.

This patch adds a mac_key to record the actual port number and
uses mask_key to record the 'exact match' port number to fix the
bug.

Fixes: a6c8c2c9a089 ("net: hns: fix non-promiscuous mode does not take effect problem")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: Kangfenglong <kangfenglong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


net: hns3: update hns3 version to 21.12.4 · cf7dfd77

Committed by Yonglong Liu on Dec 31, 2021

driver inclusion
category: other
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OSRU
CVE: NA

----------------------------
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


net: hns3: fix the concurrency between functions reading debugfs · aae46585

Committed by Yufeng Mo on Dec 31, 2021

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OSRU
CVE: NA

----------------------------

[1298504.847848] Call trace:
[1298504.847859] [<ffff000008089e14>] dump_backtrace+0x0/0x23c
[1298504.847865] [<ffff00000808a074>] show_stack+0x24/0x2c
[1298504.847870] [<ffff0000088568a8>] dump_stack+0x84/0xa8
[1298504.847878] [<ffff0000082122fc>] bad_page+0xec/0x14c
[1298504.847883] [<ffff000008219384>] free_pages_check_bad+0x90/0x9c
[1298504.847888] [<ffff00000821307c>] __free_pages_ok+0x2b8/0x2ec
[1298504.847894] [<ffff0000082153ec>] __free_pages+0x44/0x64
[1298504.847900] [<ffff000008288788>] kfree+0x198/0x1a0
[1298504.847905] [<ffff00000823432c>] kvfree+0x3c/0x58
[1298504.847937] [<ffff0000014fabf4>] hns3_dbg_read+0xf4/0x278 [hns3]
[1298504.847944] [<ffff000008359550>] full_proxy_read+0x60/0x90
[1298504.847949] [<ffff0000082b22a4>] __vfs_read+0x58/0x178
[1298504.847952] [<ffff0000082b2454>] vfs_read+0x90/0x14c
[1298504.847956] [<ffff0000082b2b70>] SyS_read+0x60/0xc0

When different functions read the same debugfs node, a double free
occurs, because the functions share the same node buffer.

This patch gives each function its own buffer to fix the problem.

Fixes: 319ba0a4 ("net: hns3: fix race condition in debugfs")
Fixes: c91910ef ("net: hns3: refactor the debugfs process")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr() · 17bc8efe

Committed by Chao Yu on Dec 31, 2021

mainline inclusion
from mainline-v5.17
commit 5598b24efaf4892741c798b425d543e4bed357a1
category: bugfix
CVE: CVE-2021-45469

--------------------------------

As Wenqing Liu reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=215235

- Overview
page fault in f2fs_setxattr() when mount and operate on corrupted image

- Reproduce
tested on kernel 5.16-rc3, 5.15.X under root

1. unzip tmp7.zip
2. ./single.sh f2fs 7

Sometimes need to run the script several times

- Kernel dump
loop0: detected capacity change from 0 to 131072
F2FS-fs (loop0): Found nat_bits in checkpoint
F2FS-fs (loop0): Mounted with checkpoint version = 7548c2ee
BUG: unable to handle page fault for address: ffffe47bc7123f48
RIP: 0010:kfree+0x66/0x320
Call Trace:
 __f2fs_setxattr+0x2aa/0xc00 [f2fs]
 f2fs_setxattr+0xfa/0x480 [f2fs]
 __f2fs_set_acl+0x19b/0x330 [f2fs]
 __vfs_removexattr+0x52/0x70
 __vfs_removexattr_locked+0xb1/0x140
 vfs_removexattr+0x56/0x100
 removexattr+0x57/0x80
 path_removexattr+0xa3/0xc0
 __x64_sys_removexattr+0x17/0x20
 do_syscall_64+0x37/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The root cause is that __f2fs_setxattr() does not sanity-check the
last xattr entry, which results in an out-of-bounds memory access when
updating inconsistent xattr data of the target inode.

After the fix, it can detect such xattr inconsistency as below:

F2FS-fs (loop11): inode (7) has invalid last xattr entry, entry_size: 60676
F2FS-fs (loop11): inode (8) has corrupted xattr
F2FS-fs (loop11): inode (8) has corrupted xattr
F2FS-fs (loop11): inode (8) has invalid last xattr entry, entry_size: 47736

Cc: stable@vger.kernel.org
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
  fs/f2fs/xattr.c
[yyl: replace f2fs_err() with f2fs_msg(KERN_ERR)]
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: fang wei <fangwei1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
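
The kind of sanity check the message describes, walking to the last entry while refusing to step outside the buffer, can be sketched as follows. The layout used here is invented for illustration (one length byte followed by that many payload bytes, with a zero length byte as terminator); it is not the f2fs xattr format, and `demo_find_last()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout for illustration only: each entry is one length
 * byte followed by that many payload bytes; a zero length byte ends
 * the list. Returns 0 and stores the terminator offset on success,
 * -1 if an entry would run past the end of the buffer.
 */
static int demo_find_last(const uint8_t *buf, size_t size, size_t *last_off)
{
    size_t off = 0;

    for (;;) {
        if (off >= size)
            return -1;      /* entry header would be out of bounds */
        if (buf[off] == 0) {
            *last_off = off;
            return 0;       /* found the terminating entry */
        }
        /* The whole entry must fit in what is left of the buffer. */
        if ((size_t)buf[off] + 1 > size - off)
            return -1;
        off += (size_t)buf[off] + 1;
    }
}
```

A corrupted length byte is caught before the walk follows it, so the caller can report the inconsistency instead of touching memory past the end of the buffer.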


mwifiex: Fix skb_over_panic in mwifiex_usb_recv() · 6cb8051b

Committed by Zekun Shen on Dec 30, 2021

mainline inclusion
from mainline-v5.17
commit 04d80663
category: bugfix
CVE: CVE-2021-43976

--------------------------------

Currently, with an unknown recv_type, mwifiex_usb_recv
just returns -1 without restoring the skb. The next time
mwifiex_usb_rx_complete is invoked with the same skb,
calling skb_put causes skb_over_panic.

The bug can be triggered with a compromised/malfunctioning
usb device. After applying the patch, skb_over_panic
no longer shows up with the same input.

Attached is the panic report from fuzzing.
skbuff: skb_over_panic: text:000000003bf1b5fa
 len:2048 put:4 head:00000000dd6a115b data:000000000a9445d8
 tail:0x844 end:0x840 dev:<NULL>
kernel BUG at net/core/skbuff.c:109!
invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 PID: 198 Comm: in:imklog Not tainted 5.6.0 #60
RIP: 0010:skb_panic+0x15f/0x161
Call Trace:
 <IRQ>
 ? mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb]
 skb_put.cold+0x24/0x24
 mwifiex_usb_rx_complete+0x26b/0xfcd [mwifiex_usb]
 __usb_hcd_giveback_urb+0x1e4/0x380
 usb_giveback_urb_bh+0x241/0x4f0
 ? __hrtimer_run_queues+0x316/0x740
 ? __usb_hcd_giveback_urb+0x380/0x380
 tasklet_action_common.isra.0+0x135/0x330
 __do_softirq+0x18c/0x634
 irq_exit+0x114/0x140
 smp_apic_timer_interrupt+0xde/0x380
 apic_timer_interrupt+0xf/0x20
 </IRQ>
Reported-by: Brendan Dolan-Gavitt <brendandg@nyu.edu>
Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/YX4CqjfRcTa6bVL+@Zekuns-MBP-16.fios-router.home
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


tee: handle lookup of shm with reference count 0 · 610447b2

Committed by Jens Wiklander on Dec 30, 2021

mainline inclusion
from mainline-v5.17
commit dfd0743f
category: bugfix
bugzilla: NA
CVE: CVE-2021-44733

--------------------------------

Since the tee subsystem does not keep a strong reference to its idle
shared memory buffers, it races with other threads that try to destroy a
shared memory through a close of its dma-buf fd or by unmapping the
memory.

In tee_shm_get_from_id() when a lookup in teedev->idr has been
successful, it is possible that the tee_shm is in the dma-buf teardown
path, but that path is blocked by the teedev mutex. Since we don't have
an API to tell if the tee_shm is in the dma-buf teardown path or not we
must find another way of detecting this condition.

Fix this by doing the reference counting directly on the tee_shm using a
new refcount_t refcount field. dma-buf is replaced by using
anon_inode_getfd() instead, this separates the life-cycle of the
underlying file from the tee_shm. tee_shm_put() is updated to hold the
mutex when decreasing the refcount to 0 and then remove the tee_shm from
teedev->idr before releasing the mutex. This means that the tee_shm can
never be found unless it has a refcount larger than 0.

Fixes: 967c9cca ("tee: generic TEE subsystem")
Cc: stable@vger.kernel.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Lars Persson <larper@axis.com>
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Reported-by: Patrik Lantz <patrik.lantz@axis.com>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Conflicts:
  drivers/tee/tee_shm.c
  include/linux/tee_drv.h
[yyl: adjust context]
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
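
The invariant stated at the end of this message, that an object can only be found while its refcount is larger than 0, can be sketched in user space. Everything here is a hypothetical stand-in: `struct demo_shm` for `struct tee_shm`, a fixed array for `teedev->idr`, a plain counter for `refcount_t`, and a pthread mutex for the teedev mutex.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_IDS 4

/* Hypothetical stand-ins for struct tee_shm and the teedev IDR. */
struct demo_shm {
    int id;
    unsigned int refcount;
};

struct demo_dev {
    pthread_mutex_t mutex;
    struct demo_shm *table[DEMO_MAX_IDS];   /* models teedev->idr */
};

/* Lookup takes its reference while still holding the mutex. */
static struct demo_shm *demo_get_from_id(struct demo_dev *dev, int id)
{
    struct demo_shm *shm = NULL;

    pthread_mutex_lock(&dev->mutex);
    if (id >= 0 && id < DEMO_MAX_IDS)
        shm = dev->table[id];
    if (shm)
        shm->refcount++;
    pthread_mutex_unlock(&dev->mutex);
    return shm;
}

/* Returns 1 when the last reference was dropped and the entry removed. */
static int demo_put(struct demo_dev *dev, struct demo_shm *shm)
{
    int released = 0;

    pthread_mutex_lock(&dev->mutex);
    if (--shm->refcount == 0) {
        /* Unpublish before the mutex is released. */
        dev->table[shm->id] = NULL;
        released = 1;
    }
    pthread_mutex_unlock(&dev->mutex);
    return released;        /* the caller may free the object now */
}
```

Because the drop to zero and the removal from the table happen under the same mutex that the lookup takes, a lookup either finds a live object and pins it, or finds nothing.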


tee: don't assign shm id for private shms · 08efdb81

Committed by Jens Wiklander on Dec 30, 2021

mainline inclusion
from mainline-v5.7-rc1
commit f1bbaced
category: cleanup
bugzilla: NA
CVE: NA

Prepare for fixing CVE-2021-44733.
--------------------------------

Private shared memory objects must not be referenced from user space. To
guarantee that, don't assign an id to shared memory objects which are
driver private.

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


tee: remove linked list of struct tee_shm · e2cda3d9

Committed by Jens Wiklander on Dec 30, 2021

mainline inclusion
from mainline-v5.7-rc1
commit 59a135f6
category: cleanup
bugzilla: NA
CVE: NA

Prepare for fixing CVE-2021-44733.
--------------------------------

Removes list_shm from struct tee_context since the linked list isn't used
any longer.

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Conflicts:
  drivers/tee/tee_core.c
[yyl: adjust context]
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


30 Dec, 2021 24 commits

mm/page_alloc: Use cmdline to disable "place pages to tail" · baeaf1da

Committed by Peng Liu on Dec 30, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4EG0R
CVE: NA

-----------------------------------------------

Add a cmdline option to disable "place pages to tail" when onlining memory.

Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


ext4: fix an use-after-free issue about data=journal writeback mode · b42b7eb0

Committed by Zhang Yi on Dec 30, 2021

hulk inclusion
category: bugfix
bugzilla: 185944, https://gitee.com/openeuler/kernel/issues/I4OP92
CVE: NA
---------------------------

Our syzkaller reported a use-after-free issue: the freed buffer_head on
the writeback page is accessed in __ext4_journalled_writepage(). The
problem is that if a truncate races with the data=journalled
writeback procedure, the writeback length can become zero and
bget_one() refuses to take the buffer_head's refcount. The truncate
procedure then releases the buffer once we drop the page lock, and finally
the last ext4_walk_page_buffers() triggers the use-after-free.

sync                               truncate
ext4_sync_file()
 file_write_and_wait_range()
                                   ext4_setattr(0)
                                    inode->i_size = 0
  ext4_writepage()
   len = 0
   __ext4_journalled_writepage()
    page_bufs = page_buffers(page)
    ext4_walk_page_buffers(bget_one) <- does not get refcount
                                    do_invalidatepage()
                                      free_buffer_head()
    ext4_walk_page_buffers(page_bufs) <- trigger use-after-free

After commit bdf96838 ("ext4: fix race between truncate and
__ext4_journalled_writepage()"), the racing case is already handled,
so bget_one() and bput_one() are not needed. This patch
simply removes these hunks and rechecks i_size to make it safe.

Fixes: bdf96838 ("ext4: fix race between truncate and __ext4_journalled_writepage()")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>


ext4: Fix null-ptr-deref in '__ext4_journal_ensure_credits' · 52d1c41a

Committed by Ye Bin on Dec 25, 2021

hulk inclusion
category: bugfix
bugzilla: 185945, https://gitee.com/openeuler/kernel/issues/I4O33F
CVE: NA

-----------------------------------------------

We hit the following issue when running the syzkaller test:
[ 1901.130043] EXT4-fs error (device vda): ext4_remount:5624: comm syz-executor.5: Abort forced by user
[ 1901.130901] Aborting journal on device vda-8.
[ 1901.131437] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.16: Detected aborted journal
[ 1901.131566] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.11: Detected aborted journal
[ 1901.132586] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.18: Detected aborted journal
[ 1901.132751] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.9: Detected aborted journal
[ 1901.136149] EXT4-fs error (device vda) in ext4_reserve_inode_write:6035: Journal has aborted
[ 1901.136837] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-fuzzer: Detected aborted journal
[ 1901.136915] ==================================================================
[ 1901.138175] BUG: KASAN: null-ptr-deref in __ext4_journal_ensure_credits+0x74/0x140 [ext4]
[ 1901.138343] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.13: Detected aborted journal
[ 1901.138398] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.1: Detected aborted journal
[ 1901.138808] Read of size 8 at addr 0000000000000000 by task syz-executor.17/968
[ 1901.138817]
[ 1901.138852] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.30: Detected aborted journal
[ 1901.144779] CPU: 1 PID: 968 Comm: syz-executor.17 Not tainted 4.19.90-vhulk2111.1.0.h893.eulerosv2r10.aarch64+ #1
[ 1901.146479] Hardware name: linux,dummy-virt (DT)
[ 1901.147317] Call trace:
[ 1901.147552]  dump_backtrace+0x0/0x2d8
[ 1901.147898]  show_stack+0x28/0x38
[ 1901.148215]  dump_stack+0xec/0x15c
[ 1901.148746]  kasan_report+0x108/0x338
[ 1901.149207]  __asan_load8+0x58/0xb0
[ 1901.149753]  __ext4_journal_ensure_credits+0x74/0x140 [ext4]
[ 1901.150579]  ext4_xattr_delete_inode+0xe4/0x700 [ext4]
[ 1901.151316]  ext4_evict_inode+0x524/0xba8 [ext4]
[ 1901.151985]  evict+0x1a4/0x378
[ 1901.152353]  iput+0x310/0x428
[ 1901.152733]  do_unlinkat+0x260/0x428
[ 1901.153056]  __arm64_sys_unlinkat+0x6c/0xc0
[ 1901.153455]  el0_svc_common+0xc8/0x320
[ 1901.153799]  el0_svc_handler+0xf8/0x160
[ 1901.154265]  el0_svc+0x10/0x218
[ 1901.154682] ==================================================================

This issue may happen like this:
	Process1                               Process2
ext4_evict_inode
  ext4_journal_start
   ext4_truncate
     ext4_ind_truncate
       ext4_free_branches
         ext4_ind_truncate_ensure_credits
	   ext4_journal_ensure_credits_fn
	     ext4_journal_restart
	       handle->h_transaction = NULL;
                                           mount -o remount,abort  /mnt
					   -> trigger JBD abort
               start_this_handle -> will return failed
  ext4_xattr_delete_inode
    ext4_journal_ensure_credits
      ext4_journal_ensure_credits_fn
        __ext4_journal_ensure_credits
	  jbd2_handle_buffer_credits
	    journal = handle->h_transaction->t_journal; ->null-ptr-deref

Currently, the indirect truncate path does not handle this error. To solve
the issue, adding a check for an aborted handle in
'__ext4_journal_ensure_credits' should be enough, and I also think it is
necessary.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
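
A minimal user-space sketch of the proposed guard follows. The types and `demo_ensure_credits()` are hypothetical stand-ins for `handle_t` and `__ext4_journal_ensure_credits()`; treating a NULL transaction as "aborted" and returning -EROFS are assumptions made for this illustration, not details taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the journal transaction and handle. */
struct demo_transaction {
    int buffer_credits;
};

struct demo_handle {
    struct demo_transaction *transaction;   /* NULL after a failed restart */
    bool aborted;
};

/*
 * Returns 0 if enough credits remain, 1 if more are needed, and a
 * negative errno (assumed -EROFS here) if the handle is unusable.
 */
static int demo_ensure_credits(const struct demo_handle *handle, int needed)
{
    /* Check the handle state before dereferencing ->transaction. */
    if (handle->aborted || !handle->transaction)
        return -EROFS;
    return handle->transaction->buffer_credits >= needed ? 0 : 1;
}
```

With the guard in place, the path shown in the diagram, where a failed restart leaves the transaction pointer NULL, gets an error back instead of dereferencing it.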


scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback · a7851bec

Committed by Can Guo on Dec 30, 2021

mainline inclusion
from mainline-v5.11-rc4
commit 35fc4cd3
category: bugfix
bugzilla: NA
CVE: CVE-2021-39657

-----------------------------------------------

Users can initiate resets to a specific SCSI device/target/host through
IOCTL. When this happens, the SCSI cmd passed to the eh_device/target/host
_reset_handler() callbacks is initialized with a request whose tag is -1.
In this case it is not right for the eh_device_reset_handler() callback to
rely on the LUN obtained from hba->lrb[-1]. Fix it by getting the LUN from
the SCSI device associated with the SCSI cmd.

Link: https://lore.kernel.org/r/1609157080-26283-1-git-send-email-cang@codeaurora.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a7851bec

netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc · dc907c5a

Committed by Haimin Zhang on Dec 30, 2021

mainline inclusion
from mainline-5.16-rc6
commit 48122177
category: bugfix
CVE: CVE-2021-4135

--------------------------------

Zero-initialize memory for new map's value in function nsim_bpf_map_alloc
since it may cause a potential kernel information leak issue, as follows:
1. nsim_bpf_map_alloc calls nsim_map_alloc_elem to allocate elements for
a new map.
2. nsim_map_alloc_elem uses kmalloc to allocate map's value, but doesn't
zero it.
3. A user application can use IOCTL BPF_MAP_LOOKUP_ELEM to get specific
element's information in the map.
4. The kernel function map_lookup_elem will call bpf_map_copy_value to get
the information allocated at step-2, then use copy_to_user to copy to the
user buffer.
This can only leak information for an array map.
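
A user-space analogue of the fix, given as a sketch only: calloc() stands in
for a zeroing kernel allocation, and struct map_elem_sketch and
alloc_elem_sketch() are hypothetical names. Because the value buffer is
zeroed at allocation time, a later lookup cannot return stale heap contents.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type, loosely modelled on the steps listed above. */
struct map_elem_sketch {
	void *key;
	void *value;
};

/*
 * Allocate key and value buffers already zeroed.
 * Returns 0 on success and -1 on allocation failure.
 */
static int alloc_elem_sketch(struct map_elem_sketch *e,
			     size_t key_size, size_t value_size)
{
	assert(e != NULL && key_size > 0 && value_size > 0);

	e->key = calloc(1, key_size);
	e->value = calloc(1, value_size); /* zeroed, unlike plain malloc() */
	if (e->key == NULL || e->value == NULL) {
		free(e->key);
		free(e->value);
		e->key = NULL;
		e->value = NULL;
		return -1;
	}
	return 0;
}
```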

Fixes: 395cacb5 ("netdevsim: bpf: support fake map offload")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Haimin Zhang <tcs.kernel@gmail.com>
Link: https://lore.kernel.org/r/20211215111530.72103-1-tcs.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

dc907c5a

lib/strncpy_from_user.c: Mask out bytes after NUL terminator. · 619199ea

Committed by Daniel Xu on Dec 30, 2021

mainline inclusion
from mainline-v5.10-rc5
commit 6fa6d280
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NI6R
CVE: NA

--------------------------------

do_strncpy_from_user() may copy some extra bytes after the NUL
terminator into the destination buffer. This usually does not matter for
normal string operations. However, when BPF programs key BPF maps with
strings, this matters a lot.

A BPF program may read strings from user memory by calling the
bpf_probe_read_user_str() helper which eventually calls
do_strncpy_from_user(). The program can then key a map with the
destination buffer. BPF map keys are fixed-width and string-agnostic,
meaning that map keys are treated as a set of bytes.

The issue is when do_strncpy_from_user() overcopies bytes after the NUL
terminator, it can result in seemingly identical strings occupying
multiple slots in a BPF map. This behavior is subtle and totally
unexpected by the user.

This commit masks out the bytes following the NUL while preserving
long-sized stride in the fast path.
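
To illustrate the effect, here is a byte-wise user-space sketch. It is not
the kernel code, which works one long at a time with word-at-a-time masks,
and copy_chunk_masked_sketch() is a hypothetical name. It copies one
fixed-size chunk and zeroes every destination byte after the first NUL, so
two equal strings always produce identical key bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy an n-byte chunk from src to dst. If the chunk contains a NUL, the
 * destination bytes after it are zeroed instead of copied. Returns the
 * index of the NUL, or n if the chunk holds no NUL.
 */
static size_t copy_chunk_masked_sketch(unsigned char *dst,
				       const unsigned char *src, size_t n)
{
	size_t i;

	assert(dst != NULL && src != NULL);

	for (i = 0; i < n; i++) {
		dst[i] = src[i];
		if (src[i] == '\0') {
			/* Mask out everything that follows the terminator. */
			memset(dst + i + 1, 0, n - i - 1);
			return i;
		}
	}
	return n;
}
```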

Fixes: 6ae08ae3 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/21efc982b3e9f2f7b0379eed642294caaa0c27a7.1605642949.git.dxu@dxuuu.xyz
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

619199ea

bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers · f0620ccb

Committed by Daniel Borkmann on Dec 30, 2021

mainline inclusion
from mainline-v5.5-rc1
commit 6ae08ae3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NI6R
CVE: NA

--------------------------------

The current bpf_probe_read() and bpf_probe_read_str() helpers are broken
in that they assume they can be used for probing memory access for kernel
space addresses /as well as/ user space addresses.

However, plain use of probe_kernel_read() for both cases will attempt to
always access kernel space address space given access is performed under
KERNEL_DS and some archs in fact have overlapping address spaces where a
kernel pointer and user pointer would have the /same/ address value and
therefore accessing application memory via bpf_probe_read{,_str}() would
read garbage values.

Let's fix the BPF side by making use of recently added 3d708182 ("uaccess:
Add non-pagefault user-space read functions"). Unfortunately, the only way
to fix this status quo is to add dedicated bpf_probe_read_{user,kernel}()
and bpf_probe_read_{user,kernel}_str() helpers. The bpf_probe_read{,_str}()
helpers are kept as-is to retain their current behavior.

The two *_user() variants attempt the access always under USER_DS set, the
two *_kernel() variants will -EFAULT when accessing user memory if the
underlying architecture has non-overlapping address ranges, also avoiding
throwing the kernel warning via 00c42373 ("x86-64: add warning for
non-canonical user access address dereferences").

Fixes: a5e8c070 ("bpf: add bpf_probe_read_str helper")
Fixes: 2541517c ("tracing, perf: Implement BPF programs attached to kprobes")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/796ee46e948bc808d54891a1108435f8652c6ca4.1572649915.git.daniel@iogearbox.net
Conflicts:
	kernel/trace/bpf_trace.c
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f0620ccb

bpf: Make use of probe_user_write in probe write helper · 97b4458c

Committed by Daniel Borkmann on Dec 30, 2021

mainline inclusion
from mainline-v5.5-rc1
commit eb1b6688
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NI6R
CVE: NA

--------------------------------

Convert the bpf_probe_write_user() helper to probe_user_write() such that
writes are not attempted under KERNEL_DS anymore which is buggy as kernel
and user space pointers can have overlapping addresses. Also, given we have
the access_ok() check inside probe_user_write(), the helper doesn't need
to do it twice.

Fixes: 96ae5227 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/841c461781874c07a0ee404a454c3bc0459eed30.1572649915.git.daniel@iogearbox.net
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

97b4458c

uaccess: Add strict non-pagefault kernel-space read function · 22058d2b

Committed by Daniel Borkmann on Dec 30, 2021

mainline inclusion
from mainline-v5.5-rc1
commit 75a1a607
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NI6R
CVE: NA

--------------------------------

Add two new probe_kernel_read_strict() and strncpy_from_unsafe_strict()
helpers which by default alias to the __probe_kernel_read() and the
__strncpy_from_unsafe(), respectively, but can be overridden by archs
which have non-overlapping address ranges for kernel space and user
space in order to bail out with -EFAULT when attempting to probe user
memory including non-canonical user access addresses [0]:

  4-level page tables:
    user-space mem: 0x0000000000000000 - 0x00007fffffffffff
    non-canonical:  0x0000800000000000 - 0xffff7fffffffffff

  5-level page tables:
    user-space mem: 0x0000000000000000 - 0x00ffffffffffffff
    non-canonical:  0x0100000000000000 - 0xfeffffffffffffff

The idea is that these helpers are complementary to the probe_user_read()
and strncpy_from_unsafe_user() which probe user-only memory. Both added
helpers here do the same, but for kernel-only addresses.
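
The 4-level ranges listed above can be turned into a small user-space
sketch. The constants are copied from the table. The enum, the function
names and the -EFAULT policy in probe_kernel_strict_sketch() are
illustrative assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Upper bounds taken from the 4-level page table ranges quoted above. */
#define USER_MAX_4LEVEL      0x00007fffffffffffULL
#define NONCANON_MAX_4LEVEL  0xffff7fffffffffffULL

enum addr_class_sketch { ADDR_USER, ADDR_NON_CANONICAL, ADDR_KERNEL };

/* Classify a 64-bit address according to the 4-level layout. */
static enum addr_class_sketch classify_4level_sketch(uint64_t addr)
{
	if (addr <= USER_MAX_4LEVEL)
		return ADDR_USER;
	if (addr <= NONCANON_MAX_4LEVEL)
		return ADDR_NON_CANONICAL;
	return ADDR_KERNEL;
}

/*
 * Hypothetical strict probe: refuse anything that is not a kernel address
 * before a read would even be attempted.
 */
static int probe_kernel_strict_sketch(uint64_t addr)
{
	return classify_4level_sketch(addr) == ADDR_KERNEL ? 0 : -EFAULT;
}
```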

Both sets of helpers are going to be used for BPF tracing. They also
explicitly avoid throwing the splat for non-canonical user addresses from
00c42373 ("x86-64: add warning for non-canonical user access address
dereferences").

For compat, the current probe_kernel_read() and strncpy_from_unsafe() are
left as-is.

  [0] Documentation/x86/x86_64/mm.txt

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: x86@kernel.org
Link: https://lore.kernel.org/bpf/eefeefd769aa5a013531f491a71f0936779e916b.1572649915.git.daniel@iogearbox.net
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

22058d2b

bpf: fix script for generating man page on BPF helpers · de92a39f

Committed by Quentin Monnet on Dec 30, 2021

mainline inclusion
from mainline-v5.2-rc1
commit 748c7c82
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NI6R
CVE: NA

--------------------------------

The script broke on parsing function prototype for bpf_strtoul(). This
is because the last argument for the function is a pointer to an
"unsigned long". The current version of the script only accepts "const"
and "struct", but not "unsigned", at the beginning of argument types
made of several words.

One solution could be to add "unsigned" to the list, but the issue could
come up again in the future (what about "long int"?). It turns out we do
not need to have such restrictions on the words: so let's simply accept
any series of words instead.

Reported-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

de92a39f

bpf: Backport __BPF_FUNC_MAPPER and annotation from mainline · 9c91d861

Committed by Pu Lehui on Dec 30, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NI6R
CVE: NA

--------------------------------

BPF programs call helper functions according to bpf_func_id.
Misordered helper function IDs break the consistency of BPF
programs, making them unusable on other kernel versions. Let's
backport __BPF_FUNC_MAPPER and the corresponding annotation
from mainline.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9c91d861

bpf: Fix up register-based shifts in interpreter to silence KUBSAN · 3aa9ea1b

Committed by Daniel Borkmann on Dec 30, 2021

mainline inclusion
from mainline-v5.12
commit 28131e9d
category: bugfix
bugzilla: NA
CVE: NA

-----------------------------------------------------------------------

syzbot reported a shift-out-of-bounds that KUBSAN observed in the
interpreter:

  [...]
  UBSAN: shift-out-of-bounds in kernel/bpf/core.c:1420:2
  shift exponent 255 is too large for 64-bit type 'long long unsigned int'
  CPU: 1 PID: 11097 Comm: syz-executor.4 Not tainted 5.12.0-rc2-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:79 [inline]
   dump_stack+0x141/0x1d7 lib/dump_stack.c:120
   ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
   __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
   ___bpf_prog_run.cold+0x19/0x56c kernel/bpf/core.c:1420
   __bpf_prog_run32+0x8f/0xd0 kernel/bpf/core.c:1735
   bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
   bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline]
   bpf_prog_run_clear_cb include/linux/filter.h:755 [inline]
   run_filter+0x1a1/0x470 net/packet/af_packet.c:2031
   packet_rcv+0x313/0x13e0 net/packet/af_packet.c:2104
   dev_queue_xmit_nit+0x7c2/0xa90 net/core/dev.c:2387
   xmit_one net/core/dev.c:3588 [inline]
   dev_hard_start_xmit+0xad/0x920 net/core/dev.c:3609
   __dev_queue_xmit+0x2121/0x2e00 net/core/dev.c:4182
   __bpf_tx_skb net/core/filter.c:2116 [inline]
   __bpf_redirect_no_mac net/core/filter.c:2141 [inline]
   __bpf_redirect+0x548/0xc80 net/core/filter.c:2164
   ____bpf_clone_redirect net/core/filter.c:2448 [inline]
   bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2420
   ___bpf_prog_run+0x34e1/0x77d0 kernel/bpf/core.c:1523
   __bpf_prog_run512+0x99/0xe0 kernel/bpf/core.c:1737
   bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
   bpf_test_run+0x3ed/0xc50 net/bpf/test_run.c:50
   bpf_prog_test_run_skb+0xabc/0x1c50 net/bpf/test_run.c:582
   bpf_prog_test_run kernel/bpf/syscall.c:3127 [inline]
   __do_sys_bpf+0x1ea9/0x4f00 kernel/bpf/syscall.c:4406
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  [...]

Generally speaking, KUBSAN reports from the kernel should be fixed.
However, in case of BPF, this particular report caused concerns since
the large shift is not wrong from BPF point of view, just undefined.
In the verifier, K-based shifts that are >= {64,32} (depending on the
bitwidth of the instruction) are already rejected. The register-based
cases were not, given their content might not be known at verification
time. Ideas such as verifier instruction rewrite with an additional
AND instruction for the source register were brought up, but regularly
rejected due to the additional runtime overhead they incur.

As Edward Cree rightly put it:

  Shifts by more than insn bitness are legal in the BPF ISA; they are
  implementation-defined behaviour [of the underlying architecture],
  rather than UB, and have been made legal for performance reasons.
  Each of the JIT backends compiles the BPF shift operations to machine
  instructions which produce implementation-defined results in such a
  case; the resulting contents of the register may be arbitrary but
  program behaviour as a whole remains defined.

  Guard checks in the fast path (i.e. affecting JITted code) will thus
  not be accepted.

  The case of division by zero is not truly analogous here, as division
  instructions on many of the JIT-targeted architectures will raise a
  machine exception / fault on division by zero, whereas (to the best
  of my knowledge) none will do so on an out-of-bounds shift.

Given the KUBSAN report only affects the BPF interpreter, but not JITs,
one solution is to add the ANDs with 63 or 31 into ___bpf_prog_run().
That would make the shifts defined, and thus shuts up KUBSAN, and the
compiler would optimize out the AND on any CPU that interprets the shift
amounts modulo the width anyway (e.g., confirmed from disassembly that
on x86-64 and arm64 the generated interpreter code is the same before
and after this fix).
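
A user-space sketch of the masking idea, assuming hypothetical helper names
(the real change lives in the ALU cases of ___bpf_prog_run()). ANDing the
shift amount with 63 or 31 keeps the C shift defined for any source value:

```c
#include <assert.h>
#include <stdint.h>

/* Left shift with the amount reduced modulo the operand width (64 or 32). */
static uint64_t alu_lsh_sketch(uint64_t dst, uint64_t src, unsigned int width)
{
	assert(width == 32 || width == 64);

	if (width == 64)
		return dst << (src & 63);
	/* 32-bit operation: shift in 32 bits, result is zero-extended. */
	return (uint32_t)((uint32_t)dst << (src & 31));
}

/* Logical right shift with the same masking of the shift amount. */
static uint64_t alu_rsh_sketch(uint64_t dst, uint64_t src, unsigned int width)
{
	assert(width == 32 || width == 64);

	if (width == 64)
		return dst >> (src & 63);
	return (uint32_t)dst >> (src & 31);
}
```

With these helpers a shift exponent of 255, as in the report above, is
reduced to 63 for 64-bit operations instead of invoking undefined behaviour.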

The BPF interpreter is slow path, and most likely compiled out anyway
as distros select BPF_JIT_ALWAYS_ON to avoid speculative execution of
BPF instructions by the interpreter. Given the main argument was to
avoid sacrificing performance, the fact that the compiler optimizes the
AND away on mainstream archs also helps as a solution moving
forward. Also add a comment on LSH/RSH/ARSH translation for JIT authors
to provide guidance when they see the ___bpf_prog_run() interpreter
code and use it as a model for a new JIT backend.

Reported-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com
Reported-by: Kurt Manucredo <fuzzybritches0@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Co-developed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com
Cc: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/bpf/0000000000008f912605bd30d5d7@google.com
Link: https://lore.kernel.org/bpf/bac16d8d-c174-bdc4-91bd-bfa62b410190@gmail.com
conflicts:
  kernel/bpf/core.c
Signed-off-by: He Fengqing <hefengqing@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3aa9ea1b

xen/netback: don't queue unlimited number of packages · bba3f529