提交 · a950b115ff837f0ddce747ee3878a4c86e6b91a3 · openeuler / Kernel

13 12月, 2022 40 次提交

mm/hugetlb: fix hugetlb not supporting softdirty tracking · a950b115

由 David Hildenbrand 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 62af37c5cd7f5fd071086cab645844bf5bcdc0ef
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=62af37c5cd7f5fd071086cab645844bf5bcdc0ef

--------------------------------

commit f96f7a40 upstream.

Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.

I observed that hugetlb does not support/expect write-faults in shared
mappings that would have to map the R/O-mapped page writable -- and I
found two case where we could currently get such faults and would
erroneously map an anon page into a shared mapping.

Reproducers part of the patches.

I propose to backport both fixes to stable trees.  The first fix needs a
small adjustment.

This patch (of 2):

Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping.  In fact, there is none, and so far we thought we could get away
with that because e.g., mprotect() should always do the right thing and
map all pages directly writable.

Looks like we were wrong:

--------------------------------------------------------------------------
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <errno.h>
 #include <sys/mman.h>

 #define HUGETLB_SIZE (2 * 1024 * 1024u)

 static void clear_softdirty(void)
 {
         int fd = open("/proc/self/clear_refs", O_WRONLY);
         const char *ctrl = "4";
         int ret;

         if (fd < 0) {
                 fprintf(stderr, "open(clear_refs) failed\n");
                 exit(1);
         }
         ret = write(fd, ctrl, strlen(ctrl));
         if (ret != strlen(ctrl)) {
                 fprintf(stderr, "write(clear_refs) failed\n");
                 exit(1);
         }
         close(fd);
 }

 int main(int argc, char **argv)
 {
         char *map;
         int fd;

         fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
         if (!fd) {
                 fprintf(stderr, "open() failed\n");
                 return -errno;
         }
         if (ftruncate(fd, HUGETLB_SIZE)) {
                 fprintf(stderr, "ftruncate() failed\n");
                 return -errno;
         }

         map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
         if (map == MAP_FAILED) {
                 fprintf(stderr, "mmap() failed\n");
                 return -errno;
         }

         *map = 0;

         if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                 fprintf(stderr, "mmprotect() failed\n");
                 return -errno;
         }

         clear_softdirty();

         if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                 fprintf(stderr, "mmprotect() failed\n");
                 return -errno;
         }

         *map = 0;

         return 0;
 }
--------------------------------------------------------------------------

Above test fails with SIGBUS when there is only a single free hugetlb page.
 # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
 # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 # cat /proc/meminfo | grep HugePages_
 HugePages_Total:       2
 HugePages_Free:        1
 HugePages_Rsvd:    18446744073709551615
 HugePages_Surp:        0

Reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
support softdirty tracking.

Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com
Fixes: 64e45507 ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>	[3.18+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

a950b115

xen/privcmd: fix error exit of privcmd_ioctl_dm_op() · efc2a0db

由 Juergen Gross 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 6de50db104af0dc921f593fd95c55db86a52ceef
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6de50db104af0dc921f593fd95c55db86a52ceef

--------------------------------

commit c5deb278 upstream.

The error exit of privcmd_ioctl_dm_op() is calling unlock_pages()
potentially with pages being NULL, leading to a NULL dereference.

Additionally lock_pages() doesn't check for pin_user_pages_fast()
having been completely successful, resulting in potentially not
locking all pages into memory. This could result in sporadic failures
when using the related memory in user mode.

Fix all of that by calling unlock_pages() always with the real number
of pinned pages, which will be zero in case pages being NULL, and by
checking the number of pages pinned by pin_user_pages_fast() matching
the expected number of pages.

Cc: <stable@vger.kernel.org>
Fixes: ab520be8 ("xen/privcmd: Add IOCTL_PRIVCMD_DM_OP")
Reported-by: NRustam Subkhankulov <subkhankulov@ispras.ru>
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NJan Beulich <jbeulich@suse.com>
Reviewed-by: NOleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Link: https://lore.kernel.org/r/20220825141918.3581-1-jgross@suse.comSigned-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

efc2a0db

ACPI: processor: Remove freq Qos request for all CPUs · ec3c28d1

由 Riwen Lu 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 8d5f8a4f25b1cb88e5a75e83e223e1197f9d734d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8d5f8a4f25b1cb88e5a75e83e223e1197f9d734d

--------------------------------

commit 36527b9d upstream.

The freq Qos request would be removed repeatedly if the cpufreq policy
relates to more than one CPU. Then, it would cause the "called for unknown
object" warning.

Remove the freq Qos request for each CPU relates to the cpufreq policy,
instead of removing repeatedly for the last CPU of it.

Fixes: a1bb46c3 ("ACPI: processor: Add QoS requests for all CPUs")
Reported-by: NJeremy Linton <Jeremy.Linton@arm.com>
Tested-by: NJeremy Linton <jeremy.linton@arm.com>
Signed-off-by: NRiwen Lu <luriwen@kylinos.cn>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

ec3c28d1

s390: fix double free of GS and RI CBs on fork() failure · b175f5de

由 Brian Foster 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 297ae7e87a87a001dd3dfeac1cb26a42fd929708
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=297ae7e87a87a001dd3dfeac1cb26a42fd929708

--------------------------------

commit 13cccafe upstream.

The pointers for guarded storage and runtime instrumentation control
blocks are stored in the thread_struct of the associated task. These
pointers are initially copied on fork() via arch_dup_task_struct()
and then cleared via copy_thread() before fork() returns. If fork()
happens to fail after the initial task dup and before copy_thread(),
the newly allocated task and associated thread_struct memory are
freed via free_task() -> arch_release_task_struct(). This results in
a double free of the guarded storage and runtime info structs
because the fields in the failed task still refer to memory
associated with the source task.

This problem can manifest as a BUG_ON() in set_freepointer() (with
CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled)
when running trinity syscall fuzz tests on s390x. To avoid this
problem, clear the associated pointer fields in
arch_dup_task_struct() immediately after the new task is copied.
Note that the RI flag is still cleared in copy_thread() because it
resides in thread stack memory and that is where stack info is
copied.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Fixes: 8d9047f8 ("s390/runtime instrumentation: simplify task exit handling")
Fixes: 7b83c629 ("s390/guarded storage: simplify task exit handling")
Cc: <stable@vger.kernel.org> # 4.15
Reviewed-by: NGerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: NHeiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20220816155407.537372-1-bfoster@redhat.comSigned-off-by: NVasily Gorbik <gor@linux.ibm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

b175f5de

asm-generic: sections: refactor memory_intersects · 89c0993a

由 Quanyang Wang 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit c60ae878782db483918eca4b398a9c8f076aed06
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c60ae878782db483918eca4b398a9c8f076aed06

--------------------------------

commit 0c7d7cc2 upstream.

There are two problems with the current code of memory_intersects:

First, it doesn't check whether the region (begin, end) falls inside the
region (virt, vend), that is (virt < begin && vend > end).

The second problem is if vend is equal to begin, it will return true but
this is wrong since vend (virt + size) is not the last address of the
memory region but (virt + size -1) is.  The wrong determination will
trigger the misreporting when the function check_for_illegal_area calls
memory_intersects to check if the dma region intersects with stext region.

The misreporting is as below (stext is at 0x80100000):
 WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168
 DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536]
 Modules linked in:
 CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5
 Hardware name: Xilinx Zynq Platform
  unwind_backtrace from show_stack+0x18/0x1c
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from __warn+0xb0/0x198
  __warn from warn_slowpath_fmt+0x80/0xb4
  warn_slowpath_fmt from check_for_illegal_area+0x130/0x168
  check_for_illegal_area from debug_dma_map_sg+0x94/0x368
  debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128
  __dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24
  dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4
  usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214
  usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118
  usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec
  usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70
  usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360
  usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440
  usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238
  usb_stor_control_thread from kthread+0xf8/0x104
  kthread from ret_from_fork+0x14/0x2c

Refactor memory_intersects to fix the two problems above.

Before the 1d7db834 ("dma-debug: use memory_intersects()
directly"), memory_intersects is called only by printk_late_init:

printk_late_init -> init_section_intersects ->memory_intersects.

There were few places where memory_intersects was called.

When commit 1d7db834 ("dma-debug: use memory_intersects()
directly") was merged and CONFIG_DMA_API_DEBUG is enabled, the DMA
subsystem uses it to check for an illegal area and the calltrace above
is triggered.

[akpm@linux-foundation.org: fix nearby comment typo]
Link: https://lkml.kernel.org/r/20220819081145.948016-1-quanyang.wang@windriver.com
Fixes: 97955936 ("asm/sections: add helpers to check for section data")
Signed-off-by: NQuanyang Wang <quanyang.wang@windriver.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thierry Reding <treding@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

89c0993a

loop: Check for overflow while configuring loop · 7cefa105

由 Siddh Raman Pant 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 6858933131d0dadac071c4d33335a9ea4b8e76cf
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6858933131d0dadac071c4d33335a9ea4b8e76cf

--------------------------------

commit c490a0b5 upstream.

The userspace can configure a loop using an ioctl call, wherein
a configuration of type loop_config is passed (see lo_ioctl()'s
case on line 1550 of drivers/block/loop.c). This proceeds to call
loop_configure() which in turn calls loop_set_status_from_info()
(see line 1050 of loop.c), passing &config->info which is of type
loop_info64*. This function then sets the appropriate values, like
the offset.

loop_device has lo_offset of type loff_t (see line 52 of loop.c),
which is typdef-chained to long long, whereas loop_info64 has
lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).

The function directly copies offset from info to the device as
follows (See line 980 of loop.c):
	lo->lo_offset = info->lo_offset;

This results in an overflow, which triggers a warning in iomap_iter()
due to a call to iomap_iter_done() which has:
	WARN_ON_ONCE(iter->iomap.offset > iter->pos);

Thus, check for negative value during loop_set_status_from_info().

Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e

Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NSiddh Raman Pant <code@siddh.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.meSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

7cefa105

x86/bugs: Add "unknown" reporting for MMIO Stale Data · 924f61db

由 Pawan Gupta 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 14cbbb9c9914663d0eeca6b59c1c9d4f5a547ee0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=14cbbb9c9914663d0eeca6b59c1c9d4f5a547ee0

--------------------------------

commit 7df54884 upstream.

Older Intel CPUs that are not in the affected processor list for MMIO
Stale Data vulnerabilities currently report "Not affected" in sysfs,
which may not be correct. Vulnerability status for these older CPUs is
unknown.

Add known-not-affected CPUs to the whitelist. Report "unknown"
mitigation status for CPUs that are not in blacklist, whitelist and also
don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware
immunity to MMIO Stale Data vulnerabilities.

Mitigation is not deployed when the status is unknown.

  [ bp: Massage, fixup. ]

Fixes: 8d50cdf8 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data")
Suggested-by: NAndrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: NTony Luck <tony.luck@intel.com>
Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

924f61db

perf/x86/lbr: Enable the branch type for the Arch LBR by default · 91e8d872

由 Kan Liang 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 090f0ac167a04d5cbc22718655f0c733a3b99775
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=090f0ac167a04d5cbc22718655f0c733a3b99775

--------------------------------

commit 32ba156d upstream.

On the platform with Arch LBR, the HW raw branch type encoding may leak
to the perf tool when the SAVE_TYPE option is not set.

In the intel_pmu_store_lbr(), the HW raw branch type is stored in
lbr_entries[].type. If the SAVE_TYPE option is set, the
lbr_entries[].type will be converted into the generic PERF_BR_* type
in the intel_pmu_lbr_filter() and exposed to the user tools.
But if the SAVE_TYPE option is NOT set by the user, the current perf
kernel doesn't clear the field. The HW raw branch type leaks.

There are two solutions to fix the issue for the Arch LBR.
One is to clear the field if the SAVE_TYPE option is NOT set.
The other solution is to unconditionally convert the branch type and
expose the generic type to the user tools.

The latter is implemented here, because
- The branch type is valuable information. I don't see a case where
  you would not benefit from the branch type. (Stephane Eranian)
- Not having the branch type DOES NOT save any space in the
  branch record (Stephane Eranian)
- The Arch LBR HW can retrieve the common branch types from the
  LBR_INFO. It doesn't require the high overhead SW disassemble.

Fixes: 47125db2 ("perf/x86/intel/lbr: Support Architectural LBR")
Reported-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

91e8d872

btrfs: check if root is readonly while setting security xattr · 1da7a8cf

由 Goldwyn Rodrigues 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit d2bd18d50c1e835d154e018adb8f56d35d622528
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d2bd18d50c1e835d154e018adb8f56d35d622528

--------------------------------

commit b5111127 upstream.

For a filesystem which has btrfs read-only property set to true, all
write operations including xattr should be denied. However, security
xattr can still be changed even if btrfs ro property is true.

This happens because xattr_permission() does not have any restrictions
on security.*, system.*  and in some cases trusted.* from VFS and
the decision is left to the underlying filesystem. See comments in
xattr_permission() for more details.

This patch checks if the root is read-only before performing the set
xattr operation.

Testcase:

  DEV=/dev/vdb
  MNT=/mnt

  mkfs.btrfs -f $DEV
  mount $DEV $MNT
  echo "file one" > $MNT/f1

  setfattr -n "security.one" -v 2 $MNT/f1
  btrfs property set /mnt ro true

  setfattr -n "security.one" -v 1 $MNT/f1

  umount $MNT

CC: stable@vger.kernel.org # 4.9+
Reviewed-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

1da7a8cf

btrfs: add info when mount fails due to stale replace target · 5086bc02

由 Anand Jain 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit dcac6293f57136370a6f2016346f7c134fdf005d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dcac6293f57136370a6f2016346f7c134fdf005d

--------------------------------

commit f2c3bec2 upstream.

If the replace target device reappears after the suspended replace is
cancelled, it blocks the mount operation as it can't find the matching
replace-item in the metadata. As shown below,

   BTRFS error (device sda5): replace devid present without an active replace item

To overcome this situation, the user can run the command

   btrfs device scan --forget <replace target device>

and try the mount command again. And also, to avoid repeating the issue,
superblock on the devid=0 must be wiped.

   wipefs -a device-path-to-devid=0.

This patch adds some info when this situation occurs.
Reported-by: NSamuel Greiner <samuel@balkonien.org>
Link: https://lore.kernel.org/linux-btrfs/b4f62b10-b295-26ea-71f9-9a5c9299d42c@balkonien.org/T/
CC: stable@vger.kernel.org # 5.0+
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

5086bc02

btrfs: replace: drop assert for suspended replace · 832fcd42

由 Anand Jain 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit b2d352ed4d489f9fdfe5caf7d5e62bd2c310f0a8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b2d352ed4d489f9fdfe5caf7d5e62bd2c310f0a8

--------------------------------

commit 59a39919 upstream.

If the filesystem mounts with the replace-operation in a suspended state
and try to cancel the suspended replace-operation, we hit the assert. The
assert came from the commit fe97e2e1 ("btrfs: dev-replace: replace's
scrub must not be running in suspended state") that was actually not
required. So just remove it.

 $ mount /dev/sda5 /btrfs

    BTRFS info (device sda5): cannot continue dev_replace, tgtdev is missing
    BTRFS info (device sda5): you may cancel the operation after 'mount -o degraded'

 $ mount -o degraded /dev/sda5 /btrfs <-- success.

 $ btrfs replace cancel /btrfs

    kernel: assertion failed: ret != -ENOTCONN, in fs/btrfs/dev-replace.c:1131
    kernel: ------------[ cut here ]------------
    kernel: kernel BUG at fs/btrfs/ctree.h:3750!

After the patch:

 $ btrfs replace cancel /btrfs

    BTRFS info (device sda5): suspended dev_replace from /dev/sda5 (devid 1) to <missing disk> canceled

Fixes: fe97e2e1 ("btrfs: dev-replace: replace's scrub must not be running in suspended state")
CC: stable@vger.kernel.org # 5.0+
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

832fcd42

btrfs: fix silent failure when deleting root reference · 8391b95e

由 Filipe Manana 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 2fc3c168d5b66ff8e673c9b6b7763dd783089134
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2fc3c168d5b66ff8e673c9b6b7763dd783089134

--------------------------------

commit 47bf225a upstream.

At btrfs_del_root_ref(), if btrfs_search_slot() returns an error, we end
up returning from the function with a value of 0 (success). This happens
because the function returns the value stored in the variable 'err',
which is 0, while the error value we got from btrfs_search_slot() is
stored in the 'ret' variable.

So fix it by setting 'err' with the error value.

Fixes: 8289ed9f ("btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling")
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

8391b95e

ionic: fix up issues with handling EAGAIN on FW cmds · 228168c4

由 Shannon Nelson 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 3a351b567e204036da00db319cf0838146ea64b4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3a351b567e204036da00db319cf0838146ea64b4

--------------------------------

[ Upstream commit 0fc4dd45 ]

In looping on FW update tests we occasionally see the
FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop
waiting for the FW activate step to finsh inside the FW.  The
firmware is complaining that the done bit is set when a new
dev_cmd is going to be processed.

Doing a clean on the cmd registers and doorbell before exiting
the wait-for-done and cleaning the done bit before the sleep
prevents this from occurring.

Fixes: fbfb8031 ("ionic: Add hardware init and device commands")
Signed-off-by: NShannon Nelson <snelson@pensando.io>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

228168c4

rxrpc: Fix locking in rxrpc's sendmsg · a3a196bb

由 David Howells 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 79e2ca7aa96e80961828ab6312264633b66183cc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=79e2ca7aa96e80961828ab6312264633b66183cc

--------------------------------

[ Upstream commit b0f571ec ]

Fix three bugs in the rxrpc's sendmsg implementation:

 (1) rxrpc_new_client_call() should release the socket lock when returning
     an error from rxrpc_get_call_slot().

 (2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
     held in the event that we're interrupted by a signal whilst waiting
     for tx space on the socket or relocking the call mutex afterwards.

     Fix this by: (a) moving the unlock/lock of the call mutex up to
     rxrpc_send_data() such that the lock is not held around all of
     rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
     whether we're return with the lock dropped.  Note that this means
     recvmsg() will not block on this call whilst we're waiting.

 (3) After dropping and regaining the call mutex, rxrpc_send_data() needs
     to go and recheck the state of the tx_pending buffer and the
     tx_total_len check in case we raced with another sendmsg() on the same
     call.

Thinking on this some more, it might make sense to have different locks for
sendmsg() and recvmsg().  There's probably no need to make recvmsg() wait
for sendmsg().  It does mean that recvmsg() can return MSG_EOR indicating
that a call is dead before a sendmsg() to that call returns - but that can
currently happen anyway.

Without fix (2), something like the following can be induced:

	WARNING: bad unlock balance detected!
	5.16.0-rc6-syzkaller #0 Not tainted
	-------------------------------------
	syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
	[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
	but there are no more locks to release!

	other info that might help us debug this:
	no locks held by syz-executor011/3597.
	...
	Call Trace:
	 <TASK>
	 __dump_stack lib/dump_stack.c:88 [inline]
	 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
	 print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
	 __lock_release kernel/locking/lockdep.c:5306 [inline]
	 lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
	 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
	 rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
	 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
	 sock_sendmsg_nosec net/socket.c:704 [inline]
	 sock_sendmsg+0xcf/0x120 net/socket.c:724
	 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
	 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
	 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
	 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
	 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

[Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]

Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NMarc Dionne <marc.dionne@auristor.com>
Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
cc: Hawkins Jiawei <yin31149@gmail.com>
cc: Khalid Masum <khalid.masum.92@gmail.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.ukSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

a3a196bb

ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter · 997c5686

由 Jacob Keller 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit c3a6e863d51bd7cacef6c68f92ccfd235c797428
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c3a6e863d51bd7cacef6c68f92ccfd235c797428

--------------------------------

[ Upstream commit 25d7a5f5 ]

The ixgbe_ptp_start_cyclecounter is intended to be called whenever the
cyclecounter parameters need to be changed.

Since commit a9763f3c ("ixgbe: Update PTP to support X550EM_x
devices"), this function has cleared the SYSTIME registers and reset the
TSAUXC DISABLE_SYSTIME bit.

While these need to be cleared during ixgbe_ptp_reset, it is wrong to clear
them during ixgbe_ptp_start_cyclecounter. This function may be called
during both reset and link status change. When link changes, the SYSTIME
counter is still operating normally, but the cyclecounter should be updated
to account for the possibly changed parameters.

Clearing SYSTIME when link changes causes the timecounter to jump because
the cycle counter now reads zero.

Extract the SYSTIME initialization out to a new function and call this
during ixgbe_ptp_reset. This prevents the timecounter adjustment and avoids
an unnecessary reset of the current time.

This also restores the original SYSTIME clearing that occurred during
ixgbe_ptp_reset before the commit above.
Reported-by: NSteve Payne <spayne@aurora.tech>
Reported-by: NIlya Evenbach <ievenbach@aurora.tech>
Fixes: a9763f3c ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

997c5686

net: Fix a data-race around sysctl_somaxconn. · 417c1f2e

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 23cf93bb32e571cf1fc678c83888df9950749d64
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=23cf93bb32e571cf1fc678c83888df9950749d64

--------------------------------

[ Upstream commit 3c9ba81d ]

While reading sysctl_somaxconn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

417c1f2e

net: Fix data-races around sysctl_devconf_inherit_init_net. · f086ac43

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 9fcc4f4066208b3383824ed8f0eba6bf47c23e87
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9fcc4f4066208b3383824ed8f0eba6bf47c23e87

--------------------------------

[ Upstream commit a5612ca1 ]

While reading sysctl_devconf_inherit_init_net, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 856c395c ("net: introduce a knob to control whether to inherit devconf config")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

f086ac43

net: Fix data-races around sysctl_fb_tunnels_only_for_init_net. · c0808a80

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 371a3bcf3144584c511f80e87d4c28ac2c75e9a7
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=371a3bcf3144584c511f80e87d4c28ac2c75e9a7

--------------------------------

[ Upstream commit af67508e ]

While reading sysctl_fb_tunnels_only_for_init_net, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 79134e6c ("net: do not create fallback tunnels for non-default namespaces")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

c0808a80

net: Fix a data-race around netdev_budget_usecs. · dbdc517e

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit c3bda708e9c40d0a88f3c86c3d9103db44b38c0b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c3bda708e9c40d0a88f3c86c3d9103db44b38c0b

--------------------------------

[ Upstream commit fa45d484 ]

While reading netdev_budget_usecs, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 7acf8a1e ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

dbdc517e

net: Fix a data-race around netdev_budget. · b60ebae6

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 12a34d7f0463ebb6db01977091b5cda46d9060bc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=12a34d7f0463ebb6db01977091b5cda46d9060bc

--------------------------------

[ Upstream commit 2e0c4237 ]

While reading netdev_budget, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 51b0bded ("[NET]: Separate two usages of netdev_max_backlog.")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

b60ebae6

net: Fix a data-race around sysctl_net_busy_read. · 8e6b092e

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 410c88314ce351c9c77ec68da1d37bd91ddd76a2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=410c88314ce351c9c77ec68da1d37bd91ddd76a2

--------------------------------

[ Upstream commit e59ef36f ]

While reading sysctl_net_busy_read, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 2d48d67f ("net: poll/select low latency socket support")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

8e6b092e

net: Fix a data-race around sysctl_net_busy_poll. · 13dde690

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 2c7dae6c45112ee7ead62155fed375cb2e7d7cf8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2c7dae6c45112ee7ead62155fed375cb2e7d7cf8

--------------------------------

[ Upstream commit c42b7cdd ]

While reading sysctl_net_busy_poll, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 06021292 ("net: add low latency socket poll")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

13dde690

net: Fix a data-race around sysctl_tstamp_allow_data. · f2250eda

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 8db070463e3ebedf81dfcc1d8bc51f46e751872f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8db070463e3ebedf81dfcc1d8bc51f46e751872f

--------------------------------

[ Upstream commit d2154b0a ]

While reading sysctl_tstamp_allow_data, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: b245be1f ("net-timestamp: no-payload only sysctl")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

f2250eda

net: Fix data-races around sysctl_optmem_max. · 294ba9b6

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit ed48223f87c5c5928b6bf56c02c75dda0a2ab0dd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ed48223f87c5c5928b6bf56c02c75dda0a2ab0dd

--------------------------------

[ Upstream commit 7de6d09f ]

While reading sysctl_optmem_max, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

294ba9b6

bpf: Folding omem_charge() into sk_storage_charge() · bc869a5f

由 Martin KaFai Lau 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 27e8ade792655177cc74417a6ded25735f3cacf0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=27e8ade792655177cc74417a6ded25735f3cacf0

--------------------------------

[ Upstream commit 9e838b02 ]

sk_storage_charge() is the only user of omem_charge().
This patch simplifies it by folding omem_charge() into
sk_storage_charge().
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NSong Liu <songliubraving@fb.com>
Acked-by: NKP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20201112211301.2586255-1-kafai@fb.comSigned-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

bc869a5f

ratelimit: Fix data-races in ___ratelimit(). · 59ecf05b

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 4d4e39245dd58bb36f57f9b48a142b20d291c605
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4d4e39245dd58bb36f57f9b48a142b20d291c605

--------------------------------

[ Upstream commit 6bae8ceb ]

While reading rs->interval and rs->burst, they can be changed
concurrently via sysctl (e.g. net_ratelimit_state).  Thus, we
need to add READ_ONCE() to their readers.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

59ecf05b

net: Fix data-races around netdev_tstamp_prequeue. · 4eb92568

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit e73009ebc1233e447ae6503da51c402e4ed7ca4b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e73009ebc1233e447ae6503da51c402e4ed7ca4b

--------------------------------

[ Upstream commit 61adf447 ]

While reading netdev_tstamp_prequeue, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 3b098e2d ("net: Consistent skb timestamping")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

4eb92568

net: Fix data-races around netdev_max_backlog. · 8684f7f7

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 3850060352f41366bdc25b091baeeb5c144b4a9e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3850060352f41366bdc25b091baeeb5c144b4a9e

--------------------------------

[ Upstream commit 5dcd08cd ]

While reading netdev_max_backlog, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

While at it, we remove the unnecessary spaces in the doc.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

8684f7f7

net: Fix data-races around weight_p and dev_weight_[rt]x_bias. · e6d7bbfb

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit b498a1b0171e7152ce5a837c7f74b4c1a296586f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b498a1b0171e7152ce5a837c7f74b4c1a296586f

--------------------------------

[ Upstream commit bf955b5a ]

While reading weight_p, it can be changed concurrently.  Thus, we need
to add READ_ONCE() to its reader.

Also, dev_[rt]x_weight can be read/written at the same time.  So, we
need to use READ_ONCE() and WRITE_ONCE() for its access.  Moreover, to
use the same weight_p while changing dev_[rt]x_weight, we add a mutex
in proc_do_dev_weight().

Fixes: 3d48b53f ("net: dev_weight: TX/RX orthogonality")
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

e6d7bbfb

net: Fix data-races around sysctl_[rw]mem_(max|default). · a41540be

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit fb442c72db385695684ef5a2df930326a2c4f1b3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fb442c72db385695684ef5a2df930326a2c4f1b3

--------------------------------

[ Upstream commit 1227c177 ]

While reading sysctl_[rw]mem_(max|default), they can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

a41540be

net: Fix data-races around sysctl_[rw]mem(_offset)?. · 05eb7170

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 613fd026209e6f6ee69afaf45d1bec2a11b09fea
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=613fd026209e6f6ee69afaf45d1bec2a11b09fea

--------------------------------

[ Upstream commit 02739545 ]

While reading these sysctl variables, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.

  - .sysctl_rmem
  - .sysctl_rwmem
  - .sysctl_rmem_offset
  - .sysctl_wmem_offset
  - sysctl_tcp_rmem[1, 2]
  - sysctl_tcp_wmem[1, 2]
  - sysctl_decnet_rmem[1]
  - sysctl_decnet_wmem[1]
  - sysctl_tipc_rmem[1]

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

05eb7170

tcp: tweak len/truesize ratio for coalesce candidates · 071b81ed

由 Eric Dumazet 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit e73a29554f0b7adfb9411a0587ac56703ed23e2c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e73a29554f0b7adfb9411a0587ac56703ed23e2c

--------------------------------

[ Upstream commit 240bfd13 ]

tcp_grow_window() is using skb->len/skb->truesize to increase tp->rcv_ssthresh
which has a direct impact on advertized window sizes.

We added TCP coalescing in linux-3.4 & linux-3.5:

Instead of storing skbs with one or two MSS in receive queue (or OFO queue),
we try to append segments together to reduce memory overhead.

High performance network drivers tend to cook skb with 3 parts :

1) sk_buff structure (256 bytes)
2) skb->head contains room to copy headers as needed, and skb_shared_info
3) page fragment(s) containing the ~1514 bytes frame (or more depending on MTU)

Once coalesced into a previous skb, 1) and 2) are freed.

We can therefore tweak the way we compute len/truesize ratio knowing
that skb->truesize is inflated by 1) and 2) soon to be freed.

This is done only for in-order skb, or skb coalesced into OFO queue.

The result is that low rate flows no longer pay the memory price of having
low GRO aggregation factor. Same result for drivers not using GRO.

This is critical to allow a big enough receiver window,
typically tcp_rmem[2] / 2.

We have been using this at Google for about 5 years, it is due time
to make it upstream.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

071b81ed

netfilter: nf_tables: disallow jump to implicit chain from set element · 0e69efe0

由 Pablo Neira Ayuso 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 6301a73bd83d94b9b3eab8581adb04e40fb5f079
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6301a73bd83d94b9b3eab8581adb04e40fb5f079

--------------------------------

[ Upstream commit f323ef3a ]

Extend struct nft_data_desc to add a flag field that specifies
nft_data_init() is being called for set element data.

Use it to disallow jump to implicit chain from set element, only jump
to chain via immediate expression is allowed.

Fixes: d0e2c7de ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

 Conflicts:
	net/netfilter/nf_tables_api.c
Reviewed-by: NWei Li <liwei391@huawei.com>

0e69efe0

netfilter: nf_tables: upfront validation of data via nft_data_init() · a19bd572

由 Pablo Neira Ayuso 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 98827687593b579f20feb60c7ce3ac8a6fbb5f2e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=98827687593b579f20feb60c7ce3ac8a6fbb5f2e

--------------------------------

[ Upstream commit 341b6941 ]

Instead of parsing the data and then validate that type and length are
correct, pass a description of the expected data so it can be validated
upfront before parsing it to bail out earlier.

This patch adds a new .size field to specify the maximum size of the
data area. The .len field is optional and it is used as an input/output
field, it provides the specific length of the expected data in the input
path. If then .len field is not specified, then obtained length from the
netlink attribute is stored. This is required by cmp, bitwise, range and
immediate, which provide no netlink attribute that describes the data
length. The immediate expression uses the destination register type to
infer the expected data type.

Relying on opencoded validation of the expected data might lead to
subtle bugs as described in 7e6bc1f6 ("netfilter: nf_tables:
stricter validation of element data").
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

a19bd572

netfilter: bitwise: improve error goto labels · 3cd80700

由 Jeremy Sowden 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 8790eecdea019fe9a94e69271dc1633dd26c9505
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8790eecdea019fe9a94e69271dc1633dd26c9505

--------------------------------

[ Upstream commit 00bd4352 ]

Replace two labels (`err1` and `err2`) with more informative ones.
Signed-off-by: NJeremy Sowden <jeremy@azazel.net>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

3cd80700

netfilter: nft_cmp: optimize comparison for 16-bytes · 94bc7bfd

由 Pablo Neira Ayuso 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 2267d38520c4aa74c9110c9ef2f6619aebc8446d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2267d38520c4aa74c9110c9ef2f6619aebc8446d

--------------------------------

[ Upstream commit 23f68d46 ]

Allow up to 16-byte comparisons with a new cmp fast version. Use two
64-bit words and calculate the mask representing the bits to be
compared. Make sure the comparison is 64-bit aligned and avoid
out-of-bound memory access on registers.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

94bc7bfd

netfilter: nf_tables: consolidate rule verdict trace call · cff1be88

由 Pablo Neira Ayuso 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 1d7d74a8240e5d4505f5f3f67417bb7334230c11
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1d7d74a8240e5d4505f5f3f67417bb7334230c11

--------------------------------

[ Upstream commit 4765473f ]

Add function to consolidate verdict tracing.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

cff1be88

netfilter: nftables: remove redundant assignment of variable err · 4734b465

由 Colin Ian King 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit cd962806c4493eec8a0da72d1df227dac28d3e15
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cd962806c4493eec8a0da72d1df227dac28d3e15

--------------------------------

[ Upstream commit 626899a0 ]

The variable err is being assigned a value that is never read,
the same error number is being returned at the error return
path via label err1.  Clean up the code by removing the assignment.

Addresses-Coverity: ("Unused value")
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

4734b465

netfilter: nft_tunnel: restrict it to netdev family · 26b01d4b

由 Pablo Neira Ayuso 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 35519ce7bac9138b00126817c7ddb8f4ebdbc066
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=35519ce7bac9138b00126817c7ddb8f4ebdbc066

--------------------------------

[ Upstream commit 01e4092d ]

Only allow to use this expression from NFPROTO_NETDEV family.

Fixes: af308b94 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

26b01d4b

netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families · cf7071b3

由 Pablo Neira Ayuso 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit 9a67c2c89c32bca7da6082c6b174457f8521121a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9a67c2c89c32bca7da6082c6b174457f8521121a

--------------------------------

[ Upstream commit 5f3b7aae ]

As it was originally intended, restrict extension to supported families.

Fixes: b96af92d ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

cf7071b3

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功