提交 · ac86f547ca1002aec2ef66b9e64d03f45bbbfbb9 · openeuler / Kernel

01 2月, 2023 4 次提交

mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath() · ac86f547

由 Kefeng Wang 提交于 1月 29, 2023

As commit 18365225 ("hwpoison, memcg: forcibly uncharge LRU pages"),
hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg
could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could
occurs a NULL pointer dereference, let's do not record the foreign
writebacks for folio memcg is null in mem_cgroup_track_foreign_dirty() to
fix it.

Link: https://lkml.kernel.org/r/20230129040945.180629-1-wangkefeng.wang@huawei.com
Fixes: 97b27821 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: NMa Wupeng <mawupeng1@huawei.com>
Tested-by: NMiko Larsson <mikoxyzzz@gmail.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Ma Wupeng <mawupeng1@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

ac86f547

highmem: round down the address passed to kunmap_flush_on_unmap() · 88d7b120

由 Matthew Wilcox (Oracle) 提交于 1月 26, 2023

We already round down the address in kunmap_local_indexed() which is the
other implementation of __kunmap_local().  The only implementation of
kunmap_flush_on_unmap() is PA-RISC which is expecting a page-aligned
address.  This may be causing PA-RISC to be flushing the wrong addresses
currently.

Link: https://lkml.kernel.org/r/20230126200727.1680362-1-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Fixes: 298fa1ad ("highmem: Provide generic variant of kmap_atomic*")
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

88d7b120

mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps · 3489dbb6

由 Mike Kravetz 提交于 1月 26, 2023

Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs".

This issue of mapcount in hugetlb pages referenced by shared PMDs was
discussed in [1].  The following two patches address user visible behavior
caused by this issue.

[1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/


This patch (of 2):

A hugetlb page will have a mapcount of 1 if mapped by multiple processes
via a shared PMD.  This is because only the first process increases the
map count, and subsequent processes just add the shared PMD page to their
page table.

page_mapcount is being used to decide if a hugetlb page is shared or
private in /proc/PID/smaps.  Pages referenced via a shared PMD were
incorrectly being counted as private.

To fix, check for a shared PMD if mapcount is 1.  If a shared PMD is found
count the hugetlb page as shared.  A new helper to check for a shared PMD
is added.

[akpm@linux-foundation.org: simplification, per David]
[akpm@linux-foundation.org: hugetlb.h: include page_ref.h for page_count()]
Link: https://lkml.kernel.org/r/20230126222721.222195-2-mike.kravetz@oracle.com
Fixes: 25ee01a2 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps")
Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
Acked-by: NPeter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

3489dbb6

Revert "mm: add nodes= arg to memory.reclaim" · 55ab834a

由 Michal Hocko 提交于 12月 16, 2022

This reverts commit 12a5d395.

Although it is recognized that a finer grained pro-active reclaim is
something we need and want the semantic of this implementation is really
ambiguous.

In a follow up discussion it became clear that there are two essential
usecases here.  One is to use memory.reclaim to pro-actively reclaim
memory and expectation is that the requested and reported amount of memory
is uncharged from the memcg.  Another usecase focuses on pro-active
demotion when the memory is merely shuffled around to demotion targets
while the overall charged memory stays unchanged.

The current implementation considers demoted pages as reclaimed and that
break both usecases.  [1] has tried to address the reporting part but
there are more issues with that summarized in [2] and follow up emails.

Let's revert the nodemask based extension of the memcg pro-active
reclaim for now until we settle with a more robust semantic.

[1] http://lkml.kernel.org/r/http://lkml.kernel.org/r/20221206023406.3182800-1-almasrymina@google.com
[2] http://lkml.kernel.org/r/Y5bsmpCyeryu3Zz1@dhcp22.suse.cz

Link: https://lkml.kernel.org/r/Y5xASNe1x8cusiTx@dhcp22.suse.cz
Fixes: 12a5d395 ("mm: add nodes= arg to memory.reclaim")
Signed-off-by: NMichal Hocko <mhocko@suse.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Wei Xu <weixugc@google.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: zefan li <lizefan.x@bytedance.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

55ab834a

14 1月, 2023 1 次提交

efi: tpm: Avoid READ_ONCE() for accessing the event log · d3f45053

由 Ard Biesheuvel 提交于 1月 09, 2023

Nathan reports that recent kernels built with LTO will crash when doing
EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
misaligned load from the TPM event log, which is annotated with
READ_ONCE(), and under LTO, this gets translated into a LDAR instruction
which does not tolerate misaligned accesses.

Interestingly, this does not happen when booting the same kernel
straight from the UEFI shell, and so the fact that the event log may
appear misaligned in memory may be caused by a bug in GRUB or SHIM.

However, using READ_ONCE() to access firmware tables is slightly unusual
in any case, and here, we only need to ensure that 'event' is not
dereferenced again after it gets unmapped, but this is already taken
care of by the implicit barrier() semantics of the early_memunmap()
call.

Cc: <stable@vger.kernel.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Reported-by: NNathan Chancellor <nathan@kernel.org>
Tested-by: NNathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1782Signed-off-by: NArd Biesheuvel <ardb@kernel.org>

d3f45053

13 1月, 2023 2 次提交

platform/x86: simatic-ipc: add another model · d348b1d7

由 Henning Schild 提交于 12月 22, 2022

Add IPC PX-39A support.
Signed-off-by: NHenning Schild <henning.schild@siemens.com>
Link: https://lore.kernel.org/r/20221222103720.8546-3-henning.schild@siemens.comReviewed-by: NHans de Goede <hdegoede@redhat.com>
Signed-off-by: NHans de Goede <hdegoede@redhat.com>

d348b1d7

platform/x86: simatic-ipc: correct name of a model · ed058eab

由 Henning Schild 提交于 12月 22, 2022

What we called IPC427G should be renamed to BX-39A to be more in line
with the actual product name.
Signed-off-by: NHenning Schild <henning.schild@siemens.com>
Link: https://lore.kernel.org/r/20221222103720.8546-2-henning.schild@siemens.comReviewed-by: NHans de Goede <hdegoede@redhat.com>
Signed-off-by: NHans de Goede <hdegoede@redhat.com>

ed058eab

12 1月, 2023 3 次提交

mm: update mmap_sem comments to refer to mmap_lock · 8651a137

由 Lorenzo Stoakes 提交于 1月 07, 2023

The rename from mm->mmap_sem to mm->mmap_lock was performed in commit
da1c55f1 ("mmap locking API: rename mmap_sem to mmap_lock") and commit
c1e8d7c6 ("map locking API: convert mmap_sem comments"), however some
incorrect comments remain.

This patch simply corrects those comments which are obviously incorrect
within mm itself.

Link: https://lkml.kernel.org/r/33fba04389ab63fc4980e7ba5442f521df6dc657.1673048927.git.lstoakes@gmail.comSigned-off-by: NLorenzo Stoakes <lstoakes@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

8651a137

include/linux/mm: fix release_pages_arg kernel doc comment · 0411d6ee

由 SeongJae Park 提交于 1月 06, 2023

Commit 449c7967 ("mm: teach release_pages() to take an array of
encoded page pointers too") added the kernel doc comment for
release_pages() on top of 'union release_pages_arg', so making 'make
htmldocs' complains as below:

    ./include/linux/mm.h:1268: warning: cannot understand function prototype: 'typedef union '

The kernel doc comment for the function is already on top of the
function's definition in mm/swap.c, and the new comment is actually not
for the function but indeed release_pages_arg.  Fixing the comment to
reflect the intent would be one option.  But, kernel doc cannot parse
the union as below due to the attribute.

    ./include/linux/mm.h:1272: error: Cannot parse struct or union!

Modify the comment to reflect the intent but do not mark it as a kernel
doc comment.

Link: https://lkml.kernel.org/r/20230106203331.127532-1-sj@kernel.org
Fixes: 449c7967 ("mm: teach release_pages() to take an array of encoded page pointers too")
Signed-off-by: NSeongJae Park <sj@kernel.org>
Acked-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

0411d6ee

mm: fix vma->anon_name memory leak for anonymous shmem VMAs · a1193de5

由 Suren Baghdasaryan 提交于 1月 04, 2023

free_anon_vma_name() is missing a check for anonymous shmem VMA which
leads to a memory leak due to refcount not being dropped.  Fix this by
calling anon_vma_name_put() unconditionally.  It will free vma->anon_name
whenever it's non-NULL.

Link: https://lkml.kernel.org/r/20230105000241.1450843-1-surenb@google.com
Fixes: d09e8ca6 ("mm: anonymous shared memory naming")
Signed-off-by: NSuren Baghdasaryan <surenb@google.com>
Suggested-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reported-by: syzbot+91edf9178386a07d06a7@syzkaller.appspotmail.com
Cc: Hugh Dickins <hughd@google.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

a1193de5

11 1月, 2023 1 次提交

ACPI: Fix selecting wrong ACPI fwnode for the iGPU on some Dell laptops · f64e4275

由 Hans de Goede 提交于 1月 10, 2023

The Dell Latitude E6430 both with and without the optional NVidia dGPU
has a bug in its ACPI tables which is causing Linux to assign the wrong
ACPI fwnode / companion to the pci_device for the i915 iGPU.

Specifically under the PCI root bridge there are these 2 ACPI Device()s :

 Scope (_SB.PCI0)
 {
     Device (GFX0)
     {
         Name (_ADR, 0x00020000)  // _ADR: Address
     }

     ...

     Device (VID)
     {
         Name (_ADR, 0x00020000)  // _ADR: Address
         ...

         Method (_DOS, 1, NotSerialized)  // _DOS: Disable Output Switching
         {
             VDP8 = Arg0
             VDP1 (One, VDP8)
         }

         Method (_DOD, 0, NotSerialized)  // _DOD: Display Output Devices
         {
             ...
         }
         ...
     }
 }

The non-functional GFX0 ACPI device is a problem, because this gets
returned as ACPI companion-device by acpi_find_child_device() for the iGPU.

This is a long standing problem and the i915 driver does use the ACPI
companion for some things, but works fine without it.

However since commit 63f534b8 ("ACPI: PCI: Rework acpi_get_pci_dev()")
acpi_get_pci_dev() relies on the physical-node pointer in the acpi_device
and that is set on the wrong acpi_device because of the wrong
acpi_find_child_device() return. This breaks the ACPI video code,
leading to non working backlight control in some cases.

Add a type.backlight flag, mark ACPI video bus devices with this and make
find_child_checks() return a higher score for children with this flag set,
so that it picks the right companion-device.

Fixes: 63f534b8 ("ACPI: PCI: Rework acpi_get_pci_dev()")
Co-developed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Cc: 6.1+ <stable@vger.kernel.org> # 6.1+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

f64e4275

10 1月, 2023 1 次提交

net/mlx5: Fix command stats access after free · da2e552b

由 Moshe Shemesh 提交于 11月 28, 2022

Command may fail while driver is reloading and can't accept FW commands
till command interface is reinitialized. Such command failure is being
logged to command stats. This results in NULL pointer access as command
stats structure is being freed and reallocated during mlx5 devlink
reload (see kernel log below).

Fix it by making command stats statically allocated on driver probe.

Kernel log:
[ 2394.808802] BUG: unable to handle kernel paging request at 000000000002a9c0
[ 2394.810610] PGD 0 P4D 0
[ 2394.811811] Oops: 0002 [#1] SMP NOPTI
...
[ 2394.815482] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0
...
[ 2394.829505] Call Trace:
[ 2394.830667]  _raw_spin_lock_irq+0x23/0x26
[ 2394.831858]  cmd_status_err+0x55/0x110 [mlx5_core]
[ 2394.833020]  mlx5_access_reg+0xe7/0x150 [mlx5_core]
[ 2394.834175]  mlx5_query_port_ptys+0x78/0xa0 [mlx5_core]
[ 2394.835337]  mlx5e_ethtool_get_link_ksettings+0x74/0x590 [mlx5_core]
[ 2394.836454]  ? kmem_cache_alloc_trace+0x140/0x1c0
[ 2394.837562]  __rh_call_get_link_ksettings+0x33/0x100
[ 2394.838663]  ? __rtnl_unlock+0x25/0x50
[ 2394.839755]  __ethtool_get_link_ksettings+0x72/0x150
[ 2394.840862]  duplex_show+0x6e/0xc0
[ 2394.841963]  dev_attr_show+0x1c/0x40
[ 2394.843048]  sysfs_kf_seq_show+0x9b/0x100
[ 2394.844123]  seq_read+0x153/0x410
[ 2394.845187]  vfs_read+0x91/0x140
[ 2394.846226]  ksys_read+0x4f/0xb0
[ 2394.847234]  do_syscall_64+0x5b/0x1a0
[ 2394.848228]  entry_SYSCALL_64_after_hwframe+0x65/0xca

Fixes: 34f46ae0 ("net/mlx5: Add command failures data to debugfs")
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Reviewed-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

da2e552b

07 1月, 2023 1 次提交

firmware/psci: Fix MEM_PROTECT_RANGE function numbers · f3dc61cd

由 Will Deacon 提交于 11月 25, 2022

PSCI v1.1 offers 32-bit and 64-bit variants of the MEM_PROTECT_RANGE
call using function identifier 20.

Fix the incorrect definitions of the MEM_PROTECT_CHECK_RANGE calls in
the PSCI UAPI header.

Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Fixes: 3137f2e6 ("firmware/psci: Add debugfs support to ease debugging")
Acked-by: NMarc Zyngier <maz@kernel.org>
Acked-by: NMark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20221125101826.22404-1-will@kernel.orgSigned-off-by: NWill Deacon <will@kernel.org>

f3dc61cd

06 1月, 2023 9 次提交

rxrpc: Move client call connection to the I/O thread · 9d35d880

由 David Howells 提交于 10月 19, 2022

Move the connection setup of client calls to the I/O thread so that a whole
load of locking and barrierage can be eliminated.  This necessitates the
app thread waiting for connection to complete before it can begin
encrypting data.

This also completes the fix for a race that exists between call connection
and call disconnection whereby the data transmission code adds the call to
the peer error distribution list after the call has been disconnected (say
by the rxrpc socket getting closed).

The fix is to complete the process of moving call connection, data
transmission and call disconnection into the I/O thread and thus forcibly
serialising them.

Note that the issue may predate the overhaul to an I/O thread model that
were included in the merge window for v6.2, but the timing is very much
changed by the change given below.

Fixes: cf37b598 ("rxrpc: Move DATA transmission into call processor work item")
Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

9d35d880

rxrpc: Set up a connection bundle from a call, not rxrpc_conn_parameters · 1bab27af

由 David Howells 提交于 10月 21, 2022

Use the information now stored in struct rxrpc_call to configure the
connection bundle and thence the connection, rather than using the
rxrpc_conn_parameters struct.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

1bab27af

rxrpc: Offload the completion of service conn security to the I/O thread · 2953d3b8

由 David Howells 提交于 10月 21, 2022

Offload the completion of the challenge/response cycle on a service
connection to the I/O thread.  After the RESPONSE packet has been
successfully decrypted and verified by the work queue, offloading the
changing of the call states to the I/O thread makes iteration over the
conn's channel list simpler.

Do this by marking the RESPONSE skbuff and putting it onto the receive
queue for the I/O thread to collect.  We put it on the front of the queue
as we've already received the packet for it.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

2953d3b8

rxrpc: Tidy up abort generation infrastructure · 57af281e

由 David Howells 提交于 10月 06, 2022

Tidy up the abort generation infrastructure in the following ways:

 (1) Create an enum and string mapping table to list the reasons an abort
     might be generated in tracing.

 (2) Replace the 3-char string with the values from (1) in the places that
     use that to log the abort source.  This gets rid of a memcpy() in the
     tracepoint.

 (3) Subsume the rxrpc_rx_eproto tracepoint with the rxrpc_abort tracepoint
     and use values from (1) to indicate the trace reason.

 (4) Always make a call to an abort function at the point of the abort
     rather than stashing the values into variables and using goto to get
     to a place where it reported.  The C optimiser will collapse the calls
     together as appropriate.  The abort functions return a value that can
     be returned directly if appropriate.

Note that this extends into afs also at the points where that generates an
abort.  To aid with this, the afs sources need to #define
RXRPC_TRACE_ONLY_DEFINE_ENUMS before including the rxrpc tracing header
because they don't have access to the rxrpc internal structures that some
of the tracepoints make use of.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

57af281e

rxrpc: Clean up connection abort · a00ce28b

由 David Howells 提交于 10月 20, 2022

Clean up connection abort, using the connection state_lock to gate access
to change that state, and use an rxrpc_call_completion value to indicate
the difference between local and remote aborts as these can be pasted
directly into the call state.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

a00ce28b

rxrpc: Implement a mechanism to send an event notification to a connection · f2cce89a

由 David Howells 提交于 10月 20, 2022

Provide a means by which an event notification can be sent to a connection
through such that the I/O thread can pick it up and handle it rather than
doing it in a separate workqueue.

This is then used to move the deferred final ACK of a call into the I/O
thread rather than a separate work queue as part of the drive to do all
transmission from the I/O thread.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

f2cce89a

rxrpc: Only disconnect calls in the I/O thread · 03fc55ad

由 David Howells 提交于 10月 12, 2022

Only perform call disconnection in the I/O thread to reduce the locking
requirement.

This is the first part of a fix for a race that exists between call
connection and call disconnection whereby the data transmission code adds
the call to the peer error distribution list after the call has been
disconnected (say by the rxrpc socket getting closed).

The fix is to complete the process of moving call connection, data
transmission and call disconnection into the I/O thread and thus forcibly
serialising them.

Note that the issue may predate the overhaul to an I/O thread model that
were included in the merge window for v6.2, but the timing is very much
changed by the change given below.

Fixes: cf37b598 ("rxrpc: Move DATA transmission into call processor work item")
Reported-by: syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

03fc55ad

rxrpc: Only set/transmit aborts in the I/O thread · a343b174

由 David Howells 提交于 10月 12, 2022

Only set the abort call completion state in the I/O thread and only
transmit ABORT packets from there.  rxrpc_abort_call() can then be made to
actually send the packet.

Further, ABORT packets should only be sent if the call has been exposed to
the network (ie. at least one attempted DATA transmission has occurred for
it).
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

a343b174

rxrpc: Make the local endpoint hold a ref on a connected call · 5040011d

由 David Howells 提交于 11月 02, 2022

Make the local endpoint and it's I/O thread hold a reference on a connected
call until that call is disconnected.  Without this, we're reliant on
either the AF_RXRPC socket to hold a ref (which is dropped when the call is
released) or a queued work item to hold a ref (the work item is being
replaced with the I/O thread).
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

5040011d

05 1月, 2023 5 次提交

elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size} · 19e183b5

由 Catalin Marinas 提交于 12月 22, 2022

A subsequent fix for arm64 will use this parameter to parse the vma
information from the snapshot created by dump_vma_snapshot() rather than
traversing the vma list without the mmap_lock.

Fixes: 6dd8b1a0 ("arm64: mte: Dump the MTE tags in the core file")
Cc: <stable@vger.kernel.org> # 5.18.x
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Reported-by: NSeth Jenkins <sethjenkins@google.com>
Suggested-by: NSeth Jenkins <sethjenkins@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221222181251.1345752-3-catalin.marinas@arm.comSigned-off-by: NWill Deacon <will@kernel.org>

19e183b5

Revert "pktcdvd: remove driver." · 4b83e99e

由 Jens Axboe 提交于 1月 04, 2023

This reverts commit f40eb998.

There are apparently still users out there of this driver. While we'd
love to remove it to ease the maintenance burden, let's reinstate it
for now until better (userspace) solutions can be developed.

Link: https://lore.kernel.org/lkml/20230104190115.ceglfefco475ev6c@pali/Reported-by: NPali Rohár <pali@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4b83e99e

Revert "block: remove devnode callback from struct block_device_operations" · 050a4f34

由 Jens Axboe 提交于 1月 04, 2023

This reverts commit 85d6ce58.

We're reinstating the pktcdvd driver, which needs this API.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

050a4f34

Revert "block: bio_copy_data_iter" · ee4b4e22

由 Jens Axboe 提交于 1月 04, 2023

This reverts commit db1c7d77.

We're reinstating the pktcdvd driver, which needs this API.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ee4b4e22

io_uring: move 'poll_multi_queue' bool in io_ring_ctx · 59b745bb

由 Jens Axboe 提交于 1月 04, 2023

The cacheline section holding this variable has two gaps, where one is
caused by this bool not packing well with structs. This causes it to
blow into the next cacheline. Move the variable, shrinking io_ring_ctx
by a full cacheline in size.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

59b745bb

02 1月, 2023 3 次提交

netfilter: ipset: Rework long task execution when adding/deleting entries · 5e29dc36

由 Jozsef Kadlecsik 提交于 12月 30, 2022

When adding/deleting large number of elements in one step in ipset, it can
take a reasonable amount of time and can result in soft lockup errors. The
patch 5f7b51bf ("netfilter: ipset: Limit the maximal range of
consecutive elements to add/delete") tried to fix it by limiting the max
elements to process at all. However it was not enough, it is still possible
that we get hung tasks. Lowering the limit is not reasonable, so the
approach in this patch is as follows: rely on the method used at resizing
sets and save the state when we reach a smaller internal batch limit,
unlock/lock and proceed from the saved state. Thus we can avoid long
continuous tasks and at the same time removed the limit to add/delete large
number of elements in one step.

The nfnl mutex is held during the whole operation which prevents one to
issue other ipset commands in parallel.

Fixes: 5f7b51bf ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
Reported-by: syzbot+9204e7399656300bf271@syzkaller.appspotmail.com
Signed-off-by: NJozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

5e29dc36

ceph: avoid use-after-free in ceph_fl_release_lock() · 8e185871

由 Xiubo Li 提交于 11月 17, 2022

When ceph releasing the file_lock it will try to get the inode pointer
from the fl->fl_file, which the memory could already be released by
another thread in filp_close(). Because in VFS layer the fl->fl_file
doesn't increase the file's reference counter.

Will switch to use ceph dedicate lock info to track the inode.

And in ceph_fl_release_lock() we should skip all the operations if the
fl->fl_u.ceph.inode is not set, which should come from the request
file_lock. And we will set fl->fl_u.ceph.inode when inserting it to the
inode lock list, which is when copying the lock.

Link: https://tracker.ceph.com/issues/57986Signed-off-by: NXiubo Li <xiubli@redhat.com>
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: NIlya Dryomov <idryomov@gmail.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

8e185871

mtd: cfi: allow building spi-intel standalone · d19ab1f7

由 Arnd Bergmann 提交于 12月 20, 2022

When MTD or MTD_CFI_GEOMETRY is disabled, the spi-intel driver
fails to build, as it includes the shared CFI header:

include/linux/mtd/cfi.h:62:2: error: #warning No CONFIG_MTD_CFI_Ix selected. No NOR chip support can work. [-Werror=cpp]
62 | #warning No CONFIG_MTD_CFI_Ix selected. No NOR chip support can work.

linux/mtd/spi-nor.h does not actually need to include cfi.h, so
remove the inclusion here to fix the warning. This uncovers a
missing #include in spi-nor/core.c so add that there to
prevent a different build issue.

Fixes: e23e5a05 ("mtd: spi-nor: intel-spi: Convert to SPI MEM")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: NTokunori Ikegami <ikegami.t@gmail.com>
Acked-by: NPratyush Yadav <pratyush@kernel.org>
Reviewed-by: NTudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: NMiquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20221220141352.1486360-1-arnd@kernel.org

d19ab1f7

01 1月, 2023 2 次提交

net: phy: Update documentation for get_rate_matching · 6d4cfcf9

由 Sean Anderson 提交于 12月 29, 2022

Now that phylink no longer calls phy_get_rate_matching with
PHY_INTERFACE_MODE_NA, phys no longer need to support it. Remove the
documentation mandating support.

Fixes: 7642cc28 ("net: phylink: fix PHY validation with rate adaption")
Signed-off-by: NSean Anderson <sean.anderson@seco.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d4cfcf9

net: dsa: tag_qca: fix wrong MGMT_DATA2 size · d9dba91b

由 Christian Marangi 提交于 12月 29, 2022

It was discovered that MGMT_DATA2 can contain up to 28 bytes of data
instead of the 12 bytes written in the Documentation by accounting the
limit of 16 bytes declared in Documentation subtracting the first 4 byte
in the packet header.

Update the define with the real world value.
Tested-by: NRonald Wahl <ronald.wahl@raritan.com>
Fixes: c2ee8181 ("net: dsa: tag_qca: add define for handling mgmt Ethernet packet")
Signed-off-by: NChristian Marangi <ansuelsmth@gmail.com>
Cc: stable@vger.kernel.org # v5.18+
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d9dba91b

30 12月, 2022 2 次提交

arch: fix broken BuildID for arm64 and riscv · 99cb0d91

由 Masahiro Yamada 提交于 12月 27, 2022

Dennis Gilmore reports that the BuildID is missing in the arm64 vmlinux
since commit 994b7ac1 ("arm64: remove special treatment for the
link order of head.o").

The issue is that the type of .notes section, which contains the BuildID,
changed from NOTES to PROGBITS.

Ard Biesheuvel figured out that whichever object gets linked first gets
to decide the type of a section. The PROGBITS type is the result of the
compiler emitting .note.GNU-stack as PROGBITS rather than NOTE.

While Ard provided a fix for arm64, I want to fix this globally because
the same issue is happening on riscv since commit 2348e6bf ("riscv:
remove special treatment for the link order of head.o"). This problem
will happen in general for other architectures if they start to drop
unneeded entries from scripts/head-object-list.txt.

Discard .note.GNU-stack in include/asm-generic/vmlinux.lds.h.

Link: https://lore.kernel.org/lkml/CAABkxwuQoz1CTbyb57n0ZX65eSYiTonFCU8-LCQc=74D=xE=rA@mail.gmail.com/
Fixes: 994b7ac1 ("arm64: remove special treatment for the link order of head.o")
Fixes: 2348e6bf ("riscv: remove special treatment for the link order of head.o")
Reported-by: NDennis Gilmore <dennis@ausil.us>
Suggested-by: NArd Biesheuvel <ardb@kernel.org>
Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
Acked-by: NPalmer Dabbelt <palmer@rivosinc.com>

99cb0d91

tcp: Add TIME_WAIT sockets in bhash2. · 936a192f

由 Kuniyuki Iwashima 提交于 12月 26, 2022

Jiri Slaby reported regression of bind() with a simple repro. [0]

The repro creates a TIME_WAIT socket and tries to bind() a new socket
with the same local address and port.  Before commit 28044fc1 ("net:
Add a bhash2 table hashed by port and address"), the bind() failed with
-EADDRINUSE, but now it succeeds.

The cited commit should have put TIME_WAIT sockets into bhash2; otherwise,
inet_bhash2_conflict() misses TIME_WAIT sockets when validating bind()
requests if the address is not a wildcard one.

The straight option is to move sk_bind2_node from struct sock to struct
sock_common to add twsk to bhash2 as implemented as RFC. [1]  However, the
binary layout change in the struct sock could affect performances moving
hot fields on different cachelines.

To avoid that, we add another TIME_WAIT list in inet_bind2_bucket and check
it while validating bind().

[0]: https://lore.kernel.org/netdev/6b971a4e-c7d8-411e-1f92-fda29b5b2fb9@kernel.org/
[1]: https://lore.kernel.org/netdev/20221221151258.25748-2-kuniyu@amazon.com/

Fixes: 28044fc1 ("net: Add a bhash2 table hashed by port and address")
Reported-by: NJiri Slaby <jirislaby@kernel.org>
Suggested-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: NJoanne Koong <joannelkoong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

936a192f

29 12月, 2022 3 次提交

net/mlx5: E-Switch, properly handle ingress tagged packets on VST · 1f0ae22a

由 Moshe Shemesh 提交于 12月 12, 2022

Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already
present in the frame. Previous VST mode behavior was to drop packets or
override existing tag, depending on the device version.

In this patch we fix this behavior by correctly building the HW steering
rule with a push vlan action, or for older devices we ask the FW to stack
the vlan when a vlan is already present.

Fixes: 07bab950 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: dfcb1ed3 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode")
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

1f0ae22a

nvme: consult the CSE log page for unprivileged passthrough · 6f99ac04

由 Christoph Hellwig 提交于 12月 13, 2022

Commands like Write Zeros can change the contents of a namespaces without
actually transferring data.  To protect against this, check the Commands
Supported and Effects log is supported by the controller for any
unprivileg command passthrough and refuse unprivileged passthrough if the
command has any effects that can change data or metadata.

Note: While the Commands Support and Effects log page has only been
mandatory since NVMe 2.0, it is widely supported because Windows requires
it for any command passthrough from userspace.

Fixes: e4fbcf32 ("nvme: identify-namespace without CAP_SYS_ADMIN")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NKanchan Joshi <joshi.k@samsung.com>

6f99ac04

nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition · 685e6311

由 Christoph Hellwig 提交于 12月 21, 2022

3 << 16 does not generate the correct mask for bits 16, 17 and 18.
Use the GENMASK macro to generate the correct mask instead.

Fixes: 84fef62d ("nvme: check admin passthru command effects")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NKanchan Joshi <joshi.k@samsung.com>

685e6311

28 12月, 2022 3 次提交

net/sched: fix retpoline wrapper compilation on configs without tc filters · 40cab44b

由 Pedro Tammela 提交于 12月 27, 2022

Rudi reports a compilation failure on x86_64 when CONFIG_NET_CLS or
CONFIG_NET_CLS_ACT is not set but CONFIG_RETPOLINE is set.
A misplaced '#endif' was causing the issue.

Fixes: 7f0e8102 ("net/sched: add retpoline wrapper for tc")
Tested-by: NRudi Heitbaum <rudi@heitbaum.com>
Signed-off-by: NPedro Tammela <pctammela@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

40cab44b

vdpa: merge functionally duplicated dev_features attributes · b9e05399

由 Si-Wei Liu 提交于 10月 10, 2022

We can merge VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES with
VDPA_ATTR_DEV_FEATURES which is functionally equivalent.
While at it, tweak the comment in header file to make
user provioned device features distinguished from those
supported by the parent mgmtdev device: the former of
which can be inherited as a whole from the latter, or
can be a subset of the latter if explicitly specified.
Signed-off-by: NSi-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <1665422823-18364-1-git-send-email-si-wei.liu@oracle.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>

b9e05399

rxrpc: Fix a couple of potential use-after-frees · 0e50d999

由 David Howells 提交于 12月 24, 2022

At the end of rxrpc_recvmsg(), if a call is found, the call is put and then
a trace line is emitted referencing that call in a couple of places - but
the call may have been deallocated by the time those traces happen.

Fix this by stashing the call debug_id in a variable and passing that to
the tracepoint rather than the call pointer.

Fixes: 84997905 ("rxrpc: Add a tracepoint to follow what recvmsg does")
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e50d999

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功