提交 · b84ba30b6c7a75babdf73b83bc3c7b59b944501a · openeuler / Kernel

29 11月, 2021 17 次提交

block: remove the gendisk argument to blk_execute_rq · b84ba30b

由 Christoph Hellwig 提交于 11月 26, 2021

Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait
given that it is unused now. Also convert the boolean at_head parameter
to actually use the bool type while touching the prototype.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

b84ba30b

block: remove the ->rq_disk field in struct request · f3fa33ac

由 Christoph Hellwig 提交于 11月 26, 2021

Just use the disk attached to the request_queue instead.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

f3fa33ac

block: remove get_io_context_active · 50569c24

由 Christoph Hellwig 提交于 11月 26, 2021

Fold it into it's only caller, and remove a lof of the debug checks
that are not needed.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-10-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

50569c24

block: mark put_io_context_active static · 33047425

由 Christoph Hellwig 提交于 11月 26, 2021

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

33047425

fork: move copy_io to block/blk-ioc.c · 88c9a2ce

由 Christoph Hellwig 提交于 11月 26, 2021

Move the copying of the I/O context to the block layer as that is where
we can use the proper low-level interfaces.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

88c9a2ce

blk-mq: Add blk_mq_complete_request_direct() · e8dc17e2

由 Sebastian Andrzej Siewior 提交于 10月 25, 2021

Add blk_mq_complete_request_direct() which completes the block request
directly instead deferring it to softirq for single queue devices.

This is useful for devices which complete the requests in preemptible
context and raising softirq from means scheduling ksoftirqd.
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211025070658.1565848-2-bigeasy@linutronix.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

e8dc17e2

blk-crypto: remove blk_crypto_unregister() · 72cd9df2

由 Eric Biggers 提交于 11月 23, 2021

This function is trivial and is only used in one place. Having this
function is misleading because it implies that blk_crypto_register()
needs to be paired with blk_crypto_unregister(), which is not the case.
Just set disk->queue->crypto_profile to NULL directly.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211124013733.347612-1-ebiggers@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

72cd9df2

block: only allocate poll_stats if there's a user of them · 48b5c1fb

由 Jens Axboe 提交于 11月 13, 2021

This is essentially never used, yet it's about 1/3rd of the total
queue size. Allocate it when needed, and don't embed it in the queue.

Kill the queue flag for this while at it, since we can just check the
assigned pointer now.
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

48b5c1fb

block: cleanup the GENHD_FL_* definitions · 430cc5d3

由 Christoph Hellwig 提交于 11月 22, 2021

Switch to an enum and tidy up the documentation.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-14-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

430cc5d3

block: remove GENHD_FL_EXT_DEVT · 1ebe2e5f

由 Christoph Hellwig 提交于 11月 22, 2021

All modern drivers can support extra partitions using the extended
dev_t.  In fact except for the ioctl method drivers never even see
partitions in normal operation.

So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all
block devices that do support partitions, and require those that
do not support partitions to explicit disallow them using
GENHD_FL_NO_PART.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

1ebe2e5f

block: remove GENHD_FL_SUPPRESS_PARTITION_INFO · 3b5149ac

由 Christoph Hellwig 提交于 11月 22, 2021

This flag is not set directly anywhere and only inherited from
GENHD_FL_HIDDEN. Just check for GENHD_FL_HIDDEN instead.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-11-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

3b5149ac

block: rename GENHD_FL_NO_PART_SCAN to GENHD_FL_NO_PART · 46e7eac6

由 Christoph Hellwig 提交于 11月 22, 2021

The GENHD_FL_NO_PART_SCAN controls more than just partitions canning,
so rename it to GENHD_FL_NO_PART.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20211122130625.1136848-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

46e7eac6

block: remove GENHD_FL_CD · 1a827ce1

由 Christoph Hellwig 提交于 11月 22, 2021

GENHD_FL_CD marks a gendisk as a vaguely CD-ROM like device.
Besides being used internally inside of sunvdc.c an xen-blkfront it
is used by xen-blkback as a hint to claim a device exported to a
guest is a CD-ROM like device. Just check for disk->cdi instead
which is the right indicator for "real" CD-ROM or DVD drivers. This
will miss the paravirtualized guest drivers, but those make little
sense to report anyway.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

1a827ce1

block: move GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE to disk->event_flags · 1545e0b4

由 Christoph Hellwig 提交于 11月 22, 2021

GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE is all about the event reporting
mechanism, so move it to the event_flags field.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

1545e0b4

block: move GENHD_FL_NATIVE_CAPACITY to disk->state · 86416916

由 Christoph Hellwig 提交于 11月 22, 2021

The flag to indicate an unlocked native capacity is dynamic state,
not a driver capability flag, so move it to disk->state.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

86416916

block: remove rq_flush_dcache_pages · 786d4e01

由 Christoph Hellwig 提交于 11月 17, 2021

This function is trivial, and flush_dcache_page is always defined, so
just open code it in the 2.5 callers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

786d4e01

block: move blk_rq_err_bytes to scsi · 79478bf9

由 Christoph Hellwig 提交于 11月 17, 2021

blk_rq_err_bytes is only used by the scsi midlayer, so move it there.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

79478bf9

27 11月, 2021 2 次提交

net: ptp: add a definition for the UDP port for IEEE 1588 general messages · ec15baec

由 Vladimir Oltean 提交于 11月 26, 2021

As opposed to event messages (Sync, PdelayReq etc) which require
timestamping, general messages (Announce, FollowUp etc) do not.
In PTP they are part of different streams of data.

IEEE 1588-2008 Annex D.2 "UDP port numbers" states that the UDP
destination port assigned by IANA is 319 for event messages, and 320 for
general messages. Yet the kernel seems to be missing the definition for
general messages. This patch adds it.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

ec15baec

net: mscc: ocelot: create a function that replaces an existing VCAP filter · 95706be1

由 Vladimir Oltean 提交于 11月 26, 2021

VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind
tc flower offload on ocelot, among other things. The ingress port mask
on which VCAP rules match is present as a bit field in the actual key of
the rule. This means that it is possible for a rule to be shared among
multiple source ports. When the rule is added one by one on each desired
port, that the ingress port mask of the key must be edited and rewritten
to hardware.

But the API in ocelot_vcap.c does not allow for this. For one thing,
ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric,
because ocelot_vcap_filter_add() works with a preallocated and
prepopulated filter and programs it to hardware, and
ocelot_vcap_filter_del() does both the job of removing the specified
filter from hardware, as well as kfreeing it. That is to say, the only
option of editing a filter in place, which is to delete it, modify the
structure and add it back, does not work because it results in
use-after-free.

This patch introduces ocelot_vcap_filter_replace, which trivially
reprograms a VCAP entry to hardware, at the exact same index at which it
existed before, without modifying any list or allocating any memory.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

95706be1

25 11月, 2021 1 次提交

Revert "virtio_ring: validate used buffer length" · f124034f

由 Michael S. Tsirkin 提交于 11月 24, 2021

This reverts commit 939779f5.

Attempts to validate length in the core did not work out: there turn out
to exist multiple broken devices, and in particular legacy devices are
known to be broken in this respect.

We have ideas for handling this better in the next version but for now
let's revert to a known good state to make sure drivers work for people.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

f124034f

24 11月, 2021 2 次提交

xen: add "not_essential" flag to struct xenbus_driver · 37a72b08

由 Juergen Gross 提交于 10月 22, 2021

When booting the xenbus driver will wait for PV devices to have
connected to their backends before continuing. The timeout is different
between essential and non-essential devices.

Non-essential devices are identified by their nodenames directly in the
xenbus driver, which requires to update this list in case a new device
type being non-essential is added (this was missed for several types
in the past).

In order to avoid this problem, add a "not_essential" flag to struct
xenbus_driver which can be set to "true" by the respective frontend.

Set this flag for the frontends currently regarded to be not essential
(vkbs and vfb) and use it for testing in the xenbus driver.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20211022064800.14978-2-jgross@suse.comReviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

37a72b08

ACPI: Make acpi_node_get_parent() local · 985e9ece

由 Sakari Ailus 提交于 11月 17, 2021

acpi_node_get_parent() isn't used outside drivers/acpi/property.c.

Make it local.
Signed-off-by: NSakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

985e9ece

22 11月, 2021 1 次提交

net: ipv6: add fib6_nh_release_dsts stub · 8837cbbf

由 Nikolay Aleksandrov 提交于 11月 22, 2021

We need a way to release a fib6_nh's per-cpu dsts when replacing
nexthops otherwise we can end up with stale per-cpu dsts which hold net
device references, so add a new IPv6 stub called fib6_nh_release_dsts.
It must be used after an RCU grace period, so no new dsts can be created
through a group's nexthop entry.
Similar to fib6_nh_release it shouldn't be used if fib6_nh_init has failed
so it doesn't need a dummy stub when IPv6 is not enabled.

Fixes: 7bf4796d ("nexthops: add support for replace")
Signed-off-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8837cbbf

21 11月, 2021 2 次提交

hugetlb: fix hugetlb cgroup refcounting during mremap · afe041c2

由 Bui Quang Minh 提交于 11月 19, 2021

When hugetlb_vm_op_open() is called during copy_vma(), we may take the
reference to resv_map->css.  Later, when clearing the reservation
pointer of old_vma after transferring it to new_vma, we forget to drop
the reference to resv_map->css.  This leads to a reference leak of css.

Fixes this by adding a check to drop reservation css reference in
clear_vma_resv_huge_pages()

Link: https://lkml.kernel.org/r/20211113154412.91134-1-minhquangbui99@gmail.com
Fixes: 550a7d60 ("mm, hugepages: add mremap() support for hugepage backed vma")
Signed-off-by: NBui Quang Minh <minhquangbui99@gmail.com>
Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: NMina Almasry <almasrymina@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

afe041c2

shm: extend forced shm destroy to support objects from several IPC nses · 85b6d246

由 Alexander Mikhalitsyn 提交于 11月 19, 2021

Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.

This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).

This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.

To achieve that we do several things:

1. add a namespace (non-refcounted) pointer to the struct shmid_kernel

2. during new shm object creation (newseg()/shmget syscall) we
   initialize this pointer by current task IPC ns

3. exit_shm() fully reworked such that it traverses over all shp's in
   task->sysvshm.shm_clist and gets IPC namespace not from current task
   as it was before but from shp's object itself, then call
   shm_destroy(shp, ns).

Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed.  To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".

Q/A

Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
   lifetime, so, if we get shp object from the task->sysvshm.shm_clist
   while holding task_lock(task) nobody can steal our namespace.

Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
   namespace without getting task->sysvshm.shm_clist list cleaned up.

Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: NManfred Spraul <manfred@colorfullife.com>
Signed-off-by: NManfred Spraul <manfred@colorfullife.com>
Signed-off-by: NAlexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

85b6d246

19 11月, 2021 2 次提交

signal: Replace force_fatal_sig with force_exit_sig when in doubt · fcb116bc

由 Eric W. Biederman 提交于 11月 18, 2021

Recently to prevent issues with SECCOMP_RET_KILL and similar signals
being changed before they are delivered SA_IMMUTABLE was added.

Unfortunately this broke debuggers[1][2] which reasonably expect
to be able to trap synchronous SIGTRAP and SIGSEGV even when
the target process is not configured to handle those signals.

Add force_exit_sig and use it instead of force_fatal_sig where
historically the code has directly called do_exit.  This has the
implementation benefits of going through the signal exit path
(including generating core dumps) without the danger of allowing
userspace to ignore or change these signals.

This avoids userspace regressions as older kernels exited with do_exit
which debuggers also can not intercept.

In the future is should be possible to improve the quality of
implementation of the kernel by changing some of these force_exit_sig
calls to force_fatal_sig.  That can be done where it matters on
a case-by-case basis with careful analysis.
Reported-by: NKyle Huey <me@kylehuey.com>
Reported-by: Nkernel test robot <oliver.sang@intel.com>
[1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com
[2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020
Fixes: 00b06da2 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed")
Fixes: a3616a3c ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die")
Fixes: 83a1f27a ("signal/powerpc: On swapcontext failure force SIGSEGV")
Fixes: 9bc508cf ("signal/s390: Use force_sigsegv in default_trap_handler")
Fixes: 086ec444 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig")
Fixes: c317d306 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails")
Fixes: 695dd0d6 ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit")
Fixes: 1fbd60df ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.")
Fixes: 941edc5b ("exit/syscall_user_dispatch: Send ordinary signals on failure")
Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.orgReviewed-by: NKees Cook <keescook@chromium.org>
Tested-by: NKees Cook <keescook@chromium.org>
Tested-by: NKyle Huey <khuey@kylehuey.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

fcb116bc

mm: Add functions to zero portions of a folio · c0357139

由 Matthew Wilcox (Oracle) 提交于 11月 05, 2021

These functions are wrappers around zero_user_segments(), which means
that zero_user_segments() can now be called for compound pages even when
CONFIG_TRANSPARENT_HUGEPAGE is disabled.

Use 'xend' as the name of the parameter to indicate that this is an
excluded end, not the more usual included end.  Excluding the end makes
more sense to the callers, but can cause confusion to readers who are
more used to seeing included ends.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>

c0357139

18 11月, 2021 4 次提交

page_pool: Revert "page_pool: disable dma mapping support..." · f915b75b

由 Yunsheng Lin 提交于 11月 17, 2021

This reverts commit d00e60ee.

As reported by Guillaume in [1]:
Enabling LPAE always enables CONFIG_ARCH_DMA_ADDR_T_64BIT
in 32-bit systems, which breaks the bootup proceess when a
ethernet driver is using page pool with PP_FLAG_DMA_MAP flag.
As we were hoping we had no active consumers for such system
when we removed the dma mapping support, and LPAE seems like
a common feature for 32 bits system, so revert it.

1. https://www.spinics.net/lists/netdev/msg779890.html

Fixes: d00e60ee ("page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA")
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Reported-by: N"kernelci.org bot" <bot@kernelci.org>
Tested-by: N"kernelci.org bot" <bot@kernelci.org>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f915b75b

KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache · 357a18ad

由 David Woodhouse 提交于 11月 15, 2021

In commit 7e2175eb ("KVM: x86: Fix recording of guest steal time /
preempted status") I removed the only user of these functions because
it was basically impossible to use them safely.

There are two stages to the GFN->PFN mapping; first through the KVM
memslots to a userspace HVA and then through the page tables to
translate that HVA to an underlying PFN. Invalidations of the former
were being handled correctly, but no attempt was made to use the MMU
notifiers to invalidate the cache when the HVA->GFN mapping changed.

As a prelude to reinventing the gfn_to_pfn_cache with more usable
semantics, rip it out entirely and untangle the implementation of
the unsafe kvm_vcpu_map()/kvm_vcpu_unmap() functions from it.

All current users of kvm_vcpu_map() also look broken right now, and
will be dealt with separately. They broadly fall into two classes:

* Those which map, access the data and immediately unmap. This is
  mostly gratuitous and could just as well use the existing user
  HVA, and could probably benefit from a gfn_to_hva_cache as they
  do so.

* Those which keep the mapping around for a longer time, perhaps
  even using the PFN directly from the guest. These will need to
  be converted to the new gfn_to_pfn_cache and then kvm_vcpu_map()
  can be removed too.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211115165030.7422-8-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

357a18ad

NFC: add NCI_UNREG flag to eliminate the race · 48b71a9e

由 Lin Ma 提交于 11月 16, 2021

There are two sites that calls queue_work() after the
destroy_workqueue() and lead to possible UAF.

The first site is nci_send_cmd(), which can happen after the
nci_close_device as below

nfcmrvl_nci_unregister_dev   |  nfc_genl_dev_up
  nci_close_device           |
    flush_workqueue          |
    del_timer_sync           |
  nci_unregister_device      |    nfc_get_device
    destroy_workqueue        |    nfc_dev_up
    nfc_unregister_device    |      nci_dev_up
      device_del             |        nci_open_device
                             |          __nci_request
                             |            nci_send_cmd
                             |              queue_work !!!

Another site is nci_cmd_timer, awaked by the nci_cmd_work from the
nci_send_cmd.

  ...                        |  ...
  nci_unregister_device      |  queue_work
    destroy_workqueue        |
    nfc_unregister_device    |  ...
      device_del             |  nci_cmd_work
                             |  mod_timer
                             |  ...
                             |  nci_cmd_timer
                             |    queue_work !!!

For the above two UAF, the root cause is that the nfc_dev_up can race
between the nci_unregister_device routine. Therefore, this patch
introduce NCI_UNREG flag to easily eliminate the possible race. In
addition, the mutex_lock in nci_close_device can act as a barrier.
Signed-off-by: NLin Ma <linma@zju.edu.cn>
Fixes: 6a2968aa ("NFC: basic NCI protocol implementation")
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Reviewed-by: NKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20211116152732.19238-1-linma@zju.edu.cnSigned-off-by: NJakub Kicinski <kuba@kernel.org>

48b71a9e

sunrpc: fix header include guard in trace header · 268bb038

由 Thiago Rafael Becker 提交于 11月 17, 2021

rpcgss.h include protection was protecting against the define for
rpcrdma.h.
Signed-off-by: NThiago Rafael Becker <trbecker@gmail.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>

268bb038

17 11月, 2021 9 次提交

fs: Rename AS_THP_SUPPORT and mapping_thp_support · ed2145c4

由 Matthew Wilcox (Oracle) 提交于 8月 29, 2021

These are now indicators of large folio support, not THP support.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>

ed2145c4

fs: Remove FS_THP_SUPPORT · ff36da69

由 Matthew Wilcox (Oracle) 提交于 8月 29, 2021

Instead of setting a bit in the fs_flags to set a bit in the
address_space, set the bit in the address_space directly.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>

ff36da69

mm: Remove folio_test_single · a1efe484

由 Matthew Wilcox (Oracle) 提交于 11月 16, 2021

There's no need for this predicate; callers can just use
!folio_test_large().
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>

a1efe484

M
mm: Rename folio_test_multi to folio_test_large · 9c325215
由 Matthew Wilcox (Oracle) 提交于 11月 16, 2021
```
This is a better name.  Also add kernel-doc.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
```
9c325215

Add linux/cacheflush.h · 522a0032

由 Matthew Wilcox (Oracle) 提交于 11月 06, 2021

Many architectures do not include asm-generic/cacheflush.h, so turn
the includes on their head and add linux/cacheflush.h which includes
asm/cacheflush.h.

Move the flush_dcache_folio() declaration from asm-generic/cacheflush.h
to linux/cacheflush.h and change linux/highmem.h to include
linux/cacheflush.h instead of asm/cacheflush.h so that all necessary
places will see flush_dcache_folio().

More functions should have their default implementations moved in the
future, but those are for follow-on patches.  This fixes csky, sparc and
sparc64 which were missed in the commit which added flush_dcache_folio().

Fixes: 08b0b005 ("mm: Add flush_dcache_folio()")
Suggested-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>

522a0032

net: virtio_net_hdr_to_skb: count transport header in UFO · cf9acc90

由 Jonathan Davies 提交于 11月 16, 2021

virtio_net_hdr_to_skb does not set the skb's gso_size and gso_type
correctly for UFO packets received via virtio-net that are a little over
the GSO size. This can lead to problems elsewhere in the networking
stack, e.g. ovs_vport_send dropping over-sized packets if gso_size is
not set.

This is due to the comparison

  if (skb->len - p_off > gso_size)

not properly accounting for the transport layer header.

p_off includes the size of the transport layer header (thlen), so
skb->len - p_off is the size of the TCP/UDP payload.

gso_size is read from the virtio-net header. For UFO, fragmentation
happens at the IP level so does not need to include the UDP header.

Hence the calculation could be comparing a TCP/UDP payload length with
an IP payload length, causing legitimate virtio-net packets to have
lack gso_type/gso_size information.

Example: a UDP packet with payload size 1473 has IP payload size 1481.
If the guest used UFO, it is not fragmented and the virtio-net header's
flags indicate that it is a GSO frame (VIRTIO_NET_HDR_GSO_UDP), with
gso_size = 1480 for an MTU of 1500.  skb->len will be 1515 and p_off
will be 42, so skb->len - p_off = 1473.  Hence the comparison fails, and
shinfo->gso_size and gso_type are not set as they should be.

Instead, add the UDP header length before comparing to gso_size when
using UFO. In this way, it is the size of the IP payload that is
compared to gso_size.

Fixes: 6dd912f8 ("net: check untrusted gso_size at kernel entry")
Signed-off-by: NJonathan Davies <jonathan.davies@nutanix.com>
Reviewed-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf9acc90

net/mlx5: E-Switch, Fix resetting of encap mode when entering switchdev · d7751d64

由 Paul Blakey 提交于 5月 20, 2021

E-Switch encap mode is relevant only when in switchdev mode.
The RDMA driver can query the encap configuration via
mlx5_eswitch_get_encap_mode(). Make sure it returns the currently
used mode and not the set one.

This reverts the cited commit which reset the encap mode
on entering switchdev and fixes the original issue properly.

Fixes: 9a64144d ("net/mlx5: E-Switch, Fix default encap mode")
Signed-off-by: NPaul Blakey <paulb@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Reviewed-by: NMaor Dickman <maord@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

d7751d64

RDMA/netlink: Add __maybe_unused to static inline in C file · 83dde749

由 Leon Romanovsky 提交于 11月 07, 2021

Like other commits in the tree add __maybe_unused to a static inline in a
C file because some clang compilers will complain about unused code:

>> drivers/infiniband/core/nldev.c:2543:1: warning: unused function '__chk_RDMA_NL_NLDEV'
   MODULE_ALIAS_RDMA_NETLINK(RDMA_NL_NLDEV, 5);
   ^

Fixes: e3bf14bd ("rdma: Autoload netlink client modules")
Link: https://lore.kernel.org/r/4a8101919b765e01d7fde6f27fd572c958deeb4a.1636267207.git.leonro@nvidia.comReported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

83dde749

net: ieee802154: handle iftypes as u32 · 451dc48c

由 Alexander Aring 提交于 11月 11, 2021

This patch fixes an issue that an u32 netlink value is handled as a
signed enum value which doesn't fit into the range of u32 netlink type.
If it's handled as -1 value some BIT() evaluation ends in a
shift-out-of-bounds issue. To solve the issue we set the to u32 max which
is s32 "-1" value to keep backwards compatibility and let the followed enum
values start counting at 0. This brings the compiler to never handle the
enum as signed and a check if the value is above NL802154_IFTYPE_MAX should
filter -1 out.

Fixes: f3ea5e44 ("ieee802154: add new interface command")
Signed-off-by: NAlexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20211112030916.685793-1-aahringo@redhat.comSigned-off-by: NStefan Schmidt <stefan@datenfreihafen.org>

451dc48c

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功