- 29 11月, 2021 17 次提交
-
-
由 Christoph Hellwig 提交于
Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait given that it is unused now. Also convert the boolean at_head parameter to actually use the bool type while touching the prototype. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Just use the disk attached to the request_queue instead. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Fold it into it's only caller, and remove a lof of the debug checks that are not needed. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-10-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Move the copying of the I/O context to the block layer as that is where we can use the proper low-level interfaces. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
Add blk_mq_complete_request_direct() which completes the block request directly instead deferring it to softirq for single queue devices. This is useful for devices which complete the requests in preemptible context and raising softirq from means scheduling ksoftirqd. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211025070658.1565848-2-bigeasy@linutronix.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Eric Biggers 提交于
This function is trivial and is only used in one place. Having this function is misleading because it implies that blk_crypto_register() needs to be paired with blk_crypto_unregister(), which is not the case. Just set disk->queue->crypto_profile to NULL directly. Signed-off-by: NEric Biggers <ebiggers@google.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211124013733.347612-1-ebiggers@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This is essentially never used, yet it's about 1/3rd of the total queue size. Allocate it when needed, and don't embed it in the queue. Kill the queue flag for this while at it, since we can just check the assigned pointer now. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Switch to an enum and tidy up the documentation. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-14-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
All modern drivers can support extra partitions using the extended dev_t. In fact except for the ioctl method drivers never even see partitions in normal operation. So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all block devices that do support partitions, and require those that do not support partitions to explicit disallow them using GENHD_FL_NO_PART. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
This flag is not set directly anywhere and only inherited from GENHD_FL_HIDDEN. Just check for GENHD_FL_HIDDEN instead. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-11-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The GENHD_FL_NO_PART_SCAN controls more than just partitions canning, so rename it to GENHD_FL_NO_PART. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NUlf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20211122130625.1136848-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
GENHD_FL_CD marks a gendisk as a vaguely CD-ROM like device. Besides being used internally inside of sunvdc.c an xen-blkfront it is used by xen-blkback as a hint to claim a device exported to a guest is a CD-ROM like device. Just check for disk->cdi instead which is the right indicator for "real" CD-ROM or DVD drivers. This will miss the paravirtualized guest drivers, but those make little sense to report anyway. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE is all about the event reporting mechanism, so move it to the event_flags field. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The flag to indicate an unlocked native capacity is dynamic state, not a driver capability flag, so move it to disk->state. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
This function is trivial, and flush_dcache_page is always defined, so just open code it in the 2.5 callers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20211117061404.331732-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
blk_rq_err_bytes is only used by the scsi midlayer, so move it there. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20211117061404.331732-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 11月, 2021 2 次提交
-
-
由 Vladimir Oltean 提交于
As opposed to event messages (Sync, PdelayReq etc) which require timestamping, general messages (Announce, FollowUp etc) do not. In PTP they are part of different streams of data. IEEE 1588-2008 Annex D.2 "UDP port numbers" states that the UDP destination port assigned by IANA is 319 for event messages, and 320 for general messages. Yet the kernel seems to be missing the definition for general messages. This patch adds it. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Acked-by: NRichard Cochran <richardcochran@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind tc flower offload on ocelot, among other things. The ingress port mask on which VCAP rules match is present as a bit field in the actual key of the rule. This means that it is possible for a rule to be shared among multiple source ports. When the rule is added one by one on each desired port, that the ingress port mask of the key must be edited and rewritten to hardware. But the API in ocelot_vcap.c does not allow for this. For one thing, ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric, because ocelot_vcap_filter_add() works with a preallocated and prepopulated filter and programs it to hardware, and ocelot_vcap_filter_del() does both the job of removing the specified filter from hardware, as well as kfreeing it. That is to say, the only option of editing a filter in place, which is to delete it, modify the structure and add it back, does not work because it results in use-after-free. This patch introduces ocelot_vcap_filter_replace, which trivially reprograms a VCAP entry to hardware, at the exact same index at which it existed before, without modifying any list or allocating any memory. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Acked-by: NRichard Cochran <richardcochran@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 25 11月, 2021 1 次提交
-
-
由 Michael S. Tsirkin 提交于
This reverts commit 939779f5. Attempts to validate length in the core did not work out: there turn out to exist multiple broken devices, and in particular legacy devices are known to be broken in this respect. We have ideas for handling this better in the next version but for now let's revert to a known good state to make sure drivers work for people. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 24 11月, 2021 2 次提交
-
-
由 Juergen Gross 提交于
When booting the xenbus driver will wait for PV devices to have connected to their backends before continuing. The timeout is different between essential and non-essential devices. Non-essential devices are identified by their nodenames directly in the xenbus driver, which requires to update this list in case a new device type being non-essential is added (this was missed for several types in the past). In order to avoid this problem, add a "not_essential" flag to struct xenbus_driver which can be set to "true" by the respective frontend. Set this flag for the frontends currently regarded to be not essential (vkbs and vfb) and use it for testing in the xenbus driver. Signed-off-by: NJuergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20211022064800.14978-2-jgross@suse.comReviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Sakari Ailus 提交于
acpi_node_get_parent() isn't used outside drivers/acpi/property.c. Make it local. Signed-off-by: NSakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 22 11月, 2021 1 次提交
-
-
由 Nikolay Aleksandrov 提交于
We need a way to release a fib6_nh's per-cpu dsts when replacing nexthops otherwise we can end up with stale per-cpu dsts which hold net device references, so add a new IPv6 stub called fib6_nh_release_dsts. It must be used after an RCU grace period, so no new dsts can be created through a group's nexthop entry. Similar to fib6_nh_release it shouldn't be used if fib6_nh_init has failed so it doesn't need a dummy stub when IPv6 is not enabled. Fixes: 7bf4796d ("nexthops: add support for replace") Signed-off-by: NNikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 11月, 2021 2 次提交
-
-
由 Bui Quang Minh 提交于
When hugetlb_vm_op_open() is called during copy_vma(), we may take the reference to resv_map->css. Later, when clearing the reservation pointer of old_vma after transferring it to new_vma, we forget to drop the reference to resv_map->css. This leads to a reference leak of css. Fixes this by adding a check to drop reservation css reference in clear_vma_resv_huge_pages() Link: https://lkml.kernel.org/r/20211113154412.91134-1-minhquangbui99@gmail.com Fixes: 550a7d60 ("mm, hugepages: add mremap() support for hugepage backed vma") Signed-off-by: NBui Quang Minh <minhquangbui99@gmail.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NMina Almasry <almasrymina@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Mikhalitsyn 提交于
Currently, the exit_shm() function not designed to work properly when task->sysvshm.shm_clist holds shm objects from different IPC namespaces. This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it leads to use-after-free (reproducer exists). This is an attempt to fix the problem by extending exit_shm mechanism to handle shm's destroy from several IPC ns'es. To achieve that we do several things: 1. add a namespace (non-refcounted) pointer to the struct shmid_kernel 2. during new shm object creation (newseg()/shmget syscall) we initialize this pointer by current task IPC ns 3. exit_shm() fully reworked such that it traverses over all shp's in task->sysvshm.shm_clist and gets IPC namespace not from current task as it was before but from shp's object itself, then call shm_destroy(shp, ns). Note: We need to be really careful here, because as it was said before (1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we using special helper get_ipc_ns_not_zero() which allows to get IPC ns refcounter only if IPC ns not in the "state of destruction". Q/A Q: Why can we access shp->ns memory using non-refcounted pointer? A: Because shp object lifetime is always shorther than IPC namespace lifetime, so, if we get shp object from the task->sysvshm.shm_clist while holding task_lock(task) nobody can steal our namespace. Q: Does this patch change semantics of unshare/setns/clone syscalls? A: No. It's just fixes non-covered case when process may leave IPC namespace without getting task->sysvshm.shm_clist list cleaned up. Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com Fixes: ab602f79 ("shm: make exit_shm work proportional to task activity") Co-developed-by: NManfred Spraul <manfred@colorfullife.com> Signed-off-by: NManfred Spraul <manfred@colorfullife.com> Signed-off-by: NAlexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: Vasily Averin <vvs@virtuozzo.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 11月, 2021 2 次提交
-
-
由 Eric W. Biederman 提交于
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added. Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals. Add force_exit_sig and use it instead of force_fatal_sig where historically the code has directly called do_exit. This has the implementation benefits of going through the signal exit path (including generating core dumps) without the danger of allowing userspace to ignore or change these signals. This avoids userspace regressions as older kernels exited with do_exit which debuggers also can not intercept. In the future is should be possible to improve the quality of implementation of the kernel by changing some of these force_exit_sig calls to force_fatal_sig. That can be done where it matters on a case-by-case basis with careful analysis. Reported-by: NKyle Huey <me@kylehuey.com> Reported-by: Nkernel test robot <oliver.sang@intel.com> [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpjw@mail.gmail.com [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-9020 Fixes: 00b06da2 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Fixes: a3616a3c ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die") Fixes: 83a1f27a ("signal/powerpc: On swapcontext failure force SIGSEGV") Fixes: 9bc508cf ("signal/s390: Use force_sigsegv in default_trap_handler") Fixes: 086ec444 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig") Fixes: c317d306 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails") Fixes: 695dd0d6 ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit") Fixes: 1fbd60df ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.") Fixes: 941edc5b ("exit/syscall_user_dispatch: Send ordinary signals on failure") Link: https://lkml.kernel.org/r/871r3dqfv8.fsf_-_@email.froward.int.ebiederm.orgReviewed-by: NKees Cook <keescook@chromium.org> Tested-by: NKees Cook <keescook@chromium.org> Tested-by: NKyle Huey <khuey@kylehuey.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Matthew Wilcox (Oracle) 提交于
These functions are wrappers around zero_user_segments(), which means that zero_user_segments() can now be called for compound pages even when CONFIG_TRANSPARENT_HUGEPAGE is disabled. Use 'xend' as the name of the parameter to indicate that this is an excluded end, not the more usual included end. Excluding the end makes more sense to the callers, but can cause confusion to readers who are more used to seeing included ends. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
-
- 18 11月, 2021 4 次提交
-
-
由 Yunsheng Lin 提交于
This reverts commit d00e60ee. As reported by Guillaume in [1]: Enabling LPAE always enables CONFIG_ARCH_DMA_ADDR_T_64BIT in 32-bit systems, which breaks the bootup proceess when a ethernet driver is using page pool with PP_FLAG_DMA_MAP flag. As we were hoping we had no active consumers for such system when we removed the dma mapping support, and LPAE seems like a common feature for 32 bits system, so revert it. 1. https://www.spinics.net/lists/netdev/msg779890.html Fixes: d00e60ee ("page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA") Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com> Reported-by: N"kernelci.org bot" <bot@kernelci.org> Tested-by: N"kernelci.org bot" <bot@kernelci.org> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NIlias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Woodhouse 提交于
In commit 7e2175eb ("KVM: x86: Fix recording of guest steal time / preempted status") I removed the only user of these functions because it was basically impossible to use them safely. There are two stages to the GFN->PFN mapping; first through the KVM memslots to a userspace HVA and then through the page tables to translate that HVA to an underlying PFN. Invalidations of the former were being handled correctly, but no attempt was made to use the MMU notifiers to invalidate the cache when the HVA->GFN mapping changed. As a prelude to reinventing the gfn_to_pfn_cache with more usable semantics, rip it out entirely and untangle the implementation of the unsafe kvm_vcpu_map()/kvm_vcpu_unmap() functions from it. All current users of kvm_vcpu_map() also look broken right now, and will be dealt with separately. They broadly fall into two classes: * Those which map, access the data and immediately unmap. This is mostly gratuitous and could just as well use the existing user HVA, and could probably benefit from a gfn_to_hva_cache as they do so. * Those which keep the mapping around for a longer time, perhaps even using the PFN directly from the guest. These will need to be converted to the new gfn_to_pfn_cache and then kvm_vcpu_map() can be removed too. Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211115165030.7422-8-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Lin Ma 提交于
There are two sites that calls queue_work() after the destroy_workqueue() and lead to possible UAF. The first site is nci_send_cmd(), which can happen after the nci_close_device as below nfcmrvl_nci_unregister_dev | nfc_genl_dev_up nci_close_device | flush_workqueue | del_timer_sync | nci_unregister_device | nfc_get_device destroy_workqueue | nfc_dev_up nfc_unregister_device | nci_dev_up device_del | nci_open_device | __nci_request | nci_send_cmd | queue_work !!! Another site is nci_cmd_timer, awaked by the nci_cmd_work from the nci_send_cmd. ... | ... nci_unregister_device | queue_work destroy_workqueue | nfc_unregister_device | ... device_del | nci_cmd_work | mod_timer | ... | nci_cmd_timer | queue_work !!! For the above two UAF, the root cause is that the nfc_dev_up can race between the nci_unregister_device routine. Therefore, this patch introduce NCI_UNREG flag to easily eliminate the possible race. In addition, the mutex_lock in nci_close_device can act as a barrier. Signed-off-by: NLin Ma <linma@zju.edu.cn> Fixes: 6a2968aa ("NFC: basic NCI protocol implementation") Reviewed-by: NJakub Kicinski <kuba@kernel.org> Reviewed-by: NKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20211116152732.19238-1-linma@zju.edu.cnSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Thiago Rafael Becker 提交于
rpcgss.h include protection was protecting against the define for rpcrdma.h. Signed-off-by: NThiago Rafael Becker <trbecker@gmail.com> Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
-
- 17 11月, 2021 9 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
These are now indicators of large folio support, not THP support. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Matthew Wilcox (Oracle) 提交于
Instead of setting a bit in the fs_flags to set a bit in the address_space, set the bit in the address_space directly. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Matthew Wilcox (Oracle) 提交于
There's no need for this predicate; callers can just use !folio_test_large(). Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Matthew Wilcox (Oracle) 提交于
This is a better name. Also add kernel-doc. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
由 Matthew Wilcox (Oracle) 提交于
Many architectures do not include asm-generic/cacheflush.h, so turn the includes on their head and add linux/cacheflush.h which includes asm/cacheflush.h. Move the flush_dcache_folio() declaration from asm-generic/cacheflush.h to linux/cacheflush.h and change linux/highmem.h to include linux/cacheflush.h instead of asm/cacheflush.h so that all necessary places will see flush_dcache_folio(). More functions should have their default implementations moved in the future, but those are for follow-on patches. This fixes csky, sparc and sparc64 which were missed in the commit which added flush_dcache_folio(). Fixes: 08b0b005 ("mm: Add flush_dcache_folio()") Suggested-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
-
由 Jonathan Davies 提交于
virtio_net_hdr_to_skb does not set the skb's gso_size and gso_type correctly for UFO packets received via virtio-net that are a little over the GSO size. This can lead to problems elsewhere in the networking stack, e.g. ovs_vport_send dropping over-sized packets if gso_size is not set. This is due to the comparison if (skb->len - p_off > gso_size) not properly accounting for the transport layer header. p_off includes the size of the transport layer header (thlen), so skb->len - p_off is the size of the TCP/UDP payload. gso_size is read from the virtio-net header. For UFO, fragmentation happens at the IP level so does not need to include the UDP header. Hence the calculation could be comparing a TCP/UDP payload length with an IP payload length, causing legitimate virtio-net packets to have lack gso_type/gso_size information. Example: a UDP packet with payload size 1473 has IP payload size 1481. If the guest used UFO, it is not fragmented and the virtio-net header's flags indicate that it is a GSO frame (VIRTIO_NET_HDR_GSO_UDP), with gso_size = 1480 for an MTU of 1500. skb->len will be 1515 and p_off will be 42, so skb->len - p_off = 1473. Hence the comparison fails, and shinfo->gso_size and gso_type are not set as they should be. Instead, add the UDP header length before comparing to gso_size when using UFO. In this way, it is the size of the IP payload that is compared to gso_size. Fixes: 6dd912f8 ("net: check untrusted gso_size at kernel entry") Signed-off-by: NJonathan Davies <jonathan.davies@nutanix.com> Reviewed-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paul Blakey 提交于
E-Switch encap mode is relevant only when in switchdev mode. The RDMA driver can query the encap configuration via mlx5_eswitch_get_encap_mode(). Make sure it returns the currently used mode and not the set one. This reverts the cited commit which reset the encap mode on entering switchdev and fixes the original issue properly. Fixes: 9a64144d ("net/mlx5: E-Switch, Fix default encap mode") Signed-off-by: NPaul Blakey <paulb@nvidia.com> Reviewed-by: NMark Bloch <mbloch@nvidia.com> Reviewed-by: NMaor Dickman <maord@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Leon Romanovsky 提交于
Like other commits in the tree add __maybe_unused to a static inline in a C file because some clang compilers will complain about unused code: >> drivers/infiniband/core/nldev.c:2543:1: warning: unused function '__chk_RDMA_NL_NLDEV' MODULE_ALIAS_RDMA_NETLINK(RDMA_NL_NLDEV, 5); ^ Fixes: e3bf14bd ("rdma: Autoload netlink client modules") Link: https://lore.kernel.org/r/4a8101919b765e01d7fde6f27fd572c958deeb4a.1636267207.git.leonro@nvidia.comReported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NLeon Romanovsky <leonro@nvidia.com> Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
-
由 Alexander Aring 提交于
This patch fixes an issue that an u32 netlink value is handled as a signed enum value which doesn't fit into the range of u32 netlink type. If it's handled as -1 value some BIT() evaluation ends in a shift-out-of-bounds issue. To solve the issue we set the to u32 max which is s32 "-1" value to keep backwards compatibility and let the followed enum values start counting at 0. This brings the compiler to never handle the enum as signed and a check if the value is above NL802154_IFTYPE_MAX should filter -1 out. Fixes: f3ea5e44 ("ieee802154: add new interface command") Signed-off-by: NAlexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20211112030916.685793-1-aahringo@redhat.comSigned-off-by: NStefan Schmidt <stefan@datenfreihafen.org>
-