- 23 12月, 2021 3 次提交
-
-
由 Geliang Tang 提交于
The variable 'ctrl' became useless since the code using it was dropped from nvme_setup_cmd() in the commit 292ddf67bbd5 ("nvme: increment request genctr on completion"). Fix it to get rid of this compilation warning in the nvme-5.17 branch: drivers/nvme/host/core.c: In function ‘nvme_setup_cmd’: drivers/nvme/host/core.c:993:20: warning: unused variable ‘ctrl’ [-Wunused-variable] struct nvme_ctrl *ctrl = nvme_req(req)->ctrl; ^~~~ Fixes: 292ddf67bbd5 ("nvme: increment request genctr on completion") Signed-off-by: NGeliang Tang <geliang.tang@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
The nvme request generation counter is intended to catch duplicate completions. Incrementing the counter on submission means duplicates can only be caught if the request tag is reallocated and dispatched prior to the driver observing the corrupted CQE. Incrementing on completion removes this window, making it possible to detect duplicate completions in consecutive entries. Signed-off-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
Currently applications have a hard time figuring out which nvme-over-fabrics arguments are supported for any given kernel; the ioctl will return an error code on failure, and the application has to guess whether this was due to an invalid argument or due to a connection or controller error. With this patch applications can read a list of supported arguments by simply reading from /dev/nvme-fabrics, allowing them to validate the connection string. Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 17 12月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
This driver was for rare and shortlived high end enterprise hardware and hasn't been maintained since 2014, which also means it never got converted to use blk-mq. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 12月, 2021 6 次提交
-
-
由 Bjorn Helgaas 提交于
The rsxx driver doesn't support device suspend, so remove rsxx_pci_suspend(), the legacy PCI .suspend() method, completely. Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211208192449.146076-5-helgaas@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Vaibhav Gupta 提交于
Convert mtip32xx from legacy PCI power management to the generic power management framework. Previously, mtip32xx used legacy PCI power management, where mtip_pci_suspend() and mtip_pci_resume() were responsible for both device-specific things and generic PCI things: mtip_pci_suspend mtip_block_suspend(dd) <-- device-specific pci_save_state(pdev) <-- generic PCI pci_set_power_state(pdev, pci_choose_state(pdev, state)) mtip_pci_resume pci_set_power_state(PCI_D0) <-- generic PCI pci_restore_state(pdev) <-- generic PCI pcim_enable_device(pdev) <-- generic PCI pci_set_master(pdev) <-- generic PCI mtip_block_resume(dd) <-- device-specific With generic power management, the PCI bus PM methods do the generic PCI things, and the driver needs only the device-specific part, i.e., suspend_devices_and_enter dpm_suspend_start(PMSG_SUSPEND) pci_pm_suspend # PCI bus .suspend() method mtip_pci_suspend # dev->driver->pm->suspend mtip_block_suspend <-- device-specific suspend_enter dpm_suspend_noirq(PMSG_SUSPEND) pci_pm_suspend_noirq # PCI bus .suspend_noirq() method pci_save_state <-- generic PCI pci_prepare_to_sleep <-- generic PCI pci_set_power_state ... dpm_resume_end(PMSG_RESUME) pci_pm_resume # PCI bus .resume() method pci_restore_standard_config pci_set_power_state(PCI_D0) <-- generic PCI pci_restore_state <-- generic PCI mtip_pci_resume # dev->driver->pm->resume mtip_block_resume <-- device-specific [bhelgaas: commit log] Link: https://lore.kernel.org/r/20210114115423.52414-2-vaibhavgupta40@gmail.comSigned-off-by: NVaibhav Gupta <vaibhavgupta40@gmail.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211208192449.146076-4-helgaas@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bjorn Helgaas 提交于
Previously we passed a struct pci_dev * to mtip_check_surprise_removal(), which immediately looked up the driver_data. But all callers already have the driver_data pointer, so just pass it directly and skip the extra lookup. No functional change intended. Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211208192449.146076-3-helgaas@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bjorn Helgaas 提交于
The .suspend() and .resume() methods are only called after the .probe() method (mtip_pci_probe()) has set the drvdata and returned success. Therefore, if we get to mtip_pci_suspend() or mtip_pci_resume(), the drvdata must be valid. Drop the unnecessary checking. Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211208192449.146076-2-helgaas@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Kees Cook 提交于
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add a struct_group() for the algs so that memset() can correctly reason about the size. Signed-off-by: NKees Cook <keescook@chromium.org> Reviewed-by: NGustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20211118203712.1288866-1-keescook@chromium.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tetsuo Handa 提交于
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for commit 87579e9b ("loop: use worker per cgroup instead of kworker") is calling destroy_workqueue() with disk->open_mutex held. This circular dependency cannot be broken unless we call __loop_clr_fd() without holding disk->open_mutex. Therefore, defer __loop_clr_fd() from lo_release() to a WQ context. Link: https://syzkaller.appspot.com/bug?extid=643e4ce4b6ad1347d372 [1] Reported-by: Nsyzbot <syzbot+643e4ce4b6ad1347d372@syzkaller.appspotmail.com> Suggested-by: NChristoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@suse.cz> Tested-by: syzbot+643e4ce4b6ad1347d372@syzkaller.appspotmail.com Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/1ed7df28-ebd6-71fb-70e5-1c2972e05ddb@i-love.sakura.ne.jpSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 12月, 2021 1 次提交
-
-
由 Jens Axboe 提交于
kernel test robot reports that sparse now triggers a warning on null_blk: >> drivers/block/null_blk/main.c:1577:55: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected int ioerror @@ got restricted blk_status_t [usertype] error @@ drivers/block/null_blk/main.c:1577:55: sparse: expected int ioerror drivers/block/null_blk/main.c:1577:55: sparse: got restricted blk_status_t [usertype] error because blk_mq_add_to_batch() takes an integer instead of a blk_status_t. Just cast this to an integer to silence it, null_blk is the odd one out here since the command status is the "right" type. If we change the function type, then we'll have do that for other callers too (existing and future ones). Fixes: 2385ebf3 ("block: null_blk: batched complete poll requests") Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 12月, 2021 1 次提交
-
-
由 NeilBrown 提交于
The bdi congestion framework isn't widely used and should be deprecated. pktdvd makes use of it to track congestion, but this can be done entirely internally to pktdvd, so it doesn't need to use the framework. So introduce a "congested" flag. When waiting for bio_queue_size to drop, set this flag and a var_waitqueue() to wait for it. When bio_queue_size does drop and this flag is set, clear the flag and call wake_up_var(). We don't use a wait_var_event macro for the waiting as we need to set the flag and drop the spinlock before calling schedule() and while that is possible with __wait_var_event(), result is not easy to read. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NNeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/163910843527.9928.857338663717630212@noble.neil.brown.nameSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 03 12月, 2021 4 次提交
-
-
由 Ming Lei 提交于
Complete poll requests via blk_mq_add_to_batch() and blk_mq_end_request_batch(), so that we can cover batched complete code path by running null_blk test. Meantime this way shows ~14% IOPS boost on 't/io_uring /dev/nullb0' in my test. Signed-off-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203081703.3506020-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Xiongwei Song 提交于
We need to check the max request size that is from user space before allocating pages. If the request size exceeds the limit, return -EINVAL. This check can avoid the warning below from page allocator. WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 current_gfp_context include/linux/sched/mm.h:195 [inline] WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 __alloc_pages+0x45d/0x500 mm/page_alloc.c:5356 Modules linked in: CPU: 3 PID: 16525 Comm: syz-executor.3 Not tainted 5.15.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 RIP: 0010:__alloc_pages+0x45d/0x500 mm/page_alloc.c:5344 Code: be c9 00 00 00 48 c7 c7 20 4a 97 89 c6 05 62 32 a7 0b 01 e8 74 9a 42 07 e9 6a ff ff ff 0f 0b e9 a0 fd ff ff 40 80 e5 3f eb 88 <0f> 0b e9 18 ff ff ff 4c 89 ef 44 89 e6 45 31 ed e8 1e 76 ff ff e9 RSP: 0018:ffffc90023b87850 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 1ffff92004770f0b RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000033 RDI: 0000000000010cc1 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 R10: ffffffff81bb4686 R11: 0000000000000001 R12: ffffffff902c1960 R13: 0000000000000033 R14: 0000000000000000 R15: ffff88804cf64a30 FS: 0000000000000000(0000) GS:ffff88802cd00000(0063) knlGS:00000000f44b4b40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 000000002c921000 CR3: 000000004f507000 CR4: 0000000000150ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191 __get_free_pages+0x8/0x40 mm/page_alloc.c:5418 raw_cmd_copyin drivers/block/floppy.c:3113 [inline] raw_cmd_ioctl drivers/block/floppy.c:3160 [inline] fd_locked_ioctl+0x12e5/0x2820 drivers/block/floppy.c:3528 fd_ioctl drivers/block/floppy.c:3555 [inline] fd_compat_ioctl+0x891/0x1b60 drivers/block/floppy.c:3869 compat_blkdev_ioctl+0x3b8/0x810 block/ioctl.c:662 __do_compat_sys_ioctl+0x1c7/0x290 fs/ioctl.c:972 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178 do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c Reported-by: syzbot+23a02c7df2cf2bc93fa2@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20211116131033.27685-1-sxwjean@me.comSigned-off-by: NXiongwei Song <sxwjean@gmail.com> Signed-off-by: NDenis Efremov <efremov@linux.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tasos Sahanidis 提交于
When the watchdog detects a disk change, it calls cancel_activity(), which in turn tries to cancel the fd_timer delayed work. In the above scenario, fd_timer_fn is set to fd_watchdog(), meaning it is trying to cancel its own work. This results in a hang as cancel_delayed_work_sync() is waiting for the watchdog (itself) to return, which never happens. This can be reproduced relatively consistently by attempting to read a broken floppy, and ejecting it while IO is being attempted and retried. To resolve this, this patch calls cancel_delayed_work() instead, which cancels the work without waiting for the watchdog to return and finish. Before this regression was introduced, the code in this section used del_timer(), and not del_timer_sync() to delete the watchdog timer. Link: https://lore.kernel.org/r/399e486c-6540-db27-76aa-7a271b061f76@tasossah.com Fixes: 070ad7e7 ("floppy: convert to delayed work and single-thread wq") Signed-off-by: NTasos Sahanidis <tasos@tasossah.com> Signed-off-by: NDenis Efremov <efremov@linux.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
There isn't any reason to not allow zero poll queues from user viewpoint. Also sometimes we need to compare io poll between poll mode and irq mode, so not allowing poll queues is bad. Fixes: 15dfc662 ("null_blk: Fix handling of submit_queues and poll_queues attributes") Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203023935.3424042-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 11月, 2021 16 次提交
-
-
由 Tetsuo Handa 提交于
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for commit 87579e9b ("loop: use worker per cgroup instead of kworker") is calling destroy_workqueue() with lo->lo_mutex held. Since all functions where lo->lo_state matters are already checking lo->lo_state with lo->lo_mutex held (in order to avoid racing with e.g. ioctl(LOOP_CTL_REMOVE)), and __loop_clr_fd() can be called from either ioctl(LOOP_CLR_FD) xor close(), lo->lo_state == Lo_rundown is considered as an exclusive lock for __loop_clr_fd(). Therefore, hold lo->lo_mutex inside __loop_clr_fd() only when asserting/updating lo->lo_state. Since ioctl(LOOP_CLR_FD) depends on lo->lo_state == Lo_bound, a valid lo->lo_backing_file must have been assigned by ioctl(LOOP_SET_FD) or ioctl(LOOP_CONFIGURE). Thus, we can remove lo->lo_backing_file test, and convert __loop_clr_fd() into a void function. Link: https://syzkaller.appspot.com/bug?extid=63614029dfb79abd4383 [1] Reported-by: Nsyzbot <syzbot+63614029dfb79abd4383@syzkaller.appspotmail.com> Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/8ebe3b2e-8975-7f26-0620-7144a3b8b8cd@i-love.sakura.ne.jpSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Now that blk_execute_rq does not take a gendisk argument there is no need to pass it through the scsi_ioctl callchain either. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-6-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait given that it is unused now. Also convert the boolean at_head parameter to actually use the bool type while touching the prototype. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Just use the disk attached to the request_queue instead. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The block layer already performs this check, no need to duplicate it in the driver. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMiquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Add the proper module prefix to avoid conflicts with a function in the scheduler. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
The completion callback for the sdhci-pci device is invoked from a kworker. I couldn't identify in which context is mmc_blk_mq_req_done() invoke but the remaining caller are from invoked from preemptible context. Here it would make sense to complete the request directly instead scheduling ksoftirqd for its completion. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org> Acked-by: NAdrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20211025070658.1565848-3-bigeasy@linutronix.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Set up GENHD_FL_REMOVABLE together with the rest of the gendisk fields. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-15-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
All modern drivers can support extra partitions using the extended dev_t. In fact except for the ioctl method drivers never even see partitions in normal operation. So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all block devices that do support partitions, and require those that do not support partitions to explicit disallow them using GENHD_FL_NO_PART. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
This manually reverts 07b652cdbec3 ("mmc: card: Don't show eMMC RPMB and BOOT areas in /proc/partitions"). Based on the commit description that change was purely cosmetic. mmc is the last driver that sets this flag and thus prevents it from being removed. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NUlf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20211122130625.1136848-10-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
This manually reverts commit 27290b469051 ("null_blk: suppress invalid partition info"). The message in that commit log can't appearch as the flag is never checked during probing, and there is no good reason to treat null_blk special in /proc/partitions. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-9-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The GENHD_FL_NO_PART_SCAN controls more than just partitions canning, so rename it to GENHD_FL_NO_PART. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NUlf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20211122130625.1136848-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
GENHD_FL_CD marks a gendisk as a vaguely CD-ROM like device. Besides being used internally inside of sunvdc.c an xen-blkfront it is used by xen-blkback as a hint to claim a device exported to a guest is a CD-ROM like device. Just check for disk->cdi instead which is the right indicator for "real" CD-ROM or DVD drivers. This will miss the paravirtualized guest drivers, but those make little sense to report anyway. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE is all about the event reporting mechanism, so move it to the event_flags field. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
This function is trivial, and flush_dcache_page is always defined, so just open code it in the 2.5 callers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20211117061404.331732-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
blk_rq_err_bytes is only used by the scsi midlayer, so move it there. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20211117061404.331732-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 11月, 2021 1 次提交
-
-
由 Guenter Roeck 提交于
Use the architecture independent Kconfig option PAGE_SIZE_LESS_THAN_64KB to indicate that VMXNET3 requires a page size smaller than 64kB. Signed-off-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 11月, 2021 7 次提交
-
-
由 Alex Williamson 提交于
When supporting only the .map and .unmap callbacks of iommu_ops, the IOMMU driver can make assumptions about the size and alignment used for mappings based on the driver provided pgsize_bitmap. VT-d previously used essentially PAGE_MASK for this bitmap as any power of two mapping was acceptably filled by native page sizes. However, with the .map_pages and .unmap_pages interface we're now getting page-size and count arguments. If we simply combine these as (page-size * count) and make use of the previous map/unmap functions internally, any size and alignment assumptions are very different. As an example, a given vfio device assignment VM will often create a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that does not support IOMMU super pages, the unmap_pages interface will ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level() will recurse down to level 2 of the page table where the first half of the pfn range exactly matches the entire pte level. We clear the pte, increment the pfn by the level size, but (oops) the next pte is on a new page, so we exit the loop an pop back up a level. When we then update the pfn based on that higher level, we seem to assume that the previous pfn value was at the start of the level. In this case the level size is 256K pfns, which we add to the base pfn and get a results of 0x7fe00, which is clearly greater than 0x401ff, so we're done. Meanwhile we never cleared the ptes for the remainder of the range. When the VM remaps this range, we're overwriting valid ptes and the VT-d driver complains loudly, as reported by the user report linked below. The fix for this seems relatively simple, if each iteration of the loop in dma_pte_clear_level() is assumed to clear to the end of the level pte page, then our next pfn should be calculated from level_pfn rather than our working pfn. Fixes: 3f34f125 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback") Reported-by: NAjay Garg <ajaygargnsit@gmail.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Tested-by: NGiovanni Cabiddu <giovanni.cabiddu@intel.com> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/ Link: https://lore.kernel.org/r/163659074748.1617923.12716161410774184024.stgit@omenSigned-off-by: NLu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20211126135556.397932-3-baolu.lu@linux.intel.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>
-
由 Christophe JAILLET 提交于
If we return -EOPNOTSUPP, the rcu lock remains lock. This is spurious. Go through the end of the function instead. This way, the missing 'rcu_read_unlock()' is called. Fixes: 7afd7f6a ("iommu/vt-d: Check FL and SL capability sanity in scalable mode") Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/40cc077ca5f543614eab2a10e84d29dd190273f6.1636217517.git.christophe.jaillet@wanadoo.frSigned-off-by: NLu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20211126135556.397932-2-baolu.lu@linux.intel.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>
-
由 Alex Bee 提交于
With the submission of iommu driver for RK3568 a subtle bug was introduced: PAGE_DESC_HI_MASK1 and PAGE_DESC_HI_MASK2 have to be the other way arround - that leads to random errors, especially when addresses beyond 32 bit are used. Fix it. Fixes: c55356c5 ("iommu: rockchip: Add support for iommu v2") Signed-off-by: NAlex Bee <knaerzche@gmail.com> Tested-by: NPeter Geis <pgwipeout@gmail.com> Reviewed-by: NHeiko Stuebner <heiko@sntech.de> Tested-by: NDan Johansen <strit@manjaro.org> Reviewed-by: NBenjamin Gaignard <benjamin.gaignard@collabora.com> Link: https://lore.kernel.org/r/20211124021325.858139-1-knaerzche@gmail.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>
-
由 Joerg Roedel 提交于
The messages printed on the initialization of the AMD IOMMUv2 driver have caused some confusion in the past. Clarify the messages to lower the confusion in the future. Cc: stable@vger.kernel.org Signed-off-by: NJoerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20211123105507.7654-3-joro@8bytes.org
-
由 Oleksij Rempel 提交于
Current driver version is able to handle only one bridge at time. Configuring two bridges on two different ports would end up shorting this bridges by HW. To reproduce it: ip l a name br0 type bridge ip l a name br1 type bridge ip l s dev br0 up ip l s dev br1 up ip l s lan1 master br0 ip l s dev lan1 up ip l s lan2 master br1 ip l s dev lan2 up Ping on lan1 and get response on lan2, which should not happen. This happened, because current driver version is storing one global "Port VLAN Membership" and applying it to all ports which are members of any bridge. To solve this issue, we need to handle each port separately. This patch is dropping the global port member storage and calculating membership dynamically depending on STP state and bridge participation. Note: STP support was broken before this patch and should be fixed separately. Fixes: c2e86691 ("net: dsa: microchip: break KSZ9477 DSA driver into two files") Signed-off-by: NOleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: NVladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20211126123926.2981028-1-o.rempel@pengutronix.deSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
The driver doesn't support RX timestamping for non-PTP packets, but it declares that it does. Restrict the reported RX filters to PTP v2 over L2 and over L4. Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
IEEE 1588 support was declared too soon for the Ocelot switch. Out of reset, this switch does not apply any special treatment for PTP packets, i.e. when an event message is received, the natural tendency is to forward it by MAC DA/VLAN ID. This poses a problem when the ingress port is under a bridge, since user space application stacks (written primarily for endpoint ports, not switches) like ptp4l expect that PTP messages are always received on AF_PACKET / AF_INET sockets (depending on the PTP transport being used), and never being autonomously forwarded. Any forwarding, if necessary (for example in Transparent Clock mode) is handled in software by ptp4l. Having the hardware forward these packets too will cause duplicates which will confuse endpoints connected to these switches. So PTP over L2 barely works, in the sense that PTP packets reach the CPU port, but they reach it via flooding, and therefore reach lots of other unwanted destinations too. But PTP over IPv4/IPv6 does not work at all. This is because the Ocelot switch have a separate destination port mask for unknown IP multicast (which PTP over IP is) flooding compared to unknown non-IP multicast (which PTP over L2 is) flooding. Specifically, the driver allows the CPU port to be in the PGID_MC port group, but not in PGID_MCIPV4 and PGID_MCIPV6. There are several presentations from Allan Nielsen which explain that the embedded MIPS CPU on Ocelot switches is not very powerful at all, so every penny they could save by not allowing flooding to the CPU port module matters. Unknown IP multicast did not make it. The de facto consensus is that when a switch is PTP-aware and an application stack for PTP is running, switches should have some sort of trapping mechanism for PTP packets, to extract them from the hardware data path. This avoids both problems: (a) PTP packets are no longer flooded to unwanted destinations (b) PTP over IP packets are no longer denied from reaching the CPU since they arrive there via a trap and not via flooding It is not the first time when this change is attempted. Last time, the feedback from Allan Nielsen and Andrew Lunn was that the traps should not be installed by default, and that PTP-unaware switching may be desired for some use cases: https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/ To address that feedback, the present patch adds the necessary packet traps according to the RX filter configuration transmitted by user space through the SIOCSHWTSTAMP ioctl. Trapping is done via VCAP IS2, where we keep 5 filters, which are amended each time RX timestamping is enabled or disabled on a port: - 1 for PTP over L2 - 2 for PTP over IPv4 (UDP ports 319 and 320) - 2 for PTP over IPv6 (UDP ports 319 and 320) The cookie by which these filters (invisible to tc) are identified is strategically chosen such that it does not collide with the filters used for the ocelot-8021q tagging protocol by the Felix driver, or with the MRP traps set up by the Ocelot library. Other alternatives were considered, like patching user space to do something, but there are so many ways in which PTP packets could be made to reach the CPU, generically speaking, that "do what?" is a very valid question. The ptp4l program from the linuxptp stack already attempts to do something: it calls setsockopt(IP_ADD_MEMBERSHIP) (and PACKET_ADD_MEMBERSHIP, respectively) which translates in both cases into a dev_mc_add() on the interface, in the kernel: https://github.com/richardcochran/linuxptp/blob/v3.1.1/udp.c#L73 https://github.com/richardcochran/linuxptp/blob/v3.1.1/raw.c Reality shows that this is not sufficient in case the interface belongs to a switchdev driver, as dev_mc_add() does not show the intention to trap a packet to the CPU, but rather the intention to not drop it (it is strictly for RX filtering, same as promiscuous does not mean to send all traffic to the CPU, but to not drop traffic with unknown MAC DA). This topic is a can of worms in itself, and it would be great if user space could just stay out of it. On the other hand, setting up PTP traps privately within the driver is not new by any stretch of the imagination: https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c#L833 https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/dsa/hirschmann/hellcreek.c#L1050 https://elixir.bootlin.com/linux/v5.16-rc2/source/include/linux/dsa/sja1105.h#L21 So this is the approach taken here as well. The difference here being that we prepare and destroy the traps per port, dynamically at runtime, as opposed to driver init time, because apparently, PTP-unaware forwarding is a use case. Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support") Reported-by: NPo Liu <po.liu@nxp.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Acked-by: NRichard Cochran <richardcochran@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-