提交 · 3a605e32a7f8f78d844b4272c257029c337a4352 · openeuler / Kernel

23 12月, 2021 3 次提交

nvme: drop unused variable ctrl in nvme_setup_cmd · 3a605e32

由 Geliang Tang 提交于 12月 22, 2021

The variable 'ctrl' became useless since the code using it was dropped
from nvme_setup_cmd() in the commit 292ddf67bbd5 ("nvme: increment
request genctr on completion"). Fix it to get rid of this compilation
warning in the nvme-5.17 branch:

 drivers/nvme/host/core.c: In function ‘nvme_setup_cmd’:
 drivers/nvme/host/core.c:993:20: warning: unused variable ‘ctrl’ [-Wunused-variable]
   struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
                     ^~~~

Fixes: 292ddf67bbd5 ("nvme: increment request genctr on completion")
Signed-off-by: NGeliang Tang <geliang.tang@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

3a605e32

nvme: increment request genctr on completion · e4fdb2b1

由 Keith Busch 提交于 12月 13, 2021

The nvme request generation counter is intended to catch duplicate
completions. Incrementing the counter on submission means duplicates can
only be caught if the request tag is reallocated and dispatched prior to
the driver observing the corrupted CQE. Incrementing on completion
removes this window, making it possible to detect duplicate completions
in consecutive entries.
Signed-off-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

e4fdb2b1

nvme-fabrics: print out valid arguments when reading from /dev/nvme-fabrics · f18ee3d9

由 Hannes Reinecke 提交于 12月 07, 2021

Currently applications have a hard time figuring out which
nvme-over-fabrics arguments are supported for any given kernel;
the ioctl will return an error code on failure, and the application
has to guess whether this was due to an invalid argument or due
to a connection or controller error.
With this patch applications can read a list of supported
arguments by simply reading from /dev/nvme-fabrics, allowing
them to validate the connection string.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

f18ee3d9

17 12月, 2021 1 次提交

block: remove the rsxx driver · 3427f2b2

由 Christoph Hellwig 提交于 12月 16, 2021

This driver was for rare and shortlived high end enterprise hardware
and hasn't been maintained since 2014, which also means it never got
converted to use blk-mq.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3427f2b2

14 12月, 2021 6 次提交

rsxx: Drop PCI legacy power management · ac6f6548

由 Bjorn Helgaas 提交于 12月 08, 2021

The rsxx driver doesn't support device suspend, so remove
rsxx_pci_suspend(), the legacy PCI .suspend() method, completely.
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211208192449.146076-5-helgaas@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

ac6f6548

mtip32xx: convert to generic power management · cd97b7e0

由 Vaibhav Gupta 提交于 12月 08, 2021

Convert mtip32xx from legacy PCI power management to the generic power
management framework.

Previously, mtip32xx used legacy PCI power management, where
mtip_pci_suspend() and mtip_pci_resume() were responsible for both
device-specific things and generic PCI things:

  mtip_pci_suspend
    mtip_block_suspend(dd)              <-- device-specific
    pci_save_state(pdev)                <-- generic PCI
    pci_set_power_state(pdev, pci_choose_state(pdev, state))

  mtip_pci_resume
    pci_set_power_state(PCI_D0)         <-- generic PCI
    pci_restore_state(pdev)             <-- generic PCI
    pcim_enable_device(pdev)            <-- generic PCI
    pci_set_master(pdev)                <-- generic PCI
    mtip_block_resume(dd)               <-- device-specific

With generic power management, the PCI bus PM methods do the generic PCI
things, and the driver needs only the device-specific part, i.e.,

  suspend_devices_and_enter
    dpm_suspend_start(PMSG_SUSPEND)
      pci_pm_suspend                    # PCI bus .suspend() method
        mtip_pci_suspend                # dev->driver->pm->suspend
          mtip_block_suspend            <-- device-specific
    suspend_enter
      dpm_suspend_noirq(PMSG_SUSPEND)
        pci_pm_suspend_noirq            # PCI bus .suspend_noirq() method
          pci_save_state                <-- generic PCI
          pci_prepare_to_sleep          <-- generic PCI
            pci_set_power_state
    ...
    dpm_resume_end(PMSG_RESUME)
      pci_pm_resume                     # PCI bus .resume() method
        pci_restore_standard_config
          pci_set_power_state(PCI_D0)   <-- generic PCI
          pci_restore_state             <-- generic PCI
        mtip_pci_resume                 # dev->driver->pm->resume
          mtip_block_resume             <-- device-specific

[bhelgaas: commit log]

Link: https://lore.kernel.org/r/20210114115423.52414-2-vaibhavgupta40@gmail.comSigned-off-by: NVaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211208192449.146076-4-helgaas@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

cd97b7e0

mtip32xx: remove pointless drvdata lookups · 9e541f14

由 Bjorn Helgaas 提交于 12月 08, 2021

Previously we passed a struct pci_dev * to mtip_check_surprise_removal(),
which immediately looked up the driver_data. But all callers already have
the driver_data pointer, so just pass it directly and skip the extra
lookup. No functional change intended.
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211208192449.146076-3-helgaas@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

9e541f14

mtip32xx: remove pointless drvdata checking · 2920417c

由 Bjorn Helgaas 提交于 12月 08, 2021

The .suspend() and .resume() methods are only called after the .probe()
method (mtip_pci_probe()) has set the drvdata and returned success.

Therefore, if we get to mtip_pci_suspend() or mtip_pci_resume(), the
drvdata must be valid. Drop the unnecessary checking.
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211208192449.146076-2-helgaas@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

2920417c

drbd: Use struct_group() to zero algs · 52a0cab3

由 Kees Cook 提交于 11月 18, 2021

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add a struct_group() for the algs so that memset() can correctly reason
about the size.
Signed-off-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NGustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20211118203712.1288866-1-keescook@chromium.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

52a0cab3

loop: make autoclear operation asynchronous · 322c4293

由 Tetsuo Handa 提交于 12月 13, 2021

syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with disk->open_mutex held.

This circular dependency cannot be broken unless we call __loop_clr_fd()
without holding disk->open_mutex. Therefore, defer __loop_clr_fd() from
lo_release() to a WQ context.

Link: https://syzkaller.appspot.com/bug?extid=643e4ce4b6ad1347d372 [1]
Reported-by: Nsyzbot <syzbot+643e4ce4b6ad1347d372@syzkaller.appspotmail.com>
Suggested-by: NChristoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Tested-by: syzbot+643e4ce4b6ad1347d372@syzkaller.appspotmail.com
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1ed7df28-ebd6-71fb-70e5-1c2972e05ddb@i-love.sakura.ne.jpSigned-off-by: NJens Axboe <axboe@kernel.dk>

322c4293

11 12月, 2021 1 次提交

null_blk: cast command status to integer · c5eafd79

由 Jens Axboe 提交于 12月 10, 2021

kernel test robot reports that sparse now triggers a warning on null_blk:

>> drivers/block/null_blk/main.c:1577:55: sparse: sparse: incorrect type in argument 3 (different base types) @@     expected int ioerror @@     got restricted blk_status_t [usertype] error @@
   drivers/block/null_blk/main.c:1577:55: sparse:     expected int ioerror
   drivers/block/null_blk/main.c:1577:55: sparse:     got restricted blk_status_t [usertype] error

because blk_mq_add_to_batch() takes an integer instead of a blk_status_t.
Just cast this to an integer to silence it, null_blk is the odd one out
here since the command status is the "right" type. If we change the
function type, then we'll have do that for other callers too (existing and
future ones).

Fixes: 2385ebf3 ("block: null_blk: batched complete poll requests")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c5eafd79

10 12月, 2021 1 次提交

pktdvd: stop using bdi congestion framework. · db67097a

由 NeilBrown 提交于 12月 09, 2021

The bdi congestion framework isn't widely used and should be
deprecated.

pktdvd makes use of it to track congestion, but this can be done
entirely internally to pktdvd, so it doesn't need to use the framework.

So introduce a "congested" flag.  When waiting for bio_queue_size to
drop, set this flag and a var_waitqueue() to wait for it.  When
bio_queue_size does drop and this flag is set, clear the flag and call
wake_up_var().

We don't use a wait_var_event macro for the waiting as we need to set
the flag and drop the spinlock before calling schedule() and while that
is possible with __wait_var_event(), result is not easy to read.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NNeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/163910843527.9928.857338663717630212@noble.neil.brown.nameSigned-off-by: NJens Axboe <axboe@kernel.dk>

db67097a

03 12月, 2021 4 次提交

block: null_blk: batched complete poll requests · 2385ebf3

由 Ming Lei 提交于 12月 03, 2021

Complete poll requests via blk_mq_add_to_batch() and
blk_mq_end_request_batch(), so that we can cover batched complete
code path by running null_blk test.

Meantime this way shows ~14% IOPS boost on 't/io_uring /dev/nullb0'
in my test.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211203081703.3506020-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

2385ebf3

floppy: Add max size check for user space request · 545a3249

由 Xiongwei Song 提交于 11月 16, 2021

We need to check the max request size that is from user space before
allocating pages. If the request size exceeds the limit, return -EINVAL.
This check can avoid the warning below from page allocator.

WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 current_gfp_context include/linux/sched/mm.h:195 [inline]
WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 __alloc_pages+0x45d/0x500 mm/page_alloc.c:5356
Modules linked in:
CPU: 3 PID: 16525 Comm: syz-executor.3 Not tainted 5.15.0-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
RIP: 0010:__alloc_pages+0x45d/0x500 mm/page_alloc.c:5344
Code: be c9 00 00 00 48 c7 c7 20 4a 97 89 c6 05 62 32 a7 0b 01 e8 74 9a 42 07 e9 6a ff ff ff 0f 0b e9 a0 fd ff ff 40 80 e5 3f eb 88 <0f> 0b e9 18 ff ff ff 4c 89 ef 44 89 e6 45 31 ed e8 1e 76 ff ff e9
RSP: 0018:ffffc90023b87850 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 1ffff92004770f0b RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000033 RDI: 0000000000010cc1
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff81bb4686 R11: 0000000000000001 R12: ffffffff902c1960
R13: 0000000000000033 R14: 0000000000000000 R15: ffff88804cf64a30
FS:  0000000000000000(0000) GS:ffff88802cd00000(0063) knlGS:00000000f44b4b40
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 000000002c921000 CR3: 000000004f507000 CR4: 0000000000150ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
 __get_free_pages+0x8/0x40 mm/page_alloc.c:5418
 raw_cmd_copyin drivers/block/floppy.c:3113 [inline]
 raw_cmd_ioctl drivers/block/floppy.c:3160 [inline]
 fd_locked_ioctl+0x12e5/0x2820 drivers/block/floppy.c:3528
 fd_ioctl drivers/block/floppy.c:3555 [inline]
 fd_compat_ioctl+0x891/0x1b60 drivers/block/floppy.c:3869
 compat_blkdev_ioctl+0x3b8/0x810 block/ioctl.c:662
 __do_compat_sys_ioctl+0x1c7/0x290 fs/ioctl.c:972
 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
 __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
 do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203
 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

Reported-by: syzbot+23a02c7df2cf2bc93fa2@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20211116131033.27685-1-sxwjean@me.comSigned-off-by: NXiongwei Song <sxwjean@gmail.com>
Signed-off-by: NDenis Efremov <efremov@linux.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

545a3249

floppy: Fix hang in watchdog when disk is ejected · fb48febc

由 Tasos Sahanidis 提交于 9月 03, 2021

When the watchdog detects a disk change, it calls cancel_activity(),
which in turn tries to cancel the fd_timer delayed work.

In the above scenario, fd_timer_fn is set to fd_watchdog(), meaning
it is trying to cancel its own work.
This results in a hang as cancel_delayed_work_sync() is waiting for the
watchdog (itself) to return, which never happens.

This can be reproduced relatively consistently by attempting to read a
broken floppy, and ejecting it while IO is being attempted and retried.

To resolve this, this patch calls cancel_delayed_work() instead, which
cancels the work without waiting for the watchdog to return and finish.

Before this regression was introduced, the code in this section used
del_timer(), and not del_timer_sync() to delete the watchdog timer.

Link: https://lore.kernel.org/r/399e486c-6540-db27-76aa-7a271b061f76@tasossah.com
Fixes: 070ad7e7 ("floppy: convert to delayed work and single-thread wq")
Signed-off-by: NTasos Sahanidis <tasos@tasossah.com>
Signed-off-by: NDenis Efremov <efremov@linux.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fb48febc

null_blk: allow zero poll queues · 2bfdbe8b

由 Ming Lei 提交于 12月 03, 2021

There isn't any reason to not allow zero poll queues from user
viewpoint.

Also sometimes we need to compare io poll between poll mode and irq
mode, so not allowing poll queues is bad.

Fixes: 15dfc662 ("null_blk: Fix handling of submit_queues and poll_queues attributes")
Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211203023935.3424042-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

2bfdbe8b

29 11月, 2021 16 次提交

loop: don't hold lo_mutex during __loop_clr_fd() · 6050fa4c

由 Tetsuo Handa 提交于 11月 24, 2021

syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with lo->lo_mutex held.

Since all functions where lo->lo_state matters are already checking
lo->lo_state with lo->lo_mutex held (in order to avoid racing with e.g.
ioctl(LOOP_CTL_REMOVE)), and __loop_clr_fd() can be called from either
ioctl(LOOP_CLR_FD) xor close(), lo->lo_state == Lo_rundown is considered
as an exclusive lock for __loop_clr_fd(). Therefore, hold lo->lo_mutex
inside __loop_clr_fd() only when asserting/updating lo->lo_state.

Since ioctl(LOOP_CLR_FD) depends on lo->lo_state == Lo_bound, a valid
lo->lo_backing_file must have been assigned by ioctl(LOOP_SET_FD) or
ioctl(LOOP_CONFIGURE). Thus, we can remove lo->lo_backing_file test,
and convert __loop_clr_fd() into a void function.

Link: https://syzkaller.appspot.com/bug?extid=63614029dfb79abd4383 [1]
Reported-by: Nsyzbot <syzbot+63614029dfb79abd4383@syzkaller.appspotmail.com>
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/8ebe3b2e-8975-7f26-0620-7144a3b8b8cd@i-love.sakura.ne.jpSigned-off-by: NJens Axboe <axboe@kernel.dk>

6050fa4c

scsi: remove the gendisk argument to scsi_ioctl · a30e3441

由 Christoph Hellwig 提交于 11月 26, 2021

Now that blk_execute_rq does not take a gendisk argument there is no need
to pass it through the scsi_ioctl callchain either.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-6-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

a30e3441

block: remove the gendisk argument to blk_execute_rq · b84ba30b

由 Christoph Hellwig 提交于 11月 26, 2021

Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait
given that it is unused now. Also convert the boolean at_head parameter
to actually use the bool type while touching the prototype.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

b84ba30b

block: remove the ->rq_disk field in struct request · f3fa33ac

由 Christoph Hellwig 提交于 11月 26, 2021

Just use the disk attached to the request_queue instead.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

f3fa33ac

mtd_blkdevs: remove the sector out of range check in do_blktrans_request · 82baa324

由 Christoph Hellwig 提交于 11月 26, 2021

The block layer already performs this check, no need to duplicate it in
the driver.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMiquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

82baa324

RDMA/qib: rename copy_io to qib_copy_io · e92a559e

由 Christoph Hellwig 提交于 11月 26, 2021

Add the proper module prefix to avoid conflicts with a function
in the scheduler.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

e92a559e

mmc: core: Use blk_mq_complete_request_direct(). · 639d3531

由 Sebastian Andrzej Siewior 提交于 10月 25, 2021

The completion callback for the sdhci-pci device is invoked from a
kworker.
I couldn't identify in which context is mmc_blk_mq_req_done() invoke but
the remaining caller are from invoked from preemptible context. Here it
would make sense to complete the request directly instead scheduling
ksoftirqd for its completion.
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org>
Acked-by: NAdrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20211025070658.1565848-3-bigeasy@linutronix.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

639d3531

sr: set GENHD_FL_REMOVABLE earlier · a4561f9f

由 Christoph Hellwig 提交于 11月 22, 2021

Set up GENHD_FL_REMOVABLE together with the rest of the gendisk fields.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-15-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

a4561f9f

block: remove GENHD_FL_EXT_DEVT · 1ebe2e5f

由 Christoph Hellwig 提交于 11月 22, 2021

All modern drivers can support extra partitions using the extended
dev_t.  In fact except for the ioctl method drivers never even see
partitions in normal operation.

So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all
block devices that do support partitions, and require those that
do not support partitions to explicit disallow them using
GENHD_FL_NO_PART.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

1ebe2e5f

mmc: don't set GENHD_FL_SUPPRESS_PARTITION_INFO · 79b0f79a

由 Christoph Hellwig 提交于 11月 22, 2021

This manually reverts 07b652cdbec3 ("mmc: card: Don't show eMMC RPMB and
BOOT areas in /proc/partitions"). Based on the commit description that
change was purely cosmetic. mmc is the last driver that sets this
flag and thus prevents it from being removed.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20211122130625.1136848-10-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

79b0f79a

null_blk: don't suppress partitioning information · 94b49c3d

由 Christoph Hellwig 提交于 11月 22, 2021

This manually reverts commit 27290b469051 ("null_blk: suppress invalid
partition info"). The message in that commit log can't appearch as
the flag is never checked during probing, and there is no good reason
to treat null_blk special in /proc/partitions.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-9-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

94b49c3d

block: rename GENHD_FL_NO_PART_SCAN to GENHD_FL_NO_PART · 46e7eac6

由 Christoph Hellwig 提交于 11月 22, 2021

The GENHD_FL_NO_PART_SCAN controls more than just partitions canning,
so rename it to GENHD_FL_NO_PART.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20211122130625.1136848-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

46e7eac6

block: remove GENHD_FL_CD · 1a827ce1

由 Christoph Hellwig 提交于 11月 22, 2021

GENHD_FL_CD marks a gendisk as a vaguely CD-ROM like device.
Besides being used internally inside of sunvdc.c an xen-blkfront it
is used by xen-blkback as a hint to claim a device exported to a
guest is a CD-ROM like device. Just check for disk->cdi instead
which is the right indicator for "real" CD-ROM or DVD drivers. This
will miss the paravirtualized guest drivers, but those make little
sense to report anyway.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

1a827ce1

block: move GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE to disk->event_flags · 1545e0b4

由 Christoph Hellwig 提交于 11月 22, 2021

GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE is all about the event reporting
mechanism, so move it to the event_flags field.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

1545e0b4

block: remove rq_flush_dcache_pages · 786d4e01

由 Christoph Hellwig 提交于 11月 17, 2021

This function is trivial, and flush_dcache_page is always defined, so
just open code it in the 2.5 callers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

786d4e01

block: move blk_rq_err_bytes to scsi · 79478bf9

由 Christoph Hellwig 提交于 11月 17, 2021

blk_rq_err_bytes is only used by the scsi midlayer, so move it there.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

79478bf9

28 11月, 2021 1 次提交

vmxnet3: Use generic Kconfig option for page size limit · 00169a92

由 Guenter Roeck 提交于 11月 27, 2021

Use the architecture independent Kconfig option PAGE_SIZE_LESS_THAN_64KB
to indicate that VMXNET3 requires a page size smaller than 64kB.
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

00169a92

27 11月, 2021 7 次提交

iommu/vt-d: Fix unmap_pages support · 86dc40c7

由 Alex Williamson 提交于 11月 26, 2021

When supporting only the .map and .unmap callbacks of iommu_ops,
the IOMMU driver can make assumptions about the size and alignment
used for mappings based on the driver provided pgsize_bitmap. VT-d
previously used essentially PAGE_MASK for this bitmap as any power
of two mapping was acceptably filled by native page sizes.

However, with the .map_pages and .unmap_pages interface we're now
getting page-size and count arguments. If we simply combine these
as (page-size * count) and make use of the previous map/unmap
functions internally, any size and alignment assumptions are very
different.

As an example, a given vfio device assignment VM will often create
a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
does not support IOMMU super pages, the unmap_pages interface will
ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
will recurse down to level 2 of the page table where the first half
of the pfn range exactly matches the entire pte level. We clear the
pte, increment the pfn by the level size, but (oops) the next pte is
on a new page, so we exit the loop an pop back up a level. When we
then update the pfn based on that higher level, we seem to assume
that the previous pfn value was at the start of the level. In this
case the level size is 256K pfns, which we add to the base pfn and
get a results of 0x7fe00, which is clearly greater than 0x401ff,
so we're done. Meanwhile we never cleared the ptes for the remainder
of the range. When the VM remaps this range, we're overwriting valid
ptes and the VT-d driver complains loudly, as reported by the user
report linked below.

The fix for this seems relatively simple, if each iteration of the
loop in dma_pte_clear_level() is assumed to clear to the end of the
level pte page, then our next pfn should be calculated from level_pfn
rather than our working pfn.

Fixes: 3f34f125 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
Reported-by: NAjay Garg <ajaygargnsit@gmail.com>
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Tested-by: NGiovanni Cabiddu <giovanni.cabiddu@intel.com>
Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
Link: https://lore.kernel.org/r/163659074748.1617923.12716161410774184024.stgit@omenSigned-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211126135556.397932-3-baolu.lu@linux.intel.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>

86dc40c7

iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock() · 4e5973dd

由 Christophe JAILLET 提交于 11月 26, 2021

If we return -EOPNOTSUPP, the rcu lock remains lock. This is spurious.
Go through the end of the function instead. This way, the missing
'rcu_read_unlock()' is called.

Fixes: 7afd7f6a ("iommu/vt-d: Check FL and SL capability sanity in scalable mode")
Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/40cc077ca5f543614eab2a10e84d29dd190273f6.1636217517.git.christophe.jaillet@wanadoo.frSigned-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211126135556.397932-2-baolu.lu@linux.intel.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>

4e5973dd

iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568 · f7ff3cff

由 Alex Bee 提交于 11月 24, 2021

With the submission of iommu driver for RK3568 a subtle bug was
introduced: PAGE_DESC_HI_MASK1 and PAGE_DESC_HI_MASK2 have to be
the other way arround - that leads to random errors, especially when
addresses beyond 32 bit are used.

Fix it.

Fixes: c55356c5 ("iommu: rockchip: Add support for iommu v2")
Signed-off-by: NAlex Bee <knaerzche@gmail.com>
Tested-by: NPeter Geis <pgwipeout@gmail.com>
Reviewed-by: NHeiko Stuebner <heiko@sntech.de>
Tested-by: NDan Johansen <strit@manjaro.org>
Reviewed-by: NBenjamin Gaignard <benjamin.gaignard@collabora.com>
Link: https://lore.kernel.org/r/20211124021325.858139-1-knaerzche@gmail.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>

f7ff3cff

iommu/amd: Clarify AMD IOMMUv2 initialization messages · 717e88aa

由 Joerg Roedel 提交于 11月 23, 2021

The messages printed on the initialization of the AMD IOMMUv2 driver
have caused some confusion in the past. Clarify the messages to lower
the confusion in the future.

Cc: stable@vger.kernel.org
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20211123105507.7654-3-joro@8bytes.org

717e88aa

net: dsa: microchip: implement multi-bridge support · b3612ccd

由 Oleksij Rempel 提交于 11月 26, 2021

Current driver version is able to handle only one bridge at time.
Configuring two bridges on two different ports would end up shorting this
bridges by HW. To reproduce it:

	ip l a name br0 type bridge
	ip l a name br1 type bridge
	ip l s dev br0 up
	ip l s dev br1 up
	ip l s lan1 master br0
	ip l s dev lan1 up
	ip l s lan2 master br1
	ip l s dev lan2 up

	Ping on lan1 and get response on lan2, which should not happen.

This happened, because current driver version is storing one global "Port VLAN
Membership" and applying it to all ports which are members of any
bridge.
To solve this issue, we need to handle each port separately.

This patch is dropping the global port member storage and calculating
membership dynamically depending on STP state and bridge participation.

Note: STP support was broken before this patch and should be fixed
separately.

Fixes: c2e86691 ("net: dsa: microchip: break KSZ9477 DSA driver into two files")
Signed-off-by: NOleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20211126123926.2981028-1-o.rempel@pengutronix.deSigned-off-by: NJakub Kicinski <kuba@kernel.org>

b3612ccd

net: mscc: ocelot: correctly report the timestamping RX filters in ethtool · c49a35ee

由 Vladimir Oltean 提交于 11月 26, 2021

The driver doesn't support RX timestamping for non-PTP packets, but it
declares that it does. Restrict the reported RX filters to PTP v2 over
L2 and over L4.

Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

c49a35ee

net: mscc: ocelot: set up traps for PTP packets · 96ca08c0

由 Vladimir Oltean 提交于 11月 26, 2021

IEEE 1588 support was declared too soon for the Ocelot switch. Out of
reset, this switch does not apply any special treatment for PTP packets,
i.e. when an event message is received, the natural tendency is to
forward it by MAC DA/VLAN ID. This poses a problem when the ingress port
is under a bridge, since user space application stacks (written
primarily for endpoint ports, not switches) like ptp4l expect that PTP
messages are always received on AF_PACKET / AF_INET sockets (depending
on the PTP transport being used), and never being autonomously
forwarded. Any forwarding, if necessary (for example in Transparent
Clock mode) is handled in software by ptp4l. Having the hardware forward
these packets too will cause duplicates which will confuse endpoints
connected to these switches.

So PTP over L2 barely works, in the sense that PTP packets reach the CPU
port, but they reach it via flooding, and therefore reach lots of other
unwanted destinations too. But PTP over IPv4/IPv6 does not work at all.
This is because the Ocelot switch have a separate destination port mask
for unknown IP multicast (which PTP over IP is) flooding compared to
unknown non-IP multicast (which PTP over L2 is) flooding. Specifically,
the driver allows the CPU port to be in the PGID_MC port group, but not
in PGID_MCIPV4 and PGID_MCIPV6. There are several presentations from
Allan Nielsen which explain that the embedded MIPS CPU on Ocelot
switches is not very powerful at all, so every penny they could save by
not allowing flooding to the CPU port module matters. Unknown IP
multicast did not make it.

The de facto consensus is that when a switch is PTP-aware and an
application stack for PTP is running, switches should have some sort of
trapping mechanism for PTP packets, to extract them from the hardware
data path. This avoids both problems:
(a) PTP packets are no longer flooded to unwanted destinations
(b) PTP over IP packets are no longer denied from reaching the CPU since
    they arrive there via a trap and not via flooding

It is not the first time when this change is attempted. Last time, the
feedback from Allan Nielsen and Andrew Lunn was that the traps should
not be installed by default, and that PTP-unaware switching may be
desired for some use cases:
https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/

To address that feedback, the present patch adds the necessary packet
traps according to the RX filter configuration transmitted by user space
through the SIOCSHWTSTAMP ioctl. Trapping is done via VCAP IS2, where we
keep 5 filters, which are amended each time RX timestamping is enabled
or disabled on a port:
- 1 for PTP over L2
- 2 for PTP over IPv4 (UDP ports 319 and 320)
- 2 for PTP over IPv6 (UDP ports 319 and 320)

The cookie by which these filters (invisible to tc) are identified is
strategically chosen such that it does not collide with the filters used
for the ocelot-8021q tagging protocol by the Felix driver, or with the
MRP traps set up by the Ocelot library.

Other alternatives were considered, like patching user space to do
something, but there are so many ways in which PTP packets could be made
to reach the CPU, generically speaking, that "do what?" is a very valid
question. The ptp4l program from the linuxptp stack already attempts to
do something: it calls setsockopt(IP_ADD_MEMBERSHIP) (and
PACKET_ADD_MEMBERSHIP, respectively) which translates in both cases into
a dev_mc_add() on the interface, in the kernel:
https://github.com/richardcochran/linuxptp/blob/v3.1.1/udp.c#L73
https://github.com/richardcochran/linuxptp/blob/v3.1.1/raw.c

Reality shows that this is not sufficient in case the interface belongs
to a switchdev driver, as dev_mc_add() does not show the intention to
trap a packet to the CPU, but rather the intention to not drop it (it is
strictly for RX filtering, same as promiscuous does not mean to send all
traffic to the CPU, but to not drop traffic with unknown MAC DA). This
topic is a can of worms in itself, and it would be great if user space
could just stay out of it.

On the other hand, setting up PTP traps privately within the driver is
not new by any stretch of the imagination:
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c#L833
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/dsa/hirschmann/hellcreek.c#L1050
https://elixir.bootlin.com/linux/v5.16-rc2/source/include/linux/dsa/sja1105.h#L21

So this is the approach taken here as well. The difference here being
that we prepare and destroy the traps per port, dynamically at runtime,
as opposed to driver init time, because apparently, PTP-unaware
forwarding is a use case.

Fixes: 4e3b0468 ("net: mscc: PTP Hardware Clock (PHC) support")
Reported-by: NPo Liu <po.liu@nxp.com>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

96ca08c0

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功