Commits · 5eff363838654790f67f4bd564c5782967f67bcc · openeuler / Kernel

11 Dec, 2021 · 5 commits

Revert "mtd_blkdevs: don't scan partitions for plain mtdblock" · 5eff3638

Committed by Jens Axboe 3 years ago

This reverts commit 776b54e9.

Looks like a last minute edit snuck into this patch, and as a result,
it doesn't even compile. Revert the change for now.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: fix ioprio_get(IOPRIO_WHO_PGRP) vs setuid(2) · e6a59aac

Committed by Davidlohr Bueso 3 years ago

do_each_pid_thread(PIDTYPE_PGID) can race with a concurrent
change_pid(PIDTYPE_PGID) that can move the task from one hlist
to another while iterating. Serialize ioprio_get to take
the tasklist_lock in this case, just like its set counterpart.

Fixes: d69b78ba (ioprio: grab rcu_read_lock in sys_ioprio_{set,get}())
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Link: https://lore.kernel.org/r/20211210182058.43417-1-dave@stgolabs.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.16 · a5c24552

Committed by Jens Axboe 3 years ago

Pull MD fixes from Song.

* 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: fix double free of mddev->private in autorun_array()
  md: fix update super 1.0 on rdev size change

md: fix double free of mddev->private in autorun_array() · 07641b5f

Committed by zhangyue 3 years ago

In drivers/md/md.c, calling autorun_array() can lead to a double free.

In autorun_array(), when do_md_run() returns an error, do_md_stop()
is called.

do_md_run() calls md_run(), and md_run() may free the pointer
mddev->private.

do_md_stop() calls __md_stop(), and __md_stop() also frees
mddev->private without checking it for NULL.

As a result, mddev->private can be freed twice, so it needs a NULL
check before it is freed.

Signed-off-by: zhangyue <zhangyue1@kylinos.cn>
Signed-off-by: Song Liu <songliubraving@fb.com>

md: fix update super 1.0 on rdev size change · 55df1ce0

Committed by Markus Hochholdinger 3 years ago

The superblock of version 1.0 doesn't get moved to the new position on a
device size change. This leaves the rdev without a superblock at a known
position, so the RAID array can't be re-assembled.

The line was removed by mistake and is re-added by this patch.

Fixes: d9c0fa50 ("md: fix max sectors calculation for super 1.0")
Cc: stable@vger.kernel.org
Signed-off-by: Markus Hochholdinger <markus@hochholdinger.net>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>

10 Dec, 2021 · 1 commit

Merge tag 'nvme-5.16-2021-12-10' of git://git.infradead.org/nvme into block-5.16 · 091f06d9

Committed by Jens Axboe 3 years ago

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.16

 - set ana_log_size to 0 after freeing ana_log_buf (Hou Tao)
 - show subsys nqn for duplicate cntlids (Keith Busch)
 - disable namespace access for unsupported metadata (Keith Busch)
 - report write pointer for a full zone as zone start + zone len
   (Niklas Cassel)
 - fix use after free when disconnecting a reconnecting ctrl
   (Ruozhu Li)
 - fix a list corruption in nvmet-tcp (Sagi Grimberg)"

* tag 'nvme-5.16-2021-12-10' of git://git.infradead.org/nvme:
  nvmet-tcp: fix possible list corruption for unexpected command failure
  nvme: fix use after free when disconnecting a reconnecting ctrl
  nvme-multipath: set ana_log_size to 0 after free ana_log_buf
  nvme: report write pointer for a full zone as zone start + zone len
  nvme: disable namespace access for unsupported metadata
  nvme: show subsys nqn for duplicate cntlids

08 Dec, 2021 · 4 commits

nvmet-tcp: fix possible list corruption for unexpected command failure · 30e32f30

Committed by Sagi Grimberg 3 years ago

nvmet_tcp_handle_req_failure needs to understand whether to prepare
for incoming data or the next pdu. However if we misidentify this, we
will wait for 0-length data, and queue the response although nvmet_req_init
already did that.

The particular command was a namespace management command with no data,
which was incorrectly categorized as a command with in-capsule data.

Also, add a code comment of what we are trying to do here.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

block: fix single bio async DIO error handling · 75feae73

Committed by Pavel Begunkov 3 years ago

BUG: KASAN: use-after-free in io_submit_one+0x496/0x2fe0 fs/aio.c:1882
CPU: 2 PID: 15100 Comm: syz-executor873 Not tainted 5.16.0-rc1-syzk #1
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
04/01/2014
Call Trace:
  [...]
  refcount_dec_and_test include/linux/refcount.h:333 [inline]
  iocb_put fs/aio.c:1161 [inline]
  io_submit_one+0x496/0x2fe0 fs/aio.c:1882
  __do_sys_io_submit fs/aio.c:1938 [inline]
  __se_sys_io_submit fs/aio.c:1908 [inline]
  __x64_sys_io_submit+0x1c7/0x4a0 fs/aio.c:1908
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

__blkdev_direct_IO_async() returns errors from bio_iov_iter_get_pages()
directly, in which case upper layers won't be expecting ->ki_complete
to be called by the block layer and will terminate the request. However,
there is also bio_endio() leading to a second ->ki_complete and a double
free.

Fixes: 54a88eb8 ("block: add single bio async direct IO helper")
Reported-by: George Kennedy <george.kennedy@oracle.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c9eb786f6cef041e159e6287de131bec0719ad5c.1638907997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: fix use after free when disconnecting a reconnecting ctrl · 8b77fa6f

Committed by Ruozhu Li 3 years ago

A crash happens when trying to disconnect a reconnecting ctrl:

 1) The network was cut off when the connection was just established,
    and scan work hung there waiting for some I/Os to complete.  Those I/Os
    were retried because we return BLK_STS_RESOURCE to the block layer
    while reconnecting.
 2) After a while, I tried to disconnect this connection.  This
    procedure also hangs because it tried to obtain ctrl->scan_lock.
    It should be noted that now we have switched the controller state
    to NVME_CTRL_DELETING.
 3) In nvme_check_ready(), we always return true when ctrl->state is
    NVME_CTRL_DELETING, so those retrying I/Os were issued to the bottom
    device which was already freed.

To fix this, when ctrl->state is NVME_CTRL_DELETING, issue cmd to bottom
device only when queue state is live.  If not, return host path error to
the block layer.

Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-multipath: set ana_log_size to 0 after free ana_log_buf · c7c15ae3

Committed by Hou Tao 3 years ago

Set ana_log_size to 0 when ana_log_buf is freed to make sure
nvme_mpath_init_identify will do the right thing when retrying
after an earlier failure.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
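
A rough Python model of why a stale size matters on retry. It assumes (the commit message does not spell this out) that nvme_mpath_init_identify only reallocates ana_log_buf when the required size exceeds ana_log_size. The `Ctrl` class, the `mpath_uninit` helper and the `reset_size` switch are hypothetical names used for illustration; this is not the driver code.

```python
class Ctrl:
    """Hypothetical stand-in for the controller fields named in the commit."""

    def __init__(self):
        self.ana_log_buf = None
        self.ana_log_size = 0


def mpath_uninit(ctrl, reset_size):
    # Models freeing the buffer and clearing the pointer.
    ctrl.ana_log_buf = None
    if reset_size:
        # The behavior the commit describes: forget the old size too.
        ctrl.ana_log_size = 0


def mpath_init_identify(ctrl, needed, read_ok, reset_size):
    # Assumption: the buffer is only (re)allocated when it must grow.
    if needed > ctrl.ana_log_size:
        mpath_uninit(ctrl, reset_size)
        ctrl.ana_log_buf = bytearray(needed)
    ctrl.ana_log_size = needed
    if not read_ok:
        # Error path: the buffer is released again.
        mpath_uninit(ctrl, reset_size)
        return False
    # In this model a missing buffer is reported as failure; in C it
    # would be a NULL pointer being used.
    return ctrl.ana_log_buf is not None
```

With `reset_size=False`, a failed first attempt leaves `ana_log_size` at the old value, so the retry skips the allocation and finds no buffer. With `reset_size=True`, the retry allocates again.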

07 Dec, 2021 · 1 commit

mtd_blkdevs: don't scan partitions for plain mtdblock · 776b54e9

Committed by Christoph Hellwig 3 years ago

mtdblock / mtdblock_ro set part_bits to 0 and thus never scanned
partitions. Restore that behavior by setting the GENHD_FL_NO_PART flag.

Fixes: 1ebe2e5f ("block: remove GENHD_FL_EXT_DEVT")
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20211206070409.2836165-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

06 Dec, 2021 · 3 commits

nvme: report write pointer for a full zone as zone start + zone len · 793fcab8

Committed by Niklas Cassel 3 years ago

The write pointer in NVMe ZNS is invalid for a zone in zone state full.
The same also holds true for ZAC/ZBC.

The current behavior for NVMe is to simply propagate the wp reported by
the drive, even for full zones. Since the wp is invalid for a full zone,
the wp reported by the drive may be any value.

The way that the sd_zbc driver handles a full zone is to always report
the wp as zone start + zone len, regardless of what the drive reported.
null_blk also follows this convention.

Do the same for NVMe, so that a BLKREPORTZONE ioctl reports the write
pointer for a full zone in a consistent way, regardless of the interface
of the underlying zoned block device.

blkzone report before patch:
start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0xfffffffffffbfff8
reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]

blkzone report after patch:
start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0x040000 reset:0
non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
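
The convention described above can be sketched in a few lines of Python. The function names are hypothetical. The value 14 for the full condition is taken from the `zcond:14(fu)` field in the report. The helper that turns the write pointer into an offset assumes blkzone prints `wptr` relative to the zone start, which would be consistent with the after-patch output but is not stated in the commit message.

```python
ZONE_COND_FULL = 14  # "zcond:14(fu)" in the blkzone output


def reported_write_pointer(start, length, cond, drive_wp):
    """Write pointer to report for one zone (illustrative model)."""
    if cond == ZONE_COND_FULL:
        # The drive's value is invalid for a full zone, so ignore it.
        return start + length
    return drive_wp


def wptr_offset(start, wp):
    """Write pointer as an offset from zone start (assumed blkzone view)."""
    return wp - start
```

For the zone in the example (`start` 0x40000, `len` 0x40000), a full zone yields an absolute write pointer of 0x80000, that is an offset of 0x40000, whatever the drive reported.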

nvme: disable namespace access for unsupported metadata · d39ad2a4

Committed by Keith Busch 3 years ago

The only fabrics target that supports metadata handling through the
separate integrity buffer is RDMA. It is currently usable only if the
size is 8B per block and formatted for protection information. If an
rdma target were to export a namespace with a different format (ex:
4k+64B), the driver will not be able to submit valid read/write commands
for that namespace.

Suppress setting the metadata feature in the namespace so that the
gendisk capacity will be set to 0. This will prevent read/write access
through the block stack, but will continue to allow ioctl passthrough
commands.

Cc: Max Gurtovoy <mgurtovoy@nvidia.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: show subsys nqn for duplicate cntlids · 16cc33b2

Committed by Keith Busch 3 years ago

The driver assigned nvme handle isn't persistent across reboots, so it is
not enough information to match up where the collisions are occurring.
Add the subsys nqn string to the output so that it can more easily be
identified later.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215099
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>

29 Nov, 2021 · 1 commit

loop: Use pr_warn_once() for loop_control_remove() warning · e3f9387a

Committed by Tetsuo Handa 3 years ago

kernel test robot reported that RCU stall via printk() flooding is
possible [1] when stress testing.

Link: https://lkml.kernel.org/r/20211129073709.GA18483@xsang-OptiPlex-9020 [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27 Nov, 2021 · 2 commits

zram: only make zram_wb_devops for CONFIG_ZRAM_WRITEBACK · d422f401

Committed by Jens Axboe 3 years ago

If writeback isn't configured, then we get the following warning when
compiling zram:

drivers/block/zram/zram_drv.c:1824:45: warning: unused variable 'zram_wb_devops' [-Wunused-const-variable]

Make sure we only define the block_device_operations if that option is
enabled.

Link: https://lore.kernel.org/lkml/202111261614.gCJMqcyh-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: call rq_qos_done() before ref check in batch completions · 98b26a0e

Committed by Jens Axboe 3 years ago

We need to call rq_qos_done() regardless of whether we're freeing the
request, as the reference count doesn't cover the IO completion
tracking.

Fixes: f794f335 ("block: add support for blk_mq_end_request_batch()")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
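
A toy Python model of the ordering issue, under the assumption that the batch completion loop skips a request whose reference count has not dropped to zero. `Request`, `end_request_batch` and the `qos_first` switch are hypothetical; the counters only stand in for rq_qos_done() calls and frees.

```python
class Request:
    def __init__(self, refs):
        self.refs = refs


def end_request_batch(requests, qos_first):
    """Return (qos_done_calls, freed) for one batch (illustrative only)."""
    qos_done = 0
    freed = 0
    for rq in requests:
        if qos_first:
            qos_done += 1      # completion tracking for every request
        rq.refs -= 1
        if rq.refs != 0:
            continue           # someone else still holds a reference
        if not qos_first:
            qos_done += 1      # old order: skipped when not freeing
        freed += 1
    return qos_done, freed
```

With three completed requests of which one still has an extra reference, the `qos_first=True` order accounts for all three completions, while the other order misses one.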

26 Nov, 2021 · 1 commit

block: fix parameter not described warning · e30028ac

Committed by Yang Guang 3 years ago

The build warning:
block/blk-core.c:968: warning: Function parameter or member 'iob'
not described in 'bio_poll'.

Fixes: 5a72e899 ("block: add a struct io_comp_batch argument to fops->iopoll()")
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

25 Nov, 2021 · 2 commits

Merge tag 'nvme-5.16-2021-11-25' of git://git.infradead.org/nvme into block-5.16 · 3fd40fa2

Committed by Jens Axboe 3 years ago

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.16

 - add a NO APST quirk for a Kioxia device (Enzo Matsumiya)
 - fix write zeroes pi (Klaus Jensen)
 - various TCP transport fixes (Maurizio Lombardi and Varun Prakash)
 - ignore invalid fast_io_fail_tmo values (Maurizio Lombardi)
 - use IOCB_NOWAIT only if the filesystem supports it (Maurizio Lombardi)"

* tag 'nvme-5.16-2021-11-25' of git://git.infradead.org/nvme:
  nvmet: use IOCB_NOWAIT only if the filesystem supports it
  nvme: fix write zeroes pi
  nvme-fabrics: ignore invalid fast_io_fail_tmo values
  nvme-pci: add NO APST quirk for Kioxia device
  nvme-tcp: fix memory leak when freeing a queue
  nvme-tcp: validate R2T PDU in nvme_tcp_handle_r2t()
  nvmet-tcp: fix incomplete data digest send
  nvmet-tcp: fix memory leak when performing a controller reset
  nvmet-tcp: add an helper to free the cmd buffers
  nvmet-tcp: fix a race condition between release_queue and io_work

nvmet: use IOCB_NOWAIT only if the filesystem supports it · c024b226

Committed by Maurizio Lombardi 3 years ago

Submit I/O requests with the IOCB_NOWAIT flag set only if
the underlying filesystem supports it.

Fixes: 50a909db ("nvmet: use IOCB_NOWAIT for file-ns buffered I/O")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

24 Nov, 2021 · 9 commits

nvme: fix write zeroes pi · 00b33cf3

Committed by Klaus Jensen 3 years ago

Write Zeroes sets PRACT when block integrity is enabled (as it should),
but neglects to also set the reftag which is expected by reads. This
causes protection errors on reads.

Fix this by setting the reftag for type 1 and 2 (for type 3, reads will
not check the reftag).

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-fabrics: ignore invalid fast_io_fail_tmo values · 8e8aaf51

Committed by Maurizio Lombardi 3 years ago

Valid fast_io_fail_tmo values are integers >= 0 or -1 (disabled).
Prevent userspace from setting arbitrary negative values.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
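
One plausible reading of "ignore invalid values" is that any negative input is treated as -1 (disabled). The sketch below illustrates that reading only; the function name is hypothetical and the real option parser may handle the case differently.

```python
def normalize_fast_io_fail_tmo(token: int) -> int:
    """Map a user-supplied timeout to a valid fast_io_fail_tmo value.

    Valid values are integers >= 0, or -1 meaning "disabled".
    Assumption: other negative values fall back to -1.
    """
    if token >= 0:
        return token
    return -1
```

Under this assumption, 30 and 0 pass through unchanged, and both -1 and -5 end up as -1.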

nvme-pci: add NO APST quirk for Kioxia device · 5a6254d5

Committed by Enzo Matsumiya 3 years ago

This particular Kioxia device times out and aborts I/O during any load,
but it's more easily observable with discards (fstrim).

The device gets into a state where it is not even possible to use
"nvme set-feature" to disable APST.
Booting with nvme_core.default_ps_max_latency=0 solves the issue.

We had a dozen or so of these devices behaving this same way in
customer environments.

Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-tcp: fix memory leak when freeing a queue · a5053c92

Committed by Maurizio Lombardi 3 years ago

Release the page frag cache when tearing down the io queues.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-tcp: validate R2T PDU in nvme_tcp_handle_r2t() · 1d3ef9c3

Committed by Varun Prakash 3 years ago

If maxh2cdata < r2t_length then driver will form multiple
H2CData PDUs, validate R2T PDU in nvme_tcp_handle_r2t() to
reuse nvme_tcp_setup_h2c_data_pdu().

Also set req->state to NVME_TCP_SEND_H2C_PDU in
nvme_tcp_setup_h2c_data_pdu().

Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet-tcp: fix incomplete data digest send · 102110ef

Committed by Varun Prakash 3 years ago

Current nvmet_try_send_ddgst() code does not check whether
all data digest bytes are transmitted, fix this by returning
-EAGAIN if all data digest bytes are not transmitted.

Fixes: 872d26a3 ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
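
The partial-send handling can be sketched in Python as follows. This is a simplified model, not the target driver: `try_send_ddgst`, the `cmd` dictionary and the `send` callback are hypothetical, and the return value 1 for "fully sent" is an assumption made for the sketch.

```python
import errno

EAGAIN = errno.EAGAIN


def try_send_ddgst(cmd, send):
    """Send the remaining digest bytes; -EAGAIN if some are still pending.

    cmd  : dict with 'ddgst' (bytes) and 'offset' (bytes already sent)
    send : callable taking a bytes object, returning the count it accepted
    """
    left = len(cmd["ddgst"]) - cmd["offset"]
    sent = send(cmd["ddgst"][cmd["offset"]:])
    if sent <= 0:
        return sent
    cmd["offset"] += sent
    left -= sent
    if left:
        # Not everything went out: ask the caller to try again later.
        return -EAGAIN
    return 1
```

A caller that keeps retrying on `-EAGAIN` eventually transmits every digest byte, even if the socket accepts only one byte per call.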

nvmet-tcp: fix memory leak when performing a controller reset · af21250b

Committed by Maurizio Lombardi 3 years ago

If a reset controller is executed while the initiator
is performing some I/O the driver may leak the memory allocated
for the commands' iovec.

Make sure that nvmet_tcp_uninit_data_in_cmds() releases
all the memory.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet-tcp: add an helper to free the cmd buffers · 69b85e1f

Committed by Maurizio Lombardi 3 years ago

Makes the code easier to read and to debug.

Sets the freed pointers to NULL, it will be useful
when destroying the queues to understand if the commands'
buffers have been released already or not.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet-tcp: fix a race condition between release_queue and io_work · a208fc56

Committed by Maurizio Lombardi 3 years ago

If the initiator executes a reset controller operation while
performing I/O, the target kernel will crash because of a race condition
between release_queue and io_work;
nvmet_tcp_uninit_data_in_cmds() may be executed while io_work
is running, calling flush_work() was not sufficient to
prevent this because io_work could requeue itself.

Fix this bug by using cancel_work_sync() to prevent io_work
from requeuing itself and set rcv_state to NVMET_TCP_RECV_ERR to
make sure we don't receive any more data from the socket.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

23 Nov, 2021 · 1 commit

block: avoid to touch unloaded module instance when opening bdev · efcf5932

Committed by Ming Lei 3 years ago

disk->fops->owner is grabbed in blkdev_get_no_open() after the disk
kobject refcount is increased. This can't guarantee that
disk->fops->owner is still alive, since del_gendisk() can still proceed
if the kobject refcount of the disk has been grabbed by open() but
disk->fops->open() hasn't been called yet.

Fix the issue by moving try_module_get() into blkdev_get_by_dev()
with ->open_mutex held, so that we can drain the in-progress open()
in del_gendisk(). Meanwhile a new open() won't succeed because the disk
is no longer alive.

This is reasonable because blkdev_get_no_open() doesn't need to touch
disk->fops or its callbacks.

Cc: Christoph Hellwig <hch@lst.de>
Cc: czhong@redhat.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211111020343.316126-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

19 Nov, 2021 · 2 commits

blk-mq: don't insert FUA request with data into scheduler queue · 2b504bd4

Committed by Ming Lei 3 years ago

We have never inserted flush requests into the scheduler queue before.

Recently commit d92ca9d8 ("blk-mq: don't handle non-flush requests in
blk_insert_flush") tried to handle FUA data requests as normal requests.
This has caused a warning[1] in mq-deadline dd_exit_sched() or an I/O hang
in the case of kyber, since RQF_ELVPRIV isn't set for flush requests and
->finish_request is then never called.

Fix the issue by inserting FUA data request with blk_mq_request_bypass_insert()
when the device supports FUA, just like what we did before.

[1] https://lore.kernel.org/linux-block/CAHj4cs-_vkTW=dAzbZYGxpEWSpzpcmaNeY1R=vH311+9vMUSdg@mail.gmail.com/

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: d92ca9d8 ("blk-mq: don't handle non-flush requests in blk_insert_flush")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20211118153041.2163228-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-cgroup: fix missing put device in error path from blkg_conf_pref() · 15c30104

Committed by Yu Kuai 3 years ago

If blk_queue_enter() fails because the queue is dying,
blkdev_put_no_open() is needed because blkcg_conf_open_bdev() succeeded.

Fixes: 0c9d338c ("blk-cgroup: synchronize blkg creation against policy deactivation")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20211102020705.2321858-1-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

17 Nov, 2021 · 2 commits

block: avoid to quiesce queue in elevator_init_mq · 245a489e

Committed by Ming Lei 3 years ago

elevator_init_mq() is only called before adding disk, when there isn't
any FS I/O, only passthrough requests can be queued, so freezing queue
plus canceling dispatch work is enough to drain any dispatch activities,
then we can avoid synchronize_srcu() in blk_mq_quiesce_queue().

This fixes the long boot latency seen when lots of disks are added
during boot.

Fixes: 737eb78e ("block: Delay default elevator initialization")
Reported-by: yangerkun <yangerkun@huawei.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211117115502.1600950-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Revert "mark pstore-blk as broken" · d1faacbf

Committed by Kees Cook 3 years ago

This reverts commit d07f3b08.

pstore-blk was fixed to avoid the unwanted APIs in commit 7bb9557b
("pstore/blk: Use the normal block device I/O path"), which landed in
the same release as the commit adding BROKEN.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211116181559.3975566-1-keescook@chromium.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

16 Nov, 2021 · 3 commits

blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release() · 2a19b28f

Committed by Ming Lei 3 years ago

To avoid slowing down queue destruction, we don't call
blk_mq_quiesce_queue() in blk_cleanup_queue(); instead, canceling
dispatch work is delayed to blk_release_queue().

However, this way has caused kernel oops[1], reported by Changhui. The log
shows that scsi_device can be freed before running blk_release_queue(),
which is expected too since scsi_device is released after the scsi disk
is closed and the scsi_device is removed.

Fix the issue by canceling blk-mq dispatch work in both blk_cleanup_queue()
and disk_release():

1) when disk_release() is run, the disk has been closed, and any sync
dispatch activities have been done, so canceling dispatch work is enough to
quiesce filesystem I/O dispatch activity.

2) in blk_cleanup_queue(), we only focus on passthrough request, and
passthrough request is always explicitly allocated & freed by
its caller, so once queue is frozen, all sync dispatch activity
for passthrough request has been done, then it is enough to just cancel
dispatch work for avoiding any dispatch activity.

[1] kernel panic log
[12622.769416] BUG: kernel NULL pointer dereference, address: 0000000000000300
[12622.777186] #PF: supervisor read access in kernel mode
[12622.782918] #PF: error_code(0x0000) - not-present page
[12622.788649] PGD 0 P4D 0
[12622.791474] Oops: 0000 [#1] PREEMPT SMP PTI
[12622.796138] CPU: 10 PID: 744 Comm: kworker/10:1H Kdump: loaded Not tainted 5.15.0+ #1
[12622.804877] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
[12622.813321] Workqueue: kblockd blk_mq_run_work_fn
[12622.818572] RIP: 0010:sbitmap_get+0x75/0x190
[12622.823336] Code: 85 80 00 00 00 41 8b 57 08 85 d2 0f 84 b1 00 00 00 45 31 e4 48 63 cd 48 8d 1c 49 48 c1 e3 06 49 03 5f 10 4c 8d 6b 40 83 f0 01 <48> 8b 33 44 89 f2 4c 89 ef 0f b6 c8 e8 fa f3 ff ff 83 f8 ff 75 58
[12622.844290] RSP: 0018:ffffb00a446dbd40 EFLAGS: 00010202
[12622.850120] RAX: 0000000000000001 RBX: 0000000000000300 RCX: 0000000000000004
[12622.858082] RDX: 0000000000000006 RSI: 0000000000000082 RDI: ffffa0b7a2dfe030
[12622.866042] RBP: 0000000000000004 R08: 0000000000000001 R09: ffffa0b742721334
[12622.874003] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000000
[12622.881964] R13: 0000000000000340 R14: 0000000000000000 R15: ffffa0b7a2dfe030
[12622.889926] FS:  0000000000000000(0000) GS:ffffa0baafb40000(0000) knlGS:0000000000000000
[12622.898956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12622.905367] CR2: 0000000000000300 CR3: 0000000641210001 CR4: 00000000001706e0
[12622.913328] Call Trace:
[12622.916055]  <TASK>
[12622.918394]  scsi_mq_get_budget+0x1a/0x110
[12622.922969]  __blk_mq_do_dispatch_sched+0x1d4/0x320
[12622.928404]  ? pick_next_task_fair+0x39/0x390
[12622.933268]  __blk_mq_sched_dispatch_requests+0xf4/0x140
[12622.939194]  blk_mq_sched_dispatch_requests+0x30/0x60
[12622.944829]  __blk_mq_run_hw_queue+0x30/0xa0
[12622.949593]  process_one_work+0x1e8/0x3c0
[12622.954059]  worker_thread+0x50/0x3b0
[12622.958144]  ? rescuer_thread+0x370/0x370
[12622.962616]  kthread+0x158/0x180
[12622.966218]  ? set_kthread_struct+0x40/0x40
[12622.970884]  ret_from_fork+0x22/0x30
[12622.974875]  </TASK>
[12622.977309] Modules linked in: scsi_debug rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs sunrpc dm_multipath intel_rapl_msr intel_rapl_common dell_wmi_descriptor sb_edac rfkill video x86_pkg_temp_thermal intel_powerclamp dcdbas coretemp kvm_intel kvm mgag200 irqbypass i2c_algo_bit rapl drm_kms_helper ipmi_ssif intel_cstate intel_uncore syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr cec mei_me lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sr_mod cdrom sd_mod t10_pi sg ixgbe ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata megaraid_sas ghash_clmulni_intel tg3 wdat_wdt mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]

Reported-by: ChanghuiZhong <czhong@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211116014343.610501-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: fix missing queue put in error path · 95febeb6

Committed by Jens Axboe 3 years ago

If we fail the submission queue checks, we don't put the queue afterwards.
This can cause various issues like stalls on scheduler switch or failure
to remove the device, or like in the original bug report, timeout waiting
for the device on reboot/restart.

While in there, fix a few whitespace discrepancies in the surrounding
code.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215039
Fixes: b637108a ("blk-mq: fix filesystem I/O request allocation")
Reported-and-tested-by: Stephen Smith <stephenmsmith@blueyonder.co.uk>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Check ADMIN before NICE for IOPRIO_CLASS_RT · 94c4b4fd

Committed by Alistair Delva 3 years ago

Booting to Android userspace on 5.14 or newer triggers the following
SELinux denial:

avc: denied { sys_nice } for comm="init" capability=23
     scontext=u:r:init:s0 tcontext=u:r:init:s0 tclass=capability
     permissive=0

Init is PID 1 running as root, so it already has CAP_SYS_ADMIN. For
better compatibility with older SEPolicy, check ADMIN before NICE.

Fixes: 9d3a39a5 ("block: grant IOPRIO_CLASS_RT to CAP_SYS_NICE")
Signed-off-by: Alistair Delva <adelva@google.com>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: selinux@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-team@android.com
Cc: stable@vger.kernel.org # v5.14+
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Serge Hallyn <serge@hallyn.com>
Link: https://lore.kernel.org/r/20211115181655.3608659-1-adelva@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
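
Why the order of the two checks matters can be shown with a small Python model. It assumes that a capability check which the security policy rejects is logged as a denial, and that the second check is skipped once the first one succeeds. `Task`, `capable` and the two `may_set_rt_*` functions are hypothetical names; this is not the kernel's capability code.

```python
class Task:
    def __init__(self, caps, policy_allows):
        self.caps = set(caps)                    # capabilities the task holds
        self.policy_allows = set(policy_allows)  # what the policy permits
        self.denials = []                        # logged policy denials

    def capable(self, cap):
        if cap not in self.caps:
            return False
        if cap not in self.policy_allows:
            self.denials.append(cap)             # models the "avc: denied" log
            return False
        return True


def may_set_rt_nice_first(task):
    # Old order: NICE is probed first, so a policy without sys_nice logs it.
    return task.capable("CAP_SYS_NICE") or task.capable("CAP_SYS_ADMIN")


def may_set_rt_admin_first(task):
    # New order: ADMIN succeeds first and NICE is never probed.
    return task.capable("CAP_SYS_ADMIN") or task.capable("CAP_SYS_NICE")
```

For a root task whose policy permits only CAP_SYS_ADMIN, both orders grant the request, but only the NICE-first order records a denial.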

15 Nov, 2021 · 3 commits

Linux 5.16-rc1 · fa55b7dc

Committed by Linus Torvalds 3 years ago

kconfig: Add support for -Wimplicit-fallthrough · dee2b702

Committed by Gustavo A. R. Silva 3 years ago

Add Kconfig support for -Wimplicit-fallthrough for both GCC and Clang.

The compiler option is under configuration CC_IMPLICIT_FALLTHROUGH,
which is enabled by default.

Special thanks to Nathan Chancellor who fixed the Clang bug[1][2]. This
bugfix only appears in Clang 14.0.0, so older versions still contain
the bug and -Wimplicit-fallthrough won't be enabled for them, for now.

This concludes a long journey and now we are finally getting rid
of the unintentional fallthrough bug-class in the kernel, entirely. :)

Link: https://github.com/llvm/llvm-project/commit/9ed4a94d6451046a51ef393cd62f00710820a7e8 [1]
Link: https://bugs.llvm.org/show_bug.cgi?id=51094 [2]
Link: https://github.com/KSPP/linux/issues/115
Link: https://github.com/ClangBuiltLinux/linux/issues/236
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · ce49bfc8

Committed by Linus Torvalds 3 years ago

Pull xfs cleanups from Darrick Wong:
 "The most 'exciting' aspect of this branch is that the xfsprogs
  maintainer and I have worked through the last of the code
  discrepancies between kernel and userspace libxfs such that there are
  no code differences between the two except for #includes.

  IOWs, diff suffices to demonstrate that the userspace tools behave the
  same as the kernel, and kernel-only bits are clearly marked in the
  /kernel/ source code instead of just the userspace source.

  Summary:

   - Clean up open-coded swap() calls.

   - A little bit of #ifdef golf to complete the reunification of the
     kernel and userspace libxfs source code"

* tag 'xfs-5.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: sync xfs_btree_split macros with userspace libxfs
  xfs: #ifdef out perag code for userspace
  xfs: use swap() to make dabtree code cleaner

openeuler / Kernel · synced successfully more than 1 year ago