Commits · df24560d058d11f02b7493bdfc553131ef60b23d · openeuler / Kernel

09 Nov, 2022 3 commits

Committed by Sagi Grimberg on Nov 09, 2022

We need to also free the dhchap_ctrl_secret when releasing nvmet_host.
kmemleak complaint:
--
unreferenced object 0xffff99b1cbca5140 (size 64):
  comm "check", pid 4864, jiffies 4305092436 (age 2913.583s)
  hex dump (first 32 bytes):
    44 48 48 43 2d 31 3a 30 30 3a 65 36 2b 41 63 44  DHHC-1:00:e6+AcD
    39 76 47 4d 52 57 59 78 67 54 47 44 51 59 47 78  9vGMRWYxgTGDQYGx
  backtrace:
    [<00000000c07d369d>] kstrdup+0x2e/0x60
    [<000000001372171c>] 0xffffffffc0cceec6
    [<0000000010dbf50b>] 0xffffffffc0cc6783
    [<000000007465e93c>] configfs_write_iter+0xb1/0x120
    [<0000000039c23f62>] vfs_write+0x2be/0x3c0
    [<000000002da4351c>] ksys_write+0x5f/0xe0
    [<00000000d5011e32>] do_syscall_64+0x38/0x90
    [<00000000503870cf>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: db1312dd ("nvmet: implement basic In-Band Authentication")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

e65fdf53

nvmet: fix memory leak in nvmet_subsys_attr_model_store_locked · becc4cac

Committed by Aleksandr Miloserdov on Oct 26, 2022

Since model_number was allocated earlier, it needs to be freed before
kmemdup_nul.
Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com>
Reviewed-by: Dmitriy Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Aleksandr Miloserdov <a.miloserdov@yadro.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

becc4cac
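The leak pattern in this commit can be sketched in userspace C. This is a minimal sketch, not the kernel code: `struct subsys_sketch`, `set_model()`, `dup_nul()`, `release()` and the `live_allocs` counter are hypothetical names, and `malloc`/`free` stand in for `kmemdup_nul()`/`kfree()`.

```c
#include <stdlib.h>
#include <string.h>

/* Counts live allocations so that a leak becomes observable (sketch only). */
static int live_allocs;

/* Userspace stand-in for kmemdup_nul(): copy len bytes and NUL-terminate. */
static char *dup_nul(const char *s, size_t len)
{
    char *p = malloc(len + 1);

    if (!p)
        return NULL;
    memcpy(p, s, len);
    p[len] = '\0';
    live_allocs++;
    return p;
}

/* Stand-in for kfree(): accepts NULL, like the kernel helper. */
static void release(char *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

struct subsys_sketch {
    char *model_number;   /* hypothetical stand-in for the nvmet field */
};

/* Store a new model string; the previous buffer is freed before the
 * pointer is overwritten, otherwise every store would leak it. */
static int set_model(struct subsys_sketch *s, const char *page, size_t len)
{
    release(s->model_number);
    s->model_number = dup_nul(page, len);
    return s->model_number ? 0 : -1;
}
```

Calling `set_model()` twice leaves exactly one live allocation; dropping the `release()` call would leave two.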

nvme: quiet user passthrough command errors · d7ac8dca

Committed by Keith Busch on Oct 28, 2022

The driver is spamming the kernel logs for entirely harmless errors from
user space submitting unsupported commands. Just silence the errors.
The application has direct access to command status, so there's no need
to log these.

And since every passthrough command now uses the quiet flag, move the
setting to the common initializer.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Alan Adamson <alan.adamson@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

d7ac8dca

31 Oct, 2022 4 commits

ublk_drv: add ublk_queue_cmd() for cleanup · fee32f31

Committed by Ming Lei on Oct 29, 2022

Add helper of ublk_queue_cmd() so that both ublk_queue_rq()
and ublk_handle_need_get_data() can reuse this helper.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221029010432.598367-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fee32f31

ublk_drv: avoid to touch io_uring cmd in blk_mq io path · 3ab6e94c

Committed by Ming Lei on Oct 29, 2022

io_uring cmd is supposed to be used mainly in ubq daemon context,
and we should try to avoid touching it in ublk io submission context,
otherwise this data could become shared between the two contexts,
and performance is hurt.

So link requests into one per-queue list, and use the same batching policy
as the io_uring command, just avoid touching ucmd in blk-mq io context.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221029010432.598367-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3ab6e94c

ublk_drv: comment on ublk_driver entry of Kconfig · d57c2c6c

Committed by Ming Lei on Oct 29, 2022

Add help info for choosing to build ublk_drv as module or builtin.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221029010432.598367-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d57c2c6c

ublk_drv: return flag of UBLK_F_URING_CMD_COMP_IN_TASK in case of module · 224e858f

Committed by Ming Lei on Oct 29, 2022

UBLK_F_URING_CMD_COMP_IN_TASK needs to be set and returned to userspace
if ublk driver is built as module, otherwise userspace may get wrong
flags shown.

Fixes: 71f28f31 ("ublk_drv: add io_uring based userspace block driver")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221029010432.598367-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

224e858f

27 Oct, 2022 1 commit

rbd: fix possible memory leak in rbd_sysfs_init() · 7f21735f

Committed by Yang Yingliang on Oct 27, 2022

If device_register() returns error in rbd_sysfs_init(), name of kobject
which is allocated in dev_set_name() called in device_add() is leaked.

As the comment of device_add() says, put_device() should be called to drop
the reference count that was set in device_initialize() when it fails,
so the name can be freed in kobject_cleanup().

Fault injection test can trigger this problem:

unreferenced object 0xffff88810173aa78 (size 8):
  comm "modprobe", pid 247, jiffies 4294714278 (age 31.789s)
  hex dump (first 8 bytes):
    72 62 64 00 81 88 ff ff                          rbd.....
  backtrace:
    [<00000000f58fae56>] __kmalloc_node_track_caller+0x44/0x1b0
    [<00000000bdd44fe7>] kstrdup+0x3a/0x70
    [<00000000f7844d0b>] kstrdup_const+0x63/0x80
    [<000000001b0a0eeb>] kvasprintf_const+0x10b/0x190
    [<00000000a47bd894>] kobject_set_name_vargs+0x56/0x150
    [<00000000d5edbf18>] dev_set_name+0xab/0xe0
    [<00000000f5153e80>] device_add+0x106/0x1f20

Fixes: dfc5606d ("rbd: replace the rbd sysfs interface")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20221027091918.2294132-1-yangyingliang@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7f21735f
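The reference-counting rule behind this fix can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: every `sketch_*` name, `struct dev_sketch` and the `names_allocated` counter are hypothetical, and the model keeps just the two facts the commit message relies on (the name is allocated during registration, and it is freed when the last reference is dropped).

```c
#include <stdlib.h>
#include <string.h>

static int names_allocated;   /* leak detector for this sketch only */

struct dev_sketch {
    int refcount;
    char *name;
};

/* Models device_initialize(): the object starts with one reference. */
static void sketch_device_initialize(struct dev_sketch *d)
{
    d->refcount = 1;
    d->name = NULL;
}

/* Models put_device(): the name is freed when the last reference goes. */
static void sketch_put_device(struct dev_sketch *d)
{
    if (--d->refcount == 0 && d->name) {
        free(d->name);
        d->name = NULL;
        names_allocated--;
    }
}

/* Models device_register(): the name is allocated first, then a later
 * step may fail (forced here through the fail argument). */
static int sketch_device_register(struct dev_sketch *d, const char *name,
                                  int fail)
{
    size_t n = strlen(name) + 1;

    sketch_device_initialize(d);
    d->name = malloc(n);
    if (!d->name)
        return -1;
    memcpy(d->name, name, n);
    names_allocated++;
    return fail ? -1 : 0;
}

/* Caller side: on failure, drop the initial reference so the name is freed. */
static int sketch_sysfs_init(struct dev_sketch *d, int fail)
{
    int ret = sketch_device_register(d, "rbd", fail);

    if (ret < 0)
        sketch_put_device(d);
    return ret;
}
```

Without the `sketch_put_device()` call on the error path, `names_allocated` would stay at 1 after a failed registration, which is the leak kmemleak reported.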

25 Oct, 2022 3 commits

nvme-multipath: set queue dma alignment to 3 · fe8714b0

Committed by Keith Busch on Oct 24, 2022

NVMe spec requires all transports support dword aligned addresses, which
is already set in the namespace request_queue. Set the same limit in the
multipath device's request_queue as well.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

fe8714b0
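The value 3 in the commit title is an alignment mask, not a byte count. A minimal sketch of what such a mask expresses, assuming a hypothetical helper `dma_aligned()` (not a kernel function) that checks both the address and the length against the mask:

```c
#include <stdbool.h>
#include <stdint.h>

/* A mask of 3 means the two low bits must be clear, i.e. the value is a
 * multiple of 4 bytes (one dword). */
#define DWORD_ALIGN_MASK 3u

/* Hypothetical helper: true when both the buffer address and its length
 * satisfy the alignment mask. */
static bool dma_aligned(uintptr_t addr, unsigned int len, unsigned int mask)
{
    return !(addr & mask) && !(len & mask);
}
```

With this mask, a buffer at 0x1004 passes while one at 0x1002 does not.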

nvme-tcp: fix possible circular locking when deleting a controller under memory pressure · 83e1226b

Committed by Sagi Grimberg on Oct 23, 2022

When destroying a queue, the call to sock_release may require the network
stack to allocate an skb to send a FIN/RST. When that happens
during memory pressure, there is a need to reclaim memory, which
in turn may ask the nvme-tcp device to write out dirty pages. However,
this is not possible due to the ctrl teardown that is going on.

Set PF_MEMALLOC to the task that releases the socket to grant access
to PF_MEMALLOC reserves. In addition, do the same for the nvme-tcp
thread as this may also originate from the swap itself and should
be more resilient to memory pressure situations.

This fixes the following lockdep complaint:
--
======================================================
 WARNING: possible circular locking dependency detected
 6.0.0-rc2+ #25 Tainted: G        W
 ------------------------------------------------------
 kswapd0/92 is trying to acquire lock:
 ffff888114003240 (sk_lock-AF_INET-NVME){+.+.}-{0:0}, at: tcp_sendpage+0x23/0xa0

 but task is already holding lock:
 ffffffff97e95ca0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x987/0x10d0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (fs_reclaim){+.+.}-{0:0}:
        fs_reclaim_acquire+0x11e/0x160
        kmem_cache_alloc_node+0x44/0x530
        __alloc_skb+0x158/0x230
        tcp_send_active_reset+0x7e/0x730
        tcp_disconnect+0x1272/0x1ae0
        __tcp_close+0x707/0xd90
        tcp_close+0x26/0x80
        inet_release+0xfa/0x220
        sock_release+0x85/0x1a0
        nvme_tcp_free_queue+0x1fd/0x470 [nvme_tcp]
        nvme_do_delete_ctrl+0x130/0x13d [nvme_core]
        nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
        kernfs_fop_write_iter+0x356/0x530
        vfs_write+0x4e8/0xce0
        ksys_write+0xfd/0x1d0
        do_syscall_64+0x58/0x80
        entry_SYSCALL_64_after_hwframe+0x63/0xcd

 -> #0 (sk_lock-AF_INET-NVME){+.+.}-{0:0}:
        __lock_acquire+0x2a0c/0x5690
        lock_acquire+0x18e/0x4f0
        lock_sock_nested+0x37/0xc0
        tcp_sendpage+0x23/0xa0
        inet_sendpage+0xad/0x120
        kernel_sendpage+0x156/0x440
        nvme_tcp_try_send+0x48a/0x2630 [nvme_tcp]
        nvme_tcp_queue_rq+0xefb/0x17e0 [nvme_tcp]
        __blk_mq_try_issue_directly+0x452/0x660
        blk_mq_plug_issue_direct.constprop.0+0x207/0x700
        blk_mq_flush_plug_list+0x6f5/0xc70
        __blk_flush_plug+0x264/0x410
        blk_finish_plug+0x4b/0xa0
        shrink_lruvec+0x1263/0x1ea0
        shrink_node+0x736/0x1a80
        balance_pgdat+0x740/0x10d0
        kswapd+0x5f2/0xaf0
        kthread+0x256/0x2f0
        ret_from_fork+0x1f/0x30

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(sk_lock-AF_INET-NVME);
                               lock(fs_reclaim);
  lock(sk_lock-AF_INET-NVME);

 *** DEADLOCK ***

3 locks held by kswapd0/92:
 #0: ffffffff97e95ca0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x987/0x10d0
 #1: ffff88811f21b0b0 (q->srcu){....}-{0:0}, at: blk_mq_flush_plug_list+0x6b3/0xc70
 #2: ffff888170b11470 (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0xeb9/0x17e0 [nvme_tcp]

Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver")
Reported-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

83e1226b

nvme-tcp: replace sg_init_marker() with sg_init_table() · 5fa9add6

Committed by Nam Cao on Oct 22, 2022

In nvme_tcp_ddgst_update(), sg_init_marker() is called with an
uninitialized scatterlist. This is probably fine, but gcc complains:

  CC [M]  drivers/nvme/host/tcp.o
In file included from ./include/linux/dma-mapping.h:10,
                 from ./include/linux/skbuff.h:31,
                 from ./include/net/net_namespace.h:43,
                 from ./include/linux/netdevice.h:38,
                 from ./include/net/sock.h:46,
                 from drivers/nvme/host/tcp.c:12:
In function ‘sg_mark_end’,
    inlined from ‘sg_init_marker’ at ./include/linux/scatterlist.h:356:2,
    inlined from ‘nvme_tcp_ddgst_update’ at drivers/nvme/host/tcp.c:390:2:
./include/linux/scatterlist.h:234:11: error: ‘sg.page_link’ is used uninitialized [-Werror=uninitialized]
  234 |         sg->page_link |= SG_END;
      |         ~~^~~~~~~~~~~
drivers/nvme/host/tcp.c: In function ‘nvme_tcp_ddgst_update’:
drivers/nvme/host/tcp.c:388:28: note: ‘sg’ declared here
  388 |         struct scatterlist sg;
      |                            ^~
cc1: all warnings being treated as errors

Use sg_init_table() instead, which basically memsets the scatterlist to
zero before calling sg_init_marker().
Signed-off-by: Nam Cao <namcaov@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

5fa9add6
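Why the compiler objects to one helper and not the other can be seen in a small userspace model. This is a sketch, not the kernel implementation: `struct sg_sketch`, `sketch_init_marker()` and `sketch_init_table()` are hypothetical stand-ins, and only the read-modify-write on `page_link` shown in the gcc output above is modelled.

```c
#include <string.h>

#define SG_END 0x02UL   /* end-of-list marker bit, as in the gcc output above */

struct sg_sketch {
    unsigned long page_link;
    unsigned int offset;
    unsigned int length;
};

/* Models sg_init_marker(): ORs the end bit into page_link, which reads the
 * old value. On an uninitialized entry that read is what gcc flags. */
static void sketch_init_marker(struct sg_sketch *sgl, unsigned int nents)
{
    sgl[nents - 1].page_link |= SG_END;
}

/* Models sg_init_table(): zero the entries first, then mark the end. */
static void sketch_init_table(struct sg_sketch *sgl, unsigned int nents)
{
    memset(sgl, 0, sizeof(*sgl) * nents);
    sketch_init_marker(sgl, nents);
}
```

After `sketch_init_table()` the entry holds only the end marker, whatever garbage the stack slot contained before.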

20 Oct, 2022 2 commits

drbd: only clone bio if we have a backing device · 6d42ddf7

Committed by Christoph Böhmwalder on Oct 20, 2022

Commit c347a787 (drbd: set ->bi_bdev in drbd_req_new) moved a
bio_set_dev call (which has since been removed) to "earlier", from
drbd_request_prepare to drbd_req_new.

The problem is that this accesses device->ldev->backing_bdev, which is
not NULL-checked at this point. When we don't have an ldev (i.e. when
the DRBD device is diskless), this leads to a null pointer deref.

So, only allocate the private_bio if we actually have a disk. This is
also a small optimization, since we don't clone the bio only to
immediately free it again in the diskless case.

Fixes: c347a787 ("drbd: set ->bi_bdev in drbd_req_new")
Co-developed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Co-developed-by: Joel Colledge <joel.colledge@linbit.com>
Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221020085205.129090-1-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6d42ddf7

ublk_drv: use flexible-array member instead of zero-length array · 72495b5a

Committed by Yushan Zhou on Oct 18, 2022

Eliminate the following coccicheck warning:
./drivers/block/ublk_drv.c:127:16-19: WARNING use flexible-array member instead
Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Link: https://lore.kernel.org/r/20221018100132.355393-1-zys.zljxml@gmail.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

72495b5a
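For readers unfamiliar with the warning, the difference is purely in the declaration of the trailing array. A minimal userspace sketch, assuming hypothetical names (`struct queue_sketch`, `struct io_slot`, `queue_alloc()`) rather than the ublk structures:

```c
#include <stdlib.h>

struct io_slot {
    int flags;
};

struct queue_sketch {
    int q_depth;
    struct io_slot ios[];   /* C99 flexible-array member, not ios[0] */
};

/* Allocate the header plus depth trailing slots in one block. */
static struct queue_sketch *queue_alloc(int depth)
{
    struct queue_sketch *q;

    if (depth <= 0)
        return NULL;
    q = calloc(1, sizeof(*q) + (size_t)depth * sizeof(q->ios[0]));
    if (q)
        q->q_depth = depth;
    return q;
}
```

The allocation arithmetic is the same as with a zero-length array; the standard form lets the compiler and static checkers reason about the trailing storage.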

19 Oct, 2022 7 commits

nvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show · 94f5a068

Committed by Daniel Wagner on Oct 07, 2022

The item passed into nvmet_subsys_attr_qid_max_show is not a member of
struct nvmet_port, it is part of nvmet_subsys. Hence, don't try to
dereference it as struct nvme_ctrl pointer.

Fixes: 3e980f59 ("nvmet: Expose max queues to configfs")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20220913064203.133536-1-dwagner@suse.de
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

94f5a068

nvmet: fix workqueue MEM_RECLAIM flushing dependency · ddd2b8de

Committed by Sagi Grimberg on Sep 28, 2022

The keep alive timer needs to stay on nvmet_wq, and must not be
modified to reschedule on the system_wq.

This fixes a warning:
------------[ cut here ]------------
workqueue: WQ_MEM_RECLAIM
nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing
!WQ_MEM_RECLAIM events:nvmet_keep_alive_timer [nvmet]
WARNING: CPU: 3 PID: 1086 at kernel/workqueue.c:2628
check_flush_dependency+0x16c/0x1e0
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: 8832cf92 ("nvmet: use a private workqueue instead of the system workqueue")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

ddd2b8de

nvme-hwmon: kmalloc the NVME SMART log buffer · c94b7f9b

Committed by Serge Semin on Oct 18, 2022

Recent commit 52fde2c0 ("nvme: set dma alignment to dword") has
caused a regression on our platform.

It turned out that the nvme_get_log() method invocation caused the
nvme_hwmon_data structure instance corruption. In particular the
nvme_hwmon_data.ctrl pointer was overwritten either with zeros or with
garbage. After some research we discovered that the problem happened
even before the actual NVME DMA execution, but during the buffer mapping.
Since our platform is DMA-noncoherent, the mapping implied the cache-line
invalidations or write-backs depending on the DMA-direction parameter.
In the case of reading the NVME SMART log, the DMA was performed
from-device-to-memory, thus the cache-invalidation was activated during
the buffer mapping. Since the log-buffer isn't cache-line aligned, the
cache-invalidation caused the neighbouring data to be discarded. The
neighbouring data turned out to be the data surrounding the buffer within
the nvme_hwmon_data structure.

In order to fix that we need to make sure that the whole log-buffer is
defined within a cache-line-aligned memory region so the
cache-invalidation procedure doesn't involve the adjacent data. One of
the options to guarantee that is to kmalloc the DMA-buffer [1]. Since the
rest of the NVME core driver prefers that method, it has been chosen to fix
this problem too.

Note that after deeper research we found out that the denoted commit wasn't
the root cause of the problem. It just revealed the bug by activating
the DMA-based NVME SMART log retrieval performed by the
NVME hwmon driver. The problem has been there since the initial commit of the
driver.

[1] Documentation/core-api/dma-api-howto.rst

Fixes: 400b6a7b ("nvme: Add hardware monitoring support")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c94b7f9b
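The corruption described above comes down to address arithmetic: invalidation works on whole cache lines, so a buffer that does not start and end on line boundaries drags its neighbours along. The sketch below illustrates that arithmetic only. `CACHE_LINE` is an assumed 64-byte line size (it is platform dependent), and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumed line size for this sketch */

/* True when [addr, addr + len) starts and ends on cache-line boundaries,
 * so invalidating it cannot discard bytes that belong to other data. */
static bool owns_whole_cache_lines(uintptr_t addr, size_t len)
{
    return len != 0 && addr % CACHE_LINE == 0 && len % CACHE_LINE == 0;
}

/* Number of bytes outside the buffer that share its cache lines and
 * would therefore be hit by an invalidation of the buffer. */
static size_t collateral_bytes(uintptr_t addr, size_t len)
{
    uintptr_t start = addr - addr % CACHE_LINE;
    uintptr_t end = (addr + len + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;

    return (size_t)(end - start) - len;
}
```

For example, a 512-byte log buffer placed 8 bytes into a line-aligned structure at 0x1000 shares its first and last cache lines with 64 bytes of neighbouring members, while the same buffer at a line-aligned address shares none.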

nvme-hwmon: consistently ignore errors from nvme_hwmon_init · 6b8cf940

Committed by Christoph Hellwig on Oct 18, 2022

An NVMe controller works perfectly fine even when the hwmon
initialization fails.  Stop returning errors that do not come from a
controller reset from nvme_hwmon_init to handle this case consistently.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>

6b8cf940

nvme-apple: don't limit DMA segement size · d622f847

Committed by Russell King (Oracle) on Oct 12, 2022

NVMe uses PRPs for data transfers and has no specific limit for a single
DMA segment.  Limiting the size will cause problems because the block
layer assumes PRP-ish devices using a virt boundary mask don't have a
segment limit.  And while this is true, we also really need to tell the
DMA mapping layer about it, otherwise dma-debug will trip over it.

Fixes: 5bd2927a ("nvme-apple: Add initial Apple SoC NVMe driver")
Suggested-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[hch: rewrote the commit message based on the PCIe commit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>

d622f847

nvme-pci: disable write zeroes on various Kingston SSD · ac9b57d4

Committed by Xander Li on Oct 11, 2022

Kingston SSDs do support the NVMe Write_Zeroes cmd but take a long time to
process it.  The firmware version is locked on these SSDs, so we cannot expect
a firmware improvement; disable the Write_Zeroes cmd.
Signed-off-by: Xander Li <xander_li@kingston.com.tw>
Signed-off-by: Christoph Hellwig <hch@lst.de>

ac9b57d4

nvme: fix error pointer dereference in error handling · 4739824e

Committed by Dan Carpenter on Oct 15, 2022

There is a typo here so it releases the wrong variable.  "ctrl->admin_q"
was intended instead of "ctrl->fabrics_q".

Fixes: fe60e8c5 ("nvme: add common helpers to allocate and free tagsets")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

4739824e

12 Oct, 2022 5 commits

nvme-multipath: fix possible hang in live ns resize with ANA access · 72e3b888

Committed by Sagi Grimberg on Sep 29, 2022

When we revalidate paths as part of ns size change (as of commit
e7d65803), it is possible that during the path revalidation, the
only paths that are IO capable (i.e. optimized/non-optimized) are the
ones that have not yet informed the host of the ns resize, which will cause
inflight requests to be requeued (as we have available paths but none
are IO capable). These requests on the requeue list are waiting for
someone to resubmit them at some point.

The IO capable paths will eventually notify the ns resize change to the
host, but there is nothing that will kick the requeue list to resubmit
the queued requests.

Fix this by always kicking the requeue list, and if no IO capable path
exists, these requests will be queued again.

A typical log that indicates that IOs are requeued:
--
nvme nvme1: creating 4 I/O queues.
nvme nvme1: new ctrl: "testnqn1"
nvme nvme2: creating 4 I/O queues.
nvme nvme2: mapped 4/0/0 default/read/poll queues.
nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
nvme nvme1: rescanning namespaces.
nvme1n1: detected capacity change from 2097152 to 4194304
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
nvme nvme2: rescanning namespaces.
--
Reported-by: Yogev Cohen <yogev@lightbitslabs.com>
Fixes: e7d65803 ("nvme-multipath: revalidate paths during rescan")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Christoph Hellwig <hch@lst.de>

72e3b888

nvme-pci: avoid the deepest sleep state on ZHITAI TiPro5000 SSDs · d5d3c100

Committed by Xi Ruoyao on Sep 28, 2022

ZHITAI TiPro5000 SSDs have the same APST sleep problem as their cousin,
TiPro7000.  The quirk for TiPro7000 has been added in
commit 6b961bce ("nvme-pci: avoid the deepest sleep state on
ZHITAI TiPro7000 SSDs"), use the same quirk for TiPro5000.

The APST data from "nvme id-ctrl /dev/nvme1":

vid       : 0x1e49
ssvid     : 0x1e49
sn        : ZTA21T0KA2227304LM
mn        : ZHITAI TiPlus5000 1TB
fr        : ZTA09139
[...]
ps    0 : mp:6.50W operational enlat:0 exlat:0 rrt:0 rrl:0
         rwt:0 rwl:0 idle_power:- active_power:-
ps    1 : mp:5.80W operational enlat:0 exlat:0 rrt:1 rrl:1
         rwt:1 rwl:1 idle_power:- active_power:-
ps    2 : mp:3.60W operational enlat:0 exlat:0 rrt:2 rrl:2
         rwt:2 rwl:2 idle_power:- active_power:-
ps    3 : mp:0.0500W non-operational enlat:5000 exlat:10000 rrt:3 rrl:3
         rwt:3 rwl:3 idle_power:- active_power:-
ps    4 : mp:0.0025W non-operational enlat:8000 exlat:45000 rrt:4 rrl:4
         rwt:4 rwl:4 idle_power:- active_power:-
Reported-and-tested-by: Chang Feng <flukehn@gmail.com>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

d5d3c100

nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM760 · 80b26240

Committed by Abhijit on Oct 10, 2022

Add a quirk to fix Lexar NM760 SSD drives reporting duplicate nsids.
Signed-off-by: Abhijit <abhijit@abhijittomar.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

80b26240

nvme-tcp: fix possible hang caused during ctrl deletion · c4abd875

Committed by Sagi Grimberg on Sep 28, 2022

When we delete a controller, we execute the following:
1. nvme_stop_ctrl() - stop some work elements that may be
	inflight or scheduled (specifically also .stop_ctrl
	which cancels ctrl error recovery work)
2. nvme_remove_namespaces() - which first flushes scan_work
	to avoid competing ns addition/removal
3. continue to teardown the controller

However, if err_work was scheduled to run in (1), it is designed to
cancel any inflight I/O, particularly I/O that is originating from ns
scan_work in (2), but because it is cancelled in .stop_ctrl(), we can
prevent forward progress of (2) as ns scanning is blocking on I/O
(that will never be cancelled).

The race is:
1. transport layer error observed -> err_work is scheduled
2. scan_work executes, discovers ns, generates I/O to it
3. nvme_stop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work)
   - err_work never executed
4. nvme_remove_namespaces() -> flush_work(scan_work)
--> deadlock, because scan_work is blocked on I/O that was supposed
to be cancelled by err_work, but was cancelled before executing (see
stack trace [1]).

Fix this by flushing err_work instead of cancelling it, to force it
to execute and cancel all inflight I/O.

[1]:
--
Call Trace:
 <TASK>
 __schedule+0x390/0x910
 ? scan_shadow_nodes+0x40/0x40
 schedule+0x55/0xe0
 io_schedule+0x16/0x40
 do_read_cache_page+0x55d/0x850
 ? __page_cache_alloc+0x90/0x90
 read_cache_page+0x12/0x20
 read_part_sector+0x3f/0x110
 amiga_partition+0x3d/0x3e0
 ? osf_partition+0x33/0x220
 ? put_partition+0x90/0x90
 bdev_disk_changed+0x1fe/0x4d0
 blkdev_get_whole+0x7b/0x90
 blkdev_get_by_dev+0xda/0x2d0
 device_add_disk+0x356/0x3b0
 nvme_mpath_set_live+0x13c/0x1a0 [nvme_core]
 ? nvme_parse_ana_log+0xae/0x1a0 [nvme_core]
 nvme_update_ns_ana_state+0x3a/0x40 [nvme_core]
 nvme_mpath_add_disk+0x120/0x160 [nvme_core]
 nvme_alloc_ns+0x594/0xa00 [nvme_core]
 nvme_validate_or_alloc_ns+0xb9/0x1a0 [nvme_core]
 ? __nvme_submit_sync_cmd+0x1d2/0x210 [nvme_core]
 nvme_scan_work+0x281/0x410 [nvme_core]
 process_one_work+0x1be/0x380
 worker_thread+0x37/0x3b0
 ? process_one_work+0x380/0x380
 kthread+0x12d/0x150
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x1f/0x30
 </TASK>
INFO: task nvme:6725 blocked for more than 491 seconds.
      Not tainted 5.15.65-f0.el7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:nvme            state:D
 stack:    0 pid: 6725 ppid:  1761 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x390/0x910
 ? sched_clock+0x9/0x10
 schedule+0x55/0xe0
 schedule_timeout+0x24b/0x2e0
 ? try_to_wake_up+0x358/0x510
 ? finish_task_switch+0x88/0x2c0
 wait_for_completion+0xa5/0x110
 __flush_work+0x144/0x210
 ? worker_attach_to_pool+0xc0/0xc0
 flush_work+0x10/0x20
 nvme_remove_namespaces+0x41/0xf0 [nvme_core]
 nvme_do_delete_ctrl+0x47/0x66 [nvme_core]
 nvme_sysfs_delete.cold.96+0x8/0xd [nvme_core]
 dev_attr_store+0x14/0x30
 sysfs_kf_write+0x38/0x50
 kernfs_fop_write_iter+0x146/0x1d0
 new_sync_write+0x114/0x1b0
 ? intel_pmu_handle_irq+0xe0/0x420
 vfs_write+0x18d/0x270
 ksys_write+0x61/0xe0
 __x64_sys_write+0x1a/0x20
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x61/0xcb
--

Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver")
Reported-by: Jonathan Nicklin <jnicklin@blockbridge.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Jonathan Nicklin <jnicklin@blockbridge.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c4abd875
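The difference between cancelling and flushing a pending work item, which is the core of this fix, can be shown with a single-threaded toy model. This is a sketch under strong assumptions: it only covers the case where err_work is queued but has not started yet, it involves no real concurrency, and every name (`struct work_sketch`, `sketch_cancel()`, `sketch_flush()`, `inflight_io`) is hypothetical.

```c
#include <stdbool.h>

struct work_sketch {
    bool pending;
    void (*fn)(struct work_sketch *);
};

/* I/O issued by the namespace scan that only err_work can cancel. */
static int inflight_io;

/* Models the error recovery handler: cancels all inflight I/O. */
static void err_work_fn(struct work_sketch *w)
{
    (void)w;
    inflight_io = 0;
}

/* Models cancelling a pending item: it is dropped and never runs. */
static void sketch_cancel(struct work_sketch *w)
{
    w->pending = false;
}

/* Models flushing a pending item: it runs to completion first. */
static void sketch_flush(struct work_sketch *w)
{
    if (w->pending) {
        w->pending = false;
        w->fn(w);
    }
}

/* The scan can make progress only once no I/O is left in flight. */
static bool scan_can_finish(void)
{
    return inflight_io == 0;
}
```

In this model, cancelling leaves `inflight_io` untouched and the scan stuck, whereas flushing runs the handler and lets the scan finish.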

nvme-rdma: fix possible hang caused during ctrl deletion · a1ae8d4d

Committed by Sagi Grimberg on Sep 28, 2022

When we delete a controller, we execute the following:
1. nvme_stop_ctrl() - stop some work elements that may be
        inflight or scheduled (specifically also .stop_ctrl
        which cancels ctrl error recovery work)
2. nvme_remove_namespaces() - which first flushes scan_work
        to avoid competing ns addition/removal
3. continue to teardown the controller

However, if err_work was scheduled to run in (1), it is designed to
cancel any inflight I/O, particularly I/O that is originating from ns
scan_work in (2), but because it is cancelled in .stop_ctrl(), we can
prevent forward progress of (2) as ns scanning is blocking on I/O
(that will never be cancelled).

The race is:
1. transport layer error observed -> err_work is scheduled
2. scan_work executes, discovers ns, generates I/O to it
3. nvme_stop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work)
   - err_work never executed
4. nvme_remove_namespaces() -> flush_work(scan_work)
--> deadlock, because scan_work is blocked on I/O that was supposed
to be cancelled by err_work, but was cancelled before executing.

Fix this by flushing err_work instead of cancelling it, to force it
to execute and cancel all inflight I/O.

Fixes: b435ecea ("nvme: Add .stop_ctrl to nvme ctrl ops")
Fixes: f6c8e432 ("nvme: flush namespace scanning work just before removing namespaces")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

a1ae8d4d

07 Oct, 2022 1 commit

Revert "drm/sched: Use parent fence instead of finished" · bafaf67c

Committed by Dave Airlie on Oct 07, 2022

This reverts commit e4dc45b1.

This is causing instability on Linus' desktop, and I'm seeing
oops with VK CTS runs.

netconsole got me the following oops:
[ 1234.778760] BUG: kernel NULL pointer dereference, address: 0000000000000088
[ 1234.778782] #PF: supervisor read access in kernel mode
[ 1234.778787] #PF: error_code(0x0000) - not-present page
[ 1234.778791] PGD 0 P4D 0
[ 1234.778798] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1234.778803] CPU: 7 PID: 805 Comm: systemd-journal Not tainted 6.0.0+ #2
[ 1234.778809] Hardware name: System manufacturer System Product
Name/PRIME X370-PRO, BIOS 5603 07/28/2020
[ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]
[ 1234.778828] Code: aa 0f 1d ce e9 57 ff ff ff 48 89 d7 e8 9d 8f 3f
ce e9 4a ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53
48 89 fb <48> 8b af 88 00 00 00 f0 ff 8d f0 00 00 00 48 8b 85 80 01 00
00 f0
[ 1234.778834] RSP: 0000:ffffabe680380de0 EFLAGS: 00010087
[ 1234.778839] RAX: ffffffffc04e9230 RBX: 0000000000000000 RCX: 0000000000000018
[ 1234.778897] RDX: 00000ba278e8977a RSI: ffff953fb288b460 RDI: 0000000000000000
[ 1234.778901] RBP: ffff953fb288b598 R08: 00000000000000e0 R09: ffff953fbd98b808
[ 1234.778905] R10: 0000000000000000 R11: ffffabe680380ff8 R12: ffffabe680380e00
[ 1234.778908] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff953fbd9ec458
[ 1234.778912] FS:  00007f35e7008580(0000) GS:ffff95428ebc0000(0000)
knlGS:0000000000000000
[ 1234.778916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1234.778919] CR2: 0000000000000088 CR3: 000000010147c000 CR4: 00000000003506e0
[ 1234.778924] Call Trace:
[ 1234.778981]  <IRQ>
[ 1234.778989]  dma_fence_signal_timestamp_locked+0x6a/0xe0
[ 1234.778999]  dma_fence_signal+0x2c/0x50
[ 1234.779005]  amdgpu_fence_process+0xc8/0x140 [amdgpu]
[ 1234.779234]  sdma_v3_0_process_trap_irq+0x70/0x80 [amdgpu]
[ 1234.779395]  amdgpu_irq_dispatch+0xa9/0x1d0 [amdgpu]
[ 1234.779609]  amdgpu_ih_process+0x80/0x100 [amdgpu]
[ 1234.779783]  amdgpu_irq_handler+0x1f/0x60 [amdgpu]
[ 1234.779940]  __handle_irq_event_percpu+0x46/0x190
[ 1234.779946]  handle_irq_event+0x34/0x70
[ 1234.779949]  handle_edge_irq+0x9f/0x240
[ 1234.779954]  __common_interrupt+0x66/0x100
[ 1234.779960]  common_interrupt+0xa0/0xc0
[ 1234.779965]  </IRQ>
[ 1234.779968]  <TASK>
[ 1234.779971]  asm_common_interrupt+0x22/0x40
[ 1234.779976] RIP: 0010:finish_mkwrite_fault+0x22/0x110
[ 1234.779981] Code: 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 55 41
54 55 48 89 fd 53 48 8b 07 f6 40 50 08 0f 84 eb 00 00 00 48 8b 45 30
48 8b 18 <48> 89 df e8 66 bd ff ff 48 85 c0 74 0d 48 89 c2 83 e2 01 48
83 ea
[ 1234.779985] RSP: 0000:ffffabe680bcfd78 EFLAGS: 00000202

Revert it for now and figure it out later.
Signed-off-by: Dave Airlie <airlied@redhat.com>

bafaf67c

06 Oct, 2022 12 commits

mailbox: qcom-ipcc: flag IRQ NO_THREAD · b8ae88e1

Committed by Eric Chanudet on Oct 03, 2022

PREEMPT_RT forces qcom-ipcc's handler to be threaded with interrupts
enabled, which triggers a warning in __handle_irq_event_percpu():
irq 173 handler irq_default_primary_handler+0x0/0x10 enabled interrupts
WARNING: CPU: 0 PID: 77 at kernel/irq/handle.c:161 __handle_irq_event_percpu+0x4c4/0x4d0

Mark it IRQF_NO_THREAD to avoid running the handler in a threaded
context with threadirqs or PREEMPT_RT enabled.
Signed-off-by: Eric Chanudet <echanude@redhat.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>

b8ae88e1

mailbox: pcc: Fix spelling mistake "Plaform" -> "Platform" · 8ac11110

Committed by Colin Ian King on Sep 28, 2022

There is a spelling mistake in a pr_err message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>

8ac11110

mailbox: bcm-ferxrm-mailbox: Fix error check for dma_map_sg · 6b207ce8

Committed by Jack Wang on Aug 26, 2022

dma_map_sg returns 0 on error; fix the error check and return -EIO
to the caller.

Fixes: dbc049ee ("mailbox: Add driver for Broadcom FlexRM ring manager")
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>

6b207ce8
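The bug class here is a return-value convention mismatch. In the sketch below, `stub_map_sg()` is a hypothetical stand-in that follows the convention stated in the commit message (number of mapped entries on success, 0 on failure), and `map_request()` is an illustrative caller rather than the driver's function.

```c
#include <errno.h>

/* Stub following the stated convention: mapped entry count on success,
 * 0 on failure. The fail argument forces the error path. */
static int stub_map_sg(int nents, int fail)
{
    return fail ? 0 : nents;
}

/* Illustrative caller: a "rc < 0" test would never see the failure,
 * so the check has to compare against zero. */
static int map_request(int nents, int fail)
{
    int rc = stub_map_sg(nents, fail);

    if (rc == 0)
        return -EIO;
    return 0;
}
```

Returning a negative errno from the caller keeps the rest of the error handling unchanged while making the failure visible.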

mailbox: qcom-apcs-ipc: add IPQ8074 APSS clock support · f5fe925d

Committed by Robert Marko on Aug 19, 2022

IPQ8074 has the APSS clock controller utilizing the same register space as
the APCS, so provide access to the APSS utilizing a child device like
IPQ6018.

IPQ6018 and IPQ8074 use the same controller and driver, so just utilize
IPQ6018 match data for IPQ8074.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>

f5fe925d

mailbox: mpfs: account for mbox offsets while sending · 0d1aadfe

Committed by Conor Dooley on Aug 24, 2022

The mailbox offset is not only used for receiving messages, but it is
also used by messages sent to the system controller by Linux that have a
payload, such as the "digital signature service". It is also overloaded
by certain other services (reprogramming of the FPGA fabric, see Link:)
to have a meaning other than the offset the system controller should
read from.
When the driver was written, no services of the latter type were in
use, and those of the former used an offset of zero, so this went
unnoticed.

Link: https://www.microsemi.com/document-portal/doc_download/1245815-polarfire-fpga-and-polarfire-soc-fpga-system-services-user-guide # Section 5.2
Fixes: 83d7b156 ("mbox: add polarfire soc system controller mailbox")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>

0d1aadfe

mailbox: mpfs: fix handling of the reg property · 2e10289d

Committed by Conor Dooley on Aug 24, 2022

The "data" region of the PolarFire SoC's system controller mailbox is
not one continuous register space - the system controller's QSPI sits
between the control and data registers. Split the "data" reg into two
parts: "data" & "control". Optionally get the "data" register address
from the 3rd reg property in the devicetree & fall back to using the
old base + MAILBOX_REG_OFFSET that the current code uses.

Fixes: 83d7b156 ("mbox: add polarfire soc system controller mailbox")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>

2e10289d
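
The fallback described in this commit (use the 3rd reg entry if present, otherwise the old base plus MAILBOX_REG_OFFSET) can be sketched as a small pure function. This is only an illustration under stated assumptions: the function name, its parameters, and the value given to `MAILBOX_REG_OFFSET` here are placeholders, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value for illustration only; the commit message does not
 * state the driver's real MAILBOX_REG_OFFSET. */
#define MAILBOX_REG_OFFSET 0x800u

/*
 * Hypothetical helper: pick the base address of the mailbox "data" region.
 * If the devicetree supplied a 3rd reg entry, use it. Otherwise fall back
 * to the legacy layout, where the data region is assumed to sit at a fixed
 * offset from the control base.
 */
static uint64_t mbox_data_base(uint64_t ctrl_base, int has_third_reg,
                               uint64_t third_reg_base)
{
    assert(ctrl_base != 0);

    if (has_third_reg)
        return third_reg_base;
    return ctrl_base + MAILBOX_REG_OFFSET;
}
```

Keeping the fallback means devicetrees written before the split, which only describe the old layout, would presumably keep working.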

mailbox: imx: fix RST channel support · 7e5cd064

Committed by Peng Fan on Sep 19, 2022

When IMX_MU_xCR_MAX was increased to 5, some MU cfgs were not updated
to include the CR register. Add the missing CR register to the xcr array.

Fixes: 82ab513b ("mailbox: imx: support RST channel")
Reported-by: Liu Ying <victor.liu@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Tested-by: Liu Ying <victor.liu@nxp.com> # i.MX8qm/qxp MEK boards boot
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>

7e5cd064
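
Why a stale initializer goes wrong after an array grows can be shown with a small C sketch. In C, elements without an initializer are zero-initialized, so a config that lists only four entries silently gets offset 0 in the fifth slot. The struct name, the register offsets, and the choice of the last slot as the missing one are all made up for illustration; only the size 5 comes from the commit message.

```c
#include <assert.h>

#define XCR_MAX 5 /* mirrors IMX_MU_xCR_MAX == 5 from the commit message */

/* Hypothetical config struct; the offsets below are invented. */
struct mu_cfg {
    unsigned int xcr[XCR_MAX];
};

/* Stale config: written when the array had 4 entries, so the 5th element
 * is implicitly zero-initialized. */
static const struct mu_cfg cfg_stale = {
    .xcr = { 0x110, 0x114, 0x120, 0x128 },
};

/* Updated config: every slot is listed explicitly. */
static const struct mu_cfg cfg_fixed = {
    .xcr = { 0x110, 0x114, 0x120, 0x128, 0x10 },
};

/* Look up a control register offset by index. */
static unsigned int xcr_offset(const struct mu_cfg *cfg, unsigned int idx)
{
    assert(idx < XCR_MAX);
    return cfg->xcr[idx];
}
```

With the stale config, any code using the fifth entry would access offset 0 instead of the intended register, and the compiler gives no warning about it.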

power: supply: ab8500: remove unused static local variable · 189a2aae

Committed by Tom Rix on Jul 18, 2022

cppcheck reports:
[drivers/power/supply/ab8500_chargalg.c:493]: (style) Variable 'ab8500_chargalg_ex_ac_enable_toggle' is assigned a value that is never used.

From inspection, this variable is never used. So remove it.

Fixes: 6c50a08d ("power: supply: ab8500: Drop external charger leftovers")
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Chen Lifu <chenlifu@huawei.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

189a2aae

cpufreq: amd-pstate: Add explanation for X86_AMD_PSTATE_UT · bf6430f8

Committed by Meng Li on Sep 08, 2022

This kernel module is used for testing. It's safe to say M here.
It can also be built in without X86_AMD_PSTATE enabled.
Currently, only tests for amd-pstate are supported. If X86_AMD_PSTATE
is disabled, the module tells users that the tests can only run on the
amd-pstate driver and asks them to enable X86_AMD_PSTATE.
In the future, comparison tests will be added: amd-pstate can be
disabled and acpi-cpufreq enabled to run the test cases, and the test
results can then be compared.
Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

bf6430f8

cpufreq: amd-pstate: modify type in argument 2 for filp_open · ce29a148

Committed by Meng Li on Sep 06, 2022

Change the restricted FMODE_PREAD to the expected int O_RDONLY to
fix the sparse warnings below:
sparse warnings: (new ones prefixed by >>)
>> drivers/cpufreq/amd-pstate-ut.c:74:40: sparse: sparse: incorrect type in argument 2 (different base types) @@     expected int @@     got restricted fmode_t [usertype] @@
   drivers/cpufreq/amd-pstate-ut.c:74:40: sparse:     expected int
   drivers/cpufreq/amd-pstate-ut.c:74:40: sparse:     got restricted fmode_t [usertype]
Signed-off-by: Meng Li <li.meng@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

ce29a148

cpufreq: amd-pstate: Add test module for amd-pstate driver · 14eb1c96

Committed by Meng Li on Aug 17, 2022

Add the amd-pstate-ut test module, which kselftest uses to unit test
amd-pstate functionality. Some selftests expect this module to be
present and loaded.
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

14eb1c96

cpufreq: amd-pstate: Expose struct amd_cpudata · f1375ec1

Committed by Meng Li on Aug 17, 2022

Expose struct amd_cpudata to AMD P-State unit test module.

This struct will be used by the upcoming AMD P-State unit test
(amd-pstate-ut) module. The amd-pstate-ut module reads AMD-specific
information from this struct, for example: highest perf, nominal perf,
and whether boost is supported.
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

f1375ec1

05 Oct, 2022 2 commits

remoteproc: virtio: Fix warning on bindings by removing the of_match_table · ccf22a48

Committed by Arnaud Pouliquen on Oct 05, 2022

The checkpatch tool complains that "virtio,rproc" is not documented.
But it is not possible to probe the device "rproc-virtio" by declaring
it in the device tree. So documenting it in the bindings does not make
sense.
This commit resolves the checkpatch warning by removing the unneeded
of_match_table.
Suggested-by: Rob Herring <robh@kernel.org>
Fixes: 1d7b61c0 ("remoteproc: virtio: Create platform device for the remoteproc_virtio")
Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20221005081317.3411684-1-arnaud.pouliquen@foss.st.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>

ccf22a48

clk: qcom: gcc-sm6375: Ensure unsigned long type · 39bc9b58

Committed by Stephen Boyd on Oct 04, 2022

This PLL frequency needs a UL suffix to avoid compiler warnings on
32-bit architectures.

Fixes: 184fdd87 ("clk: qcom: Add global clock controller driver for SM6375")
Cc: Konrad Dybcio <konrad.dybcio@somainline.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>

39bc9b58
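
The effect of a UL suffix on an integer constant's type can be inspected with C11 `_Generic`. This is a user-space sketch, not the clock driver's code: the value 3000000000 is a hypothetical frequency chosen because it does not fit in a 32-bit int, and the commit message does not say which constant or which warning was involved.

```c
#include <assert.h>
#include <string.h>

/* Map an expression to the name of its integer type. */
#define TYPE_NAME(x) _Generic((x),              \
    int: "int",                                 \
    unsigned int: "unsigned int",               \
    long: "long",                               \
    unsigned long: "unsigned long",             \
    long long: "long long",                     \
    unsigned long long: "unsigned long long",   \
    default: "other")

/*
 * Report the type the compiler picks for a hypothetical 3 GHz constant,
 * with or without the UL suffix. Without a suffix, a decimal constant
 * takes the first signed type that can hold it, which differs between
 * 32-bit and 64-bit targets. With UL it is unsigned long wherever the
 * value fits in that type.
 */
static const char *clk_rate_type(int with_suffix)
{
    assert(with_suffix == 0 || with_suffix == 1);
    return with_suffix ? TYPE_NAME(3000000000UL) : TYPE_NAME(3000000000);
}
```

Pinning the constant to unsigned long keeps its type the same as a field declared `unsigned long`, regardless of the target's word size.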
