1. 31 March 2022 (1 commit)
    • drbd: fix potential silent data corruption · f4329d1f
      Authored by Lars Ellenberg
      Scenario:
      ---------
      
      bio chain generated by blk_queue_split().
      Some split bio fails and propagates its error status to the "parent" bio.
      But then the (last part of the) parent bio itself completes without error.
      
      We would clobber the already recorded error status with BLK_STS_OK,
      causing silent data corruption.
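
      The rule the fix must enforce is simple: once any bio in the chain has
      recorded an error, a later successful completion must not reset it to
      BLK_STS_OK.  A minimal sketch of that merging rule (helper name is
      hypothetical; this is not the literal DRBD patch):

              /* keep the first recorded error; never clobber it with BLK_STS_OK */
              static inline blk_status_t merge_chain_status(blk_status_t recorded,
                                                            blk_status_t latest)
              {
                      return recorded != BLK_STS_OK ? recorded : latest;
              }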
      
      Reproducer:
      -----------
      
      How to trigger this in the real world within seconds:
      
      DRBD on top of degraded parity raid,
      small stripe_cache_size, large read_ahead setting.
      Drop page cache (sysctl vm.drop_caches=1, fadvise "DONTNEED",
      umount and mount again, "reboot").
      
      Cause significant read ahead.
      
      Large read ahead request is split by blk_queue_split().
      Parts of the read ahead that are already in the stripe cache,
      or find an available stripe cache to use, can be serviced.
      Parts of the read ahead that would need "too much work",
      i.e. would have to wait for a "stripe_head" to become available,
      are rejected immediately.
      
      For larger read ahead requests that are split in many pieces, it is very
      likely that some "splits" will be serviced, but then the stripe cache is
      exhausted/busy, and the remaining ones will be rejected.
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
      Cc: <stable@vger.kernel.org> # 4.13.x
      Link: https://lore.kernel.org/r/20220330185551.3553196-1-christoph.boehmwalder@linbit.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 30 March 2022 (1 commit)
  3. 29 March 2022 (6 commits)
    • Merge tag 'nvme-5.18-2022-03-29' of git://git.infradead.org/nvme into for-5.18/drivers · 1e06b3e7
      Authored by Jens Axboe
      Pull NVMe fixes from Christoph:
      
      "- fix multipath hang when disk goes live over reconnect (Anton Eidelman)
       - fix RCU hole that allowed for endless looping in multipath round robin
         (Chris Leech)
       - remove redundant assignment after left shift (Colin Ian King)
       - add quirks for Samsung X5 SSDs (Monish Kumar R)
       - fix the read-only state for zoned namespaces with unsupported features
         (Pankaj Raghav)
       - use a private workqueue instead of the system workqueue in nvmet
         (Sagi Grimberg)
       - allow duplicate NSIDs for private namespaces (Sungup Moon)
       - expose use_threaded_interrupts read-only in sysfs (Xin Hao)"
      
      * tag 'nvme-5.18-2022-03-29' of git://git.infradead.org/nvme:
        nvme-multipath: fix hang when disk goes live over reconnect
        nvme: fix RCU hole that allowed for endless looping in multipath round robin
        nvme: allow duplicate NSIDs for private namespaces
        nvmet: remove redundant assignment after left shift
        nvmet: use a private workqueue instead of the system workqueue
        nvme-pci: add quirks for Samsung X5 SSDs
        nvme-pci: expose use_threaded_interrupts read-only in sysfs
        nvme: fix the read-only state for zoned namespaces with unsupported features
    • nvme-multipath: fix hang when disk goes live over reconnect · a4a6f3c8
      Authored by Anton Eidelman
      nvme_mpath_init_identify() invoked from nvme_init_identify() fetches a
      fresh ANA log from the ctrl.  This is essential to have up-to-date
      path states for both existing namespaces and for those scan_work may
      discover once the ctrl is up.
      
      This happens in the following cases:
        1) A new ctrl is being connected.
        2) An existing ctrl is successfully reconnected.
        3) An existing ctrl is being reset.
      
      While in (1) ctrl->namespaces is empty, in (2) and (3) namespaces may
      already exist, and nvme_read_ana_log() may call nvme_update_ns_ana_state().

      This results in a hang when the ANA state of an existing namespace changes
      and makes the disk live: nvme_mpath_set_live() issues IO to the namespace
      through the ctrl, which does NOT have IO queues yet.
      
      See sample hang below.
      
      Solution:
      - nvme_update_ns_ana_state() calls set_live only if the ctrl is live
        (see the sketch below);
      - the nvme_read_ana_log() call from nvme_mpath_init_identify()
        therefore only fetches and parses the ANA log;
        any errors in this process will fail the ctrl setup as appropriate;
      - a separate function, nvme_mpath_update(),
        is called in nvme_start_ctrl();
        this parses the ANA log without fetching it.
        At this point the ctrl is live,
        so disks can be set live normally.
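
      A sketch of the guard from the first bullet (simplified; treat the exact
      condition as an assumption rather than a quote from the patch):

              static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
                                                   struct nvme_ns *ns)
              {
                      ns->ana_grpid = le32_to_cpu(desc->grpid);
                      ns->ana_state = desc->state;
                      clear_bit(NVME_NS_ANA_PENDING, &ns->flags);

                      /* only set the disk live while the ctrl can do IO;
                       * during init/reconnect the IO queues may not exist yet */
                      if (nvme_state_is_live(ns->ana_state) &&
                          ns->ctrl->state == NVME_CTRL_LIVE)
                              nvme_mpath_set_live(ns);
              }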
      
      Sample failure:
          nvme nvme0: starting error recovery
          nvme nvme0: Reconnecting in 10 seconds...
          block nvme0n6: no usable path - requeuing I/O
          INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
                Tainted: G            E     5.14.5-1.el7.elrepo.x86_64 #1
          Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
          Call Trace:
           __schedule+0x2a2/0x7e0
           schedule+0x4e/0xb0
           io_schedule+0x16/0x40
           wait_on_page_bit_common+0x15c/0x3e0
           do_read_cache_page+0x1e0/0x410
           read_cache_page+0x12/0x20
           read_part_sector+0x46/0x100
           read_lba+0x121/0x240
           efi_partition+0x1d2/0x6a0
           bdev_disk_changed.part.0+0x1df/0x430
           bdev_disk_changed+0x18/0x20
           blkdev_get_whole+0x77/0xe0
           blkdev_get_by_dev+0xd2/0x3a0
           __device_add_disk+0x1ed/0x310
           device_add_disk+0x13/0x20
           nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
           nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
           nvme_update_ana_state+0xca/0xe0 [nvme_core]
           nvme_parse_ana_log+0xac/0x170 [nvme_core]
           nvme_read_ana_log+0x7d/0xe0 [nvme_core]
           nvme_mpath_init_identify+0x105/0x150 [nvme_core]
           nvme_init_identify+0x2df/0x4d0 [nvme_core]
           nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
           nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
           nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
           process_one_work+0x1bd/0x360
           worker_thread+0x50/0x3d0
      Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: fix RCU hole that allowed for endless looping in multipath round robin · d6d67427
      Authored by Chris Leech
      Make nvme_ns_remove match the assumptions elsewhere.
      
      1) !NVME_NS_READY needs to be srcu synchronized to make sure nothing is
         running in __nvme_find_path or nvme_round_robin_path that will
         re-assign this ns to current_path.
      
      2) Any matching current_path entries need to be cleared before removing
         from the siblings list, to prevent calling nvme_round_robin_path with
         an "old" ns that's off list.
      
      3) Finally the list_del_rcu can happen, and then synchronize again
         before releasing any reference counts.
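
      A sketch of that ordering, reduced to the relevant calls (illustrative;
      the real nvme_ns_remove() does more teardown around these steps):

              static void nvme_ns_remove_ordering(struct nvme_ns *ns) /* illustrative name */
              {
                      /* 1) stop path selection from picking this ns again */
                      clear_bit(NVME_NS_READY, &ns->flags);
                      synchronize_srcu(&ns->head->srcu);

                      /* 2) clear cached current_path entries before unlinking */
                      nvme_mpath_clear_current_path(ns);

                      /* 3) unlink, then synchronize again before dropping refs */
                      list_del_rcu(&ns->siblings);
                      synchronize_srcu(&ns->head->srcu);
              }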
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: allow duplicate NSIDs for private namespaces · 5974ea7c
      Authored by Sungup Moon
      An NVMe subsystem with multiple controllers can have private namespaces
      that use the same NSID under some conditions:
      
       "If Namespace Management, ANA Reporting, or NVM Sets are supported, the
        NSIDs shall be unique within the NVM subsystem. If the Namespace
        Management, ANA Reporting, and NVM Sets are not supported, then NSIDs:
         a) for shared namespace shall be unique; and
         b) for private namespace are not required to be unique."
      
      Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec.
      
      Make sure this specific setup is supported in Linux.
      
      Fixes: 9ad1927a ("nvme: always search for namespace head")
      Signed-off-by: Sungup Moon <sungup.moon@samsung.com>
      [hch: refactored and fixed the controller vs subsystem based naming
            conflict]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    • nvmet: remove redundant assignment after left shift · 63bc732c
      Authored by Colin Ian King
      The left shift is followed by a re-assignment back to cc_css; the
      assignment is redundant.  Fix this by replacing the "<<=" operator
      with "<<".
      
      This cleans up the clang scan build warning:
      
      drivers/nvme/target/core.c:1124:10: warning: Although the value stored to 'cc_css' is used in the enclosing expression, the value is never actually read from 'cc_css' [deadcode.DeadStores]
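
      A minimal before/after sketch of the substitution (the wrapper function
      is hypothetical; the NVME_CC_CSS_* macros are the existing ones from
      include/linux/nvme.h):

              static bool css_supported(u8 cc_css)
              {
                      /* before: switch (cc_css <<= NVME_CC_CSS_SHIFT)  <- dead store */
                      /* after: the shifted value is used directly, nothing written back */
                      switch (cc_css << NVME_CC_CSS_SHIFT) {
                      case NVME_CC_CSS_NVM:
                      case NVME_CC_CSS_CSI:
                              return true;
                      default:
                              return false;
                      }
              }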
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvmet: use a private workqueue instead of the system workqueue · 8832cf92
      Authored by Sagi Grimberg
      Any attempt to flush kernel-global WQs has the possibility of deadlock,
      so we should simply stop using them.  Instead, introduce nvmet_wq,
      the generic nvmet workqueue for work elements that don't explicitly
      require a dedicated workqueue (by the mere fact that they are using
      the system_wq); a setup sketch follows the list of replacements below.
      
      Changes were done using the following replaces:
      
       - s/schedule_work(/queue_work(nvmet_wq, /g
       - s/schedule_delayed_work(/queue_delayed_work(nvmet_wq, /g
       - s/flush_scheduled_work()/flush_workqueue(nvmet_wq)/g
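
      A minimal sketch of the workqueue setup itself, assuming it is created
      in the module init path (the WQ_MEM_RECLAIM flag is an assumption, not
      quoted from the patch):

              static struct workqueue_struct *nvmet_wq;

              static int __init nvmet_init(void)
              {
                      nvmet_wq = alloc_workqueue("nvmet-wq", WQ_MEM_RECLAIM, 0);
                      if (!nvmet_wq)
                              return -ENOMEM;
                      /* work items are queued with queue_work(nvmet_wq, ...) */
                      return 0;
              }

              static void __exit nvmet_exit(void)
              {
                      flush_workqueue(nvmet_wq);  /* flushes only nvmet's own work */
                      destroy_workqueue(nvmet_wq);
              }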
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  4. 23 March 2022 (3 commits)
  5. 21 March 2022 (3 commits)
  6. 18 March 2022 (2 commits)
    • Merge tag 'nvme-5.18-2022-03-17' of git://git.infradead.org/nvme into for-5.18/drivers · ae53aea6
      Authored by Jens Axboe
      Pull NVMe updates from Christoph:
      
      "Second round of nvme updates for Linux 5.18
      
       - add lockdep annotations for in-kernel sockets (Chris Leech)
       - use vmalloc for ANA log buffer (Hannes Reinecke)
       - kerneldoc fixes (Chaitanya Kulkarni)
       - cleanups (Guoqing Jiang, Chaitanya Kulkarni, me)
       - warn about shared namespaces without multipathing (me)"
      
      * tag 'nvme-5.18-2022-03-17' of git://git.infradead.org/nvme:
        nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH
        nvme: remove nvme_alloc_request and nvme_alloc_request_qid
        nvme: cleanup how disk->disk_name is assigned
        nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
        nvmet: use snprintf() with PAGE_SIZE in configfs
        nvmet: don't fold lines
        nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal
        nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport
        nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport
        nvme-tcp: lockdep: annotate in-kernel sockets
        nvme-tcp: don't fold the line
        nvme-tcp: don't initialize ret variable
        nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio
        nvme-multipath: use vmalloc for ANA log buffer
    • virtio_blk: eliminate anonymous module_init & module_exit · bcfe9b6c
      Authored by Randy Dunlap
      Eliminate anonymous module_init() and module_exit(), which can lead to
      confusion or ambiguity when reading System.map, crashes/oops/bugs,
      or an initcall_debug log.
      
      Give each of these init and exit functions unique driver-specific
      names to eliminate the anonymous names.
      
      Example 1: (System.map)
       ffffffff832fc78c t init
       ffffffff832fc79e t init
       ffffffff832fc8f8 t init
      
      Example 2: (initcall_debug log)
       calling  init+0x0/0x12 @ 1
       initcall init+0x0/0x12 returned 0 after 15 usecs
       calling  init+0x0/0x60 @ 1
       initcall init+0x0/0x60 returned 0 after 2 usecs
       calling  init+0x0/0x9a @ 1
       initcall init+0x0/0x9a returned 0 after 74 usecs
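
      A before/after sketch of the rename (the new symbol name is inferred
      from the driver name; the function body is elided):

              /* before: an anonymous name shared with every other "init" */
              static int __init init(void)
              {
                      /* ... driver registration ... */
                      return 0;
              }
              module_init(init);

              /* after: a unique, driver-specific name; the exit function is
               * renamed the same way */
              static int __init virtio_blk_init(void)
              {
                      /* ... driver registration ... */
                      return 0;
              }
              module_init(virtio_blk_init);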
      
      Fixes: e467cde2 ("Block driver using virtio.")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: virtualization@lists.linux-foundation.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Link: https://lore.kernel.org/r/20220316192010.19001-2-rdunlap@infradead.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 16 March 2022 (3 commits)
  8. 15 March 2022 (1 commit)
  9. 14 March 2022 (10 commits)
  10. 11 March 2022 (2 commits)
  11. 09 March 2022 (8 commits)