Commits · 7d30c81b80ea9b0812d27030a46a5bf4c4e328f5 · openeuler / Kernel

12 Jul, 2019 1 commit

nvme: fix NULL deref for fabrics options · 7d30c81b

Authored by Minwoo Im on Jul 12, 2019

git://git.infradead.org/nvme.git nvme-5.3 branch now causes the
following NULL deref oops.  Check the ctrl->opts first before the deref.

[   16.337581] BUG: kernel NULL pointer dereference, address: 0000000000000056
[   16.338551] #PF: supervisor read access in kernel mode
[   16.338551] #PF: error_code(0x0000) - not-present page
[   16.338551] PGD 0 P4D 0
[   16.338551] Oops: 0000 [#1] SMP PTI
[   16.338551] CPU: 2 PID: 1035 Comm: kworker/u16:5 Not tainted 5.2.0-rc6+ #1
[   16.338551] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[   16.338551] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[   16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core]
[   16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf
[   16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283
[   16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007
[   16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720
[   16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840
[   16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8
[   16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001
[   16.338551] FS:  0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000
[   16.338551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0
[   16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   16.338551] Call Trace:
[   16.338551]  nvme_scan_work+0x2c0/0x340 [nvme_core]
[   16.338551]  ? __switch_to_asm+0x40/0x70
[   16.338551]  ? _raw_spin_unlock_irqrestore+0x18/0x30
[   16.338551]  ? try_to_wake_up+0x408/0x450
[   16.338551]  process_one_work+0x20b/0x3e0
[   16.338551]  worker_thread+0x1f9/0x3d0
[   16.338551]  ? cancel_delayed_work+0xa0/0xa0
[   16.338551]  kthread+0x117/0x120
[   16.338551]  ? kthread_stop+0xf0/0xf0
[   16.338551]  ret_from_fork+0x3a/0x50
[   16.338551] Modules linked in: nvme nvme_core
[   16.338551] CR2: 0000000000000056
[   16.338551] ---[ end trace b9bf761a93e62d84 ]---
[   16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core]
[   16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf
[   16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283
[   16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007
[   16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720
[   16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840
[   16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8
[   16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001
[   16.338551] FS:  0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000
[   16.338551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0
[   16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 958f2a0f ("nvme-tcp: set the STABLE_WRITES flag when data digests are enabled")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
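
To illustrate the check described in this commit, here is a minimal user-space sketch. The names `ctrl_options`, `ctrl` and `ctrl_wants_stable_writes` are hypothetical stand-ins and not the kernel's definitions; the only point taken from the commit message is that ctrl->opts has to be tested before it is dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the fabrics options. */
struct ctrl_options {
    bool data_digest;
};

/* Hypothetical, cut-down stand-in for the controller. */
struct ctrl {
    /* May be NULL: the oops above shows RDX = 0 at the faulting load. */
    struct ctrl_options *opts;
};

/* True only when options exist AND data digests are enabled.
 * The && short-circuits, so opts is never dereferenced when NULL. */
static bool ctrl_wants_stable_writes(const struct ctrl *ctrl)
{
    return ctrl->opts != NULL && ctrl->opts->data_digest;
}
```

With the pointer test first, a controller without options simply reports false instead of faulting.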


11 Jul, 2019 7 commits

Merge branch 'nvme-5.3' of git://git.infradead.org/nvme into for-linus · b7403066

Authored by Jens Axboe on Jul 11, 2019

Pull NVMe fixes from Christoph:

"Lot of fixes all over the place, and two very minor features that
 were in the nvme tree by the end of the merge window, but hadn't made
 it out to Jens yet."

* 'nvme-5.3' of git://git.infradead.org/nvme:
  nvme: fix regression upon hot device removal and insertion
  nvme-fc: fix module unloads while lports still pending
  nvme-tcp: don't use sendpage for SLAB pages
  nvme-tcp: set the STABLE_WRITES flag when data digests are enabled
  nvmet: print a hint while rejecting NSID 0 or 0xffffffff
  nvme-multipath: do not select namespaces which are about to be removed
  nvme-multipath: also check for a disabled path if there is a single sibling
  nvme-multipath: factor out a nvme_path_is_disabled helper
  nvme: set physical block size and optimal I/O size
  nvme: add I/O characteristics fields
  nvmet: export I/O characteristics attributes in Identify
  nvme-trace: add delete completion and submission queue to admin cmds tracer
  nvme-trace: fix spelling mistake "spcecific" -> "specific"
  nvme-pci: limit max_hw_sectors based on the DMA max mapping size
  nvme-pci: check for NULL return from pci_alloc_p2pmem()
  nvme-pci: don't create a read hctx mapping without read queues
  nvme-pci: don't fall back to a 32-bit DMA mask
  nvme-pci: make nvme_dev_pm_ops static
  nvme-fcloop: resolve warnings on RCU usage and sleep warnings
  nvme-fcloop: fix inconsistent lock state warnings


nbd: add netlink reconfigure resize support · 4ddeaae8

Authored by Mike Christie on May 29, 2019

If the device is set up with ioctl we can resize the device after the
initial setup, but if the device is set up with netlink we cannot use the
resize related ioctls and there is no netlink reconfigure size ATTR
handling code.

This patch adds netlink reconfigure resize support to match the ioctl
interface.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nbd: fix crash when the blksize is zero · 553768d1

Authored by Xiubo Li on May 29, 2019

This will allow the blksize to be set to zero and then use 1024 as the
default.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
[fix to use goto out instead of return in genl_connect]
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Disable write plugging for zoned block devices · b49773e7

Authored by Damien Le Moal on Jul 11, 2019

Simultaneously writing to a sequential zone of a zoned block device
from multiple contexts requires mutual exclusion for BIO issuing to
ensure that writes happen sequentially. However, even for a well
behaved user correctly implementing such synchronization, BIO plugging
may interfere and result in BIOs from the different contexts being
reordered if plugging is done outside of the mutual exclusion section,
e.g. the plug was started by a function higher in the call chain than
the function issuing BIOs.

         Context A                     Context B

   | blk_start_plug()
   | ...
   | seq_write_zone()
     | mutex_lock(zone)
     | bio-0->bi_iter.bi_sector = zone->wp
     | zone->wp += bio_sectors(bio-0)
     | submit_bio(bio-0)
     | bio-1->bi_iter.bi_sector = zone->wp
     | zone->wp += bio_sectors(bio-1)
     | submit_bio(bio-1)
     | mutex_unlock(zone)
     | return
   | -----------------------> | seq_write_zone()
                                | mutex_lock(zone)
                                | bio-2->bi_iter.bi_sector = zone->wp
                                | zone->wp += bio_sectors(bio-2)
                                | submit_bio(bio-2)
                                | mutex_unlock(zone)
   | <------------------------- |
   | blk_finish_plug()

In the above example, despite the mutex synchronization ensuring the
correct BIO issuing order 0, 1, 2, context A BIOs 0 and 1 end up being
issued after BIO 2 of context B, when the plug is released with
blk_finish_plug().

While this problem can be addressed using the blk_flush_plug_list()
function (in the above example, the call must be inserted before the
zone mutex lock is released), a simple generic solution in the block
layer avoids this additional code in all zoned block device user code.
The simple generic solution implemented with this patch is to introduce
the internal helper function blk_mq_plug() to access the current
context plug on BIO submission. This helper returns the current plug
only if the target device is not a zoned block device or if the BIO to
be plugged is not a write operation. Otherwise, the caller context plug
is ignored and NULL returned, resulting in all writes to zoned block
devices never being plugged.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
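
The decision rule stated for blk_mq_plug() can be sketched in user-space C as follows. This is not the kernel helper: `struct plug`, `struct submit_ctx` and `pick_plug` are hypothetical names, and the two booleans stand in for whatever the kernel inspects to learn that the device is zoned and that the BIO is a write.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder for the per-context plug. */
struct plug {
    int unused;
};

/* Hypothetical summary of one BIO submission. */
struct submit_ctx {
    struct plug *current_plug; /* plug started by the caller, or NULL */
    bool dev_is_zoned;         /* target is a zoned block device */
    bool bio_is_write;         /* BIO is a write operation */
};

/* Return the caller's plug unless this is a write to a zoned device,
 * in which case NULL is returned so the write is never plugged. */
static struct plug *pick_plug(const struct submit_ctx *c)
{
    if (!c->dev_is_zoned || !c->bio_is_write)
        return c->current_plug;
    return NULL;
}
```

Reads to zoned devices and all I/O to regular devices keep their plug; only zoned writes bypass it.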


block: Fix elevator name declaration · 9305d5d7

Authored by Damien Le Moal on Jul 11, 2019

The elevator_name field in struct elevator_type is declared as an array
of characters (ELV_NAME_MAX size) but in practice used as a string
pointer with its initialization done statically within each
elevator's elevator_type structure declaration.

Change the declaration of elevator_name to the more appropriate
"const char *" type.

Acked-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Remove unused definitions · 36847a00

Authored by Damien Le Moal on Jul 11, 2019

The ELV_MQUEUE_XXX definitions in include/linux/elevator.h are unused
since the removal of elevator_may_queue_fn in kernel 5.0. Remove these
definitions and also remove the documentation of elevator_may_queue_fn
in Documentation/block/biodoc.txt.

Acked-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: fix regression upon hot device removal and insertion · 420dc733

Authored by Sagi Grimberg on Jul 10, 2019

When we validate the new controller id, we want to skip
controllers that are either deleting or dead. Fix the check so that
it tests those existing controllers rather than the newly added one.

Fixes: 1b1031ca ("nvme: validate cntlid during controller initialisation")
Reported-by: Jon Derrick <jonathan.derrick@intel.com>
Tested-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

10 Jul, 2019 32 commits

blk-throttle: fix zero wait time for iops throttled group · 3a10f999

Authored by Konstantin Khlebnikov on Jul 08, 2019

After commit 991f61fe ("Blk-throttle: reduce tail io latency when
iops limit is enforced") wait time could be zero even if group is
throttled and cannot issue requests right now. As a result
throtl_select_dispatch() turns into busy-loop under irq-safe queue
spinlock.

Fix is simple: always round up target time to the next throttle slice.

Fixes: 991f61fe ("Blk-throttle: reduce tail io latency when iops limit is enforced")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
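
One way to read "always round up target time to the next throttle slice" is sketched below. This is an illustration under stated assumptions, not the blk-throttle code: `round_up_to_next_slice` and `wait_time` are hypothetical names, times are plain tick counts, and `slice` must be nonzero. It only shows why rounding strictly past the elapsed time cannot produce a zero interval; the kernel's actual wait computation is more involved.

```c
/* Round the elapsed time up to the start of the NEXT slice.
 * The result is always strictly greater than elapsed, even when
 * elapsed already sits exactly on a slice boundary. */
static unsigned long round_up_to_next_slice(unsigned long elapsed,
                                            unsigned long slice)
{
    return (elapsed / slice + 1) * slice;
}

/* Distance from now to that boundary; never zero. */
static unsigned long wait_time(unsigned long elapsed, unsigned long slice)
{
    return round_up_to_next_slice(elapsed, slice) - elapsed;
}
```

A nonzero wait is what keeps a dispatch loop from spinning on a group that cannot issue requests yet.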


block: Fix potential overflow in blk_report_zones() · 113ab72e

Authored by Damien Le Moal on Jul 10, 2019

For large values of the number of zones reported and/or large zone
sizes, the sector increment calculated with

blk_queue_zone_sectors(q) * n

in blk_report_zones() loop can overflow the unsigned int type used for
the calculation as both "n" and blk_queue_zone_sectors() value are
unsigned int. E.g. for a device with 256 MB zones (524288 sectors),
overflow happens with 8192 or more zones reported.

Changing the return type of blk_queue_zone_sectors() to sector_t fixes
this problem and avoids overflow problems for all other callers of this
helper too. The same change is also applied to the bdev_zone_sectors()
helper.

Fixes: e76239a3 ("block: add a report_zones method")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
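
The arithmetic in this commit message can be checked with a small sketch. Here `uint32_t` stands in for a 32-bit unsigned int and `uint64_t` for sector_t, which is an assumption about the type widths; the function names are hypothetical. With 256 MB zones (524288 = 2^19 sectors) and 8192 = 2^13 zones, the product is 2^32, which does not fit in 32 bits.

```c
#include <stdint.h>

/* 32-bit product, standing in for unsigned int * unsigned int.
 * Unsigned arithmetic wraps modulo 2^32. */
static uint32_t sector_increment_32(uint32_t zone_sectors, uint32_t nr_zones)
{
    return zone_sectors * nr_zones;
}

/* 64-bit product, standing in for a zone size typed as sector_t
 * (assumed to be 64 bits wide here). */
static uint64_t sector_increment_64(uint64_t zone_sectors, uint32_t nr_zones)
{
    return zone_sectors * nr_zones;
}
```

Calling `sector_increment_32(524288, 8192)` yields 0, while `sector_increment_64(524288, 8192)` yields 4294967296, so widening one operand is enough to keep the full result.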


blkcg: implement REQ_CGROUP_PUNT · d3f77dfd

Authored by Tejun Heo on Jun 27, 2019

When a shared kthread needs to issue a bio for a cgroup, doing so
synchronously can lead to priority inversions as the kthread can be
trapped waiting for that cgroup.  This patch implements
REQ_CGROUP_PUNT flag which makes submit_bio() punt the actual issuing
to a dedicated per-blkcg work item to avoid such priority inversions.

This will be used to fix priority inversions in btrfs compression and
should be generally useful as we grow filesystem support for
comprehensive IO control.

Cc: Chris Mason <clm@fb.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg, writeback: Implement wbc_blkcg_css() · 653c45c6

Authored by Tejun Heo on Jun 27, 2019

Add a helper to determine the target blkcg from wbc.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg, writeback: Add wbc->no_cgroup_owner · 27b36d8f

Authored by Tejun Heo on Jun 27, 2019

When writeback IOs are bounced through async layers, the IOs should
only be accounted against the wbc from the original bdi writeback to
avoid confusing cgroup inode ownership arbitration.  Add
wbc->no_cgroup_owner to allow disabling wbc cgroup owner accounting.
This will be used to make btrfs compression work well with cgroup IO
control.

v2: Renamed from no_wbc_acct to no_cgroup_owner and added comment as
    per Jan.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner() · 34e51a5e

Authored by Tejun Heo on Jun 27, 2019

wbc_account_io() does a very specific job - try to see which cgroup is
actually dirtying an inode and transfer its ownership to the majority
dirtier if needed.  The name is too generic and confusing.  Let's
rename it to something more specific.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cgroup, blkcg: Prepare some symbols for module and !CONFIG_CGROUP usages · 9b0eb69b

Authored by Tejun Heo on Jun 27, 2019

btrfs is going to use css_put() and wbc helpers to improve cgroup
writeback support.  Add dummy css_get() definition and export wbc
helpers to prepare for module and !CONFIG_CGROUP builds.

Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-cgroup: turn on psi memstall stuff · fd112c74

Authored by Josef Bacik on Jul 09, 2019

With the psi stuff in place we can use the memstall flag to indicate
pressure that happens from throttling.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: init flush rq ref count to 1 · b554db14

Authored by Josef Bacik on Mar 07, 2019

We discovered a problem in newer kernels where a disconnect of a NBD
device while the flush request was pending would result in a hang.  This
is because the blk mq timeout handler does

        if (!refcount_inc_not_zero(&rq->ref))
                return true;

to determine if it's ok to run the timeout handler for the request.
Flush_rq's don't have a ref count set, so we'd skip running the timeout
handler for this request and it would just sit there in limbo forever.

Fix this by always setting the refcount of any request going through
blk_init_rq() to 1.  I tested this with a nbd-server that dropped flush
requests to verify that it hung, and then tested with this patch to
verify I got the timeout as expected and the error handling kicked in.
Thanks,

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
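
The effect of the reference count on the timeout path can be modelled with a small user-space sketch. `ref_inc_not_zero` below is a simplified, non-atomic stand-in for refcount_inc_not_zero(), and `struct request` and `timeout_would_run` are hypothetical reductions of the real structures; the sketch only shows why a count of 0 causes the request to be skipped and a count of 1 does not.

```c
#include <stdbool.h>

/* Hypothetical, cut-down request carrying only its reference count. */
struct request {
    unsigned int ref;
};

/* Non-atomic stand-in for refcount_inc_not_zero():
 * take a reference only if the count is not already zero. */
static bool ref_inc_not_zero(unsigned int *ref)
{
    if (*ref == 0)
        return false;
    (*ref)++;
    return true;
}

/* True when the timeout path would go on to handle this request,
 * false when it would be skipped because no reference can be taken. */
static bool timeout_would_run(struct request *rq)
{
    return ref_inc_not_zero(&rq->ref);
}
```

A flush request left at a count of 0 is skipped on every pass, which matches the hang described above; initialising the count to 1 lets the timeout path take its reference.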


nvme-fc: fix module unloads while lports still pending · 4c73cbdf

Authored by James Smart on Jun 28, 2019

Current code allows the module to be unloaded even if there are
pending data structures, such as localports and controllers on
the localports, that have yet to hit their reference counting
to remove them.

Fix by having exit entrypoint explicitly delete every controller,
which in turn will remove references on the remoteports and localports
causing them to be deleted as well. The exit entrypoint, after
initiating the deletes, will wait for the last localport to be deleted
before continuing.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-tcp: don't use sendpage for SLAB pages · 37c15219

Authored by Mikhail Skorzhinskii on Jul 08, 2019

According to commit a10674bf ("tcp: detecting the misuse of
.sendpage for Slab objects") and previous discussion, tcp_sendpage
should not be used for pages that are managed by SLAB, as SLAB is not
taking page reference counters into consideration.

Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-tcp: set the STABLE_WRITES flag when data digests are enabled · 958f2a0f

Authored by Mikhail Skorzhinskii on Jul 04, 2019

There were a few false alarms sighted on the target side about wrong data
digests while performing a high throughput load to an XFS filesystem shared
through NVMoF TCP.

This flag tells the rest of the kernel to ensure that the data buffer
does not change while the write is in flight.  It incurs a performance
penalty, so only enable it when it is actually needed, i.e. when we are
calculating data digests.

Even with this change in place, ext2 users can still experience
false positives, as ext2 does not respect this flag. This may apply
to vfat as well.

Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Signed-off-by: Mike Playle <mplayle@solarflare.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: print a hint while rejecting NSID 0 or 0xffffffff · 5ba89503

Authored by Mikhail Skorzhinskii on Jul 04, 2019

Adding this hint for the sake of convenience.

It was spotted that people have several times spent a while before
understanding what exactly is wrong in the configuration process.  This
should save some time in such situations, especially for people who
are not very confident with NVMe requirements.

Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-multipath: do not select namespaces which are about to be removed · 04e70bd4

Authored by Hannes Reinecke on Jul 04, 2019

nvme_ns_remove() will first set the NVME_NS_REMOVING flag before removing
it from the list at the very last step.
So to avoid selecting a namespace in nvme_find_path() which is about to be
removed, check the NVME_NS_REMOVING flag, too, when selecting a new path.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
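
The idea of skipping namespaces that are flagged for removal but still on the list can be sketched as below. This is a simplified illustration, not nvme_find_path(): the flag value `NS_REMOVING`, `struct ns`, the `usable` field and `find_path` are all hypothetical, and the real code walks a sibling list rather than an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit marking a namespace that is being removed. */
enum { NS_REMOVING = 1u << 0 };

/* Hypothetical, cut-down namespace / path descriptor. */
struct ns {
    unsigned int flags;
    bool usable; /* stands in for every other path check */
};

/* Return the first usable path that is not about to be removed,
 * or NULL when none qualifies. */
static struct ns *find_path(struct ns *list, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (list[i].flags & NS_REMOVING)
            continue; /* still on the list, but going away */
        if (!list[i].usable)
            continue;
        return &list[i];
    }
    return NULL;
}
```

Because the flag is set before the entry leaves the list, checking it during selection closes the window in which a dying namespace could still be picked.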


nvme-multipath: also check for a disabled path if there is a single sibling · 2032d074

Authored by Hannes Reinecke on Jul 04, 2019

When we have a singular list in nvme_round_robin_path() we still
need to check its validity.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-multipath: factor out a nvme_path_is_disabled helper · ca7ae5c9

Authored by Hannes Reinecke on Jul 04, 2019

Factor out a common helper to check if a path has been disabled
by something other than the per-namespace ANA state.

Signed-off-by: Hannes Reinecke <hare@suse.com>
[hch: split from a bigger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: set physical block size and optimal I/O size · 81adb863

Authored by Bart Van Assche on Jun 28, 2019

From the NVMe 1.4 spec:

NSFEAT bit 4 if set to 1: indicates that the fields NPWG, NPWA, NPDG, NPDA,
and NOWS are defined for this namespace and should be used by the host for
I/O optimization;
[ ... ]
Namespace Preferred Write Granularity (NPWG): This field indicates the
smallest recommended write granularity in logical blocks for this namespace.
This is a 0's based value. The size indicated should be less than or equal
to Maximum Data Transfer Size (MDTS) that is specified in units of minimum
memory page size. The value of this field may change if the namespace is
reformatted. The size should be a multiple of Namespace Preferred Write
Alignment (NPWA). Refer to section 8.25 for how this field is utilized to
improve performance and endurance.
[ ... ]
Each Write, Write Uncorrectable, or Write Zeroes commands should address a
multiple of Namespace Preferred Write Granularity (NPWG) (refer to Figure
245) and Stream Write Size (SWS) (refer to Figure 515) logical blocks (as
expressed in the NLB field), and the SLBA field of the command should be
aligned to Namespace Preferred Write Alignment (NPWA) (refer to Figure 245)
for best performance.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
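
The "0's based" encoding quoted above means that a stored NPWG value of n denotes n + 1 logical blocks. The sketch below turns that into a byte count; it is an illustration rather than the driver code. The macro name `NSFEAT_IO_OPT` and the function `preferred_write_bytes` are hypothetical, and treating the result as the physical block size is an assumption based on the commit title.

```c
#include <stdint.h>

/* Hypothetical name for NSFEAT bit 4 ("NPWG, NPWA, NPDG, NPDA and
 * NOWS are defined for this namespace"). */
#define NSFEAT_IO_OPT (1u << 4)

/* Preferred write granularity in bytes.
 * npwg is 0's based, so the block count is npwg + 1.
 * When bit 4 is clear the field is not defined and the logical
 * block size is returned unchanged. */
static uint32_t preferred_write_bytes(uint8_t nsfeat, uint16_t npwg,
                                      uint32_t lba_bytes)
{
    if (!(nsfeat & NSFEAT_IO_OPT))
        return lba_bytes;
    return lba_bytes * (1u + npwg);
}
```

For example, 512-byte logical blocks with NPWG = 7 give 8 blocks, or 4096 bytes, whereas NPWG = 0 leaves the granularity at a single logical block.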


nvme: add I/O characteristics fields · 6605bdd5

Authored by Bart Van Assche on Jun 28, 2019

Several new fields have been introduced in version 1.4 of the NVMe spec
at offsets that were defined as reserved in version 1.3d of the NVMe
spec. Update the definition of the nvme_id_ns data structure such that
it is in sync with version 1.4 of the NVMe spec. This change preserves
backwards compatibility.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvmet: export I/O characteristics attributes in Identify · 9d05a96e

Authored by Bart Van Assche on Jun 28, 2019

Make the NVMe NAWUN, NAWUPF, NACWU, NPWG, NPWA, NPDG and NOWS attributes
available to initiator systems for the block backend.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-trace: add delete completion and submission queue to admin cmds tracer · 4c0181bf

Authored by Tom Wu on Jul 04, 2019

The trace log for the 'delete I/O submission queue' and 'delete I/O
completion queue' commands will look like the following:

kworker/u49:1-3438 [003] .... 6693.070865: nvme_setup_cmd: nvme0: qid=0, cmdid=11, nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_delete_sq sqid=1)
kworker/u49:1-3438 [003] .... 6693.071171: nvme_setup_cmd: nvme0: qid=0, cmdid=8, nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_delete_cq cqid=24)

Signed-off-by: Tom Wu <tomwu@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-trace: fix spelling mistake "spcecific" -> "specific" · 91f6d798

Authored by Colin Ian King on Jun 26, 2019

There are two spelling mistakes in trace_seq_printf messages, fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: limit max_hw_sectors based on the DMA max mapping size · 7637de31

Authored by Christoph Hellwig on Jul 03, 2019

When running an NVMe device that is attached to an addressing
challenged PCIe root port that requires bounce buffering, our
request sizes can easily overflow the swiotlb bounce buffer
size.  Limit the maximum I/O size to the limit exposed by
the DMA mapping subsystem.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Atish Patra <Atish.Patra@wdc.com>
Tested-by: Atish Patra <Atish.Patra@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
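
The clamping described in this commit can be sketched as a minimum of two limits. This is not the driver code: `clamp_max_hw_sectors` is a hypothetical name, the DMA limit is passed in as a plain byte count rather than queried from the DMA mapping subsystem, and a sector is taken to be 512 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Limit the controller's maximum transfer size (in 512-byte sectors)
 * to what a single DMA mapping can cover. */
static uint32_t clamp_max_hw_sectors(uint32_t ctrl_max_sectors,
                                     size_t dma_max_mapping_bytes)
{
    /* Bytes to 512-byte sectors. */
    uint64_t dma_sectors = (uint64_t)dma_max_mapping_bytes >> 9;

    if (dma_sectors < ctrl_max_sectors)
        return (uint32_t)dma_sectors;
    return ctrl_max_sectors;
}
```

With an assumed mapping limit of 256 KiB, a controller that allows 8192 sectors is reduced to 512 sectors, while one that already allows only 256 sectors is left unchanged.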


nvme-pci: check for NULL return from pci_alloc_p2pmem() · bfac8e9f

Authored by Alan Mikhak on Jul 08, 2019

Modify nvme_alloc_sq_cmds() to call pci_free_p2pmem() to free the memory
it allocated using pci_alloc_p2pmem() in case pci_p2pmem_virt_to_bus()
returns null.

Makes sure not to call pci_free_p2pmem() if pci_alloc_p2pmem() returned
NULL, which can happen if CONFIG_PCI_P2PDMA is not configured.

The current implementation is not expected to leak since
pci_p2pmem_virt_to_bus() is expected to fail only if pci_alloc_p2pmem()
returns null. However, checking the return value of pci_alloc_p2pmem()
is more explicit.

Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: don't create a read hctx mapping without read queues · 0298d543

Authored by Alan Mikhak on Jul 08, 2019

Only request an IRQ mapping for read queues if at least one read queue
is being allocated, as nvme_pci_map_queues() will later on ignore the
unnecessary mapping request should nvme_dev_add() request such an IRQ
mapping even though no read queues are being allocated. However,
nvme_dev_add() can avoid making the request by checking the number of
read queues without assuming. This would bring it more in line with
nvme_setup_irqs() and nvme_calc_irq_sets().

Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: don't fall back to a 32-bit DMA mask · 4fe06923

Authored by Christoph Hellwig on Jun 28, 2019

Since Linux 5.0 drivers can safely set the largest DMA mask supported
by the device, and don't need fallbacks to work around the dma mapping
implementations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvme-pci: make nvme_dev_pm_ops static · 21774222

Authored by YueHaibing on Jun 26, 2019

Fix sparse warning:

drivers/nvme/host/pci.c:2926:25: warning:
 symbol 'nvme_dev_pm_ops' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-fcloop: resolve warnings on RCU usage and sleep warnings · e0620bf8

Authored by James Smart on Jun 20, 2019

With additional debugging enabled, seeing warnings for suspicious RCU
usage or Sleeping function called from invalid context.

These both map to allocation of a work structure which is currently
GFP_KERNEL, meaning it can sleep. For the RCU warning, the sequence was
sleeping while holding the RCU lock.

Convert the allocation to GFP_ATOMIC.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-fcloop: fix inconsistent lock state warnings · c38dbbfa

Authored by James Smart on Jun 20, 2019

With extra debug on, inconsistent lock state warnings are being called
out as the tfcp_req->reqlock is being taken out without irq, while some
calling sequences have the sequence in a softirq state.

Change the lock taking/release to raise/drop irq.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Merge tag 'for-5.3/libata-20190708' of git://git.kernel.dk/linux-block · cdc5ffc4

Authored by Linus Torvalds on Jul 09, 2019

Pull libata updates from Jens Axboe:
 "These are the changes that are reviewed, tested, and queued up for
  this merge window. This contains:

   - Removal of redundant memset after dmam_alloc_coherent (Fuqian)

   - Expand blacklist check for ST1000LM024, making it independent of
     firmware version (Hans)

   - Request sense fix (Tejun)

   - ahci_sunxi FIFO fix (Uenal)"

* tag 'for-5.3/libata-20190708' of git://git.kernel.dk/linux-block:
  drivers: ata: ahci_sunxi: Increased SATA/AHCI DMA TX/RX FIFOs
  libata: Drop firmware version check from the ST1000LM024 quirk
  ata: sata_sil24: Remove call to memset after dmam_alloc_coherent
  ata:sata_qstor: Remove call to memset after dmam_alloc_coherent
  ata: sata_nv: Remove call to memset after dmam_alloc_coherent
  ata: pdc_adma: Remove call to memset after dmam_alloc_coherent
  ata: libahci: Remove call to memset after dmam_alloc_coherent
  ata: acard-ahci: Remove call to memset after dmam_alloc_coherent
  libata: don't request sense data on !ZAC ATA devices


Merge tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block · 3b99107f

Authored by Linus Torvalds on Jul 09, 2019

Pull block updates from Jens Axboe:
 "This is the main block updates for 5.3. Nothing earth shattering or
  major in here, just fixes, additions, and improvements all over the
  map. This contains:

   - Series of documentation fixes (Bart)

   - Optimization of the blk-mq ctx get/put (Bart)

   - null_blk removal race condition fix (Bob)

   - req/bio_op() cleanups (Chaitanya)

   - Series cleaning up the segment accounting, and request/bio mapping
     (Christoph)

   - Series cleaning up the page getting/putting for bios (Christoph)

   - block cgroup cleanups and moving it to where it is used (Christoph)

   - block cgroup fixes (Tejun)

   - Series of fixes and improvements to bcache, most notably a write
     deadlock fix (Coly)

   - blk-iolatency STS_AGAIN and accounting fixes (Dennis)

   - Series of improvements and fixes to BFQ (Douglas, Paolo)

   - debugfs_create() return value check removal for drbd (Greg)

   - Use struct_size(), where appropriate (Gustavo)

   - Two lightnvm fixes (Heiner, Geert)

   - MD fixes, including a read balance and corruption fix (Guoqing,
     Marcos, Xiao, Yufen)

   - block opal shadow mbr additions (Jonas, Revanth)

   - sbitmap compare-and-exchange improvements (Pavel)

   - Fix for potential bio->bi_size overflow (Ming)

   - NVMe pull requests:
       - improved PCIe suspend support (Keith Busch)
       - error injection support for the admin queue (Akinobu Mita)
       - Fibre Channel discovery improvements (James Smart)
       - tracing improvements including nvmetc tracing support (Minwoo Im)
       - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya
         Kulkarni)

   - Various little fixes and improvements to drivers and core"

* tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits)
  blk-iolatency: fix STS_AGAIN handling
  block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES
  blk-mq: simplify blk_mq_make_request()
  blk-mq: remove blk_mq_put_ctx()
  sbitmap: Replace cmpxchg with xchg
  block: fix .bi_size overflow
  block: sed-opal: check size of shadow mbr
  block: sed-opal: ioctl for writing to shadow mbr
  block: sed-opal: add ioctl for done-mark of shadow mbr
  block: never take page references for ITER_BVEC
  direct-io: use bio_release_pages in dio_bio_complete
  block_dev: use bio_release_pages in bio_unmap_user
  block_dev: use bio_release_pages in blkdev_bio_end_io
  iomap: use bio_release_pages in iomap_dio_bio_end_io
  block: use bio_release_pages in bio_map_user_iov
  block: use bio_release_pages in bio_unmap_user
  block: optionally mark pages dirty in bio_release_pages
  block: move the BIO_NO_PAGE_REF check into bio_release_pages
  block: skd_main.c: Remove call to memset after dma_alloc_coherent
  block: mtip32xx: Remove call to memset after dma_alloc_coherent
  ...


Merge tag 'devprop-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 0415052d

Authored by Linus Torvalds on Jul 09, 2019

Pull device properties framework updates from Rafael Wysocki:
 "These add helpers for counting items in a property array and extend
  the "software nodes" support to be more convenient for representing
  device properties supplied by drivers and make the intel_cht_int33fe
  driver use that.

  Specifics:

   - Add helpers to count items in a property array (Andy Shevchenko).

   - Extend "software nodes" support to be more convenient for
     representing device properties supplied by drivers (Heikki
     Krogerus).

   - Add device_find_child_by_name() helper to the driver core (Heikki
     Krogerus).

   - Extend device connection code to also look for references provided
     via fwnode pointers (Heikki Krogerus).

   - Start to register proper struct device objects for USB Type-C muxes
     and orientation switches (Heikki Krogerus).

   - Update the intel_cht_int33fe driver to describe devices in a more
     general way with the help of "software nodes" (Heikki Krogerus)"

* tag 'devprop-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  device property: Add helpers to count items in an array
  platform/x86: intel_cht_int33fe: Replacing the old connections with references
  platform/x86: intel_cht_int33fe: Supply fwnodes for the external dependencies
  platform/x86: intel_cht_int33fe: Provide fwnode for the USB connector
  platform/x86: intel_cht_int33fe: Provide software nodes for the devices
  platform/x86: intel_cht_int33fe: Remove unused fusb302 device property
  platform/x86: intel_cht_int33fe: Register max17047 in its own function
  usb: typec: Registering real device entries for the muxes
  device connection: Find connections also by checking the references
  device property: Introduce fwnode_find_reference()
  ACPI / property: Don't limit named child node matching to data nodes
  driver core: Add helper device_find_child_by_name()
  software node: Add software_node_get_reference_args()
  software node: Use kobject name when finding child nodes by name
  software node: Add support for static node descriptors
  software node: Simplify software_node_release() function
  software node: Allow node creation without properties

0415052d

Merge tag 'acpi-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 4b470452

Committed by Linus Torvalds on Jul 09, 2019

Pull ACPI updates from Rafael Wysocki:
 "These update the ACPICA code in the kernel to upstream revision
  20190703, fix up the handling of GPEs in ACPICA, allow some more ACPI
  code to be built on ARM64 platforms, allow BGRT to be overridden, fix
  minor issues and clean up assorted pieces of ACPI code.

  Specifics:

   - Update the ACPICA code in the kernel to upstream revision 20190703
     including:
       - Initial/default namespace creation simplification (Bob Moore).
       - Object initialization sequence update (Bob Moore).
       - Removal of legacy module-level (dead) code (Erik Schmauss).
       - Table load object initialization update (Erik Schmauss,
         Nikolaus Voss).

   - Fix GPE enabling issue in ACPICA causing premature wakeups from
     suspend-to-idle to occur (Rafael Wysocki).

   - Allow ACPI AC and battery drivers to be built on non-X86 (Ard
     Biesheuvel).

   - Fix address space handler removal in the ACPI PMIC driver for Intel
     platforms (Andy Shevchenko).

   - Allow BGRT to be overridden via initrd or configfs (Andrea
     Oliveri).

   - Fix object resolution on table loads via configfs (Nikolaus Voss).

   - Clean up assorted pieces of ACPI code and tools (Colin Ian King,
     Liguang Zhang, Masahiro Yamada).

   - Fix documentation build warning, convert the extcon document to
     ReST and add it to the ACPI documentation (Mauro Carvalho Chehab,
     Qian Cai)"

* tag 'acpi-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / APEI: Remove needless __ghes_check_estatus() calls
  ACPICA: Update version to 20190703
  ACPICA: Update table load object initialization
  ACPICA: Update for object initialization sequence
  ACPICA: remove legacy module-level code due to deprecation
  ACPICA: Namespace: simplify creation of the initial/default namespace
  ACPI / PMIC: intel: Drop double removal of address space handler
  ACPI: APD: remove redundant assignment to pointer clk
  docs: extcon: convert it to ReST and move to ACPI dir
  ACPI: Make AC and battery drivers available on !X86
  ACPICA: Clear status of GPEs on first direct enable
  ACPI: configfs: Resolve objects on host-directed table loads
  ACPI: tables: Allow BGRT to be overridden
  ACPI: OSL: Make a W=1 kernel-doc warning go away
  ACPI: tools: Exclude tools/* from .gitignore patterns

4b470452

openeuler / Kernel: synced successfully more than 1 year ago
