1. 23 Jul 2018, 2 commits
  2. 22 Jun 2018, 1 commit
    • nvme-pci: limit max IO size and segments to avoid high order allocations · 943e942e
      Authored by Jens Axboe
      nvme requires an sg table allocation for each request. If the request
      is large, then the allocation can become quite large. For instance,
      with our default software settings of 1280KB IO size, we'll need
      10248 bytes of sg table. That turns into a 2nd order allocation,
      which we can't always guarantee. If we fail the allocation, blk-mq
      will retry it later. But there's no guarantee that we'll EVER be
      able to allocate that much contiguous memory.
      
      Limit the IO size such that we never need more than a single page
      of memory. That's a lot faster and more reliable. Then back that
      allocation with a mempool, so that we know we'll always be able
      to succeed the allocation at some point.
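
      In sketch form (illustrative names, not the literal patch), the
      forward-progress guarantee comes from a kmalloc-backed mempool with a
      single reserved element sized for the worst case:

        /* sketch only: one reserved element guarantees eventual success */
        #include <linux/mempool.h>

        static mempool_t *iod_mempool;

        static int nvme_setup_iod_mempool(size_t worst_case_sg_bytes)
        {
                iod_mempool = mempool_create_kmalloc_pool(1, worst_case_sg_bytes);
                return iod_mempool ? 0 : -ENOMEM;
        }

        static void *nvme_alloc_iod(void)
        {
                /* falls back to the reserved element if kmalloc() fails */
                return mempool_alloc(iod_mempool, GFP_ATOMIC);
        }
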
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      943e942e
  3. 14 Jun 2018, 1 commit
  4. 09 Jun 2018, 1 commit
  5. 01 Jun 2018, 3 commits
  6. 23 May 2018, 1 commit
    • nvme: fix lockdep warning in nvme_mpath_clear_current_path · 978628ec
      Authored by Johannes Thumshirn
      When running blktest's nvme/005 with a lockdep enabled kernel the test
      case fails due to the following lockdep splat in dmesg:
      
       =============================
       WARNING: suspicious RCU usage
       4.17.0-rc5 #881 Not tainted
       -----------------------------
       drivers/nvme/host/nvme.h:457 suspicious rcu_dereference_check() usage!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 2, debug_locks = 1
       3 locks held by kworker/u32:5/1102:
        #0:         (ptrval) ((wq_completion)"nvme-wq"){+.+.}, at: process_one_work+0x152/0x5c0
        #1:         (ptrval) ((work_completion)(&ctrl->scan_work)){+.+.}, at: process_one_work+0x152/0x5c0
        #2:         (ptrval) (&subsys->lock#2){+.+.}, at: nvme_ns_remove+0x43/0x1c0 [nvme_core]
      
      The only caller of nvme_mpath_clear_current_path() is nvme_ns_remove(),
      which holds the subsys lock, so this is likely a false positive. But by
      using rcu_access_pointer(), we tell RCU and lockdep that we are only
      after the pointer value.
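
      In sketch form (struct trimmed to the relevant field), the pattern is:

        /* sketch: compare an __rcu pointer without dereferencing it */
        #include <linux/rcupdate.h>

        struct nvme_ns;

        struct nvme_ns_head {
                struct nvme_ns __rcu *current_path;
        };

        static void clear_current_path(struct nvme_ns_head *head,
                                       struct nvme_ns *ns)
        {
                /* only the pointer value is compared, never dereferenced,
                 * so no RCU read-side protection needs to be claimed */
                if (rcu_access_pointer(head->current_path) == ns)
                        rcu_assign_pointer(head->current_path, NULL);
        }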
      
      Fixes: 32acab31 ("nvme: implement multipath access to nvme subsystems")
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      978628ec
  7. 19 May 2018, 1 commit
  8. 12 May 2018, 1 commit
    • nvme: add quirk to force medium priority for SQ creation · 9abd68ef
      Authored by Jens Axboe
      Some P3100 drives have a bug where they think WRRU (weighted round robin)
      is always enabled, even though the host doesn't set it. Since they think
      it's enabled, they also look at the submission queue creation priority. We
      used to set that to MEDIUM by default, but that was removed in commit
      81c1cd98. This causes various issues on that drive. Add a quirk to
      still set MEDIUM priority for that controller.
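
      In sketch form (assuming NVME_QUIRK_MEDIUM_PRIO_SQ is the new quirk
      bit; the helper name is illustrative):

        /* sketch: set SQ priority to MEDIUM only for quirked controllers */
        static int nvme_sq_flags(struct nvme_dev *dev)
        {
                int flags = NVME_QUEUE_PHYS_CONTIG;

                if (dev->ctrl.quirks & NVME_QUIRK_MEDIUM_PRIO_SQ)
                        flags |= NVME_SQ_PRIO_MEDIUM;
                return flags;
        }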
      
      Fixes: 81c1cd98 ("nvme/pci: Don't set reserved SQ create flags")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      9abd68ef
  9. 03 May 2018, 1 commit
  10. 12 Apr 2018, 3 commits
    • nvme: expand nvmf_check_if_ready checks · bb06ec31
      Authored by James Smart
      The nvmf_check_if_ready() checks that were added are very simplistic.
      As such, the routine allows a lot of cases to fail IOs during windows
      of reset or re-connection. In cases where no multipath options are
      present, the error goes back to the caller - the filesystem or
      application. Not good.
      
      The common routine was rewritten and calling syntax slightly expanded
      so that per-transport is_ready routines don't need to be present.
      The transports now call the routine directly. The routine is now a
      fabrics routine rather than an inline function.
      
      The routine now looks at controller state to decide the action to
      take. Some states mandate IO failure. Others define the conditions
      under which a command can be accepted. When the decision is unclear, a
      generic queue-or-reject check looks for failfast or multipath IOs and
      fails the IO only if it is so marked. Otherwise, the IO is queued to
      wait for the controller state to resolve.
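
      In sketch form (state and status names from the nvme core; the real
      routine covers more states and command types):

        /* sketch of the queue-or-reject decision described above */
        static blk_status_t check_if_ready(struct nvme_ctrl *ctrl,
                                           struct request *rq)
        {
                switch (ctrl->state) {
                case NVME_CTRL_LIVE:
                        return BLK_STS_OK;              /* always accept */
                case NVME_CTRL_DELETING:
                        return BLK_STS_IOERR;           /* mandate failure */
                default:
                        /* unclear: fail only failfast/multipath IOs,
                         * otherwise requeue until the state resolves */
                        if (rq->cmd_flags & REQ_FAILFAST_DRIVER)
                                return BLK_STS_IOERR;
                        return BLK_STS_RESOURCE;
                }
        }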
      
      Admin commands issued via ioctl share a live admin queue with commands
      from the transport for controller init. The ioctls could be intermixed
      with the initialization commands. It's possible for the ioctl cmd to
      be issued prior to the controller being enabled. To block this, the
      ioctl admin commands need to be distinguished from admin commands used
      for controller init. A USERCMD bit was added to nvme_req(req)->rq_flags
      to reflect this division, and it is set on ioctl requests. As the
      nvmf_check_if_ready() routine is called prior to nvme_setup_cmd(),
      commands allocated by the ioctl path (actually anything in core.c) must
      prep the nvme_req(req) before starting the IO. This preserves the
      USERCMD flag during execution and/or retry.
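
      A sketch of the marking, assuming the flag placement described above
      (the helper name is illustrative):

        /* sketch: mark ioctl-origin admin commands at allocation time so
         * the ready check can tell them apart from init-time commands */
        static struct request *nvme_alloc_user_request(struct request_queue *q,
                                                       struct nvme_command *cmd)
        {
                struct request *req = nvme_alloc_request(q, cmd, 0, NVME_QID_ANY);

                if (!IS_ERR(req))
                        nvme_req(req)->flags |= NVME_REQ_USERCMD;
                return req;
        }
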
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bb06ec31
    • nvme: unexport nvme_start_keep_alive · 00b683db
      Authored by Johannes Thumshirn
      nvme_start_keep_alive() isn't used outside core.c, so unexport it and
      make it static.
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      00b683db
    • nvme: enforce 64bit offset for nvme_get_log_ext fn · 7ec6074f
      Authored by Matias Bjørling
      Compiling on a 32-bit system produces a shift-width warning when a
      32-bit integer is shifted as if it held a 64-bit offset.

      Make sure that the offset is always 64-bit, and use macros for
      retrieving the lower and upper 32 bits of the offset.
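
      In sketch form (field names per the get-log-page command described
      above):

        /* sketch: take a u64 offset and split it with the kernel helpers,
         * avoiding any shift on a 32-bit quantity */
        static void nvme_set_log_offset(struct nvme_command *c, u64 offset)
        {
                c->get_log_page.lpol = cpu_to_le32(lower_32_bits(offset));
                c->get_log_page.lpou = cpu_to_le32(upper_32_bits(offset));
        }
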
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7ec6074f
  11. 30 Mar 2018, 1 commit
  12. 26 Mar 2018, 4 commits
    • nvme: make nvme_get_log_ext non-static · d558fb51
      Authored by Matias Bjørling
      Enable the lightnvm integration to use the nvme_get_log_ext()
      function.
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d558fb51
    • nvme: Add .stop_ctrl to nvme ctrl ops · b435ecea
      Authored by Nitzan Carmi
      For consistency, any fabric-specific work (e.g. error
      recovery/reconnect) should be canceled in nvme_stop_ctrl, as
      is done for all other pending NVMe work (e.g. scan, keep-alive).

      This simplifies the logic of the code: we no longer rely on a
      vague demand that each fabric flush its private workqueues at
      the beginning of its .delete_ctrl op.
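
      A sketch of the resulting shape (simplified; the real nvme_stop_ctrl
      also handles other pending work):

        /* sketch: a common hook lets each fabric cancel its private work */
        struct nvme_ctrl_ops {
                /* ... existing ops ... */
                void (*stop_ctrl)(struct nvme_ctrl *ctrl);
        };

        void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
        {
                nvme_stop_keep_alive(ctrl);
                flush_work(&ctrl->scan_work);
                if (ctrl->ops->stop_ctrl)
                        ctrl->ops->stop_ctrl(ctrl);   /* e.g. cancel reconnect */
        }
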
      Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b435ecea
    • nvme: change namespaces_mutext to namespaces_rwsem · 765cc031
      Authored by Jianchao Wang
      namespaces_mutext is used to synchronize operations on the ctrl
      namespaces list. Most of the time, these are read operations.

      On the other hand, many interfaces in the nvme core need this lock,
      such as nvme_wait_freeze, and even more will be added. If we use a
      mutex here, a circular dependency could easily be introduced. For
      example:

      context A                   context B
      nvme_xxx                    nvme_xxx
      hold namespaces_mutext      require namespaces_mutext
      sync with context B

      So it is better to change it from a mutex to an rwsem.
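
      The read side then looks roughly like this (sketch modeled on
      nvme_wait_freeze; names as described above):

        /* sketch: mostly-read accesses take the semaphore shared, so
         * readers no longer serialize against each other */
        static void nvme_wait_freeze_example(struct nvme_ctrl *ctrl)
        {
                struct nvme_ns *ns;

                down_read(&ctrl->namespaces_rwsem);
                list_for_each_entry(ns, &ctrl->namespaces, list)
                        blk_mq_freeze_queue_wait(ns->queue);
                up_read(&ctrl->namespaces_rwsem);
        }
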
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      765cc031
    • nvme: Add fault injection feature · b9e03857
      Authored by Thomas Tai
      Linux's fault injection framework provides a systematic way to support
      error injection via debugfs in the /sys/kernel/debug directory. This
      patch uses the framework to add error injection to the NVMe driver.
      The fault injection source code is stored in a separate file and is
      only linked in if the CONFIG_FAULT_INJECTION_DEBUG_FS kernel config is
      selected.

      Once error injection is enabled, NVME_SC_INVALID_OPCODE with no retry
      will be injected into nvme_end_request. Users can change the default
      status code and the no-retry flag via debugfs. The following example
      shows how to enable and inject an error. For more examples, refer to
      Documentation/fault-injection/nvme-fault-injection.txt
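
      The hook itself is small; a sketch assuming a per-namespace fault_attr
      (the real code is only built with CONFIG_FAULT_INJECTION_DEBUG_FS):

        /* sketch: consult the fault_attr and overwrite the status so the
         * request completes as if the device had returned an error */
        #include <linux/fault-inject.h>

        static void nvme_should_fail(struct request *req,
                                     struct fault_attr *attr, u16 status)
        {
                if (should_fail(attr, 1))
                        nvme_req(req)->status = status;
        }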
      
      How to enable nvme fault injection:

      First, enable the CONFIG_FAULT_INJECTION_DEBUG_FS kernel config and
      recompile the kernel. After booting up the new kernel, do the
      following.
      
      How to inject an error:
      
      mount /dev/nvme0n1 /mnt
      echo 1 > /sys/kernel/debug/nvme0n1/fault_inject/times
      echo 100 > /sys/kernel/debug/nvme0n1/fault_inject/probability
      cp a.file /mnt
      
      Expected Result:
      
      cp: cannot stat ‘/mnt/a.file’: Input/output error
      
      Message from dmesg:
      
      FAULT_INJECTION: forcing a failure.
      name fault_inject, interval 1, probability 100, space 0, times 1
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc8+ #2
      Hardware name: innotek GmbH VirtualBox/VirtualBox,
      BIOS VirtualBox 12/01/2006
      Call Trace:
        <IRQ>
        dump_stack+0x5c/0x7d
        should_fail+0x148/0x170
        nvme_should_fail+0x2f/0x50 [nvme_core]
        nvme_process_cq+0xe7/0x1d0 [nvme]
        nvme_irq+0x1e/0x40 [nvme]
        __handle_irq_event_percpu+0x3a/0x190
        handle_irq_event_percpu+0x30/0x70
        handle_irq_event+0x36/0x60
        handle_fasteoi_irq+0x78/0x120
        handle_irq+0xa7/0x130
        ? tick_irq_enter+0xa8/0xc0
        do_IRQ+0x43/0xc0
        common_interrupt+0xa2/0xa2
        </IRQ>
      RIP: 0010:native_safe_halt+0x2/0x10
      RSP: 0018:ffffffff82003e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
      RAX: ffffffff817a10c0 RBX: ffffffff82012480 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: 0000000000000000 R08: 000000008e38ce64 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82012480
      R13: ffffffff82012480 R14: 0000000000000000 R15: 0000000000000000
        ? __sched_text_end+0x4/0x4
        default_idle+0x18/0xf0
        do_idle+0x150/0x1d0
        cpu_startup_entry+0x6f/0x80
        start_kernel+0x4c4/0x4e4
        ? set_init_arg+0x55/0x55
        secondary_startup_64+0xa5/0xb0
        print_req_error: I/O error, dev nvme0n1, sector 9240
      EXT4-fs error (device nvme0n1): ext4_find_entry:1436:
      inode #2: comm cp: reading directory lblock 0
      Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
      Reviewed-by: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
      Signed-off-by: Karl Volz <karl.volz@oracle.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b9e03857
  13. 07 Mar 2018, 1 commit
  14. 13 Feb 2018, 1 commit
    • nvme: Don't use a stack buffer for keep-alive command · 0a34e466
      Authored by Roland Dreier
      In nvme_keep_alive() we pass a request with a pointer to an NVMe command on
      the stack into blk_execute_rq_nowait().  However, the block layer doesn't
      guarantee that the request is fully queued before blk_execute_rq_nowait()
      returns.  If not, and the request is queued after nvme_keep_alive() returns,
      then we'll end up using stack memory that might have been overwritten to
      form the NVMe command we pass to hardware.
      
      Fix this by keeping a special command struct in the nvme_ctrl struct right
      next to the delayed work struct used for keep-alives.
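
      In sketch form (members as described above):

        /* sketch: the command's lifetime is now tied to the controller,
         * so it stays valid even if the request is queued after return */
        struct nvme_ctrl {
                /* ... */
                struct delayed_work ka_work;
                struct nvme_command ka_cmd;
        };

        static void nvme_init_ka_cmd(struct nvme_ctrl *ctrl)
        {
                memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
                ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
        }
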
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      0a34e466
  15. 09 Feb 2018, 1 commit
  16. 16 Jan 2018, 1 commit
    • nvme: host delete_work and reset_work on separate workqueues · b227c59b
      Authored by Roy Shterman
      We need to ensure that delete_work will be hosted on a different
      workqueue than all the works we flush or cancel from it.
      Otherwise we may hit a circular dependency warning [1].
      
      Also, given that delete_work flushes reset_work, host reset_work
      on nvme_reset_wq and delete_work on nvme_delete_wq. In addition,
      fix the flushing in the individual drivers to flush nvme_delete_wq
      when draining queued deletes.
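
      In sketch form (workqueue names as in the patch; flags assumed):

        /* sketch: delete_work must not run on the workqueue it flushes */
        static struct workqueue_struct *nvme_reset_wq;
        static struct workqueue_struct *nvme_delete_wq;

        static int nvme_core_init_wq(void)
        {
                nvme_reset_wq = alloc_workqueue("nvme-reset-wq",
                                WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
                nvme_delete_wq = alloc_workqueue("nvme-delete-wq",
                                WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
                return (nvme_reset_wq && nvme_delete_wq) ? 0 : -ENOMEM;
        }

        /* reset and delete are then queued on their own workqueues:
         *   queue_work(nvme_reset_wq, &ctrl->reset_work);
         *   queue_work(nvme_delete_wq, &ctrl->delete_work);
         */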
      
      [1]:
      [  178.491942] =============================================
      [  178.492718] [ INFO: possible recursive locking detected ]
      [  178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G           OE
      [  178.494382] ---------------------------------------------
      [  178.495160] kworker/5:1/135 is trying to acquire lock:
      [  178.495894]  ("nvme-wq"){++++.+}, at: [<ffffffffa70ac206>] flush_work+0x1a6/0x2d0
      [  178.497670]
                     but task is already holding lock:
      [  178.498499]  ("nvme-wq"){++++.+}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.500343]
                     other info that might help us debug this:
      [  178.501269]  Possible unsafe locking scenario:

      [  178.502113]        CPU0
      [  178.502472]        ----
      [  178.502829]   lock("nvme-wq");
      [  178.503716]   lock("nvme-wq");
      [  178.504601]
                      *** DEADLOCK ***

      [  178.505441]  May be due to missing lock nesting notation

      [  178.506453] 2 locks held by kworker/5:1/135:
      [  178.507068]  #0: ("nvme-wq"){++++.+}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.509004]  #1: ((&ctrl->delete_work)){+.+.+.}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.511070]
                     stack backtrace:
      [  178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G           OE   4.9.0-rc4-c844263313a8-lb #3
      [  178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
      [  178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp]
      [  178.515071]  ffffc2668175bae0 ffffffffa7450823 ffffffffa88abd80 ffffffffa88abd80
      [  178.516195]  ffffc2668175bb98 ffffffffa70eb012 ffffffffa8d8d90d ffff9c472e9ea700
      [  178.517318]  ffff9c472e9ea700 ffff9c4700000000 ffff9c4700007200 ab83be61bec0d50e
      [  178.518443] Call Trace:
      [  178.518807]  [<ffffffffa7450823>] dump_stack+0x85/0xc2
      [  178.519542]  [<ffffffffa70eb012>] __lock_acquire+0x17d2/0x18f0
      [  178.520377]  [<ffffffffa75839a7>] ? serial8250_console_putchar+0x27/0x30
      [  178.521330]  [<ffffffffa7583980>] ? wait_for_xmitr+0xa0/0xa0
      [  178.522174]  [<ffffffffa70ac1eb>] ? flush_work+0x18b/0x2d0
      [  178.522975]  [<ffffffffa70eb7cb>] lock_acquire+0x11b/0x220
      [  178.523753]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.524535]  [<ffffffffa70ac229>] flush_work+0x1c9/0x2d0
      [  178.525291]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.526077]  [<ffffffffa70a9cf0>] ? flush_workqueue_prep_pwqs+0x220/0x220
      [  178.527040]  [<ffffffffa70ae7cf>] __cancel_work_timer+0x10f/0x1d0
      [  178.527907]  [<ffffffffa70fecb9>] ? vprintk_default+0x29/0x40
      [  178.528726]  [<ffffffffa71cb507>] ? printk+0x48/0x50
      [  178.529434]  [<ffffffffa70ae8c3>] cancel_delayed_work_sync+0x13/0x20
      [  178.530381]  [<ffffffffc042100b>] nvme_stop_ctrl+0x5b/0x70 [nvme_core]
      [  178.531314]  [<ffffffffc0403dcc>] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp]
      [  178.532271]  [<ffffffffa70ad741>] process_one_work+0x1e1/0x6a0
      [  178.533101]  [<ffffffffa70ad6c2>] ? process_one_work+0x162/0x6a0
      [  178.533954]  [<ffffffffa70adc4e>] worker_thread+0x4e/0x490
      [  178.534735]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.535588]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.536441]  [<ffffffffa70b48cf>] kthread+0xff/0x120
      [  178.537149]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538094]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538900]  [<ffffffffa78e332a>] ret_from_fork+0x2a/0x40
      Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      b227c59b
  17. 15 Jan 2018, 1 commit
  18. 11 Jan 2018, 1 commit
  19. 08 Jan 2018, 1 commit
  20. 29 Dec 2017, 1 commit
  21. 23 Nov 2017, 1 commit
  22. 11 Nov 2017, 11 commits