提交 · 923218f6166a84688973acdc39094f3bee1e9ad4 · openanolis / cloud-kernel

05 11月, 2017 9 次提交

blk-mq: don't allocate driver tag upfront for flush rq · 923218f6

由 Ming Lei 提交于 11月 02, 2017

The idea behind it is simple:

1) for none scheduler, driver tag has to be borrowed for flush rq,
   otherwise we may run out of tag, and that causes an IO hang. And
   get/put driver tag is actually noop for none, so reordering tags
   isn't necessary at all.

2) for a real I/O scheduler, we need not allocate a driver tag upfront
   for flush rq. It works just fine to follow the same approach as
   normal requests: allocate driver tag for each rq just before calling
   ->queue_rq().

One driver visible change is that the driver tag isn't shared in the
flush request sequence. That won't be a problem, since we always do that
in legacy path.

Then flush rq need not be treated specially wrt. get/put driver tag.
This cleans up the code - for instance, reorder_tags_to_front() can be
removed, and we needn't worry about request ordering in dispatch list
for avoiding I/O deadlock.

Also we have to put the driver tag before requeueing.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

923218f6

blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h · 244c65a3

由 Ming Lei 提交于 11月 04, 2017

We need this helper to put the driver tag for flush rq, since we will
not share tag in the flush request sequence in the following patch
in case that I/O scheduler is applied.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

244c65a3

blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ · a6a252e6

由 Ming Lei 提交于 11月 02, 2017

In case of IO scheduler we always pre-allocate one driver tag before
calling blk_insert_flush(), and flush request will be marked as
RQF_FLUSH_SEQ once it is in flush machinery.

So if RQF_FLUSH_SEQ isn't set, we call blk_insert_flush() to handle
the request, otherwise the flush request is dispatched to ->dispatch
list directly.

This is a preparation patch for not preallocating a driver tag for flush
requests, and for not treating flush requests as a special case. This is
similar to what the legacy path does.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a6a252e6

blk-flush: use blk_mq_request_bypass_insert() · 598906f8

由 Ming Lei 提交于 11月 02, 2017

In the following patch, we will use RQF_FLUSH_SEQ to decide:

1) if the flag isn't set, the flush rq need to be inserted via
blk_insert_flush()

2) otherwise, the flush rq need to be dispatched directly since
it is in flush machinery now.

So we use blk_mq_request_bypass_insert() for requests of bypassing
flush machinery, just like the legacy path did.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

598906f8

block: pass 'run_queue' to blk_mq_request_bypass_insert · b0850297

由 Ming Lei 提交于 11月 02, 2017

Block flush need this function without running the queue, so add a
parameter controlling whether we run it or not.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b0850297

blk-flush: don't run queue for requests bypassing flush · 9c71c83c

由 Ming Lei 提交于 11月 02, 2017

blk_insert_flush() should only insert request since run queue always
follows it.

In case of bypassing flush, we don't need to run queue because every
blk_insert_flush() follows one run queue.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9c71c83c

blk-mq: put the driver tag of nxt rq before first one is requeued · 6d6f167c

由 Jianchao Wang 提交于 11月 02, 2017

When freeing the driver tag of the next rq with an I/O scheduler
configured, we get the first entry of the list. However, this can
race with requeue of a request, and we end up getting the wrong request
from the head of the list. Free the driver tag of next rq before the
failed one is requeued in the failure branch of queue_rq callback.
Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6d6f167c

blkcg: add sanity check for blkcg policy operations · e8401073

由 weiping zhang 提交于 10月 17, 2017

blkcg policy should keep cpd/pd's alloc_fn and free_fn in pairs,
otherwise policy would register fail.
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nweiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e8401073

blk-mq: don't handle failure in .get_budget · 88022d72

由 Ming Lei 提交于 11月 05, 2017

It is enough to just check if we can get the budget via .get_budget().
And we don't need to deal with device state change in .get_budget().

For SCSI, one issue to be fixed is that we have to call
scsi_mq_uninit_cmd() to free allocated ressources if SCSI device fails
to handle the request. And it isn't enough to simply call
blk_mq_end_request() to do that if this request is marked as
RQF_DONTPREP.

Fixes: 0df21c86(scsi: implement .get_budget and .put_budget for blk-mq)
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

88022d72

04 11月, 2017 13 次提交

SCSI: don't get target/host busy_count in scsi_mq_get_budget() · 826a70a0

由 Ming Lei 提交于 11月 04, 2017

It is very expensive to atomic_inc/atomic_dec the host wide counter of
host->busy_count, and it should have been avoided via blk-mq's mechanism
of getting driver tag, which uses the more efficient way of sbitmap queue.

Also we don't check atomic_read(&sdev->device_busy) in scsi_mq_get_budget()
and don't run queue if the counter becomes zero, so IO hang may be caused
if all requests are completed just before the current SCSI device
is added to shost->starved_list.

Fixes: 0df21c86(scsi: implement .get_budget and .put_budget for blk-mq)
Reported-by: NBart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

826a70a0

block: fix peeking requests during PM · e4f36b24

由 Christoph Hellwig 提交于 10月 20, 2017

We need to look for an active PM request until the next softbarrier
instead of looking for the first non-PM request.  Otherwise any cause
of request reordering might starve the PM request(s).
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e4f36b24

blk-mq: Make blk_mq_get_request() error path less confusing · 21e768b4

由 Bart Van Assche 提交于 10月 16, 2017

blk_mq_get_tag() can modify data->ctx. This means that in the
error path of blk_mq_get_request() data->ctx should be passed to
blk_mq_put_ctx() instead of local_ctx. Note: since blk_mq_put_ctx()
ignores its argument, this patch does not change any functionality.

References: commit 1ad43c00 ("blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed")
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

21e768b4

blk-mq: fix nr_requests wrong value when modify it from sysfs · c2e82a23

由 weiping zhang 提交于 9月 22, 2017

if blk-mq use "none" io scheduler, nr_request get a wrong value when
input a number > tag_set->queue_depth. blk_mq_tag_update_depth will get
the smaller one min(nr, set->queue_depth), and then q->nr_request get a
wrong value.

Reproduce:

echo none > /sys/block/nvme0n1/queue/scheduler
echo 1000000 > /sys/block/nvme0n1/queue/nr_requests
cat /sys/block/nvme0n1/queue/nr_requests
1000000
Signed-off-by: Nweiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c2e82a23

cdrom: hide CONFIG_CDROM menu selection · a116895f

由 Jens Axboe 提交于 11月 03, 2017

We don't need to expose this. The point is that drivers select
the uniform CDROM layer, if they need it, the user should not
have to make a conscious decision on whether to include this
separately or not.

Fixes: 2a750166 ("block: Rework drivers/cdrom/Makefile")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a116895f

block: add a poll_fn callback to struct request_queue · ea435e1b

由 Christoph Hellwig 提交于 11月 02, 2017

That we we can also poll non blk-mq queues.  Mostly needed for
the NVMe multipath code, but could also be useful elsewhere.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ea435e1b

block: introduce GENHD_FL_HIDDEN · 8ddcd653

由 Christoph Hellwig 提交于 11月 02, 2017

With this flag a driver can create a gendisk that can be used for I/O
submission inside the kernel, but which is not registered as user
facing block device.  This will be useful for the NVMe multipath
implementation.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8ddcd653

block: don't look at the struct device dev_t in disk_devt · 517bf3c3

由 Christoph Hellwig 提交于 11月 02, 2017

The hidden gendisks introduced in the next patch need to keep the dev
field in their struct device empty so that udev won't try to create
block device nodes for them.  To support that rewrite disk_devt to
look at the major and first_minor fields in the gendisk itself instead
of looking into the struct device.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

517bf3c3

block: add a blk_steal_bios helper · ef71de8b

由 Christoph Hellwig 提交于 11月 02, 2017

This helpers allows to bounce steal the uncompleted bios from a request so
that they can be reissued on another path.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ef71de8b

block: provide a direct_make_request helper · f421e1d9

由 Christoph Hellwig 提交于 11月 02, 2017

This helper allows reinserting a bio into a new queue without much
overhead, but requires all queue limits to be the same for the upper
and lower queues, and it does not provide any recursion preventions.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJavier González <javier@cnexlabs.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f421e1d9

block: add REQ_DRV bit · 96222bcc

由 Christoph Hellwig 提交于 11月 02, 2017

Set aside a bit in the request/bio flags for driver use.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

96222bcc

block: move REQ_NOWAIT · 8977f563

由 Christoph Hellwig 提交于 11月 02, 2017

This flag should be before the operation-specific REQ_NOUNMAP bit.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8977f563

Merge branch 'nvme-4.15' of git://git.infradead.org/nvme into for-4.15/block · 3e2cb3ad

由 Jens Axboe 提交于 11月 03, 2017

Pull NVMe changes from Christoph:

"Below are the currently queue nvme updates for Linux 4.15.  There are
a few more things that could make it for this merge window, but I'd
like to get things into linux-next, especially for the unlikely case
that Linus decided to cut -rc8.

Highlights:
 - support for SGLs in the PCIe driver (Chaitanya Kulkarni)
 - disable I/O schedulers for the admin queue (Israel Rukshin)
 - various Fibre Channel fixes and enhancements (James Smart)
 - various refactoring for better code sharing between transports
   (Sagi Grimberg and me)

as well as lots of little bits from various contributors."

3e2cb3ad

03 11月, 2017 1 次提交

nvme: comment typo fixed in clearing AER · a806c6c8

由 Minwoo Im 提交于 11月 02, 2017

a tiny typo fixed in a comment.
Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

a806c6c8

02 11月, 2017 2 次提交

skd: use ktime_get_real_seconds() · 474f5da2

由 Arnd Bergmann 提交于 11月 02, 2017

Like many storage drivers, skd uses an unsigned 32-bit number for
interchanging the current time with the firmware. This will overflow in
y2106 and is otherwise safe.

However, the get_seconds() function is generally considered deprecated
since the behavior is different between 32-bit and 64-bit architectures,
and using it may indicate a bigger problem.

To annotate that we've thought about this, let's add a comment here
and migrate to the ktime_get_real_seconds() function that consistently
returns a 64-bit number.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

474f5da2

block: fix CDROM dependency on BLK_DEV · c091fbe9

由 Arnd Bergmann 提交于 11月 02, 2017

After the cdrom cleanup, I get randconfig warnings for some configurations:

warning: (BLK_DEV_IDECD && BLK_DEV_SR) selects CDROM which has unmet direct dependencies (BLK_DEV)

This adds an explicit BLK_DEV dependency for both drivers. The other
drivers that select 'CDROM' already have this and don't need a change.

Fixes: 2a750166 ("block: Rework drivers/cdrom/Makefile")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c091fbe9

01 11月, 2017 15 次提交

nvme: Remove unused headers · 3639efef

由 Keith Busch 提交于 10月 20, 2017

Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

3639efef

nvmet: fix fatal_err_work deadlock · a96d4bd8

由 James Smart 提交于 10月 27, 2017

Below is a stack trace for an issue that was reported.

What's happening is that the nvmet layer had it's controller kato
timeout fire, which causes it to schedule its fatal error handler
via the fatal_err_work element. The error handler is invoked, which
calls the transport delete_ctrl() entry point, and as the transport
tears down the controller, nvmet_sq_destroy ends up doing the final
put on the ctlr causing it to enter its free routine. The ctlr free
routine does a cancel_work_sync() on fatal_err_work element, which
then does a flush_work and wait_for_completion. But, as the wait is
in the context of the work element being flushed, its in a catch-22
and the thread hangs.

[  326.903131] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[  326.909832] nvmet: ctrl 1 fatal error occurred!
[  327.643100] lpfc 0000:04:00.0: 0:6313 NVMET Defer ctx release xri
x114 flg x2
[  494.582064] INFO: task kworker/0:2:243 blocked for more than 120
seconds.
[  494.589638]       Not tainted 4.14.0-rc1.James+ #1
[  494.594986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  494.603718] kworker/0:2     D    0   243      2 0x80000000
[  494.609839] Workqueue: events nvmet_fatal_error_handler [nvmet]
[  494.616447] Call Trace:
[  494.619177]  __schedule+0x28d/0x890
[  494.623070]  schedule+0x36/0x80
[  494.626571]  schedule_timeout+0x1dd/0x300
[  494.631044]  ? dequeue_task_fair+0x592/0x840
[  494.635810]  ? pick_next_task_fair+0x23b/0x5c0
[  494.640756]  wait_for_completion+0x121/0x180
[  494.645521]  ? wake_up_q+0x80/0x80
[  494.649315]  flush_work+0x11d/0x1a0
[  494.653206]  ? wake_up_worker+0x30/0x30
[  494.657484]  __cancel_work_timer+0x10b/0x190
[  494.662249]  cancel_work_sync+0x10/0x20
[  494.666525]  nvmet_ctrl_put+0xa3/0x100 [nvmet]
[  494.671482]  nvmet_sq_:q+0x64/0xd0 [nvmet]
[  494.676540]  nvmet_fc_delete_target_queue+0x202/0x220 [nvmet_fc]
[  494.683245]  nvmet_fc_delete_target_assoc+0x6d/0xc0 [nvmet_fc]
[  494.689743]  nvmet_fc_delete_ctrl+0x137/0x1a0 [nvmet_fc]
[  494.695673]  nvmet_fatal_error_handler+0x30/0x40 [nvmet]
[  494.701589]  process_one_work+0x149/0x360
[  494.706064]  worker_thread+0x4d/0x3c0
[  494.710148]  kthread+0x109/0x140
[  494.713751]  ? rescuer_thread+0x380/0x380
[  494.718214]  ? kthread_park+0x60/0x60

Correct by having the fc transport convert to a different workq context
for the actual controller teardown which may call the cancel_work_sync.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

a96d4bd8

nvme-fc: add dev_loss_tmo timeout and remoteport resume support · 2b632970