提交 772c8f6f 编写于 作者: L Linus Torvalds

Merge tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:

 - blk-mq scheduling framework from me and Omar, with a port of the
   deadline scheduler for this framework. A port of BFQ from Paolo is in
   the works, and should be ready for 4.12.

 - Various fixups and improvements to the above scheduling framework
   from Omar, Paolo, Bart, me, others.

 - Cleanup of the exported sysfs blk-mq data into debugfs, from Omar.
   This allows us to export more information that helps debug hangs or
   performance issues, without cluttering or abusing the sysfs API.

 - Fixes for the sbitmap code, the scalable bitmap code that was
   migrated from blk-mq, from Omar.

 - Removal of the BLOCK_PC support in struct request, and refactoring of
   carrying SCSI payloads in the block layer. This cleans up the code
   nicely, and enables us to kill the SCSI specific parts of struct
   request, shrinking it down nicely. From Christoph mainly, with help
   from Hannes.

 - Support for ranged discard requests and discard merging, also from
   Christoph.

 - Support for OPAL in the block layer, and for NVMe as well. Mainly
   from Scott Bauer, with fixes/updates from various others folks.

 - Error code fixup for gdrom from Christophe.

 - cciss pci irq allocation cleanup from Christoph.

 - Making the cdrom device operations read only, from Kees Cook.

 - Fixes for duplicate bdi registrations and bdi/queue life time
   problems from Jan and Dan.

 - Set of fixes and updates for lightnvm, from Matias and Javier.

 - A few fixes for nbd from Josef, using idr to name devices and a
   workqueue deadlock fix on receive. Also marks Josef as the current
   maintainer of nbd.

 - Fix from Josef, overwriting queue settings when the number of
   hardware queues is updated for a blk-mq device.

 - NVMe fix from Keith, ensuring that we don't repeatedly mark and IO
   aborted, if we didn't end up aborting it.

 - SG gap merging fix from Ming Lei for block.

 - Loop fix also from Ming, fixing a race and crash between setting loop
   status and IO.

 - Two block race fixes from Tahsin, fixing request list iteration and
   fixing a race between device registration and udev device add
   notifiations.

 - Double free fix from cgroup writeback, from Tejun.

 - Another double free fix in blkcg, from Hou Tao.

 - Partition overflow fix for EFI from Alden Tondettar.

* tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block: (156 commits)
  nvme: Check for Security send/recv support before issuing commands.
  block/sed-opal: allocate struct opal_dev dynamically
  block/sed-opal: tone down not supported warnings
  block: don't defer flushes on blk-mq + scheduling
  blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers
  blk-mq: don't special case flush inserts for blk-mq-sched
  blk-mq-sched: don't add flushes to the head of requeue queue
  blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not
  block: do not allow updates through sysfs until registration completes
  lightnvm: set default lun range when no luns are specified
  lightnvm: fix off-by-one error on target initialization
  Maintainers: Modify SED list from nvme to block
  Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN
  uapi: sed-opal fix IOW for activate lsp to use correct struct
  cdrom: Make device operations read-only
  elevator: fix loading wrong elevator type for blk-mq devices
  cciss: switch to pci_irq_alloc_vectors
  block/loop: fix race between I/O and set_status
  blk-mq-sched: don't hold queue_lock when calling exit_icq
  block: set make_request_fn manually in blk_mq_update_nr_hw_queues
  ...
......@@ -249,7 +249,6 @@ struct& cdrom_device_ops\ \{ \hidewidth\cr
unsigned\ long);\cr
\noalign{\medskip}
&const\ int& capability;& capability flags \cr
&int& n_minors;& number of active minor devices \cr
\};\cr
}
$$
......@@ -258,13 +257,7 @@ it should add a function pointer to this $struct$. When a particular
function is not implemented, however, this $struct$ should contain a
NULL instead. The $capability$ flags specify the capabilities of the
\cdrom\ hardware and/or low-level \cdrom\ driver when a \cdrom\ drive
is registered with the \UCD. The value $n_minors$ should be a positive
value indicating the number of minor devices that are supported by
the low-level device driver, normally~1. Although these two variables
are `informative' rather than `operational,' they are included in
$cdrom_device_ops$ because they describe the capability of the {\em
driver\/} rather than the {\em drive}. Nomenclature has always been
difficult in computer programming.
is registered with the \UCD.
Note that most functions have fewer parameters than their
$blkdev_fops$ counterparts. This is because very little of the
......
......@@ -8620,10 +8620,10 @@ S: Maintained
F: drivers/net/ethernet/netronome/
NETWORK BLOCK DEVICE (NBD)
M: Markus Pargmann <mpa@pengutronix.de>
M: Josef Bacik <jbacik@fb.com>
S: Maintained
L: linux-block@vger.kernel.org
L: nbd-general@lists.sourceforge.net
T: git git://git.pengutronix.de/git/mpa/linux-nbd.git
F: Documentation/blockdev/nbd.txt
F: drivers/block/nbd.c
F: include/uapi/linux/nbd.h
......@@ -11097,6 +11097,17 @@ L: linux-mmc@vger.kernel.org
S: Maintained
F: drivers/mmc/host/sdhci-spear.c
SECURE ENCRYPTING DEVICE (SED) OPAL DRIVER
M: Scott Bauer <scott.bauer@intel.com>
M: Jonathan Derrick <jonathan.derrick@intel.com>
M: Rafael Antognolli <rafael.antognolli@intel.com>
L: linux-block@vger.kernel.org
S: Supported
F: block/sed*
F: block/opal_proto.h
F: include/linux/sed*
F: include/uapi/linux/sed*
SECURITY SUBSYSTEM
M: James Morris <james.l.morris@oracle.com>
M: "Serge E. Hallyn" <serge@hallyn.com>
......
......@@ -49,9 +49,13 @@ config LBDAF
If unsure, say Y.
config BLK_SCSI_REQUEST
bool
config BLK_DEV_BSG
bool "Block layer SG support v4"
default y
select BLK_SCSI_REQUEST
help
Saying Y here will enable generic SG (SCSI generic) v4 support
for any block device.
......@@ -71,6 +75,7 @@ config BLK_DEV_BSGLIB
bool "Block layer SG support v4 helper lib"
default n
select BLK_DEV_BSG
select BLK_SCSI_REQUEST
help
Subsystems will normally enable this if needed. Users will not
normally need to manually enable this.
......@@ -147,6 +152,25 @@ config BLK_WBT_MQ
Multiqueue currently doesn't have support for IO scheduling,
enabling this option is recommended.
config BLK_DEBUG_FS
bool "Block layer debugging information in debugfs"
default y
depends on DEBUG_FS
---help---
Include block layer debugging information in debugfs. This information
is mostly useful for kernel developers, but it doesn't incur any cost
at runtime.
Unless you are building a kernel for a tiny system, you should
say Y here.
config BLK_SED_OPAL
bool "Logic for interfacing with Opal enabled SEDs"
---help---
Builds Logic for interfacing with Opal enabled controllers.
Enabling this option enables users to setup/unlock/lock
Locking ranges for SED devices using the Opal protocol.
menu "Partition Types"
source "block/partitions/Kconfig"
......
......@@ -63,6 +63,56 @@ config DEFAULT_IOSCHED
default "cfq" if DEFAULT_CFQ
default "noop" if DEFAULT_NOOP
config MQ_IOSCHED_DEADLINE
tristate "MQ deadline I/O scheduler"
default y
---help---
MQ version of the deadline IO scheduler.
config MQ_IOSCHED_NONE
bool
default y
choice
prompt "Default single-queue blk-mq I/O scheduler"
default DEFAULT_SQ_NONE
help
Select the I/O scheduler which will be used by default for blk-mq
managed block devices with a single queue.
config DEFAULT_SQ_DEADLINE
bool "MQ Deadline" if MQ_IOSCHED_DEADLINE=y
config DEFAULT_SQ_NONE
bool "None"
endchoice
config DEFAULT_SQ_IOSCHED
string
default "mq-deadline" if DEFAULT_SQ_DEADLINE
default "none" if DEFAULT_SQ_NONE
choice
prompt "Default multi-queue blk-mq I/O scheduler"
default DEFAULT_MQ_NONE
help
Select the I/O scheduler which will be used by default for blk-mq
managed block devices with multiple queues.
config DEFAULT_MQ_DEADLINE
bool "MQ Deadline" if MQ_IOSCHED_DEADLINE=y
config DEFAULT_MQ_NONE
bool "None"
endchoice
config DEFAULT_MQ_IOSCHED
string
default "mq-deadline" if DEFAULT_MQ_DEADLINE
default "none" if DEFAULT_MQ_NONE
endmenu
endif
......@@ -6,11 +6,12 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-tag.o blk-sysfs.o \
blk-flush.o blk-settings.o blk-ioc.o blk-map.o \
blk-exec.o blk-merge.o blk-softirq.o blk-timeout.o \
blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
blk-mq-sysfs.o blk-mq-cpumap.o ioctl.o \
genhd.o scsi_ioctl.o partition-generic.o ioprio.o \
blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
genhd.o partition-generic.o ioprio.o \
badblocks.o partitions/
obj-$(CONFIG_BOUNCE) += bounce.o
obj-$(CONFIG_BOUNCE) += bounce.o
obj-$(CONFIG_BLK_SCSI_REQUEST) += scsi_ioctl.o
obj-$(CONFIG_BLK_DEV_BSG) += bsg.o
obj-$(CONFIG_BLK_DEV_BSGLIB) += bsg-lib.o
obj-$(CONFIG_BLK_CGROUP) += blk-cgroup.o
......@@ -18,6 +19,7 @@ obj-$(CONFIG_BLK_DEV_THROTTLING) += blk-throttle.o
obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
obj-$(CONFIG_MQ_IOSCHED_DEADLINE) += mq-deadline.o
obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o
......@@ -25,3 +27,5 @@ obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
obj-$(CONFIG_BLK_MQ_PCI) += blk-mq-pci.o
obj-$(CONFIG_BLK_DEV_ZONED) += blk-zoned.o
obj-$(CONFIG_BLK_WBT) += blk-wbt.o
obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o
obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o
......@@ -1227,9 +1227,6 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
if (!bio)
goto out_bmd;
if (iter->type & WRITE)
bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
ret = 0;
if (map_data) {
......@@ -1394,16 +1391,10 @@ struct bio *bio_map_user_iov(struct request_queue *q,
kfree(pages);
/*
* set data direction, and check if mapped pages need bouncing
*/
if (iter->type & WRITE)
bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
bio_set_flag(bio, BIO_USER_MAPPED);
/*
* subtle -- if __bio_map_user() ended up bouncing a bio,
* subtle -- if bio_map_user_iov() ended up bouncing a bio,
* it would normally disappear when its bi_end_io is run.
* however, we need it for the unmap, so grab an extra
* reference to it
......@@ -1445,8 +1436,8 @@ static void __bio_unmap_user(struct bio *bio)
* bio_unmap_user - unmap a bio
* @bio: the bio being unmapped
*
* Unmap a bio previously mapped by bio_map_user(). Must be called with
* a process context.
* Unmap a bio previously mapped by bio_map_user_iov(). Must be called from
* process context.
*
* bio_unmap_user() may sleep.
*/
......@@ -1590,7 +1581,6 @@ struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len,
bio->bi_private = data;
} else {
bio->bi_end_io = bio_copy_kern_endio;
bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
}
return bio;
......
......@@ -184,7 +184,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
goto err_free_blkg;
}
wb_congested = wb_congested_get_create(&q->backing_dev_info,
wb_congested = wb_congested_get_create(q->backing_dev_info,
blkcg->css.id,
GFP_NOWAIT | __GFP_NOWARN);
if (!wb_congested) {
......@@ -469,8 +469,8 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
const char *blkg_dev_name(struct blkcg_gq *blkg)
{
/* some drivers (floppy) instantiate a queue w/o disk registered */
if (blkg->q->backing_dev_info.dev)
return dev_name(blkg->q->backing_dev_info.dev);
if (blkg->q->backing_dev_info->dev)
return dev_name(blkg->q->backing_dev_info->dev);
return NULL;
}
EXPORT_SYMBOL_GPL(blkg_dev_name);
......@@ -1079,10 +1079,8 @@ int blkcg_init_queue(struct request_queue *q)
if (preloaded)
radix_tree_preload_end();
if (IS_ERR(blkg)) {
blkg_free(new_blkg);
if (IS_ERR(blkg))
return PTR_ERR(blkg);
}
q->root_blkg = blkg;
q->root_rl.blkg = blkg;
......@@ -1223,7 +1221,10 @@ int blkcg_activate_policy(struct request_queue *q,
if (blkcg_policy_enabled(q, pol))
return 0;
blk_queue_bypass_start(q);
if (q->mq_ops)
blk_mq_freeze_queue(q);
else
blk_queue_bypass_start(q);
pd_prealloc:
if (!pd_prealloc) {
pd_prealloc = pol->pd_alloc_fn(GFP_KERNEL, q->node);
......@@ -1261,7 +1262,10 @@ int blkcg_activate_policy(struct request_queue *q,
spin_unlock_irq(q->queue_lock);
out_bypass_end:
blk_queue_bypass_end(q);
if (q->mq_ops)
blk_mq_unfreeze_queue(q);
else
blk_queue_bypass_end(q);
if (pd_prealloc)
pol->pd_free_fn(pd_prealloc);
return ret;
......@@ -1284,7 +1288,11 @@ void blkcg_deactivate_policy(struct request_queue *q,
if (!blkcg_policy_enabled(q, pol))
return;
blk_queue_bypass_start(q);
if (q->mq_ops)
blk_mq_freeze_queue(q);
else
blk_queue_bypass_start(q);
spin_lock_irq(q->queue_lock);
__clear_bit(pol->plid, q->blkcg_pols);
......@@ -1304,7 +1312,11 @@ void blkcg_deactivate_policy(struct request_queue *q,
}
spin_unlock_irq(q->queue_lock);
blk_queue_bypass_end(q);
if (q->mq_ops)
blk_mq_unfreeze_queue(q);
else
blk_queue_bypass_end(q);
}
EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
......
此差异已折叠。
......@@ -9,11 +9,7 @@
#include <linux/sched/sysctl.h>
#include "blk.h"
/*
* for max sense size
*/
#include <scsi/scsi_cmnd.h>
#include "blk-mq-sched.h"
/**
* blk_end_sync_rq - executes a completion event on a request
......@@ -55,7 +51,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;
WARN_ON(irqs_disabled());
WARN_ON(rq->cmd_type == REQ_TYPE_FS);
WARN_ON(!blk_rq_is_passthrough(rq));
rq->rq_disk = bd_disk;
rq->end_io = done;
......@@ -65,7 +61,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
* be reused after dying flag is set
*/
if (q->mq_ops) {
blk_mq_insert_request(rq, at_head, true, false);
blk_mq_sched_insert_request(rq, at_head, true, false, false);
return;
}
......@@ -100,16 +96,9 @@ int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
struct request *rq, int at_head)
{
DECLARE_COMPLETION_ONSTACK(wait);
char sense[SCSI_SENSE_BUFFERSIZE];
int err = 0;
unsigned long hang_check;
if (!rq->sense) {
memset(sense, 0, sizeof(sense));
rq->sense = sense;
rq->sense_len = 0;
}
rq->end_io_data = &wait;
blk_execute_rq_nowait(q, bd_disk, rq, at_head, blk_end_sync_rq);
......@@ -123,11 +112,6 @@ int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
if (rq->errors)
err = -EIO;
if (rq->sense == sense) {
rq->sense = NULL;
rq->sense_len = 0;
}
return err;
}
EXPORT_SYMBOL(blk_execute_rq);
......@@ -74,6 +74,7 @@
#include "blk.h"
#include "blk-mq.h"
#include "blk-mq-tag.h"
#include "blk-mq-sched.h"
/* FLUSH/FUA sequences */
enum {
......@@ -296,8 +297,14 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
if (fq->flush_pending_idx != fq->flush_running_idx || list_empty(pending))
return false;
/* C2 and C3 */
/* C2 and C3
*
* For blk-mq + scheduling, we can risk having all driver tags
* assigned to empty flushes, and we deadlock if we are expecting
* other requests to make progress. Don't defer for that case.
*/
if (!list_empty(&fq->flush_data_in_flight) &&
!(q->mq_ops && q->elevator) &&
time_before(jiffies,
fq->flush_pending_since + FLUSH_PENDING_TIMEOUT))
return false;
......@@ -326,7 +333,6 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
}
flush_rq->cmd_type = REQ_TYPE_FS;
flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
flush_rq->rq_flags |= RQF_FLUSH_SEQ;
flush_rq->rq_disk = first_rq->rq_disk;
......@@ -391,9 +397,10 @@ static void mq_flush_data_end_io(struct request *rq, int error)
* the comment in flush_end_io().
*/
spin_lock_irqsave(&fq->mq_flush_lock, flags);
if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
blk_mq_run_hw_queue(hctx, true);
blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error);
spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
blk_mq_run_hw_queue(hctx, true);
}
/**
......@@ -453,9 +460,9 @@ void blk_insert_flush(struct request *rq)
*/
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
if (q->mq_ops) {
blk_mq_insert_request(rq, false, true, false);
} else
if (q->mq_ops)
blk_mq_sched_insert_request(rq, false, true, false, false);
else
list_add_tail(&rq->queuelist, &q->queue_head);
return;
}
......@@ -545,11 +552,10 @@ struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
if (!fq)
goto fail;
if (q->mq_ops) {
if (q->mq_ops)
spin_lock_init(&fq->mq_flush_lock);
rq_sz = round_up(rq_sz + cmd_size, cache_line_size());
}
rq_sz = round_up(rq_sz + cmd_size, cache_line_size());
fq->flush_rq = kzalloc_node(rq_sz, GFP_KERNEL, node);
if (!fq->flush_rq)
goto fail_rq;
......
......@@ -443,10 +443,10 @@ void blk_integrity_revalidate(struct gendisk *disk)
return;
if (bi->profile)
disk->queue->backing_dev_info.capabilities |=
disk->queue->backing_dev_info->capabilities |=
BDI_CAP_STABLE_WRITES;
else
disk->queue->backing_dev_info.capabilities &=
disk->queue->backing_dev_info->capabilities &=
~BDI_CAP_STABLE_WRITES;
}
......
......@@ -35,7 +35,10 @@ static void icq_free_icq_rcu(struct rcu_head *head)
kmem_cache_free(icq->__rcu_icq_cache, icq);
}
/* Exit an icq. Called with both ioc and q locked. */
/*
* Exit an icq. Called with both ioc and q locked for sq, only ioc locked for
* mq.
*/
static void ioc_exit_icq(struct io_cq *icq)
{
struct elevator_type *et = icq->q->elevator->type;
......@@ -43,8 +46,10 @@ static void ioc_exit_icq(struct io_cq *icq)
if (icq->flags & ICQ_EXITED)
return;
if (et->ops.elevator_exit_icq_fn)
et->ops.elevator_exit_icq_fn(icq);
if (et->uses_mq && et->ops.mq.exit_icq)
et->ops.mq.exit_icq(icq);
else if (!et->uses_mq && et->ops.sq.elevator_exit_icq_fn)
et->ops.sq.elevator_exit_icq_fn(icq);
icq->flags |= ICQ_EXITED;
}
......@@ -164,6 +169,7 @@ EXPORT_SYMBOL(put_io_context);
*/
void put_io_context_active(struct io_context *ioc)
{
struct elevator_type *et;
unsigned long flags;
struct io_cq *icq;
......@@ -182,13 +188,19 @@ void put_io_context_active(struct io_context *ioc)
hlist_for_each_entry(icq, &ioc->icq_list, ioc_node) {
if (icq->flags & ICQ_EXITED)
continue;
if (spin_trylock(icq->q->queue_lock)) {
et = icq->q->elevator->type;
if (et->uses_mq) {
ioc_exit_icq(icq);
spin_unlock(icq->q->queue_lock);
} else {
spin_unlock_irqrestore(&ioc->lock, flags);
cpu_relax();
goto retry;
if (spin_trylock(icq->q->queue_lock)) {
ioc_exit_icq(icq);
spin_unlock(icq->q->queue_lock);
} else {
spin_unlock_irqrestore(&ioc->lock, flags);
cpu_relax();
goto retry;
}
}
}
spin_unlock_irqrestore(&ioc->lock, flags);
......@@ -383,8 +395,10 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct request_queue *q,
if (likely(!radix_tree_insert(&ioc->icq_tree, q->id, icq))) {
hlist_add_head(&icq->ioc_node, &ioc->icq_list);
list_add(&icq->q_node, &q->icq_list);
if (et->ops.elevator_init_icq_fn)
et->ops.elevator_init_icq_fn(icq);
if (et->uses_mq && et->ops.mq.init_icq)
et->ops.mq.init_icq(icq);
else if (!et->uses_mq && et->ops.sq.elevator_init_icq_fn)
et->ops.sq.elevator_init_icq_fn(icq);
} else {
kmem_cache_free(et->icq_cache, icq);
icq = ioc_lookup_icq(ioc, q);
......
......@@ -16,8 +16,6 @@
int blk_rq_append_bio(struct request *rq, struct bio *bio)
{
if (!rq->bio) {
rq->cmd_flags &= REQ_OP_MASK;
rq->cmd_flags |= (bio->bi_opf & REQ_OP_MASK);
blk_rq_bio_prep(rq->q, rq, bio);
} else {
if (!ll_back_merge_fn(rq->q, rq, bio))
......@@ -62,6 +60,9 @@ static int __blk_rq_map_user_iov(struct request *rq,
if (IS_ERR(bio))
return PTR_ERR(bio);
bio->bi_opf &= ~REQ_OP_MASK;
bio->bi_opf |= req_op(rq);
if (map_data && map_data->null_mapped)
bio_set_flag(bio, BIO_NULL_MAPPED);
......@@ -90,7 +91,7 @@ static int __blk_rq_map_user_iov(struct request *rq,
}
/**
* blk_rq_map_user_iov - map user data to a request, for REQ_TYPE_BLOCK_PC usage
* blk_rq_map_user_iov - map user data to a request, for passthrough requests
* @q: request queue where request should be inserted
* @rq: request to map data to
* @map_data: pointer to the rq_map_data holding pages (if necessary)
......@@ -199,7 +200,7 @@ int blk_rq_unmap_user(struct bio *bio)
EXPORT_SYMBOL(blk_rq_unmap_user);
/**
* blk_rq_map_kern - map kernel data to a request, for REQ_TYPE_BLOCK_PC usage
* blk_rq_map_kern - map kernel data to a request, for passthrough requests
* @q: request queue where request should be inserted
* @rq: request to fill
* @kbuf: the kernel buffer
......@@ -234,8 +235,8 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
if (IS_ERR(bio))
return PTR_ERR(bio);
if (!reading)
bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
bio->bi_opf &= ~REQ_OP_MASK;
bio->bi_opf |= req_op(rq);
if (do_copy)
rq->rq_flags |= RQF_COPY_USER;
......
......@@ -482,13 +482,6 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
}
EXPORT_SYMBOL(blk_rq_map_sg);
static void req_set_nomerge(struct request_queue *q, struct request *req)
{
req->cmd_flags |= REQ_NOMERGE;
if (req == q->last_merge)
q->last_merge = NULL;
}
static inline int ll_new_hw_segment(struct request_queue *q,
struct request *req,
struct bio *bio)
......@@ -659,31 +652,32 @@ static void blk_account_io_merge(struct request *req)
}
/*
* Has to be called with the request spinlock acquired
* For non-mq, this has to be called with the request spinlock acquired.
* For mq with scheduling, the appropriate queue wide lock should be held.
*/
static int attempt_merge(struct request_queue *q, struct request *req,
struct request *next)
static struct request *attempt_merge(struct request_queue *q,
struct request *req, struct request *next)
{
if (!rq_mergeable(req) || !rq_mergeable(next))
return 0;
return NULL;
if (req_op(req) != req_op(next))
return 0;
return NULL;
/*
* not contiguous
*/
if (blk_rq_pos(req) + blk_rq_sectors(req) != blk_rq_pos(next))
return 0;
return NULL;
if (rq_data_dir(req) != rq_data_dir(next)
|| req->rq_disk != next->rq_disk
|| req_no_special_merge(next))
return 0;
return NULL;
if (req_op(req) == REQ_OP_WRITE_SAME &&
!blk_write_same_mergeable(req->bio, next->bio))
return 0;
return NULL;
/*
* If we are allowed to merge, then append bio list
......@@ -692,7 +686,7 @@ static int attempt_merge(struct request_queue *q, struct request *req,
* counts here.
*/
if (!ll_merge_requests_fn(q, req, next))
return 0;
return NULL;
/*
* If failfast settings disagree or any of the two is already
......@@ -732,42 +726,51 @@ static int attempt_merge(struct request_queue *q, struct request *req,
if (blk_rq_cpu_valid(next))
req->cpu = next->cpu;
/* owner-ship of bio passed from next to req */
/*
* ownership of bio passed from next to req, return 'next' for
* the caller to free
*/
next->bio = NULL;
__blk_put_request(q, next);
return 1;
return next;
}
int attempt_back_merge(struct request_queue *q, struct request *rq)
struct request *attempt_back_merge(struct request_queue *q, struct request *rq)
{
struct request *next = elv_latter_request(q, rq);
if (next)
return attempt_merge(q, rq, next);
return 0;
return NULL;
}
int attempt_front_merge(struct request_queue *q, struct request *rq)
struct request *attempt_front_merge(struct request_queue *q, struct request *rq)
{
struct request *prev = elv_former_request(q, rq);
if (prev)
return attempt_merge(q, prev, rq);
return 0;
return NULL;
}
int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
struct request *next)
{
struct elevator_queue *e = q->elevator;
struct request *free;
if (e->type->ops.elevator_allow_rq_merge_fn)
if (!e->type->ops.elevator_allow_rq_merge_fn(q, rq, next))
if (!e->uses_mq && e->type->ops.sq.elevator_allow_rq_merge_fn)
if (!e->type->ops.sq.elevator_allow_rq_merge_fn(q, rq, next))
return 0;
return attempt_merge(q, rq, next);
free = attempt_merge(q, rq, next);
if (free) {
__blk_put_request(q, free);
return 1;
}
return 0;
}
bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
......@@ -798,9 +801,12 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
return true;
}
int blk_try_merge(struct request *rq, struct bio *bio)
enum elv_merge blk_try_merge(struct request *rq, struct bio *bio)
{
if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector)
if (req_op(rq) == REQ_OP_DISCARD &&
queue_max_discard_segments(rq->q) > 1)
return ELEVATOR_DISCARD_MERGE;
else if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector)
return ELEVATOR_BACK_MERGE;
else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_iter.bi_sector)
return ELEVATOR_FRONT_MERGE;
......
/*
* Copyright (C) 2017 Facebook
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License v2 as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <https://www.gnu.org/licenses/>.
*/
#include <linux/kernel.h>
#include <linux/blkdev.h>
#include <linux/debugfs.h>
#include <linux/blk-mq.h>
#include "blk.h"
#include "blk-mq.h"
#include "blk-mq-tag.h"
struct blk_mq_debugfs_attr {
const char *name;
umode_t mode;
const struct file_operations *fops;
};
static int blk_mq_debugfs_seq_open(struct inode *inode, struct file *file,
const struct seq_operations *ops)
{
struct seq_file *m;
int ret;
ret = seq_open(file, ops);
if (!ret) {
m = file->private_data;
m->private = inode->i_private;
}
return ret;
}
static int hctx_state_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
seq_printf(m, "0x%lx\n", hctx->state);
return 0;
}
static int hctx_state_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_state_show, inode->i_private);
}
static const struct file_operations hctx_state_fops = {
.open = hctx_state_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static int hctx_flags_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
seq_printf(m, "0x%lx\n", hctx->flags);
return 0;
}
static int hctx_flags_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_flags_show, inode->i_private);
}
static const struct file_operations hctx_flags_fops = {
.open = hctx_flags_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static int blk_mq_debugfs_rq_show(struct seq_file *m, void *v)
{
struct request *rq = list_entry_rq(v);
seq_printf(m, "%p {.cmd_flags=0x%x, .rq_flags=0x%x, .tag=%d, .internal_tag=%d}\n",
rq, rq->cmd_flags, (__force unsigned int)rq->rq_flags,
rq->tag, rq->internal_tag);
return 0;
}
static void *hctx_dispatch_start(struct seq_file *m, loff_t *pos)
__acquires(&hctx->lock)
{
struct blk_mq_hw_ctx *hctx = m->private;
spin_lock(&hctx->lock);
return seq_list_start(&hctx->dispatch, *pos);
}
static void *hctx_dispatch_next(struct seq_file *m, void *v, loff_t *pos)
{
struct blk_mq_hw_ctx *hctx = m->private;
return seq_list_next(v, &hctx->dispatch, pos);
}
static void hctx_dispatch_stop(struct seq_file *m, void *v)
__releases(&hctx->lock)
{
struct blk_mq_hw_ctx *hctx = m->private;
spin_unlock(&hctx->lock);
}
static const struct seq_operations hctx_dispatch_seq_ops = {
.start = hctx_dispatch_start,
.next = hctx_dispatch_next,
.stop = hctx_dispatch_stop,
.show = blk_mq_debugfs_rq_show,
};
static int hctx_dispatch_open(struct inode *inode, struct file *file)
{
return blk_mq_debugfs_seq_open(inode, file, &hctx_dispatch_seq_ops);
}
static const struct file_operations hctx_dispatch_fops = {
.open = hctx_dispatch_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release,
};
static int hctx_ctx_map_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
sbitmap_bitmap_show(&hctx->ctx_map, m);
return 0;
}
static int hctx_ctx_map_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_ctx_map_show, inode->i_private);
}
static const struct file_operations hctx_ctx_map_fops = {
.open = hctx_ctx_map_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static void blk_mq_debugfs_tags_show(struct seq_file *m,
struct blk_mq_tags *tags)
{
seq_printf(m, "nr_tags=%u\n", tags->nr_tags);
seq_printf(m, "nr_reserved_tags=%u\n", tags->nr_reserved_tags);
seq_printf(m, "active_queues=%d\n",
atomic_read(&tags->active_queues));
seq_puts(m, "\nbitmap_tags:\n");
sbitmap_queue_show(&tags->bitmap_tags, m);
if (tags->nr_reserved_tags) {
seq_puts(m, "\nbreserved_tags:\n");
sbitmap_queue_show(&tags->breserved_tags, m);
}
}
static int hctx_tags_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
struct request_queue *q = hctx->queue;
int res;
res = mutex_lock_interruptible(&q->sysfs_lock);
if (res)
goto out;
if (hctx->tags)
blk_mq_debugfs_tags_show(m, hctx->tags);
mutex_unlock(&q->sysfs_lock);
out:
return res;
}
static int hctx_tags_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_tags_show, inode->i_private);
}
static const struct file_operations hctx_tags_fops = {
.open = hctx_tags_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static int hctx_tags_bitmap_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
struct request_queue *q = hctx->queue;
int res;
res = mutex_lock_interruptible(&q->sysfs_lock);
if (res)
goto out;
if (hctx->tags)
sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
mutex_unlock(&q->sysfs_lock);
out:
return res;
}
static int hctx_tags_bitmap_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_tags_bitmap_show, inode->i_private);
}
static const struct file_operations hctx_tags_bitmap_fops = {
.open = hctx_tags_bitmap_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static int hctx_sched_tags_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
struct request_queue *q = hctx->queue;
int res;
res = mutex_lock_interruptible(&q->sysfs_lock);
if (res)
goto out;
if (hctx->sched_tags)
blk_mq_debugfs_tags_show(m, hctx->sched_tags);
mutex_unlock(&q->sysfs_lock);
out:
return res;
}
static int hctx_sched_tags_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_sched_tags_show, inode->i_private);
}
static const struct file_operations hctx_sched_tags_fops = {
.open = hctx_sched_tags_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static int hctx_sched_tags_bitmap_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
struct request_queue *q = hctx->queue;
int res;
res = mutex_lock_interruptible(&q->sysfs_lock);
if (res)
goto out;
if (hctx->sched_tags)
sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
mutex_unlock(&q->sysfs_lock);
out:
return res;
}
static int hctx_sched_tags_bitmap_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_sched_tags_bitmap_show, inode->i_private);
}
static const struct file_operations hctx_sched_tags_bitmap_fops = {
.open = hctx_sched_tags_bitmap_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static int hctx_io_poll_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
seq_printf(m, "considered=%lu\n", hctx->poll_considered);
seq_printf(m, "invoked=%lu\n", hctx->poll_invoked);
seq_printf(m, "success=%lu\n", hctx->poll_success);
return 0;
}
static int hctx_io_poll_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_io_poll_show, inode->i_private);
}
static ssize_t hctx_io_poll_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct seq_file *m = file->private_data;
struct blk_mq_hw_ctx *hctx = m->private;
hctx->poll_considered = hctx->poll_invoked = hctx->poll_success = 0;
return count;
}
static const struct file_operations hctx_io_poll_fops = {
.open = hctx_io_poll_open,
.read = seq_read,
.write = hctx_io_poll_write,
.llseek = seq_lseek,
.release = single_release,
};
static void print_stat(struct seq_file *m, struct blk_rq_stat *stat)
{
seq_printf(m, "samples=%d, mean=%lld, min=%llu, max=%llu",
stat->nr_samples, stat->mean, stat->min, stat->max);
}
static int hctx_stats_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
struct blk_rq_stat stat[2];
blk_stat_init(&stat[BLK_STAT_READ]);
blk_stat_init(&stat[BLK_STAT_WRITE]);
blk_hctx_stat_get(hctx, stat);
seq_puts(m, "read: ");
print_stat(m, &stat[BLK_STAT_READ]);
seq_puts(m, "\n");
seq_puts(m, "write: ");
print_stat(m, &stat[BLK_STAT_WRITE]);
seq_puts(m, "\n");
return 0;
}
static int hctx_stats_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_stats_show, inode->i_private);
}
static ssize_t hctx_stats_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct seq_file *m = file->private_data;
struct blk_mq_hw_ctx *hctx = m->private;
struct blk_mq_ctx *ctx;
int i;
hctx_for_each_ctx(hctx, ctx, i) {
blk_stat_init(&ctx->stat[BLK_STAT_READ]);
blk_stat_init(&ctx->stat[BLK_STAT_WRITE]);
}
return count;
}
static const struct file_operations hctx_stats_fops = {
.open = hctx_stats_open,
.read = seq_read,
.write = hctx_stats_write,
.llseek = seq_lseek,
.release = single_release,
};
static int hctx_dispatched_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
int i;
seq_printf(m, "%8u\t%lu\n", 0U, hctx->dispatched[0]);
for (i = 1; i < BLK_MQ_MAX_DISPATCH_ORDER - 1; i++) {
unsigned int d = 1U << (i - 1);
seq_printf(m, "%8u\t%lu\n", d, hctx->dispatched[i]);
}
seq_printf(m, "%8u+\t%lu\n", 1U << (i - 1), hctx->dispatched[i]);
return 0;
}
static int hctx_dispatched_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_dispatched_show, inode->i_private);
}
static ssize_t hctx_dispatched_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct seq_file *m = file->private_data;
struct blk_mq_hw_ctx *hctx = m->private;
int i;
for (i = 0; i < BLK_MQ_MAX_DISPATCH_ORDER; i++)
hctx->dispatched[i] = 0;
return count;
}
static const struct file_operations hctx_dispatched_fops = {
.open = hctx_dispatched_open,
.read = seq_read,
.write = hctx_dispatched_write,
.llseek = seq_lseek,
.release = single_release,
};
static int hctx_queued_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
seq_printf(m, "%lu\n", hctx->queued);
return 0;
}
static int hctx_queued_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_queued_show, inode->i_private);
}
static ssize_t hctx_queued_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct seq_file *m = file->private_data;
struct blk_mq_hw_ctx *hctx = m->private;
hctx->queued = 0;
return count;
}
static const struct file_operations hctx_queued_fops = {
.open = hctx_queued_open,
.read = seq_read,
.write = hctx_queued_write,
.llseek = seq_lseek,
.release = single_release,
};
static int hctx_run_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
seq_printf(m, "%lu\n", hctx->run);
return 0;
}
static int hctx_run_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_run_show, inode->i_private);
}
static ssize_t hctx_run_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct seq_file *m = file->private_data;
struct blk_mq_hw_ctx *hctx = m->private;
hctx->run = 0;
return count;
}
static const struct file_operations hctx_run_fops = {
.open = hctx_run_open,
.read = seq_read,
.write = hctx_run_write,
.llseek = seq_lseek,
.release = single_release,
};
static int hctx_active_show(struct seq_file *m, void *v)
{
struct blk_mq_hw_ctx *hctx = m->private;
seq_printf(m, "%d\n", atomic_read(&hctx->nr_active));
return 0;
}
static int hctx_active_open(struct inode *inode, struct file *file)
{
return single_open(file, hctx_active_show, inode->i_private);
}
static const struct file_operations hctx_active_fops = {
.open = hctx_active_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static void *ctx_rq_list_start(struct seq_file *m, loff_t *pos)
__acquires(&ctx->lock)
{
struct blk_mq_ctx *ctx = m->private;
spin_lock(&ctx->lock);
return seq_list_start(&ctx->rq_list, *pos);
}
static void *ctx_rq_list_next(struct seq_file *m, void *v, loff_t *pos)
{
struct blk_mq_ctx *ctx = m->private;
return seq_list_next(v, &ctx->rq_list, pos);
}
static void ctx_rq_list_stop(struct seq_file *m, void *v)
__releases(&ctx->lock)
{
struct blk_mq_ctx *ctx = m->private;
spin_unlock(&ctx->lock);
}
static const struct seq_operations ctx_rq_list_seq_ops = {
.start = ctx_rq_list_start,
.next = ctx_rq_list_next,
.stop = ctx_rq_list_stop,
.show = blk_mq_debugfs_rq_show,
};
static int ctx_rq_list_open(struct inode *inode, struct file *file)
{
return blk_mq_debugfs_seq_open(inode, file, &ctx_rq_list_seq_ops);
}
static const struct file_operations ctx_rq_list_fops = {
.open = ctx_rq_list_open,
.read = seq_read,
.llseek = seq_lseek,
.release = seq_release,
};
static int ctx_dispatched_show(struct seq_file *m, void *v)
{
struct blk_mq_ctx *ctx = m->private;
seq_printf(m, "%lu %lu\n", ctx->rq_dispatched[1], ctx->rq_dispatched[0]);
return 0;
}
static int ctx_dispatched_open(struct inode *inode, struct file *file)
{
return single_open(file, ctx_dispatched_show, inode->i_private);
}
static ssize_t ctx_dispatched_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct seq_file *m = file->private_data;
struct blk_mq_ctx *ctx = m->private;
ctx->rq_dispatched[0] = ctx->rq_dispatched[1] = 0;
return count;
}
static const struct file_operations ctx_dispatched_fops = {
.open = ctx_dispatched_open,
.read = seq_read,
.write = ctx_dispatched_write,
.llseek = seq_lseek,
.release = single_release,
};
static int ctx_merged_show(struct seq_file *m, void *v)
{
struct blk_mq_ctx *ctx = m->private;
seq_printf(m, "%lu\n", ctx->rq_merged);
return 0;
}
static int ctx_merged_open(struct inode *inode, struct file *file)
{
return single_open(file, ctx_merged_show, inode->i_private);
}
static ssize_t ctx_merged_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct seq_file *m = file->private_data;
struct blk_mq_ctx *ctx = m->private;
ctx->rq_merged = 0;
return count;
}
static const struct file_operations ctx_merged_fops = {
.open = ctx_merged_open,
.read = seq_read,
.write = ctx_merged_write,
.llseek = seq_lseek,
.release = single_release,
};
static int ctx_completed_show(struct seq_file *m, void *v)
{
struct blk_mq_ctx *ctx = m->private;
seq_printf(m, "%lu %lu\n", ctx->rq_completed[1], ctx->rq_completed[0]);
return 0;
}
static int ctx_completed_open(struct inode *inode, struct file *file)
{
return single_open(file, ctx_completed_show, inode->i_private);
}
static ssize_t ctx_completed_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct seq_file *m = file->private_data;
struct blk_mq_ctx *ctx = m->private;
ctx->rq_completed[0] = ctx->rq_completed[1] = 0;
return count;
}
static const struct file_operations ctx_completed_fops = {
.open = ctx_completed_open,
.read = seq_read,
.write = ctx_completed_write,
.llseek = seq_lseek,
.release = single_release,
};
static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
{"state", 0400, &hctx_state_fops},
{"flags", 0400, &hctx_flags_fops},
{"dispatch", 0400, &hctx_dispatch_fops},
{"ctx_map", 0400, &hctx_ctx_map_fops},
{"tags", 0400, &hctx_tags_fops},
{"tags_bitmap", 0400, &hctx_tags_bitmap_fops},
{"sched_tags", 0400, &hctx_sched_tags_fops},
{"sched_tags_bitmap", 0400, &hctx_sched_tags_bitmap_fops},
{"io_poll", 0600, &hctx_io_poll_fops},
{"stats", 0600, &hctx_stats_fops},
{"dispatched", 0600, &hctx_dispatched_fops},
{"queued", 0600, &hctx_queued_fops},
{"run", 0600, &hctx_run_fops},
{"active", 0400, &hctx_active_fops},
{},
};
static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
{"rq_list", 0400, &ctx_rq_list_fops},
{"dispatched", 0600, &ctx_dispatched_fops},
{"merged", 0600, &ctx_merged_fops},
{"completed", 0600, &ctx_completed_fops},
{},
};
int blk_mq_debugfs_register(struct request_queue *q, const char *name)
{
if (!blk_debugfs_root)
return -ENOENT;
q->debugfs_dir = debugfs_create_dir(name, blk_debugfs_root);
if (!q->debugfs_dir)
goto err;
if (blk_mq_debugfs_register_hctxs(q))
goto err;
return 0;
err:
blk_mq_debugfs_unregister(q);
return -ENOMEM;
}
void blk_mq_debugfs_unregister(struct request_queue *q)
{
debugfs_remove_recursive(q->debugfs_dir);
q->mq_debugfs_dir = NULL;
q->debugfs_dir = NULL;
}
static bool debugfs_create_files(struct dentry *parent, void *data,
const struct blk_mq_debugfs_attr *attr)
{
for (; attr->name; attr++) {
if (!debugfs_create_file(attr->name, attr->mode, parent,
data, attr->fops))
return false;
}
return true;
}
static int blk_mq_debugfs_register_ctx(struct request_queue *q,
struct blk_mq_ctx *ctx,
struct dentry *hctx_dir)
{
struct dentry *ctx_dir;
char name[20];
snprintf(name, sizeof(name), "cpu%u", ctx->cpu);
ctx_dir = debugfs_create_dir(name, hctx_dir);
if (!ctx_dir)
return -ENOMEM;
if (!debugfs_create_files(ctx_dir, ctx, blk_mq_debugfs_ctx_attrs))
return -ENOMEM;
return 0;
}
static int blk_mq_debugfs_register_hctx(struct request_queue *q,
struct blk_mq_hw_ctx *hctx)
{
struct blk_mq_ctx *ctx;
struct dentry *hctx_dir;
char name[20];
int i;
snprintf(name, sizeof(name), "%u", hctx->queue_num);
hctx_dir = debugfs_create_dir(name, q->mq_debugfs_dir);
if (!hctx_dir)
return -ENOMEM;
if (!debugfs_create_files(hctx_dir, hctx, blk_mq_debugfs_hctx_attrs))
return -ENOMEM;
hctx_for_each_ctx(hctx, ctx, i) {
if (blk_mq_debugfs_register_ctx(q, ctx, hctx_dir))
return -ENOMEM;
}
return 0;
}
int blk_mq_debugfs_register_hctxs(struct request_queue *q)
{
struct blk_mq_hw_ctx *hctx;
int i;
if (!q->debugfs_dir)
return -ENOENT;
q->mq_debugfs_dir = debugfs_create_dir("mq", q->debugfs_dir);
if (!q->mq_debugfs_dir)
goto err;
queue_for_each_hw_ctx(q, hctx, i) {
if (blk_mq_debugfs_register_hctx(q, hctx))
goto err;
}
return 0;
err:
blk_mq_debugfs_unregister_hctxs(q);
return -ENOMEM;
}
void blk_mq_debugfs_unregister_hctxs(struct request_queue *q)
{
debugfs_remove_recursive(q->mq_debugfs_dir);
q->mq_debugfs_dir = NULL;
}
/*
* blk-mq scheduling framework
*
* Copyright (C) 2016 Jens Axboe
*/
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/blk-mq.h>
#include <trace/events/block.h>
#include "blk.h"
#include "blk-mq.h"
#include "blk-mq-sched.h"
#include "blk-mq-tag.h"
#include "blk-wbt.h"
void blk_mq_sched_free_hctx_data(struct request_queue *q,
void (*exit)(struct blk_mq_hw_ctx *))
{
struct blk_mq_hw_ctx *hctx;
int i;
queue_for_each_hw_ctx(q, hctx, i) {
if (exit && hctx->sched_data)
exit(hctx);
kfree(hctx->sched_data);
hctx->sched_data = NULL;
}
}
EXPORT_SYMBOL_GPL(blk_mq_sched_free_hctx_data);
int blk_mq_sched_init_hctx_data(struct request_queue *q, size_t size,
int (*init)(struct blk_mq_hw_ctx *),
void (*exit)(struct blk_mq_hw_ctx *))
{
struct blk_mq_hw_ctx *hctx;
int ret;
int i;
queue_for_each_hw_ctx(q, hctx, i) {
hctx->sched_data = kmalloc_node(size, GFP_KERNEL, hctx->numa_node);
if (!hctx->sched_data) {
ret = -ENOMEM;
goto error;
}
if (init) {
ret = init(hctx);
if (ret) {
/*
* We don't want to give exit() a partially
* initialized sched_data. init() must clean up
* if it fails.
*/
kfree(hctx->sched_data);
hctx->sched_data = NULL;
goto error;
}
}
}
return 0;
error:
blk_mq_sched_free_hctx_data(q, exit);
return ret;
}
EXPORT_SYMBOL_GPL(blk_mq_sched_init_hctx_data);
static void __blk_mq_sched_assign_ioc(struct request_queue *q,
struct request *rq,
struct bio *bio,
struct io_context *ioc)
{
struct io_cq *icq;
spin_lock_irq(q->queue_lock);
icq = ioc_lookup_icq(ioc, q);
spin_unlock_irq(q->queue_lock);
if (!icq) {
icq = ioc_create_icq(ioc, q, GFP_ATOMIC);
if (!icq)
return;
}
rq->elv.icq = icq;
if (!blk_mq_sched_get_rq_priv(q, rq, bio)) {
rq->rq_flags |= RQF_ELVPRIV;
get_io_context(icq->ioc);
return;
}
rq->elv.icq = NULL;
}
static void blk_mq_sched_assign_ioc(struct request_queue *q,
struct request *rq, struct bio *bio)
{
struct io_context *ioc;
ioc = rq_ioc(bio);
if (ioc)
__blk_mq_sched_assign_ioc(q, rq, bio, ioc);
}
struct request *blk_mq_sched_get_request(struct request_queue *q,
struct bio *bio,
unsigned int op,
struct blk_mq_alloc_data *data)
{
struct elevator_queue *e = q->elevator;
struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *ctx;
struct request *rq;
blk_queue_enter_live(q);
ctx = blk_mq_get_ctx(q);
hctx = blk_mq_map_queue(q, ctx->cpu);
blk_mq_set_alloc_data(data, q, data->flags, ctx, hctx);
if (e) {
data->flags |= BLK_MQ_REQ_INTERNAL;
/*
* Flush requests are special and go directly to the
* dispatch list.
*/
if (!op_is_flush(op) && e->type->ops.mq.get_request) {
rq = e->type->ops.mq.get_request(q, op, data);
if (rq)
rq->rq_flags |= RQF_QUEUED;
} else
rq = __blk_mq_alloc_request(data, op);
} else {
rq = __blk_mq_alloc_request(data, op);
if (rq)
data->hctx->tags->rqs[rq->tag] = rq;
}
if (rq) {
if (!op_is_flush(op)) {
rq->elv.icq = NULL;
if (e && e->type->icq_cache)
blk_mq_sched_assign_ioc(q, rq, bio);
}
data->hctx->queued++;
return rq;
}
blk_queue_exit(q);
return NULL;
}
void blk_mq_sched_put_request(struct request *rq)
{
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
if (rq->rq_flags & RQF_ELVPRIV) {
blk_mq_sched_put_rq_priv(rq->q, rq);
if (rq->elv.icq) {
put_io_context(rq->elv.icq->ioc);
rq->elv.icq = NULL;
}
}
if ((rq->rq_flags & RQF_QUEUED) && e && e->type->ops.mq.put_request)
e->type->ops.mq.put_request(rq);
else
blk_mq_finish_request(rq);
}
void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
{
struct elevator_queue *e = hctx->queue->elevator;
const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
bool did_work = false;
LIST_HEAD(rq_list);
if (unlikely(blk_mq_hctx_stopped(hctx)))
return;
hctx->run++;
/*
* If we have previous entries on our dispatch list, grab them first for
* more fair dispatch.
*/
if (!list_empty_careful(&hctx->dispatch)) {
spin_lock(&hctx->lock);
if (!list_empty(&hctx->dispatch))
list_splice_init(&hctx->dispatch, &rq_list);
spin_unlock(&hctx->lock);
}
/*
* Only ask the scheduler for requests, if we didn't have residual
* requests from the dispatch list. This is to avoid the case where
* we only ever dispatch a fraction of the requests available because
* of low device queue depth. Once we pull requests out of the IO
* scheduler, we can no longer merge or sort them. So it's best to
* leave them there for as long as we can. Mark the hw queue as
* needing a restart in that case.
*/
if (!list_empty(&rq_list)) {
blk_mq_sched_mark_restart(hctx);
did_work = blk_mq_dispatch_rq_list(hctx, &rq_list);
} else if (!has_sched_dispatch) {
blk_mq_flush_busy_ctxs(hctx, &rq_list);
blk_mq_dispatch_rq_list(hctx, &rq_list);
}
/*
* We want to dispatch from the scheduler if we had no work left
* on the dispatch list, OR if we did have work but weren't able
* to make progress.
*/
if (!did_work && has_sched_dispatch) {
do {
struct request *rq;
rq = e->type->ops.mq.dispatch_request(hctx);
if (!rq)
break;
list_add(&rq->queuelist, &rq_list);
} while (blk_mq_dispatch_rq_list(hctx, &rq_list));
}
}
void blk_mq_sched_move_to_dispatch(struct blk_mq_hw_ctx *hctx,
struct list_head *rq_list,
struct request *(*get_rq)(struct blk_mq_hw_ctx *))
{
do {
struct request *rq;
rq = get_rq(hctx);
if (!rq)
break;
list_add_tail(&rq->queuelist, rq_list);
} while (1);
}
EXPORT_SYMBOL_GPL(blk_mq_sched_move_to_dispatch);
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
struct request **merged_request)
{
struct request *rq;
switch (elv_merge(q, &rq, bio)) {
case ELEVATOR_BACK_MERGE:
if (!blk_mq_sched_allow_merge(q, rq, bio))
return false;
if (!bio_attempt_back_merge(q, rq, bio))
return false;
*merged_request = attempt_back_merge(q, rq);
if (!*merged_request)
elv_merged_request(q, rq, ELEVATOR_BACK_MERGE);
return true;
case ELEVATOR_FRONT_MERGE:
if (!blk_mq_sched_allow_merge(q, rq, bio))
return false;
if (!bio_attempt_front_merge(q, rq, bio))
return false;
*merged_request = attempt_front_merge(q, rq);
if (!*merged_request)
elv_merged_request(q, rq, ELEVATOR_FRONT_MERGE);
return true;
default:
return false;
}
}
EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
{
struct elevator_queue *e = q->elevator;
if (e->type->ops.mq.bio_merge) {
struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
blk_mq_put_ctx(ctx);
return e->type->ops.mq.bio_merge(hctx, bio);
}
return false;
}
bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq)
{
return rq_mergeable(rq) && elv_attempt_insert_merge(q, rq);
}
EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge);
void blk_mq_sched_request_inserted(struct request *rq)
{
trace_block_rq_insert(rq->q, rq);
}
EXPORT_SYMBOL_GPL(blk_mq_sched_request_inserted);
static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
if (rq->tag == -1) {
rq->rq_flags |= RQF_SORTED;
return false;
}
/*
* If we already have a real request tag, send directly to
* the dispatch list.
*/
spin_lock(&hctx->lock);
list_add(&rq->queuelist, &hctx->dispatch);
spin_unlock(&hctx->lock);
return true;
}
static void blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
{
if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) {
clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
if (blk_mq_hctx_has_pending(hctx))
blk_mq_run_hw_queue(hctx, true);
}
}
void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx)
{
unsigned int i;
if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
blk_mq_sched_restart_hctx(hctx);
else {
struct request_queue *q = hctx->queue;
if (!test_bit(QUEUE_FLAG_RESTART, &q->queue_flags))
return;
clear_bit(QUEUE_FLAG_RESTART, &q->queue_flags);
queue_for_each_hw_ctx(q, hctx, i)
blk_mq_sched_restart_hctx(hctx);
}
}
/*
* Add flush/fua to the queue. If we fail getting a driver tag, then
* punt to the requeue list. Requeue will re-invoke us from a context
* that's safe to block from.
*/
static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
struct request *rq, bool can_block)
{
if (blk_mq_get_driver_tag(rq, &hctx, can_block)) {
blk_insert_flush(rq);
blk_mq_run_hw_queue(hctx, true);
} else
blk_mq_add_to_requeue_list(rq, false, true);
}
void blk_mq_sched_insert_request(struct request *rq, bool at_head,
bool run_queue, bool async, bool can_block)
{
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
struct blk_mq_ctx *ctx = rq->mq_ctx;
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
if (rq->tag == -1 && op_is_flush(rq->cmd_flags)) {
blk_mq_sched_insert_flush(hctx, rq, can_block);
return;
}
if (e && blk_mq_sched_bypass_insert(hctx, rq))
goto run;
if (e && e->type->ops.mq.insert_requests) {
LIST_HEAD(list);
list_add(&rq->queuelist, &list);
e->type->ops.mq.insert_requests(hctx, &list, at_head);
} else {
spin_lock(&ctx->lock);
__blk_mq_insert_request(hctx, rq, at_head);
spin_unlock(&ctx->lock);
}
run:
if (run_queue)
blk_mq_run_hw_queue(hctx, async);
}
void blk_mq_sched_insert_requests(struct request_queue *q,
struct blk_mq_ctx *ctx,
struct list_head *list, bool run_queue_async)
{
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
struct elevator_queue *e = hctx->queue->elevator;
if (e) {
struct request *rq, *next;
/*
* We bypass requests that already have a driver tag assigned,
* which should only be flushes. Flushes are only ever inserted
* as single requests, so we shouldn't ever hit the
* WARN_ON_ONCE() below (but let's handle it just in case).
*/
list_for_each_entry_safe(rq, next, list, queuelist) {
if (WARN_ON_ONCE(rq->tag != -1)) {
list_del_init(&rq->queuelist);
blk_mq_sched_bypass_insert(hctx, rq);
}
}
}
if (e && e->type->ops.mq.insert_requests)
e->type->ops.mq.insert_requests(hctx, list, false);
else
blk_mq_insert_requests(hctx, ctx, list);
blk_mq_run_hw_queue(hctx, run_queue_async);
}
static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx,
unsigned int hctx_idx)
{
if (hctx->sched_tags) {
blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
blk_mq_free_rq_map(hctx->sched_tags);
hctx->sched_tags = NULL;
}
}
int blk_mq_sched_setup(struct request_queue *q)
{
struct blk_mq_tag_set *set = q->tag_set;
struct blk_mq_hw_ctx *hctx;
int ret, i;
/*
* Default to 256, since we don't split into sync/async like the
* old code did. Additionally, this is a per-hw queue depth.
*/
q->nr_requests = 2 * BLKDEV_MAX_RQ;
/*
* We're switching to using an IO scheduler, so setup the hctx
* scheduler tags and switch the request map from the regular
* tags to scheduler tags. First allocate what we need, so we
* can safely fail and fallback, if needed.
*/
ret = 0;
queue_for_each_hw_ctx(q, hctx, i) {
hctx->sched_tags = blk_mq_alloc_rq_map(set, i, q->nr_requests, 0);
if (!hctx->sched_tags) {
ret = -ENOMEM;
break;
}
ret = blk_mq_alloc_rqs(set, hctx->sched_tags, i, q->nr_requests);
if (ret)
break;
}
/*
* If we failed, free what we did allocate
*/
if (ret) {
queue_for_each_hw_ctx(q, hctx, i) {
if (!hctx->sched_tags)
continue;
blk_mq_sched_free_tags(set, hctx, i);
}
return ret;
}
return 0;
}
void blk_mq_sched_teardown(struct request_queue *q)
{
struct blk_mq_tag_set *set = q->tag_set;
struct blk_mq_hw_ctx *hctx;
int i;
queue_for_each_hw_ctx(q, hctx, i)
blk_mq_sched_free_tags(set, hctx, i);
}
int blk_mq_sched_init(struct request_queue *q)
{
int ret;
#if defined(CONFIG_DEFAULT_SQ_NONE)
if (q->nr_hw_queues == 1)
return 0;
#endif
#if defined(CONFIG_DEFAULT_MQ_NONE)
if (q->nr_hw_queues > 1)
return 0;
#endif
mutex_lock(&q->sysfs_lock);
ret = elevator_init(q, NULL);
mutex_unlock(&q->sysfs_lock);
return ret;
}
#ifndef BLK_MQ_SCHED_H
#define BLK_MQ_SCHED_H
#include "blk-mq.h"
#include "blk-mq-tag.h"
int blk_mq_sched_init_hctx_data(struct request_queue *q, size_t size,
int (*init)(struct blk_mq_hw_ctx *),
void (*exit)(struct blk_mq_hw_ctx *));
void blk_mq_sched_free_hctx_data(struct request_queue *q,
void (*exit)(struct blk_mq_hw_ctx *));
struct request *blk_mq_sched_get_request(struct request_queue *q, struct bio *bio, unsigned int op, struct blk_mq_alloc_data *data);
void blk_mq_sched_put_request(struct request *rq);
void blk_mq_sched_request_inserted(struct request *rq);
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
struct request **merged_request);
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio);
bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq);
void blk_mq_sched_restart_queues(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_insert_request(struct request *rq, bool at_head,
bool run_queue, bool async, bool can_block);
void blk_mq_sched_insert_requests(struct request_queue *q,
struct blk_mq_ctx *ctx,
struct list_head *list, bool run_queue_async);
void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
void blk_mq_sched_move_to_dispatch(struct blk_mq_hw_ctx *hctx,
struct list_head *rq_list,
struct request *(*get_rq)(struct blk_mq_hw_ctx *));
int blk_mq_sched_setup(struct request_queue *q);
void blk_mq_sched_teardown(struct request_queue *q);
int blk_mq_sched_init(struct request_queue *q);
static inline bool
blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
{
struct elevator_queue *e = q->elevator;
if (!e || blk_queue_nomerges(q) || !bio_mergeable(bio))
return false;
return __blk_mq_sched_bio_merge(q, bio);
}
static inline int blk_mq_sched_get_rq_priv(struct request_queue *q,
struct request *rq,
struct bio *bio)
{
struct elevator_queue *e = q->elevator;
if (e && e->type->ops.mq.get_rq_priv)
return e->type->ops.mq.get_rq_priv(q, rq, bio);
return 0;
}
static inline void blk_mq_sched_put_rq_priv(struct request_queue *q,
struct request *rq)
{
struct elevator_queue *e = q->elevator;
if (e && e->type->ops.mq.put_rq_priv)
e->type->ops.mq.put_rq_priv(q, rq);
}
static inline bool
blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
struct bio *bio)
{
struct elevator_queue *e = q->elevator;
if (e && e->type->ops.mq.allow_merge)
return e->type->ops.mq.allow_merge(q, rq, bio);
return true;
}
static inline void
blk_mq_sched_completed_request(struct blk_mq_hw_ctx *hctx, struct request *rq)
{
struct elevator_queue *e = hctx->queue->elevator;
if (e && e->type->ops.mq.completed_request)
e->type->ops.mq.completed_request(hctx, rq);
BUG_ON(rq->internal_tag == -1);
blk_mq_put_tag(hctx, hctx->sched_tags, rq->mq_ctx, rq->internal_tag);
}
static inline void blk_mq_sched_started_request(struct request *rq)
{
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
if (e && e->type->ops.mq.started_request)
e->type->ops.mq.started_request(rq);
}
static inline void blk_mq_sched_requeue_request(struct request *rq)
{
struct request_queue *q = rq->q;
struct elevator_queue *e = q->elevator;
if (e && e->type->ops.mq.requeue_request)
e->type->ops.mq.requeue_request(rq);
}
static inline bool blk_mq_sched_has_work(struct blk_mq_hw_ctx *hctx)
{
struct elevator_queue *e = hctx->queue->elevator;
if (e && e->type->ops.mq.has_work)
return e->type->ops.mq.has_work(hctx);
return false;
}
static inline void blk_mq_sched_mark_restart(struct blk_mq_hw_ctx *hctx)
{
if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state)) {
set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
struct request_queue *q = hctx->queue;
if (!test_bit(QUEUE_FLAG_RESTART, &q->queue_flags))
set_bit(QUEUE_FLAG_RESTART, &q->queue_flags);
}
}
}
static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx)
{
return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
}
#endif
......@@ -122,123 +122,16 @@ static ssize_t blk_mq_hw_sysfs_store(struct kobject *kobj,
return res;
}
static ssize_t blk_mq_sysfs_dispatched_show(struct blk_mq_ctx *ctx, char *page)
{
return sprintf(page, "%lu %lu\n", ctx->rq_dispatched[1],
ctx->rq_dispatched[0]);
}
static ssize_t blk_mq_sysfs_merged_show(struct blk_mq_ctx *ctx, char *page)
{
return sprintf(page, "%lu\n", ctx->rq_merged);
}
static ssize_t blk_mq_sysfs_completed_show(struct blk_mq_ctx *ctx, char *page)
{
return sprintf(page, "%lu %lu\n", ctx->rq_completed[1],
ctx->rq_completed[0]);
}
static ssize_t sysfs_list_show(char *page, struct list_head *list, char *msg)
{
struct request *rq;
int len = snprintf(page, PAGE_SIZE - 1, "%s:\n", msg);
list_for_each_entry(rq, list, queuelist) {
const int rq_len = 2 * sizeof(rq) + 2;
/* if the output will be truncated */
if (PAGE_SIZE - 1 < len + rq_len) {
/* backspacing if it can't hold '\t...\n' */
if (PAGE_SIZE - 1 < len + 5)
len -= rq_len;
len += snprintf(page + len, PAGE_SIZE - 1 - len,
"\t...\n");
break;
}
len += snprintf(page + len, PAGE_SIZE - 1 - len,
"\t%p\n", rq);
}
return len;
}
static ssize_t blk_mq_sysfs_rq_list_show(struct blk_mq_ctx *ctx, char *page)
{
ssize_t ret;
spin_lock(&ctx->lock);
ret = sysfs_list_show(page, &ctx->rq_list, "CTX pending");
spin_unlock(&ctx->lock);
return ret;
}
static ssize_t blk_mq_hw_sysfs_poll_show(struct blk_mq_hw_ctx *hctx, char *page)
{
return sprintf(page, "considered=%lu, invoked=%lu, success=%lu\n",
hctx->poll_considered, hctx->poll_invoked,
hctx->poll_success);
}
static ssize_t blk_mq_hw_sysfs_poll_store(struct blk_mq_hw_ctx *hctx,
const char *page, size_t size)
{
hctx->poll_considered = hctx->poll_invoked = hctx->poll_success = 0;
return size;
}
static ssize_t blk_mq_hw_sysfs_queued_show(struct blk_mq_hw_ctx *hctx,
char *page)
{
return sprintf(page, "%lu\n", hctx->queued);
}
static ssize_t blk_mq_hw_sysfs_run_show(struct blk_mq_hw_ctx *hctx, char *page)
{
return sprintf(page, "%lu\n", hctx->run);
}
static ssize_t blk_mq_hw_sysfs_dispatched_show(struct blk_mq_hw_ctx *hctx,
char *page)
{
char *start_page = page;
int i;
page += sprintf(page, "%8u\t%lu\n", 0U, hctx->dispatched[0]);
for (i = 1; i < BLK_MQ_MAX_DISPATCH_ORDER - 1; i++) {
unsigned int d = 1U << (i - 1);
page += sprintf(page, "%8u\t%lu\n", d, hctx->dispatched[i]);
}
page += sprintf(page, "%8u+\t%lu\n", 1U << (i - 1),
hctx->dispatched[i]);
return page - start_page;
}
static ssize_t blk_mq_hw_sysfs_rq_list_show(struct blk_mq_hw_ctx *hctx,
static ssize_t blk_mq_hw_sysfs_nr_tags_show(struct blk_mq_hw_ctx *hctx,
char *page)
{
ssize_t ret;
spin_lock(&hctx->lock);
ret = sysfs_list_show(page, &hctx->dispatch, "HCTX pending");
spin_unlock(&hctx->lock);
return ret;
return sprintf(page, "%u\n", hctx->tags->nr_tags);
}
static ssize_t blk_mq_hw_sysfs_tags_show(struct blk_mq_hw_ctx *hctx, char *page)
static ssize_t blk_mq_hw_sysfs_nr_reserved_tags_show(struct blk_mq_hw_ctx *hctx,
char *page)
{
return blk_mq_tag_sysfs_show(hctx->tags, page);
}
static ssize_t blk_mq_hw_sysfs_active_show(struct blk_mq_hw_ctx *hctx, char *page)
{
return sprintf(page, "%u\n", atomic_read(&hctx->nr_active));
return sprintf(page, "%u\n", hctx->tags->nr_reserved_tags);
}
static ssize_t blk_mq_hw_sysfs_cpus_show(struct blk_mq_hw_ctx *hctx, char *page)
......@@ -259,121 +152,27 @@ static ssize_t blk_mq_hw_sysfs_cpus_show(struct blk_mq_hw_ctx *hctx, char *page)
return ret;
}
static void blk_mq_stat_clear(struct blk_mq_hw_ctx *hctx)
{
struct blk_mq_ctx *ctx;
unsigned int i;
hctx_for_each_ctx(hctx, ctx, i) {
blk_stat_init(&ctx->stat[BLK_STAT_READ]);
blk_stat_init(&ctx->stat[BLK_STAT_WRITE]);
}
}
static ssize_t blk_mq_hw_sysfs_stat_store(struct blk_mq_hw_ctx *hctx,
const char *page, size_t count)
{
blk_mq_stat_clear(hctx);
return count;
}
static ssize_t print_stat(char *page, struct blk_rq_stat *stat, const char *pre)
{
return sprintf(page, "%s samples=%llu, mean=%lld, min=%lld, max=%lld\n",
pre, (long long) stat->nr_samples,
(long long) stat->mean, (long long) stat->min,
(long long) stat->max);
}
static ssize_t blk_mq_hw_sysfs_stat_show(struct blk_mq_hw_ctx *hctx, char *page)
{
struct blk_rq_stat stat[2];
ssize_t ret;
blk_stat_init(&stat[BLK_STAT_READ]);
blk_stat_init(&stat[BLK_STAT_WRITE]);
blk_hctx_stat_get(hctx, stat);
ret = print_stat(page, &stat[BLK_STAT_READ], "read :");
ret += print_stat(page + ret, &stat[BLK_STAT_WRITE], "write:");
return ret;
}
static struct blk_mq_ctx_sysfs_entry blk_mq_sysfs_dispatched = {
.attr = {.name = "dispatched", .mode = S_IRUGO },
.show = blk_mq_sysfs_dispatched_show,
};
static struct blk_mq_ctx_sysfs_entry blk_mq_sysfs_merged = {
.attr = {.name = "merged", .mode = S_IRUGO },
.show = blk_mq_sysfs_merged_show,
};
static struct blk_mq_ctx_sysfs_entry blk_mq_sysfs_completed = {
.attr = {.name = "completed", .mode = S_IRUGO },
.show = blk_mq_sysfs_completed_show,
};
static struct blk_mq_ctx_sysfs_entry blk_mq_sysfs_rq_list = {
.attr = {.name = "rq_list", .mode = S_IRUGO },
.show = blk_mq_sysfs_rq_list_show,
};
static struct attribute *default_ctx_attrs[] = {
&blk_mq_sysfs_dispatched.attr,
&blk_mq_sysfs_merged.attr,
&blk_mq_sysfs_completed.attr,
&blk_mq_sysfs_rq_list.attr,
NULL,
};
static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_queued = {
.attr = {.name = "queued", .mode = S_IRUGO },
.show = blk_mq_hw_sysfs_queued_show,
static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_nr_tags = {
.attr = {.name = "nr_tags", .mode = S_IRUGO },
.show = blk_mq_hw_sysfs_nr_tags_show,
};
static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_run = {
.attr = {.name = "run", .mode = S_IRUGO },
.show = blk_mq_hw_sysfs_run_show,
};
static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_dispatched = {
.attr = {.name = "dispatched", .mode = S_IRUGO },
.show = blk_mq_hw_sysfs_dispatched_show,
};
static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_active = {
.attr = {.name = "active", .mode = S_IRUGO },
.show = blk_mq_hw_sysfs_active_show,
};
static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_pending = {
.attr = {.name = "pending", .mode = S_IRUGO },
.show = blk_mq_hw_sysfs_rq_list_show,
};
static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_tags = {
.attr = {.name = "tags", .mode = S_IRUGO },
.show = blk_mq_hw_sysfs_tags_show,
static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_nr_reserved_tags = {
.attr = {.name = "nr_reserved_tags", .mode = S_IRUGO },
.show = blk_mq_hw_sysfs_nr_reserved_tags_show,
};
static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_cpus = {
.attr = {.name = "cpu_list", .mode = S_IRUGO },
.show = blk_mq_hw_sysfs_cpus_show,
};
static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_poll = {
.attr = {.name = "io_poll", .mode = S_IWUSR | S_IRUGO },
.show = blk_mq_hw_sysfs_poll_show,
.store = blk_mq_hw_sysfs_poll_store,
};
static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_stat = {
.attr = {.name = "stats", .mode = S_IRUGO | S_IWUSR },
.show = blk_mq_hw_sysfs_stat_show,
.store = blk_mq_hw_sysfs_stat_store,
};
static struct attribute *default_hw_ctx_attrs[] = {
&blk_mq_hw_sysfs_queued.attr,
&blk_mq_hw_sysfs_run.attr,
&blk_mq_hw_sysfs_dispatched.attr,
&blk_mq_hw_sysfs_pending.attr,
&blk_mq_hw_sysfs_tags.attr,
&blk_mq_hw_sysfs_nr_tags.attr,
&blk_mq_hw_sysfs_nr_reserved_tags.attr,
&blk_mq_hw_sysfs_cpus.attr,
&blk_mq_hw_sysfs_active.attr,
&blk_mq_hw_sysfs_poll.attr,
&blk_mq_hw_sysfs_stat.attr,
NULL,
};
......@@ -455,6 +254,8 @@ static void __blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
kobject_put(&hctx->kobj);
}
blk_mq_debugfs_unregister_hctxs(q);
kobject_uevent(&q->mq_kobj, KOBJ_REMOVE);
kobject_del(&q->mq_kobj);
kobject_put(&q->mq_kobj);
......@@ -504,6 +305,8 @@ int blk_mq_register_dev(struct device *dev, struct request_queue *q)
kobject_uevent(&q->mq_kobj, KOBJ_ADD);
blk_mq_debugfs_register(q, kobject_name(&dev->kobj));
queue_for_each_hw_ctx(q, hctx, i) {
ret = blk_mq_register_hctx(hctx);
if (ret)
......@@ -529,6 +332,8 @@ void blk_mq_sysfs_unregister(struct request_queue *q)
if (!q->mq_sysfs_init_done)
return;
blk_mq_debugfs_unregister_hctxs(q);
queue_for_each_hw_ctx(q, hctx, i)
blk_mq_unregister_hctx(hctx);
}
......@@ -541,6 +346,8 @@ int blk_mq_sysfs_register(struct request_queue *q)
if (!q->mq_sysfs_init_done)
return ret;
blk_mq_debugfs_register_hctxs(q);
queue_for_each_hw_ctx(q, hctx, i) {
ret = blk_mq_register_hctx(hctx);
if (ret)
......
......@@ -90,113 +90,97 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
return atomic_read(&hctx->nr_active) < depth;
}
static int __bt_get(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt)
static int __blk_mq_get_tag(struct blk_mq_alloc_data *data,
struct sbitmap_queue *bt)
{
if (!hctx_may_queue(hctx, bt))
if (!(data->flags & BLK_MQ_REQ_INTERNAL) &&
!hctx_may_queue(data->hctx, bt))
return -1;
return __sbitmap_queue_get(bt);
}
static int bt_get(struct blk_mq_alloc_data *data, struct sbitmap_queue *bt,
struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags)
unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
{
struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
struct sbitmap_queue *bt;
struct sbq_wait_state *ws;
DEFINE_WAIT(wait);
unsigned int tag_offset;
bool drop_ctx;
int tag;
tag = __bt_get(hctx, bt);
if (data->flags & BLK_MQ_REQ_RESERVED) {
if (unlikely(!tags->nr_reserved_tags)) {
WARN_ON_ONCE(1);
return BLK_MQ_TAG_FAIL;
}
bt = &tags->breserved_tags;
tag_offset = 0;
} else {
bt = &tags->bitmap_tags;
tag_offset = tags->nr_reserved_tags;
}
tag = __blk_mq_get_tag(data, bt);
if (tag != -1)
return tag;
goto found_tag;
if (data->flags & BLK_MQ_REQ_NOWAIT)
return -1;
return BLK_MQ_TAG_FAIL;
ws = bt_wait_ptr(bt, hctx);
ws = bt_wait_ptr(bt, data->hctx);
drop_ctx = data->ctx == NULL;
do {
prepare_to_wait(&ws->wait, &wait, TASK_UNINTERRUPTIBLE);
tag = __bt_get(hctx, bt);
tag = __blk_mq_get_tag(data, bt);
if (tag != -1)
break;
/*
* We're out of tags on this hardware queue, kick any
* pending IO submits before going to sleep waiting for
* some to complete. Note that hctx can be NULL here for
* reserved tag allocation.
* some to complete.
*/
if (hctx)
blk_mq_run_hw_queue(hctx, false);
blk_mq_run_hw_queue(data->hctx, false);
/*
* Retry tag allocation after running the hardware queue,
* as running the queue may also have found completions.
*/
tag = __bt_get(hctx, bt);
tag = __blk_mq_get_tag(data, bt);
if (tag != -1)
break;
blk_mq_put_ctx(data->ctx);
if (data->ctx)
blk_mq_put_ctx(data->ctx);
io_schedule();
data->ctx = blk_mq_get_ctx(data->q);
data->hctx = blk_mq_map_queue(data->q, data->ctx->cpu);
if (data->flags & BLK_MQ_REQ_RESERVED) {
bt = &data->hctx->tags->breserved_tags;
} else {
hctx = data->hctx;
bt = &hctx->tags->bitmap_tags;
}
tags = blk_mq_tags_from_data(data);
if (data->flags & BLK_MQ_REQ_RESERVED)
bt = &tags->breserved_tags;
else
bt = &tags->bitmap_tags;
finish_wait(&ws->wait, &wait);
ws = bt_wait_ptr(bt, hctx);
ws = bt_wait_ptr(bt, data->hctx);
} while (1);
finish_wait(&ws->wait, &wait);
return tag;
}
static unsigned int __blk_mq_get_tag(struct blk_mq_alloc_data *data)
{
int tag;
tag = bt_get(data, &data->hctx->tags->bitmap_tags, data->hctx,
data->hctx->tags);
if (tag >= 0)
return tag + data->hctx->tags->nr_reserved_tags;
return BLK_MQ_TAG_FAIL;
}
static unsigned int __blk_mq_get_reserved_tag(struct blk_mq_alloc_data *data)
{
int tag;
if (unlikely(!data->hctx->tags->nr_reserved_tags)) {
WARN_ON_ONCE(1);
return BLK_MQ_TAG_FAIL;
}
tag = bt_get(data, &data->hctx->tags->breserved_tags, NULL,
data->hctx->tags);
if (tag < 0)
return BLK_MQ_TAG_FAIL;
if (drop_ctx && data->ctx)
blk_mq_put_ctx(data->ctx);
return tag;
}
finish_wait(&ws->wait, &wait);
unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
{
if (data->flags & BLK_MQ_REQ_RESERVED)
return __blk_mq_get_reserved_tag(data);
return __blk_mq_get_tag(data);
found_tag:
return tag + tag_offset;
}
void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
unsigned int tag)
void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
struct blk_mq_ctx *ctx, unsigned int tag)
{
struct blk_mq_tags *tags = hctx->tags;
if (tag >= tags->nr_reserved_tags) {
const int real_tag = tag - tags->nr_reserved_tags;
......@@ -312,11 +296,11 @@ int blk_mq_reinit_tagset(struct blk_mq_tag_set *set)
struct blk_mq_tags *tags = set->tags[i];
for (j = 0; j < tags->nr_tags; j++) {
if (!tags->rqs[j])
if (!tags->static_rqs[j])
continue;
ret = set->ops->reinit_request(set->driver_data,
tags->rqs[j]);
tags->static_rqs[j]);
if (ret)
goto out;
}
......@@ -351,11 +335,6 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
}
static unsigned int bt_unused_tags(const struct sbitmap_queue *bt)
{
return bt->sb.depth - sbitmap_weight(&bt->sb);
}
static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth,
bool round_robin, int node)
{
......@@ -411,19 +390,56 @@ void blk_mq_free_tags(struct blk_mq_tags *tags)
kfree(tags);
}
int blk_mq_tag_update_depth(struct blk_mq_tags *tags, unsigned int tdepth)
int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
struct blk_mq_tags **tagsptr, unsigned int tdepth,
bool can_grow)
{
tdepth -= tags->nr_reserved_tags;
if (tdepth > tags->nr_tags)
struct blk_mq_tags *tags = *tagsptr;
if (tdepth <= tags->nr_reserved_tags)
return -EINVAL;
tdepth -= tags->nr_reserved_tags;
/*
* Don't need (or can't) update reserved tags here, they remain
* static and should never need resizing.
* If we are allowed to grow beyond the original size, allocate
* a new set of tags before freeing the old one.
*/
sbitmap_queue_resize(&tags->bitmap_tags, tdepth);
if (tdepth > tags->nr_tags) {
struct blk_mq_tag_set *set = hctx->queue->tag_set;
struct blk_mq_tags *new;
bool ret;
if (!can_grow)
return -EINVAL;
/*
* We need some sort of upper limit, set it high enough that
* no valid use cases should require more.
*/
if (tdepth > 16 * BLKDEV_MAX_RQ)
return -EINVAL;
new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth, 0);
if (!new)
return -ENOMEM;
ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
if (ret) {
blk_mq_free_rq_map(new);
return -ENOMEM;
}
blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
blk_mq_free_rq_map(*tagsptr);
*tagsptr = new;
} else {
/*
* Don't need (or can't) update reserved tags here, they
* remain static and should never need resizing.
*/
sbitmap_queue_resize(&tags->bitmap_tags, tdepth);
}
blk_mq_tag_wakeup_all(tags, false);
return 0;
}
......@@ -454,25 +470,3 @@ u32 blk_mq_unique_tag(struct request *rq)
(rq->tag & BLK_MQ_UNIQUE_TAG_MASK);
}
EXPORT_SYMBOL(blk_mq_unique_tag);
ssize_t blk_mq_tag_sysfs_show(struct blk_mq_tags *tags, char *page)
{
char *orig_page = page;
unsigned int free, res;
if (!tags)
return 0;
page += sprintf(page, "nr_tags=%u, reserved_tags=%u, "
"bits_per_word=%u\n",
tags->nr_tags, tags->nr_reserved_tags,
1U << tags->bitmap_tags.sb.shift);
free = bt_unused_tags(&tags->bitmap_tags);
res = bt_unused_tags(&tags->breserved_tags);
page += sprintf(page, "nr_free=%u, nr_reserved=%u\n", free, res);
page += sprintf(page, "active_queues=%u\n", atomic_read(&tags->active_queues));
return page - orig_page;
}
......@@ -16,6 +16,7 @@ struct blk_mq_tags {
struct sbitmap_queue breserved_tags;
struct request **rqs;
struct request **static_rqs;
struct list_head page_list;
};
......@@ -24,11 +25,12 @@ extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int r
extern void blk_mq_free_tags(struct blk_mq_tags *tags);
extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
extern void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
unsigned int tag);
extern void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
struct blk_mq_ctx *ctx, unsigned int tag);
extern bool blk_mq_has_free_tags(struct blk_mq_tags *tags);
extern ssize_t blk_mq_tag_sysfs_show(struct blk_mq_tags *tags, char *page);
extern int blk_mq_tag_update_depth(struct blk_mq_tags *tags, unsigned int depth);
extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
struct blk_mq_tags **tags,
unsigned int depth, bool can_grow);
extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
void *priv);
......
此差异已折叠。
......@@ -32,7 +32,31 @@ void blk_mq_free_queue(struct request_queue *q);
int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
void blk_mq_wake_waiters(struct request_queue *q);
bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *, struct list_head *);
void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
bool wait);
/*
* Internal helpers for allocating/freeing the request map
*/
void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx);
void blk_mq_free_rq_map(struct blk_mq_tags *tags);
struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
unsigned int hctx_idx,
unsigned int nr_tags,
unsigned int reserved_tags);
int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx, unsigned int depth);
/*
* Internal helpers for request insertion into sw queues
*/
void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
bool at_head);
void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
struct list_head *list);
/*
* CPU hotplug helpers
*/
......@@ -57,6 +81,35 @@ extern int blk_mq_sysfs_register(struct request_queue *q);
extern void blk_mq_sysfs_unregister(struct request_queue *q);
extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
/*
* debugfs helpers
*/
#ifdef CONFIG_BLK_DEBUG_FS
int blk_mq_debugfs_register(struct request_queue *q, const char *name);
void blk_mq_debugfs_unregister(struct request_queue *q);
int blk_mq_debugfs_register_hctxs(struct request_queue *q);
void blk_mq_debugfs_unregister_hctxs(struct request_queue *q);
#else
static inline int blk_mq_debugfs_register(struct request_queue *q,
const char *name)
{
return 0;
}
static inline void blk_mq_debugfs_unregister(struct request_queue *q)
{
}
static inline int blk_mq_debugfs_register_hctxs(struct request_queue *q)
{
return 0;
}
static inline void blk_mq_debugfs_unregister_hctxs(struct request_queue *q)
{
}
#endif
extern void blk_mq_rq_timed_out(struct request *req, bool reserved);
void blk_mq_release(struct request_queue *q);
......@@ -103,6 +156,25 @@ static inline void blk_mq_set_alloc_data(struct blk_mq_alloc_data *data,
data->hctx = hctx;
}
static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
{
if (data->flags & BLK_MQ_REQ_INTERNAL)
return data->hctx->sched_tags;
return data->hctx->tags;
}
/*
* Internal helpers for request allocation/init/free
*/
void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx,
struct request *rq, unsigned int op);
void __blk_mq_finish_request(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
struct request *rq);
void blk_mq_finish_request(struct request *rq);
struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data,
unsigned int op);
static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
{
return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
......
......@@ -88,6 +88,7 @@ EXPORT_SYMBOL_GPL(blk_queue_lld_busy);
void blk_set_default_limits(struct queue_limits *lim)
{
lim->max_segments = BLK_MAX_SEGMENTS;
lim->max_discard_segments = 1;
lim->max_integrity_segments = 0;
lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
lim->virt_boundary_mask = 0;
......@@ -128,6 +129,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
/* Inherit limits from component devices */
lim->discard_zeroes_data = 1;
lim->max_segments = USHRT_MAX;
lim->max_discard_segments = 1;
lim->max_hw_sectors = UINT_MAX;
lim->max_segment_size = UINT_MAX;
lim->max_sectors = UINT_MAX;
......@@ -253,7 +255,7 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
max_sectors = min_not_zero(max_hw_sectors, limits->max_dev_sectors);
max_sectors = min_t(unsigned int, max_sectors, BLK_DEF_MAX_SECTORS);
limits->max_sectors = max_sectors;
q->backing_dev_info.io_pages = max_sectors >> (PAGE_SHIFT - 9);
q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
}
EXPORT_SYMBOL(blk_queue_max_hw_sectors);
......@@ -336,6 +338,22 @@ void blk_queue_max_segments(struct request_queue *q, unsigned short max_segments
}
EXPORT_SYMBOL(blk_queue_max_segments);
/**
* blk_queue_max_discard_segments - set max segments for discard requests
* @q: the request queue for the device
* @max_segments: max number of segments
*
* Description:
* Enables a low level driver to set an upper limit on the number of
* segments in a discard request.
**/
void blk_queue_max_discard_segments(struct request_queue *q,
unsigned short max_segments)
{
q->limits.max_discard_segments = max_segments;
}
EXPORT_SYMBOL_GPL(blk_queue_max_discard_segments);
/**
* blk_queue_max_segment_size - set max segment size for blk_rq_map_sg
* @q: the request queue for the device
......@@ -553,6 +571,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
b->virt_boundary_mask);
t->max_segments = min_not_zero(t->max_segments, b->max_segments);
t->max_discard_segments = min_not_zero(t->max_discard_segments,
b->max_discard_segments);
t->max_integrity_segments = min_not_zero(t->max_integrity_segments,
b->max_integrity_segments);
......
......@@ -89,7 +89,7 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
static ssize_t queue_ra_show(struct request_queue *q, char *page)
{
unsigned long ra_kb = q->backing_dev_info.ra_pages <<
unsigned long ra_kb = q->backing_dev_info->ra_pages <<
(PAGE_SHIFT - 10);
return queue_var_show(ra_kb, (page));
......@@ -104,7 +104,7 @@ queue_ra_store(struct request_queue *q, const char *page, size_t count)
if (ret < 0)
return ret;
q->backing_dev_info.ra_pages = ra_kb >> (PAGE_SHIFT - 10);
q->backing_dev_info->ra_pages = ra_kb >> (PAGE_SHIFT - 10);
return ret;
}
......@@ -121,6 +121,12 @@ static ssize_t queue_max_segments_show(struct request_queue *q, char *page)
return queue_var_show(queue_max_segments(q), (page));
}
static ssize_t queue_max_discard_segments_show(struct request_queue *q,
char *page)
{
return queue_var_show(queue_max_discard_segments(q), (page));
}
static ssize_t queue_max_integrity_segments_show(struct request_queue *q, char *page)
{
return queue_var_show(q->limits.max_integrity_segments, (page));
......@@ -236,7 +242,7 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
spin_lock_irq(q->queue_lock);
q->limits.max_sectors = max_sectors_kb << 1;
q->backing_dev_info.io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
q->backing_dev_info->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
spin_unlock_irq(q->queue_lock);
return ret;
......@@ -545,6 +551,11 @@ static struct queue_sysfs_entry queue_max_segments_entry = {
.show = queue_max_segments_show,
};
static struct queue_sysfs_entry queue_max_discard_segments_entry = {
.attr = {.name = "max_discard_segments", .mode = S_IRUGO },
.show = queue_max_discard_segments_show,
};
static struct queue_sysfs_entry queue_max_integrity_segments_entry = {
.attr = {.name = "max_integrity_segments", .mode = S_IRUGO },
.show = queue_max_integrity_segments_show,
......@@ -697,6 +708,7 @@ static struct attribute *default_attrs[] = {
&queue_max_hw_sectors_entry.attr,
&queue_max_sectors_entry.attr,
&queue_max_segments_entry.attr,
&queue_max_discard_segments_entry.attr,
&queue_max_integrity_segments_entry.attr,
&queue_max_segment_size_entry.attr,
&queue_iosched_entry.attr,
......@@ -799,7 +811,7 @@ static void blk_release_queue(struct kobject *kobj)
container_of(kobj, struct request_queue, kobj);
wbt_exit(q);
bdi_exit(&q->backing_dev_info);
bdi_put(q->backing_dev_info);
blkcg_exit_queue(q);
if (q->elevator) {
......@@ -814,13 +826,19 @@ static void blk_release_queue(struct kobject *kobj)
if (q->queue_tags)
__blk_queue_free_tags(q);
if (!q->mq_ops)
if (!q->mq_ops) {
if (q->exit_rq_fn)
q->exit_rq_fn(q, q->fq->flush_rq);
blk_free_flush_queue(q->fq);
else
} else {
blk_mq_release(q);
}
blk_trace_shutdown(q);
if (q->mq_ops)
blk_mq_debugfs_unregister(q);
if (q->bio_split)
bioset_free(q->bio_split);
......@@ -884,32 +902,36 @@ int blk_register_queue(struct gendisk *disk)
if (ret)
return ret;
if (q->mq_ops)
blk_mq_register_dev(dev, q);
/* Prevent changes through sysfs until registration is completed. */
mutex_lock(&q->sysfs_lock);
ret = kobject_add(&q->kobj, kobject_get(&dev->kobj), "%s", "queue");
if (ret < 0) {
blk_trace_remove_sysfs(dev);
return ret;
goto unlock;
}
kobject_uevent(&q->kobj, KOBJ_ADD);
if (q->mq_ops)
blk_mq_register_dev(dev, q);
blk_wb_init(q);
if (!q->request_fn)
return 0;
ret = elv_register_queue(q);
if (ret) {
kobject_uevent(&q->kobj, KOBJ_REMOVE);
kobject_del(&q->kobj);
blk_trace_remove_sysfs(dev);
kobject_put(&dev->kobj);
return ret;
if (q->request_fn || (q->mq_ops && q->elevator)) {
ret = elv_register_queue(q);
if (ret) {
kobject_uevent(&q->kobj, KOBJ_REMOVE);
kobject_del(&q->kobj);
blk_trace_remove_sysfs(dev);
kobject_put(&dev->kobj);
goto unlock;
}
}
return 0;
ret = 0;
unlock:
mutex_unlock(&q->sysfs_lock);
return ret;
}
void blk_unregister_queue(struct gendisk *disk)
......@@ -922,7 +944,7 @@ void blk_unregister_queue(struct gendisk *disk)
if (q->mq_ops)
blk_mq_unregister_dev(disk_to_dev(disk), q);
if (q->request_fn)
if (q->request_fn || (q->mq_ops && q->elevator))
elv_unregister_queue(q);
kobject_uevent(&q->kobj, KOBJ_REMOVE);
......
......@@ -272,6 +272,7 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq)
list_del_init(&rq->queuelist);
rq->rq_flags &= ~RQF_QUEUED;
rq->tag = -1;
rq->internal_tag = -1;
if (unlikely(bqt->tag_index[tag] == NULL))
printk(KERN_ERR "%s: tag %d is missing\n",
......
......@@ -866,10 +866,12 @@ static void tg_update_disptime(struct throtl_grp *tg)
unsigned long read_wait = -1, write_wait = -1, min_wait = -1, disptime;
struct bio *bio;
if ((bio = throtl_peek_queued(&sq->queued[READ])))
bio = throtl_peek_queued(&sq->queued[READ]);
if (bio)
tg_may_dispatch(tg, bio, &read_wait);
if ((bio = throtl_peek_queued(&sq->queued[WRITE])))
bio = throtl_peek_queued(&sq->queued[WRITE]);
if (bio)
tg_may_dispatch(tg, bio, &write_wait);
min_wait = min(read_wait, write_wait);
......
......@@ -96,7 +96,7 @@ static void wb_timestamp(struct rq_wb *rwb, unsigned long *var)
*/
static bool wb_recent_wait(struct rq_wb *rwb)
{
struct bdi_writeback *wb = &rwb->queue->backing_dev_info.wb;
struct bdi_writeback *wb = &rwb->queue->backing_dev_info->wb;
return time_before(jiffies, wb->dirty_sleep + HZ);
}
......@@ -279,7 +279,7 @@ enum {
static int __latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
{
struct backing_dev_info *bdi = &rwb->queue->backing_dev_info;
struct backing_dev_info *bdi = rwb->queue->backing_dev_info;
u64 thislat;
/*
......@@ -339,7 +339,7 @@ static int latency_exceeded(struct rq_wb *rwb)
static void rwb_trace_step(struct rq_wb *rwb, const char *msg)
{
struct backing_dev_info *bdi = &rwb->queue->backing_dev_info;
struct backing_dev_info *bdi = rwb->queue->backing_dev_info;
trace_wbt_step(bdi, msg, rwb->scale_step, rwb->cur_win_nsec,
rwb->wb_background, rwb->wb_normal, rwb->wb_max);
......@@ -423,7 +423,7 @@ static void wb_timer_fn(unsigned long data)
status = latency_exceeded(rwb);
trace_wbt_timer(&rwb->queue->backing_dev_info, status, rwb->scale_step,
trace_wbt_timer(rwb->queue->backing_dev_info, status, rwb->scale_step,
inflight);
/*
......
......@@ -14,6 +14,10 @@
/* Max future timer expiry for timeouts */
#define BLK_MAX_TIMEOUT (5 * HZ)
#ifdef CONFIG_DEBUG_FS
extern struct dentry *blk_debugfs_root;
#endif
struct blk_flush_queue {
unsigned int flush_queue_delayed:1;
unsigned int flush_pending_idx:1;
......@@ -96,6 +100,8 @@ bool bio_attempt_front_merge(struct request_queue *q, struct request *req,
struct bio *bio);
bool bio_attempt_back_merge(struct request_queue *q, struct request *req,
struct bio *bio);
bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
struct bio *bio);
bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
unsigned int *request_count,
struct request **same_queue_rq);
......@@ -167,7 +173,7 @@ static inline struct request *__elv_next_request(struct request_queue *q)
return NULL;
}
if (unlikely(blk_queue_bypass(q)) ||
!q->elevator->type->ops.elevator_dispatch_fn(q, 0))
!q->elevator->type->ops.sq.elevator_dispatch_fn(q, 0))
return NULL;
}
}
......@@ -176,16 +182,16 @@ static inline void elv_activate_rq(struct request_queue *q, struct request *rq)
{
struct elevator_queue *e = q->elevator;
if (e->type->ops.elevator_activate_req_fn)
e->type->ops.elevator_activate_req_fn(q, rq);
if (e->type->ops.sq.elevator_activate_req_fn)
e->type->ops.sq.elevator_activate_req_fn(q, rq);
}
static inline void elv_deactivate_rq(struct request_queue *q, struct request *rq)
{
struct elevator_queue *e = q->elevator;
if (e->type->ops.elevator_deactivate_req_fn)
e->type->ops.elevator_deactivate_req_fn(q, rq);
if (e->type->ops.sq.elevator_deactivate_req_fn)
e->type->ops.sq.elevator_deactivate_req_fn(q, rq);
}
#ifdef CONFIG_FAIL_IO_TIMEOUT
......@@ -204,14 +210,14 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req,
struct bio *bio);
int ll_front_merge_fn(struct request_queue *q, struct request *req,
struct bio *bio);
int attempt_back_merge(struct request_queue *q, struct request *rq);
int attempt_front_merge(struct request_queue *q, struct request *rq);
struct request *attempt_back_merge(struct request_queue *q, struct request *rq);
struct request *attempt_front_merge(struct request_queue *q, struct request *rq);
int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
struct request *next);
void blk_recalc_rq_segments(struct request *rq);
void blk_rq_set_mixed_merge(struct request *rq);
bool blk_rq_merge_ok(struct request *rq, struct bio *bio);
int blk_try_merge(struct request *rq, struct bio *bio);
enum elv_merge blk_try_merge(struct request *rq, struct bio *bio);
void blk_queue_congestion_threshold(struct request_queue *q);
......@@ -249,7 +255,14 @@ static inline int blk_do_io_stat(struct request *rq)
{
return rq->rq_disk &&
(rq->rq_flags & RQF_IO_STAT) &&
(rq->cmd_type == REQ_TYPE_FS);
!blk_rq_is_passthrough(rq);
}
static inline void req_set_nomerge(struct request_queue *q, struct request *req)
{
req->cmd_flags |= REQ_NOMERGE;
if (req == q->last_merge)
q->last_merge = NULL;
}
/*
......@@ -263,6 +276,22 @@ void ioc_clear_queue(struct request_queue *q);
int create_task_io_context(struct task_struct *task, gfp_t gfp_mask, int node);
/**
* rq_ioc - determine io_context for request allocation
* @bio: request being allocated is for this bio (can be %NULL)
*
* Determine io_context to use for request allocation for @bio. May return
* %NULL if %current->io_context doesn't exist.
*/
static inline struct io_context *rq_ioc(struct bio *bio)
{
#ifdef CONFIG_BLK_CGROUP
if (bio && bio->bi_ioc)
return bio->bi_ioc;
#endif
return current->io_context;
}
/**
* create_io_context - try to create task->io_context
* @gfp_mask: allocation mask
......
......@@ -71,22 +71,24 @@ void bsg_job_done(struct bsg_job *job, int result,
{
struct request *req = job->req;
struct request *rsp = req->next_rq;
struct scsi_request *rq = scsi_req(req);
int err;
err = job->req->errors = result;
if (err < 0)
/* we're only returning the result field in the reply */
job->req->sense_len = sizeof(u32);
rq->sense_len = sizeof(u32);
else
job->req->sense_len = job->reply_len;
rq->sense_len = job->reply_len;
/* we assume all request payload was transferred, residual == 0 */
req->resid_len = 0;
rq->resid_len = 0;
if (rsp) {
WARN_ON(reply_payload_rcv_len > rsp->resid_len);
WARN_ON(reply_payload_rcv_len > scsi_req(rsp)->resid_len);
/* set reply (bidi) residual */
rsp->resid_len -= min(reply_payload_rcv_len, rsp->resid_len);
scsi_req(rsp)->resid_len -=
min(reply_payload_rcv_len, scsi_req(rsp)->resid_len);
}
blk_complete_request(req);
}
......@@ -113,6 +115,7 @@ static int bsg_map_buffer(struct bsg_buffer *buf, struct request *req)
if (!buf->sg_list)
return -ENOMEM;
sg_init_table(buf->sg_list, req->nr_phys_segments);
scsi_req(req)->resid_len = blk_rq_bytes(req);
buf->sg_cnt = blk_rq_map_sg(req->q, req, buf->sg_list);
buf->payload_len = blk_rq_bytes(req);
return 0;
......@@ -127,6 +130,7 @@ static int bsg_create_job(struct device *dev, struct request *req)
{
struct request *rsp = req->next_rq;
struct request_queue *q = req->q;
struct scsi_request *rq = scsi_req(req);
struct bsg_job *job;
int ret;
......@@ -140,9 +144,9 @@ static int bsg_create_job(struct device *dev, struct request *req)
job->req = req;
if (q->bsg_job_size)
job->dd_data = (void *)&job[1];
job->request = req->cmd;
job->request_len = req->cmd_len;
job->reply = req->sense;
job->request = rq->cmd;
job->request_len = rq->cmd_len;
job->reply = rq->sense;
job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
* allocated */
if (req->bio) {
......@@ -177,7 +181,7 @@ static int bsg_create_job(struct device *dev, struct request *req)
*
* Drivers/subsys should pass this to the queue init function.
*/
void bsg_request_fn(struct request_queue *q)
static void bsg_request_fn(struct request_queue *q)
__releases(q->queue_lock)
__acquires(q->queue_lock)
{
......@@ -214,24 +218,30 @@ void bsg_request_fn(struct request_queue *q)
put_device(dev);
spin_lock_irq(q->queue_lock);
}
EXPORT_SYMBOL_GPL(bsg_request_fn);
/**
* bsg_setup_queue - Create and add the bsg hooks so we can receive requests
* @dev: device to attach bsg device to
* @q: request queue setup by caller
* @name: device to give bsg device
* @job_fn: bsg job handler
* @dd_job_size: size of LLD data needed for each job
*
* The caller should have setup the reuqest queue with bsg_request_fn
* as the request_fn.
*/
int bsg_setup_queue(struct device *dev, struct request_queue *q,
char *name, bsg_job_fn *job_fn, int dd_job_size)
struct request_queue *bsg_setup_queue(struct device *dev, char *name,
bsg_job_fn *job_fn, int dd_job_size)
{
struct request_queue *q;
int ret;
q = blk_alloc_queue(GFP_KERNEL);
if (!q)
return ERR_PTR(-ENOMEM);
q->cmd_size = sizeof(struct scsi_request);
q->request_fn = bsg_request_fn;
ret = blk_init_allocated_queue(q);
if (ret)
goto out_cleanup_queue;
q->queuedata = dev;
q->bsg_job_size = dd_job_size;
q->bsg_job_fn = job_fn;
......@@ -243,9 +253,12 @@ int bsg_setup_queue(struct device *dev, struct request_queue *q,
if (ret) {
printk(KERN_ERR "%s: bsg interface failed to "
"initialize - register queue\n", dev->kobj.name);
return ret;
goto out_cleanup_queue;
}
return 0;
return q;
out_cleanup_queue:
blk_cleanup_queue(q);
return ERR_PTR(ret);
}
EXPORT_SYMBOL_GPL(bsg_setup_queue);
此差异已折叠。
此差异已折叠。
......@@ -661,7 +661,6 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
struct block_device *bdev = inode->i_bdev;
struct gendisk *disk = bdev->bd_disk;
fmode_t mode = file->f_mode;
struct backing_dev_info *bdi;
loff_t size;
unsigned int max_sectors;
......@@ -708,9 +707,8 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
case BLKFRAGET:
if (!arg)
return -EINVAL;
bdi = blk_get_backing_dev_info(bdev);
return compat_put_long(arg,
(bdi->ra_pages * PAGE_SIZE) / 512);
(bdev->bd_bdi->ra_pages * PAGE_SIZE) / 512);
case BLKROGET: /* compatible */
return compat_put_int(arg, bdev_read_only(bdev) != 0);
case BLKBSZGET_32: /* get the logical block size (cf. BLKSSZGET) */
......@@ -728,8 +726,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
case BLKFRASET:
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
bdi = blk_get_backing_dev_info(bdev);
bdi->ra_pages = (arg * 512) / PAGE_SIZE;
bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
return 0;
case BLKGETSIZE:
size = i_size_read(bdev->bd_inode);
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
......@@ -25,6 +25,7 @@ config PARIDE_PD
config PARIDE_PCD
tristate "Parallel port ATAPI CD-ROMs"
depends on PARIDE
select BLK_SCSI_REQUEST # only for the generic cdrom code
---help---
This option enables the high-level driver for ATAPI CD-ROM devices
connected through a parallel port. If you chose to build PARIDE
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册