- 12 11月, 2014 1 次提交
-
-
由 Christoph Hellwig 提交于
Currently scsi piggy backs on the block layer to define the concept of a tagged command. But we want to be able to have block-level host-wide tags assigned even for untagged commands like the initial INQUIRY, so add a new SCSI-level flag for commands that are tagged at the scsi level, so that even commands without that set can have tags assigned to them. Note that this alredy is the case for the blk-mq code path, and this just lets the old path catch up with it. We also set this flag based upon sdev->simple_tags instead of the block queue flag, so that it is entirely independent of the block layer tagging, and thus always correct even if a driver doesn't use block level tagging yet. Also remove the old blk_rq_tagged; it was only used by SCSI drivers, and removing it forces them to look for the proper replacement. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMike Christie <michaelc@cs.wisc.edu> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.de>
-
- 29 10月, 2014 1 次提交
-
-
由 Martin K. Petersen 提交于
Commit 4eaf99be switched to returning bool and as a result reversed the logic of the integrity merge checks. However, the empty stubs used when the block integrity code is compiled out were still returning 0. Make these stubs return "true". Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reported-by: NMichael L. Semon <mlsemon35@gmail.com> Tested-by: NMichael L. Semon <mlsemon35@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 28 10月, 2014 1 次提交
-
-
由 Christoph Hellwig 提交于
This reverts commit fb3ccb5d. SCSI-2/SPI actually needs the tagged/untagged flag in the request to work properly. Revert this patch and add a follow on to set it in the right place. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Acked-by: NJens Axboe <axboe@kernel.dk> Reported-by: NMeelis Roos <mroos@linux.ee> Tested-by: NMeelis Roos <mroos@linux.ee> Cc: stable@vger.kernel.org
-
- 10 10月, 2014 1 次提交
-
-
由 Michele Curti 提交于
Quite useless but it shuts up some warnings. Signed-off-by: NMichele Curti <michele.curti@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 10月, 2014 1 次提交
-
-
由 Mike Snitzer 提交于
The math in both blk_stack_limits() and queue_limit_alignment_offset() assume that a block device's io_min (aka minimum_io_size) is always a power-of-2. Fix the math such that it works for non-power-of-2 io_min. This issue (of alignment_offset != 0) became apparent when testing dm-thinp with a thinp blocksize that matches a RAID6 stripesize of 1280K. Commit fdfb4c8c ("dm thin: set minimum_io_size to pool's data block size") unlocked the potential for alignment_offset != 0 due to the dm-thin-pool's io_min possibly being a non-power-of-2. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Acked-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 27 9月, 2014 8 次提交
-
-
由 Martin K. Petersen 提交于
We'd occasionally merge requests with conflicting integrity flags. Introduce a merge helper which checks that the requests have compatible integrity payloads. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Martin K. Petersen 提交于
Make the choice of checksum a per-I/O property by introducing a flag that can be inspected by the SCSI layer. There are several reasons for this: 1. It allows us to switch choice of checksum without unloading and reloading the HBA driver. 2. During error recovery we need to be able to tell the HBA that checksums read from disk should not be verified and converted to IP checksums. 3. For error injection purposes we need to be able to write a bad guard tag to storage. Since the storage device only supports T10 CRC we need to be able to disable IP checksum conversion on the HBA. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Martin K. Petersen 提交于
So far we have relied on the app tag size to determine whether a disk has been formatted with T10 protection information or not. However, not all target devices provide application tag storage. Add a flag to the block integrity profile that indicates whether the disk has been formatted with protection information. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NSagi Grimberg <sagig@dev.mellanox.co.il> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Martin K. Petersen 提交于
Add a BLK_ prefix to the integrity profile flags. Also rename the flags to be more consistent with the generate/verify terminology in the rest of the integrity code. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Martin K. Petersen 提交于
Instead of the "operate" parameter we pass in a seed value and a pointer to a function that can be used to process the integrity metadata. The generation function is changed to have a return value to fit into this scheme. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Martin K. Petersen 提交于
The protection interval is not necessarily tied to the logical block size of a block device. Stop using the terms "sector" and "sectors". Going forward we will use the term "seed" to describe the initial reference tag value for a given I/O. "Interval" will be used to describe the portion of the data buffer that a given piece of protection information is associated with. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Martin K. Petersen 提交于
None of the filesystems appear interested in using the integrity tagging feature. Potentially because very few storage devices actually permit using the application tag space. Remove the tagging functions. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Martin K. Petersen 提交于
For commands like REQ_COPY we need a way to pass extra information along with each bio. Like integrity metadata this information must be available at the bottom of the stack so bi_private does not suffice. Rename the existing bi_integrity field to bi_special and make it a union so we can have different bio extensions for each class of command. We previously used bi_integrity != NULL as a way to identify whether a bio had integrity metadata or not. Introduce a REQ_INTEGRITY to be the indicator now that bi_special can contain different things. In addition, bio_integrity(bio) will now return a pointer to the integrity payload (when applicable). Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 26 9月, 2014 1 次提交
-
-
由 Ming Lei 提交于
This patch introduces 'struct blk_flush_queue' and puts all flush machinery related fields into this structure, so that - flush implementation details aren't exposed to driver - it is easy to convert to per dispatch-queue flush machinery This patch is basically a mechanical replacement. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@canonical.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 09 9月, 2014 1 次提交
-
-
由 Tejun Heo 提交于
bdev_get_queue() returns the request_queue associated with the specified block_device. blk_get_backing_dev_info() makes use of bdev_get_queue() to determine the associated bdi given a block_device. All the callers of bdev_get_queue() including blk_get_backing_dev_info() assume that bdev_get_queue() may return NULL and implement NULL handling; however, bdev_get_queue() requires the passed in block_device is opened and attached to its gendisk. Because an active gendisk always has a valid request_queue associated with it, bdev_get_queue() can never return NULL and neither can blk_get_backing_dev_info(). Make it clear that neither of the two functions can return NULL and remove NULL handling from all the callers. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 02 7月, 2014 2 次提交
-
-
由 Tejun Heo 提交于
Currently, blk-mq uses a percpu_counter to keep track of how many usages are in flight. The percpu_counter is drained while freezing to ensure that no usage is left in-flight after freezing is complete. blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this per-cpu gating mechanism. This type of code has relatively high chance of subtle bugs which are extremely difficult to trigger and it's way too hairy to be open coded in blk-mq. percpu_ref can serve the same purpose after the recent changes. This patch replaces the open-coded per-cpu usage counting and draining mechanism with percpu_ref. blk_mq_queue_enter() performs tryget_live on the ref and exit() performs put. blk_mq_freeze_queue() kills the ref and waits until the reference count reaches zero. blk_mq_unfreeze_queue() revives the ref and wakes up the waiters. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
blk_mq freezing is entangled with generic bypassing which bypasses blkcg and io scheduler and lets IO requests fall through the block layer to the drivers in FIFO order. This allows forward progress on IOs with the advanced features disabled so that those features can be configured or altered without worrying about stalling IO which may lead to deadlock through memory allocation. However, generic bypassing doesn't quite fit blk-mq. blk-mq currently doesn't make use of blkcg or ioscheds and it maps bypssing to freezing, which blocks request processing and drains all the in-flight ones. This causes problems as bypassing assumes that request processing is online. blk-mq works around this by conditionally allowing request processing for the problem case - during queue initialization. Another weirdity is that except for during queue cleanup, bypassing started on the generic side prevents blk-mq from processing new requests but doesn't drain the in-flight ones. This shouldn't break anything but again highlights that something isn't quite right here. The root cause is conflating blk-mq freezing and generic bypassing which are two different mechanisms. The only intersecting purpose that they serve is during queue cleanup. Let's properly separate blk-mq freezing from generic bypassing and simply use it where necessary. * request_queue->mq_freeze_depth is added and blk_mq_[un]freeze_queue() now operate on this counter instead of ->bypass_depth. The replacement for QUEUE_FLAG_BYPASS isn't added but the counter is tested directly. This will be further updated by later changes. * blk_mq_drain_queue() is dropped and "__" prefix is dropped from blk_mq_freeze_queue(). Queue cleanup path now calls blk_mq_freeze_queue() directly. * blk_queue_enter()'s fast path condition is simplified to simply check @q->mq_freeze_depth. Previously, the condition was !blk_queue_dying(q) && (!blk_queue_bypass(q) || !blk_queue_init_done(q)) mq_freeze_depth is incremented right after dying is set and blk_queue_init_done() exception isn't necessary as blk-mq doesn't start frozen, which only leaves the blk_queue_bypass() test which can be replaced by @q->mq_freeze_depth test. This change simplifies the code and reduces confusion in the area. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 25 6月, 2014 1 次提交
-
-
由 Jens Axboe 提交于
Another restriction inherited for NVMe - those devices don't support SG lists that have "gaps" in them. Gaps refers to cases where the previous SG entry doesn't end on a page boundary. For NVMe, all SG entries must start at offset 0 (except the first) and end on a page boundary (except the last). Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 18 6月, 2014 1 次提交
-
-
由 Jens Axboe 提交于
Commit 762380ad inadvertently changed a check for max_sectors to max_hw_sectors. Revert that part, so we still compare against max_sectors. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 06 6月, 2014 2 次提交
-
-
由 Jens Axboe 提交于
With the optimizations around not clearing the full request at alloc time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC up to the user allocating the request. Add a blk_rq_set_block_pc() that sets the command type to REQ_TYPE_BLOCK_PC, and properly initializes the members associated with this type of request. Update callers to use this function instead of manipulating rq->cmd_type directly. Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed attempt. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
Some drivers have different limits on what size a request should optimally be, depending on the offset of the request. Similar to dividing a device into chunks. Add a setting that allows the driver to inform the block layer of such a chunk size. The block layer will then prevent merging across the chunks. This is needed to optimally support NVMe with a non-zero stripe size. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 05 6月, 2014 2 次提交
-
-
由 Matthew Wilcox 提交于
A block device driver may choose to provide a rw_page operation. These will be called when the filesystem is attempting to do page sized I/O to page cache pages (ie not for direct I/O). This does preclude I/Os that are larger than page size, so this may only be a performance gain for some devices. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Tested-by: NDheeraj Reddy <dheeraj.reddy@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Fabian Frederick 提交于
Description by Jan Kara: "A lot of older filesystems don't properly flush volatile disk caches on fsync(2) which can lead to loss of fsynced data after power failure. This patch makes generic_file_fsync() issue proper cache flush to fix the problem. Sysadmin can use /sys/devices/.../cache_type to tell the system it should not send the cache flush." [akpm@linux-foundation.org: nuke ifdef] [akpm@linux-foundation.org: fix warning] Signed-off-by: NFabian Frederick <fabf@skynet.be> Suggested-by: NJan Kara <jack@suse.cz> Suggested-by: NChristoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 6月, 2014 1 次提交
-
-
由 Ming Lei 提交于
'struct blk_mq_ctx' is __percpu, so add the annotation and fix the sparse warning reported from Fengguang: [block:for-linus 2/3] block/blk-mq.h:75:16: sparse: incorrect type in initializer (different address spaces) Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NMing Lei <tom.leiming@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 29 5月, 2014 2 次提交
-
-
由 Jens Axboe 提交于
If devices are not SG starved, we waste a lot of time potentially collapsing SG segments. Enough that 1.5% of the CPU time goes to this, at only 400K IOPS. Add a queue flag, QUEUE_FLAG_NO_SG_MERGE, which just returns the number of vectors in a bio instead of looping over all segments and checking for collapsible ones. Add a BLK_MQ_F_SG_MERGE flag so that drivers can opt-in on the sg merging, if they so desire. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
I don't think we've ever caught any bugs with this, and there's the list poisoning for the plug lists to catch uninitialized cases. So remove the magic member and save 8 bytes in the struct. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 28 5月, 2014 1 次提交
-
-
由 Christoph Hellwig 提交于
Both the cache flush state machine and the SCSI midlayer want to submit requests from irq context, and the current per-request requeue_work unfortunately causes corruption due to sharing with the csd field for flushes. Replace them with a per-request_queue list of requests to be requeued. Based on an earlier test by Ming Lei. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reported-by: NMing Lei <tom.leiming@gmail.com> Tested-by: NMing Lei <tom.leiming@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 14 5月, 2014 1 次提交
-
-
由 Jens Axboe 提交于
This adds support for active queue tracking, meaning that the blk-mq tagging maintains a count of active users of a tag set. This allows us to maintain a notion of fairness between users, so that we can distribute the tag depth evenly without starving some users while allowing others to try unfair deep queues. If sharing of a tag set is detected, each hardware queue will track the depth of its own queue. And if this exceeds the total depth divided by the number of active queues, the user is actively throttled down. The active queue count is done lazily to avoid bouncing that data between submitter and completer. Each hardware queue gets marked active when it allocates its first tag, and gets marked inactive when 1) the last tag is cleared, and 2) the queue timeout grace period has passed. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 09 5月, 2014 1 次提交
-
-
由 Christoph Hellwig 提交于
This allows us to avoid a non-atomic memset over ->atomic_flags as well as killing lots of duplicate initializations. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 17 4月, 2014 3 次提交
-
-
由 Jens Axboe 提交于
bsg currently checks ->request_fn to check whether a queue can handle struct request. But with blk-mq, we don't have a request_fn yet are request based. Add a queue_is_rq_based() helper and use that in bsg, I'm guessing this is not the last place we need to update for this. Besides, it better explains what is being checked. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
This allows to mirror the blk-mq code flow for more a more readable I/O completion handler in SCSI. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
We will use this work_struct to requeue scsi commands from the completion handler as well, so give it a more generic name. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 16 4月, 2014 2 次提交
-
-
由 Christoph Hellwig 提交于
Instead of setting the REQ_QUEUED flag on each of them just take it into account in the only macro checking it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
This was used in the olden days, back when onions were proper yellow. Basically it mapped to the current buffer to be transferred. With highmem being added more than a decade ago, most drivers map pages out of a bio, and rq->buffer isn't pointing at anything valid. Convert old style drivers to just use bio_data(). For the discard payload use case, just reference the page in the bio. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 10 4月, 2014 3 次提交
-
-
由 Jens Axboe 提交于
Martin reported that his test system would not boot with current git, it oopsed with this: BUG: unable to handle kernel paging request at ffff88046c6c9e80 IP: [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150 PGD 1ddf067 PUD 1de2067 PMD 47fc7d067 PTE 800000046c6c9060 Oops: 0002 [#1] SMP DEBUG_PAGEALLOC Modules linked in: sd_mod lpfc(+) scsi_transport_fc scsi_tgt oracleasm rpcsec_gss_krb5 ipv6 igb dca i2c_algo_bit i2c_core hwmon CPU: 3 PID: 87 Comm: kworker/u17:1 Not tainted 3.14.0+ #246 Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013 Workqueue: events_unbound async_run_entry_fn task: ffff8802743c2150 ti: ffff880273d02000 task.ti: ffff880273d02000 RIP: 0010:[<ffffffff812971e0>] [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150 RSP: 0018:ffff880273d03a58 EFLAGS: 00010092 RAX: ffff88046c6c9e78 RBX: ffff880077208e78 RCX: 00000000fffc8da6 RDX: 00000000fffc186d RSI: 0000000000000009 RDI: 00000000fffc8d9d RBP: ffff880273d03a88 R08: 0000000000000001 R09: ffff8800021c2410 R10: 0000000000000005 R11: 0000000000015b30 R12: ffff88046c5bb8a0 R13: ffff88046c5c0890 R14: 000000000000001e R15: 000000000000001e FS: 0000000000000000(0000) GS:ffff880277b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88046c6c9e80 CR3: 00000000018f6000 CR4: 00000000000407e0 Stack: ffff880273d03a98 ffff880474b18800 0000000000000000 ffff880474157000 ffff88046c5c0890 ffff880077208e78 ffff880273d03ae8 ffffffff813b9e62 ffff880200000010 ffff880474b18968 ffff880474b18848 ffff88046c5c0cd8 Call Trace: [<ffffffff813b9e62>] scsi_request_fn+0xf2/0x510 [<ffffffff81293167>] __blk_run_queue+0x37/0x50 [<ffffffff8129ac43>] blk_execute_rq_nowait+0xb3/0x130 [<ffffffff8129ad24>] blk_execute_rq+0x64/0xf0 [<ffffffff8108d2b0>] ? bit_waitqueue+0xd0/0xd0 [<ffffffff813bba35>] scsi_execute+0xe5/0x180 [<ffffffff813bbe4a>] scsi_execute_req_flags+0x9a/0x110 [<ffffffffa01b1304>] sd_spinup_disk+0x94/0x460 [sd_mod] [<ffffffff81160000>] ? __unmap_hugepage_range+0x200/0x2f0 [<ffffffffa01b2b9a>] sd_revalidate_disk+0xaa/0x3f0 [sd_mod] [<ffffffffa01b2fb8>] sd_probe_async+0xd8/0x200 [sd_mod] [<ffffffff8107703f>] async_run_entry_fn+0x3f/0x140 [<ffffffff8106a1c5>] process_one_work+0x175/0x410 [<ffffffff8106b373>] worker_thread+0x123/0x400 [<ffffffff8106b250>] ? manage_workers+0x160/0x160 [<ffffffff8107104e>] kthread+0xce/0xf0 [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff815f0bac>] ret_from_fork+0x7c/0xb0 [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70 Code: 48 0f ab 11 72 db 48 81 4b 40 00 00 10 00 89 83 08 01 00 00 48 89 df 49 8b 04 24 48 89 1c d0 e8 f7 a8 ff ff 49 8b 85 28 05 00 00 <48> 89 58 08 48 89 03 49 8d 85 28 05 00 00 48 89 43 08 49 89 9d RIP [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150 RSP <ffff880273d03a58> CR2: ffff88046c6c9e80 Martin bisected and found this to be the problem patch; commit 6d113398 Author: Jan Kara <jack@suse.cz> Date: Mon Feb 24 16:39:54 2014 +0100 block: Stop abusing rq->csd.list in blk-softirq and the problem was immediately apparent. The patch states that it is safe to reuse queuelist at completion time, since it is no longer used. However, that is not true if a device is using block enabled tagging. If that is the case, then the queuelist is reused to keep track of busy tags. If a device also ended up using softirq completions, we'd reuse ->queuelist for the IPI handling while block tagging was still using it. Boom. Fix this by adding a new ipi_list list head, and share the memory used with the request hash table. The hash table is never used after the request is moved to the dispatch list, which happens long before any potential completion of the request. Add a new request bit for this, so we don't have cases that check rq->hash while it could potentially have been reused for the IPI completion. Reported-by: NMartin K. Petersen <martin.petersen@oracle.com> Tested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
Same function as kblockd_schedule_delayed_work(), but allow the caller to pass in a CPU that the work should be executed on. This just directly extends and maps into the workqueue API, and will be used to make the blk-mq mappings more strict. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
The queue parameter is never used, just get rid of it. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 02 4月, 2014 1 次提交
-
-
由 Al Viro 提交于
sg_iovec array passed to it can be const Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 25 2月, 2014 1 次提交
-
-
由 Jan Kara 提交于
Block layer currently abuses rq->csd.list.next for storing fifo_time. That is a terrible hack and completely unnecessary as well. Union achieves the same space saving in a cleaner way. Signed-off-by: NJan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 11 2月, 2014 1 次提交
-
-
由 Christoph Hellwig 提交于
Witch to using a preallocated flush_rq for blk-mq similar to what's done with the old request path. This allows us to set up the request properly with a tag from the actually allowed range and ->rq_disk as needed by some drivers. To make life easier we also switch to dynamic allocation of ->flush_rq for the old path. This effectively reverts most of "blk-mq: fix for flush deadlock" and "blk-mq: Don't reserve a tag for flush request" Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-