- 02 March 2017, 3 commits
-
-
Submitted by Omar Sandoval
blk_mq_alloc_request_hctx() allocates a driver request directly, unlike its blk_mq_alloc_request() counterpart. It also crashes because it doesn't update the tags->rqs map. Fix it by making it allocate a scheduler request.

Reported-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
-
Submitted by Sagi Grimberg
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Modified by me to also check at driver tag allocation time if the original request was reserved, so we can be sure to allocate a properly reserved tag at that point in time, too.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Shaohua Li
The blk_mq_tags/requests of a specific hardware queue are mostly used by specific CPUs, which might not be in the same NUMA node as the disk. For example, for an NVMe card in node 0, half of the hardware queues will be used by node 0 and the other half by node 1.

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
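A minimal sketch of the node-aware allocation idea this describes, assuming the hardware queue's home node is available as hctx->numa_node (illustrative only, not the patch itself):

```c
#include <linux/blk-mq.h>
#include <linux/slab.h>

/* Illustrative only: allocate per-hctx data on the NUMA node local to the
 * CPUs mapped to that hardware queue (assumed here to be hctx->numa_node),
 * instead of the node of whichever CPU happens to run the setup code. */
static void *alloc_hctx_local_data(struct blk_mq_hw_ctx *hctx, size_t size)
{
	return kzalloc_node(size, GFP_KERNEL, hctx->numa_node);
}
```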
-
- 28 February 2017, 2 commits
-
-
Submitted by Alexey Dobriyan
Now that %z is standardised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller. Use BUILD_BUG_ON in a couple of ATM drivers. In case anyone didn't notice, lib/vsprintf.o is about half of SLUB, which in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more.

Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
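As a quick illustration of the format-specifier change described above (a made-up example, not code from the patch), the standard %zu/%zd length modifiers cover what the kernel-specific %Z variants used to do:

```c
#include <linux/kernel.h>

/* Illustrative only: C99's %z length modifier handles size_t/ssize_t,
 * so the old kernel-specific %Z form is no longer needed. */
static void print_buffer_length(size_t len, ssize_t delta)
{
	pr_info("buffer is %zu bytes (changed by %zd)\n", len, delta);
}
```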
-
Submitted by Masahiro Yamada
Fix typos and add the following to the scripts/spelling.txt: embeded||embedded

Link: http://lkml.kernel.org/r/1481573103-11329-12-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 24 February 2017, 4 commits
-
-
Submitted by Omar Sandoval
In blk_mq_sched_dispatch_requests(), we call blk_mq_sched_mark_restart() after we dispatch requests left over on our hardware queue dispatch list. This is so we'll go back and dispatch requests from the scheduler. In this case, it's only necessary to restart the hardware queue that we are running; there's no reason to run other hardware queues just because we are using shared tags.

So, split out blk_mq_sched_mark_restart() into two operations, one for just the hardware queue and one for the whole request queue. The core code only needs the hctx variant, but I/O schedulers will want to use both. This also requires adjusting blk_mq_sched_restart_queues() to always check the queue restart flag, not just when using shared tags.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Omar Sandoval
Commit 50e1dab8 ("blk-mq-sched: fix starvation for multiple hardware queues and shared tags") fixed one starvation issue for shared tags. However, we can still get into a situation where we fail to allocate a tag because all tags are allocated but we don't have any pending requests on any hardware queue.

One solution for this would be to restart all queues that share a tag map, but that really sucks. Ideally, we could just block and wait for a tag, but that isn't always possible from blk_mq_dispatch_rq_list(). However, we can still use the struct sbitmap_queue wait queues with a custom callback instead of blocking. This has a few benefits:

1. It avoids iterating over all hardware queues when completing an I/O, which the current restart code has to do.
2. It benefits from the existing rolling wakeup code.
3. It avoids punting to another thread just to have it block.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Scott Bauer
During an error on a command, e.g. the user provides the wrong password to unlock a range, we will gracefully terminate the opal session. We want to propagate the original error to userland instead of the result of the session termination, which is almost always a success.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
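The error-propagation rule being described reduces to a pattern like the following sketch (illustrative only, not the sed-opal code):

```c
/*
 * Pattern sketch only: keep the command's original error and do not let
 * the usually-successful session teardown overwrite it.
 */
static int opal_step_status(int cmd_error, int end_session_error)
{
	/* the command's own error wins; otherwise report the teardown result */
	return cmd_error ? cmd_error : end_session_error;
}
```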
-
Submitted by Scott Bauer
Before we free the opal structure we need to clean up any saved locking ranges that the user had told us to unlock from a suspend.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 23 February 2017, 6 commits
-
-
Submitted by Tetsuo Handa
The IOPRIO_WHO_USER case in sys_ioprio_set()/sys_ioprio_get() uses while_each_thread(), which is unsafe under the RCU lock according to commit 0c740d0a ("introduce for_each_thread() to replace the buggy while_each_thread()"). Use for_each_thread() (via for_each_process_thread()), which is safe under the RCU lock.

Link: http://lkml.kernel.org/r/201702011947.DBD56740.OMVHOLOtSJFFFQ@I-love.SAKURA.ne.jp
Link: http://lkml.kernel.org/r/1486041779-4401-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
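For reference, the safe iteration pattern referred to above looks roughly like this minimal sketch (not the ioprio code itself; the thread counter is only there to give the loop a body):

```c
#include <linux/sched/signal.h>
#include <linux/rcupdate.h>
#include <linux/printk.h>

/* Minimal sketch of the safe pattern: walk every thread of every process
 * with for_each_process_thread() under rcu_read_lock(), instead of the
 * RCU-unsafe while_each_thread() loop. */
static void count_all_threads(void)
{
	struct task_struct *p, *t;
	unsigned int count = 0;

	rcu_read_lock();
	for_each_process_thread(p, t)
		count++;
	rcu_read_unlock();

	pr_info("%u threads in the system\n", count);
}
```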
-
Submitted by Jens Axboe
The wording in the entries was poor and not understandable by even deities. Kill the selection for the default block scheduler, and impose a policy with sane defaults.

Architected-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jon Derrick
By embedding the function data with the function sequence, we can eliminate the external function data and state variable code. It also made some other small cleanups obvious.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jon Derrick
Add a buffer size check against discovery and response header lengths before we loop over their buffers.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jon Derrick
Add a helper which verifies that the response token is valid and matches the expected value. Merges token_type and response_get_token.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jon Derrick
The short atom parser can return an errno from decoding but does not currently return the error as a signed value. Convert all of the parsers to ssize_t.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
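The general shape of such a signed-return decoder is sketched below (an illustrative toy parser, not the actual sed-opal one):

```c
#include <linux/errno.h>
#include <linux/types.h>

/* Toy decoder, for illustration only: a ssize_t return carries either the
 * number of bytes consumed or a negative errno, so decode failures can
 * propagate to the caller as real error codes. */
static ssize_t decode_one_byte(const u8 *buf, size_t len, u8 *out)
{
	if (len < 1)
		return -EINVAL;

	*out = buf[0];
	return 1;	/* bytes consumed */
}
```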
-
- 22 February 2017, 2 commits
-
-
Submitted by Jan Kara
Iteration over partitions in del_gendisk() omits part0. Add a bdev_unhash_inode() call for the whole device. Otherwise, if the device number gets reused, the bdev inode will still be associated with the old (stale) bdi.

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jan Kara
Move bdev_unhash_inode() after invalidate_partition(), as invalidate_partition() looks up the bdev and cannot find the right bdev inode once bdev_unhash_inode() has been called. Thus invalidate_partition() would not invalidate the page cache of the previously used bdev. Also use part_devt() when calling bdev_unhash_inode() instead of manually creating the device number.

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 18 February 2017, 7 commits
-
-
Submitted by Christoph Hellwig
Instead of bloating the containing structure with it all the time, this allocates struct opal_dev dynamically. Additionally this allows moving the definition of struct opal_dev into sed-opal.c. For this a new private data field is added to it that is passed to the send/receive callback. After that a lot of internals can be made private as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Christoph Hellwig
Not having OPAL or a sub-feature supported is an entirely normal condition for many drives, so don't warn about it. Keep the messages, but tone them down to debug only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jens Axboe
For blk-mq with scheduling, we can potentially end up with ALL driver tags assigned and sitting on the flush queues. If we defer because of an in-flight data request, then we can deadlock if that data request doesn't already have a tag assigned. This fixes a deadlock when running the xfs/297 xfstest, where thousands of syncs can cause the drive queue to stall.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Submitted by Jens Axboe
Usually we don't ask the scheduler for work, if we already have leftovers on the dispatch list. This is done to leave work on the scheduler side for as long as possible, for proper merging. But if we do have work leftover but didn't dispatch anything, then we should ask the scheduler since we could potentially issue requests from that.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Submitted by Jens Axboe
The current request insertion machinery works just fine for directly inserting flushes, so no need to special case this anymore.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Submitted by Jens Axboe
If we are currently out of driver tags, we don't want to add a new flush (without a tag) to the head of the requeue list. We want to add it to the back, behind the others that are potentially also waiting for a tag.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Submitted by Jens Axboe
Currently we're almost there, but if we dispatch nothing, then we still return success.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
- 16 February 2017, 2 commits
-
-
Submitted by Jens Axboe
wbt_disable_default() calls del_timer_sync() to wait for the wbt timer to finish before disabling throttling. We can't do this with IRQs disabled. This fixes a lockdep splat on boot, if non-root cgroups are used.

Reported-by: Gabriel C <nix.or.die@gmail.com>
Fixes: 87760e5e ("block: hook up writeback throttling")
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Ming Lei
md still needs a bio clone (not the fast version) for behind writes, and it is more efficient to use bio_clone_bioset_partial(). The idea is simple: just copy the bvec range specified by the parameters.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
- 15 February 2017, 2 commits
-
-
Submitted by Tahsin Erdogan
When a new disk shows up, the sysfs queue directory is created before the elevator is registered. This allows a user to attempt a scheduler switch even though the initial registration hasn't completed yet.

In one scenario, blk_register_queue() calls elv_register_queue(), and right before cfq_registered_queue() is called, another process executes elevator_switch() and replaces q->elevator with the deadline scheduler. When cfq_registered_queue() executes, it interprets e->elevator_data as struct cfq_data even though it is actually struct deadline_data.

Grab q->sysfs_lock in blk_register_queue() to synchronize with sysfs callers.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
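A stripped-down sketch of the locking idea (simplified, assuming the 2017-era elv_register_queue() signature; not the full blk_register_queue() body):

```c
#include <linux/blkdev.h>

int elv_register_queue(struct request_queue *q);	/* block-internal; declared here only for the sketch */

/* Simplified sketch: hold q->sysfs_lock around elevator registration so a
 * concurrent elevator_switch() (which also takes q->sysfs_lock) cannot run
 * in the middle of it. */
static int register_queue_elevator(struct request_queue *q)
{
	int ret;

	mutex_lock(&q->sysfs_lock);
	ret = elv_register_queue(q);
	mutex_unlock(&q->sysfs_lock);

	return ret;
}
```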
-
Submitted by Scott Bauer
When CONFIG_KASAN is enabled, compilation fails:

  block/sed-opal.c: In function 'sed_ioctl':
  block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

Move all the ioctl structures off the stack and dynamically allocate them using _IOC_SIZE().

Fixes: 455a7b23 ("block: Add Sed-opal library")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
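A minimal sketch of the general pattern (illustrative; handle_one_ioctl() is a hypothetical placeholder, not the real sed_ioctl() dispatcher):

```c
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/err.h>
#include <linux/ioctl.h>
#include <linux/uaccess.h>

long handle_one_ioctl(unsigned int cmd, void *buf);	/* hypothetical dispatcher */

/* Sketch of the pattern: instead of a large on-stack union of every ioctl
 * argument structure, copy the user buffer into a heap allocation sized by
 * _IOC_SIZE(cmd) and free it after dispatching. */
static long example_sed_ioctl(unsigned int cmd, void __user *arg)
{
	void *buf;
	long ret;

	buf = memdup_user(arg, _IOC_SIZE(cmd));
	if (IS_ERR(buf))
		return PTR_ERR(buf);

	ret = handle_one_ioctl(cmd, buf);
	kfree(buf);
	return ret;
}
```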
-
- 14 February 2017, 1 commit
-
-
Submitted by Jens Axboe
The old elevator= boot parameter blindly attempts to load the same scheduler for mq and !mq devices, leading to a crash if we specify the wrong one. Ensure that we only apply this boot parameter to old !mq devices.

Signed-off-by: Jens Axboe <axboe@fb.com>
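A hedged sketch of the kind of guard this implies (illustrative only; chosen_name is a made-up stand-in for however the "elevator=" value is stored, not the actual elevator.c internals):

```c
#include <linux/blkdev.h>

/* Illustrative guard only: honor a boot-time scheduler choice solely for
 * legacy (!mq) queues; q->mq_ops being set means the queue uses blk-mq. */
static bool use_boot_elevator(struct request_queue *q, const char *chosen_name)
{
	return !q->mq_ops && chosen_name && chosen_name[0];
}
```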
-
- 11 February 2017, 3 commits
-
-
Submitted by Omar Sandoval
None of the other blk-mq elevator hooks are called with this lock held. Additionally, it can lead to circular locking dependencies between queue_lock and the private scheduler lock.

Reported-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Josef Bacik
Calling blk_queue_make_request resets a bunch of settings on the request_queue, but all we really want is to update the make_request_fn, so do this directly so we don't lose things like the logical and physical block sizes.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Paolo Valente
bio is used in bfq-mq's get_rq_priv to get the request group. We could pass the group directly here, but I thought that passing the bio was more general, giving the possibility to get other pieces of information if needed.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 09 February 2017, 3 commits
-
-
Submitted by Christoph Hellwig
Add a new merge strategy that merges discard bios into a request until the maximum number of discard ranges (or the maximum discard size) is reached from the plug merging code. I/O scheduler merging is not wired up yet but might also be useful, although not for fast devices like NVMe which are the only user for now.

Note that for now we don't support limiting the size of each discard range, but if needed that can be added later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Christoph Hellwig
Switch these constants to an enum, and let the compiler ensure that all callers of blk_try_merge and elv_merge handle all potential values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
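The compiler-checked pattern this relies on is sketched below with made-up names (the block layer's real enum differs):

```c
/* Illustrative only: returning an enum and switching on it without a
 * default case lets -Wswitch warn in every caller that forgets to handle
 * a newly added value. Names here are made up, not the block layer's. */
enum demo_merge_type {
	DEMO_NO_MERGE,
	DEMO_FRONT_MERGE,
	DEMO_BACK_MERGE,
};

static const char *demo_merge_name(enum demo_merge_type type)
{
	switch (type) {
	case DEMO_NO_MERGE:
		return "none";
	case DEMO_FRONT_MERGE:
		return "front";
	case DEMO_BACK_MERGE:
		return "back";
	}
	return "unknown";
}
```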
-
Submitted by Christoph Hellwig
This makes it available outside of blk-merge.c, and inlining such a trivial helper seems pretty useful to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 07 February 2017, 3 commits
-
-
Submitted by Omar Sandoval
I noticed that when booting with a default blk-mq I/O scheduler, the /sys/block/*/queue/iosched directory was missing. However, switching after boot did create the directory. This is because we skip the initial elevator register/unregister when we don't have a ->request_fn(), but we should still do it for the ->mq_ops case.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Scott Bauer
This patch implements the necessary logic to bring an Opal enabled drive out of a factory-enabled state into a working Opal state. This patch set also enables logic to save a password to be replayed during a resume from suspend.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Rafael Antognolli <Rafael.Antognolli@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Christoph Hellwig
Write Same can return an error asynchronously if it turns out the underlying SCSI device does not support Write Same, which makes a proper fallback to other methods in __blkdev_issue_zeroout impossible. Thus only issue a Write Same from blkdev_issue_zeroout and don't try it at all from __blkdev_issue_zeroout, as a non-invasive workaround.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Fixes: e73c23ff ("block: add async variant of blkdev_issue_zeroout")
Tested-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 04 February 2017, 2 commits
-
-
Submitted by Jens Axboe
If we end up doing a request-to-request merge when we have completed a bio-to-request merge, we free the request from deep down in that path. For blk-mq-sched, the merge path has to hold the appropriate lock, but we don't need it for freeing the request. And in fact holding the lock is problematic, since we are now calling the mq sched put_rq_private() hook with the lock held. Other call paths do not hold this lock.

Fix this inconsistency by ensuring that the caller frees a merged request. Then we can do it outside of the lock, making it both more efficient and fixing the blk-mq-sched problem of invoking parts of the scheduler with an unknown lock state.

Reported-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Submitted by Jens Axboe
When we attempt to merge request-to-request, we return a 0/1 if we ended up merging or not. Change that to return the pointer to the request that we freed. We will use this to move the freeing of that request out of the merge logic, so that callers can drop locks before freeing the request. There should be no functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
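The calling-convention change described above follows a common pattern, sketched here with hypothetical helpers (not the actual blk-mq merge code):

```c
#include <linux/blkdev.h>

bool can_merge_requests(struct request *rq, struct request *next);	/* hypothetical check */
void absorb_request(struct request *rq, struct request *next);		/* hypothetical merge step */

/* Pattern sketch: instead of returning 0/1 and freeing the victim while a
 * lock is held, return the request that became redundant so the caller can
 * drop its locks before freeing it. */
static struct request *try_back_merge(struct request *rq, struct request *next)
{
	if (!can_merge_requests(rq, next))
		return NULL;

	absorb_request(rq, next);
	return next;		/* caller frees this one outside the lock */
}
```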
-