- 08 5月, 2017 1 次提交
-
-
由 Wanpeng Li 提交于
This can be triggered by hot-unplug one cpu. ====================================================== [ INFO: possible circular locking dependency detected ] 4.11.0+ #17 Not tainted ------------------------------------------------------- step_after_susp/2640 is trying to acquire lock: (all_q_mutex){+.+...}, at: [<ffffffffb33f95b8>] blk_mq_queue_reinit_work+0x18/0x110 but task is already holding lock: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (cpu_hotplug.lock){+.+.+.}: lock_acquire+0x11c/0x230 __mutex_lock+0x92/0x990 mutex_lock_nested+0x1b/0x20 get_online_cpus+0x64/0x80 blk_mq_init_allocated_queue+0x3a0/0x4e0 blk_mq_init_queue+0x3a/0x60 loop_add+0xe5/0x280 loop_init+0x124/0x177 do_one_initcall+0x53/0x1c0 kernel_init_freeable+0x1e3/0x27f kernel_init+0xe/0x100 ret_from_fork+0x31/0x40 -> #0 (all_q_mutex){+.+...}: __lock_acquire+0x189a/0x18a0 lock_acquire+0x11c/0x230 __mutex_lock+0x92/0x990 mutex_lock_nested+0x1b/0x20 blk_mq_queue_reinit_work+0x18/0x110 blk_mq_queue_reinit_dead+0x1c/0x20 cpuhp_invoke_callback+0x1f2/0x810 cpuhp_down_callbacks+0x42/0x80 _cpu_down+0xb2/0xe0 freeze_secondary_cpus+0xb6/0x390 suspend_devices_and_enter+0x3b3/0xa40 pm_suspend+0x129/0x490 state_store+0x82/0xf0 kobj_attr_store+0xf/0x20 sysfs_kf_write+0x45/0x60 kernfs_fop_write+0x135/0x1c0 __vfs_write+0x37/0x160 vfs_write+0xcd/0x1d0 SyS_write+0x58/0xc0 do_syscall_64+0x8f/0x710 return_from_SYSCALL_64+0x0/0x7a other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(cpu_hotplug.lock); lock(all_q_mutex); lock(cpu_hotplug.lock); lock(all_q_mutex); *** DEADLOCK *** 8 locks held by step_after_susp/2640: #0: (sb_writers#6){.+.+.+}, at: [<ffffffffb3244aed>] vfs_write+0x1ad/0x1d0 #1: (&of->mutex){+.+.+.}, at: [<ffffffffb32d3a51>] kernfs_fop_write+0x101/0x1c0 #2: (s_active#166){.+.+.+}, at: [<ffffffffb32d3a59>] kernfs_fop_write+0x109/0x1c0 #3: (pm_mutex){+.+...}, at: [<ffffffffb30d2ecd>] pm_suspend+0x21d/0x490 #4: (acpi_scan_lock){+.+.+.}, at: [<ffffffffb34dc3d7>] acpi_scan_lock_acquire+0x17/0x20 #5: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffffb306d6d7>] freeze_secondary_cpus+0x27/0x390 #6: (cpu_hotplug.dep_map){++++++}, at: [<ffffffffb306cfd5>] cpu_hotplug_begin+0x5/0xe0 #7: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0 stack backtrace: CPU: 3 PID: 2640 Comm: step_after_susp Not tainted 4.11.0+ #17 Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS 1.4.9 09/12/2016 Call Trace: dump_stack+0x99/0xce print_circular_bug+0x1fa/0x270 __lock_acquire+0x189a/0x18a0 lock_acquire+0x11c/0x230 ? lock_acquire+0x11c/0x230 ? blk_mq_queue_reinit_work+0x18/0x110 ? blk_mq_queue_reinit_work+0x18/0x110 __mutex_lock+0x92/0x990 ? blk_mq_queue_reinit_work+0x18/0x110 ? kmem_cache_free+0x2cb/0x330 ? anon_transport_class_unregister+0x20/0x20 ? blk_mq_queue_reinit_work+0x110/0x110 mutex_lock_nested+0x1b/0x20 ? mutex_lock_nested+0x1b/0x20 blk_mq_queue_reinit_work+0x18/0x110 blk_mq_queue_reinit_dead+0x1c/0x20 cpuhp_invoke_callback+0x1f2/0x810 ? __flow_cache_shrink+0x160/0x160 cpuhp_down_callbacks+0x42/0x80 _cpu_down+0xb2/0xe0 freeze_secondary_cpus+0xb6/0x390 suspend_devices_and_enter+0x3b3/0xa40 ? rcu_read_lock_sched_held+0x79/0x80 pm_suspend+0x129/0x490 state_store+0x82/0xf0 kobj_attr_store+0xf/0x20 sysfs_kf_write+0x45/0x60 kernfs_fop_write+0x135/0x1c0 __vfs_write+0x37/0x160 ? rcu_read_lock_sched_held+0x79/0x80 ? rcu_sync_lockdep_assert+0x2f/0x60 ? __sb_start_write+0xd9/0x1c0 ? vfs_write+0x1ad/0x1d0 vfs_write+0xcd/0x1d0 SyS_write+0x58/0xc0 ? rcu_read_lock_sched_held+0x79/0x80 do_syscall_64+0x8f/0x710 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 The cpu hotplug path will hold cpu_hotplug.lock and then reinit all exiting queues for blk mq w/ all_q_mutex, however, blk_mq_init_allocated_queue() will contend these two locks in the inversion order. This is due to commit eabe0659 (blk/mq: Cure cpu hotplug lock inversion), it fixes a cpu hotplug lock inversion issue because of hotplug rework, however the hotplug rework is still work-in-progress and lives in a -tip branch and mainline cannot yet trigger that splat. The commit breaks the linus's tree in the merge window, so this patch reverts the lock order and avoids to splat linus's tree. Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 04 5月, 2017 14 次提交
-
-
由 Omar Sandoval 提交于
Expose the fifo lists, cached next requests, batching state, and dispatch list. It'd also be possible to add the sorted lists, but there aren't already seq_file helpers for rbtrees. Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Omar Sandoval 提交于
Expose the domain token pools, asynchronous sbitmap depth, domain request lists, and batching state. Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Omar Sandoval 提交于
This provides the infrastructure for schedulers to expose their internal state through debugfs. We add a list of queue attributes and a list of hctx attributes to struct elevator_type and wire them up when switching schedulers. Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Add missing seq_file.h header in blk-mq-debugfs.h Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Omar Sandoval 提交于
Originally, I tied debugfs registration/unregistration together with sysfs. There's no reason to do this, and it's getting in the way of letting schedulers define their own debugfs attributes. Instead, tie the debugfs registration to the lifetime of the structures themselves. The saner lifetimes mean we can also get rid of the extra mq directory and move everything one level up. I.e., nvme0n1/mq/hctx0/tags is now just nvme0n1/hctx0/tags. Signed-off-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Omar Sandoval 提交于
Preparation for adding more declarations. Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
In commit e869b546 ("blk-mq: Unregister debugfs attributes earlier"), we shuffled the debugfs cleanup around so that the "state" attribute was removed before we freed the blk-mq data structures. However, later changes are going to undo that, so we need to explicitly disallow running a dead queue. [Omar: rebased and updated commit message] Signed-off-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Omar Sandoval 提交于
A large part of blk-mq-debugfs.c is file_operations and seq_file boilerplate. This sucks as is but will suck even more when schedulers can define their own debugfs entries. Factor it all out into a single blk_mq_debugfs_fops which multiplexes as needed. We store the request_queue, blk_mq_hw_ctx, or blk_mq_ctx in the parent directory dentry, which is kind of hacky, but it works. Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Omar Sandoval 提交于
It's not clear what these numbered directories represent unless you consult the code. We're about to get rid of the intermediate "mq" directory, so these would be even more confusing without that context. Signed-off-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Omar Sandoval 提交于
Slightly more readable, plus we also strip leading spaces. Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Omar Sandoval 提交于
blk_queue_flags_store() currently truncates and returns a short write if the operation being written is too long. This can give us weird results, like here: $ echo "run bar" echo: write error: invalid argument $ dmesg [ 1103.075435] blk_queue_flags_store: unsupported operation bar. Use either 'run' or 'start' Instead, return an error if the user does this. While we're here, make the argument names consistent with everywhere else in this file. Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Omar Sandoval 提交于
Make sure the spelled out flag names match the definition. This also adds a missing hctx state, BLK_MQ_S_START_ON_RUN, and a missing cmd_flag, __REQ_NOUNMAP. Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Omar Sandoval 提交于
This reads more naturally than spaces. Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Peter Zijlstra 提交于
By poking at /debug/sched_features I triggered the following splat: [] ====================================================== [] WARNING: possible circular locking dependency detected [] 4.11.0-00873-g964c8b7-dirty #694 Not tainted [] ------------------------------------------------------ [] bash/2109 is trying to acquire lock: [] (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff8120cb8b>] static_key_slow_dec+0x1b/0x50 [] [] but task is already holding lock: [] (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170 [] [] which lock already depends on the new lock. [] [] [] the existing dependency chain (in reverse order) is: [] [] -> #2 (&sb->s_type->i_mutex_key#4){+++++.}: [] lock_acquire+0x100/0x210 [] down_write+0x28/0x60 [] start_creating+0x5e/0xf0 [] debugfs_create_dir+0x13/0x110 [] blk_mq_debugfs_register+0x21/0x70 [] blk_mq_register_dev+0x64/0xd0 [] blk_register_queue+0x6a/0x170 [] device_add_disk+0x22d/0x440 [] loop_add+0x1f3/0x280 [] loop_init+0x104/0x142 [] do_one_initcall+0x43/0x180 [] kernel_init_freeable+0x1de/0x266 [] kernel_init+0xe/0x100 [] ret_from_fork+0x31/0x40 [] [] -> #1 (all_q_mutex){+.+.+.}: [] lock_acquire+0x100/0x210 [] __mutex_lock+0x6c/0x960 [] mutex_lock_nested+0x1b/0x20 [] blk_mq_init_allocated_queue+0x37c/0x4e0 [] blk_mq_init_queue+0x3a/0x60 [] loop_add+0xe5/0x280 [] loop_init+0x104/0x142 [] do_one_initcall+0x43/0x180 [] kernel_init_freeable+0x1de/0x266 [] kernel_init+0xe/0x100 [] ret_from_fork+0x31/0x40 [] *** DEADLOCK *** [] [] 3 locks held by bash/2109: [] #0: (sb_writers#11){.+.+.+}, at: [<ffffffff81292bcd>] vfs_write+0x17d/0x1a0 [] #1: (debugfs_srcu){......}, at: [<ffffffff8155a90d>] full_proxy_write+0x5d/0xd0 [] #2: (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170 [] [] stack backtrace: [] CPU: 9 PID: 2109 Comm: bash Not tainted 4.11.0-00873-g964c8b7-dirty #694 [] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013 [] Call Trace: [] lock_acquire+0x100/0x210 [] get_online_cpus+0x2a/0x90 [] static_key_slow_dec+0x1b/0x50 [] static_key_disable+0x20/0x30 [] sched_feat_write+0x131/0x170 [] full_proxy_write+0x97/0xd0 [] __vfs_write+0x28/0x120 [] vfs_write+0xb5/0x1a0 [] SyS_write+0x49/0xa0 [] entry_SYSCALL_64_fastpath+0x23/0xc2 This is because of the cpu hotplug lock rework. Break the chain at #1 by reversing the lock acquisition order. This way i_mutex_key#4 no longer depends on cpu_hotplug_lock and things are good. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
A previous commit introduced the sync flush, which we need from internal callers like blk_mq_quiesce_queue(). However, we also call the stop helpers from drivers, particularly from ->queue_rq() when we have to stop processing for a bit. We can't block from those locations, and we don't have to guarantee that we're fully flushed. Fixes: 9f993737 ("blk-mq: unify hctx delayed_run_work and run_work") Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 03 5月, 2017 1 次提交
-
-
由 Ming Lei 提交于
After queue is frozen, no request in this queue can be in use at all, so there can't be any .queue_rq() running on this queue. It isn't necessary to call blk_mq_quiesce_queue() any more, so remove it in both elevator_switch_mq() and blk_mq_update_nr_requests(). Cc: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Fixed up the description a bit. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 02 5月, 2017 3 次提交
-
-
由 Christoph Hellwig 提交于
Remove the request_idx parameter, which can't be used safely now that we support I/O schedulers with blk-mq. Except for a superflous check in mtip32xx it was unused anyway. Also pass the tag_set instead of just the driver data - this allows drivers to avoid some code duplication in a follow on cleanup. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
We have update the troublesome driver (mtip32xx) to deal with this appropriately. So kill the hack that bypassed scheduler allocation and insertion for reserved requests. Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Tested-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
Since commit 84253394 ("remove the mg_disk driver") removed the only caller of elevator_change(), also remove the elevator_change() function itself. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 28 4月, 2017 4 次提交
-
-
由 Dan Williams 提交于
Commit 99e6608c "block: Add badblock management for gendisks" allowed for drivers like pmem and software-raid to advertise a list of bad media areas. However, it inadvertently added a 'badblocks' to all block devices. Lets clean this up by having the 'badblocks' attribute not be visible when the driver has not populated a 'struct badblocks' instance in the gendisk. Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Tested-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
The only difference between ->run_work and ->delay_work, is that the latter is used to defer running a queue. This is done by marking the queue stopped, and scheduling ->delay_work to run sometime in the future. While the queue is stopped, direct runs or runs through ->run_work will not run the queue. If we combine the handlers, then we need to handle two things: 1) If a delayed/stopped run is scheduled, then we should not run the queue before that has been completed. 2) If a queue is delayed/stopped, the handler needs to restart the queue. Normally a run of a queue with the stopped bit set would be a no-op. Case 1 is handled by modifying a currently pending queue run to the deadline set by the caller of blk_mq_delay_queue(). Subsequent attempts to queue a queue run will find the work item already pending, and direct runs will see a stopped queue as before. Case 2 is handled by adding a new bit, BLK_MQ_S_START_ON_RUN, that tells the work handler that it should clear a stopped queue and run the handler. Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
This modifies (or adds, if not currently pending) an existing delayed work item. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
They serve the exact same purpose. Get rid of the non-delayed work variant, and just run it without delay for the normal case. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 27 4月, 2017 10 次提交
-
-
由 Jens Axboe 提交于
At least one driver, mtip32xx, has a hard coded dependency on the value of the reserved tag used for internal commands. While that should really be fixed up, for now let's ensure that we just bypass the scheduler tags an allocation marked as reserved. They are used for house keeping or error handling, so we can safely ignore them in the scheduler. Tested-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
This new callback function will be used in the next patch to show more information about SCSI requests. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
Show the operation name, .cmd_flags and .rq_flags as names instead of numbers. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
This patch does not change any functionality but makes it possible to produce a single line of output with multiple flag-to-name translations. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
Move the "state" attribute from the top level to the "mq" directory as requested by Omar. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
We currently call blk_mq_free_queue() from blk_cleanup_queue() before we unregister the debugfs attributes for that queue in blk_release_queue(). This leaves a window open during which accessing most of the mq debugfs attributes would cause a use-after-free. Additionally, the "state" attribute allows running the queue, which we should not do after the queue has entered the "dead" state. Fix both cases by unregistering the debugfs attributes before freeing queue resources starts. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
Hctx unregistration involves calling kobject_del(). kobject_del() must not be called if kobject_add() has not been called. Hence in the error path only unregister hctxs for which registration succeeded. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
Since the blk_mq_debugfs_*register_hctxs() functions register and unregister all attributes under the "mq" directory, rename these into blk_mq_debugfs_*register_mq(). Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
A later patch will move the call of blk_mq_debugfs_register() to a function to which the queue name is not passed as an argument. To avoid having to add a 'name' argument to multiple callers, let blk_mq_debugfs_register() look up the queue name. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
A later patch in this series will modify blk_mq_debugfs_register() such that it uses q->kobj.parent to determine the name of a request queue. Hence make sure that that pointer is initialized before blk_mq_debugfs_register() is called. To avoid lock inversion, protect sysfs / debugfs registration with the queue sysfs_lock instead of the global mutex all_q_mutex. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 26 4月, 2017 1 次提交
-
-
由 Dan Williams 提交于
commit d1a5f2b4 ("block: use DAX for partition table reads") was part of a stalled effort to allow dax mappings of block devices. Since then the device-dax mechanism has filled the role of dax-mapping static device ranges. Now that we are moving ->direct_access() from a block_device operation to a dax_inode operation we would need block devices to map and carry their own dax_inode reference. Unless / until we decide to revive dax mapping of raw block devices through the dax_inode scheme, there is no need to carry read_dax_sector(). Its removal in turn allows for the removal of bdev_direct_access() and should have been included in commit 22375701 ("block_dev: remove DAX leftovers"). Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 24 4月, 2017 1 次提交
-
-
由 Mike Snitzer 提交于
When registering an integrity profile: if the template's interval_exp is not 0 use it, otherwise use the ilog2() of logical block size of the provided gendisk. This fixes a long-standing DM linear target bug where it cannot pass integrity data to the underlying device if its logical block size conflicts with the underlying device's logical block size. Cc: stable@vger.kernel.org Reported-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Acked-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 22 4月, 2017 2 次提交
-
-
由 Ilya Dryomov 提交于
Commit 25520d55 ("block: Inline blk_integrity in struct gendisk") introduced blk_integrity_revalidate(), which seems to assume ownership of the stable pages flag and unilaterally clears it if no blk_integrity profile is registered: if (bi->profile) disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES; else disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES; It's called from revalidate_disk() and rescan_partitions(), making it impossible to enable stable pages for drivers that support partitions and don't use blk_integrity: while the call in revalidate_disk() can be trivially worked around (see zram, which doesn't support partitions and hence gets away with zram_revalidate_disk()), rescan_partitions() can be triggered from userspace at any time. This breaks rbd, where the ceph messenger is responsible for generating/verifying CRCs. Since blk_integrity_{un,}register() "must" be used for (un)registering the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES setting there. This way drivers that call blk_integrity_register() and use integrity infrastructure won't interfere with drivers that don't but still want stable pages. Fixes: 25520d55 ("block: Inline blk_integrity in struct gendisk") Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.4+, needs backporting Tested-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
Avoid that the following kernel bug gets triggered: BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:349 in_atomic(): 1, irqs_disabled(): 0, pid: 8019, name: find CPU: 10 PID: 8019 Comm: find Tainted: G W I 4.11.0-rc4-dbg+ #2 Call Trace: dump_stack+0x68/0x93 ___might_sleep+0x16e/0x230 __might_sleep+0x4a/0x80 __ext4_get_inode_loc+0x1e0/0x4e0 ext4_iget+0x70/0xbc0 ext4_iget_normal+0x2f/0x40 ext4_lookup+0xb6/0x1f0 lookup_slow+0x104/0x1e0 walk_component+0x19a/0x330 path_lookupat+0x4b/0x100 filename_lookup+0x9a/0x110 user_path_at_empty+0x36/0x40 vfs_statx+0x67/0xc0 SYSC_newfstatat+0x20/0x40 SyS_newfstatat+0xe/0x10 entry_SYSCALL_64_fastpath+0x18/0xad This happens since the big if/else in blk_mq_make_request() doesn't have final else section that also drops the ctx. Add that. Fixes: b00c53e8 ("blk-mq: fix schedule-while-atomic with scheduler attached") Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Cc: Omar Sandoval <osandov@fb.com> Added a bit more to the commit log. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 21 4月, 2017 3 次提交
-
-
由 Jens Axboe 提交于
No point in providing and exporting this helper. There's just one (real) user of it, just use rq_data_dir(). Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
commit c13660a0 ("blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()") removed the last user of this function. Hence also remove the function itself. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
If the caller passes in wait=true, it has to be able to block for a driver tag. We just had a bug where flush insertion would block on tag allocation, while we had preempt disabled. Ensure that we catch cases like that earlier next time. Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-