- 09 7月, 2018 26 次提交
-
-
由 Josef Bacik 提交于
blkcg-qos is going to do essentially what wbt does, only on a cgroup basis. Break out the common code that will be shared between blkcg-qos and wbt into blk-rq-qos.* so they can both utilize the same infrastructure. Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Josef Bacik 提交于
We need to use blk_rq_stat in the blkcg qos stuff, so export some of these helpers so they can be used by other things. Signed-off-by: NJosef Bacik <jbacik@fb.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Josef Bacik 提交于
Since IO can be issued from literally anywhere it's almost impossible to do throttling without having some sort of adverse effect somewhere else in the system because of locking or other dependencies. The best way to solve this is to do the throttling when we know we aren't holding any other kernel resources. Do this by tracking throttling in a per-blkg basis, and if we require throttling flag the task that it needs to check before it returns to user space and possibly sleep there. This is to address the case where a process is doing work that is generating IO that can't be throttled, whether that is directly with a lot of REQ_META IO, or indirectly by allocating so much memory that it is swamping the disk with REQ_SWAP. We can't use task_add_work as we don't want to induce a memory allocation in the IO path, so simply saving the request queue in the task and flagging it to do the notify_resume thing achieves the same result without the overhead of a memory allocation. Signed-off-by: NJosef Bacik <jbacik@fb.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
For backcharging we need to know who the page belongs to when swapping it out. We don't worry about things that do ->rw_page (zram etc) at the moment, we're only worried about pages that actually go to a block device. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJosef Bacik <jbacik@fb.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Josef Bacik 提交于
blk-iolatency has a few stats that it would like to print out, and instead of adding a bunch of crap to the generic code just provide a helper so that controllers can add stuff to the stat line if they want to. Hide it behind a boot option since it changes the output of io.stat from normal, and these stats are only interesting to developers. Signed-off-by: NJosef Bacik <jbacik@fb.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Josef Bacik 提交于
Currently io.low uses a bi_cg_private to stash its private data for the blkg, however other blkcg policies may want to use this as well. Since we can get the private data out of the blkg, move this to bi_blkg in the bio and make it generic, then we can use bio_associate_blkg() to attach the blkg to the bio. Theoretically we could simply replace the bi_css with this since we can get to all the same information from the blkg, however you have to lookup the blkg, so for example wbc_init_bio() would have to lookup and possibly allocate the blkg for the css it was trying to attach to the bio. This could be problematic and result in us either not attaching the css at all to the bio, or falling back to the root blkcg if we are unable to allocate the corresponding blkg. So for now do this, and in the future if possible we could just replace the bi_css with bi_blkg and update the helpers to do the correct translation. Signed-off-by: NJosef Bacik <jbacik@fb.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
It won't be efficient to dequeue request one by one from sw queue, but we have to do that when queue is busy for better merge performance. This patch takes the Exponential Weighted Moving Average(EWMA) to figure out if queue is busy, then only dequeue request one by one from sw queue when queue is busy. Fixes: b347689f ("blk-mq-sched: improve dispatching from sw queue") Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Reported-by: NKashyap Desai <kashyap.desai@broadcom.com> Tested-by: NKashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
Only attempt to merge bio iff the ctx->rq_list isn't empty, because: 1) for high-performance SSD, most of times dispatch may succeed, then there may be nothing left in ctx->rq_list, so don't try to merge over sw queue if it is empty, then we can save one acquiring of ctx->lock 2) we can't expect good merge performance on per-cpu sw queue, and missing one merge on sw queue won't be a big deal since tasks can be scheduled from one CPU to another. Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: NKashyap Desai <kashyap.desai@broadcom.com> Reported-by: NKashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
list_splice_tail_init() is much more faster than inserting each request one by one, given all requets in 'list' belong to same sw queue and ctx->lock is required to insert requests. Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: NKashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Minwoo Im 提交于
Fix typo in a function blk_mq_alloc_tag_set() comment. if if it too large -> if it's too large. Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Minwoo Im 提交于
set->mq_map is now currently cleared if something goes wrong when establishing a queue map in blk-mq-pci.c. It's also cleared before updating a queue map in blk_mq_update_queue_map(). This patch provides an API to clear set->mq_map to make it clear. Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Colin Ian King 提交于
Pointer dgrp is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'dgrp' set but not used [-Wunused-but-set-variable] Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Liu Bo 提交于
Once one cgroup has io.low configured, @low_valid becomes true and other cgroups won't switch it back whatsoever. Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
The payload of struct request is stored in the request.bio chain if the RQF_SPECIAL_PAYLOAD flag is not set and in request.special_vec if RQF_SPECIAL_PAYLOAD has been set. However, blk_update_request() iterates over req->bio whether or not RQF_SPECIAL_PAYLOAD has been set. Additionally, the RQF_SPECIAL_PAYLOAD flag is ignored by blk_rq_bytes() which means that the value returned by that function is incorrect if the RQF_SPECIAL_PAYLOAD flag has been set. It is not clear to me whether this is an oversight or whether this happened on purpose. Anyway, document that it is known that both functions ignore RQF_SPECIAL_PAYLOAD. See also commit f9d03f96 ("block: improve handling of the magic discard payload"). Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
SCSI probing may synchronously create and destroy a lot of request_queues for non-existent devices. Any synchronize_rcu() in queue creation or destroy path may introduce long latency during booting, see detailed description in comment of blk_register_queue(). This patch removes one synchronize_rcu() inside blk_cleanup_queue() for this case, commit c2856ae2(blk-mq: quiesce queue before freeing queue) needs synchronize_rcu() for implementing blk_mq_quiesce_queue(), but when queue isn't initialized, it isn't necessary to do that since only pass-through requests are involved, no original issue in scsi_execute() at all. Without this patch and previous one, it may take more 20+ seconds for virtio-scsi to complete disk probe. With the two patches, the time becomes less than 100ms. Fixes: c2856ae2 ("blk-mq: quiesce queue before freeing queue") Reported-by: NAndrew Jones <drjones@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Tested-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
We have to remove synchronize_rcu() from blk_queue_cleanup(), otherwise long delay can be caused during lun probe. For removing it, we have to avoid to iterate the set->tag_list in IO path, eg, blk_mq_sched_restart(). This patch reverts 5b79413946d (Revert "blk-mq: don't handle TAG_SHARED in restart"). Given we have fixed enough IO hang issue, and there isn't any reason to restart all queues in one tags any more, see the following reasons: 1) blk-mq core can deal with shared-tags case well via blk_mq_get_driver_tag(), which can wake up queues waiting for driver tag. 2) SCSI is a bit special because it may return BLK_STS_RESOURCE if queue, target or host is ready, but SCSI built-in restart can cover all these well, see scsi_end_request(), queue will be rerun after any request initiated from this host/target is completed. In my test on scsi_debug(8 luns), this patch may improve IOPS by 20% ~ 30% when running I/O on these 8 luns concurrently. Fixes: 705cda97 ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list") Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Reported-by: NAndrew Jones <drjones@redhat.com> Tested-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
Now hctx->lock is only acquired when adding hctx->dispatch_wait to one wait queue, but not held when removing it from the wait queue. IO hang can be observed easily if SCHED RESTART is disabled, that means now RESTART exits just for fixing the issue in blk_mq_mark_tag_wait(). This patch fixes the issue by introducing hctx->dispatch_wait_lock and holding it for removing hctx->dispatch_wait in blk_mq_dispatch_wake(), since we need to avoid acquiring hctx->lock in irq context. Fixes: eb619fdb ("blk-mq: fix issue with shared tag queue re-running") Cc: Christoph Hellwig <hch@lst.de> Cc: Omar Sandoval <osandov@fb.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
'hctx' won't be changed at all, so not necessary to pass '**hctx' to blk_mq_mark_tag_wait(). Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: NAndrew Jones <drjones@redhat.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
We never pass 'wait' as true to blk_mq_get_driver_tag(), and hence we never change '**hctx' as well. The last use of these went away with the flush cleanup, commit 0c2a6fe4. So cleanup the usage and remove the two extra parameters. Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Tested-by: NAndrew Jones <drjones@redhat.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
The actual goal of the function bfq_bfqq_may_idle is to tell whether it is better to perform device idling (more precisely: I/O-dispatch plugging) for the input bfq_queue, either to boost throughput or to preserve service guarantees. This commit improves the name of the function accordingly. Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com> Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
If - a bfq_queue Q preempts another queue, because one request of Q arrives in time, - but, after this preemption, Q is not the queue that is set in service, then Q->entity.service is set to 0 when Q is eventually set in service. But Q should have continued receiving service with its old budget (which is why preemption has occurred) and its old service. This commit addresses this issue by resetting service on queue real expiration. Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com> Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
For some bfq_queues, BFQ plugs I/O dispatching when the queue becomes idle, and keeps the plug until a new request of the queue arrives, or a timeout fires. BFQ does so either to boost throughput or to preserve service guarantees for the queue. More precisely, for such a queue, plugging starts when the queue happens to have either no request enqueued, or no request in flight, that is, no request already dispatched but not yet completed. On the opposite end, BFQ may happen to expire a queue with no request enqueued, without doing any plugging, if the queue still has some request in flight. Unfortunately, such a premature expiration causes the queue to lose its chance to enjoy dispatch plugging a moment later, i.e., when its in-flight requests finally get completed. This breaks service guarantees for the queue. This commit prevents BFQ from expiring an empty queue if the latter still has in-flight requests. Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com> Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
To keep I/O throughput high as often as possible, BFQ performs I/O-dispatch plugging (aka device idling) only when beneficial exactly for throughput, or when needed for service guarantees (low latency, fairness). An important case where the latter condition holds is when the scenario is 'asymmetric' in terms of weights: i.e., when some bfq_queue or whole group of queues has a higher weight, and thus has to receive more service, than other queues or groups. Without dispatch plugging, lower-weight queues/groups may unjustly steal bandwidth to higher-weight queues/groups. To detect asymmetric scenarios, BFQ checks some sufficient conditions. One of these conditions is that active groups have different weights. BFQ controls this condition by maintaining a special set of unique weights of active groups (group_weights_tree). To this purpose, in the function bfq_active_insert/bfq_active_extract BFQ adds/removes the weight of a group to/from this set. Unfortunately, the function bfq_active_extract may happen to be invoked also for a group that is still active (to preserve the correct update of the next queue to serve, see comments in function bfq_no_longer_next_in_service() for details). In this case, removing the weight of the group makes the set group_weights_tree inconsistent. Service-guarantee violations follow. This commit addresses this issue by moving group_weights_tree insertions from their previous location (in bfq_active_insert) into the function __bfq_activate_entity, and by moving group_weights_tree extractions from bfq_active_extract to when the entity that represents a group remains throughly idle, i.e., with no request either enqueued or dispatched. Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com> Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
Exclude zoned block device members from struct request_queue for CONFIG_BLK_DEV_ZONED == n. Avoid breaking the build by only building the code that uses these struct request_queue members if CONFIG_BLK_DEV_ZONED != n. Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
Since the implementation of blk_queue_nr_zones() is trivial and since it only has a single caller, inline this function. Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
No cast is necessary when assigning a non-void pointer to a void pointer. Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 6月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
Some devices have different queue limits depending on the type of IO. A classic case is SATA NCQ, where some commands can queue, but others cannot. If we have NCQ commands inflight and encounter a non-queueable command, the driver returns busy. Currently we attempt to dispatch more from the scheduler, if we were able to queue some commands. But for the case where we ended up stopping due to BUSY, we should not attempt to retrieve more from the scheduler. If we do, we can get into a situation where we attempt to queue a non-queueable command, get BUSY, then successfully retrieve more commands from that scheduler and queue those. This can repeat forever, starving the non-queuable command indefinitely. Fix this by NOT attempting to pull more commands from the scheduler, if we get a BUSY return. This should also be more optimal in terms of letting requests stay in the scheduler for as long as possible, if we get a BUSY due to the regular out-of-tags condition. Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 6月, 2018 1 次提交
-
-
由 Bart Van Assche 提交于
This patch avoids that removing a path controlled by the dm-mpath driver while mkfs is running triggers the following kernel bug: kernel BUG at block/blk-core.c:3347! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 20 PID: 24369 Comm: mkfs.ext4 Not tainted 4.18.0-rc1-dbg+ #2 RIP: 0010:blk_end_request_all+0x68/0x70 Call Trace: <IRQ> dm_softirq_done+0x326/0x3d0 [dm_mod] blk_done_softirq+0x19b/0x1e0 __do_softirq+0x128/0x60d irq_exit+0x100/0x110 smp_call_function_single_interrupt+0x90/0x330 call_function_single_interrupt+0xf/0x20 </IRQ> Fixes: f9d03f96 ("block: improve handling of the magic discard payload") Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Acked-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 6月, 2018 1 次提交
-
-
由 Bart Van Assche 提交于
Make sure that RQF_TIMED_OUT is cleared when a request is reused after a block driver timeout handler has returned BLK_EH_DONE. Fixes: da661267 ("blk-mq: don't time out requests again that are in the timeout handler") Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Andrew Randrianasulu <randrianasulu@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 6月, 2018 2 次提交
-
-
由 Dan Carpenter 提交于
resp->num is the number of tokens in resp->tok[]. It gets set in response_parse(). So if n == resp->num then we're reading beyond the end of the data. Fixes: 455a7b23 ("block: Add Sed-opal library") Reviewed-by: NScott Bauer <scott.bauer@intel.com> Tested-by: NScott Bauer <scott.bauer@intel.com> Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Dan Carpenter 提交于
If rq_state == ARRAY_SIZE() then we read one element beyond the end of the blk_mq_rq_state_name_array[] array. Fixes: ec6dcf63 ("blk-mq-debugfs: Show more request state information") Reviewed-by: NBart Van Assche <bart.vanassche@wdc.com> Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 6月, 2018 2 次提交
-
-
由 Bart Van Assche 提交于
Commit 0ba99ca4 ("block: Add warning for bi_next not NULL in bio_endio()") breaks the dm driver. end_clone_bio() detects whether or not a bio is the last bio associated with a request by checking the .bi_next field. Commit 0ba99ca4 clears that field before end_clone_bio() has had a chance to inspect that field. Hence revert commit 0ba99ca4. This patch avoids that KASAN reports the following complaint when running the srp-test software (srp-test/run_tests -c -d -r 10 -t 02-mq): ================================================================== BUG: KASAN: use-after-free in bio_advance+0x11b/0x1d0 Read of size 4 at addr ffff8801300e06d0 by task ksoftirqd/0/9 CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.18.0-rc1-dbg+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_address_description+0x6f/0x270 kasan_report+0x241/0x360 __asan_load4+0x78/0x80 bio_advance+0x11b/0x1d0 blk_update_request+0xa7/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d run_ksoftirqd+0x3f/0x60 smpboot_thread_fn+0x352/0x460 kthread+0x1c1/0x1e0 ret_from_fork+0x24/0x30 Allocated by task 1918: save_stack+0x43/0xd0 kasan_kmalloc+0xad/0xe0 kasan_slab_alloc+0x11/0x20 kmem_cache_alloc+0xfe/0x350 mempool_alloc_slab+0x15/0x20 mempool_alloc+0xfb/0x270 bio_alloc_bioset+0x244/0x350 submit_bh_wbc+0x9c/0x2f0 __block_write_full_page+0x299/0x5a0 block_write_full_page+0x16b/0x180 blkdev_writepage+0x18/0x20 __writepage+0x42/0x80 write_cache_pages+0x376/0x8a0 generic_writepages+0xbe/0x110 blkdev_writepages+0xe/0x10 do_writepages+0x9b/0x180 __filemap_fdatawrite_range+0x178/0x1c0 file_write_and_wait_range+0x59/0xc0 blkdev_fsync+0x46/0x80 vfs_fsync_range+0x66/0x100 do_fsync+0x3d/0x70 __x64_sys_fsync+0x21/0x30 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9: save_stack+0x43/0xd0 __kasan_slab_free+0x137/0x190 kasan_slab_free+0xe/0x10 kmem_cache_free+0xd3/0x380 mempool_free_slab+0x17/0x20 mempool_free+0x63/0x160 bio_free+0x81/0xa0 bio_put+0x59/0x60 end_bio_bh_io_sync+0x5d/0x70 bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 end_clone_bio+0xa3/0xd0 [dm_mod] bio_endio+0x1a7/0x360 blk_update_request+0xd0/0x5b0 scsi_end_request+0x56/0x320 [scsi_mod] scsi_io_completion+0x7d6/0xb20 [scsi_mod] scsi_finish_command+0x1c0/0x280 [scsi_mod] scsi_softirq_done+0x19a/0x230 [scsi_mod] blk_mq_complete_request+0x160/0x240 scsi_mq_done+0x50/0x1a0 [scsi_mod] srp_recv_done+0x515/0x1330 [ib_srp] __ib_process_cq+0xa0/0xf0 [ib_core] ib_poll_handler+0x38/0xa0 [ib_core] irq_poll_softirq+0xe8/0x1f0 __do_softirq+0x128/0x60d The buggy address belongs to the object at ffff8801300e0640 which belongs to the cache bio-0 of size 200 The buggy address is located 144 bytes inside of 200-byte region [ffff8801300e0640, ffff8801300e0708) The buggy address belongs to the page: page:ffffea0004c03800 count:1 mapcount:0 mapping:ffff88015a563a00 index:0x0 compound_mapcount: 0 flags: 0x8000000000008100(slab|head) raw: 8000000000008100 dead000000000100 dead000000000200 ffff88015a563a00 raw: 0000000000000000 0000000000330033 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801300e0580: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc ffff8801300e0600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb >ffff8801300e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8801300e0700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8801300e0780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Cc: Kent Overstreet <kent.overstreet@gmail.com> Fixes: 0ba99ca4 ("block: Add warning for bi_next not NULL in bio_endio()") Acked-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
blk_mq_complete_request can only be called for blk-mq drivers, but when removing the BLK_EH_HANDLED return value, two legacy request timeout methods incorrectly got switched to call blk_mq_complete_request. Call __blk_complete_request instead to reinstance the previous behavior. For that __blk_complete_request needs to be exported. Fixes: 1fc2b62e ("scsi_transport_fc: complete requests from ->timeout") Fixes: 0df0bb08 ("null_blk: complete requests from ->timeout") Reported-by: NJianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 16 6月, 2018 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
As we move stuff around, some doc references are broken. Fix some of them via this script: ./scripts/documentation-file-ref-check --fix Manually checked if the produced result is valid, removing a few false-positives. Acked-by: NTakashi Iwai <tiwai@suse.de> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Acked-by: NStephen Boyd <sboyd@kernel.org> Acked-by: NCharles Keepax <ckeepax@opensource.wolfsonmicro.com> Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: NColy Li <colyli@suse.de> Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: NJonathan Corbet <corbet@lwn.net>
-
- 15 6月, 2018 2 次提交
-
-
由 Anatoliy Glagolev 提交于
The existing implementation allows races between bsg_unregister and bsg_open paths. bsg_unregister and request_queue cleanup and deletion may start and complete right after bsg_get_device (in bsg_open path) retrieves bsg_class_device and releases the mutex. Then bsg_open path touches freed memory of bsg_class_device and request_queue. One possible fix is to hold the mutex all the way through bsg_get_device instead of releasing it after bsg_class_device retrieval. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-Off-By: NAnatoliy Glagolev <glagolig@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
This function is entirely unused, so remove it and the tag_queue_busy member of struct request_queue. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 6月, 2018 2 次提交
-
-
由 Christoph Hellwig 提交于
Unused now that nvme stopped using it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
We can currently call the timeout handler again on a request that has already been handed over to the timeout handler. Prevent that with a new flag. Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce") Reported-by: NAndrew Randrianasulu <randrianasulu@gmail.com> Tested-by: NAndrew Randrianasulu <randrianasulu@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 13 6月, 2018 2 次提交
-
-
由 Kees Cook 提交于
The vzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vzalloc(a * b) with: vzalloc(array_size(a, b)) as well as handling cases of: vzalloc(a * b * c) with: vzalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vzalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(char) * COUNT + COUNT , ...) | vzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vzalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vzalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vzalloc(C1 * C2 * C3, ...) | vzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vzalloc(C1 * C2, ...) | vzalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-
由 Kees Cook 提交于
The kvmalloc() function has a 2-factor argument form, kvmalloc_array(). This patch replaces cases of: kvmalloc(a * b, gfp) with: kvmalloc_array(a * b, gfp) as well as handling cases of: kvmalloc(a * b * c, gfp) with: kvmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kvmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kvmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kvmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kvmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kvmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kvmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kvmalloc( - sizeof(char) * COUNT + COUNT , ...) | kvmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kvmalloc + kvmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kvmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kvmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kvmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kvmalloc(C1 * C2 * C3, ...) | kvmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kvmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kvmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kvmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kvmalloc(sizeof(THING) * C2, ...) | kvmalloc(sizeof(TYPE) * C2, ...) | kvmalloc(C1 * C2 * C3, ...) | kvmalloc(C1 * C2, ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kvmalloc + kvmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kvmalloc + kvmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-