- 08 12月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
commit ffe81d45 upstream. If we attempt a direct issue to a SCSI device, and it returns BUSY, then we queue the request up normally. However, the SCSI layer may have already setup SG tables etc for this particular command. If we later merge with this request, then the old tables are no longer valid. Once we issue the IO, we only read/write the original part of the request, not the new state of it. This causes data corruption, and is most often noticed with the file system complaining about the just read data being invalid: [ 235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256) because most of it is garbage... This doesn't happen from the normal issue path, as we will simply defer the request to the hardware queue dispatch list if we fail. Once it's on the dispatch list, we never merge with it. Fix this from the direct issue path by flagging the request as REQ_NOMERGE so we don't change the size of it before issue. See also: https://bugzilla.kernel.org/show_bug.cgi?id=201685Tested-by: NGuenter Roeck <linux@roeck-us.net> Fixes: 6ce3dd6e ("blk-mq: issue directly if hw queue isn't busy in case of 'none'") Cc: stable@vger.kernel.org Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 01 12月, 2018 1 次提交
-
-
由 Hannes Reinecke 提交于
[ Upstream commit ca474b73 ] We need to copy the io priority, too; otherwise the clone will run with a different priority than the original one. Fixes: 43b62ce3 ("block: move bio io prio to a new field") Signed-off-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJean Delvare <jdelvare@suse.de> Fixed up subject, and ordered stores. Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 27 11月, 2018 1 次提交
-
-
由 Keith Busch 提交于
[ Upstream commit f3587d76da05f68098ddb1cb3c98cc6a9e8a402c ] If the kernel allocates a bounce buffer for user read data, this memory needs to be cleared before copying it to the user, otherwise it may leak kernel memory to user space. Laurence Oberman <loberman@redhat.com> Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 21 11月, 2018 1 次提交
-
-
由 Ming Lei 提交于
commit 8dc765d438f1e42b3e8227b3b09fad7d73f4ec9a upstream. c2856ae2 ("blk-mq: quiesce queue before freeing queue") has already fixed this race, however the implied synchronize_rcu() in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused performance regression. Then 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") tried to quiesce queue for avoiding unnecessary synchronize_rcu() only when queue initialization is done, because it is usual to see lots of inexistent LUNs which need to be probed. However, turns out it isn't safe to quiesce queue only when queue initialization is done. Because when one SCSI command is completed, the user of sending command can be waken up immediately, then the scsi device may be removed, meantime the run queue in scsi_end_request() is still in-progress, so kernel panic can be caused. In Red Hat QE lab, there are several reports about this kind of kernel panic triggered during kernel booting. This patch tries to address the issue by grabing one queue usage counter during freeing one request and the following run queue. Fixes: 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()") Cc: Andrew Jones <drjones@redhat.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Cc: stable <stable@vger.kernel.org> Cc: jianchao.wang <jianchao.w.wang@oracle.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 14 11月, 2018 4 次提交
-
-
由 Paolo Valente 提交于
[ Upstream commit cbeb869a ] BFQ schedules entities (which represent either per-process queues or groups of queues) as a function of their timestamps. In particular, as a function of their (virtual) finish times. The finish time of an entity is computed as a function of the budget assigned to the entity, assuming, tentatively, that the entity, once in service, will receive an amount of service equal to its budget. Then, when the entity is expired because it finishes to be served, this finish time is updated as a function of the actual service received by the entity. This allows the entity to be correctly charged with only the service received, and then to be correctly re-scheduled. Yet an entity may receive service also while not being the entity in service (in the scheduling environment of its parent entity), for several reasons. If the entity remains with no backlog while receiving this 'unofficial' service, then it is expired. Also on such an expiration, the finish time of the entity should be updated to account for only the service actually received by the entity. Unfortunately, such an update is not performed for an entity expiring without being the entity in service. In a similar vein, the service counter of the entity in service is reset when the entity is expired, to be ready to be used for next service cycle. This reset too should be performed also in case an entity is expired because it remains empty after receiving service while not being the entity in service. But in this case the reset is not performed. This commit performs the above update of the finish time and reset of the service received, also for an entity expiring while not being the entity in service. Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Ming Lei 提交于
commit 34ffec60 upstream. Obviously the created writesame bio has to be aligned with logical block size, and use bio_allowed_max_sectors() to retrieve this number. Cc: stable@vger.kernel.org Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Xiao Ni <xni@redhat.com> Cc: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Fixes: b49a0871 ("block: remove split code in blkdev_issue_{discard,write_same}") Tested-by: NRui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Ming Lei 提交于
commit 1adfc5e4 upstream. Obviously the created discard bio has to be aligned with logical block size. This patch introduces the helper of bio_allowed_max_sectors() for this purpose. Cc: stable@vger.kernel.org Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Xiao Ni <xni@redhat.com> Cc: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Fixes: 744889b7 ("block: don't deal with discard limit in blkdev_issue_discard()") Fixes: a22c4d7e ("block: re-add discard_granularity and alignment checks") Reported-by: NRui Salvaterra <rsalvaterra@gmail.com> Tested-by: NRui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Jens Axboe 提交于
commit 52990a5f upstream. We're only setting up the bounce bio sets if we happen to need bouncing for regular HIGHMEM, not if we only need it for ISA devices. Protect the ISA bounce setup with a mutex, since it's being invoked from driver init functions and can thus be called in parallel. Cc: stable@vger.kernel.org Reported-by: NOndrej Zary <linux@rainbow-software.org> Tested-by: NOndrej Zary <linux@rainbow-software.org> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 18 10月, 2018 1 次提交
-
-
由 Ming Lei 提交于
blk_queue_split() does respect this limit via bio splitting, so no need to do that in blkdev_issue_discard(), then we can align to normal bio submit(bio_add_page() & submit_bio()). More importantly, this patch fixes one issue introduced in a22c4d7e ("block: re-add discard_granularity and alignment checks"), in which zero discard bio may be generated in case of zero alignment. Fixes: a22c4d7e ("block: re-add discard_granularity and alignment checks") Cc: stable@vger.kernel.org Cc: Ming Lin <ming.l@ssi.samsung.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Xiao Ni <xni@redhat.com> Tested-by: NMariusz Dabrowski <mariusz.dabrowski@intel.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 10月, 2018 1 次提交
-
-
由 Josef Bacik 提交于
Tetsuo brought to my attention that I screwed up the scale_up/scale_down helpers when I factored out the rq-qos code. We need to wake up all the waiters when we add slots for requests to make, not when we shrink the slots. Otherwise we'll end up things waiting forever. This was a mistake and simply puts everything back the way it was. cc: stable@vger.kernel.org Fixes: a7905043 ("blk-rq-qos: refactor out common elements of blk-wbt") eported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 9月, 2018 1 次提交
-
-
由 Ilya Dryomov 提交于
trace_block_unplug() takes true for explicit unplugs and false for implicit unplugs. schedule() unplugs are implicit and should be reported as timer unplugs. While correct in the legacy code, this has been inverted in blk-mq since 4.11. Cc: stable@vger.kernel.org Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers") Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 9月, 2018 1 次提交
-
-
由 Damien Le Moal 提交于
When the deadline scheduler is used with a zoned block device, writes to a zone will be dispatched one at a time. This causes the warning message: deadline: forced dispatching is broken (nr_sorted=X), please report this to be displayed when switching to another elevator with the legacy I/O path while write requests to a zone are being retained in the scheduler queue. Prevent this message from being displayed when executing elv_drain_elevator() for a zoned block device. __blk_drain_queue() will loop until all writes are dispatched and completed, resulting in the desired elevator queue drain without extensive modifications to the deadline code itself to handle forced-dispatch calls. Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Fixes: 8dc8146f ("deadline-iosched: Introduce zone locking support") Cc: stable@vger.kernel.org Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 9月, 2018 1 次提交
-
-
由 Keith Busch 提交于
A recent commit runs tag iterator callbacks under the rcu read lock, but existing callbacks do not satisfy the non-blocking requirement. The commit intended to prevent an iterator from accessing a queue that's being modified. This patch fixes the original issue by taking a queue reference instead of reading it, which allows callbacks to make blocking calls. Fixes: f5bbbbe4 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter") Acked-by: NJianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 9月, 2018 1 次提交
-
-
由 Omar Sandoval 提交于
Klaus Kusche reported that the I/O busy time in /proc/diskstats was not updating properly on 4.18. This is because we started using ktime to track elapsed time, and we convert nanoseconds to jiffies when we update the partition counter. However, this gets rounded down, so any I/Os that take less than a jiffy are not accounted for. Previously in this case, the value of jiffies would sometimes increment while we were doing I/O, so at least some I/Os were accounted for. Let's convert the stats to use nanoseconds internally. We still report milliseconds as before, now more accurately than ever. The value is still truncated to 32 bits for backwards compatibility. Fixes: 522a7775 ("block: consolidate struct request timestamp fields") Cc: stable@vger.kernel.org Reported-by: NKlaus Kusche <klaus.kusche@computerix.info> Signed-off-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 9月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
After merging the iolatency policy, we potentially now have 4 policies being registered, but only support 3. This causes one of them to fail loading. Takashi reports that BFQ no longer works for him, because it fails to load due to policy registration failure. Bump to 5 policies, and also add a warning for when we have exceeded the global amount. If we have to touch this again, we should switch to a dynamic scheme instead. Reported-by: NTakashi Iwai <tiwai@suse.de> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Tested-by: NTakashi Iwai <tiwai@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 07 9月, 2018 1 次提交
-
-
由 Konstantin Khlebnikov 提交于
Fix trivial use-after-free. This could be last reference to bfqg. Fixes: 8f9bebc3 ("block, bfq: access and cache blkg data only when safe") Acked-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 06 9月, 2018 1 次提交
-
-
由 Mikulas Patocka 提交于
It is possible to call fsync on a read-only handle (for example, fsck.ext2 does it when doing read-only check), and this call results in kernel warning. The patch b089cfd9 ("block: don't warn for flush on read-only device") attempted to disable the warning, but it is buggy and it doesn't (op_is_flush tests flags, but bio_op strips off the flags). Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Fixes: 721c7fc7 ("block: fail op_is_write() requests to read-only partitions") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 9月, 2018 3 次提交
-
-
由 Dennis Zhou (Facebook) 提交于
There is a very small change a bio gets caught up in a really unfortunate race between a task migration, cgroup exiting, and itself trying to associate with a blkg. This is due to css offlining being performed after the css->refcnt is killed which triggers removal of blkgs that reach their blkg->refcnt of 0. To avoid this, association with a blkg should use tryget and fallback to using the root_blkg. Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups") Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Dennis Zhou (Facebook) 提交于
Currently, blkcg destruction relies on a sequence of events: 1. Destruction starts. blkcg_css_offline() is called and blkgs release their reference to the blkcg. This immediately destroys the cgwbs (writeback). 2. With blkgs giving up their reference, the blkcg ref count should become zero and eventually call blkcg_css_free() which finally frees the blkcg. Jiufei Xue reported that there is a race between blkcg_bio_issue_check() and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent on the completion of all writeback associated with the blkcg. A count of the number of cgwbs is maintained and once that goes to zero, blkg destruction can follow. This should prevent premature blkg destruction related to writeback. The new process for blkcg cleanup is as follows: 1. Destruction starts. blkcg_css_offline() is called which offlines writeback. Blkg destruction is delayed on the cgwb_refcnt count to avoid punting potentially large amounts of outstanding writeback to root while maintaining any ongoing policies. Here, the base cgwb_refcnt is put back. 2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called and handles destruction of blkgs. This is where the css reference held by each blkg is released. 3. Once the blkcg ref count goes to zero, blkcg_css_free() is called. This finally frees the blkg. It seems in the past blk-throttle didn't do the most understandable things with taking data from a blkg while associating with current. So, the simplification and unification of what blk-throttle is doing caused this. Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups") Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Dennis Zhou (Facebook) 提交于
This reverts commit 4c699480. Destroying blkgs is tricky because of the nature of the relationship. A blkg should go away when either a blkcg or a request_queue goes away. However, blkg's pin the blkcg to ensure they remain valid. To break this cycle, when a blkcg is offlined, blkgs put back their css ref. This eventually lets css_free() get called which frees the blkcg. The above commit (4c699480) breaks this order of events by trying to destroy blkgs in css_free(). As the blkgs still hold references to the blkcg, css_free() is never called. The race between blkcg_bio_issue_check() and cgroup_rmdir() will be addressed in the following patch by delaying destruction of a blkg until all writeback associated with the blkcg has been finished. Fixes: 4c699480 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()") Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NDennis Zhou <dennisszhou@gmail.com> Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 8月, 2018 5 次提交
-
-
由 John Pittman 提交于
Currently, variable ref_count within the bsg_device struct is of type atomic_t. For variables being used as reference counters, the refcount API should be used instead of atomic. The newer refcount API works to prevent counter overflows and use-after-free bugs. So, move this varable from the atomic API to refcount, potentially avoiding the issues mentioned. Signed-off-by: NJohn Pittman <jpittman@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Chengguang Xu 提交于
kmem_cache_destroy() can handle NULL pointer correctly, so there is no need to check e->icq_cache before calling kmem_cache_destroy(). Signed-off-by: NChengguang Xu <cgxu519@gmx.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We already note and mark discard and swap IO from bio_to_wbt_flags(). Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We have two potential issues: 1) After commit 2887e41b, we only wake one process at the time when we finish an IO. We really want to wake up as many tasks as can queue IO. Before this commit, we woke up everyone, which could cause a thundering herd issue. 2) A task can potentially consume two wakeups, causing us to (in practice) miss a wakeup. Fix both by providing our own wakeup function, which stops __wake_up_common() from waking up more tasks if we fail to get a queueing token. With the strict ordering we have on the wait list, this wakes the right tasks and the right amount of tasks. Based on a patch from Jianchao Wang <jianchao.w.wang@oracle.com>. Tested-by: NAgarwal, Anchal <anchalag@amazon.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Prep patch for calling the handler from a different context, no functional changes in this patch. Tested-by: NAgarwal, Anchal <anchalag@amazon.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 23 8月, 2018 4 次提交
-
-
由 Jens Axboe 提交于
A previous commit removed the ability to have per-rq flags. We used those flags to maintain inflight counts. Since we don't have those anymore, we have to always maintain inflight counts, even if wbt is disabled. This is clearly suboptimal. Add a queue quiesce around changing the wbt latency settings from sysfs to work around this. With that, we can reliably put the enabled check in our bio_to_wbt_flags(), since we know the WBT_TRACKED flag will be consistent for the lifetime of the request. Fixes: c1c80384 ("block: remove external dependency on wbt_flags") Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We need to do this inside the loop as well, or we can allow new IO to supersede previous IO. Tested-by: NAnchal Agarwal <anchalag@amazon.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
We need the memory barrier before checking the list head, use the appropriate helper for this. The matching queue side memory barrier is provided by set_current_state(). Tested-by: NAnchal Agarwal <anchalag@amazon.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Check it in one place, instead of in multiple places. Tested-by: NAnchal Agarwal <anchalag@amazon.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 8月, 2018 2 次提交
-
-
由 Jianchao Wang 提交于
For blk-mq, part_in_flight/rw will invoke blk_mq_in_flight/rw to account the inflight requests. It will access the queue_hw_ctx and nr_hw_queues w/o any protection. When updating nr_hw_queues and blk_mq_in_flight/rw occur concurrently, panic comes up. Before update nr_hw_queues, the q will be frozen. So we could use q_usage_counter to avoid the race. percpu_ref_is_zero is used here so that we will not miss any in-flight request. The access to nr_hw_queues and queue_hw_ctx in blk_mq_queue_tag_busy_iter are under rcu critical section, __blk_mq_update_nr_hw_queues could use synchronize_rcu to ensure the zeroed q_usage_counter to be globally visible. Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jianchao Wang 提交于
Currently, when update nr_hw_queues, IO scheduler's init_hctx will be invoked before the mapping between ctx and hctx is adapted correctly by blk_mq_map_swqueue. The IO scheduler init_hctx (kyber) may depend on this mapping and get wrong result and panic finally. A simply way to fix this is that switch the IO scheduler to 'none' before update the nr_hw_queues, and then switch it back after update nr_hw_queues. blk_mq_sched_init_/exit_hctx are removed due to nobody use them any more. Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 8月, 2018 1 次提交
-
-
由 Chaitanya Kulkarni 提交于
This patch removes the duplicate initialization of q->queue_head in the blk_alloc_queue_node(). This removes the 2nd initialization so that we preserve the initialization order same as declaration present in struct request_queue. Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 8月, 2018 6 次提交
-
-
由 Chengguang Xu 提交于
Because blk_do_io_stat() only does a judgement about the request contributes to IO statistics, it better changes return type to bool. Signed-off-by: NChengguang Xu <cgxu519@gmx.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Maciej S. Szmigiero 提交于
The value that struct cftype .write() method returns is then directly returned to userspace as the value returned by write() syscall, so it should be the number of bytes actually written (or consumed) and not zero. Returning zero from write() syscall makes programs like /bin/echo or bash spin. Signed-off-by: NMaciej S. Szmigiero <mail@maciej.szmigiero.name> Fixes: e21b7a0b ("block, bfq: add full hierarchical scheduling and cgroups support") Cc: stable@vger.kernel.org Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
bfq_bfqq_charge_time contains some lengthy and redundant code. This commit trims and condenses that code. Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
When a sync request is dispatched, the queue that contains that request, and all the ancestor entities of that queue, are charged with the number of sectors of the request. In constrast, if the request is async, then the queue and its ancestor entities are charged with the number of sectors of the request, multiplied by an overcharge factor. This throttles the bandwidth for async I/O, w.r.t. to sync I/O, and it is done to counter the tendency of async writes to steal I/O throughput to reads. On the opposite end, the lower this parameter, the stabler I/O control, in the following respect. The lower this parameter is, the less the bandwidth enjoyed by a group decreases - when the group does writes, w.r.t. to when it does reads; - when other groups do reads, w.r.t. to when they do writes. The fixes "block, bfq: always update the budget of an entity when needed" and "block, bfq: readd missing reset of parent-entity service" improved I/O control in bfq to such an extent that it has been possible to revise this overcharge factor downwards. This commit introduces the resulting, new value. Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
When the next child entity to serve changes for a given parent entity, the budget of that parent entity must be updated accordingly. Unfortunately, this update is not performed, by mistake, for the entities that happen to switch from having no child entity to serve, to having one child entity to serve. Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
The received-service counter needs to be equal to 0 when an entity is set in service. Unfortunately, commit "block, bfq: fix service being wrongly set to zero in case of preemption" mistakenly removed the resetting of this counter for the parent entities of the bfq_queue being set in service. This commit fixes this issue by resetting service for parent entities, directly on the expiration of the in-service bfq_queue. Fixes: 9fae8dd5 ("block, bfq: fix service being wrongly set to zero in case of preemption") Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 8月, 2018 2 次提交
-
-
由 Ming Lei 提交于
On wbt invariant is that if one IO is tracked via WBT_TRACKED, rqw->inflight should be updated for tracking this IO. But commit c1c80384 ("block: remove external dependency on wbt_flags") forgets to remove the early handling of !rwb_enabled(rwb) inside wbt_wait(), then the inflight counter may not be increased in wbt_wait(), but decreased in wbt_done() for this kind of IO, so this counter may become negative, then wbt_wait() may wait forever. This patch fixes the report in the following link: https://marc.info/?l=linux-block&m=153221542021033&w=2 Fixes: c1c80384 ("block: remove external dependency on wbt_flags") Cc: Josef Bacik <jbacik@fb.com> Reported-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Don't warn for a flush issued to a read-only device. It's not strictly a writable command, as it doesn't change any on-media data by itself. Reported-by: NStefan Agner <stefan@agner.ch> Fixes: 721c7fc7 ("block: fail op_is_write() requests to read-only partitions") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-