- 22 10月, 2015 7 次提交
-
-
由 Ming Lei 提交于
Most of times, flush plug should be the hottest I/O path, so mark ctx as pending after all requests in the list are inserted. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NMing Lei <ming.lei@canonical.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Ming Lei 提交于
The trace point is for tracing plug event of each request queue instead of each task, so we should check the request count in the plug list from current queue instead of current task. Signed-off-by: NMing Lei <ming.lei@canonical.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Ming Lei 提交于
After bio splitting is introduced, one bio can be splitted and it is marked as NOMERGE because it is too fat to be merged, so check bio_mergeable() earlier to avoid to try to merge it unnecessarily. Signed-off-by: NMing Lei <ming.lei@canonical.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Ming Lei 提交于
It isn't necessary to try to merge the bio which is marked as NOMERGE. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NMing Lei <ming.lei@canonical.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Ming Lei 提交于
The splitted bio has been already too fat to merge, so mark it as NOMERGE. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NMing Lei <ming.lei@canonical.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Ming Lei 提交于
The number of bio->bi_phys_segments is always obtained during bio splitting, so it is natural to setup it just after bio splitting, then we can avoid to compute nr_segment again during merge. Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NMing Lei <ming.lei@canonical.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jeff Moyer 提交于
Request queues with merging disabled will not flush the plug list after BLK_MAX_REQUEST_COUNT requests have been queued, since the code relies on blk_attempt_plug_merge to compute the request_count. Fix this by computing the number of queued requests even for nomerge queues. Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 10 10月, 2015 2 次提交
-
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Kosuke Tatsukawa 提交于
blk_mq_tag_update_depth() seems to be missing a memory barrier which might cause the waker to not notice the waiter and fail to send a wake_up as in the following figure. blk_mq_tag_update_depth bt_get ------------------------------------------------------------------------ if (waitqueue_active(&bs->wait)) /* The CPU might reorder the test for the waitqueue up here, before prior writes complete */ prepare_to_wait(&bs->wait, &wait, TASK_UNINTERRUPTIBLE); tag = __bt_get(hctx, bt, last_tag, tags); /* Value set in bt_update_count not visible yet */ bt_update_count(&tags->bitmap_tags, tdepth); /* blk_mq_tag_wakeup_all(tags, false); */ bt = &tags->bitmap_tags; wake_index = atomic_read(&bt->wake_index); ... io_schedule(); ------------------------------------------------------------------------ This patch adds the missing memory barrier. I found this issue when I was looking through the linux source code for places calling waitqueue_active() before wake_up*(), but without preceding memory barriers, after sending a patch to fix a similar issue in drivers/tty/n_tty.c (Details about the original issue can be found here: https://lkml.org/lkml/2015/9/28/849). Signed-off-by: NKosuke Tatsukawa <tatsu@ab.jp.nec.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 01 10月, 2015 2 次提交
-
-
由 Christoph Hellwig 提交于
And replace the blk_mq_tag_busy_iter with it - the driver use has been replaced with a new helper a while ago, and internal to the block we only need the new version. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
blk_mq_complete_request may be a no-op if the request has already been completed by others means (e.g. a timeout or cancellation), but currently drivers have to set rq->errors before calling blk_mq_complete_request, which might leave us with the wrong error value. Add an error parameter to blk_mq_complete_request so that we can defer setting rq->errors until we known we won the race to complete the request. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 30 9月, 2015 6 次提交
-
-
由 Akinobu Mita 提交于
CPU hotplug handling for blk-mq (blk_mq_queue_reinit) acquires all_q_mutex in blk_mq_queue_reinit_notify() and then removes sysfs entries by blk_mq_sysfs_unregister(). Removing sysfs entry needs to be blocked until the active reference of the kernfs_node to be zero. On the other hand, reading blk_mq_hw_sysfs_cpu sysfs entry (e.g. /sys/block/nullb0/mq/0/cpu_list) acquires all_q_mutex in blk_mq_hw_sysfs_cpus_show(). If these happen at the same time, a deadlock can happen. Because one can wait for the active reference to be zero with holding all_q_mutex, and the other tries to acquire all_q_mutex with holding the active reference. The reason that all_q_mutex is acquired in blk_mq_hw_sysfs_cpus_show() is to avoid reading an imcomplete hctx->cpumask. Since reading sysfs entry for blk-mq needs to acquire q->sysfs_lock, we can avoid deadlock and reading an imcomplete hctx->cpumask by protecting q->sysfs_lock while hctx->cpumask is being updated. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Reviewed-by: NMing Lei <tom.leiming@gmail.com> Cc: Ming Lei <tom.leiming@gmail.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Akinobu Mita 提交于
Notifier callbacks for CPU_ONLINE action can be run on the other CPU than the CPU which was just onlined. So it is possible for the process running on the just onlined CPU to insert request and run hw queue before establishing new mapping which is done by blk_mq_queue_reinit_notify(). This can cause a problem when the CPU has just been onlined first time since the request queue was initialized. At this time ctx->index_hw for the CPU, which is the index in hctx->ctxs[] for this ctx, is still zero before blk_mq_queue_reinit_notify() is called by notifier callbacks for CPU_ONLINE action. For example, there is a single hw queue (hctx) and two CPU queues (ctx0 for CPU0, and ctx1 for CPU1). Now CPU1 is just onlined and a request is inserted into ctx1->rq_list and set bit0 in pending bitmap as ctx1->index_hw is still zero. And then while running hw queue, flush_busy_ctxs() finds bit0 is set in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list. But htx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is ignored. Fix it by ensuring that new mapping is established before onlined cpu starts running. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Reviewed-by: NMing Lei <tom.leiming@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ming Lei <tom.leiming@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Akinobu Mita 提交于
CPU hotplug handling for blk-mq (blk_mq_queue_reinit) accesses q->mq_usage_counter while freezing all request queues in all_q_list. On the other hand, q->mq_usage_counter is deinitialized in blk_mq_free_queue() before deleting the queue from all_q_list. So if CPU hotplug event occurs in the window, percpu_ref_kill() is called with q->mq_usage_counter which has already been marked dead, and it triggers warning. Fix it by deleting the queue from all_q_list earlier than destroying q->mq_usage_counter. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Reviewed-by: NMing Lei <tom.leiming@gmail.com> Cc: Ming Lei <tom.leiming@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Akinobu Mita 提交于
CPU hotplug handling for blk-mq (blk_mq_queue_reinit) updates q->mq_map by blk_mq_update_queue_map() for all request queues in all_q_list. On the other hand, q->mq_map is released before deleting the queue from all_q_list. So if CPU hotplug event occurs in the window, invalid memory access can happen. Fix it by releasing q->mq_map in blk_mq_release() to make it happen latter than removal from all_q_list. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Suggested-by: NMing Lei <tom.leiming@gmail.com> Reviewed-by: NMing Lei <tom.leiming@gmail.com> Cc: Ming Lei <tom.leiming@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Akinobu Mita 提交于
There is a race between cpu hotplug handling and adding/deleting gendisk for blk-mq, where both are trying to register and unregister the same sysfs entries. null_add_dev --> blk_mq_init_queue --> blk_mq_init_allocated_queue --> add to 'all_q_list' (*) --> add_disk --> blk_register_queue --> blk_mq_register_disk (++) null_del_dev --> del_gendisk --> blk_unregister_queue --> blk_mq_unregister_disk (--) --> blk_cleanup_queue --> blk_mq_free_queue --> del from 'all_q_list' (*) blk_mq_queue_reinit --> blk_mq_sysfs_unregister (-) --> blk_mq_sysfs_register (+) While the request queue is added to 'all_q_list' (*), blk_mq_queue_reinit() can be called for the queue anytime by CPU hotplug callback. But blk_mq_sysfs_unregister (-) and blk_mq_sysfs_register (+) in blk_mq_queue_reinit must not be called before blk_mq_register_disk (++) and after blk_mq_unregister_disk (--) is finished. Because '/sys/block/*/mq/' is not exists. There has already been BLK_MQ_F_SYSFS_UP flag in hctx->flags which can be used to track these sysfs stuff, but it is only fixing this issue partially. In order to fix it completely, we just need per-queue flag instead of per-hctx flag with appropriate locking. So this introduces q->mq_sysfs_init_done which is properly protected with all_q_mutex. Also, we need to ensure that blk_mq_map_swqueue() is called with all_q_mutex is held. Since hctx->nr_ctx is reset temporarily and updated in blk_mq_map_swqueue(), so we should avoid blk_mq_register_hctx() seeing the temporary hctx->nr_ctx value in CPU hotplug handling or adding/deleting gendisk . Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Reviewed-by: NMing Lei <tom.leiming@gmail.com> Cc: Ming Lei <tom.leiming@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Akinobu Mita 提交于
When unmapped hw queue is remapped after CPU topology is changed, hctx->tags->cpumask has to be set after hctx->tags is setup in blk_mq_map_swqueue(), otherwise it causes null pointer dereference. Fixes: f26cdc85 ("blk-mq: Shared tag enhancements") Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Ming Lei <tom.leiming@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 24 9月, 2015 1 次提交
-
-
由 Catalin Marinas 提交于
The pages allocated for struct request contain pointers to other slab allocations (via ops->init_request). Since kmemleak does not track/scan page allocations, the slab objects will be reported as leaks (false positives). This patch adds kmemleak callbacks to allow tracking of such pages. Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Reported-by: NBart Van Assche <bart.vanassche@sandisk.com> Tested-by: Bart Van Assche<bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 18 9月, 2015 1 次提交
-
-
由 Ming Lei 提交于
When bio bounce is involved, one new bio and its biovecs are cloned from the comming bio, which can be one fast-cloned bio from upper layer(such as dm). So it is obviously wrong to assume the start index of the coming( original) bio's io vector is zero, which can be any value between 0 and (bi_max_vecs - 1), especially in case of bio split. This patch fixes Fedora's booting oops on i386, often with the following kernel log together: > [ 9.026738] systemd[1]: Switching root. > [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 > (systemd). > [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac > [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 > index:0x16a > [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk) > [ 9.087284] page dumped because: page still charged to cgroup > [ 9.088772] bad because of flags: > [ 9.089731] flags: 0x21(locked|lru) > [ 9.090818] page->mem_cgroup:f2c3e400 Reported-by: NJosh Boyer <jwboyer@fedoraproject.org> Tested-by: NAdam Williamson <awilliam@redhat.com> Cc: Ming Lin <mlin@kernel.org> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: NMing Lei <ming.lei@canonical.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 17 9月, 2015 1 次提交
-
-
由 Ming Lei 提交于
biovecs has become immutable since v3.13, so it isn't necessary to allocate biovecs for the new cloned bios, then we can save one extra biovecs allocation/copy, and the allocation is often not fixed-length and a bit more expensive. For example, if the 'max_sectors_kb' of null blk's queue is set as 16(32 sectors) via sysfs just for making more splits, this patch can increase throught about ~70% in the sequential read test over null_blk(direct io, bs: 1M). Cc: Christoph Hellwig <hch@infradead.org> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Ming Lin <ming.l@ssi.samsung.com> Cc: Dongsu Park <dpark@posteo.net> Signed-off-by: NMing Lei <ming.lei@canonical.com> This fixes a performance regression introduced by commit 54efd50b, and allows us to take full advantage of the fact that we have immutable bio_vecs. Hand applied, as it rejected violently with commit 5014c311. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 11 9月, 2015 4 次提交
-
-
由 Tejun Heo 提交于
While making the root blkg unconditional, ec13b1d6 ("blkcg: always create the blkcg_gq for the root blkcg") removed the part which clears q->root_blkg and ->root_rl.blkg during q exit. This leaves the two pointers dangling after blkg_destroy_all(). blk-throttle exit path performs blkg traversals and dereferences ->root_blkg and can lead to the following oops. BUG: unable to handle kernel NULL pointer dereference at 0000000000000558 IP: [<ffffffff81389746>] __blkg_lookup+0x26/0x70 ... task: ffff88001b4e2580 ti: ffff88001ac0c000 task.ti: ffff88001ac0c000 RIP: 0010:[<ffffffff81389746>] [<ffffffff81389746>] __blkg_lookup+0x26/0x70 ... Call Trace: [<ffffffff8138d14a>] blk_throtl_drain+0x5a/0x110 [<ffffffff8138a108>] blkcg_drain_queue+0x18/0x20 [<ffffffff81369a70>] __blk_drain_queue+0xc0/0x170 [<ffffffff8136a101>] blk_queue_bypass_start+0x61/0x80 [<ffffffff81388c59>] blkcg_deactivate_policy+0x39/0x100 [<ffffffff8138d328>] blk_throtl_exit+0x38/0x50 [<ffffffff8138a14e>] blkcg_exit_queue+0x3e/0x50 [<ffffffff8137016e>] blk_release_queue+0x1e/0xc0 ... While the bug is a straigh-forward use-after-free bug, it is tricky to reproduce because blkg release is RCU protected and the rest of exit path usually finishes before RCU grace period. This patch fixes the bug by updating blkg_destro_all() to clear q->root_blkg and ->root_rl.blkg. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: N"Richard W.M. Jones" <rjones@redhat.com> Reported-by: NJosh Boyer <jwboyer@fedoraproject.org> Link: http://lkml.kernel.org/g/CA+5PVA5rzQ0s4723n5rHBcxQa9t0cW8BPPBekr_9aMRoWt2aYg@mail.gmail.com Fixes: ec13b1d6 ("blkcg: always create the blkcg_gq for the root blkcg") Cc: stable@vger.kernel.org # v4.2+ Tested-by: NRichard W.M. Jones <rjones@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Sagi Grimberg 提交于
For drivers that don't support gaps in the SG lists handed to them we must bounce (copy the user buffers) and pass a bio that does not include gaps. This doesn't matter for any current user, but will help to allow iser which can't handle gaps to use the block virtual boundary instead of using driver-local bounce buffering when handling SG_IO commands. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Sagi Grimberg 提交于
This is only theoretical at the moment given that the only subsystems that generate integrity payloads are the block layer itself and the scsi target (which generate well aligned integrity payloads). But when we will expose integrity meta-data to user-space, we'll need to refuse appending a page with a gap (if the queue virtual boundary is set). Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Sagi Grimberg 提交于
If a driver sets the block queue virtual boundary mask, it means that it cannot handle gaps so we must not allow those in the integrity payload as well. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Fixed up by me to have duplicate integrity merge functions, depending on whether block integrity is enabled or not. Fixes a compilations issue with CONFIG_BLK_DEV_INTEGRITY unset. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 04 9月, 2015 1 次提交
-
-
由 Jens Axboe 提交于
We are checking for gaps to previous bio_vec, which can only detect back merges gaps. Moreover, at the point where we check for a gap, we don't know if we will attempt a back or a front merge. Thus, check for gap to prev in a back merge attempt and check for a gap to next in a front merge attempt. Signed-off-by: NJens Axboe <axboe@fb.com> [sagig: Minor rename change] Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
-
- 03 9月, 2015 1 次提交
-
-
由 Jens Axboe 提交于
The compiler can't figure out that bvprv is initialized whenever 'prev' is set to 1 as well. Use a pointer to bvprv instead, setting it to NULL initially, and get rid of the 'prev' tracking. This dumbs it down enough that gcc is happy. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 02 9月, 2015 1 次提交
-
-
由 Keith Busch 提交于
Corrects a coding error from earlier patch. Reported by: Sagi Grimberg <sagig@dev.mellanox.co.il> Signed-off-by: NKeith Busch <keith.busch@intel.com> Fixes: 03100aad ("block: Replace SG_GAPS with new queue limits mask") Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 20 8月, 2015 1 次提交
-
-
由 Keith Busch 提交于
The SG_GAPS queue flag caused checks for bio vector alignment against PAGE_SIZE, but the device may have different constraints. This patch adds a queue limits so a driver with such constraints can set to allow requests that would have been unnecessarily split. The new gaps check takes the request_queue as a parameter to simplify the logic around invoking this function. This new limit makes the queue flag redundant, so removing it and all usage. Device-mappers will inherit the correct settings through blk_stack_limits(). Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 19 8月, 2015 12 次提交
-
-
由 Tejun Heo 提交于
cgroup is trying to make interface consistent across different controllers. For weight based resource control, the knob should have the range [1, 10000] and default to 100. This patch updates cfq-iosched so that the weight range conforms. The internal calculations have enough range and the widening of the weight range shouldn't cause any problem. * blkcg_policy->cpd_bind_fn() is added. If present, this is invoked when blkcg is attached to a hierarchy. * cfq_cpd_init() is updated to use the new default value on the unified hierarchy. * cfq_cpd_bind() callback is implemented to clear per-blkg configs and apply the default config matching the hierarchy type. * cfqd->root_group->[leaf_]weight initialization in cfq_init_queue() is moved into !CONFIG_CFQ_GROUP_IOSCHED block. cfq_cpd_bind() is now responsible for initializing the initial weights when blkcg is enabled. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
blkcg is gonna switch to cgroup common weight range as defined by CGROUP_WEIGHT_* on the unified hierarchy. In preparation, rename CFQ_WEIGHT_* constants to CFQ_WEIGHT_LEGACY_*. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
blkcg interface grew to be the biggest of all controllers and unfortunately most inconsistent too. The interface files are inconsistent with a number of cloes duplicates. Some files have recursive variants while others don't. There's distinction between normal and leaf weights which isn't intuitive and there are a lot of stat knobs which don't make much sense outside of debugging and expose too much implementation details to userland. In the unified hierarchy, everything is always hierarchical and internal nodes can't have tasks rendering the two structural issues twisting the current interface. The interface has to be updated in a significant anyway and this is a good chance to revamp it as a whole. This patch implements blkcg interface for the unified hierarchy. * (from a previous patch) blkcg is identified by "io" instead of "blkio" on the unified hierarchy. Given that the whole interface is updated anyway, the rename shouldn't carry noticeable conversion overhead. * The original interface consisted of 27 files is replaced with the following three files. blkio.stat : per-blkcg stats blkio.weight : per-cgroup and per-cgroup-queue weight settings blkio.max : per-cgroup-queue bps and iops max limits Documentation/cgroups/unified-hierarchy.txt updated accordingly. v2: blkcg_policy->dfl_cftypes wasn't removed on blkcg_policy_unregister() corrupting the cftypes list. Fixed. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
* Export blkg_dev_name() * Drop unnecessary @cft from __cfq_set_weight(). Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
tg_set_conf() is largely consisted of parsing and setting the new config and the follow-up application and propagation. This patch separates out the latter part into tg_conf_updated(). This will be used to implement interface for the unified hierarchy. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
Currently, blkg_conf_prep() expects input to be of the following form MAJ:MIN NUM and reads the NUM part into blkg_conf_ctx->v. This is quite restrictive and gets in the way in implementing blkcg interface for the unified hierarchy. This patch updates blkg_conf_prep() so that it expects MAJ:MIN BODY_STR where BODY_STR is an arbitrary string. blkg_conf_ctx->v is replaced with ->body which is a char pointer pointing to the start of BODY_STR. Parsing of the body is moved to blkg_conf_prep()'s callers. To allow using, for example, strsep() on blkg_conf_ctx->val, it is a non-const pointer and to accommodate that const is dropped from @input too. This doesn't cause any behavior changes. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
blkcg is about to grow interface for the unified hierarchy. Add legacy to existing cftypes. * blkcg_policy->cftypes -> blkcg_policy->legacy_cftypes * blk-cgroup.c:blkcg_files -> blkcg_legacy_files * cfq-iosched.c:cfq_blkcg_files -> cfq_blkcg_legacy_files * blk-throttle.c:throtl_files -> throtl_legacy_files Pure renames. No functional change. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
blkio interface has become messy over time and is currently the largest. In addition to the inconsistent naming scheme, it has multiple stat files which report more or less the same thing, a number of debug stat files which expose internal details which shouldn't have been part of the public interface in the first place, recursive and non-recursive stats and leaf and non-leaf knobs. Both recursive vs. non-recursive and leaf vs. non-leaf distinctions don't make any sense on the unified hierarchy as only leaf cgroups can contain processes. cgroups is going through a major interface revision with the unified hierarchy involving significant fundamental usage changes and given that a significant portion of the interface doesn't make sense anymore, it's a good time to reorganize the interface. As the first step, this patch renames the external visible subsystem name from "blkio" to "io". This is more concise, matches the other two major subsystem names, "cpu" and "memory", and better suited as blkcg will be involved in anything writeback related too whether an actual block device is involved or not. As the subsystem legacy_name is set to "blkio", the only userland visible change outside the unified hierarchy is that blkcg is reported as "io" instead of "blkio" in the subsystem initialized message during boot. On the unified hierarchy, blkcg now appears as "io". Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: cgroups@vger.kernel.org Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
blkcg currently returns -EINVAL for most errors which can be pretty confusing given that the failure modes are quite varied. Update the error returns so that * -EINVAL only for syntactic errors. * -ERANGE if the value is out of range. * -ENODEV if the target device can't be found. * -EOPNOTSUPP if the policy is not enabled on the target device. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
blkg_to_cfqg() and blkcg_to_cfqgd() on a valid blkg with the policy enabled are guaranteed to return non-NULL and the counterpart in blk-throttle doesn't have these checks either. Remove the spurious NULL checks. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
The recent percpu conversion of blkg_rwstat triggered the following warning in certain configurations. block/blk-cgroup.c:654:1: warning: the frame size of 1360 bytes is larger than 1024 bytes This is because blkg_rwstat now contains four percpu_counter which can be pretty big depending on debug options although it shouldn't be a problem in production configs. This patch removes one of the two local blkg_rwstat variables used by blkg_rwstat_recursive_sum() to reduce stack usage. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Link: http://article.gmane.org/gmane.linux.kernel.cgroups/13835Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Tejun Heo 提交于
cfq_stats->sectors is a blkg_stat which keeps track of the total number of sectors serviced; however, this can be trivially calculated from blkcg_gq->stat_bytes. The only thing necessary is adding up READs and WRITEs and then dividing by sector size. Remove cfqg_stats->sectors and make cfq print "sectors" and "sectors_recursive" from stat_bytes. While this is a bit more code, it removes duplicate stat allocations and updates and ensures that the reported stats stay in tune with each other. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-