- 21 May 2011, 1 commit
-
-
Submitted by Vivek Goyal
Group initialization code currently lives in two places: root group initialization in blk_throtl_init() and dynamically allocated groups in throtl_find_alloc_tg(). Create a common function and use it in both places.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
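A minimal sketch of the kind of shared helper described above, assuming a hypothetical throtl_init_group() and a simplified throtl_grp layout; the field names are illustrative, not taken from the actual patch:

    /* Hypothetical helper shared by blk_throtl_init() and
     * throtl_find_alloc_tg(); the fields shown are an assumption. */
    static void throtl_init_group(struct throtl_grp *tg)
    {
            INIT_HLIST_NODE(&tg->tg_node);
            RB_CLEAR_NODE(&tg->rb_node);
            bio_list_init(&tg->bio_lists[READ]);
            bio_list_init(&tg->bio_lists[WRITE]);
            /* unlimited until the cgroup configures bps/iops limits */
            tg->bps[READ] = tg->bps[WRITE] = -1;
            tg->iops[READ] = tg->iops[WRITE] = -1;
            atomic_set(&tg->ref, 1);
    }

Both the static root-group setup and the dynamic allocation path would then call this one function instead of duplicating the initialization.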
-
- 18 May 2011, 2 commits
-
-
Submitted by Shaohua Li
Consider this scenario:

    1. blk_delay_queue(q, SCSI_QUEUE_DELAY);
    2. blk_run_queue_async();

The second call becomes a no-op, because q->delay_work already has WORK_STRUCT_PENDING_BIT set, so the delayed work will still only run after SCSI_QUEUE_DELAY. But blk_run_queue_async() actually wants the delayed work to run immediately. Fix this by cancelling any potentially pending delayed work before queuing an immediate run of the workqueue.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
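A sketch of the cancel-then-requeue pattern described above, assuming q->delay_work is the queue's delayed work item and kblockd_workqueue is the workqueue it runs on; the exact cancel helper used by the real patch may differ:

    void blk_run_queue_async(struct request_queue *q)
    {
            if (likely(!blk_queue_stopped(q))) {
                    /* drop a pending delayed run so the zero-delay queueing
                     * below is not silently ignored */
                    cancel_delayed_work(&q->delay_work);
                    queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
            }
    }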
-
Submitted by Martin K. Petersen
In some cases we would end up stacking discard_zeroes_data incorrectly. Fix this by enabling the feature by default for stacking drivers and clearing it for low-level drivers. Incorporating a device that does not support dzd will then cause the feature to be disabled in the stacking driver.

Also ensure that the maximum discard value does not overflow when exported in sysfs, and return 0 in the alignment and dzd fields for devices that don't support discard.

Reported-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
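A sketch of the stacking rule the commit describes: the top-level limit starts out enabled and is AND-ed with each bottom device, so a single device without the feature clears it for the whole stack. Shown against an assumed queue_limits layout, not the literal patch:

    /* in blk_set_default_limits() for stacking drivers (assumption):
     * start out claiming that discard zeroes data ... */
    lim->discard_zeroes_data = 1;

    /* ... and in blk_stack_limits(), combine with each underlying device,
     * so one device lacking the feature disables it for the stack */
    t->discard_zeroes_data &= b->discard_zeroes_data;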
-
- 16 May 2011, 1 commit
-
-
Submitted by Vivek Goyal
Currently we first map the task to its cgroup and then the cgroup to the blkio_cgroup. There is a more direct way to get to the blkio_cgroup from the task, using task_subsys_state(). Use that.

The real reason for the fix is that it also avoids a race in the generic cgroup code. During remount/umount, rebind_subsystems() is called and it can do the following without waiting for an RCU grace period:

    cgrp->subsys[i] = NULL;

That means that if somebody got hold of the cgroup under RCU and then tried to go through cgroup->subsys[] to get to the blkio_cgroup, it would get NULL, which is wrong. I was running into this race condition with LTP running on an upstream-derived kernel, and it led to a crash.

So ideally we should also fix the generic cgroup code to wait for an RCU grace period before setting the pointer to NULL. Li Zefan is not very keen on introducing a synchronize_rcu() there, as he thinks it will slow down mount/remount/umount operations. So for the time being, at least fix the kernel crash by taking a more direct route to the blkio_cgroup.

One tester had reported a crash while running LTP on a derived kernel, and with this fix the crash is no longer seen while the test has been running for over 6 days.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
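A sketch of the direct lookup the commit describes, assuming the 2011-era cgroup API (task_subsys_state()) and a blkio_cgroup with an embedded cgroup_subsys_state member named css; the helper name is illustrative:

    /* Must be called under rcu_read_lock(); goes straight from the task to
     * the blkio_cgroup instead of via task -> cgroup -> cgroup->subsys[]. */
    static struct blkio_cgroup *task_to_blkio_cgroup(struct task_struct *tsk)
    {
            return container_of(task_subsys_state(tsk, blkio_subsys_id),
                                struct blkio_cgroup, css);
    }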
-
- 07 May 2011, 5 commits
-
-
Submitted by Lukas Czerner
Currently we return -EOPNOTSUPP from blkdev_issue_discard() if any of the bios fails because the underlying device does not support discard requests. However, if the device is, for example, a dm device composed of devices where some support discard and some do not, it is fine for some bios to fail with EOPNOTSUPP; it does not mean that discard is unsupported altogether.

This commit removes the check for bios that failed with EOPNOTSUPP and changes blkdev_issue_discard() to return "operation not supported" if and only if the device as a whole does not support it, not just the parts of the device that some bios might hit. This change also fixes a problem with the BLKDISCARD ioctl(), which now works correctly on such dm devices.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
CC: Jens Axboe <jaxboe@fusionio.com>
CC: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
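A sketch of the "check the device as a whole" idea: decide support from the queue's discard capability before submitting anything, instead of inferring it from per-bio EOPNOTSUPP errors. This is a simplified fragment with an illustrative name, not the full function:

    int blkdev_issue_discard_sketch(struct block_device *bdev, sector_t sector,
                                    sector_t nr_sects, gfp_t gfp_mask)
    {
            struct request_queue *q = bdev_get_queue(bdev);

            if (!q)
                    return -ENXIO;
            /* the whole device either supports discard or it doesn't;
             * individual bios failing with -EOPNOTSUPP no longer matter */
            if (!blk_queue_discard(q))
                    return -EOPNOTSUPP;

            /* ... build and submit REQ_DISCARD bios here ... */
            return 0;
    }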
-
Submitted by Lukas Czerner
In blkdev_issue_zeroout() we are submitting regular WRITE bios, so we do not need to check specifically for -EOPNOTSUPP on error. There is also no need for the submit: label, because there is no way to jump out of the while loop without an error, and at that point we really want to exit rather than try again. Also remove the check for (sz == 0), since at that point sz can never be zero.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
CC: Dmitry Monakhov <dmonakhov@openvz.org>
CC: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Lukas Czerner
Currently we wait for every submitted REQ_DISCARD bio separately, which can have the unwanted consequence of repeatedly flushing the queue. Instead, submit bios in batches and wait for the entire batch, narrowing the window for other I/O to slip in. Use bio_batch_end_io() and struct bio_batch for that purpose; the same mechanism is used by blkdev_issue_zeroout(). Also change bio_batch_end_io() so that we always clear BIO_UPTODATE in the case of error, and remove the check for bb, since we are the only user of this function and we always set it. Remove bio_get()/bio_put() from blkdev_issue_discard(), since bio_alloc() and bio_batch_end_io() already handle the reference, so it is not needed anymore.

I have done simple dd testing with surprising results. The script I used is:

    for i in $(seq 10); do
            echo $i
            dd if=/dev/sdb1 of=/dev/sdc1 bs=4k &
            sleep 5
    done
    /usr/bin/time -f %e ./blkdiscard /dev/sdc1

Running time of BLKDISCARD on the whole device:

    with patch      without patch
    0.95            15.58

So in this artificial test, the kernel with the patch applied is approximately 16x faster at discarding the device.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
CC: Dmitry Monakhov <dmonakhov@openvz.org>
CC: Jens Axboe <jaxboe@fusionio.com>
CC: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
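A sketch of the batch-completion pattern referred to above, with the struct layout and helper written as I would assume them from the description (the real code in blk-lib.c may differ in detail): one atomic counter covers all bios in the batch, and the submitter waits on a single completion.

    struct bio_batch {
            atomic_t                done;   /* in-flight bios + 1 */
            unsigned long           flags;  /* BIO_UPTODATE cleared on error */
            struct completion       *wait;  /* submitter sleeps on this */
    };

    static void bio_batch_end_io(struct bio *bio, int err)
    {
            struct bio_batch *bb = bio->bi_private;

            if (err)
                    clear_bit(BIO_UPTODATE, &bb->flags);
            if (atomic_dec_and_test(&bb->done))
                    complete(bb->wait);
            bio_put(bio);
    }

The submitter would initialize done to 1, bump it for every bio it submits, then do one final atomic_dec_and_test() and wait_for_completion() for the whole batch instead of waiting per bio.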
-
Submitted by shaohua.li@intel.com
In some drives, flush requests are non-queueable: while a flush request is running, normal read/write requests can't run. If the block layer dispatches such a request anyway, the driver can't handle it and requeues it. Tejun suggested we can hold the queue while a flush is running, which avoids the unnecessary requeue and can also improve performance. For example, given requests flush1, write1, flush2: flush1 is dispatched, then the queue is held, so write1 isn't inserted into the queue. After flush1 finishes, flush2 is dispatched. Since the disk cache is already clean, flush2 finishes very soon, so it effectively gets folded into flush1. In my test, the queue holding completely solves a regression introduced by commit 53d63e6b:

    block: make the flush insertion use the tail of the dispatch list

    It's not a preempt type request, in fact we have to insert it behind
    requests that do specify INSERT_FRONT.

which causes about a 20% regression running a sysbench fileio workload.

Stable: 2.6.39 only
Cc: stable@kernel.org
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by shaohua.li@intel.com
The flush request isn't queueable on some drives. Add a flag so the driver can notify the block layer about this; with that knowledge we can optimize flush performance.

Stable: 2.6.39 only
Cc: stable@kernel.org
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
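A sketch of what such a flag could look like, paired with the queue-holding idea from the previous entry; the field and helper names here are assumptions, not necessarily the ones the patch introduced:

    /* request_queue gains a one-bit hint (field name assumed):
     *      unsigned int flush_not_queueable:1;                  */

    /* setter a low-level driver would call at probe time */
    void blk_queue_flush_queueable(struct request_queue *q, bool queueable)
    {
            q->flush_not_queueable = !queueable;
    }

    /* the flush machinery can then hold back normal I/O while a flush is
     * in flight on drives where the flush cannot be queued */
    static inline bool blk_queue_flush_not_queueable(struct request_queue *q)
    {
            return q->flush_not_queueable;
    }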
-
- 06 May 2011, 2 commits
-
-
Submitted by Kees Cook
After the anticipatory scheduler was dropped, there was no need to special-case the request_module string. As such, drop the redundant sprintf and stack variable.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
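A sketch of the simplification in elevator loading: since request_module() takes a printf-style format, the scheduler module can be requested directly instead of first building the "<name>-iosched" string in a stack buffer. This is a schematic fragment, not the exact before/after diff:

    /* before (schematic):
     *      char elv[ELV_NAME_MAX + sizeof("-iosched")];
     *      sprintf(elv, "%s-iosched", name);
     *      request_module("%s", elv);
     */

    /* after: */
    request_module("%s-iosched", name);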
-
Submitted by Tao Ma
unplug is replaced with blk_run_queue now in blk_execute_rq_nowait, so change the comment accordingly.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
- 22 April 2011, 2 commits
-
-
Submitted by Tejun Heo
DISK_EVENT_MEDIA_CHANGE is used both as a userland-visible event and as an internal event for revalidation of removable devices. Some legacy drivers don't implement proper event detection and continuously generate events under certain circumstances. For example, ide-cd generates "media changed" continuously if there's no media in the drive, which can lead to an infinite loop of events bouncing back and forth between the driver and the userland event handler.

This patch updates the disk event infrastructure so that it never propagates events not listed in disk->events to userland. Those events are still processed the same way for internal purposes, but uevent generation is suppressed. This also ensures that userland only gets events which are advertised in the @events sysfs node, lowering the risk of confusion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
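A sketch of the filtering rule described above: everything the check turns up is still processed internally, but only the bits the driver advertised in disk->events are allowed to generate uevents. Where exactly this mask is applied in the event workfn is an assumption here:

    static unsigned int disk_filter_reportable_events(struct gendisk *disk,
                                                      unsigned int pending)
    {
            /* suppress uevents for anything not advertised in disk->events;
             * internal users (revalidation) still see all of 'pending' */
            return pending & disk->events;
    }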
-
Submitted by Jens Axboe
The sort insert is the one that goes to the IO scheduler. With the SORT_MERGE addition, we could bypass IO scheduler setup but still ask the IO scheduler to insert the request. This would cause an oops when switching IO schedulers through the sysfs interface, unless the disk just happened to be idle while it occurred.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
- 19 April 2011, 6 commits
-
-
Submitted by Tao Ma
In queue_requests_store, the code looks like this:

    if (rl->count[BLK_RW_SYNC] >= q->nr_requests) {
            blk_set_queue_full(q, BLK_RW_SYNC);
    } else if (rl->count[BLK_RW_SYNC]+1 <= q->nr_requests) {
            blk_clear_queue_full(q, BLK_RW_SYNC);
            wake_up(&rl->wait[BLK_RW_SYNC]);
    }

If the "if" condition is not satisfied, we know that rl->count[BLK_RW_SYNC] < q->nr_requests, which is the same as rl->count[BLK_RW_SYNC]+1 <= q->nr_requests. So everything that reaches the "else" already satisfies the "else if" check, and the extra condition isn't actually needed.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
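The resulting simplification, as implied by the reasoning above (a sketch of the fragment after the change):

    if (rl->count[BLK_RW_SYNC] >= q->nr_requests) {
            blk_set_queue_full(q, BLK_RW_SYNC);
    } else {
            /* count < nr_requests here, so count + 1 <= nr_requests holds */
            blk_clear_queue_full(q, BLK_RW_SYNC);
            wake_up(&rl->wait[BLK_RW_SYNC]);
    }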
-
Submitted by Liu Yuan
We do not call blk_trace_remove_sysfs() in the error return path when kobject_add() fails. This patch fixes it.

Cc: stable@kernel.org
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
We don't pass in a 'force_kblockd' argument anymore, so get rid of the stale comment.

Reported-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
We are currently using this flag to check whether it's safe to call into ->request_fn(). If it is set, we punt to kblockd. But we get a lot of false positives and excessive punts to kblockd, which hurts performance. The only real abuser of this infrastructure is SCSI, so export the async queue run and convert SCSI over to use it. There's room for improvement in that SCSI need not always use the async call, but this fixes our performance issue, and that can be cleaned up in due time.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
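A sketch of what "export the async queue run and convert SCSI" could look like in practice; the SCSI call site shown is an assumption about where the conversion lands, not a quote from the patch:

    /* block/blk-core.c: make the async run available to drivers */
    EXPORT_SYMBOL(blk_run_queue_async);

    /* drivers/scsi/scsi_lib.c (assumed call site): instead of running the
     * queue directly and risking re-entering ->request_fn(), punt the run
     * to kblockd */
    blk_run_queue_async(sdev->request_queue);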
-
Submitted by Jens Axboe
For some configurations of CONFIG_PREEMPT that assumption is not true. So get rid of __call_for_each_cic() and always use the explicitly rcu_read_lock()-protected call_for_each_cic() instead. This fixes a potential bug related to IO scheduler removal or online switching.

Thanks to Paul McKenney for clarifying this.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
With all drivers and file systems converted, we only have in-core use of this function, so remove the export.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
- 18 April 2011, 5 commits
-
-
Submitted by Christoph Hellwig
Instead of overloading __blk_run_queue to force an offload to kblockd, add a new blk_run_queue_async helper to do it explicitly. I've kept the blk_queue_stopped check for now, but I suspect it's not needed, as the check we do when the workqueue item runs should be enough.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
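A sketch of what the new helper amounts to: instead of a "force offload" flag on __blk_run_queue(), an explicit function that schedules the queue run on kblockd. This is the simplified shape before the later fix that also cancels pending delayed work (see the 18 May entry above):

    void blk_run_queue_async(struct request_queue *q)
    {
            /* run ->request_fn() from kblockd context instead of here */
            if (likely(!blk_queue_stopped(q)))
                    queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
    }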
-
Submitted by Jens Axboe
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
If we know we are going to punt to kblockd, we can drop the queue lock before calling into __blk_run_queue() since it only does a safe bit test and a workqueue call. Since kblockd needs to grab this very lock as one of the first things it does, it's a good optimization to drop the lock before waking kblockd.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
MD can't use this since it really requires us to be able to keep more than a single piece of state for the unplug. Commit 048c9374 added the required support for MD, so get rid of this now-unused code.

This reverts commit f7566457.

Conflicts:

    block/blk-core.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by NeilBrown
md/raid requires an unplug callback, but as it does not use requests the current code cannot provide one. So allow arbitrary callbacks to be attached to the blk_plug.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
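A sketch of the shape such a callback hook can take: a small node that a driver links onto the plug, which the flush path walks at unplug time. The struct layout and the cb_list field are assumptions based on the description:

    struct blk_plug_cb {
            struct list_head list;                   /* linked on plug->cb_list */
            void (*callback)(struct blk_plug_cb *);  /* run at unplug time */
    };

    /* at unplug/flush time, run every attached callback once */
    static void flush_plug_callbacks(struct blk_plug *plug)
    {
            while (!list_empty(&plug->cb_list)) {
                    struct blk_plug_cb *cb = list_first_entry(&plug->cb_list,
                                                              struct blk_plug_cb,
                                                              list);
                    list_del(&cb->list);
                    cb->callback(cb);
            }
    }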
-
- 16 April 2011, 1 commit
-
-
Submitted by Jens Axboe
It's a pretty close match to what we had before: the timer triggering meant that nobody unplugged the plug in due time, and in the new scheme that corresponds very closely to the schedule() unplug. It's essentially the difference between an explicit unplug (IO unplug) and an implicit unplug (timer unplug; we scheduled with pending IO queued).

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
- 15 April 2011, 2 commits
-
-
Submitted by Jens Axboe
For the explicit unplugging, we'd prefer to kick things off immediately and not pay the penalty of the latency to switch to kblockd. So let blk_finish_plug() do the run inline, while the implicit-on-schedule-out unplug will punt to kblockd.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
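A sketch of how that split can be expressed: the flush path takes a from_schedule flag, and only the schedule-out path asks for the kblockd punt. The function and parameter names are assumed from the description, not quoted from the patch:

    /* explicit unplug: run the queues right here */
    void blk_finish_plug(struct blk_plug *plug)
    {
            blk_flush_plug_list(plug, false);       /* from_schedule = false */
    }

    /* implicit unplug when the task schedules out: punt to kblockd so we
     * don't dispatch I/O from deep inside the scheduler */
    void blk_schedule_flush_plug(struct task_struct *tsk)
    {
            struct blk_plug *plug = tsk->plug;

            if (plug)
                    blk_flush_plug_list(plug, true); /* from_schedule = true */
    }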
-
Submitted by Christoph Hellwig
It's a bit of a mess currently. task->plug is being cleared and reset in __blk_finish_plug(), and blk_finish_plug() is testing for a NULL plug, which cannot happen even from schedule() anymore since schedule() uses blk_needs_flush_plug() to determine whether to call into this function at all. So get rid of some of the cruft.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
- 14 April 2011, 1 commit
-
-
Submitted by Liu Yuan
In blk_register_queue(), the variable dev is already assigned by disk_to_dev(), so use it directly instead of calling disk_to_dev() again.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>

Modified by me to delete an empty line in the same function while in there anyway.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
- 12 April 2011, 6 commits
-
-
Submitted by Jens Axboe
There are worries that we are now consuming a lot more stack in some cases, since we potentially call into IO dispatch from schedule() or io_schedule(). We can reduce this problem by moving the running of the queue to kblockd, like the old plugging scheme did as well. This may or may not be a good idea from a performance perspective, depending on how many tasks have queue plugs running at the same time. Even for the slightly contended case, doing just a single queue run from kblockd instead of multiple runs directly from the unpluggers will be faster.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
The original use for this dates back to when we had to track write requests for serializing around barriers. That's not needed anymore, so kill it.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
This was removed with the queue plug state, but we can easily re-add it by checking whether this is the first request going to this queue. It is good information to have when tracing to see how effective the plugging is.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
MD would like to know when a queue is unplugged, so it can flush its bitmap writes. Add such a callback.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
It's done at the top to avoid doing it for every queue we unplug.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
It was removed with the on-stack plugging; re-add it and track the depth of requests added when flushing the plug.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
- 11 April 2011, 1 commit
-
-
Submitted by NeilBrown
If the request_fn ends up blocking, we could be re-entering the plug flush. Since the list is protected by explicitly not allowing schedule events, this isn't a terribly good idea. Additionally, it can cause us to recurse: as the request_fn called by __blk_run_queue is allowed to schedule() (after dropping the queue lock, of course), it is possible to get a recursive call chain:

    schedule -> blk_flush_plug
             -> __blk_finish_plug -> flush_plug_list
             -> __blk_run_queue -> request_fn -> schedule

We must make sure that the second schedule does not call into blk_flush_plug again. So instead of leaving the list of requests on blk_plug->list, move them to a separate list, leaving blk_plug->list empty.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
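A sketch of the "move to a separate list" idea: splice the plugged requests onto a local list before dispatching, so a recursive entry finds plug->list empty and returns immediately. This is a simplified fragment, not the literal patch:

    static void flush_plug_list_sketch(struct blk_plug *plug)
    {
            LIST_HEAD(list);

            if (list_empty(&plug->list))
                    return;         /* recursive entry: nothing to do */

            /* take ownership of the pending requests; plug->list is now
             * empty, so re-entry via request_fn -> schedule is harmless */
            list_splice_init(&plug->list, &list);

            while (!list_empty(&list)) {
                    struct request *rq = list_entry(list.next, struct request,
                                                    queuelist);
                    list_del_init(&rq->queuelist);
                    /* ... insert rq on its queue and run the queue ... */
            }
    }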
-
- 06 April 2011, 5 commits
-
-
Submitted by Konstantin Khlebnikov
The comparison function for list_sort() must be anticommutative, otherwise it is not a sort in the ordinary sense. But fortunately list_sort() always checks ((*cmp)(priv, a, b) <= 0), i.e. it does not distinguish between negative and zero, so the comparison function only needs to implement less-or-equal instead of a full three-way comparison.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
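A sketch of what a less-or-equal comparator for the plug list can look like under that rule, sorting plugged requests by the queue they target; the exact key used by the real plug_rq_cmp() is an assumption here:

    /* returns 0 when a <= b and 1 otherwise; that is all list_sort() needs,
     * since it only ever tests cmp(priv, a, b) <= 0 */
    static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
    {
            struct request *rqa = container_of(a, struct request, queuelist);
            struct request *rqb = container_of(b, struct request, queuelist);

            return !(rqa->q <= rqb->q);
    }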
-
Submitted by Mike Snitzer
The current block integrity (DIF/DIX) support in DM verifies that all devices' integrity profiles match during DM device resume, which is past the point of no return. To some degree that is unavoidable (stacked DM devices force this late checking). But for most DM devices (which aren't stacked on other DM devices) the ideal time to verify that all integrity profiles match is during table load.

Introduce the notion of an "initialized" integrity profile: a profile that was blk_integrity_register()'d with a non-NULL 'blk_integrity' template. Add blk_integrity_is_initialized() to allow checking whether a profile was initialized.

Update DM integrity support to:
- check that all devices with _initialized_ integrity profiles match during table load; uninitialized profiles (e.g. for the underlying DM device(s) of a stacked DM device) are ignored
- disallow a table load that would result in an integrity profile that conflicts with a DM device's existing (in-use) integrity profile
- avoid clearing an existing integrity profile
- validate that all integrity profiles match during resume; but if they don't, all we can do is report the mismatch (during resume we're past the point of no return)

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Andreas Schwab
xchg does not work portably with types smaller than 32 bits.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
It's not a preempt-type request; in fact we have to insert it behind requests that do specify INSERT_FRONT.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Submitted by Jens Axboe
Merge it with __elv_add_request(); it's pretty pointless to have a function with only two callers. The main interface is elv_add_request()/__elv_add_request().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-