Commits · 25985edcedea6396277003854657b5f3cb31a628 · openeuler / Kernel

31 Mar, 2011 1 commit

Committed by Lucas De Marchi on Mar 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

26 Mar, 2011 2 commits

block: fix issue with calling blk_stop_queue() from the request_fn handler · ad3d9d7e

Committed by Jens Axboe on Mar 25, 2011

When the queue work handler was converted to delayed work, the
stopping was inadvertently made sync as well. Change this back
to being async stop, using __cancel_delayed_work() instead of
cancel_delayed_work().
Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

ad3d9d7e

block: fix bug with inserting flush requests as sort/merge · 401a18e9

Committed by Jens Axboe on Mar 25, 2011

With the introduction of the on-stack plugging, we would assume
that any request being inserted was a normal file system request.
As flush/fua requires a special insert mode, this caused problems.

Fix this up by checking for this in flush_plug_list() and using
the appropriate insert mechanism.

Big thanks go to Markus Trippelsdorf for tirelessly testing
patches, and to Sergey Senozhatsky for helping find the real
issue.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

401a18e9

23 Mar, 2011 5 commits

cfq-iosched: removing unnecessary think time checking · c4ade94f

Committed by Li, Shaohua on Mar 23, 2011

Remove think time checking. A high-thinktime queue might mean that the queue
dispatches several requests and then goes away. Limiting such a queue seems
meaningless, and removing the check also simplifies the code. This was
suggested by Vivek.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

c4ade94f

cfq-iosched: Don't clear queue stats when preempt. · 62a37f6b

Committed by Justin TerAvest on Mar 23, 2011

For v2, I added back lines to cfq_preempt_queue() that were removed
during updates for accounting unaccounted_time. Thanks for pointing out
that I'd missed these, Vivek.

Previous commit "cfq-iosched: Don't set active queue in preempt" wrongly
cleared stats for preempting queues when it shouldn't have, because when
we choose a queue to preempt, it still isn't necessarily scheduled next.

Thanks to Vivek Goyal for figuring this out and understanding how the
preemption code works.
Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

62a37f6b

blk-throttle: Reset group slice when limits are changed · 04521db0

Committed by Vivek Goyal on Mar 22, 2011

Lina reported that if throttle limits are initially very high and then
dropped, no new bio might be dispatched for a long time. The reason is
that after dropping the limits we don't reset the existing slice, so the
rate calculation uses the new low rate but still accounts for the bios
dispatched at the high rate. To fix it, reset the slice upon rate change.

https://lkml.org/lkml/2011/3/10/298

Another problem with a very high limit is that we never queued the
bio on the throtl service tree. That means we kept on extending the
group slice but never trimmed it. Fix that also by regularly
trimming the slice even if no bio is being queued up.
Reported-by: Lina Lu <lulina_nuaa@foxmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

04521db0
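
The effect of the stale slice described in 04521db0 can be seen with a little arithmetic. The following is a minimal sketch in C, assuming a simplified model in which a group may dispatch `bps * elapsed` bytes within its slice. The names `struct slice`, `wait_ms()` and `set_limit()` are hypothetical and are not taken from blk-throttle.

```c
#include <stdint.h>

/* Hypothetical, simplified model of a throttle group slice. */
struct slice {
    uint64_t start_ms;    /* when the current slice began */
    uint64_t bytes_disp;  /* bytes dispatched since the slice began */
    uint64_t bps;         /* current limit in bytes per second */
};

/* Milliseconds a bio of `size` bytes has to wait at time `now_ms`. */
static uint64_t wait_ms(const struct slice *s, uint64_t now_ms, uint64_t size)
{
    uint64_t elapsed = now_ms - s->start_ms;
    uint64_t allowed = s->bps * elapsed / 1000;
    uint64_t needed = s->bytes_disp + size;

    if (needed <= allowed)
        return 0;
    /* Time at which the budget covers `needed`, minus time already spent. */
    return (needed * 1000 + s->bps - 1) / s->bps - elapsed;
}

/* Change the limit; with `reset` set, a fresh slice is started so that
 * earlier dispatches are not charged against the new rate. */
static void set_limit(struct slice *s, uint64_t now_ms, uint64_t bps, int reset)
{
    s->bps = bps;
    if (reset) {
        s->start_ms = now_ms;
        s->bytes_disp = 0;
    }
}
```

In this model, a group that dispatched 100 MB in one second and is then limited to 1000 bytes per second would wait roughly 10^8 ms for its next 1000-byte bio if the slice is kept, and about one second if the slice is reset.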

blk-cgroup: Only give unaccounted_time under debug · 9026e521

Committed by Justin TerAvest on Mar 22, 2011

This change moves unaccounted_time to only be reported when
CONFIG_DEBUG_BLK_CGROUP is true.
Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

9026e521

cfq-iosched: Don't set active queue in preempt · eda5e0c9

Committed by Justin TerAvest on Mar 22, 2011

Commit "Add unaccounted time to timeslice_used" changed the behavior of
cfq_preempt_queue to set cfqq active. Vivek pointed out that other
preemption rules might get involved, so we shouldn't manually set which
queue is active.

This cleans up the code to just clear the queue stats at preemption
time.
Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

eda5e0c9

21 Mar, 2011 1 commit

block: attempt to merge with existing requests on plug flush · 5e84ea3a

Committed by Jens Axboe on Mar 21, 2011

One of the disadvantages of on-stack plugging is that we potentially
lose out on merging since all pending IO isn't always visible to
everybody. When we flush the on-stack plugs, right now we don't do
any checks to see if potential merge candidates could be utilized.

Correct this by adding a new insert variant, ELEVATOR_INSERT_SORT_MERGE.
It works just like ELEVATOR_INSERT_SORT, but first checks whether we can
merge with an existing request, and only does the insertion if merging
fails.

This fixes a regression with multiple processes issuing IO that
can be merged.

Thanks to Shaohua Li <shaohua.li@intel.com> for testing and fixing
an accounting bug.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

5e84ea3a
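
The "merge first, then sorted insert" idea from 5e84ea3a can be sketched in plain C as follows. This is an illustration under stated assumptions, not the elevator code: `struct req`, `struct queue` and all function names are hypothetical, requests are assumed not to overlap, and the queue is a small fixed array rather than the kernel's data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_REQS 16

/* Hypothetical request: a contiguous run of sectors. */
struct req {
    uint64_t sector;
    uint64_t nr_sectors;
};

struct queue {
    struct req reqs[MAX_REQS];  /* kept sorted by start sector */
    size_t count;
};

/* Try to merge `rq` into an existing request; return 1 on success. */
static int try_merge(struct queue *q, const struct req *rq)
{
    for (size_t i = 0; i < q->count; i++) {
        struct req *cur = &q->reqs[i];

        if (cur->sector + cur->nr_sectors == rq->sector) {
            cur->nr_sectors += rq->nr_sectors;  /* back merge */
            return 1;
        }
        if (rq->sector + rq->nr_sectors == cur->sector) {
            cur->sector = rq->sector;           /* front merge */
            cur->nr_sectors += rq->nr_sectors;
            return 1;
        }
    }
    return 0;
}

/* Sorted insert; returns 0 on success, -1 if the queue is full. */
static int insert_sort(struct queue *q, const struct req *rq)
{
    size_t i = q->count;

    if (q->count == MAX_REQS)
        return -1;
    while (i > 0 && q->reqs[i - 1].sector > rq->sector) {
        q->reqs[i] = q->reqs[i - 1];
        i--;
    }
    q->reqs[i] = *rq;
    q->count++;
    return 0;
}

/* Merge first, and fall back to a sorted insert only if merging fails. */
static int insert_sort_merge(struct queue *q, const struct req *rq)
{
    if (try_merge(q, rq))
        return 0;
    return insert_sort(q, rq);
}
```

Inserting sectors 0-7 and 16-23 and then 8-15 leaves two entries in this sketch, because the third request is folded into the first instead of becoming a separate entry.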

17 Mar, 2011 1 commit

cfq-iosched: Don't update group weights when on service tree · 8184f93e

Committed by Justin TerAvest on Mar 17, 2011

Version 3 is updated to apply to for-2.6.39/core.

For version 2, I took Vivek's advice and made sure we update the group
weight from cfq_group_service_tree_add().

If a weight was updated while a group is on the service tree, the
calculation for the total weight of the service tree can be adjusted
improperly, which either leads to bad service tree weights, or
potentially crashes (if total_weight becomes 0).

This patch defers updates to the weight until a group is off the service
tree.
Signed-off-by: Justin TerAvest <teravest@google.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

8184f93e
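
The deferral described in 8184f93e can be illustrated with a small C sketch. It assumes a drastically simplified service tree that only tracks a total weight. The structure and function names below are hypothetical stand-ins, not the CFQ ones.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for a CFQ group and its service tree. */
struct group {
    unsigned int weight;      /* weight currently counted in the tree */
    unsigned int new_weight;  /* pending weight requested by the user */
    bool needs_update;
    bool on_tree;
};

struct service_tree {
    unsigned int total_weight;
};

/* A weight change is only recorded here; it is applied in tree_add(). */
static void group_set_weight(struct group *g, unsigned int w)
{
    g->new_weight = w;
    g->needs_update = true;
}

static void tree_add(struct service_tree *st, struct group *g)
{
    if (g->needs_update) {    /* safe: the group is off the tree here */
        g->weight = g->new_weight;
        g->needs_update = false;
    }
    st->total_weight += g->weight;
    g->on_tree = true;
}

static void tree_del(struct service_tree *st, struct group *g)
{
    st->total_weight -= g->weight;  /* the same value that was added */
    g->on_tree = false;
}
```

Because the weight that is subtracted on removal is always the one that was added, the total cannot drift or drop to zero while groups are still queued, which is the failure mode the commit message describes.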

12 Mar, 2011 2 commits

blk-cgroup: Add unaccounted time to timeslice_used. · 167400d3

Committed by Justin TerAvest on Mar 12, 2011

There are two kinds of time that tasks are not charged for: the first
seek and the extra time slice used over the allocated timeslice. Both
of these are exported as a new unaccounted_time stat.

I think it would be good to have this reported in 'time' as well, but
that is probably a separate discussion.
Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

167400d3

block: remove obsolete comments for blkdev_issue_zeroout. · eba2ed9c

Committed by Tao Ma on Mar 11, 2011

barrier is already removed, so remove the obsolete comments
in blkdev_issue_zeroout.

Cc: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

eba2ed9c

11 Mar, 2011 1 commit

block: fix mis-synchronisation in blkdev_issue_zeroout() · 0aeea189

Committed by Lukas Czerner on Mar 11, 2011

BZ29402
https://bugzilla.kernel.org/show_bug.cgi?id=29402

We can hit serious mis-synchronization in bio completion path of
blkdev_issue_zeroout() leading to a panic.

The problem is that when we are going to wait_for_completion() in
blkdev_issue_zeroout() we check if bb.done equals issued (the number of
submitted bios). If it does, we can skip the wait_for_completion()
and just return from the function since there is nothing to wait for.
However, there is an ordering problem because bio_batch_end_io() is
calling atomic_inc(&bb->done) before complete(), hence it might seem to
blkdev_issue_zeroout() that all bios have been completed, and it exits. At
this point, when bio_batch_end_io() is going to call complete(bb->wait),
bb and wait no longer exist since they were allocated on the stack in
blkdev_issue_zeroout() ==> panic!

(thread 1)                      (thread 2)
bio_batch_end_io()              blkdev_issue_zeroout()
  if(bb) {                      ...
    if (bb->end_io)             ...
      bb->end_io(bio, err);     ...
    atomic_inc(&bb->done);      ...
    ...                         while (issued != atomic_read(&bb.done))
    ...                         (let issued == bb.done)
    ...                         (do the rest of the function)
    ...                         return ret;
    complete(bb->wait);
    ^^^^^^^^
    panic

We can fix this easily by simplifying bio_batch and completion counting.

Also remove bio_end_io_t *end_io since it is not used.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Eric Whitney <eric.whitney@hp.com>
Tested-by: Eric Whitney <eric.whitney@hp.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
CC: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

0aeea189
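
One common way to close the window described in 0aeea189 is to let a single atomic decrement decide who signals completion, with the submitter holding one extra reference. The sketch below shows that pattern in userspace C11. It is an assumption about the general technique, not a copy of the patch; `struct batch` and its helpers are hypothetical, and a plain flag stands in for `complete()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a bio batch on the submitter's stack. */
struct batch {
    atomic_int pending;  /* outstanding bios plus one for the submitter */
    bool completed;      /* stands in for complete(bb->wait) */
};

static void batch_init(struct batch *bb)
{
    atomic_init(&bb->pending, 1);  /* the submitter's own reference */
    bb->completed = false;
}

static void batch_submit_one(struct batch *bb)
{
    atomic_fetch_add(&bb->pending, 1);
}

/* Completion path: whoever drops the last reference signals. If that is
 * an end_io call, the submitter is still waiting, so the batch is alive. */
static void batch_end_io(struct batch *bb)
{
    if (atomic_fetch_sub(&bb->pending, 1) == 1)
        bb->completed = true;
}

/* Submitter: drop its own reference; returns true if it has to wait. */
static bool batch_must_wait(struct batch *bb)
{
    return atomic_fetch_sub(&bb->pending, 1) != 1;
}
```

With this scheme the submitter either sees that it dropped the last reference and returns without waiting, or it waits and is woken by the final completion. There is no separate counter update followed later by a wake-up on memory that may already be gone.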

10 Mar, 2011 6 commits

blk-throttle: Use blk_plug in throttle dispatch · 69d60eb9

Committed by Vivek Goyal on Mar 09, 2011

Use plug in throttle dispatch also as we are dispatching a bunch of
bios in throttle context and some of them might merge.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

69d60eb9

block: kill off REQ_UNPLUG · 721a9602

Committed by Jens Axboe on Mar 09, 2011

With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

721a9602

block: remove per-queue plugging · 7eaceacc

Committed by Jens Axboe on Mar 10, 2011

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So let's kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

7eaceacc

block: initial patch for on-stack per-task plugging · 73c10101

Committed by Jens Axboe on Mar 08, 2011

This patch adds support for creating a queuing context outside
of the queue itself. This enables us to batch up pieces of IO
before grabbing the block device queue lock and submitting them to
the IO scheduler.

The context is created on the stack of the process and assigned in
the task structure, so that we can auto-unplug it if we hit a schedule
event.

The current queue plugging happens implicitly if IO is submitted to
an empty device, yet callers have to remember to unplug that IO when
they are going to wait for it. This is an ugly API and has caused bugs
in the past. Additionally, it requires hacks in the vm (->sync_page()
callback) to handle that logic. By switching to an explicit plugging
scheme we make the API a lot nicer and can get rid of the ->sync_page()
hack in the vm.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

73c10101
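
The batching benefit that 73c10101 describes can be sketched in C with a toy model. Everything here is hypothetical: `struct dev_queue`, `struct plug` and the helper names are illustrative only and are not the kernel's plugging interface, and a counter stands in for taking the queue lock.

```c
#include <stddef.h>

#define PLUG_MAX 8

/* Hypothetical device queue; lock_taken counts lock acquisitions. */
struct dev_queue {
    unsigned int lock_taken;
    unsigned int dispatched;
};

/* Per-task plug living on the submitter's stack. */
struct plug {
    struct dev_queue *q;
    int pending[PLUG_MAX];
    size_t count;
};

static void queue_dispatch(struct dev_queue *q, const int *reqs, size_t n)
{
    (void)reqs;                      /* request payload is unused here */
    q->lock_taken++;                 /* one lock round trip per batch */
    q->dispatched += (unsigned int)n;
}

static void plug_start(struct plug *p, struct dev_queue *q)
{
    p->q = q;
    p->count = 0;
}

static void plug_finish(struct plug *p)
{
    if (p->count)
        queue_dispatch(p->q, p->pending, p->count);
    p->count = 0;
}

/* With a plug, requests are batched; without one, each goes straight in. */
static void submit(struct dev_queue *q, struct plug *p, int req)
{
    if (!p) {
        queue_dispatch(q, &req, 1);
        return;
    }
    if (p->count == PLUG_MAX)
        plug_finish(p);              /* flush a full plug before adding */
    p->pending[p->count++] = req;
}
```

Submitting four requests without a plug takes the simulated lock four times in this model, while the same four requests submitted between `plug_start()` and `plug_finish()` take it once.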

block: add API for delaying work/request_fn a little bit · 3cca6dc1

Committed by Jens Axboe on Mar 02, 2011

Currently we use plugging for that, but as plugging is going away,
we need an alternative mechanism.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

3cca6dc1

block: Don't implicitly trigger event check on disk_unblock_events() · facc31dd

Committed by Tejun Heo on Mar 09, 2011

Currently, disk_unblock_events() implicitly kicks an event check if the
block count reaches zero.  This behavior is not described in the
comment and hinders future changes.  Make the unblocker
explicitly check events by calling disk_check_events() as necessary.

This patch doesn't cause any behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>

facc31dd

09 Mar, 2011 1 commit

blk-cgroup: Lower minimum weight from 100 to 10. · df457f84

Committed by Justin TerAvest on Mar 08, 2011

We've found that we still get good, useful isolation at weights this
low. I'd like to adjust the minimum so that any other changes can take
these values into account.
Signed-off-by: Justin TerAvest <teravest@google.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

df457f84

08 Mar, 2011 2 commits

blk-throttle: Some cleanups and race fixes in limit update code · de701c74

Committed by Vivek Goyal on Mar 07, 2011

When throttle group limits are updated through cgroups, a thread is
woken up to process these updates. While reviewing that code, Oleg noted
that a couple of race conditions existed in the code, and he also suggested
that the code could be simplified.

This patch fixes the races and simplifies the code based on Oleg's suggestions:

	- Use xchg().
	- Introduce a common function, throtl_update_blkio_group_common(),
	  which is now shared by all iops/bps update functions.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>

Fixed a merge issue, throtl_schedule_delayed_work() takes throtl_data
as the argument now, not the queue.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

de701c74

blk-throttle: process limit change only through one function · 231d704b

Committed by Vivek Goyal on Mar 07, 2011

With the help of the cgroup interface one can update the bps/iops
limits of an existing group. Once the limits are updated, a thread is
woken up to see if some blocked group needs recalculation based on the new
limits and needs to be requeued.

There was also a piece of code where I was checking for group limit
update when a fresh bio comes in. This patch gets rid of that piece of
code and keeps processing the limit change at one place
throtl_process_limit_change().  It just keeps the code simple and easy
to understand.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

231d704b

07 Mar, 2011 3 commits

cfq-iosched: Fix update_vdisktime logic · a6032710

Committed by Gui Jianfeng on Mar 07, 2011

The update_vdisktime logic has been broken since commit
b54ce60e: st->min_vdisktime never makes
progress. Fix it.

Thanks Vivek for pointing it out.
Signed-off-by: Gui Jianfeng <guijianfen@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

a6032710

cfq-iosched: give busy sync queue no dispatch limit · ef8a41df

Committed by Shaohua Li on Mar 07, 2011

If there are a sync and an async queue and the sync queue's think time
is small, we can ignore the sync queue's dispatch quantum. Because the
sync queue will always preempt the async queue, we don't need to care
about async's latency.  This can fix a performance regression of
aiostress test, which is introduced by commit f8ae6e3e. The issue
should exist even without the commit, but the commit amplifies the
impact.

The initial post does the same optimization for RT queue too, but since
I have no real workload for it, Vivek suggests to drop it.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

ef8a41df

cfq-iosched: fix race in cfq_set_request() · 93803e01

Committed by Jens Axboe on Mar 07, 2011

We need to hold the queue lock over the reference increment,
it's not atomic anymore.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

93803e01

03 Mar, 2011 3 commits

block: Move blk_throtl_exit() call to blk_cleanup_queue() · da527770

Committed by Vivek Goyal on Mar 02, 2011

Move blk_throtl_exit() into blk_cleanup_queue(), as blk_throtl_exit() is
written in such a way that it needs the queue lock. In blk_release_queue()
there is no guarantee that ->queue_lock is still around.

Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported
one problem.

  https://lkml.org/lkml/2010/10/23/86

  And a quick fix moved blk_throtl_exit() to blk_release_queue().

        commit 7ad58c02
        Author: Jens Axboe <jaxboe@fusionio.com>
        Date:   Sat Oct 23 20:40:26 2010 +0200

        block: fix use-after-free bug in blk throttle code

This patch reverts above change and does not try to shutdown the
throtl work in blk_sync_queue(). By avoiding call to
throtl_shutdown_timer_wq() from blk_sync_queue(), we should also avoid
the problem reported by Ingo.

blk_sync_queue() seems to be used only by md driver and it seems to be
using it to make sure q->unplug_fn is not called as md registers its
own unplug functions and it is about to free up the data structures
used by unplug_fn(). Block throttle does not call back into unplug_fn()
or into md. So there is no need to cancel blk throttle work.

In fact I think cancelling block throttle work is bad because it might
happen that some bios are throttled and scheduled to be dispatched later
with the help of pending work and if work is cancelled, these bios might
never be dispatched.

Block layer also uses blk_sync_queue() during blk_cleanup_queue() and
blk_release_queue() time. That should be safe as we are also calling
blk_throtl_exit() which should make sure all the throttling related
data structures are cleaned up.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

da527770

block: Initialize ->queue_lock to internal lock at queue allocation time · c94a96ac

Committed by Vivek Goyal on Mar 02, 2011

There does not seem to be a clear convention whether q->queue_lock is
initialized or not when blk_cleanup_queue() is called. In the past it
was not necessary but now blk_throtl_exit() takes up queue lock by
default and needs queue lock to be available.

In fact elevator_exit() code also has similar requirement just that it
is less stringent in the sense that elevator_exit() is called only if
elevator is initialized.

Two problems have been noticed because of ambiguity about spin lock
status.

      - If a driver calls blk_alloc_queue() and then calls
        blk_cleanup_queue() almost immediately (because some other
        driver structure allocation failed or some other error happened),
        then blk_throtl_exit() will run into issues as the queue lock is not
        initialized. The loop driver ran into this issue recently and I
        noticed error paths in the md driver too. Similar error paths should
        exist in other drivers too.

      - If some driver provided an external spin lock and zapped the lock
        before blk_cleanup_queue(), then it can lead to issues.

So this patch initializes the default queue lock at queue allocation time.

The block throttling code is one of the users of the queue lock and it is
initialized at queue allocation time, so it makes sense to
initialize ->queue_lock to the internal lock as well. A driver can override
that lock later. This means a driver does not have
to worry about initializing the queue lock to the default before calling
blk_cleanup_queue().
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

c94a96ac

block/genhd: Change some numerals into macros · 53f22956

Committed by Liu Yuan on Mar 02, 2011

Rename the numerals in the diskstats_show() into the macros.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

53f22956

02 Mar, 2011 5 commits

block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue() · 255bb490

Committed by Tejun Heo on Mar 02, 2011

blk-flush decomposes a flush into sequence of multiple requests.  On
completion of a request, the next one is queued; however, block layer
must not implicitly call into q->request_fn() directly from completion
path.  This makes the queue behave unexpectedly when seen from the
drivers and violates the assumption that q->request_fn() is called
with process context + queue_lock.

This patch makes the following two changes to blk-flush to make sure
q->request_fn() is not called directly from the request completion path.

- blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always
  use kblockd instead of calling directly into q->request_fn().

- queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of
  ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the
  request queue directly.

Reported by Jan in the following threads.

 http://thread.gmane.org/gmane.linux.ide/48778
 http://thread.gmane.org/gmane.linux.ide/48786

stable: applicable to v2.6.37.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

255bb490

block: add @force_kblockd to __blk_run_queue() · 1654e741

Committed by Tejun Heo on Mar 02, 2011

__blk_run_queue() automatically either calls q->request_fn() directly
or schedules kblockd depending on whether the function is recursed.
blk-flush implementation needs to be able to explicitly choose
kblockd.  Add @force_kblockd.

All the current users are converted to specify %false for the
parameter and this patch doesn't introduce any behavior change.

stable: This is prerequisite for fixing ide oops caused by the new
        blk-flush implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

1654e741
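
The control flow that 1654e741 describes (call directly unless recursing, or always defer when forced) can be sketched in C as below. The structure and function names are hypothetical, and a counter stands in for scheduling work on kblockd.

```c
#include <stdbool.h>

/* Hypothetical queue state, used only to illustrate the control flow. */
struct rq_queue {
    bool in_request_fn;          /* set while request_fn is running */
    unsigned int direct_calls;
    unsigned int deferred_runs;  /* stands in for scheduling kblockd work */
};

static void request_fn(struct rq_queue *q)
{
    q->direct_calls++;
}

static void run_queue(struct rq_queue *q, bool force_kblockd)
{
    /* Call directly only when not forced to defer and not recursing. */
    if (!force_kblockd && !q->in_request_fn) {
        q->in_request_fn = true;
        request_fn(q);
        q->in_request_fn = false;
    } else {
        q->deferred_runs++;
    }
}
```

Passing `false` keeps the earlier behaviour in this model, which matches the commit's statement that existing callers see no change. A completion path would pass `true` so that it never enters `request_fn()` itself.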

cfq-iosched: Always provide group isolation. · 0bbfeb83

Committed by Justin TerAvest on Mar 01, 2011

Effectively, make group_isolation=1 the default and remove the tunable.
The group_isolation=0 setting existed because by default we idle on
the sync-noidle tree, and on fast devices this can be very harmful for
throughput.

However, this problem can also be addressed by tuning slice_idle and
possibly group_idle on faster storage devices.

This change simplifies the CFQ code by removing the feature entirely.
Signed-off-by: Justin TerAvest <teravest@google.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

0bbfeb83

block: fix kernel-doc format for blkdev_issue_zeroout · 291d24f6

Committed by Ben Hutchings on Mar 01, 2011

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

291d24f6

blk-throttle: Do not use kblockd workqueue for throtl work · 450adcbe

Committed by Vivek Goyal on Mar 01, 2011

o Dominik Klein reported a system hang issue while doing some blkio
  throttling testing.

  https://lkml.org/lkml/2011/2/24/173

o Some tracing revealed that CFQ was not dispatching any more jobs as
  queue unplug was not happening. And queue unplug was not happening
  because unplug work was not being called as there was one throttling
  work on the same cpu which had not finished yet. And throttling work had
  not finished as it was trying to dispatch a bio to CFQ but all the request
  descriptors were consumed, so it was put to sleep.

o So basically it is a cyclic dependency between CFQ unplug work and
  throtl dispatch work. Tejun suggested using a separate workqueue for
  such cases.

o This patch uses a separate workqueue for throttle related work and
  does not rely on kblockd workqueue anymore.

Cc: stable@kernel.org
Reported-by: Dominik Klein <dk@in-telegence.net>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

450adcbe

25 Feb, 2011 1 commit

block: fix refcounting in BLKBSZSET · 3c522ced

Committed by Miklos Szeredi on Feb 24, 2011

Adam Kovari and others reported that disconnecting a USB drive with
an ntfs-3g filesystem would cause "kernel BUG at fs/inode.c:1421!" to
be triggered.

The BUG could be traced back to ioctl(BLKBSZSET), which would
erroneously decrement the refcount on the bdev.  This is because
blkdev_get() expects the refcount to be already incremented and either
returns success or decrements the refcount and returns an error.

The bug was introduced by e525fd89 (block: make blkdev_get/put()
handle exclusive access), which didn't take into account this behavior
of blkdev_get().

This fixes
  https://bugzilla.kernel.org/show_bug.cgi?id=29202
(and likely 29792 too)
Reported-by: Adam Kovari <kovariadam@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3c522ced

24 Feb, 2011 1 commit

Fix over-zealous flush_disk when changing device size. · 93b270f7

Committed by NeilBrown on Feb 24, 2011

There are two cases when we call flush_disk.
In one, the device has disappeared (check_disk_change) so any
data we hold becomes irrelevant.
In the other, the device has changed size (check_disk_size_change)
so data we hold may be irrelevant.

In both cases it makes sense to discard any 'clean' buffers,
so they will be read back from the device if needed.

In the former case it makes sense to discard 'dirty' buffers
as there will never be anywhere safe to write the data.  In the
second case it *does*not* make sense to discard dirty buffers
as that will lead to file system corruption when you simply enlarge
the containing devices.

flush_disk calls __invalidate_device.
__invalidate_device calls both invalidate_inodes and invalidate_bdev.

invalidate_inodes *does* discard I_DIRTY inodes and this does lead
to fs corruption.

invalidate_bdev *does*not* discard dirty pages, but I don't really care
about that at present.

So this patch adds a flag to __invalidate_device (calling it
__invalidate_device2) to indicate whether dirty buffers should be
killed, and this is passed to invalidate_inodes which can choose to
skip dirty inodes.

flush_disk then passes true from check_disk_change and false from
check_disk_size_change.

dm avoids tripping over this problem by calling i_size_write directly
rather than using check_disk_size_change.

md does use check_disk_size_change and so is affected.

This regression was introduced by commit 608aeef1 which causes
check_disk_size_change to call flush_disk, so it is suitable for any
kernel since 2.6.27.

Cc: stable@kernel.org
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Andrew Patterson <andrew.patterson@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NeilBrown <neilb@suse.de>

93b270f7
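
The flag-passing scheme that 93b270f7 describes can be sketched in C as follows. This is a toy model under stated assumptions: `struct cached_inode` and the function names are hypothetical, and an array of flags stands in for the kernel's inode cache.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached inode. */
struct cached_inode {
    bool dirty;
    bool valid;
};

/* Drop cached inodes; dirty ones are dropped only if kill_dirty is set.
 * Returns the number of inodes invalidated. */
static size_t invalidate_inodes_sketch(struct cached_inode *inodes, size_t n,
                                       bool kill_dirty)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        if (!inodes[i].valid)
            continue;
        if (inodes[i].dirty && !kill_dirty)
            continue;           /* keep data that still has to be written */
        inodes[i].valid = false;
        dropped++;
    }
    return dropped;
}

/* Device gone: nothing can be written back, so dirty data is discarded. */
static size_t on_disk_change(struct cached_inode *inodes, size_t n)
{
    return invalidate_inodes_sketch(inodes, n, true);
}

/* Device resized: keep dirty data to avoid corrupting the file system. */
static size_t on_size_change(struct cached_inode *inodes, size_t n)
{
    return invalidate_inodes_sketch(inodes, n, false);
}
```

The two wrappers mirror the two callers named in the commit message: the removal path passes true and the resize path passes false.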

13 Feb, 2011 1 commit

[SCSI] block: improve detail in I/O error messages · 79775567

Committed by Hannes Reinecke on Jan 18, 2011

Classify severity of I/O errors for target, nexus, and
transport errors.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>

79775567

11 Feb, 2011 2 commits

block: share request flush fields with elevator_private · c186794d

Committed by Mike Snitzer on Feb 11, 2011

Flush requests are never put on the IO scheduler.  Convert request
structure's elevator_private* into an array and have the flush fields
share a union with it.

Reclaim the space lost in 'struct request' by moving 'completion_data'
back in the union with 'rb_node'.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

c186794d

block: skip elevator data initialization for flush requests · 9d5a4e94

Committed by Mike Snitzer on Feb 11, 2011

Skip elevator initialization for flush requests by passing priv=0 to
blk_alloc_request() in get_request().  As such elv_set_request() is
never called for flush requests.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

9d5a4e94

09 Feb, 2011 1 commit

cfq-iosched: Don't wait if queue already has requests. · 02a8f01b

Committed by Justin TerAvest on Feb 09, 2011

Commit 7667aa06 added logic to wait for
the last queue of the group to become busy (have at least one request),
so that the group does not lose out for not being continuously
backlogged. The commit did not check for the condition that the last
queue already has some requests. As a result, if the queue already has
requests, wait_busy is set. Later on, cfq_select_queue() checks the
flag, and decides that since the queue has a request now and wait_busy
is set, the queue is expired.  This results in early expiration of the
queue.

This patch fixes the problem by adding a check to see if queue already
has requests. If it does, wait_busy is not set. As a result, time slices
do not expire early.

The queues with more than one request are usually buffered writers.
Testing shows improvement in isolation between buffered writers.

Cc: stable@kernel.org
Signed-off-by: Justin TerAvest <teravest@google.com>
Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

02a8f01b

25 Jan, 2011 1 commit

block: reimplement FLUSH/FUA to support merge · ae1b1539

Committed by Tejun Heo on Jan 25, 2011

The current FLUSH/FUA support has evolved from the implementation
which had to perform queue draining.  As such, sequencing is done
queue-wide one flush request after another.  However, with the
draining requirement gone, there's no reason to keep the queue-wide
sequential approach.

This patch reimplements FLUSH/FUA support such that each FLUSH/FUA
request is sequenced individually.  The actual FLUSH execution is
double buffered and whenever a request wants to execute one for either
PRE or POSTFLUSH, it queues on the pending queue.  Once certain
conditions are met, a flush request is issued and on its completion
all pending requests proceed to the next sequence.

This allows arbitrary merging of different types of flushes.  How they
are merged can be primarily controlled and tuned by adjusting the
above-mentioned 'conditions' used to determine when to issue the next
flush.

This is inspired by Darrick's patches to merge multiple zero-data
flushes which helps workloads with highly concurrent fsync requests.

* As flush requests are never put on the IO scheduler, request fields
  used for flush share space with rq->rb_node.  rq->completion_data is
  moved out of the union.  This increases the request size by one
  pointer.

  As rq->elevator_private* are used only by the iosched too, it is
  possible to reduce the request size further.  However, to do that,
  we need to modify request allocation path such that iosched data is
  not allocated for flush requests.

* FLUSH/FUA processing happens on insertion now instead of dispatch.

- Comments updated as per Vivek and Mike.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

ae1b1539
