提交 · 80268ee9270ebe4847365a7426de91d179e870d0 · openeuler / raspberrypi-kernel

13 10月, 2008 3 次提交

md: Don't try to set an array to 'read-auto' if it is already in that state. · 80268ee9

由 NeilBrown 提交于 10月 13, 2008

'read-auto' is a variant of 'readonly' which will switch to writable
on the first write attempt.

Calling do_md_stop to set the array readonly when it is already readonly
returns an error.  So make sure not to do that.
Signed-off-by: NNeilBrown <neilb@suse.de>

80268ee9

md: Allow metadata_version to be updated for externally managed metadata. · ea43ddd8

由 NeilBrown 提交于 10月 13, 2008

For externally managed metadata, the 'metadata_version' sysfs
attribute is really just a channel for user-space programs to
communicate about how the array is being managed.
It can be useful for this to be changed while the array is active.

Normally changes to metadata_version are not permitted while the array
is active.  Change that so that if the metadata is externally managed,
the metadata_version can be changed to a different flavour of external
management.
Signed-off-by: NNeilBrown <neilb@suse.de>

ea43ddd8

md: Fix rdev_size_store with size == 0 · 7d3c6f87

由 Chris Webb 提交于 10月 13, 2008


Fix rdev_size_store with size == 0.
size == 0 means to use the largest size allowed by the
underlying device and is used when modifying an active array.

This fixes a regression introduced by
 commit d7027458

Cc: <stable@kernel.org>
Signed-off-by: NChris Webb <chris@arachsys.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

7d3c6f87

10 10月, 2008 18 次提交

dm: detect lost queue · 0c2322e4

由 Alasdair G Kergon 提交于 10月 10, 2008

Detect and report buggy drivers that destroy their request_queue.
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
Cc: Stefan Raspl <raspl@linux.vnet.ibm.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>

0c2322e4

dm: publish dm_vcalloc · 54160904

由 Mikulas Patocka 提交于 10月 10, 2008

Publish dm_vcalloc in include/linux/device-mapper.h because this function is
used by targets.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

54160904

dm: publish dm_table_unplug_all · ea0ec640

由 Mikulas Patocka 提交于 10月 10, 2008

Publish dm_table_unplug_all in include/linux/device-mapper.h because this
function is used by targets.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

ea0ec640

dm: publish dm_get_mapinfo · 89343da0

由 Mikulas Patocka 提交于 10月 10, 2008

Publish dm_get_mapinfo in include/linux/device-mapper.h because this function
is used by targets.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

89343da0

dm: export struct dm_dev · 82b1519b

由 Mikulas Patocka 提交于 10月 10, 2008

Split struct dm_dev in two and publish the part that other targets need in
include/linux/device-mapper.h.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

82b1519b

dm crypt: avoid unnecessary wait when splitting bio · 933f01d4

由 Milan Broz 提交于 10月 10, 2008

Don't wait between submitting crypt requests for a bio unless
we are short of memory.

There are two situations when we must split an encrypted bio:
  1) there are no free pages;
  2) the new bio would violate underlying device restrictions
(e.g. max hw segments).

In case (2) we do not need to wait.

Add output variable to crypt_alloc_buffer() to distinguish between
these cases.
Signed-off-by: NMilan Broz <mbroz@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

933f01d4

dm crypt: tidy ctx pending · c8081618

由 Milan Broz 提交于 10月 10, 2008

Move the initialisation of ctx->pending into one place, at the
start of crypt_convert().

Introduce crypt_finished to indicate whether or not the encryption
is finished, for use in a later patch.

No functional change.
Signed-off-by: NMilan Broz <mbroz@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

c8081618

dm crypt: fix async inc_pending · 4e594098

由 Milan Broz 提交于 10月 10, 2008

The pending reference count must be incremented *before* the async work is
queued to another thread, not after. Otherwise there's a race if the
work completes and decrements the reference count before it gets incremented.
Signed-off-by: NMilan Broz <mbroz@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

4e594098

dm crypt: move dec_pending on error into write_io_submit · 6c031f41

由 Milan Broz 提交于 10月 10, 2008

Make kcryptd_crypt_write_io_submit() responsible for decrementing
the pending count after an error.

Also fixes a bug in the async path that forgot to decrement it.
Signed-off-by: NMilan Broz <mbroz@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

6c031f41

dm crypt: remove inc_pending from write_io_submit · 1e37bb8e

由 Alasdair G Kergon 提交于 10月 10, 2008

Make the caller reponsible for incrementing the pending count before calling
kcryptd_crypt_write_io_submit() in the non-async case to bring it into line
with the async case.
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

1e37bb8e

dm crypt: tidy write loop pending · fc5a5e9a

由 Milan Broz 提交于 10月 10, 2008

Move kcryptd_crypt_write_convert_loop inside kcryptd_crypt_write_convert.
This change is needed for a later patch.

No functional change.
Signed-off-by: NMilan Broz <mbroz@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

fc5a5e9a

dm crypt: tidy crypt alloc · dc440d1e

由 Milan Broz 提交于 10月 10, 2008

Factor out crypt io allocation code.
Later patches will call it from another place.

No functional change.
Signed-off-by: NMilan Broz <mbroz@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

dc440d1e

dm crypt: tidy inc pending · 3e1a8bdd

由 Milan Broz 提交于 10月 10, 2008

Move io pending to one place.

No functional change, usefull to simplify debugging.
Signed-off-by: NMilan Broz <mbroz@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

3e1a8bdd

dm exception store: use chunk_t for_areas · fd14acf6

由 Mikulas Patocka 提交于 10月 10, 2008

Change uint32_t into chunk_t to remove 32-bit limitation on the
number of chunks on systems with 64-bit sector numbers.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

fd14acf6

dm exception store: introduce area_location function · a481db78

由 Mikulas Patocka 提交于 10月 10, 2008

Move this logic to a function, because it will be reused later.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

a481db78

dm raid1: kcopyd should stop on error if errors handled · f7c83e2e

由 Jonathan Brassow 提交于 10月 10, 2008

dm-raid1 is setting the 'DM_KCOPYD_IGNORE_ERROR' flag unconditionally
when assigning kcopyd work.  kcopyd is responsible for copying an
assigned section of disk to one or more other disks.  The
'DM_KCOPYD_IGNORE_ERROR' flag affects kcopyd in the following way:

When not set:
kcopyd will immediately stop the copy operation when an error is
encountered.

When set:
kcopyd will try to proceed regardless of errors and try to continue
copying any remaining amount.

Since dm-raid1 tracks regions of the address space that are (or
are not) in sync and it now has the ability to handle these
errors, we can safely enable this optimization.  This optimization
is conditional on whether mirror error handling has been enabled.
Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

f7c83e2e

dm mpath: remove is_active from struct dm_path · 6680073d

由 Kiyoshi Ueda 提交于 10月 10, 2008

This patch moves 'is_active' from struct dm_path to struct pgpath
as it does not need exporting.
Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

6680073d

dm mpath: use more error codes · 01460f35

由 Benjamin Marzinski 提交于 10月 10, 2008

This patch allows path errors from the multipath ctr function to
propagate up to userspace as errno values from the ioctl() call.

This is in response to
  https://www.redhat.com/archives/dm-devel/2008-May/msg00000.html
and
  https://bugzilla.redhat.com/show_bug.cgi?id=444421

The patch only lets through the errors that it needs to in order to
get the path errors from parse_path().
Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

01460f35

09 10月, 2008 11 次提交

block: mark bio_split_pool static · 6feef531

由 Denis ChengRq 提交于 10月 09, 2008

Since all bio_split calls refer the same single bio_split_pool, the bio_split
function can use bio_split_pool directly instead of the mempool_t parameter;

then the mempool_t parameter can be removed from bio_split param list, and
bio_split_pool is only referred in fs/bio.c file, can be marked static.
Signed-off-by: NDenis ChengRq <crquan@gmail.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

6feef531

dm: Call blk_abort_queue on failed paths · 224cb3e9

由 Mike Anderson 提交于 8月 29, 2008

Signed-off-by: NMike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

224cb3e9

block: move stats from disk to part0 · 074a7aca

由 Tejun Heo 提交于 8月 25, 2008

Move stats related fields - stamp, in_flight, dkstats - from disk to
part0 and unify stat handling such that...

* part_stat_*() now updates part0 together if the specified partition
  is not part0.  ie. part_stat_*() are now essentially all_stat_*().

* {disk|all}_stat_*() are gone.

* part_round_stats() is updated similary.  It handles part0 stats
  automatically and disk_round_stats() is killed.

* part_{inc|dec}_in_fligh() is implemented which automatically updates
  part0 stats for parts other than part0.

* disk_map_sector_rcu() is updated to return part0 if no part matches.
  Combined with the above changes, this makes NULL special case
  handling in callers unnecessary.

* Separate stats show code paths for disk are collapsed into part
  stats show code paths.

* Rename disk_stat_lock/unlock() to part_stat_lock/unlock()

While at it, reposition stat handling macros a bit and add missing
parentheses around macro parameters.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

074a7aca

block: always set bdev->bd_part · 0762b8bd

由 Tejun Heo 提交于 8月 25, 2008

Till now, bdev->bd_part is set only if the bdev was for parts other
than part0.  This patch makes bdev->bd_part always set so that code
paths don't have to differenciate common handling.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

0762b8bd

block: move policy from disk to part0 · b7db9956

由 Tejun Heo 提交于 8月 25, 2008

Move disk->policy to part0->policy.  Implement and use get_disk_ro().
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

b7db9956

block: implement and use {disk|part}_to_dev() · ed9e1982

由 Tejun Heo 提交于 8月 25, 2008

Implement {disk|part}_to_dev() and use them to access generic device
instead of directly dereferencing {disk|part}->dev.  To make sure no
user is left behind, rename generic devices fields to __dev.

This is in preparation of unifying partition 0 handling with other
partitions.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

ed9e1982

block: fix diskstats access · c9959059

由 Tejun Heo 提交于 8月 25, 2008

There are two variants of stat functions - ones prefixed with double
underbars which don't care about preemption and ones without which
disable preemption before manipulating per-cpu counters.  It's unclear
whether the underbarred ones assume that preemtion is disabled on
entry as some callers don't do that.

This patch unifies diskstats access by implementing disk_stat_lock()
and disk_stat_unlock() which take care of both RCU (for partition
access) and preemption (for per-cpu counter access).  diskstats access
should always be enclosed between the two functions.  As such, there's
no need for the versions which disables preemption.  They're removed
and double underbars ones are renamed to drop the underbars.  As an
extra argument is added, there's no danger of using the old version
unconverted.

disk_stat_lock() uses get_cpu() and returns the cpu index and all
diskstat functions which access per-cpu counters now has @cpu
argument to help RT.

This change adds RCU or preemption operations at some places but also
collapses several preemption ops into one at others.  Overall, the
performance difference should be negligible as all involved ops are
very lightweight per-cpu ones.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

c9959059

block: don't depend on consecutive minor space · f331c029

由 Tejun Heo 提交于 9月 03, 2008

* Implement disk_devt() and part_devt() and use them to directly
  access devt instead of computing it from ->major and ->first_minor.

  Note that all references to ->major and ->first_minor outside of
  block layer is used to determine devt of the disk (the part0) and as
  ->major and ->first_minor will continue to represent devt for the
  disk, converting these users aren't strictly necessary.  However,
  convert them for consistency.

* Implement disk_max_parts() to avoid directly deferencing
  genhd->minors.

* Update bdget_disk() such that it doesn't assume consecutive minor
  space.

* Move devt computation from register_disk() to add_disk() and make it
  the only one (all other usages use the initially determined value).

These changes clean up the code and will help disk->part dereference
fix and extended block device numbers.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

f331c029

block: make bi_phys_segments an unsigned int instead of short · 5b99c2ff

由 Jens Axboe 提交于 8月 15, 2008

raid5 can overflow with more than 255 stripes, and we can increase it
to an int for free on both 32 and 64-bit archs due to the padding.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

5b99c2ff

J
block: raid fixups for removal of bi_hw_segments · 960e739d
由 Jens Axboe 提交于 8月 15, 2008
```
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
```
960e739d

drop vmerge accounting · 5df97b91

由 Mikulas Patocka 提交于 8月 15, 2008

Remove hw_segments field from struct bio and struct request. Without virtual
merge accounting they have no purpose.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

5df97b91

01 10月, 2008 3 次提交

dm mpath: add missing path switching locking · 7253a334

由 Chandra Seetharaman 提交于 10月 01, 2008

Moving the path activation to workqueue along with scsi_dh patches introduced
a race. It is due to the fact that the current_pgpath (in the multipath data
structure) can be modified if changes happen in any of the paths leading to
the lun. If the changes lead to current_pgpath being set to NULL, then it
leads to the invalid access which results in the panic below.

This patch fixes that by storing the pgpath to activate in the multipath data
structure and properly protecting it.

Note that if activate_path is called twice in succession with different pgpath,
with the second one being called before the first one is done, then activate
path will be called twice for the second pgpath, which is fine.

Unable to handle kernel paging request for data at address 0x00000020
Faulting instruction address: 0xd000000000aa1844
cpu 0x1: Vector: 300 (Data Access) at [c00000006b987a80]
    pc: d000000000aa1844: .activate_path+0x30/0x218 [dm_multipath]
    lr: c000000000087a2c: .run_workqueue+0x114/0x204
    sp: c00000006b987d00
   msr: 8000000000009032
   dar: 20
 dsisr: 40000000
  current = 0xc0000000676bb3f0
  paca    = 0xc0000000006f3680
    pid   = 2528, comm = kmpath_handlerd
enter ? for help
[c00000006b987da0] c000000000087a2c .run_workqueue+0x114/0x204
[c00000006b987e40] c000000000088b58 .worker_thread+0x120/0x144
[c00000006b987f00] c00000000008ca70 .kthread+0x78/0xc4
[c00000006b987f90] c000000000027cc8 .kernel_thread+0x4c/0x68
Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

7253a334

dm: cope with access beyond end of device in dm_merge_bvec · b01cd5ac

由 Mikulas Patocka 提交于 10月 01, 2008

If for any reason dm_merge_bvec() is given an offset beyond the end of the
device, avoid an oops and always allow one page to be added to an empty bio.
We'll reject the I/O later after the bio is submitted.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

b01cd5ac

dm: always allow one page in dm_merge_bvec · 5037108a

由 Mikulas Patocka 提交于 10月 01, 2008

Some callers assume they can always add at least one page to an empty bio,
so dm_merge_bvec should not return 0 in this case: we'll reject the I/O
later after the bio is submitted.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NAlasdair G Kergon <agk@redhat.com>

5037108a

19 9月, 2008 1 次提交

md: Don't wait UNINTERRUPTIBLE for other resync to finish · 9744197c

由 NeilBrown 提交于 9月 19, 2008

When two md arrays share some block device (e.g each uses different
partitions on the one device), a resync of one array will wait for
the resync on the other to finish.

This can be a long time and as it currently waits TASK_UNINTERRUPTIBLE,
the softlockup code notices and complains.

So use TASK_INTERRUPTIBLE instead and make sure to flush signals
before calling schedule.
Signed-off-by: NNeilBrown <neilb@suse.de>

9744197c

01 9月, 2008 2 次提交

Fix problem with waiting while holding rcu read lock in md/bitmap.c · b2d2c4ce

由 NeilBrown 提交于 9月 01, 2008

A recent patch to protect the rdev list with rcu locking leaves us
with a problem because we can sleep on memalloc while holding the
rcu lock.

The rcu lock is only needed while walking the linked list as
uninteresting devices (failed or spares) can be removed at any time.

So only take the rcu lock while actually walking the linked list.
Take a refcount on the rdev during the time when we drop the lock
and do the memalloc to start IO.
When we return to the locked code, all the interesting devices
on the list will not have moved, so we can simply use
list_for_each_continue_rcu to pick up where we left off.
Signed-off-by: NNeilBrown <neilb@suse.de>

b2d2c4ce

Remove invalidate_partition call from do_md_stop. · 271f5a9b

由 NeilBrown 提交于 9月 01, 2008

When stopping an md array, or just switching to read-only, we
currently call invalidate_partition while holding the mddev lock.
The main reason for this is probably to ensure all dirty buffers
are flushed (invalidate_partition calls fsync_bdev).

However if any dirty buffers are found, it will almost certainly cause
a deadlock as starting writeout will require an update to the
superblock, and performing that updates requires taking the mddev
lock - which is already held.

This deadlock can be demonstrated by running "reboot -f -n" with
a root filesystem on md/raid, and some dirty buffers in memory.

All other calls to stop an array should already happen after a flush.
The normal sequence is to stop using the array (e.g. umount) which
will cause __blkdev_put to call sync_blockdev.  Then open the
array and issue the STOP_ARRAY ioctl while the buffers are all still
clean.

So this invalidate_partition is normally a no-op, except for one case
where it will cause a deadlock.

So remove it.

This patch possibly addresses the regression recored in
   http://bugzilla.kernel.org/show_bug.cgi?id=11460
and
   http://bugzilla.kernel.org/show_bug.cgi?id=11452

though it isn't yet clear how it ever worked.
Signed-off-by: NNeilBrown <neilb@suse.de>

271f5a9b

08 8月, 2008 1 次提交

md: cancel check/repair requests when recovery is needed · 56ac36d7

由 Dan Williams 提交于 8月 07, 2008

If a 'repair' is requested when an array is in a position to 'recover' raid1
will perform the repair while md believes a recovery is happening.  Address
this at both ends, i.e. cancel check/repair requests upon detecting a
recover condition and do not call ->spare_active after completing a
check/repair.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

56ac36d7

05 8月, 2008 1 次提交

Allow raid10 resync to happening in larger chunks. · 0310fa21

由 NeilBrown 提交于 8月 05, 2008

The raid10 resync/recovery code currently limits the amount of
in-flight resync IO to 2Meg.  This was copied from raid1 where
it seems quite adequate.  However for raid10, some layouts require
a bit of seeking to perform a resync, and allowing a larger buffer
size means that the seeking can be significantly reduced.

There is probably no real need to limit the amount of in-flight
IO at all.  Any shortage of memory will naturally reduce the
amount of buffer space available down to a set minimum, and any
concurrent normal IO will quickly cause resync IO to back off.

The only problem would be that normal IO has to wait for all resync IO
to finish, so a very large amount of resync IO could cause unpleasant
latency when normal IO starts up.

So: increase RESYNC_DEPTH to allow 32Meg of buffer (if memory is
available) which seems to be a good amount.  Also reduce the amount
of memory reserved as there is no need to keep 2Meg just for resync if
memory is tight.

Thanks to Keld for the suggestion.

Cc: Keld Jørn Simonsen <keld@dkuug.dk>
Signed-off-by: NNeilBrown <neilb@suse.de>

0310fa21