Commits · 449aad3e25358812c43afc60918c5ad3819488e7 · OpenHarmony / kernel_linux

03 Aug, 2009 · 5 commits

md: Use revalidate_disk to effect changes in size of device. · 449aad3e

Committed by NeilBrown on Aug 03, 2009

As revalidate_disk calls check_disk_size_change, it will cause
any capacity change of a gendisk to be propagated to the blockdev
inode.  So use that instead of mucking about with locks and
i_size_write.

Also add a call to revalidate_disk in do_md_run and a few other places
where the gendisk capacity is changed.
Signed-off-by: NeilBrown <neilb@suse.de>

md: Handle growth of v1.x metadata correctly. · 70471daf

Committed by NeilBrown on Aug 03, 2009

The v1.x metadata does not have a fixed size and can grow
when devices are added.
If it grows enough to require an extra sector of storage,
we need to update the 'sb_size' to match.

Without this, md can write out an incomplete superblock with a
bad checksum, which will be rejected when trying to re-assemble
the array.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
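
A minimal C sketch of the size calculation, assuming the v1.x layout is a 256-byte fixed header followed by one 2-byte role entry per device slot (`max_dev` slots), rounded up to the logical block size. The helper name `v1_sb_size` is hypothetical; the value it returns corresponds to what this commit keeps `sb_size` in step with.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed v1.x layout: 256-byte fixed header + 2 bytes per device role. */
#define SB1_FIXED_BYTES 256u

/*
 * Hypothetical helper: bytes of storage needed for a v1.x superblock
 * describing max_dev device slots, rounded up to a whole number of
 * logical blocks. block_size must be a power of two.
 */
static uint32_t v1_sb_size(uint32_t max_dev, uint32_t block_size)
{
    uint32_t bmask = block_size - 1;
    uint32_t size = SB1_FIXED_BYTES + max_dev * 2;

    assert(block_size != 0 && (block_size & bmask) == 0);
    if (size & bmask)
        size = (size | bmask) + 1;   /* round up to the next block boundary */
    return size;
}
```

With 512-byte blocks, 128 slots fill exactly 512 bytes, but a 129th slot needs 514 bytes, which rounds up to 1024. If `sb_size` stayed at 512, the last role entry would never reach the disk and the checksum would not match when the array is re-assembled.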

md: avoid array overflow with bad v1.x metadata · 3673f305

Committed by NeilBrown on Aug 03, 2009

We trust the 'desc_nr' field in v1.x metadata enough to use it
as an index in an array.  This isn't really safe.
So range-check the value first.
Signed-off-by: NeilBrown <neilb@suse.de>
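
A sketch of the range check, using hypothetical names. It assumes `desc_nr` indexes a `dev_roles` table with `max_dev` entries, and that an out-of-range value is treated as "no role" using 0xffff (the value v1.x metadata uses for a spare); rejecting the device outright would be an equally valid policy.

```c
#include <assert.h>
#include <stdint.h>

#define ROLE_NONE 0xffffu   /* assumed "no role / spare" marker */

/*
 * Hypothetical helper: look up a device's role. desc_nr comes from
 * on-disk metadata, so it is checked against max_dev before it is
 * ever used as an array index.
 */
static uint16_t role_for_desc_nr(const uint16_t *dev_roles, uint32_t max_dev,
                                 uint32_t desc_nr)
{
    if (desc_nr >= max_dev)
        return ROLE_NONE;   /* out of range: never index the table */
    return dev_roles[desc_nr];
}
```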

md: when a level change reduces the number of devices, remove the excess. · 3a981b03

Committed by NeilBrown on Aug 03, 2009

When an array is changed from RAID6 to RAID5, fewer drives are
needed.  So any device that is made superfluous by the level
conversion must be marked as not-active.
For the RAID6->RAID5 conversion, this will be a drive which only
has 'Q' blocks on it.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
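
The bookkeeping can be sketched as follows, assuming a hypothetical member record whose `raid_disk` field holds the device's slot (-1 when not active) and that the surviving slots are 0 to `new_raid_disks - 1`. With a layout that keeps Q on the last device, that last slot is the one the RAID6 to RAID5 conversion drops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified member-device record. */
struct member {
    int raid_disk;   /* slot in the array, or -1 if not active */
};

/*
 * After a level change that shrinks the array to new_raid_disks slots,
 * mark every member whose slot no longer exists as not active.
 * Returns the number of members that were deactivated.
 */
static size_t drop_excess_members(struct member *m, size_t n, int new_raid_disks)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        if (m[i].raid_disk >= new_raid_disks) {
            m[i].raid_disk = -1;
            dropped++;
        }
    }
    return dropped;
}
```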

md: Push down data integrity code to personalities. · ac5e7113

Committed by Andre Noll on Aug 03, 2009

This patch replaces md_integrity_check() by two new public functions:
md_integrity_register() and md_integrity_add_rdev() which are both
personality-independent.

md_integrity_register() is called from the ->run and ->hot_remove
methods of all personalities that support data integrity.  The
function iterates over the component devices of the array and
determines if all active devices are integrity capable and if their
profiles match. If this is the case, the common profile is registered
for the mddev via blk_integrity_register().

The second new function, md_integrity_add_rdev() is called from the
->hot_add_disk methods, i.e. whenever a new device is being added
to a raid array. If the new device does not support data integrity,
or has a profile different from the one already registered, data
integrity for the mddev is disabled.

For raid0 and linear, only the call to md_integrity_register() from
the ->run method is necessary.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
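
Both checks can be sketched in plain C. This assumes each member exposes an integrity profile name, with NULL meaning "not integrity capable", and uses string comparison where the kernel would compare `blk_integrity` profiles; `struct member_dev`, `common_profile()` and `profile_compatible()` are hypothetical names, not md's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct member_dev {
    bool active;          /* active member (spares are ignored) */
    const char *profile;  /* integrity profile name, or NULL if unsupported */
};

/*
 * Register step: return the profile shared by all active members, or
 * NULL if any active member lacks integrity support, the profiles
 * differ, or there are no active members.
 */
static const char *common_profile(const struct member_dev *d, size_t n)
{
    const char *common = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!d[i].active)
            continue;
        if (d[i].profile == NULL)
            return NULL;
        if (common == NULL)
            common = d[i].profile;
        else if (strcmp(common, d[i].profile) != 0)
            return NULL;
    }
    return common;
}

/*
 * Add step: integrity stays enabled only if a profile is registered and
 * the new device supports the same one; otherwise it must be disabled.
 */
static bool profile_compatible(const char *registered, const char *new_profile)
{
    return registered != NULL && new_profile != NULL &&
           strcmp(registered, new_profile) == 0;
}
```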

09 Jul, 2009 · 1 commit

Remove multiple KERN_ prefixes from printk formats · ad361c98

Committed by Joe Perches on Jul 06, 2009

Commit 5fd29d6c ("printk: clean up
handling of log-levels and newlines") changed printk semantics.  printk
lines with multiple KERN_<level> prefixes are no longer emitted as
before the patch.

<level> is now included in the output on each additional use.

Remove all uses of multiple KERN_<level>s in formats.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

01 Jul, 2009 · 4 commits

md: use interruptible wait when duration is controlled by userspace. · e62e58a5

Committed by NeilBrown on Jul 01, 2009

User space can set various limits on an md array so that resync waits
when it gets to a certain point, or so that I/O is blocked for a short
while.
When md is waiting on one of these limits, it should use an
interruptible wait so as not to add to the load average, and so as
not to trigger a warning if the wait goes on for too long.
Signed-off-by: NeilBrown <neilb@suse.de>

md: tidy up error paths in md_alloc · 0909dc44

Committed by NeilBrown on Jul 01, 2009

As the recent bug in md_alloc showed, having a single exit path for
unlocking and putting is a good idea.  So restructure md_alloc to have
a single mutex_unlock and mddev_put, and use gotos where necessary.
Found-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
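
The single-exit pattern can be sketched in user-space C. The lock here is a hypothetical stand-in (a flag rather than a real mutex) so the example runs anywhere, and `registry` and `obj_create_named()` are illustrative, not md's data structures. Every failure after the lock is taken, including the duplicate-name case fixed in the next commit, jumps to one label that unlocks exactly once and frees whatever was not handed over.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OBJS 8

/* Stand-in for a mutex: records whether the "lock" is held. */
static int lock_held;
static void lock(void)   { assert(!lock_held); lock_held = 1; }
static void unlock(void) { assert(lock_held);  lock_held = 0; }

static char *registry[MAX_OBJS];   /* names of objects created so far */

static int obj_create_named(const char *name)
{
    int err = 0;
    int slot = -1;
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);      /* allocate before taking the lock */

    if (!copy)
        return -ENOMEM;
    memcpy(copy, name, len);

    lock();
    for (int i = 0; i < MAX_OBJS; i++) {
        if (registry[i] && strcmp(registry[i], name) == 0) {
            err = -EEXIST;         /* duplicate name */
            goto out;
        }
        if (!registry[i] && slot < 0)
            slot = i;
    }
    if (slot < 0) {
        err = -ENOSPC;             /* table full */
        goto out;
    }
    registry[slot] = copy;
    copy = NULL;                   /* ownership passed to the registry */
out:
    unlock();                      /* the only unlock */
    free(copy);                    /* no-op on success, since copy is NULL */
    return err;
}
```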

md: fix error path when duplicate name is found on md device creation. · 1ec22eb2

Committed by NeilBrown on Jul 01, 2009

When an md device is created by name (rather than number) we need to
check that the name is not already in use.  If this check finds a
duplicate, we return an error without dropping the lock or freeing
the newly created mddev.
This patch fixes that.

Cc: stable@kernel.org
Found-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: avoid dereferencing NULL pointer when accessing suspend_* sysfs attributes. · b8d966ef

Committed by NeilBrown on Jul 01, 2009

If we try to modify one of the md/ sysfs files
  suspend_lo or suspend_hi
when the array is not active, we dereference a NULL pointer.
Protect against that.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
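
A sketch of the guard with simplified, hypothetical structures: an inactive array has no personality (`pers == NULL`), so that pointer must be checked before `pers->quiesce` is looked at. Returning -EINVAL in that case is this sketch's choice; `noop_quiesce` exists only so the active path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for md's structures. */
struct personality {
    void (*quiesce)(int state);
};

struct array {
    const struct personality *pers;   /* NULL while the array is inactive */
    unsigned long long suspend_lo;
};

/* Trivial quiesce for demonstration purposes. */
static void noop_quiesce(int state) { (void)state; }

static int suspend_lo_store_sketch(struct array *a, unsigned long long value)
{
    /* Check pers itself before dereferencing it. */
    if (a->pers == NULL || a->pers->quiesce == NULL)
        return -EINVAL;
    a->suspend_lo = value;
    a->pers->quiesce(1);   /* wait for in-flight I/O ... */
    a->pers->quiesce(0);   /* ... then let it resume */
    return 0;
}
```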

18 Jun, 2009 · 8 commits

md: Move check for bitmap presence to personality code. · 0894cc30

Committed by Andre Noll on Jun 18, 2009

If the superblock of a component device indicates the presence of a
bitmap but the corresponding raid personality does not support bitmaps
(raid0, linear, multipath, faulty), then something is seriously wrong
and we'd better refuse to run such an array.

Currently, this check is performed while the superblocks are examined,
i.e. before entering personality code. Therefore the generic md layer
must know which raid levels support bitmaps and which do not.

This patch avoids this layer violation without adding identical code
to various personalities. This is accomplished by introducing a new
public function to md.c, md_check_no_bitmap(), which replaces the
hard-coded checks in the superblock loading functions.

A call to md_check_no_bitmap() is added to the ->run method of each
personality which does not support bitmaps and assembly is aborted
if at least one component device contains a bitmap.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
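
A user-space sketch of the shared check and of how a personality's `->run` would use it. The `struct array_info` fields (an external bitmap file name and an internal bitmap offset) are assumptions standing in for what the superblocks report, and `check_no_bitmap()` / `raid0_like_run()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical summary of what the superblocks said about a bitmap. */
struct array_info {
    const char *name;
    const char *bitmap_file;   /* external bitmap file, or NULL */
    long bitmap_offset;        /* internal bitmap offset, 0 if none */
};

/* Returns 0 if no bitmap is present; logs and returns 1 otherwise. */
static int check_no_bitmap(const struct array_info *a, const char *pers_name)
{
    if (a->bitmap_file == NULL && a->bitmap_offset == 0)
        return 0;
    fprintf(stderr, "%s: bitmaps are not supported for %s\n",
            a->name, pers_name);
    return 1;
}

/* ->run of a personality without bitmap support starts with the check. */
static int raid0_like_run(const struct array_info *a)
{
    if (check_no_bitmap(a, "raid0"))
        return -1;             /* refuse to assemble the array */
    /* ... normal setup would follow here ... */
    return 0;
}
```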

md: remove chunksize rounding from common code. · 8190e754

Committed by NeilBrown on Jun 18, 2009

It is easiest to round sizes to multiples of chunk size in
the personality code for those personalities which care.
Those personalities now do the rounding, so we can
remove that function from common code.

Also remove the upper bound on the size of a chunk, and the lower
bound on the size of a device (1 chunk), neither of which really buys
us anything.
Signed-off-by: NeilBrown <neilb@suse.de>
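
The rounding a personality would now do itself can be sketched as a single helper (`round_to_chunks` is a hypothetical name, and a 64-bit `sector_t` is assumed). Using the remainder rather than a mask keeps it correct for chunk sizes that are not powers of two.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;   /* assumption: 64-bit sector counts */

/* Round a device size down to a whole number of chunks. */
static sector_t round_to_chunks(sector_t sectors, uint32_t chunk_sectors)
{
    assert(chunk_sectors > 0);
    return sectors - sectors % chunk_sectors;
}
```

Without the old one-chunk lower bound in common code, a device smaller than a chunk simply rounds down to 0 sectors; whether that is acceptable is left to the personality.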

md: move assignment of ->utime so that it never gets skipped. · 1b57f132

Committed by NeilBrown on Jun 18, 2009

Currently the assignment to utime gets skipped for 'external'
metadata.  So move it to the top of the function so that it
always gets effected.
This is of largely cosmetic interest.  Nothing actually depends
on ->utime being right for external arrays.
"mdadm --monitor" does use it for 0.90 and 1.x arrays, but with
mdadm-3.0, this is not important for external metadata.
Signed-off-by: NeilBrown <neilb@suse.de>

md: Push down reconstruction log message to personality code. · 8c6ac868

Committed by Andre Noll on Jun 18, 2009

Currently, the md layer checks in analyze_sbs() if the raid level
supports reconstruction (mddev->level >= 1) and if reconstruction is
in progress (mddev->recovery_cp != MaxSector).

Move that printk into the personality code of those raid levels that
care (levels 1, 4, 5, 6, 10).
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: merge reconfig and check_reshape methods. · 50ac168a

Committed by NeilBrown on Jun 18, 2009

The difference between these two methods is artificial.
Both check that a pending reshape is valid, and perform any
aspect of it that can be done immediately.
'reconfig' handles chunk size and layout.
'check_reshape' handles raid_disks.

So make them just one method.
Signed-off-by: NeilBrown <neilb@suse.de>

md: remove unnecessary arguments from ->reconfig method. · 597a711b

Committed by NeilBrown on Jun 18, 2009

Passing the new layout and chunksize as args is not necessary as
the mddev has fields for new_chunk and new_layout.

This is preparation for combining the check_reshape and reconfig
methods.
Signed-off-by: NeilBrown <neilb@suse.de>

md: Convert mddev->new_chunk to sectors. · 664e7c41

Committed by Andre Noll on Jun 18, 2009

A straight-forward conversion which gets rid of some
multiplications/divisions/shifts. The patch also introduces a couple
of new ones, most of which are due to conf->chunk_size still being
represented in bytes. This will be cleaned up in subsequent patches.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: Make mddev->chunk_size sector-based. · 9d8f0363

Committed by Andre Noll on Jun 18, 2009

This patch renames the chunk_size field to chunk_sectors with the
implied change of semantics.  Since

	is_power_of_2(chunk_size) = is_power_of_2(chunk_sectors << 9)
				  = is_power_of_2(chunk_sectors)

these bits don't need an adjustment for the shift.
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
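
A left shift by 9 multiplies by 512 = 2^9. That moves the bits of a value without adding or removing any, so a value with exactly one bit set still has exactly one bit set afterwards, as long as nothing is shifted out. A quick brute-force check of the identity, using a helper written to match the usual definition of the kernel's `is_power_of_2()` (`n != 0 && (n & (n - 1)) == 0`):

```c
#include <assert.h>
#include <stdbool.h>

static bool is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Check is_power_of_2(s << 9) == is_power_of_2(s) for s = 1 .. limit. */
static bool shift_preserves_power_of_2(unsigned long limit)
{
    for (unsigned long s = 1; s <= limit; s++)
        if (is_power_of_2(s << 9) != is_power_of_2(s))
            return false;
    return true;
}
```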

16 Jun, 2009 · 1 commit

md: prepare for non-power-of-two chunk sizes · 2ac06c33

Committed by raz ben yehuda on Jun 16, 2009

Remove chunk size check from md as this is now performed in the run
function in each personality.

Replace chunk size power 2 code calculations by a regular division.

Signed-off-by: raziebe@gmail.com
Signed-off-by: NeilBrown <neilb@suse.de>
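
To illustrate the kind of calculation being replaced, here is how a sector number splits into a chunk number and an offset within that chunk. The shift-and-mask form only works when the chunk size is a power of two; the division form works for any chunk size. In the kernel, 64-bit division on 32-bit hosts goes through `sector_div()`; this user-space sketch (with hypothetical function names) uses plain `/` and `%`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;   /* assumption: 64-bit sector numbers */

/* Power-of-two chunks only: chunk_shift = log2(chunk_sectors). */
static void split_pow2(sector_t sector, unsigned chunk_shift,
                       sector_t *chunk, sector_t *offset)
{
    *chunk = sector >> chunk_shift;
    *offset = sector & ((1ULL << chunk_shift) - 1);
}

/* Any chunk size: a regular division. */
static void split_div(sector_t sector, uint32_t chunk_sectors,
                      sector_t *chunk, sector_t *offset)
{
    assert(chunk_sectors > 0);
    *chunk = sector / chunk_sectors;
    *offset = sector % chunk_sectors;
}
```

For a 128-sector chunk both functions agree (sector 1000 is chunk 7, offset 104); only `split_div` can handle a 384-sector chunk (chunk 2, offset 232).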

26 May, 2009 · 5 commits

md: don't use locked_ioctl. · b492b852

Committed by NeilBrown on May 26, 2009

md has no need for the BKL - it does its own locking.
So md_ioctl doesn't need to be a locked_ioctl.
Signed-off-by: NeilBrown <neilb@suse.de>

md: don't update curr_resync_completed without also updating reshape_position. · 7a91ee1f

Committed by NeilBrown on May 26, 2009

In order for the metadata to always be consistent, we mustn't update
curr_resync_completed without also updating reshape_position.

The reshape code updates both at the same time.  However since
commit 97e4f42d
the common md_do_sync will sometimes update curr_resync_completed
but is not in a position to update reshape_position.
So if MD_RECOVERY_RESHAPE is set (indicating that a reshape is
happening, so reshape_position might change), don't update
curr_resync_completed in md_do_sync, leave it to the per-personality
reshape code.
Signed-off-by: NeilBrown <neilb@suse.de>

md: export 'frozen' resync state through sysfs · b6a9ce68

Committed by NeilBrown on May 26, 2009

The md resync engine has a 'frozen' state which ensures that
no resync or recovery is started.  This is used to avoid races.

Export this state through the 'sync_action' sysfs attribute
so that user-space can benefit and also avoid some races.
Signed-off-by: NeilBrown <neilb@suse.de>

md: improve errno return when setting array_size · 2b69c839

Committed by NeilBrown on May 26, 2009

Instead of always returning EINVAL if anything goes wrong
when setting the array size, add the option of
  E2BIG
if the size requested is too large.  This makes it easier
for user-space to be sure what went wrong.
Signed-off-by: NeilBrown <neilb@suse.de>
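
A sketch of the distinction, with a hypothetical parser that takes the requested size in sectors (the unit is this sketch's simplification). Malformed input still yields EINVAL; a well-formed number that exceeds the allowed maximum, or that overflows while parsing, yields E2BIG.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t sector_t;   /* assumption: 64-bit sector counts */

static int set_array_size_sketch(const char *buf, sector_t max_sectors,
                                 sector_t *array_sectors)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(buf, &end, 10);
    if (end == buf || *end != '\0')
        return -EINVAL;           /* not a number, or trailing garbage */
    if (errno == ERANGE || v > max_sectors)
        return -E2BIG;            /* a number, but too large */
    *array_sectors = v;
    return 0;
}
```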

md: always update level / chunk_size / layout when writing v1.x metadata. · 62e1e389

Committed by NeilBrown on May 26, 2009

We previously didn't update these fields when writing the metadata
because they could never change.  They can now, so we had better write
them.
v0.90 metadata always updated these fields.
Signed-off-by: NeilBrown <neilb@suse.de>

23 May, 2009 · 1 commit

block: Do away with the notion of hardsect_size · e1defc4f

Committed by Martin K. Petersen on May 22, 2009

Until now we have had a 1:1 mapping between storage device physical
block size and the logical block size used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case.  The
sector size will be 4KB but the logical block size will remain
512 bytes.  Hence we need to distinguish between the physical block size
and the logical ditto.

This patch renames hardsect_size to logical_block_size.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

07 May, 2009 · 4 commits

md: remove rd%d links immediately after stopping an array. · c4647292

Committed by NeilBrown on May 07, 2009

md maintains links in sys/mdXX/md/ to identify which device has
which role in the array. e.g.
   rd2 -> dev-sda

indicates that the device with role '2' in the array is sda.

These links are only present when the array is active.  They are
created immediately after ->run is called, and so should be removed
immediately after ->stop is called.
However they are currently removed a little bit later, and it is
possible for ->run to be called again, thus adding these links, before
they are removed.

So move the removal earlier so they are consistently only present when
the array is active.
Signed-off-by: NeilBrown <neilb@suse.de>

md: remove ability to explicitly set an inactive array to 'clean'. · 5bf29597

Committed by NeilBrown on May 07, 2009

Being able to write 'clean' to an 'array_state' of an inactive array
to activate it in 'clean' mode is both unnecessary and inconvenient.

It is unnecessary because the same can be achieved by writing
'active'.  This activates an array, but it still remains 'clean'
until the first write.

It is inconvenient because writing 'clean' is more often used to
cause an 'active' array to revert to 'clean' mode (thus blocking
any writes until a 'write-pending' is promoted to 'active').

Allowing 'clean' to both activate an array and mark an active array as
clean can lead to races:  One program writes 'clean' to mark the
active array as clean at the same time as another program writes
'inactive' to deactivate (stop) an active array.  Depending on which
writes first, the array could be deactivated and immediately
reactivated which isn't what was desired.

So just disable the use of 'clean' to activate an array.

This avoids a race that can be triggered with mdadm-3.0 and external
metadata, so it is suitable for -stable.
Reported-by: Rafal Marszewski <rafal.marszewski@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>

md: constify VFTs · 110518bc

Committed by Jan Engelhardt on May 07, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: NeilBrown <neilb@suse.de>

md: tidy up status_resync to handle large arrays. · dd71cf6b

Committed by NeilBrown on May 07, 2009

Two problems in status_resync.
1/ It still used Kilobytes as the basic block unit, while most code
   now uses sectors uniformly.
2/ It doesn't allow for the possibility that max_sectors exceeds
   the range of "unsigned long".

So
 - change "max_blocks" to "max_sectors", and store sector numbers
   in there and in 'resync'
 - Make 'rt' a 'sector_t' so it can temporarily hold the number of
   remaining sectors.
 - use sector_div rather than normal division.
 - change the magic '100' used to preserve precision to '32'.
   + making it a power of 2 makes division easier
   + it doesn't need to be as large as when it was chosen, back when we
     averaged speed over the entire run.  Now we average speed over the
     last 30 seconds or so.
Reported-by: "Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE>
Signed-off-by: NeilBrown <neilb@suse.de>
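
A sketch of the remaining-time arithmetic, modeled on the description above (a `sector_t` holding the remaining sectors, a scale factor of 32) rather than copied from md.c. The inputs `db` (sectors completed in the last measurement window) and `dt` (that window's length in seconds) and the function name are assumptions. Dividing by `db / 32 + 1` and later shifting right by 5 computes roughly `remaining * dt / db` without overflowing and without dividing by zero.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;   /* assumption: 64-bit sector counts */

/*
 * Estimate the seconds left in a resync:
 *   remaining - sectors still to process
 *   db        - sectors completed during the last measurement window
 *   dt        - length of that window in seconds
 */
static sector_t resync_eta_secs(sector_t remaining, unsigned long db,
                                unsigned long dt)
{
    sector_t rt = remaining;

    if (dt == 0)
        dt = 1;
    rt /= db / 32 + 1;   /* remaining / (db / 32); "+ 1" avoids divide-by-zero */
    rt *= dt;            /* scale by the window length */
    return rt >> 5;      /* undo the factor of 32 with a shift */
}
```

For example, with 3200 sectors done in the last 10 seconds and 32000 sectors left, the sketch returns 98 seconds, slightly under the exact 100 because of the "+ 1" guard and integer truncation.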

17 Apr, 2009 · 1 commit

md: update sync_completed and reshape_position even more often. · c03f6a19

Committed by NeilBrown on Apr 17, 2009

There are circumstances when a user-space process might need to
"oversee" a resync/reshape process.  For example when doing an
in-place reshape of a raid5, it is prudent to take a backup of each
section before reshaping it as this is the only way to provide
safety against an unplanned shutdown (i.e. crash/power failure).

The sync_max sysfs value can be used to stop the resync from
advancing beyond a particular point.
So user-space can:
  suspend IO to the first section and back it up
  set 'sync_max' to the end of the section
  wait for 'sync_completed' to reach that point
  resume IO on the first section and move on to the next section.

However this process requires the kernel and user-space to run in
lock-step which could introduce unnecessary delays.

It would be better if a 'double buffered' approach could be used with
userspace and kernel space working on different sections with the
'next' section always ready when the 'current' section is finished.

One problem with implementing this is that sync_completed is only
guaranteed to be updated when the sync process reaches sync_max.
(it is updated on a time basis at other times, but it is hard to rely
on that).  This defeats some of the double buffering.

With this patch, sync_completed (and reshape_position) get updated as
the current position approaches sync_max, so there is room for
userspace to advance sync_max early without losing updates.

To be precise, sync_completed is updated when the current sync
position reaches half way between the current value of sync_completed
and the value of sync_max.  This will usually be a good time for user
space to update sync_max.

If sync_max does not get updated, the updates to sync_completed
(together with associated metadata updates) will occur at an
exponentially increasing frequency which will get unreasonably fast
(one update every page) immediately before the process hits sync_max
and stops.  So the update rate will be unreasonably fast only for an
insignificant period of time.
Signed-off-by: NeilBrown <neilb@suse.de>
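
The "half way" rule can be sketched as a predicate, plus a small simulation counting how many updates occur if `sync_max` is never raised. Both function names are hypothetical, and the exact expression used in md may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;   /* assumption: 64-bit sector numbers */

/* True once pos has covered at least half of the gap between the
 * last reported sync_completed and sync_max. */
static bool should_report(sector_t pos, sector_t completed, sector_t sync_max)
{
    return (pos - completed) * 2 >= sync_max - completed;
}

/* Count the updates issued while advancing from start to sync_max in
 * steps of `step` sectors, assuming sync_max is never raised. */
static unsigned count_reports(sector_t start, sector_t sync_max, sector_t step)
{
    sector_t completed = start;
    unsigned reports = 0;

    for (sector_t pos = start + step; pos <= sync_max; pos += step) {
        if (should_report(pos, completed, sync_max)) {
            completed = pos;
            reports++;
        }
    }
    return reports;
}
```

With `sync_max` = 1024 and a step of 1, updates happen at 512, 768, 896, ..., 1023, 1024: each gap is half the previous one, so only 11 updates occur, and the very frequent ones are confined to the last few sectors.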

14 Apr, 2009 · 2 commits

md: improve usefulness and accuracy of sysfs file md/sync_completed. · acb180b0

Committed by NeilBrown on Apr 14, 2009

The sync_completed file reports how much of a resync (or recovery or
reshape) has been completed.
However due to the possibility of out-of-order completion of writes,
it is not certain to be accurate.

We have an internal value - mddev->curr_resync_completed - which is an
accurate value (though it might not always be quite so up to date).

So:
 - make curr_resync_completed be up to date a little more often,
   particularly when raid5 reshape updates status in the metadata
 - report curr_resync_completed in the sysfs file
 - allow poll/select to report all updates to md/sync_completed.

This makes sync_completed usable by any external metadata
handler that wants to record this status information in its metadata.
Signed-off-by: NeilBrown <neilb@suse.de>

md: allow setting newly added device to 'in_sync' via sysfs. · 6d56e278

Committed by NeilBrown on Apr 14, 2009

When adding devices to an active array via sysfs, there is currently
no way to mark a device as 'in-sync' which is useful when
incrementally assembling an array.

So add that option.
Signed-off-by: NeilBrown <neilb@suse.de>

31 Mar, 2009 · 8 commits

md: don't display meaningless values in sysfs files resync_start and sync_speed · d1a7c503

Committed by NeilBrown on Mar 31, 2009

When no resync is happening, both of these files currently have
meaningless values (in slightly different ways).
Change them to "none" in that case.
Signed-off-by: NeilBrown <neilb@suse.de>

md: add explicit method to signal the end of a reshape. · cea9c228

Committed by NeilBrown on Mar 31, 2009

Currently raid5 (the only module that supports restriping)
notices that the reshape has finished by sync_request being
given a large value, and handles any cleanup then.

This patch changes it so md_check_recovery calls into an
explicit finish_reshape method as well.

The clean-up from sync_request can do things that need to be
done promptly, typically things local to the raid5_conf_t
structure.

The "finish_reshape" method is called under the mddev_lock
so it can do things involving reconfiguring the device.

This allows us to get rid of md_set_array_sectors_locked, which
would have caused a deadlock if you tried to stop an array
while a reshape was happening.
Signed-off-by: NeilBrown <neilb@suse.de>

md: 'array_size' sysfs attribute · b522adcd

Committed by Dan Williams on Mar 31, 2009

Allow userspace to set the size of the array according to the following
semantics:

1/ size must be <= the size returned by mddev->pers->size(mddev, 0, 0)
   a) If size is set before the array is running, do_md_run will fail
      if size is greater than the default size
   b) A reshape attempt that reduces the default size to less than the set
      array size should be blocked
2/ once userspace sets the size the kernel will not change it
3/ writing 'default' to this attribute returns control of the size to the
   kernel and reverts to the size reported by the personality

Also, convert locations that need to know the default size from directly
reading ->array_sectors to <pers>_size.  Resync/reshape operations
always follow the default size.

Finally, fixup other locations that read a number of 1k-blocks from
userspace to use strict_blocks_to_sectors() which checks for unsigned
long long to sector_t overflow and blocks to sectors overflow.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
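
A user-space sketch of the two overflow checks described above, under a hypothetical name (`blocks_to_sectors_checked`) and assuming a 64-bit `sector_t`; with that typedef the second check can never fire, but it would matter where `sector_t` is narrower than `unsigned long long`. A trailing newline is accepted because sysfs writes usually end with one.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t sector_t;   /* assumption: 64-bit sector_t */

/* Parse a count of 1K blocks and convert it to 512-byte sectors. */
static int blocks_to_sectors_checked(const char *buf, sector_t *sectors)
{
    char *end;
    unsigned long long blocks;

    errno = 0;
    blocks = strtoull(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -EINVAL;            /* not a number, or out of range */
    if (*end == '\n')
        end++;                     /* tolerate a trailing newline */
    if (*end != '\0')
        return -EINVAL;            /* trailing garbage */
    if (blocks & (1ULL << (8 * sizeof(blocks) - 1)))
        return -EINVAL;            /* blocks * 2 would overflow */
    if ((sector_t)(blocks * 2) != blocks * 2)
        return -EINVAL;            /* result does not fit in sector_t */
    *sectors = (sector_t)(blocks * 2);
    return 0;
}
```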

md: centralize ->array_sectors modifications · 1f403624

Committed by Dan Williams on Mar 31, 2009

Get personalities out of the business of directly modifying
->array_sectors.  Lays groundwork to introduce policy on when
->array_sectors can be modified.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

md/raid5: allow layout/chunksize to be changed on an active 2-drive raid5. · b3546035

Committed by NeilBrown on Mar 31, 2009

2-drive raid5's aren't very interesting.  But if you are converting
a raid1 into a raid5, you will at least temporarily have one.  And
that is a good time to set the layout/chunksize for the new RAID5
if you aren't happy with the defaults.

layout and chunksize don't actually affect the placement of data
on a 2-drive raid5, so we just do some internal book-keeping.
Signed-off-by: NeilBrown <neilb@suse.de>

md: add ->takeover method to support changing the personality managing an array · 245f46c2

Committed by NeilBrown on Mar 31, 2009

Implement this for RAID6 to be able to 'takeover' a RAID5 array.  The
new RAID6 will use a layout which places Q on the last device, and
that device will be missing.
If there are any available spares, one will immediately have Q
recovered onto it.
Signed-off-by: NeilBrown <neilb@suse.de>

md: enable suspend/resume of md devices. · 409c57f3

Committed by NeilBrown on Mar 31, 2009

To be able to change the 'level' of an md/raid array, we need to
suspend the device so that no requests are active - then move some
pointers around etc.

The code already keeps counts of active requests and the ->quiesce
function can be used to wait until those counts hit zero.
However the quiesce function blocks new requests once they are already
'inside' the personality module, and that is too late if we want
to replace the personality modules.

So make all md requests come in through a common md_make_request
function that keeps track of how many requests have entered the
modules but may not yet be on the internal reference counts.
Allow md_make_request to be blocked when we want to suspend the
device, and make it possible to wait for all those in-transit requests
to be added to internal lists so that ->quiesce can wait for them.

There is still a problem that when a request completes, we drop the
ref count inside the personality code so there is a short time between
when the refcount hits zero, and when the personality code is no
longer being used.
The personality code never blocks (schedule or spinlock) between
dropping the refcount and exiting the routine, so this should be safe
(as put_module calls synchronize_sched() before unmapping the module
code).
Signed-off-by: NeilBrown <neilb@suse.de>

md: md_unregister_thread should cope with being passed NULL · e0cf8f04

Committed by NeilBrown on Mar 31, 2009

Mostly md_unregister_thread is only called when we know that the
thread is non-NULL, but sometimes we need to check first.  It is safer
to put the check inside md_unregister_thread itself.
Signed-off-by: NeilBrown <neilb@suse.de>
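
A minimal sketch of the pattern with a hypothetical `struct worker`. Because the function returns early on NULL, every caller can pass whatever pointer it holds without a check of its own; `stopped_count` exists only to make the behavior observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical thread handle. */
struct worker {
    int id;
};

static int stopped_count;   /* demonstration only: workers actually stopped */

/* Safe to call with NULL, so callers need no check of their own. */
static void worker_unregister(struct worker *w)
{
    if (w == NULL)
        return;             /* nothing to stop */
    /* A real implementation would signal the thread and wait for it here. */
    stopped_count++;
    free(w);
}
```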

OpenHarmony / kernel_linux · last synced more than 3 years ago