Commits · 1dff2b87a34a1ac1d1898ea109bf97ed396aca53 · openeuler / raspberrypi-kernel

22 May, 2012 · 39 commits

md/bitmap: record the space available for the bitmap in the superblock. · 1dff2b87

Committed by NeilBrown on May 22, 2012

Now that bitmaps can grow and shrink, it is best if we record
how much space is available.  This means that when
we reduce the size of the bitmap we won't "lose" the space
for later, when we might want to increase the size of the bitmap
again.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: Remove extras after reshape to smaller number of devices. · 63aced61

Committed by NeilBrown on May 22, 2012

When a reshape which reduced the number of devices finishes
we must remove the extra devices.

So ensure that raid10_remove_disk won't try to keep them, and
have raid10_finish_reshape clear the 'in_sync' flag.  Then
remove_and_add_spares will be able to remove them.

Reported-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: improve removal of extra devices after reshape. · da7613b8

Committed by NeilBrown on May 22, 2012

After a reshape which reduced the number of devices we need
to disconnect the extra devices.
The code for this doesn't currently handle 'replacement' devices.
It is very unlikely that such devices will be present, but it is
safest to handle them anyway.

So simplify the handling.  Just clear In_sync and leave it
to remove_and_add_spares (which will be called soon) to do
the real work.

Signed-off-by: NeilBrown <neilb@suse.de>

md: check the return of mddev_find() · 0c098220

Committed by Yuanhan Liu on May 22, 2012

Check the return of mddev_find(), since it may fail when memory or
usable minor numbers run out.

The reason I chose -ENODEV instead of -ENOMEM or something else is
that the md_alloc() function chose that ;)

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
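The check described above is an ordinary NULL test on a lookup that may allocate. The following is a minimal userspace sketch, not the kernel code: `find_dev`, `open_dev` and `struct dev` are hypothetical stand-ins for `mddev_find()`, its caller and `struct mddev`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mddev. */
struct dev {
    int minor;
};

/*
 * Hypothetical stand-in for mddev_find(): returns NULL when the requested
 * minor number is not usable or when the allocation fails.
 */
static struct dev *find_dev(int minor, int max_minor)
{
    struct dev *d;

    if (minor < 0 || minor >= max_minor)
        return NULL;            /* no usable minor number */
    d = malloc(sizeof(*d));
    if (!d)
        return NULL;            /* out of memory */
    d->minor = minor;
    return d;
}

/* Caller that checks the result and reports -ENODEV on any failure. */
static int open_dev(int minor, int max_minor)
{
    struct dev *d = find_dev(minor, max_minor);

    if (!d)
        return -ENODEV;
    free(d);
    return 0;
}
```

Both failure causes collapse into one error code here, which mirrors the choice of -ENODEV mentioned in the commit message.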

MD RAID1: Further conditionalize 'fullsync' · 4f0a5e01

Committed by Jonathan Brassow on May 22, 2012

A RAID1 device does not necessarily need a fullsync if the bitmap can be used instead.

Similar to commit d6b212f4 in raid5.c, if a raid1 device can be brought back
(i.e. from a transient failure) it shouldn't need a complete resync.  Provided
the bitmap is not too old, it will have recorded the areas of the disk that
need recovery.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DM RAID: Use md_error() in place of simply setting Faulty bit · c32fb9e7

Committed by Jonathan Brassow on May 22, 2012

When encountering an error while reading the superblock, call md_error().

We are currently setting the 'Faulty' bit on one of the array devices when an
error is encountered while reading the superblock of a dm-raid array. We should
be calling md_error(), as it handles the error more completely.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DM RAID: Record and handle missing devices · 81f382f9

Committed by Jonathan Brassow on May 22, 2012

Missing dm-raid devices should be recorded in the superblock.

When specifying the devices that compose a DM RAID array, it is possible to denote
failed or missing devices with '-'s.  When this occurs, we must record this in the
superblock.  We do this by checking if the array position's data device is missing
and then forcing MD to record the superblock by setting 'MD_CHANGE_DEVS' in
'raid_resume'.  If we do not cause the superblock to be rewritten by the resume
function, it is possible for a stale superblock to be written by an outgoing
inactive table (during 'raid_dtr').

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DM RAID: Set recovery flags on resume · 47525e59

Committed by Jonathan Brassow on May 22, 2012

Properly initialize MD recovery flags when resuming device-mapper devices.

When a device-mapper device is suspended, all I/O must stop.  This is done by
calling 'md_stop_writes' and 'mddev_suspend'.  These calls in turn manipulate
the recovery flags - including setting 'MD_RECOVERY_FROZEN'.  The DM device
may have been suspended while recovery was not yet complete, so the process
needs to pick up where it left off.  Since 'mddev_resume' does not unset
'MD_RECOVERY_FROZEN' and set 'MD_RECOVERY_NEEDED', we must do it ourselves.
'MD_RECOVERY_NEEDED' can safely be set in 'mddev_resume', but 'MD_RECOVERY_FROZEN'
must be cleared outside of 'mddev_resume' due to how MD handles RAID reshaping.
(For example, it is possible for a user to delay reshaping a RAID5->RAID6 by
purposefully setting 'MD_RECOVERY_FROZEN'.  Clearing it in 'mddev_resume' would
override the desired behavior.)

Because 'mddev_resume' already unconditionally calls 'md_wakeup_thread(mddev->thread)'
there is no need to make this call from 'raid_resume' since it calls 'mddev_resume'.

Also clean up where level_store calls mddev_resume() - it currently
duplicates some of the functions of that call. - NB

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: Allow reshape while a bitmap is present. · 30b67645

Committed by NeilBrown on May 22, 2012

We should always have allowed this.  A raid5 reshape doesn't change
the size of the bitmap, so there is no need to restrict it.

Also add a test to make sure we don't try to start a reshape on a
failed array.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: resize bitmap when required during reshape. · bb63a701

Committed by NeilBrown on May 22, 2012

If a reshape changes the size of the array, then we can now
update the bitmap to suit - so do so.

Signed-off-by: NeilBrown <neilb@suse.de>

md: allow array to be resized while bitmap is present. · a4a6125a

Committed by NeilBrown on May 22, 2012

Now that bitmaps can be resized, we can allow an array to be resized
while the bitmap is present.

This only covers resizing that involves changing the effective size
of member devices, not resizing that changes the number of devices.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: make sure reshape request are reflected in superblock. · b81a0404

Committed by NeilBrown on May 22, 2012

As a reshape may change the sync_size and/or chunk_size, we need
to update these whenever we write out the bitmap superblock.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: add bitmap_resize function to allow bitmap resizing. · d60b479d

Committed by NeilBrown on May 22, 2012

This function will allocate the new data structures and copy
bits across from old to new, allowing for the possibility that the
chunksize has changed.

Use the same function for performing the initial allocation
of the structures.  This improves test coverage.

When bitmap_resize is used to resize an existing bitmap, it
only copies '1' bits in, not '0' bits.
So when allocating the bitmap, ensure everything is initialised
to ZERO.

Signed-off-by: NeilBrown <neilb@suse.de>
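The "copy only '1' bits into a zeroed bitmap" idea can be sketched in userspace as below. This is an illustration under stated assumptions, not the kernel's `bitmap_resize`: the function name `resize_dirty_map` and the one-byte-per-chunk representation are hypothetical, chosen to keep the chunk-size remapping easy to follow.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: build a new dirty map from an old one when the
 * chunk size (in sectors) and the number of chunks may both change.
 * Each map entry is one byte (0 = clean, 1 = dirty) for readability.
 * Returns a calloc()ed map the caller must free(), or NULL on failure.
 */
static unsigned char *resize_dirty_map(const unsigned char *old_map,
                                       size_t old_chunks, size_t old_chunk_size,
                                       size_t new_chunks, size_t new_chunk_size)
{
    unsigned char *new_map;

    if (old_chunk_size == 0 || new_chunk_size == 0 || new_chunks == 0)
        return NULL;

    /* Start from all zeros, because only dirty entries are copied below. */
    new_map = calloc(new_chunks, 1);
    if (!new_map)
        return NULL;

    for (size_t i = 0; i < old_chunks; i++) {
        if (!old_map[i])
            continue;                           /* '0' bits are never copied */

        size_t first_sector = i * old_chunk_size;
        size_t last_sector = first_sector + old_chunk_size - 1;

        /* Mark every new chunk that overlaps the old dirty chunk. */
        for (size_t j = first_sector / new_chunk_size;
             j <= last_sector / new_chunk_size && j < new_chunks; j++)
            new_map[j] = 1;
    }
    return new_map;
}
```

Because the new map starts zeroed, the same routine can also serve as the initial allocation by passing an empty old map, which is the reuse the commit message describes.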

md/bitmap: use DIV_ROUND_UP instead of open-code · 15702d7f

Committed by NeilBrown on May 22, 2012

Also take the opportunity to simplify CHUNK_BLOCK_RATIO.

Signed-off-by: NeilBrown <neilb@suse.de>
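For readers unfamiliar with the helper, the sketch below contrasts an open-coded round-up division with the macro form. The macro body matches the usual kernel definition of `DIV_ROUND_UP`; the two function names are hypothetical and only exist for this comparison.

```c
/* Round-up integer division, written the way the kernel helper is. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Open-coded variant: divide, then add one if anything is left over. */
static unsigned long chunks_open_coded(unsigned long blocks,
                                       unsigned long blocks_per_chunk)
{
    unsigned long chunks = blocks / blocks_per_chunk;

    if (blocks % blocks_per_chunk)
        chunks++;
    return chunks;
}

/* Same result using the macro. */
static unsigned long chunks_round_up(unsigned long blocks,
                                     unsigned long blocks_per_chunk)
{
    return DIV_ROUND_UP(blocks, blocks_per_chunk);
}
```

The two forms agree as long as `blocks + blocks_per_chunk - 1` does not overflow the integer type.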

md/bitmap: create a 'struct bitmap_counts' substructure of 'struct bitmap' · 40cffcc0

Committed by NeilBrown on May 22, 2012

The new "struct bitmap_counts" contains all the fields that are
related to counting the number of active writes in each bitmap chunk.

Having this separate will make it easier to change the chunksize
or overall size of a bitmap atomically.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: make bitmap bitops atomic. · 63c68268

Committed by NeilBrown on May 22, 2012

This allows us to remove spinlock protection which is
more heavy-weight than simple atomics.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: make _page_attr bitops atomic. · bdfd1140

Committed by NeilBrown on May 22, 2012

Using e.g. set_bit instead of __set_bit and using test_and_clear_bit
allows us to remove some locking and contract other locked ranges.

It is rare that we set or clear a lot of these bits, so the gain should
outweigh any cost.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: merge bitmap_file_unmap and bitmap_file_put. · fae7d326

Committed by NeilBrown on May 22, 2012

These functions really do one thing together: release the
'bitmap_storage'.  So make them just one function.

Since we removed the locking (previous patch), we don't need to zero
any fields before freeing them, so it all becomes a bit simpler.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: remove async freeing of bitmap file. · 62f82faa

Committed by NeilBrown on May 22, 2012

There is no real value in freeing things the moment there is an error.
It is just as good to free the bitmap file and pages when the bitmap
is explicitly removed (and replaced?) or at shutdown.

With this gone, the bitmap will only disappear when the array is
quiescent, so we can remove some locking.

As the 'filemap' doesn't disappear now, include extra checks before
trying to write any of it out.
Also remove the check for "has it disappeared" in
bitmap_daemon_write().

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: convert some spin_lock_irqsave to spin_lock_irq · 74667123

Committed by NeilBrown on May 22, 2012

All of these sites can only be called from process context with
irqs enabled, so using irqsave/irqrestore just adds noise.
Remove it.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: use set_bit, test_bit, etc for operation on bitmap->flags. · b405fe91

Committed by NeilBrown on May 22, 2012

We currently use '&' and '|', which isn't the norm in the kernel
and doesn't allow easy atomicity.
So change to bit numbers and {set,clear,test}_bit.
This allows us to remove a spinlock/unlock (which was dubious anyway)
and make some other simplifications.

Signed-off-by: NeilBrown <neilb@suse.de>
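The move from mask arithmetic to numbered bits with atomic helpers can be illustrated in userspace with C11 atomics. This is only an analogy for the kernel's `set_bit`, `clear_bit`, `test_bit` and `test_and_clear_bit`: the `sketch_*` helpers and the flag names are hypothetical, and the kernel implements its bitops differently.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flags, expressed as bit numbers rather than masks. */
enum sketch_flag {
    SKETCH_STALE = 1,
    SKETCH_WRITE_ERROR = 2,
};

/* Atomically set one bit; no external lock is needed. */
static void sketch_set_bit(unsigned int nr, atomic_ulong *flags)
{
    atomic_fetch_or(flags, 1UL << nr);
}

/* Atomically clear one bit. */
static void sketch_clear_bit(unsigned int nr, atomic_ulong *flags)
{
    atomic_fetch_and(flags, ~(1UL << nr));
}

/* Read one bit. */
static bool sketch_test_bit(unsigned int nr, atomic_ulong *flags)
{
    return ((atomic_load(flags) >> nr) & 1UL) != 0;
}

/* Atomically clear one bit and report whether it was set beforehand. */
static bool sketch_test_and_clear_bit(unsigned int nr, atomic_ulong *flags)
{
    unsigned long mask = 1UL << nr;

    return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}
```

With a plain `flags |= MASK` the read-modify-write is three separate steps, so concurrent updaters need a lock; each helper above is a single atomic operation, which is what makes the lock removable.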

md/bitmap: remove single-bit manipulation on sb->state · 84e92345

Committed by NeilBrown on May 22, 2012

Just do single-bit manipulations on bitmap->flags and copy the whole
value between that and sb->state.

This will allow the next patch, which changes how bit manipulations are
performed on bitmap->flags.

This does result in BITMAP_STALE not being set in sb by
bitmap_read_sb; however, as the setting is determined by other
information in the 'sb', we do not lose information this way.
Normally, bitmap_load will be called shortly, which will clear
BITMAP_STALE anyway.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: remove bitmap_mask_state · edbb79df

Committed by NeilBrown on May 22, 2012

This function isn't really needed.  It sets or clears a flag in both
bitmap->flags and sb->state.
However, both times it is called, bitmap_update_sb is called soon
afterwards, which copies bitmap->flags to sb->state.
So just make changes to bitmap->flags, and open-code those rather than
hiding them in a function.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: move storage allocation from bitmap_load to bitmap_create. · bc9891a8

Committed by NeilBrown on May 22, 2012

We should allocate memory for the storage-bitmap at create-time, not
load time.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: separate bitmap file allocation to its own function. · d1244cb0

Committed by NeilBrown on May 22, 2012

This will allow allocation before swapping in a new bitmap.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: store bytes in file rather than just in last page. · 9b1215c1

Committed by NeilBrown on May 22, 2012

This number is more generally useful, and bytes-in-last-page is
easily extracted from it.

Signed-off-by: NeilBrown <neilb@suse.de>
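As a small illustration of why the total byte count is the more useful value, the sketch below derives both the page count and the bytes used in the last page from it. The names and the fixed 4096-byte page size are assumptions made for this example; they are not taken from the kernel source.

```c
/* Assumed page size for the sketch; real systems may differ. */
#define SKETCH_PAGE_SIZE 4096UL

/* Number of pages needed to hold 'bytes' bytes. */
static unsigned long pages_in_file(unsigned long bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Bytes actually used in the final page (0 for an empty file). */
static unsigned long bytes_in_last_page(unsigned long bytes)
{
    unsigned long rem = bytes % SKETCH_PAGE_SIZE;

    if (bytes == 0)
        return 0;
    return rem ? rem : SKETCH_PAGE_SIZE;   /* an exact multiple fills the page */
}
```

Going the other way is not possible: bytes-in-last-page alone does not say how many full pages precede it.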

md/bitmap: move some fields of 'struct bitmap' into a 'storage' substruct. · 1ec885cd

Committed by NeilBrown on May 22, 2012

This new 'struct bitmap_storage' reflects the external storage of the
bitmap.
Having this clearly defined will make it easier to change the storage
used while the array is active.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: change *_page_attr() to take a page number, not a page. · d189122d

Committed by NeilBrown on May 22, 2012

Most often we have the page number, not the page.  And that is what
the *_page_attr() functions really want.  So change the arguments to
take that number.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: centralise allocation of bitmap file pages. · 27581e5a

Committed by NeilBrown on May 22, 2012

Instead of allocating pages in read_sb_page, read_page and
bitmap_read_sb, allocate them all in bitmap_init_from_disk.

Also replace the hack of calling "attach_page_buffers(page, NULL)" to
ensure that free_buffer() won't complain, by putting a test for
PagePrivate in free_buffer().

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: allow a bitmap with no backing storage. · ef99bf48

Committed by NeilBrown on May 22, 2012

An md bitmap comprises two parts:
 - internal counting of active writes per 'chunk'.
 - external storage of whether there are any active writes on
   each chunk.

The second requires the first, but the first doesn't require the
second.

Not having backing storage means that the bitmap cannot expedite
resync after a crash, but it still allows us to expedite the recovery
of a recently-removed device.

So: allow a bitmap to exist even if there is no backing device.
In that case we default to 128M chunks.

A particular value of this is that we can remove and re-add a bitmap
(possibly of a different granularity) on a degraded array, and not
lose the information needed to fast-recover the missing device.

We don't actually activate these bitmaps yet - that will come
in a later patch.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: add new 'space' attribute for bitmaps. · 6409bb05

Committed by NeilBrown on May 22, 2012

If we are to allow bitmaps to be resized when the array is resized,
we need to know how much space there is.

So create an attribute to store this information and set appropriate
defaults.

It can be set more precisely via sysfs, or future metadata extensions
may allow it to be recorded.

Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: disentangle two different 'pending' flags. · bf07bb7d

Committed by NeilBrown on May 22, 2012

There are two different 'pending' concepts in the handling of the
write intent bitmap.

Firstly, a 'page' from the bitmap (which contains PAGE_SIZE*8 bits)
may have changes (bits cleared) that should be written in due course.
There is no hurry for these and the page will transition from
PENDING to NEEDWRITE and will then be written, though if it ever
becomes DIRTY it will be written much sooner and PENDING will be
cleared.

Secondly, a page of counters - which contains PAGE_SIZE/2 counters, one
for each bit - can usefully have a 'pending' flag which indicates if
any of the counters are low (2 or 1) and ready to be processed by
bitmap_daemon_work().  If this flag is clear we can skip the whole
page.

These two concepts are currently combined in the bitmap-file flag.
This causes a tighter connection between the counters and the bitmap
file than I would like - as I want to add some flexibility to the
bitmap file.

So introduce a new flag with the page-of-counters, and rewrite
bitmap_daemon_work() so that it handles the two different 'pending'
concepts separately.

This also allows us to clear BITMAP_PAGE_PENDING when we write out
a dirty page, which may occasionally reduce the number of times we
write a page.

Signed-off-by: NeilBrown <neilb@suse.de>

raid5: support sync request · bc0934f0

Committed by Shaohua Li on May 22, 2012

REQ_SYNC is ignored in the current raid5 code. The block layer does use it
to make policy decisions, for example in the I/O scheduler. This patch adds
support for it.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

raid5: remove unused variables · cceeca43

Committed by Shaohua Li on May 22, 2012

The two variables are useless.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: Fix memleak in r10buf_pool_alloc · 5fdd2cf8

Committed by majianpeng on May 22, 2012

If the allocation of rep1_bio fails, we currently don't free the 'bio'
of the same dev.

Reported by kmemleak.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: allow fix_read_error to read from recovering device. · da8840a7

Committed by majianpeng on May 22, 2012

When attempting to fix a read error, it is acceptable to read from a
device that is recovering, provided the recovery has got past the
place we are reading from.  This makes the test for "can we read from
here" the same as the test in read_balance.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
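The "can we read from here" test can be pictured as the predicate below. It is a hedged userspace sketch: `struct member` and `can_read_from` are hypothetical, the boolean fields stand in for per-device flag bits, and the real kernel condition may include further checks (for example, for known bad blocks) that are omitted here.

```c
#include <stdbool.h>

typedef unsigned long long sector_t;

/* Hypothetical, simplified view of one member device. */
struct member {
    bool in_sync;               /* device holds fully valid data */
    bool faulty;                /* device has failed */
    sector_t recovery_offset;   /* sectors below this are already recovered */
};

/*
 * A read of 'sectors' sectors starting at 'sect' is acceptable if the
 * device is in sync, or if it is still recovering but recovery has
 * already passed the end of the requested range.
 */
static bool can_read_from(const struct member *m, sector_t sect,
                          unsigned int sectors)
{
    if (m->faulty)
        return false;
    if (m->in_sync)
        return true;
    return m->recovery_offset >= sect + sectors;
}
```

For a device recovered up to sector 1000, a read of 8 sectors at 992 is allowed under this rule, while the same read at 996 is not.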

md: move freeing of badblocks.page into md_rdev_clear · 4fa2f327

Committed by NeilBrown on May 22, 2012

This ensures that it is always freed - there were cases where
we failed to free the page.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: dm-raid should call helper function to clear rdev. · 545c8795

Committed by NeilBrown on May 22, 2012

dm-raid currently open-codes the freeing of some members of
an rdev.  It is more maintainable to have it call common code
from md.c which does this for all call-sites.

So rename free_disk_sb to md_rdev_clear, export it, and use it in
dm-raid.c.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: add reshape support · 3ea7daa5

Committed by NeilBrown on May 22, 2012

A 'near' or 'offset' layout RAID10 array can be reshaped to a different
'near' or 'offset' layout, a different chunk size, and a different
number of devices.
However, the number of copies cannot change.

Unlike RAID5/6, we do not support having user-space backup data that
is being relocated during a 'critical section'.  Rather, the
data_offset of each device must change so that when writing any block
to a new location, it will not over-write any data that is still
'live'.

This means that RAID10 reshape is not supportable on v0.90 metadata.

The difference between the old data_offset and the new_offset must be
at least the larger of the chunksize multiplied by offset copies of
each of the old and new layout. (For 'near' mode, offset_copies == 1.)

A larger difference of around 64M seems useful for in-place reshapes
as more data can be moved between metadata updates.
Very large differences (e.g. 512M) seem to slow the process down due
to lots of long seeks (on oldish consumer-grade devices at least).

Metadata needs to be updated whenever the place we are about to write
to is considered - by the current metadata - to still contain data in
the old layout.

[unbalanced locking fix from Dan Carpenter <dan.carpenter@oracle.com>]

Signed-off-by: NeilBrown <neilb@suse.de>
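The minimum data_offset change stated in the message is a maximum of two products, which can be written out as follows. This is a sketch of the arithmetic only: `struct r10_geometry` and `min_offset_change` are hypothetical names, and sizes are taken to be in sectors as an assumption.

```c
/* Hypothetical, reduced description of a RAID10 layout. */
struct r10_geometry {
    unsigned long chunk_sectors;    /* chunk size, assumed to be in sectors */
    unsigned int offset_copies;     /* 1 for 'near' mode */
};

/*
 * Smallest allowed difference between the old data_offset and the new one:
 * the larger of chunk size times offset copies for the old and new layouts.
 */
static unsigned long min_offset_change(const struct r10_geometry *old_geo,
                                       const struct r10_geometry *new_geo)
{
    unsigned long old_span = old_geo->chunk_sectors * old_geo->offset_copies;
    unsigned long new_span = new_geo->chunk_sectors * new_geo->offset_copies;

    return old_span > new_span ? old_span : new_span;
}
```

For example, going from a 'near' layout with 128-sector chunks to an 'offset' layout with two offset copies and 1024-sector chunks gives a minimum of 2048 sectors, far below the roughly 64M the message suggests as a practical value.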

21 May, 2012 · 1 commit

md/raid10: split out interpretation of layout to separate function. · deb200d0

Committed by NeilBrown on May 21, 2012

We will soon be interpreting the layout (and chunksize etc) from
multiple places to support reshape.  So split it out into a separate
function.

Signed-off-by: NeilBrown <neilb@suse.de>
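To give a feel for what "interpreting the layout" can involve, here is a sketch of decoding a packed layout word into its parts. The bit positions used (near copies in the low byte, far copies in the next byte, an 'offset' flag at bit 16) reflect the md RAID10 convention as I understand it, and should be treated as an assumption; `struct r10_layout` and `decode_layout` are hypothetical names, not the function this commit introduces.

```c
#include <stdbool.h>

/* Hypothetical decoded form of a RAID10 layout word. */
struct r10_layout {
    unsigned int near_copies;
    unsigned int far_copies;
    bool far_offset;
};

/* Decode a packed layout value; the bit positions are an assumption. */
static struct r10_layout decode_layout(unsigned int layout)
{
    struct r10_layout l;

    l.near_copies = layout & 0xff;              /* low byte */
    l.far_copies = (layout >> 8) & 0xff;        /* second byte */
    l.far_offset = (layout & (1u << 16)) != 0;  /* 'offset' flag */
    return l;
}
```

Keeping the decoding in one place means every caller, including reshape code that must look at both an old and a new layout, reads the fields the same way.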