Commits · 0be1fecd7ee61b5a6d2b2e94b052b8a070b946ef · openeuler / Kernel

22 Oct, 2012 1 commit

md faulty: use disk_stack_limits() · 0be1fecd

Committed by Eric Sandeen on Oct 22, 2012

Since commit fe86cdce ("block: do not artificially constrain
max_sectors for stacking drivers"), max_sectors defaults to UINT_MAX.
md faulty wasn't using disk_stack_limits(), so it inherited this large
value as well.  This triggered a bug in XFS when stressed over
md_faulty, when a very large bio_alloc() failed.

That was on an older kernel, and I can't reproduce exactly the
same thing upstream, but I think the fix is appropriate in any
case.

Thanks to Mike Snitzer for pointing out the problem.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
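The effect described above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: struct limits_sketch and both helpers are hypothetical stand-ins for struct queue_limits and the stacking logic behind disk_stack_limits(), and only max_sectors is modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct queue_limits. */
struct limits_sketch {
	unsigned int max_sectors;
};

/* A stacking driver starts out unconstrained (UINT_MAX). */
static void limits_sketch_init_stacking(struct limits_sketch *top)
{
	assert(top != NULL);
	top->max_sectors = UINT_MAX;
}

/* Folding in an underlying device keeps the smaller of the two limits. */
static void limits_sketch_stack(struct limits_sketch *top,
				const struct limits_sketch *bottom)
{
	assert(top != NULL && bottom != NULL);
	if (bottom->max_sectors < top->max_sectors)
		top->max_sectors = bottom->max_sectors;
}
```

In this model, a driver that never calls the stacking helper keeps the UINT_MAX default, which is the situation the commit message describes for md faulty.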

13 Oct, 2012 4 commits

dm: store dm_target_io in bio front_pad · dba14160

Committed by Mikulas Patocka on Oct 12, 2012

Use the recently-added bio front_pad field to allocate struct dm_target_io.

Prior to this patch, dm_target_io was allocated from a mempool. For each
dm_target_io, there is exactly one bio allocated from a bioset.

This patch merges these two allocations into one allocation: we create a
bioset with front_pad equal to the size of dm_target_io so that every
bio allocated from the bioset has sizeof(struct dm_target_io) bytes
before it. We allocate a bio and use the bytes before the bio as
dm_target_io.

_tio_cache is removed and the tio_pool mempool is now only used for
request-based devices.

This idea was introduced by Kent Overstreet.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: tj@kernel.org
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Bill Pemberton <wfp5p@viridian.itc.virginia.edu>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
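The single-allocation idea described above can be sketched in userspace C. This follows the wording of the commit message, not the actual dm code: struct bio_sketch, struct tio_sketch and the three helpers are hypothetical, and the sketch assumes the front pad size keeps the bio suitably aligned (true for these two structs).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct bio and struct dm_target_io. */
struct bio_sketch { unsigned long sector; };
struct tio_sketch { void *target; unsigned int request_nr; };

/* One allocation: front_pad bytes first, then the bio itself. */
static struct bio_sketch *bio_sketch_alloc(size_t front_pad)
{
	char *mem = calloc(1, front_pad + sizeof(struct bio_sketch));

	return mem ? (struct bio_sketch *)(mem + front_pad) : NULL;
}

/* The per-target data lives in the bytes immediately before the bio. */
static struct tio_sketch *tio_from_bio(struct bio_sketch *bio)
{
	assert(bio != NULL);
	return (struct tio_sketch *)((char *)bio - sizeof(struct tio_sketch));
}

/* Free through the start of the original allocation. */
static void bio_sketch_free(struct bio_sketch *bio, size_t front_pad)
{
	if (bio)
		free((char *)bio - front_pad);
}
```

A caller would pass sizeof(struct tio_sketch) as front_pad, so one allocation and one free cover both objects.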

dm thin: move bio_prison code to separate module · 4f81a417

Committed by Mike Snitzer on Oct 12, 2012

The bio prison code will be useful to other future DM targets so
move it to a separate module.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: prepare to separate bio_prison code · 44feb387

Committed by Mike Snitzer on Oct 12, 2012

The bio prison code will be useful to share with future DM targets.

Prepare to move this code into a separate module, adding a dm prefix
to structures and functions that will be exported.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: support discard with non power of two block size · 28eed34e

Committed by Mike Snitzer on Oct 12, 2012

Support discards when the pool's block size is not a power of 2.
The block layer assumes discard_granularity is a power of 2 (in
blkdev_issue_discard), so we set this to the largest power of 2 that
divides into the number of sectors in each block, but never less than
DATA_DEV_BLOCK_SIZE_MIN_SECTORS.

This patch eliminates the "Discard support must be disabled when the
block size is not a power of 2" constraint that was imposed in commit
55f2b8bd ("dm thin: support for non power of 2 pool blocksize").  That
commit was incomplete: using a block size that is not a power of 2
shouldn't mean disabling discard support on the device completely.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
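The granularity rule described above ("largest power of 2 that divides the block size, but never below a minimum") can be sketched as follows. The function names are hypothetical, and min_sectors is a parameter standing in for DATA_DEV_BLOCK_SIZE_MIN_SECTORS; this is not the dm-thin code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that divides n: isolate the lowest set bit. */
static uint32_t largest_pow2_divisor(uint32_t n)
{
	assert(n != 0);
	return n & (~n + 1);
}

/* Clamp the result from below, as the commit message describes. */
static uint32_t discard_granularity_sectors(uint32_t sectors_per_block,
					    uint32_t min_sectors)
{
	uint32_t g = largest_pow2_divisor(sectors_per_block);

	return g < min_sectors ? min_sectors : g;
}
```

For example, a block of 384 sectors (3 * 128) yields a granularity of 128 sectors, while a block of 136 sectors (17 * 8) would be raised to the minimum.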

12 Oct, 2012 4 commits

dm persistent data: convert to use le32_add_cpu · 0bcf0879

Committed by Wei Yongjun on Oct 12, 2012

Convert cpu_to_le32(le32_to_cpu(E1) + E2) to use le32_add_cpu().

The dpatch engine was used to auto-generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
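A portable userspace sketch of the conversion named above. All names ending in _sketch are hypothetical stand-ins for the kernel's __le32 type and its byte-order helpers; the point is only that the add helper wraps the convert, add, convert-back sequence in one call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 32-bit value whose bytes are stored little-endian in memory. */
typedef uint32_t le32_sketch;

static uint32_t le32_to_cpu_sketch(le32_sketch v)
{
	unsigned char b[4];

	memcpy(b, &v, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

static le32_sketch cpu_to_le32_sketch(uint32_t v)
{
	unsigned char b[4];
	le32_sketch out;

	b[0] = (unsigned char)(v & 0xff);
	b[1] = (unsigned char)((v >> 8) & 0xff);
	b[2] = (unsigned char)((v >> 16) & 0xff);
	b[3] = (unsigned char)((v >> 24) & 0xff);
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Equivalent of *var = cpu_to_le32(le32_to_cpu(*var) + val). */
static void le32_add_cpu_sketch(le32_sketch *var, uint32_t val)
{
	assert(var != NULL);
	*var = cpu_to_le32_sketch(le32_to_cpu_sketch(*var) + val);
}
```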

dm: use ACCESS_ONCE for sysfs values · fe5fe906

Committed by Mikulas Patocka on Oct 12, 2012

Use the ACCESS_ONCE macro in dm-bufio and dm-verity where a variable
can be modified asynchronously (through sysfs) and we want to prevent
compiler optimizations that assume that the variable hasn't changed.
(See Documentation/atomic_ops.txt.)
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
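The pattern described above can be sketched like this. The macro mirrors the shape of the kernel's ACCESS_ONCE of that era (__typeof__ is a GCC/Clang extension); is_too_old() and its tunable are hypothetical and are not taken from dm-bufio or dm-verity.

```c
#include <assert.h>
#include <stddef.h>

/* Force exactly one read of x through a volatile-qualified lvalue. */
#define ACCESS_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

/*
 * max_age_tunable points at a value that another context (for example a
 * sysfs write) may change at any time. A value of 0 means "no limit" in
 * this sketch.
 */
static int is_too_old(const unsigned long *max_age_tunable, unsigned long age)
{
	unsigned long max_age;

	assert(max_age_tunable != NULL);
	/* Take one snapshot and use it for both tests below. */
	max_age = ACCESS_ONCE_SKETCH(*max_age_tunable);

	if (max_age == 0)
		return 0;
	return age > max_age;
}
```

Without the snapshot, a compiler would be free to reload the tunable between the two tests, so the zero check and the comparison could see different values.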

dm bufio: use list_move · 54499afb

Committed by Wei Yongjun on Oct 12, 2012

Use list_move() instead of list_del() + list_add().

spatch with a semantic match was used to find this.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
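A minimal userspace sketch of the equivalence mentioned above, using a hand-rolled circular doubly linked list. The _sketch helpers are hypothetical re-implementations written for illustration, not the kernel's <linux/list.h>.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

static void list_init_sketch(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert n right after head. */
static void list_add_sketch(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n from whatever list it is on. */
static void list_del_sketch(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* One call replacing the list_del() + list_add() pair. */
static void list_move_sketch(struct list_node *n, struct list_node *head)
{
	assert(n != NULL && head != NULL && n != head);
	list_del_sketch(n);
	list_add_sketch(n, head);
}
```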

dm mpath: fix check for null mpio in end_io fn · a71a261f

Committed by Wei Yongjun on Oct 12, 2012

The mpio dereference should be moved below the BUG_ON NULL test
in multipath_end_io().

spatch with a semantic match was used to find this.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

11 Oct, 2012 28 commits

md: refine reporting of resync/reshape delays. · 72f36d59

Committed by NeilBrown on Oct 11, 2012

If 'resync_max' is set to 0 (as is often done when starting a
reshape, so that mdadm can remain in control during a sensitive
period), and if the reshape request is initially delayed because
another array using the same devices is resyncing or reshaping etc,
then user-space cannot easily tell when the delay changes from being
due to a conflicting reshape, to being due to resync_max = 0.

So introduce a new state: (curr_resync == 3) to reflect this, make
sure it is visible both via /proc/mdstat and via the "sync_completed"
sysfs attribute, and ensure that the event transition from one delay
state to the other is properly notified.
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: be careful not to resize_stripes too big. · e56108d6

Committed by NeilBrown on Oct 11, 2012

When a RAID5 is reshaping, conf->raid_disks is increased
before mddev->delta_disks becomes zero.
This can result in check_reshape calling resize_stripes with a
number that is too large.  This particularly happens
when md_check_recovery calls ->check_reshape().

If we use ->previous_raid_disks, we don't risk this.
Signed-off-by: NeilBrown <neilb@suse.de>

md: make sure manual changes to recovery checkpoint are saved. · db07d85e

Committed by NeilBrown on Oct 11, 2012

If you make an array bigger but suppress resync of the new region with
  mdadm --grow /dev/mdX --size=max --assume-clean

then stop the array before anything is written to it, the effect of
the "--assume-clean" is lost and the array will resync the new space
when restarted.
So ensure that we update the metadata in this case.
Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: use correct limit variable · 91502f09

Committed by Dan Carpenter on Oct 11, 2012

Clang complains that we are assigning a variable to itself.  This should
be using bad_sectors like the similar earlier check does.

Bug has been present since 3.1-rc1.  It is minor but could
conceivably cause corruption or other bad behaviour.

Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: writing to sync_action should clear the read-auto state. · 48c26ddc

Committed by NeilBrown on Oct 11, 2012

In some cases arrays are started in 'read-auto' state, wherein
nothing gets written to any device until the array is written
to.  The purpose of this is to make accidental auto-assembly
of the wrong arrays less of a risk, and to allow arrays to be
started to read suspend-to-disk images without actually changing
anything (as might happen if the array were dirty and a
resync seemed necessary).

Explicitly writing the 'sync_action' for a read-auto array currently
doesn't clear the read-auto state, so the sync action doesn't
happen, which can be confusing.

So allow any successful write to sync_action to clear any read-auto
state.
Reported-by: Alexander Kühn <alexander.kuehn@nagilum.de>
Signed-off-by: NeilBrown <neilb@suse.de>

Subject: [PATCH] md:change resync_mismatches to atomic64_t to avoid races · 7f7583d4

Committed by Jianpeng Ma on Oct 11, 2012

Now that multiple threads can handle stripes, it is safer to
use an atomic64_t for resync_mismatches, to avoid update races.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: make sure to_read and to_write never go negative. · 1ed850f3

Committed by NeilBrown on Oct 11, 2012

to_read and to_write are part of the result of analysing
a stripe before handling it.
Their use is to avoid some loops and tests if the values are
known to be zero.  Thus it is not a problem if they are a
little bit larger than they should be.

So decrementing them in handle_failed_stripe serves little value, and
due to races it could cause some loops to be skipped incorrectly.

So remove those decrements.
Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write. · a7854487

Committed by Alexander Lyakas on Oct 11, 2012

Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
Suggested-by: Yair Hershko <yair@zadarastorage.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: protect debug message against NULL derefernce. · b97390ae

Committed by NeilBrown on Oct 11, 2012

The pr_debug in add_stripe_bio could race with something
changing *bip, so it is best to hold the lock until
after the pr_debug.
Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: add some missing locking in handle_failed_stripe. · 143c4d05

Committed by NeilBrown on Oct 11, 2012

We really should hold the stripe_lock while accessing
'toread' else we could race with add_stripe_bio and corrupt
a list.
Reported-by: "Jianpeng Ma" <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

MD: raid5 avoid unnecessary zero page for trim · 9e444768

Committed by Shaohua Li on Oct 11, 2012

We want to avoid zeroing the discarded dev page, because it is useless for
discard. But if we don't zero it, another read/write that hits such a page in
the cache will get inconsistent data.

To avoid zeroing the page, we don't set the R5_UPTODATE flag after construction
is done. This way, the discard write request is still issued and finished, but
a read will not hit the page. If the stripe gets accessed soon, we need to
reread the stripe, but since the chance is low, the reread isn't a big deal.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

MD: raid5 trim support · 620125f2

Committed by Shaohua Li on Oct 11, 2012

Discard for raid4/5/6 has limitations. If the discard request size is
small, we discard on one disk, but we still need to calculate parity and
write the parity disk.  To calculate parity correctly, zero_after_discard
must be guaranteed. Even if it is true, we would discard on one disk
but write to other disks, which makes the parity disks wear out
fast. This doesn't make sense. So an efficient discard for raid4/5/6
should discard all data disks and parity disks, which requires the
write pattern to be (A, A+chunk_size, A+chunk_size*2...). If A's size
is smaller than chunk_size, such a pattern is almost impossible in
practice. So in this patch, I only handle the case where A's size
equals chunk_size. That is, the discard request should be aligned to
the stripe size and its size should be a multiple of the stripe size.

Since we can only handle requests with a specific alignment and size (or
the part of a request that fits whole stripes), we can't guarantee
zero_after_discard even if zero_after_discard is true in the low-level
drives.

The block layer doesn't send down correctly aligned requests even when
the correct discard alignment is set, so I must filter them out.

For raid4/5/6 parity calculation, if data is 0, parity is 0. So if
zero_after_discard is true for all disks, data is consistent after
discard.  Otherwise, data might be lost. Let's consider a scenario:
discard a stripe, write data to one disk and write the parity disk. The
stripe could still be inconsistent at that point, depending on whether
data from the other data disks or from the parity disks was used to
calculate the new parity. If a disk is broken, we can't restore it. So
in this patch, we only enable discard support if all disks have
zero_after_discard.

If discard fails on one disk, we face an inconsistency issue similar to
the one above. The patch makes discard follow the same path as a normal
write request. If discard fails, a resync will be scheduled to make
the data consistent. The extra writes aren't good, but data
consistency is important.

If a subsequent read/write request hits the raid5 cache of a discarded
stripe, the discarded dev page should be zero filled, so the data is
consistent. This patch will always zero the dev page for a discarded
request stripe. This isn't optimal because a discard request doesn't
need such a payload. The next patch will avoid it.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
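The alignment filter described above (only the part of a discard that covers whole stripes is handled) can be sketched as follows. trim_to_full_stripes() is a hypothetical helper, and stripe_sectors is assumed to be the chunk size in sectors multiplied by the number of data disks; this is an illustration, not the raid5 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Shrink the range [start, start + len) to whole stripes.
 * Returns false when no full stripe fits inside the request.
 */
static bool trim_to_full_stripes(uint64_t start, uint64_t len,
				 uint64_t stripe_sectors,
				 uint64_t *out_start, uint64_t *out_len)
{
	uint64_t first, last;

	assert(stripe_sectors > 0 && out_start != NULL && out_len != NULL);

	/* Round the start up and the end down to stripe boundaries. */
	first = (start + stripe_sectors - 1) / stripe_sectors * stripe_sectors;
	last = (start + len) / stripe_sectors * stripe_sectors;

	if (last <= first)
		return false;

	*out_start = first;
	*out_len = last - first;
	return true;
}
```

The head and tail of a request that fall outside the trimmed range are simply not discarded, which is why zero_after_discard cannot be guaranteed for the array as a whole.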

md/bitmap:Don't use IS_ERR to judge alloc_page(). · 582e2e05

Committed by Jianpeng Ma on Oct 11, 2012

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: Don't release reference to device while handling read error. · 7ad4d4a6

Committed by NeilBrown on Oct 11, 2012

When we get a read error, we arrange for raid1d to handle it.
Currently we release the reference on the device.  This can result
in
   conf->mirrors[read_disk].rdev
being NULL in fix_read_error, if the device happens to get removed
before the read error is handled.

So instead keep the reference until the read error has been fully
handled.
Reported-by: hank <pyu@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

raid: replace list_for_each_continue_rcu with new interface · fd177481

Committed by Michael Wang on Oct 11, 2012

This patch replaces list_for_each_continue_rcu() with
list_for_each_entry_continue_rcu() to save a few lines
of code and allow removing list_for_each_continue_rcu().
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DM RAID: Fix for "sync" directive ineffectiveness · 761becff

Committed by Jonathan Brassow on Oct 11, 2012

There are two table arguments that can be given to a DM RAID target
that control whether the array is forced to (re)synchronize or skip
initialization: "sync" and "nosync".  When "sync" is given, we set
mddev->recovery_cp to 0 in order to cause the device to resynchronize.
This is insufficient if there is a bitmap in use, because the array
will simply look at the bitmap and see that there is no recovery
necessary.

The fix is to skip over the loading of the superblocks when "sync" is
given, causing new superblocks to be written that will force the array
to go through initialization (i.e. synchronization).
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

DM RAID: Fix comparison of index and quantity for "rebuild" parameter · 7386199c

Committed by Jonathan Brassow on Oct 11, 2012

DM RAID: Fix comparison of index and quantity for "rebuild" parameter

The "rebuild" parameter takes an index argument that starts counting from
zero.  The conditional used to validate the index was using '>' rather than
'>=', leaving the door open for an index value that would be 1 too large.
Reported-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
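The off-by-one described above comes down to validating a zero-based index against a count. A minimal sketch, with a hypothetical function name that does not appear in dm-raid:

```c
#include <assert.h>
#include <stdbool.h>

/* A zero-based index is valid only for 0 .. raid_disks - 1. */
static bool rebuild_index_valid(unsigned int index, unsigned int raid_disks)
{
	assert(raid_disks > 0);

	/* Using '>' here would wrongly accept index == raid_disks. */
	if (index >= raid_disks)
		return false;
	return true;
}
```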

DM RAID: Add rebuild capability for RAID10 · 4ec1e369

Committed by Jonathan Brassow on Oct 11, 2012

DM RAID: Add code to validate replacement slots for RAID10 arrays

RAID10 can handle 'copies - 1' failures for each mirror group. This code
ensures the user has provided a valid array - one whose devices specified for
rebuild do not exceed the amount of redundancy available.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
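The 'copies - 1' rule above can be sketched for the simplest case. This is a simplified model, not the dm-raid validation code: it assumes a "near" layout in which each mirror group is a run of `copies` consecutive devices and raid_disks is a multiple of copies. The function name and the rebuild[] flag array are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * rebuild[i] is true when device i was named for rebuild.
 * The array is valid if every mirror group keeps at least one
 * device that is not being rebuilt.
 */
static bool raid10_rebuilds_valid(const bool *rebuild,
				  unsigned int raid_disks,
				  unsigned int copies)
{
	unsigned int group, i;

	assert(rebuild != NULL && copies > 0 && raid_disks % copies == 0);

	for (group = 0; group < raid_disks; group += copies) {
		unsigned int rebuilding = 0;

		for (i = 0; i < copies; i++)
			if (rebuild[group + i])
				rebuilding++;

		/* At most copies - 1 devices per group may be rebuilt. */
		if (rebuilding >= copies)
			return false;
	}
	return true;
}
```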

DM RAID: Move 'rebuild' checking code to its own function · eb649123

Committed by Jonathan Brassow on Oct 11, 2012

DM RAID:  Move chunk of code to its own function

The code that checks whether device replacements/rebuilds are possible given
a specific RAID type is moved to its own function.  It will further expand
when the code to check RAID10 is added.  A separate function makes it easier
to read.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

MD RAID10: Prep for DM RAID10 device replacement capability · 2863b9eb

Committed by Jonathan Brassow on Oct 11, 2012

MD RAID10:  Fix a couple of potential kernel panics if RAID10 is used by dm-raid

When device-mapper uses the RAID10 personality through dm-raid.c, there is no
'gendisk' structure in mddev and some sysfs information is also not populated.

This patch avoids touching those non-existent structures.
Signed-off-by: Jonathan Brassow <jbrassow@rehdat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: avoid taking the mutex on some ioctls. · 1ca69c4b

Committed by NeilBrown on Oct 11, 2012

Some ioctls don't need to take the mutex and doing so can cause
a delay as it is held during super-block update.
So move those ioctls out of the mutex and rely on rcu locking
to ensure we don't access stale data.
Signed-off-by: NeilBrown <neilb@suse.de>

MD: change the parameter of md thread · 4ed8731d

Committed by Shaohua Li on Oct 11, 2012

Change the thread parameter, so the thread can carry extra info. The next
patch will use it.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: submit IO from originating thread instead of md thread. · 57c67df4

Committed by NeilBrown on Oct 11, 2012

Queuing writes to the md thread means that all requests go through the
one processor, which may not be able to keep up with very high request
rates.

So use the plugging infrastructure to submit all requests on unplug.
If a 'schedule' is needed, we fall back on the old approach of handing
the requests to the thread for it to handle.

This is nearly identical to a recent patch which provided similar
functionality to RAID1.
Signed-off-by: NeilBrown <neilb@suse.de>

md: raid 10 supports TRIM · 532a2a3f

Committed by Shaohua Li on Oct 11, 2012

This makes md raid 10 support TRIM.

If one disk supports discard and another does not, or one has
discard_zero_data and another does not, the data on such disks could be
inconsistent. But this should not matter: discarded data is
useless. This will add an extra copy in rebuild, though.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: raid 1 supports TRIM · 2ff8cc2c

Committed by Shaohua Li on Oct 11, 2012

This makes md raid 1 support TRIM.
If one disk supports discard and another does not, or one has
discard_zero_data and another does not, the data on such disks could be
inconsistent. But this should not matter: discarded data is useless. This
will add an extra copy in rebuild, though.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: raid 0 supports TRIM · c83057a1

Committed by Shaohua Li on Oct 11, 2012

This makes md raid 0 support TRIM.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: linear supports TRIM · f1cad2b6

Committed by Shaohua Li on Oct 11, 2012

This makes md linear support TRIM.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/linear: rcu_dereference outside read-lock section · bc78c573

Committed by Denis Efremov on Oct 11, 2012

According to the comment in the linear_stop function, the
rcu_dereference in the linear_start and linear_stop functions
occurs under reconfig_mutex. The patch expresses this
agreement in code and prevents a lockdep complaint.

Found by Linux Driver Verification project (linuxtesting.org)
Signed-off-by: Denis Efremov <yefremov.denis@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

27 Sep, 2012 3 commits

md/raid10: fix "enough" function for detecting if array is failed. · 80b48124

Committed by NeilBrown on Sep 27, 2012

The 'enough' function is written to work with 'near' arrays only
in that it implicitly assumes that the offset from one 'group' of
devices to the next is the same as the number of copies.
In reality it is the number of 'near' copies.

So change it to make this number explicit.

This bug makes it possible to run arrays without enough drives
present, which is dangerous.
It is appropriate for an -stable kernel, but will almost certainly
need to be modified for some of them.

Cc: stable@vger.kernel.org
Reported-by: Jakub Husák <jakub@gooseman.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
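The difference between stepping by the number of copies and stepping by the number of 'near' copies can be seen in a small userspace model. enough_sketch() is a hypothetical function modelled on the description above, not the raid10 code itself; present[] marks which devices are working, and `step` is the offset from one group of devices to the next.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if every group of `copies` consecutive devices
 * (wrapping around the array) contains at least one working device.
 */
static bool enough_sketch(const bool *present, int raid_disks,
			  int copies, int step)
{
	int first = 0;

	assert(present != NULL && raid_disks > 0 && copies > 0 && step > 0);

	do {
		int n = copies;
		int cnt = 0;
		int d = first;

		while (n--) {
			if (present[d])
				cnt++;
			d = (d + 1) % raid_disks;
		}
		if (cnt == 0)
			return false;
		first = (first + step) % raid_disks;
	} while (first != 0);

	return true;
}
```

Take four devices with two copies in a 'far' layout (one near copy) and devices 1 and 2 missing. Stepping by the number of copies only inspects the groups (0,1) and (2,3) and reports the array as usable. Stepping by the number of near copies also inspects (1,2), where no working device remains.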

dm verity: fix overflow check · 1d55f6bc

Committed by Mikulas Patocka on Sep 26, 2012

This patch fixes sector_t overflow checking in dm-verity.

Without this patch, the code checks for overflow only if sector_t is
smaller than long long, not if sector_t and long long have the same size.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
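One way to check for overflow that also works when both types have the same width is a shift round trip. The sketch below assumes a 64-bit unsigned long long; sector_sketch_t and the function name are hypothetical stand-ins, and the code is an illustration of the idea rather than the dm-verity source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for sector_t; here it is as wide as unsigned long long. */
typedef uint64_t sector_sketch_t;

/*
 * True if converting num_blocks to sectors (a left shift) loses bits.
 * Shifting the result back must reproduce the original block count.
 */
static bool blocks_to_sectors_overflows(unsigned long long num_blocks,
					unsigned int shift)
{
	sector_sketch_t sectors;

	assert(shift < 64);
	sectors = (sector_sketch_t)(num_blocks << shift);
	return (unsigned long long)(sectors >> shift) != num_blocks;
}
```

A check that only compares the value before and after a cast to the narrower type never fires when the two types are the same size, which is the gap the commit message describes.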

dm thin: fix discard support for data devices · 0424caa1

Committed by Mike Snitzer on Sep 26, 2012

The discard limits that get established for a thin-pool or thin device
may be incompatible with the pool's data device.  Avoid this by checking
the discard limits of the pool's data device.  If an incompatibility is
found then the pool's 'discard passdown' feature is disabled.

Change thin_io_hints to ensure that a thin device always uses the same
queue limits as its pool device.

Introduce requested_pf to track whether or not the table line originally
contained the no_discard_passdown flag and use this directly for table
output.  We prepare the correct setting for discard_passdown directly in
bind_control_target (called from pool_io_hints) and store it in
adjusted_pf rather than waiting until we have access to pool->pf in
pool_preresume.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

openeuler / Kernel synced successfully about 1 year ago