Commits · 1e2a410ff71504a64d1af2e354287ac51aeac1b0 · openeuler / raspberrypi-kernel

09 Sep, 2012 2 commits

block: Use bi_pool for bio_integrity_alloc() · 1e2a410f

Authored by Kent Overstreet on Sep 06, 2012

Now that bios keep track of where they were allocated from,
bio_integrity_alloc_bioset() becomes redundant.

Remove bio_integrity_alloc_bioset() and drop the bio_set argument from the
related functions and make them use bio->bi_pool.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


block: Generalized bio pool freeing · 395c72a7

Authored by Kent Overstreet on Sep 06, 2012

With the old code, when you allocate a bio from a bio pool you have to
implement your own destructor that knows how to find the bio pool the
bio was originally allocated from.

This adds a new field to struct bio (bi_pool) and changes
bio_alloc_bioset() to use it. This makes various bio destructors
unnecessary, so they're then deleted.

v6: Explain the temporary if statement in bio_put

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
CC: Alasdair Kergon <agk@redhat.com>
CC: Nicholas Bellinger <nab@linux-iscsi.org>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
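Illustration: a minimal userspace sketch of the idea behind bi_pool, not the kernel code. The names `struct pool`, `struct obj`, `obj_alloc` and `obj_put` are hypothetical stand-ins for a bio pool, a bio, the allocator and the put path. The point is only that once the object records its owning pool, one generic release routine can serve every caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio pool: it only counts live objects. */
struct pool {
    int live;
};

/* Hypothetical stand-in for struct bio. */
struct obj {
    struct pool *owner;   /* plays the role of bi_pool */
    int refcount;
};

/* Allocate from a pool and remember the pool in the object itself. */
struct obj *obj_alloc(struct pool *p)
{
    struct obj *o;

    assert(p != NULL);
    o = malloc(sizeof(*o));
    if (!o)
        return NULL;
    o->owner = p;
    o->refcount = 1;
    p->live++;
    return o;
}

/* Drop a reference; the last put returns the object to the recorded
 * pool, so no per-caller destructor is needed. */
void obj_put(struct obj *o)
{
    if (--o->refcount > 0)
        return;
    o->owner->live--;
    free(o);
}
```

Callers that allocate from different pools can all release with the same `obj_put()`, which is the property the commit message describes.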


18 Aug, 2012 1 commit

md/raid10: fix problem with on-stack allocation of r10bio structure. · e0ee7785

Authored by NeilBrown on Aug 18, 2012

A 'struct r10bio' has an array of per-copy information at the end.
This array is declared with size [0] and r10bio_pool_alloc allocates
enough extra space to store the per-copy information depending on the
number of copies needed.

So declaring a 'struct r10bio' on the stack isn't going to work.  It
won't allocate enough space, and memory corruption will ensue.

So in the two places where this is done, declare a sufficiently large
structure and use that instead.

The two call-sites of this bug were introduced in 3.4 and 3.5
so this is suitable for both those kernels.  The patch will have to
be modified for 3.4 as it only has one bug.

Cc: stable@vger.kernel.org
Reported-by: Ivan Vasilyev <ivan.vasilyev@gmail.com>
Tested-by: Ivan Vasilyev <ivan.vasilyev@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
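Illustration: a userspace sketch of why a bare on-stack declaration is too small, and of one way to declare a sufficiently large object. `struct rec`, `struct dev_slot`, `MAX_COPIES`, `rec_bytes` and `fill_on_stack` are hypothetical names, and the fixed `MAX_COPIES` bound is an assumption of this sketch rather than what the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-copy information. */
struct dev_slot {
    long addr;
    void *bio;
};

/* Hypothetical stand-in for struct r10bio. */
struct rec {
    int copies;
    struct dev_slot devs[];   /* trailing array: contributes no storage */
};

#define MAX_COPIES 4          /* assumed upper bound for this sketch */

/* Bytes actually needed for a record with n copies. */
size_t rec_bytes(int n)
{
    assert(n >= 0);
    return sizeof(struct rec) + (size_t)n * sizeof(struct dev_slot);
}

/* Sufficiently large object: header plus room for MAX_COPIES slots. */
union big_rec {
    struct rec r;
    unsigned char space[sizeof(struct rec) +
                        MAX_COPIES * sizeof(struct dev_slot)];
};

/* Fill every slot of an on-stack record without overrunning it. */
int fill_on_stack(int copies)
{
    union big_rec on_stack;
    int i;

    if (copies < 0 || copies > MAX_COPIES)
        return -1;
    on_stack.r.copies = copies;
    for (i = 0; i < copies; i++)
        on_stack.r.devs[i].addr = i;
    return on_stack.r.copies;
}
```

`sizeof(struct rec)` covers only the header, so a plain `struct rec` on the stack has no room for any `devs[]` entry; the wrapper reserves that room explicitly.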


16 Aug, 2012 1 commit

md: Don't truncate size at 4TB for RAID0 and Linear · 667a5313

Authored by NeilBrown on Aug 16, 2012

commit 27a7b260
   md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.

changed 0.90 metadata handling to truncate the size to 4TB as that is
all that 0.90 can record.
However for RAID0 and Linear, 0.90 doesn't need to record the size, so
this truncation is not needed and causes working arrays to become too small.

So avoid the truncation for RAID0 and Linear.

This bug was introduced in 3.1 and is suitable for any stable kernels
from then onwards.
As the offending commit was tagged for 'stable', any stable kernel
that it was applied to should also get this patch.  That includes
at least 2.6.32, 2.6.33 and 3.0. (Thanks to Ben Hutchings for
providing that list).

Cc: stable@vger.kernel.org
Signed-off-by: Neil Brown <neilb@suse.de>
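Illustration: a small sketch of a level-dependent clamp, not the actual md code. It assumes 512-byte sectors and md's usual level numbering (Linear = -1, RAID0 = 0, redundant levels >= 1); `usable_sectors`, `LIMIT_SECTORS` and the exact clamp value are hypothetical choices for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 4 TiB expressed in 512-byte sectors: 2^33 sectors * 2^9 bytes = 2^42. */
#define LIMIT_SECTORS (2ULL << 32)

/* Hypothetical helper: usable size of a member device, in sectors. */
uint64_t usable_sectors(uint64_t sectors, int level)
{
    assert(level >= -1);
    /* Only levels that must record the per-device size get clamped.
     * Linear (-1) and RAID0 (0) keep their full size. */
    if (sectors >= LIMIT_SECTORS && level >= 1)
        return LIMIT_SECTORS - 2;
    return sectors;
}
```

With this rule a large RAID0 or Linear member keeps every sector, while a RAID1 member of the same size is still limited to what the metadata can record.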


02 Aug, 2012 4 commits

md/dm-raid: DM_RAID should select MD_RAID10 · d9f691c3

Authored by NeilBrown on Aug 02, 2012

Now that DM_RAID supports raid10, it needs to select that code
to ensure it is included.

Cc: Jonathan Brassow <jbrassow@redhat.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>


md/raid1: submit IO from originating thread instead of md thread. · f54a9d0e

Authored by NeilBrown on Aug 02, 2012

Queuing writes to the md thread means that all requests go through the
one processor which may not be able to keep up with very high request
rates.

So use the plugging infrastructure to submit all requests on unplug.
If a 'schedule' is needed, we fall back on the old approach of handing
the requests to the thread for it to handle.

Signed-off-by: NeilBrown <neilb@suse.de>


raid5: raid5d handle stripe in batch way · 46a06401

Authored by Shaohua Li on Aug 02, 2012

Let raid5d handle stripes in batches to reduce conf->device_lock locking.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>


raid5: make_request use batch stripe release · 8811b596

Authored by Shaohua Li on Aug 02, 2012

make_request() does a stripe release for every stripe and the stripe usually has
count 1, which defeats the previous release_stripe() optimization. In my
test, this release_stripe() becomes the heaviest place to take
conf->device_lock after the previous patches are applied.

The patch below batches stripe release. All the stripes will be released in
unplug. The STRIPE_ON_UNPLUG_LIST bit protects against concurrent access to
the stripe lru.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>


01 Aug, 2012 1 commit

DM RAID: Add support for MD RAID10 · 63f33b8d

Authored by Jonathan Brassow on Jul 31, 2012

Support the MD RAID10 personality through dm-raid.c

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>


31 Jul, 2012 17 commits

blk: pass from_schedule to non-request unplug functions. · 74018dc3

Authored by NeilBrown on Jul 31, 2012

This will allow md/raid to know why the unplug was called,
and to act accordingly - if !from_schedule it
is safe to perform tasks which could themselves schedule.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


blk: centralize non-request unplug handling. · 9cbb1750

Authored by NeilBrown on Jul 31, 2012

Both md and umem have similar code for getting notified on a
blk_finish_plug event.
Centralize this code in block/ and allow each driver to
provide its distinctive difference.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


md: remove plug_cnt feature of plugging. · 0021b7bc

Authored by NeilBrown on Jul 31, 2012

This seemed like a good idea at the time, but after further thought I
cannot see it making a difference other than very occasionally and
testing to try to exercise the case it is most likely to help did not
show any performance difference by removing it.

So remove the counting of active plugs and allow 'pending writes' to
be activated at any time, not just when no plugs are active.

This is only relevant when there is a write-intent bitmap, and the
updating of the bitmap will likely introduce enough delay that
the single-threading of bitmap updates will be enough to collect large
numbers of updates together.

Removing this will make it easier to centralise the unplug code, and
will clear the way for other unplug enhancements which have a
measurable effect.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


md/RAID1: Add missing case for attempting to repair known bad blocks. · d57368af

Authored by Alexander Lyakas on Jul 17, 2012

When doing resync or repair, attempt to correct bad blocks, according
to WriteErrorSeen policy.

Signed-off-by: Alex Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>


dm: use memweight() · 8fb980e3

Authored by Akinobu Mita on Jul 30, 2012

Use memweight() to count the total number of bits set in a memory area.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Alasdair Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
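Illustration: a portable userspace sketch of what a "count the set bits in a memory area" helper computes. It is not the kernel's memweight() implementation; `count_set_bits` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Count the set bits in `bytes` bytes starting at `ptr`. */
size_t count_set_bits(const void *ptr, size_t bytes)
{
    const unsigned char *p = ptr;
    size_t total = 0;
    size_t i;

    assert(ptr != NULL || bytes == 0);
    for (i = 0; i < bytes; i++) {
        unsigned char b = p[i];

        while (b) {
            b &= (unsigned char)(b - 1);   /* clear the lowest set bit */
            total++;
        }
    }
    return total;
}
```

A single helper like this replaces open-coded loops that sum per-byte or per-word bit counts.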


md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE. · 895e3c5c

Authored by majianpeng on Jul 31, 2012

'sync' writes set both REQ_SYNC and REQ_NOIDLE.
O_DIRECT writes set REQ_SYNC but not REQ_NOIDLE.

We currently assume that a REQ_SYNC request will not be followed by
more requests and so set STRIPE_PREREAD_ACTIVE to expedite the
request.
This is appropriate for sync requests, but not for O_DIRECT requests.

So make the setting of STRIPE_PREREAD_ACTIVE conditional on REQ_NOIDLE
rather than REQ_SYNC.  This is consistent with the documented meaning
of REQ_NOIDLE:

        __REQ_NOIDLE,           /* don't anticipate more IO after this one */

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
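Illustration: a sketch of the changed condition using made-up flag bits. `F_SYNC`, `F_NOIDLE` and `should_expedite` are hypothetical; the real request flag values and the surrounding raid5 code differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for REQ_SYNC and REQ_NOIDLE. */
#define F_SYNC   (1u << 0)
#define F_NOIDLE (1u << 1)

/* Expedite a write only when no further IO is anticipated after it. */
bool should_expedite(unsigned int flags)
{
    /* In this sketch NOIDLE only ever appears together with SYNC. */
    assert(!(flags & F_NOIDLE) || (flags & F_SYNC));
    return (flags & F_NOIDLE) != 0;
}
```

A 'sync' write (both bits set) is expedited, while an O_DIRECT write (only `F_SYNC`) is not, matching the distinction drawn in the message above.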


md/raid1: don't abort a resync on the first badblock. · b7219ccb

Authored by NeilBrown on Jul 31, 2012

If a resync of a RAID1 array with 2 devices finds a known bad block
on one device it will neither read from, nor write to, that device for
this block offset.
So there will be one read_target (the other device) and zero write
targets.
This condition causes md/raid1 to abort the resync assuming that it
has finished - without known bad blocks this would be true.

When there are no write targets because of the presence of bad blocks
we should only skip over the area covered by the bad block.
RAID10 already gets this right, raid1 doesn't.  Or didn't.

As this can cause a 'sync' to abort early and appear to have succeeded
it could lead to some data corruption, so it is suitable for -stable.

Cc: stable@vger.kernel.org
Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
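Illustration: the decision described above reduced to a tiny helper. `enum resync_step` and `next_step` are hypothetical and ignore everything else the real resync path tracks.

```c
#include <assert.h>

enum resync_step {
    STEP_SYNC,       /* read and write this range as usual */
    STEP_SKIP_BAD,   /* skip only the range covered by the bad block */
    STEP_DONE        /* nothing left to do */
};

/* bad_sectors is the length of a known bad range at this offset,
 * or 0 if there is none. */
enum resync_step next_step(int write_targets, int bad_sectors)
{
    assert(write_targets >= 0 && bad_sectors >= 0);
    if (write_targets > 0)
        return STEP_SYNC;
    /* No write targets: treat it as "finished" only when bad blocks
     * are not the reason. */
    return bad_sectors > 0 ? STEP_SKIP_BAD : STEP_DONE;
}
```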


md: remove duplicated test on ->openers when calling do_md_stop() · 90cf195d

Authored by NeilBrown on Jul 31, 2012

do_md_stop tests mddev->openers while holding ->open_mutex,
and fails if this count is too high.
So callers do not need to check mddev->openers and doing so isn't
very meaningful as they don't hold ->open_mutex so the number could
change.

So remove the unnecessary tests on mddev->openers.
These are not called often enough for there to be any gain in
an early test on ->open_mutex to avoid the need for a slightly more
costly mutex_lock call.

Signed-off-by: NeilBrown <neilb@suse.de>


raid5: Add R5_ReadNoMerge flag which prevent bio from merging at block layer · 3f9e7c14

Authored by majianpeng on Jul 31, 2012

Because bios can be merged at the block layer, a bio error may be caused by
another bio that was merged into the same request.
With this flag set, the exact failing sector can be found without redundant
operations such as re-write and re-read.

V0->V1: Use REQ_FLUSH instead of REQ_NOMERGE to avoid bio merging at the block
layer.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>


md/raid1: prevent merging too large request · 12cee5a8

Authored by Shaohua Li on Jul 31, 2012

For SSDs, once the request size exceeds a specific value (the optimal io size),
request size is no longer important for bandwidth. In that case, if making a
request bigger leaves some disks idle, the total throughput actually drops. A
good example is doing a readahead in a two-disk raid1 setup.

So when should we split big requests? We absolutely don't want to split a big
request into very small requests. Even on SSD, a big request transfer is more
efficient. This patch only considers requests with size above the optimal io
size.

If all disks are busy, is it worth doing a split? Say the optimal io size is
16k, with two 32k requests and two disks. We can let each disk run one 32k
request, or split the requests into four 16k requests so each disk runs two.
It's hard to say which case is better; it depends on the hardware.

So only consider the case where there are idle disks. For readahead, splitting
is always better in this case. In my test, the patch below improves throughput
by more than 30%. Not 100%, because the disk isn't 100% busy.

This case can happen outside readahead too, for example with directio. But I
suppose directio usually has a bigger IO depth and keeps all disks busy, so
I ignored it.

Note: if the raid uses any hard disk, we don't prevent merging, as that would
make performance worse.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>


md/raid1: read balance chooses idlest disk for SSD · 9dedf603

Authored by Shaohua Li on Jul 31, 2012

An SSD has no spindle, so the distance between requests means nothing. And the
original distance based algorithm can sometimes cause a severe performance
issue for SSD raid.

Consider two thread groups, one accessing file A, the other accessing file B.
The first group will access one disk and the second will access the other disk,
because requests are near within one group and far between groups. In this case,
read balance might keep one disk very busy but the other relatively idle. For
SSD, we should try our best to distribute requests to as many disks as possible.
There isn't a spindle move penalty anyway.

With the patch below, I can see more than 50% throughput improvement sometimes,
depending on workloads.

The only exception is small requests that can be merged into a big request,
which typically can drive higher throughput for SSD too. Such small requests
are sequential reads. Unlike with a hard disk, a sequential read which can't be
merged (for example direct IO, or a read without readahead) can be ignored for
SSD. Again there is no spindle move penalty. readahead dispatches small
requests and such requests can be merged.

The previous patch helps detect sequential reads well, at least when the number
of concurrent reads isn't greater than the number of raid disks. In that case,
the distance based algorithm doesn't work well either.

V2: For raid mixing hard disks and SSDs, don't use the distance based algorithm
for random IO either. This makes the algorithm generic for raid with SSD.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>


md/raid1: make sequential read detection per disk based · be4d3280

Authored by Shaohua Li on Jul 31, 2012

Currently the sequential read detection is global. It's natural to make it
per disk, which can improve the detection for multiple concurrent
sequential reads. The next patch will make SSD read balance not use the distance
based algorithm, where this change helps detect truly sequential reads for SSD.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>


MD RAID10: Export md_raid10_congested · cc4d1efd

Authored by Jonathan Brassow on Jul 31, 2012

md/raid10: Export is_congested test.

In similar fashion to commits
	11d8a6e3
	1ed7242e
we export the RAID10 congestion checking function so that dm-raid.c can
make use of it and make use of the personality.  The 'queue' and 'gendisk'
structures will not be available to the MD code when device-mapper sets
up the device, so we conditionalize access to these fields also.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>


MD: Move macros from raid1*.h to raid1*.c · 473e87ce

Authored by Jonathan Brassow on Jul 31, 2012

MD RAID1/RAID10: Move some macros from .h file to .c file

There are three macros (IO_BLOCKED, IO_MADE_GOOD, BIO_SPECIAL) which are defined
in both raid1.h and raid10.h. They are only used in their respective .c files.
However, if we wish to make RAID10 accessible to the device-mapper RAID
target (dm-raid.c), then we need to move these macros into the .c files where
they are used so that they do not conflict with each other.

The macros from the two files are identical and could be moved into md.h, but
I chose to leave the duplication and have them remain in the personality
files.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>


MD RAID1: rename mirror_info structure · 0eaf822c

Authored by Jonathan Brassow on Jul 31, 2012

MD RAID1: Rename the structure 'mirror_info' to 'raid1_info'

The same structure name ('mirror_info') is used by raid10.  Each of these
structures is defined in its respective header file.  If dm-raid is
to support both RAID1 and RAID10, the header files will be included and
the structure names must not collide.  While only one of these structure
names needs to change, this patch adds consistency to the naming of the
structure.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>


MD RAID10: rename mirror_info structure · dc280d98

Authored by Jonathan Brassow on Jul 31, 2012

MD RAID10: Rename the structure 'mirror_info' to 'raid10_info'

The same structure name ('mirror_info') is used by raid1.  Each of these
structures is defined in its respective header file.  If dm-raid is
to support both RAID1 and RAID10, the header files will be included and
the structure names must not collide.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>


MD RAID10: Fix compiler warning. · 3bbae04b

Authored by Jonathan Brassow on Jul 31, 2012

MD RAID10: Fix compiler warning.

Initialize variable to prevent compiler warning.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>


27 Jul, 2012 14 commits

dm thin: commit before gathering status · 1f4e0ff0

Authored by Alasdair G Kergon on Jul 27, 2012

Commit outstanding metadata before returning the status for a dm thin
pool so that the numbers reported are as up-to-date as possible.

The commit is not performed if the device is suspended or if
the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
through a new 'status_flags' parameter in the target's dm_status_fn.

The userspace dmsetup tool will support the --noflush flag with the
'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
onwards.

Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
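Illustration: the two conditions under which the commit is skipped, written as a predicate. `STATUS_NOFLUSH` and `commit_before_status` are hypothetical; the bit value is made up and merely stands in for DM_NOFLUSH_FLAG arriving through status_flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit standing in for the no-flush status flag. */
#define STATUS_NOFLUSH (1u << 0)

/* Should outstanding metadata be committed before reporting status? */
bool commit_before_status(bool suspended, unsigned int status_flags)
{
    if (suspended)
        return false;                 /* never commit on a suspended device */
    if (status_flags & STATUS_NOFLUSH)
        return false;                 /* userspace asked for no flush */
    return true;
}
```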


dm thin: add read only and fail io modes · e49e5829

Authored by Joe Thornber on Jul 27, 2012

Add read-only and fail-io modes to thin provisioning.

If a transaction commit fails the pool's metadata device will transition
to "read-only" mode.  If a commit fails while already in read-only mode,
the transition to "fail-io" mode occurs.

Once in fail-io mode the pool and all associated thin devices will
report a status of "Fail".

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
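Illustration: the mode transitions described above as a small state machine. `enum pool_mode`, `after_commit_failure` and `reports_fail` are hypothetical names for this sketch, not the identifiers used in dm-thin.

```c
#include <assert.h>
#include <stdbool.h>

enum pool_mode { MODE_WRITE, MODE_READ_ONLY, MODE_FAIL };

/* Next mode after a failed metadata commit. */
enum pool_mode after_commit_failure(enum pool_mode m)
{
    switch (m) {
    case MODE_WRITE:
        return MODE_READ_ONLY;   /* first failure: degrade to read-only */
    case MODE_READ_ONLY:
    case MODE_FAIL:
        return MODE_FAIL;        /* failure while degraded: fail all io */
    }
    assert(0 && "unknown mode");
    return MODE_FAIL;
}

/* Only fail-io mode reports a status of "Fail". */
bool reports_fail(enum pool_mode m)
{
    return m == MODE_FAIL;
}
```

The transitions only ever move toward `MODE_FAIL`; nothing in the message above describes an automatic path back to write mode, so the sketch has none.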


dm thin metadata: introduce dm_pool_abort_metadata · da105ed5

Authored by Joe Thornber on Jul 27, 2012

Introduce dm_pool_abort_metadata to abort the current metadata
transaction.  Generally this will only be called when bad things are
happening and dm-thin is trying to roll back to a good state for
read-only mode.

It's complicated by the fact that the metadata device may have failed
completely, leaving the abort unable to read the old transaction.
In this case the metadata object is placed in a 'fail' mode and
everything fails apart from destroying it.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm thin metadata: introduce dm_pool_metadata_set_read_only · 12ba58af

Authored by Joe Thornber on Jul 27, 2012

Introduce dm_pool_metadata_set_read_only to put the underlying block
manager into read-only mode.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm persistent data: introduce dm_bm_set_read_only · 31097557

Authored by Joe Thornber on Jul 27, 2012

Introduce dm_bm_set_read_only to switch the block manager into a
read-only mode.  To be used when dm-thin degrades due to io errors on
the metadata device.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm thin: reduce number of metadata commits · 4afdd680

Authored by Joe Thornber on Jul 27, 2012

Reduce the number of metadata commits by using
dm_thin_changed_this_transaction to check, per thin device, whether
metadata was changed.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm thin metadata: add dm_thin_changed_this_transaction · 40db5a53

Authored by Joe Thornber on Jul 27, 2012

Introduce dm_thin_changed_this_transaction to dm-thin-metadata to publish a
useful bit of information we're already tracking.  This will help dm thin
decide when to commit.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm thin metadata: add format option to dm_pool_metadata_open · 66b1edc0

Authored by Joe Thornber on Jul 27, 2012

Add a parameter to dm_pool_metadata_open to indicate whether or not an
unformatted metadata area should be formatted.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm thin metadata: tidy up open and format error paths · 0fa5b17b

Authored by Joe Thornber on Jul 27, 2012

Tidy up the error paths in __open_metadata and __format_metadata in dm-thin-metadata.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm thin metadata: only check incompat features on open · d73ec525

Authored by Mike Snitzer on Jul 27, 2012

Factor out __check_incompat_features and only call it once when we open
the metadata device rather than at the beginning of every transaction.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm thin metadata: remove duplicate pmd initialisation · b7939951

Authored by Joe Thornber on Jul 27, 2012

Remove some duplicate initialisation of struct dm_pool_metadata.

These pmd fields are initialised by both:
  __format_metadata's calls to dm_btree_empty
  __write_initial_superblock + __begin_transaction

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm thin metadata: remove create parameter from __create_persistent_data_objects · 8801e069

Authored by Joe Thornber on Jul 27, 2012

Remove 'create' parameter from __create_persistent_data_objects() in dm-thin-metadata.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm thin metadata: move __superblock_all_zeroes to __open_or_format_metadata · 237074c0

Authored by Joe Thornber on Jul 27, 2012

Move the check for __superblock_all_zeroes from
__create_persistent_data_objects() down to __open_or_format_metadata in
dm-thin-metadata.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


dm thin metadata: remove nr_blocks arg from __create_persistent_data_objects · a97e5e6f

Authored by Joe Thornber on Jul 27, 2012

Remove nr_blocks arg from __create_persistent_data_objects in dm-thin-metadata.
It was always passed as zero.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
