Commits · 6bee58259fce0baad7b02c7a48146d50fa7f6c3c · openeuler / raspberrypi-kernel

06 Nov, 2009 2 commits

md/raid5: make sure curr_sync_completes is uptodate when reshape starts · 8dee7211

NeilBrown authored Nov 06, 2009

This value is visible through sysfs and is used by mdadm
when it manages a reshape (backing up data that is about to be
rearranged).  So it is important that it is always correct.
Currently it does not get updated properly when a reshape
starts which can cause problems when assembling an array
that is in the middle of being reshaped.

This is suitable for 2.6.31.y stable kernels.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

8dee7211

md: don't clear endpoint for resync when resync is interrupted. · 24395a85

NeilBrown authored Nov 06, 2009

If a 'sync_max' has been set (via sysfs), it is wrong to clear it
until a resync (or reshape or recovery ...) actually reached that
point.
So if a resync is interrupted (e.g. by device failure),
leave 'resync_max' unchanged.

This is particularly important for 'reshape' operations that do not
change the size of the array.  For such operations mdadm needs to
monitor the reshape, taking rolling backups of the section being
reshaped.  If resync_max gets cleared, the reshape can get ahead of
mdadm and then the backups that mdadm creates are useless.

This is suitable for 2.6.31.y stable kernels.
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
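
The rule described in this message can be sketched in a few lines of C. This is only an illustration under assumptions: `struct sync_state`, `finish_sync` and `MAX_SECTOR` are hypothetical stand-ins, not the kernel's actual structures or the code of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SECTOR (~0ULL)   /* hypothetical stand-in for "no limit" */

/* Hypothetical, trimmed-down sync state; not the kernel's mddev. */
struct sync_state {
    unsigned long long resync_max;   /* user-set limit, MAX_SECTOR = none */
};

/* Called when a sync pass ends; 'interrupted' means it stopped early. */
static void finish_sync(struct sync_state *st, bool interrupted)
{
    assert(st != NULL);
    if (!interrupted)
        st->resync_max = MAX_SECTOR;   /* limit was reached: safe to clear */
    /* Otherwise keep the limit so user space stays ahead of the reshape. */
}
```

With this shape, a pass that is cut short by a device failure leaves the limit exactly where the user put it.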

24395a85

20 Oct, 2009 1 commit

md/raid6: kill a gcc-4.0.1 'uninitialized variable' warning · 6629542e

Dan Williams authored Oct 19, 2009

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

6629542e

17 Oct, 2009 10 commits

dm snapshot: allow chunk size to be less than page size · c1cc65ca

Mikulas Patocka authored Oct 16, 2009

Allow the snapshot chunk size to be smaller than the page size.
The code is now capable of handling this due to some previous
fixes and enhancements.

As the page size varies between computers, prior to this patch,
the chunk size of a snapshot dictated which machines could read it:
Snapshots created on one machine might not be readable on another.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

c1cc65ca

dm snapshot: use unsigned integer chunk size · df96eee6

Mikulas Patocka authored Oct 16, 2009

Use unsigned integer chunk size.

The maximum chunk size is 512kB and there will never be a need for a 4GB
chunk size, so the number can be 32-bit. This fixes a compiler failure on
32-bit systems with large block devices.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

df96eee6

dm snapshot: lock snapshot while supplying status · 4c6fff44

Mikulas Patocka authored Oct 16, 2009

This patch locks the snapshot when returning status.  It fixes a race
when it could return an invalid number of free chunks if someone
was simultaneously modifying it.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

4c6fff44

dm exception store: fix failed set_chunk_size error path · 0e8c4e4e

Mikulas Patocka authored Oct 16, 2009

Properly close the device if failing because of an invalid chunk size.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

0e8c4e4e

dm snapshot: require non zero chunk size by end of ctr · 3f2412dc

Mikulas Patocka authored Oct 16, 2009

If we are creating a snapshot with a memory-stored exception store, fail if
the user didn't specify a chunk size. A zero chunk size would probably crash
a lot of places in the rest of the snapshot code.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

3f2412dc

dm: dec_pending needs locking to save error value · f88fb981

Kiyoshi Ueda authored Oct 16, 2009

Multiple instances of dec_pending() can run concurrently so a lock is
needed when it saves the first error code.

I have never experienced an actual problem without locking and just found
this during code inspection while implementing the barrier support
patch for request-based dm.

This patch adds the locking.
I've done compile, boot and basic I/O testing.

Cc: stable@kernel.org
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

f88fb981

dm: add missing del_gendisk to alloc_dev error path · 03022c54

Zdenek Kabelac authored Oct 16, 2009

Add the missing del_gendisk() to the error path when creation of the
workqueue fails. Otherwise there is a resource leak and the following
warning is shown:

WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xc5/0x160()
sysfs: cannot create duplicate filename '/devices/virtual/block/dm-0'

Cc: stable@kernel.org
Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

03022c54

dm log: userspace fix incorrect luid cast in userspace_ctr · bca915aa

Andrew Morton authored Oct 16, 2009

mips:

drivers/md/dm-log-userspace-base.c: In function `userspace_ctr':
drivers/md/dm-log-userspace-base.c:159: warning: cast from pointer to integer of different size

Cc: stable@kernel.org
Cc: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

bca915aa

dm snapshot: free exception store on init failure · 034a186d

Jonathan Brassow authored Oct 16, 2009

While initializing the snapshot module, if we fail to register
the snapshot target then we must back-out the exception store
module initialization.

Cc: stable@kernel.org
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

034a186d

dm snapshot: sort by chunk size to fix race · 6d45d93e

Mikulas Patocka authored Oct 16, 2009

Avoid a race causing corruption when snapshots of the same origin have
different chunk sizes by sorting the internal list of snapshots by chunk
size, largest first.
  https://bugzilla.redhat.com/show_bug.cgi?id=182659

For example, let's have two snapshots with different chunk sizes. The
first snapshot (1) has small chunk size and the second snapshot (2) has
large chunk size.  Let's have chunks A, B, C in these snapshots:
snapshot1: ====A====   ====B====
snapshot2: ==========C==========

(Chunk size is a power of 2. Chunks are aligned.)

A write to the origin at a position within A and C comes along. It
triggers reallocation of A, then reallocation of C and links them
together using A as the 'primary' exception.

Then another write to the origin comes along at a position within B and
C.  It creates pending exception for B.  C already has a reallocation in
progress and it already has a primary exception (A), so nothing is done
to it: B and C are not linked.

If the reallocation of B finishes before the reallocation of C, then
because there is no link with the pending exception for C, it does not
know to wait for it, and the second write is dispatched to the origin and
causes data corruption in the chunk C in snapshot2.

To avoid this situation, we maintain snapshots sorted in descending
order of chunk size.  This leads to a guaranteed ordering on the links
between the pending exceptions and avoids the problem explained above -
both A and B now get linked to C.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
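
The ordering rule described above, keeping the list of snapshots sorted by chunk size with the largest first, can be illustrated with a sorted insert. This is a minimal sketch under assumptions: `struct snap` and `insert_sorted` are hypothetical names, and the kernel keeps its snapshots on a different list type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a snapshot; not the kernel's struct. */
struct snap {
    unsigned chunk_size;   /* power of two */
    struct snap *next;
};

/* Insert 's' so the list stays sorted by chunk_size, largest first. */
static void insert_sorted(struct snap **head, struct snap *s)
{
    assert(head != NULL && s != NULL);
    struct snap **pos = head;

    /* Skip every entry whose chunk size is at least as large as ours. */
    while (*pos != NULL && (*pos)->chunk_size >= s->chunk_size)
        pos = &(*pos)->next;

    s->next = *pos;
    *pos = s;
}
```

Any walk over such a list visits the large-chunk snapshot (C in the example) before the small-chunk ones (A and B), which is the ordering the message relies on.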

6d45d93e

16 Oct, 2009 10 commits

md/async: don't pass a memory pointer as a page pointer. · 5dd33c9a

NeilBrown authored Oct 16, 2009

md/raid6 passes a list of 'struct page *' to the async_tx routines,
which then either DMA map them for offload, or take the page_address
for CPU based calculations.

For RAID6 we sometimes leave 'blanks' in the list of pages.
For CPU based calcs, we want to treat these as a page of zeros.
For offloaded calculations, we simply don't pass a page to the
hardware.

Currently the 'blanks' are encoded as a pointer to
raid6_empty_zero_page.  This is a 4096 byte memory region, not a
'struct page'.  This is mostly handled correctly but is rather ugly.

So change the code to pass and expect a NULL pointer for the blanks.
When taking page_address of a page, we need to check for a NULL and
in that case use raid6_empty_zero_page.
Signed-off-by: NeilBrown <neilb@suse.de>
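
The convention described in this message, NULL for a blank and a shared block of zeros substituted only when the data is actually read, might look like the following in plain C. This is a user-space sketch under assumptions: `zero_block`, `slot_data` and `xor_blocks` are hypothetical names, and XOR stands in for the real RAID6 calculations.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 4096

/* Stand-in for raid6_empty_zero_page: one shared block of zeros. */
static const unsigned char zero_block[BLOCK_SIZE] = { 0 };

/* Resolve one slot to readable data; NULL marks a blank. */
static const unsigned char *slot_data(const unsigned char *block)
{
    return block != NULL ? block : zero_block;
}

/* XOR all slots into 'out', treating blanks as a block of zeros. */
static void xor_blocks(unsigned char *out,
                       const unsigned char *const *slots, size_t n)
{
    assert(out != NULL && slots != NULL);
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        unsigned char acc = 0;
        for (size_t s = 0; s < n; s++)
            acc ^= slot_data(slots[s])[i];
        out[i] = acc;
    }
}
```

The list itself never holds a pointer of the wrong type; the zero block only appears at the single place where an address is taken.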

5dd33c9a

md: Fix handling of raid5 array which is being reshaped to fewer devices. · 5e5e3e78

NeilBrown authored Oct 16, 2009

When a raid5 (or raid6) array is being reshaped to have fewer devices,
conf->raid_disks is the latter and hence smaller number of devices.
However sometimes we want to use a number which is the total number of
currently required devices - the larger of the 'old' and 'new' sizes.
Before we implemented reducing the number of devices, this was always
'new' i.e. ->raid_disks.
Now we need max(raid_disks, previous_raid_disks) in those places.

This particularly affects assembling an array that was shutdown while
in the middle of a reshape to fewer devices.

md.c needs a similar fix when interpreting the md metadata.
Signed-off-by: NeilBrown <neilb@suse.de>

5e5e3e78

md: fix problems with RAID6 calculations for DDF. · e4424fee

NeilBrown authored Oct 16, 2009

Signed-off-by: NeilBrown <neilb@suse.de>

e4424fee

md/raid456: downlevel multicore operations to raid_run_ops · 417b8d4a

Dan Williams authored Oct 16, 2009

The percpu conversion allowed a straightforward handoff of stripe
processing to the async subsystem that initially showed some modest gains
(+4%).  However, this model is too simplistic and leads to stripes
bouncing between raid5d and the async thread pool for every invocation
of handle_stripe().  As reported by Holger this can fall into a
pathological situation severely impacting throughput (6x performance
loss).

By downleveling the parallelism to raid_run_ops the pathological
stripe_head bouncing is eliminated.  This version still exhibits an
average 11% throughput loss for:

	mdadm --create /dev/md0 /dev/sd[b-q] -n 16 -l 6
	echo 1024 > /sys/block/md0/md/stripe_cache_size
	dd if=/dev/zero of=/dev/md0 bs=1024k count=2048

...but the results are at least stable and can be used as a base for
further multicore experimentation.
Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

417b8d4a

md: drivers/md/unroll.pl replaced with awk analog · dce3a7a4

Vladimir Dronnikov authored Oct 16, 2009

Replace drivers/md/unroll.pl with an awk script to drop the build-time
dependency on perl.
Signed-off-by: Vladimir Dronnikov <dronnikov@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

dce3a7a4

md: remove clumsy usage of do_sync_mapping_range from bitmap code · ae8fa283

NeilBrown authored Oct 16, 2009

and replace with vfs_fsync which is much neater (but wasn't exported,
or even in existence at the time the code was written).

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>

ae8fa283

md: raid1/raid10: handle allocation errors during array setup. · ed9bfdf1

NeilBrown authored Oct 16, 2009

Both raid1 and raid10 create a mempool during startup.
If the 'alloc' function for this mempool fails, unplug_slaves
is called.
If that happens when the pool is being initialised, unplug_slaves
will try to use the 'conf' structure that isn't filled in yet, and
badness will happen.

So ensure that unplug_slaves doesn't get called unless we know
that the conf structure is fully initialised.
Signed-off-by: NeilBrown <neilb@suse.de>

ed9bfdf1

md/raid5: initialize conf->device_lock earlier · f5efd45a

Dan Williams authored Oct 16, 2009

Deallocating a raid5_conf_t structure requires taking 'device_lock'.
Ensure it is initialized before it is used, i.e. initialize the lock
before attempting any further initializations that might fail.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

f5efd45a

md/raid1/raid10: add a cond_resched · 1d9d5241

NeilBrown authored Oct 16, 2009

During 'check' of a raid1 or raid10 it is possible for the management
thread to spend a lot of time running 'memcmp' on blocks from
different devices, so make sure the thread has a chance to schedule.
raid5d already has a cond_resched (in process_stripe).
Reported-By: Lee Howard <faxguy@howardsilvan.com>
Signed-off-by: NeilBrown <neilb@suse.de>

1d9d5241

Revert "md: do not progress the resync process if the stripe was blocked" · 1442577b

NeilBrown authored Oct 16, 2009

This reverts commit df10cfbc.

This patch was based on a misunderstanding and risks introducing a busy-wait loop.
So revert it.
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

1442577b

07 Oct, 2009 1 commit

block: Seperate read and write statistics of in_flight requests v2 · 316d315b

Nikanth Karthikesan authored Oct 06, 2009

Commit a9327cac added separate read
and write statistics of in_flight requests, and exported the number
of read and write requests in progress separately through sysfs.

But Corrado Zoccolo <czoccolo@gmail.com> reported getting strange
output from "iostat -kx 2". Global values for service time and
utilization were garbage. For interval values, utilization was always
100%, and service time was higher than normal.

So this was reverted by commit 0f78ab98.

The problem was in part_round_stats_single(); I missed the following:
        if (now == part->stamp)
                return;

-       if (part->in_flight) {
+       if (part_in_flight(part)) {
                __part_stat_add(cpu, part, time_in_queue,
                                part_in_flight(part) * (now - part->stamp));
                __part_stat_add(cpu, part, io_ticks, (now - part->stamp));

With this chunk included, the reported regression gets fixed.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>

--
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
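
Why the one-line change in the hunk above matters can be seen in a small stand-alone model. This sketch assumes that the earlier commit turned `in_flight` into a two-element array (reads and writes); the struct layout, field names and `round_stats` are hypothetical simplifications, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the per-partition statistics. */
struct part {
    int in_flight[2];              /* [0] = reads, [1] = writes (assumed) */
    unsigned long stamp;           /* time of the last accounting update */
    unsigned long io_ticks;        /* accumulated busy time */
    unsigned long time_in_queue;   /* busy time weighted by queue depth */
};

static int part_in_flight(const struct part *p)
{
    return p->in_flight[0] + p->in_flight[1];
}

/*
 * Account the time since the last update, but only while requests are
 * outstanding.  Testing the array name 'p->in_flight' instead would always
 * be true, because an array decays to a non-NULL pointer, so the device
 * would look busy all the time.
 */
static void round_stats(struct part *p, unsigned long now)
{
    assert(p != NULL);
    if (now == p->stamp)
        return;
    if (part_in_flight(p)) {
        p->time_in_queue += part_in_flight(p) * (now - p->stamp);
        p->io_ticks += now - p->stamp;
    }
    p->stamp = now;
}
```

Under that assumption, the always-true test would match the reported symptom of utilization stuck at 100%.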

316d315b

05 Oct, 2009 1 commit

Revert "Seperate read and write statistics of in_flight requests" · 0f78ab98

Jens Axboe authored Oct 04, 2009

This reverts commit a9327cac.

Corrado Zoccolo <czoccolo@gmail.com> reports:

"with 2.6.32-rc1 I started getting the following strange output from
"iostat -kx 2":
Linux 2.6.31bisect (et2) 	04/10/2009 	_i686_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10,70    0,00    3,16   15,75    0,00   70,38

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda              18,22     0,00    0,67    0,01    14,77     0,02
43,94     0,01   10,53 39043915,03 2629219,87
sdb              60,89     9,68   50,79    3,04  1724,43    50,52
65,95     0,70   13,06 488437,47 2629219,87

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2,72    0,00    0,74    0,00    0,00   96,53

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6,68    0,00    0,99    0,00    0,00   92,33

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,40    0,00    0,73    1,47    0,00   93,40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     4,00    0,00    3,00     0,00    28,00
18,67     0,06   19,50 333,33 100,00

Global values for service time and utilization are garbage. For
interval values, utilization is always 100%, and service time is
higher than normal.

I bisected it down to:
[a9327cac] Seperate read and write
statistics of in_flight requests
and verified that reverting just that commit indeed solves the issue
on 2.6.32-rc1."

So until this is debugged, revert the bad commit.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

0f78ab98

03 Oct, 2009 3 commits

dm/connector: Only process connector packages from privileged processes · 24836479

Philipp Reisner authored Oct 02, 2009

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24836479

connector/dm: Fixed a compilation warning · 18366b05

Philipp Reisner authored Oct 02, 2009

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

18366b05

connector: Provide the sender's credentials to the callback · 7069331d

Philipp Reisner authored Oct 02, 2009

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

7069331d

23 Sep, 2009 6 commits

md: raid-1/10: fix RW bits manipulation · 1ef04fef

Dmitry Monakhov authored Sep 20, 2009

Jens recently changed the bio_rw_flagged() logic in
commit 1f98a13f. It now returns
bool instead of int, which broke the raid1/raid10 RW bits manipulation logic.
One visible result is a BUG_ON triggering due to an empty barrier
at scsi_lib.c:1108, scsi_setup_fs_cmnd().
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NeilBrown <neilb@suse.de>
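
The kind of breakage described here can be reproduced outside the kernel. The sketch below is an assumption-laden illustration: `rw_flagged`, `RW_SYNCIO_BIT` and both `copy_flag_*` helpers are hypothetical, and it only shows how a flag test that returns bool can no longer be reused as a bit mask.

```c
#include <assert.h>
#include <stdbool.h>

enum { RW_SYNCIO_BIT = 4 };   /* hypothetical bit position */

/* Test one flag bit; like the changed helper, this returns bool. */
static bool rw_flagged(unsigned long rw, int bit)
{
    assert(bit >= 0 && bit < 32);
    return (rw & (1UL << bit)) != 0;
}

/* Broken: treats the bool as if it were still the masked flag bits. */
static unsigned long copy_flag_broken(unsigned long rw, int bit)
{
    return rw_flagged(rw, bit);   /* 0 or 1, so it lands in bit 0 */
}

/* Fixed: shift the bool back to the position of the flag. */
static unsigned long copy_flag_fixed(unsigned long rw, int bit)
{
    return (unsigned long)rw_flagged(rw, bit) << bit;
}
```

When the helper returned the masked integer, OR-ing its result into a new request's flags preserved the bit; with a bool, the same expression sets an unrelated bit instead.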

1ef04fef

md: remove unnecessary memset from multipath. · f28f4e27

NeilBrown authored Sep 23, 2009

Recent commit bbba809e
replaced mempool_create_kzalloc_pool with mempool_create_kmalloc_pool
plus a memset.
This memset is not needed (and we didn't need kzalloc in the first
place).
Every field of the allocated structure (struct multipath_bh) is
initialised immediately except retry_list, and memset does not
initialise a list_head anyway.

So remove the memset.
Signed-off-by: NeilBrown <neilb@suse.de>

f28f4e27

md: report device as congested when suspended · 3fa841d7

NeilBrown authored Sep 23, 2009

This should stop writeback from coming when the device is temporarily
suspended.
Signed-off-by: NeilBrown <neilb@suse.de>

3fa841d7

md: Improve name of threads created by md_register_thread · 0da3c619

NeilBrown authored Sep 23, 2009

The management threads for raid4,5,6 arrays are all called
mdX_raid5, independent of the actual raid level, which is wrong and
can be confusing.

So change md_register_thread to use the name from the personality
unless an alternate name (like 'resync' or 'reshape') is given.

This is simpler and more correct.

Cc: Jinzc <zhenchengjin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

0da3c619

md: remove sparse warnings about lock context. · ee305ace

NeilBrown authored Sep 23, 2009

There was a real error here on a failure path where we
incorrectly call rcu_read_unlock.
Signed-off-by: NeilBrown <neilb@suse.de>

ee305ace

md: remove sparse waring "symbol xxx shadows an earlier one" · a9f326eb

NeilBrown authored Sep 23, 2009

Rename some variables and remove some duplicate definitions
to avoid these warnings.  None of them are actual errors.
Signed-off-by: NeilBrown <neilb@suse.de>

a9f326eb

22 Sep, 2009 2 commits

md: avoid use of broken kzalloc mempool · bbba809e

Sage Weil authored Sep 21, 2009

The kzalloc mempool does not re-zero items that have been used and then
returned to the pool.  Manually zero the allocated multipath_bh instead.
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
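
The pitfall described in this message, a pool that zeroes memory only when it first obtains it and not when an item is recycled, can be modelled with a toy one-slot pool. This is a sketch under assumptions: `struct item`, `struct pool`, `pool_alloc` and `pool_free` are hypothetical and much simpler than a kernel mempool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical item type; stands in for struct multipath_bh. */
struct item {
    int retries;
    void *owner;
};

/* Toy pool that keeps at most one returned item in reserve. */
struct pool {
    struct item *reserve;
};

/* Hand out the reserved item if there is one, else fall back to malloc. */
static struct item *pool_alloc(struct pool *p)
{
    assert(p != NULL);
    struct item *it = p->reserve;

    if (it != NULL)
        p->reserve = NULL;
    else
        it = malloc(sizeof *it);

    if (it != NULL)
        memset(it, 0, sizeof *it);   /* zero on every allocation */
    return it;
}

/* Return an item; its old contents stay in memory until the next alloc. */
static void pool_free(struct pool *p, struct item *it)
{
    assert(p != NULL);
    if (p->reserve == NULL)
        p->reserve = it;
    else
        free(it);
}
```

Without the memset in `pool_alloc`, a recycled item would come back with whatever its previous user left in it, which is the behaviour the message warns about.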

bbba809e

const: make block_device_operations const · 83d5cde4

Alexey Dobriyan authored Sep 21, 2009

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

83d5cde4

21 Sep, 2009 1 commit

trivial: fix typo "for for" in multiple files · 411c9403

Anand Gadiyar authored Jul 07, 2009

trivial: fix typo "for for" in multiple files
Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

411c9403

20 Sep, 2009 1 commit

Driver-Core: extend devnode callbacks to provide permissions · e454cea2

Kay Sievers authored Sep 18, 2009

This allows subsystems to provide devtmpfs with non-default permissions
for the device node. Instead of the default mode of 0600, null, zero,
random, urandom, full, tty, ptmx now have a mode of 0666, which allows
non-privileged processes to access standard device nodes in case no
other userspace process applies the expected permissions.

This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complaint.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e454cea2

17 Sep, 2009 2 commits

md/raid6: cleanup ops_run_compute6_2 · 6c910a78

Dan Williams authored Sep 16, 2009

Neil says:
	"It is correct as it stands, but the fact that every branch in
	 the 'if' part ends with a 'return' isn't immediately obvious,
	 so it is clearer if we are explicit about the if / then / else
	 structure."
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

6c910a78

md/raid6: eliminate BUG_ON with side effect · 2d6e4ecc

Dan Williams authored Sep 16, 2009

As pointed out by Neil it should be possible to build a driver with all
BUG_ON statements deleted.  It's bad form to have a BUG_ON with a side
effect.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
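
The same concern applies to any check macro that can be compiled out. The sketch below uses the standard C `assert` as a stand-in for BUG_ON; `take_id` and `claim_slot` are hypothetical names chosen for the illustration.

```c
#include <assert.h>

static int next_id = 0;

/* Has a side effect: every call consumes one id. */
static int take_id(void)
{
    return next_id++;
}

/*
 * Bad form would be:  assert(take_id() >= 0);
 * If assertions are compiled out (NDEBUG), the call disappears with them
 * and no id is consumed.
 *
 * Good form: do the work unconditionally, then check the saved result.
 */
static int claim_slot(void)
{
    int id = take_id();
    assert(id >= 0);
    return id;
}
```

With the work hoisted out of the check, the program behaves the same whether or not the checks are compiled in.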

2d6e4ecc