Commits · eece075cda38f55fc5829b5f9ec5fb919c561d81 · openanolis / cloud-kernel

01 Sep, 2015 23 commits

md-cluster: only call complete(&cinfo->completion) when node join cluster · eece075c

Committed by Guoqing Jiang on Jul 10, 2015

Introduce the MD_CLUSTER_BEGIN_JOIN_CLUSTER flag to make sure
complete(&cinfo->completion) is only invoked when a node
joins the cluster. Otherwise a node failure could also call
complete(), which makes no sense.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: add missed lockres_free · 6e6d9f2c

Committed by Guoqing Jiang on Jul 10, 2015

We also need to free the lock resource before goto out.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: remove the unused sb_lock · b2b9bfff

Committed by Guoqing Jiang on Jul 10, 2015

The sb_lock is not used anywhere, so let's remove it.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: init suspend_list and suspend_lock early in join · 9e3072e3

Committed by Guoqing Jiang on Jul 10, 2015

If a node has just joined the cluster and receives a message from another
node before suspend_list is initialized, the kernel crashes with a NULL
pointer dereference. Move the initializations earlier to fix the bug.

md-cluster: Joined cluster 3578507b-e0cb-6d4f-6322-696cd7b1b10c slot 3
BUG: unable to handle kernel NULL pointer dereference at           (null)
... ... ...
Call Trace:
[<ffffffffa0444924>] process_recvd_msg+0x2e4/0x330 [md_cluster]
[<ffffffffa0444a06>] recv_daemon+0x96/0x170 [md_cluster]
[<ffffffffa045189d>] md_thread+0x11d/0x170 [md_mod]
[<ffffffff810768c4>] kthread+0xb4/0xc0
[<ffffffff8151927c>] ret_from_fork+0x7c/0xb0
... ... ...
RIP  [<ffffffffa0443581>] __remove_suspend_info+0x11/0xa0 [md_cluster]
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: add the error check if failed to get dlm lock · b5ef5678

Committed by Guoqing Jiang on Jul 10, 2015

In a complicated cluster environment, it is possible that the
dlm lock cannot be acquired or converted as intended, so the related
error information is added to make potential issues easier to debug.

For lockres_free, if the lock is blocked by a lock request or
conversion request, dlm_unlock just puts it back on the grant
queue, so we need to ensure the lock is finally freed.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: init completion within lockres_init · b83d51c0

Committed by Guoqing Jiang on Jul 10, 2015

We should init the completion within lockres_init, otherwise
the completion could be initialized more than once during
its life cycle.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: fix deadlock issue on message lock · 66099bb0

Committed by Guoqing Jiang on Jul 10, 2015

There is a problem with the previous communication mechanism: we hit the
deadlock scenario below on a cluster with 3 nodes.

	Sender                      Receiver                  Receiver

	token(EX)
	message(EX)
	writes message
	downconverts message(CR)
	requests ack(EX)
	                            gets message(CR)          gets message(CR)
	                            reads message             reads message
	                            requests EX on message    requests EX on message

To fix this problem, we make the following changes:

1. the sender downconverts MESSAGE to CW rather than CR.
2. the receiver requests a PR lock, not an EX lock, on message.

And in case we fail to down-convert EX to CW on message, it is better to
unlock message rather than keep holding the lock.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Lidong Zhong <ldzhong@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: transfer the resync ownership to another node · dc737d7c

Committed by Guoqing Jiang on Jul 10, 2015

When node A stops an array while the array is doing a resync, we need
to let another node B take over the resync task.

To achieve this, node A must send an explicit BITMAP_NEEDS_SYNC
message to the cluster. Node B, on receiving that message, will
invoke __recover_slot to do the resync.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: split recover_slot for future code reuse · 05cd0e51

Committed by Guoqing Jiang on Jul 10, 2015

Make recover_slot a wrapper around __recover_slot, since the
logic of __recover_slot can be reused for the case
when other nodes need to take over the resync job.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: use %pU to print UUIDs · b89f704a

Committed by Guoqing Jiang on Jul 10, 2015

Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md: setup safemode_timer before it's being used · 25b2edfa

Committed by Sasha Levin on Jul 24, 2015

We used to set up the safemode_timer timer in md_run. If md_run
would fail before the timer was set up we'd end up trying to modify
a timer that doesn't have a callback function when we access safe_delay_store,
which would trigger a BUG.

neilb: delete init_timer() call as setup_timer() does that.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md/raid5: handle possible race as reshape completes. · 6cbd8148

Committed by NeilBrown on Jul 24, 2015

It is possible (though unlikely) for a reshape to be
interrupted between the time that end_reshape is called
and the time when raid5_finish_reshape is called.

This can leave conf->reshape_progress set to MaxSector,
but mddev->reshape_position not.

This combination confused reshape_request() when ->reshape_backwards.
As conf->reshape_progress is so high, it seems the reshape hasn't
really begun.  But assuming MaxSector is a valid address only
leads to sorrow.

So ensure reshape_position and reshape_progress both agree,
and add an extra check in reshape_request() just in case they don't.
Signed-off-by: NeilBrown <neilb@suse.com>

md: sync sync_completed has correct value as recovery finishes. · 5ed1df2e

Committed by NeilBrown on Jul 24, 2015

There can be a small window between the moment that recovery
actually writes the last block and the time when various sysfs
and /proc/mdstat attributes report that it has finished.
During this time, 'sync_completed' can have the wrong value.
This can confuse monitoring software.

So:
 - don't set curr_resync_completed beyond the end of the devices,
 - set it correctly when resync/recovery has completed.
Signed-off-by: NeilBrown <neilb@suse.com>

md: be careful when testing resync_max against curr_resync_completed. · c5e19d90

Committed by NeilBrown on Jul 17, 2015

While it generally shouldn't happen, it is not impossible for
curr_resync_completed to exceed resync_max.
This can particularly happen when reshaping RAID5 - the current
status isn't copied to curr_resync_completed promptly, so when it
is, it can exceed resync_max.
This happens when the reshape is 'frozen', resync_max is set low,
and reshape is re-enabled.

Taking a difference between two unsigned numbers is always dangerous
anyway, so add a test to behave correctly if
   curr_resync_completed > resync_max
Signed-off-by: NeilBrown <neilb@suse.com>
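
The hazard described in this commit message can be reproduced in user space. The following is a minimal sketch, not the kernel code: sectors_left() and sectors_left_unguarded() are hypothetical helpers, and uint64_t stands in for the kernel's sector type.

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many sectors remain before resync_max.
 * Subtracting two unsigned values wraps around when the second one is
 * larger, so compare first and clamp the result to zero.
 */
static uint64_t sectors_left(uint64_t resync_max,
			     uint64_t curr_resync_completed)
{
	if (curr_resync_completed > resync_max)
		return 0;	/* already past the limit: nothing left */
	return resync_max - curr_resync_completed;
}

/* The unguarded form, shown only to illustrate the wrap-around. */
static uint64_t sectors_left_unguarded(uint64_t resync_max,
				       uint64_t curr_resync_completed)
{
	return resync_max - curr_resync_completed;
}
```

With resync_max = 100 and curr_resync_completed = 150, the unguarded form yields a value close to UINT64_MAX instead of a small or zero distance, which is why an explicit comparison is the safer pattern.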

md: set MD_RECOVERY_RECOVER when starting a degraded array. · a4a3d26d

Committed by NeilBrown on Jul 17, 2015

This ensures that 'sync_action' will show 'recover' immediately the
array is started.  If there is no spare the status will change to
'idle' once that is detected.

Clear MD_RECOVERY_RECOVER for a read-only array to ensure this change
happens.

This allows scripts which monitor status not to get confused -
particularly my test scripts.
Signed-off-by: NeilBrown <neilb@suse.com>

md/raid5: remove incorrect "min_t()" when calculating writepos. · c74c0d76

Committed by NeilBrown on Jul 15, 2015

This code is calculating:
  writepos, which is the furthest along address (device-space) that we
     *will* be writing to
  readpos, which is the earliest address that we *could* possibly read
     from, and
  safepos, which is the earliest address in the 'old' section that we
     might read from after a crash when the reshape position is
     recovered from metadata.

  The first is a precise calculation, so clipping at zero doesn't
  make sense.  As the reshape position is now guaranteed to always be
  a multiple of reshape_sectors and as we already BUG_ON when
  reshape_progress is zero, there is no point in this min_t() call.

  The readpos and safepos are worst case - actual value depends on
  precise geometry.  That worst case could be negative, which is only
  a problem because we are storing the value in an unsigned.
  So leave the min_t() for those.
Signed-off-by: NeilBrown <neilb@suse.com>

md/raid5: strengthen check on reshape_position at run. · 05256d98

Committed by NeilBrown on Jul 15, 2015

When reshaping, we work in units of the largest chunk size.
If changing from a larger to a smaller chunk size, that means we
reshape more than one stripe at a time.  So the required alignment
of reshape_position needs to take into account both the old
and new chunk size.

This means that both 'here_new' and 'here_old' are calculated with
respect to the same (maximum) chunk size, so testing if they are the
same when delta_disks is zero becomes pointless.
Signed-off-by: NeilBrown <neilb@suse.com>

md/raid5: switch to use conf->chunk_sectors in place of mddev->chunk_sectors where possible · 3cb5edf4

Committed by NeilBrown on Jul 15, 2015

The chunk_sectors and new_chunk_sectors fields of mddev can be changed
any time (via sysfs) that the reconfig mutex can be taken.  So raid5
keeps internal copies in 'conf' which are stable except for a short
locked moment when reshape stops/starts.

So any access that does not hold reconfig_mutex should use the 'conf'
values, not the 'mddev' values.
Several don't.

This could result in corruption if new values were written at awkward
times.

Also use min() or max() rather than open-coding.
Signed-off-by: NeilBrown <neilb@suse.com>

md/raid5: always set conf->prev_chunk_sectors and ->prev_algo · 5cac6bcb

Committed by NeilBrown on Jul 17, 2015

These aren't really needed when no reshape is happening,
but it is safer to have them always set to a meaningful value.
The next patch will use ->prev_chunk_sectors without checking
if a reshape is happening (because that makes the code simpler),
and this patch makes that safe.
Signed-off-by: NeilBrown <neilb@suse.com>

md/raid10: fix a few typos in comments · 02ec5026

Committed by NeilBrown on Jul 06, 2015

Signed-off-by: NeilBrown <neilb@suse.com>

md/raid5: consider updating reshape_position at start of reshape. · 92140480

Committed by NeilBrown on Jul 06, 2015

md/raid5 only updates ->reshape_position (which is stored in
metadata and is authoritative) occasionally, but particularly
when getting close to ->resync_max, as it must be correct
when ->resync_max is reached.

When mdadm tries to stop an array which is reshaping it will:
 - freeze the reshape,
 - set resync_max to where the reshape has reached.
 - unfreeze the reshape.
When this happens, the reshape is aborted and then restarted.

The restart doesn't check that resync_max is close, and so doesn't
update ->reshape_position like it should.
This results in the reshape stopping, but ->reshape_position being
incorrect.

So on that first call to reshape_request, make sure ->reshape_position
is updated if needed.
Signed-off-by: NeilBrown <neilb@suse.com>

md: close some races between setting and checking sync_action. · 985ca973

Committed by NeilBrown on Jul 06, 2015

When checking sync_action in a script, we want to be sure it is
as accurate as possible.
As resync/reshape etc. doesn't always start immediately (a separate
thread is scheduled to do it), it is best if 'action_show'
checks if MD_RECOVERY_NEEDED is set (which it does) and in that
case reports what is likely to start soon (which it only sometimes
does).

So:
 - report 'reshape' if reshape_position suggests one might start.
 - set MD_RECOVERY_RECOVER in raid1_reshape(), because that is very
   likely to happen next.
Signed-off-by: NeilBrown <neilb@suse.com>

md: Keep /proc/mdstat reporting recovery until fully DONE. · f7851be7

Committed by NeilBrown on Jul 02, 2015

Currently when a recovery completes, mdstat shows that it has finished
before the new device is marked as a full member.  Because of this it
can appear to a script that the recovery finished but the array isn't
in sync.

So while MD_RECOVERY_DONE is still set, keep mdstat reporting "recovery".
Once md_reap_sync_thread() completes, the spare will be active and then
MD_RECOVERY_DONE will be cleared.

To ensure this is race-free, set MD_RECOVERY_DONE before clearing
curr_resync.
Signed-off-by: NeilBrown <neilb@suse.com>

03 Aug, 2015 5 commits

md/raid0: update queue parameter in a safer location. · 199dc6ed

Committed by NeilBrown on Aug 03, 2015

When a (e.g.) RAID5 array is reshaped to RAID0, the updating
of queue parameters (e.g. max number of sectors per bio) is
done in the wrong place.
It should be part of ->run, but it is actually part of ->takeover.
This means it happens before level_store() calls:

	blk_set_stacking_limits(&mddev->queue->limits);

and so it is ineffective.  This can lead to errors from underlying
devices.

So move all the relevant settings out of create_stripe_zones()
and into raid0_run().

As this can lead to a bug-on it is suitable for any -stable
kernel which supports reshape to RAID0.  So 2.6.35 or later.
As the bug has been present for five years there is no urgency,
so no need to rush into -stable.

Fixes: 9af204cf ("md: Add support for Raid5->Raid0 and Raid10->Raid0 takeover")
Cc: stable@vger.kernel.org (v2.6.35+ - please delay until after -final release).
Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md: simplify get_bitmap_file now that "file" is zeroed. · 25eafe1a

Committed by Benjamin Randazzo on Jul 25, 2015

There is no point assigning '\0' to file->pathname[0] as
file is now zeroed out, so remove that branch and
simplify the code.

[Original patch combined this with the change to use
 kzalloc.  I split the two so that the change to kzalloc
 is easier to backport. - neilb]
Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>

md/raid5: don't let shrink_slab shrink too far. · 49895bcc

Committed by NeilBrown on Aug 03, 2015

I have a report of drop_one_stripe() called from
raid5_cache_scan() apparently finding ->max_nr_stripes == 0.

This should not be allowed.

So add a test to keep max_nr_stripes above min_nr_stripes.

Also use a 'mask' rather than a 'mod' in drop_one_stripe
to ensure 'hash' is valid even if max_nr_stripes does reach zero.


Fixes: edbe83ab ("md/raid5: allow the stripe_cache to grow and shrink.")
Cc: stable@vger.kernel.org (4.1 - please release with 2d5b569b)
Reported-by: Tomas Papan <tomas.papan@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.com>
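
The difference between a 'mod' and a 'mask' when the stripe count reaches zero can be seen in a small user-space sketch. This is an illustration only: NR_HASH_LOCKS, its value of 8, and both helper names are assumptions made for the example, not the kernel's definitions.

```c
/* Assumed for illustration: a power-of-two number of hash locks. */
#define NR_HASH_LOCKS	8
#define HASH_LOCKS_MASK	(NR_HASH_LOCKS - 1)

/* 'mod' version: (0 - 1) % 8 is -1 in C, which is not a valid index. */
static int hash_by_mod(int max_nr_stripes)
{
	return (max_nr_stripes - 1) % NR_HASH_LOCKS;
}

/* 'mask' version: (0 - 1) & 7 is 7, which is still inside the table. */
static int hash_by_mask(int max_nr_stripes)
{
	return (max_nr_stripes - 1) & HASH_LOCKS_MASK;
}
```

For any positive stripe count the two forms agree; they only diverge once the count has dropped to zero, which is the case the commit guards against.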

md: use kzalloc() when bitmap is disabled · b6878d9e

Committed by Benjamin Randazzo on Jul 25, 2015

In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
mdu_bitmap_file_t called "file".

        file = kmalloc(sizeof(*file), GFP_NOIO);
        if (!file)
                return -ENOMEM;

This structure is copied to user space at the end of the function.

        if (err == 0 &&
            copy_to_user(arg, file, sizeof(*file)))
                err = -EFAULT;

But if bitmap is disabled only the first byte of "file" is initialized
with zero, so it's possible to read some bytes (up to 4095) of kernel
space memory from user space. This is an information leak.

        /* bitmap disabled, zero the first byte and copy out */
        if (!mddev->bitmap_info.file)
                file->pathname[0] = '\0';
Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>
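
A user-space analogue of the fix named in the commit title, given here as a hedged sketch rather than the kernel change itself: calloc() plays the role of kzalloc(), and struct bitmap_file is a hypothetical stand-in for mdu_bitmap_file_t, sized at 4096 bytes to match the "up to 4095" bytes mentioned above.

```c
#include <stdlib.h>

/* Hypothetical stand-in for mdu_bitmap_file_t. */
struct bitmap_file {
	char pathname[4096];
};

/*
 * calloc() zeroes every byte, so copying the whole struct to another
 * domain cannot expose stale heap contents, even when only the first
 * byte would otherwise have been written.
 */
static struct bitmap_file *alloc_bitmap_file(void)
{
	return calloc(1, sizeof(struct bitmap_file));
}

/* Returns 1 if every byte of the struct is zero, 0 otherwise. */
static int is_fully_zeroed(const struct bitmap_file *file)
{
	size_t i;

	for (i = 0; i < sizeof(file->pathname); i++)
		if (file->pathname[i] != '\0')
			return 0;
	return 1;
}
```

The design point is that zeroing at allocation time covers every exit path, whereas zeroing a single byte only protects consumers that stop at the first NUL.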

md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies · 423f04d6

Committed by NeilBrown on Jul 27, 2015

raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degraded count.
raid1_spare_active updates the In_sync bit before the ->degraded count
and so exposes an inconsistency, as does error().
So extend the spinlock in raid1_spare_active() and error() to hide those
inconsistencies.

This should probably be part of
  Commit: 34cab6f4 ("md/raid1: fix test for 'was read error from
  last working device'.")
as it addresses the same issue.  It fixes the same bug and should go
to -stable for same reasons.

Fixes: 76073054 ("md/raid1: clean up read_balance.")
Cc: stable@vger.kernel.org (v3.0+)
Signed-off-by: NeilBrown <neilb@suse.com>

30 Jul, 2015 2 commits

dm cache: fix device destroy hang due to improper prealloc_used accounting · 795e633a

Committed by Mike Snitzer on Jul 29, 2015

Commit 665022d7 ("dm cache: avoid calls to prealloc_free_structs() if
possible") introduced a regression that caused the removal of a DM cache
device to hang in cache_postsuspend()'s call to wait_for_migrations()
with the following stack trace:

  [<ffffffff81651457>] schedule+0x37/0x80
  [<ffffffffa041e21b>] cache_postsuspend+0xbb/0x470 [dm_cache]
  [<ffffffff810ba970>] ? prepare_to_wait_event+0xf0/0xf0
  [<ffffffffa0006f77>] dm_table_postsuspend_targets+0x47/0x60 [dm_mod]
  [<ffffffffa0001eb5>] __dm_destroy+0x215/0x250 [dm_mod]
  [<ffffffffa0004113>] dm_destroy+0x13/0x20 [dm_mod]
  [<ffffffffa00098cd>] dev_remove+0x10d/0x170 [dm_mod]
  [<ffffffffa00097c0>] ? dev_suspend+0x240/0x240 [dm_mod]
  [<ffffffffa0009f85>] ctl_ioctl+0x255/0x4d0 [dm_mod]
  [<ffffffff8127ac00>] ? SYSC_semtimedop+0x280/0xe10
  [<ffffffffa000a213>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
  [<ffffffff811fd432>] do_vfs_ioctl+0x2d2/0x4b0
  [<ffffffff81117d5f>] ? __audit_syscall_entry+0xaf/0x100
  [<ffffffff81022636>] ? do_audit_syscall_entry+0x66/0x70
  [<ffffffff811fd689>] SyS_ioctl+0x79/0x90
  [<ffffffff81023e58>] ? syscall_trace_leave+0xb8/0x110
  [<ffffffff81654f6e>] entry_SYSCALL_64_fastpath+0x12/0x71

Fix this by accounting for the call to prealloc_data_structs()
immediately _before_ the call as opposed to after.  This is needed
because it is possible to break out of the control loop after the call
to prealloc_data_structs() but before prealloc_used was set to true.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

Revert "dm cache: do not wake_worker() in free_migration()" · 3508e659

Committed by Mike Snitzer on Jul 29, 2015

This reverts commit 386cb7cd.

Taking the wake_worker() out of free_migration() will slow writeback
dramatically, and hence adaptability.

Say we have 10k blocks that need writing back, but are only able to
issue 5 concurrently due to the migration bandwidth: it's imperative
that we wake_worker() immediately after migration completion; waiting
for the next 1 second wake up (via do_waker) means it'll take a long
time to write that all back.
Reported-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

27 Jul, 2015 3 commits

dm crypt: update wiki page URL · 6ed443c0

Committed by Baruch Siach on Jul 05, 2015

Cryptsetup moved to gitlab.  This is a leftover from commit e44f23b3
(dm crypt: update URLs to new cryptsetup project page, 2015-04-05).
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm cache policy smq: fix alloc_bitset check that always evaluates as false · 134bf30c

Committed by Colin Ian King on Jul 23, 2015

Static analysis by cppcheck has found a check on alloc_bitset that
always evaluates as false and hence never finds an allocation failure:

[drivers/md/dm-cache-policy-smq.c:1689]: (warning) Logical conjunction
  always evaluates to false: !EXPR && EXPR.

Fix this by removing the incorrect mq->cache_hit_bits check.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
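
The "!EXPR && EXPR" pattern reported by cppcheck can be shown in isolation. This is a generic sketch of that pattern, not the dm-cache source; both function names are hypothetical.

```c
#include <stddef.h>

/*
 * Buggy pattern: "!p && p" can never be true, so a failed allocation
 * (p == NULL) is never reported to the caller.
 */
static int alloc_failed_buggy(const void *p)
{
	return !p && p;
}

/* Fixed pattern: only the NULL test remains. */
static int alloc_failed_fixed(const void *p)
{
	return !p;
}
```

Because the buggy form returns 0 for every input, the error path behind it is dead code, which is exactly what the static analyzer flagged.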

dm thin: return -ENOSPC when erroring retry list due to out of data space · 0a927c2f

Committed by Mike Snitzer on Jul 21, 2015

Otherwise -EIO would be returned when -ENOSPC should be used
consistently.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

24 Jul, 2015 6 commits

md/raid5: clear R5_NeedReplace when no longer needed. · e6030cb0

Committed by NeilBrown on Jul 17, 2015

This flag is currently never cleared, which can in rare cases
trigger a warn-on if it is still set but the block isn't
InSync.

So clear it when it isn't needed, which includes if the replacement
device has failed.
Signed-off-by: NeilBrown <neilb@suse.com>

Fix read-balancing during node failure · 90382ed9

Committed by Goldwyn Rodrigues on Jun 24, 2015

During a node failure, we need to suspend read balancing so that the
reads are directed to the first device and stale data is not read.
Suspending writes is not required because these would be recorded and
synced eventually.

A new flag MD_CLUSTER_SUSPEND_READ_BALANCING is set in recover_prep().
area_resyncing() will respond true for the entire device if this
flag is set and the request type is READ. The flag is cleared
in recover_done().
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reported-By: David Teigland <teigland@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md-cluster: fix bitmap sub-offset in bitmap_read_sb · 33e38ac6

Committed by Goldwyn Rodrigues on Jul 01, 2015

bitmap_read_sb is modifying mddev->bitmap_info.offset. This works for
the first bitmap read. However, when multiple bitmaps need to be opened
by the same node, it ends up corrupting the offset. Fix it by using a
local variable.

Also, bitmap_read_sb is not required in bitmap_copy_from_slot since
it is called in bitmap_create. Remove bitmap_read_sb().
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md: Return error if request_module fails and returns positive value · b0c26a79

Committed by Goldwyn Rodrigues on Jul 22, 2015

request_module() can return 256 (process exited) in some cases,
which is not as specified in the documentation before the
request_module() definition. Convert the error to -ENOENT.

The positive error number results in bitmap_create() returning
a value that is meant to be an error but doesn't look like one,
so it is dereferenced as a pointer and causes a crash.

(not needed for stable as this is "experimental" code)
Fixes: edb39c9d ("Introduce md_cluster_operations to handle cluster functions")
Signed-off-By: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
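
Why a positive return value "doesn't look like" an error can be sketched in user space. The helpers below imitate the kernel's error-pointer convention, in which only the top 4095 address values are treated as errors; err_ptr(), is_err() and normalise_module_error() are names made up for this example, and the assumption that bitmap_create() reports errors this way is inferred from the message above.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode an error number as a pointer value (user-space imitation). */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* A pointer counts as an error only if it encodes -1 .. -MAX_ERRNO. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical helper: map any positive return value to -ENOENT. */
static long normalise_module_error(long ret)
{
	return ret > 0 ? -ENOENT : ret;
}
```

A value such as 256 encodes to a small, non-error address, so the caller's error check passes and the bogus pointer is later dereferenced; converting it to a negative errno first keeps it inside the range that the check recognises.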

md: Skip cluster setup in case of error while reading bitmap · f7357273

Committed by Goldwyn Rodrigues on Jul 22, 2015

If the bitmap read fails, the error code set is -EINVAL. However,
we don't check for errors and go ahead with cluster_setup.
Skip the cluster setup in case of error.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

md/raid1: fix test for 'was read error from last working device'. · 34cab6f4

Committed by NeilBrown on Jul 24, 2015

When we get a read error from the last working device, we don't
try to repair it, and don't fail the device.  We simply report a
read error to the caller.

However the current test for 'is this the last working device' is
wrong.
When there is only one fully working device, it assumes that a
non-faulty device is that device.  However a spare which is rebuilding
would be non-faulty but not the only working device.

So change the test from "!Faulty" to "In_sync".  If ->degraded says
there is only one fully working device and this device is in_sync,
this must be the one.

This bug has existed since we allowed read_balance to read from
a recovering spare in v3.0.
Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Fixes: 76073054 ("md/raid1: clean up read_balance.")
Cc: stable@vger.kernel.org (v3.0+)
Signed-off-by: NeilBrown <neilb@suse.com>
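
The change of test from "!Faulty" to "In_sync" can be modelled with plain booleans. This is a simplified sketch under stated assumptions: struct rdev_state and both predicates are hypothetical, and "only one fully working device" is modelled as degraded == raid_disks - 1.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of a member device's state flags. */
struct rdev_state {
	bool faulty;
	bool in_sync;
};

/* Old test: any non-faulty device is taken to be the last working one. */
static bool is_last_working_old(const struct rdev_state *rdev,
				int raid_disks, int degraded)
{
	return degraded == raid_disks - 1 && !rdev->faulty;
}

/* New test: the device must actually be in sync to qualify. */
static bool is_last_working_new(const struct rdev_state *rdev,
				int raid_disks, int degraded)
{
	return degraded == raid_disks - 1 && rdev->in_sync;
}
```

A rebuilding spare is neither faulty nor in sync, so the old predicate wrongly treats it as the last working device while the new one does not.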

23 Jul, 2015 1 commit

md: Skip cluster setup for dm-raid · d3b178ad

Committed by Goldwyn Rodrigues on Jul 22, 2015

There is a bug that the bitmap superblock isn't initialised properly for
dm-raid, so new fields can contain garbage.
(dm-raid does initialisation in the kernel - md initialises the
 superblock in mdadm).

This means that for dm-raid we cannot currently trust the new ->nodes
field. So:
 - use __GFP_ZERO to initialise the superblock properly for all new
    arrays
 - initialise all fields in bitmap_info in bitmap_new_disk_sb
 - ignore ->nodes for dm arrays (yes, this is a hack)

This bug exposes dm-raid to bugs in the (still experimental) md-cluster
code, so it is suitable for -stable.  It does cause crashes.

References: https://bugzilla.kernel.org/show_bug.cgi?id=100491
Cc: stable@vger.kernel.org (v4.1)
Signed-off-By: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>

openanolis / cloud-kernel: last synced successfully nearly 2 years ago
