- 22 Sep 2016 (3 commits)
-
Committed by Guoqing Jiang
We need to use rcu_read_lock/unlock to avoid a potential race.
Reported-by: Shaohua Li <shli@fb.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
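As a refresher on the pattern involved, here is a minimal, hedged sketch of walking the rdev list inside an RCU read-side critical section; the helper name and the md_error() use are illustrative, not the exact hunk from this commit.

```c
/* Hedged sketch (not the commit's exact change): look up a slot while
 * holding the RCU read lock so a concurrent removal cannot free the
 * rdev under us. */
static void mark_slot_faulty(struct mddev *mddev, int slot)
{
	struct md_rdev *rdev;

	rcu_read_lock();
	rdev_for_each_rcu(rdev, mddev)
		if (rdev->desc_nr == slot)
			md_error(mddev, rdev);	/* still inside the RCU section */
	rcu_read_unlock();
}
```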
-
Committed by Guoqing Jiang
Since DLM_LKF_FORCEUNLOCK is used in lockres_free, we don't need to call dlm_unlock_sync before freeing the lock resource.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Guoqing Jiang
For dlm_unlock, we need to pass the flag to dlm_unlock as the third parameter instead of setting res->flags. Also, DLM_LKF_FORCEUNLOCK is more suitable for dlm_unlock since it works even when the lock is on the waiting or convert queue.
Acked-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
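For reference, dlm_unlock() takes its flags as the third argument. Below is a hedged sketch of a forced synchronous unlock in the spirit of lockres_free(); the wrapper struct and its completion field are illustrative, not the driver's exact layout.

```c
#include <linux/dlm.h>
#include <linux/completion.h>

struct my_lock_resource {		/* illustrative wrapper */
	dlm_lockspace_t *ls;
	struct dlm_lksb lksb;
	struct completion done;
};

/* Sketch: force-unlock a DLM lock and wait for the unlock AST.
 * DLM_LKF_FORCEUNLOCK is passed straight to dlm_unlock() (third
 * parameter), so the unlock also works while the lock still sits on
 * the waiting or convert queue. */
static int force_unlock_sync(struct my_lock_resource *res)
{
	int ret;

	ret = dlm_unlock(res->ls, res->lksb.sb_lkid, DLM_LKF_FORCEUNLOCK,
			 &res->lksb, res);
	if (ret)
		return ret;

	wait_for_completion(&res->done);	/* unlock AST completes this */
	return 0;
}
```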
-
- 25 Aug 2016 (1 commit)
-
Committed by Wei Yongjun
Fix to return error code -ENOMEM from the lockres_init() error handling case instead of 0, as is done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Shaohua Li <shli@fb.com>
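The bug class here is the usual one where a failure branch falls through with a zero return; a minimal hedged sketch of the corrected shape (the context struct, label, and lockres_init() signature are assumptions for illustration):

```c
/* Sketch of the corrected error path only, not the driver's code. */
static int setup_example(struct example_ctx *ctx)
{
	int ret = 0;

	ctx->res = lockres_init(ctx->mddev, "resync", NULL, 0);
	if (!ctx->res) {
		ret = -ENOMEM;	/* the fix: previously this fell through with ret == 0 */
		goto out;
	}
	/* ... further setup ... */
out:
	return ret;
}
```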
-
- 10 May 2016 (2 commits)
-
Committed by Guoqing Jiang
We don't need to run the full path of recv_daemon if process_recvd_msg doesn't return 0.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Guoqing Jiang
The in-memory bitmap is not ready when a node joins the cluster, so it doesn't make sense to call gather_all_resync_info() that early; we need to call it after the node's bitmap is set up. Also, recv_thread can be woken up after the node joins the cluster, but that could cause problems if the node receives a RESYNCING message before the personality is ready, since mddev->pers->quiesce is called in process_suspend_info. This commit introduces a new cluster interface, load_bitmaps, to fix the above problems. load_bitmaps is called in bitmap_load, where both the bitmap and the personality are ready, and it does the following (see the sketch below):
1. Call gather_all_resync_info to load all the nodes' bitmap info.
2. Set the MD_CLUSTER_ALREADY_IN_CLUSTER bit so recv_thread can be woken up, and wake up recv_thread if there is a pending recv event.
Then ack_bast only wakes up recv_thread once the ALREADY_IN_CLUSTER bit is set; otherwise MD_CLUSTER_PENDING_RESYNC_EVENT is set.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
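A hedged sketch of what such a load_bitmaps callback could look like; the flag and field names follow the commit text and may not match the final driver code exactly.

```c
/* Sketch only: runs once the bitmap and personality are ready. */
static int load_bitmaps(struct mddev *mddev, int total_slots)
{
	struct md_cluster_info *cinfo = mddev->cluster_info;
	int ret;

	/* 1. safe now to read every node's bitmap and queue recovery */
	ret = gather_all_resync_info(mddev, total_slots);
	if (ret)
		return ret;

	/* 2. from here on, ack_bast may wake recv_thread directly ... */
	set_bit(MD_CLUSTER_ALREADY_IN_CLUSTER, &cinfo->state);
	/* ... and drain any event that arrived before we were ready */
	if (test_and_clear_bit(MD_CLUSTER_PENDING_RESYNC_EVENT, &cinfo->state))
		md_wakeup_thread(cinfo->recv_thread);
	return 0;
}
```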
-
- 05 May 2016 (5 commits)
-
Committed by Guoqing Jiang
If the node received a RESYNCING message, another node will perform the resync for that area, so we don't want to do it again on this node. Let's set RESYNC_MASK and clear NEEDED_MASK for the region from old-low to new-low, which has finished syncing, while the region from old-hi to new-hi is about to sync; bitmap_sync_with_cluster is introduced for this purpose.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Guoqing Jiang
If a node joins the cluster while a message broadcast is under way, a locking issue can happen as follows. In a cluster of two nodes, suppose node A is calling __sendmsg before up-converting CR to EX on ack, and node B has released CR on ack. If a new node C joins the cluster, it does not receive the message A sent earlier, so it could hold CR on ack before A up-converts CR to EX on ack. So a node joining the cluster should first get an EX lock on the "token" to ensure no broadcast is ongoing, then release it after it holds CR on ack.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
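A hedged sketch of that join-time ordering, assuming dlm_lock_sync()/dlm_unlock_sync() style helpers like those mentioned elsewhere in this log; error handling is trimmed and the field names are illustrative.

```c
/* Sketch: EX on "token" flushes any in-flight broadcast, CR on "ack"
 * guarantees we will see all future broadcasts, then drop the token. */
static int join_handshake(struct md_cluster_info *cinfo)
{
	int ret;

	ret = dlm_lock_sync(cinfo->token_lockres, DLM_LOCK_EX);
	if (ret)
		return ret;

	ret = dlm_lock_sync(cinfo->ack_lockres, DLM_LOCK_CR);

	dlm_unlock_sync(cinfo->token_lockres);	/* broadcasts may resume */
	return ret;
}
```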
-
Committed by Guoqing Jiang
The two threads need to be unregistered if a node can't join the cluster successfully.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Guoqing Jiang
In the recovery case, we need to set MD_RECOVERY_NEEDED and wake up the thread only if the recovery is not finished.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
Committed by Guoqing Jiang
If multiple nodes choose to attempt a resync at the same time, they need to be serialized so they don't duplicate effort. This serialization is done by locking the 'resync' DLM lock. Currently, if a node cannot get the lock immediately it doesn't request notification when the lock becomes available (i.e. DLM_LKF_NOQUEUE is set), so it may not reliably find out when it is safe to try again. Rather than trying to arrange an async wake-up when the lock becomes available, switch to using synchronous locking; this is a lot easier to reason about. As it is not permitted to block in the 'raid1d' thread, move the locking to the resync thread: the resync thread is started immediately but blocks until the resync lock is available. Once the lock is obtained, it checks again whether any resync action is needed. A particular symptom of the current problem is that a node can get stuck with "resync=pending" indefinitely.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
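A minimal sketch of the synchronous version, with illustrative helper names; the point is simply that the blocking call now lives in the resync thread, where sleeping is allowed.

```c
/* Sketch only: called from the resync thread, never from raid1d. */
static int resync_start_blocking(struct mddev *mddev)
{
	struct md_cluster_info *cinfo = mddev->cluster_info;

	/* blocks until the node currently resyncing drops the lock;
	 * no DLM_LKF_NOQUEUE, so no lost wake-ups and no resync=pending */
	return dlm_lock_sync(cinfo->resync_lockres, DLM_LOCK_EX);
}
```

Once this returns, the resync thread re-checks whether any resync work is actually still needed before proceeding.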
-
- 25 Jan 2016 (1 commit)
-
Committed by Shaohua Li
There are several places where we allocate a dlm_lock_resource but never free it. leave() needs to free a lock resource too (from Guoqing).
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Guoqing Jiang <gqjiang@suse.com>
Cc: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
- 06 Jan 2016 (7 commits)
-
Committed by Guoqing Jiang
1. Fix unbalanced parentheses.
2. Add more description of how MD_CLUSTER_SEND_LOCKED_ALREADY is cleared after being set in add_new_disk.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Guoqing Jiang
Communication can happen through multiple threads. It is possible for one thread to step over another thread's sequence, so we use mutexes to protect both the send and receive sequences. Send communication is locked through a state bit, MD_CLUSTER_SEND_LOCK. Communication is locked with bit manipulation in order to allow "lock and hold" for the add operation: in the case of an add operation, if the lock is held, MD_CLUSTER_SEND_LOCKED_ALREADY is set. When md_update_sb() calls metadata_update_start(), it checks (in a single statement, to avoid races) whether the communication is already locked. If yes, it merely returns zero; otherwise it locks the token lock resource.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
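A hedged sketch of that check-and-hold logic; it assumes the token lock resource records its current DLM mode, and lock_token() is the illustrative "take EX on the token" helper.

```c
/* Sketch only: either we win MD_CLUSTER_SEND_LOCK ourselves, or
 * add_new_disk() is still holding it for us (LOCKED_ALREADY). */
static int metadata_update_start_sketch(struct mddev *mddev)
{
	struct md_cluster_info *cinfo = mddev->cluster_info;

	/* single statement, so the two bit tests cannot race with each other */
	wait_event(cinfo->wait,
		   !test_and_set_bit(MD_CLUSTER_SEND_LOCK, &cinfo->state) ||
		   test_and_clear_bit(MD_CLUSTER_SEND_LOCKED_ALREADY,
				      &cinfo->state));

	if (cinfo->token_lockres->mode == DLM_LOCK_EX)
		return 0;		/* token already held for the add */

	return lock_token(cinfo);	/* otherwise take EX on the token */
}
```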
-
Committed by Guoqing Jiang
Reloading of the superblock must be performed under reconfig_mutex. However, this cannot be done with md_reload_sb because it would deadlock with the message DLM lock. So we defer it to md_check_recovery(), which is executed by mddev->thread. This introduces a new flag, MD_RELOAD_SB, which, if set, will reload the superblock. good_device_nr is also added to 'struct mddev' and is used to get the number of the good device within the cluster raid.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
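A hedged sketch of the deferral; the flag and field names follow the commit text, and the two fragments below stand in for the receive path and for the corresponding check in md_check_recovery().

```c
/* Receive side: cannot take reconfig_mutex here (it would deadlock
 * with the message DLM lock), so only record the request. */
static void note_metadata_update(struct mddev *mddev, int good_dev)
{
	mddev->good_device_nr = good_dev;	/* rdev to re-read the sb from */
	set_bit(MD_RELOAD_SB, &mddev->flags);
	md_wakeup_thread(mddev->thread);
}

/* Later, inside md_check_recovery(), which runs in mddev->thread and
 * already holds reconfig_mutex: */
static void reload_if_requested(struct mddev *mddev)
{
	if (test_and_clear_bit(MD_RELOAD_SB, &mddev->flags))
		md_reload_sb(mddev, mddev->good_device_nr);
}
```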
-
Committed by Guoqing Jiang
For clustered raid, we need to take extra actions when changing the bitmap to none (see the sketch below):
1. Check whether all the bitmap locks can be acquired. If yes, we can continue with the change, since the cluster raid is only active on the current node; otherwise fail and unlock the bitmap locks already taken.
2. Set nodes to 0 and then leave the cluster environment.
3. Release the other nodes' bitmap locks.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
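A hedged sketch of those three steps; lock_all_bitmaps()/unlock_all_bitmaps() and the stop call are illustrative names for "take every per-node bitmap lock", "drop them again" and "leave the cluster".

```c
/* Sketch only: holding every node's bitmap lock proves the array is
 * active on this node alone, so it is safe to drop clustering. */
static int switch_bitmap_to_none(struct mddev *mddev)
{
	if (lock_all_bitmaps(mddev) <= 0) {	/* step 1: try all locks */
		unlock_all_bitmaps(mddev);	/* back out partial locks */
		return -EBUSY;
	}

	mddev->bitmap_info.nodes = 0;		/* step 2: single node again */
	md_cluster_stop(mddev);			/* leave the cluster */

	unlock_all_bitmaps(mddev);		/* step 3: release the rest */
	return 0;
}
```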
-
Committed by Goldwyn Rodrigues
The remove disk message does not need metadata_update_start(); it can be an independent message.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Guoqing Jiang
For a cluster raid, if one disk can't be reached from one node, the other nodes will receive the REMOVE message for that disk. On the receiving node we can't call md_kick_rdev_from_array to remove the disk from the array synchronously, since the disk might still be busy on this node. So set a ClusterRemove flag on the disk and let the thread do the removal job eventually.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
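A minimal sketch of the receiving side; the deferral itself is just a flag plus a wake-up (the helper name is illustrative).

```c
/* Sketch: don't kick the rdev synchronously; let mddev->thread retry
 * the removal once the disk is no longer busy on this node. */
static void defer_cluster_remove(struct mddev *mddev, struct md_rdev *rdev)
{
	set_bit(ClusterRemove, &rdev->flags);
	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
	md_wakeup_thread(mddev->thread);
}
```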
-
Committed by Goldwyn Rodrigues
If a RESYNCING message with (0,0) has been sent before, do not send it again. This avoids a resync ping-pong between the nodes. We read the bitmap lock resource's LVB to figure out the previous value of the RESYNCING message.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
- 24 Oct 2015 (4 commits)
-
Committed by NeilBrown
The arg isn't used, so its presence is only confusing.
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by NeilBrown
It is common practice in the kernel to leave out this case. It isn't needed and adds little if any value.
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by NeilBrown
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Guoqing Jiang
This patch fixes sparse warnings such as: incorrect type in assignment (different base types); cast to restricted __le64.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
- 16 Oct 2015 (1 commit)
-
Committed by NeilBrown
As cmsg.raid_slot is le32, comparing it for > 0 is not meaningful. So introduce a CPU-endian 'raid_slot' and only assign to cmsg.raid_slot when we know the value is valid.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
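A hedged sketch of the idea: keep the working value in CPU byte order and convert once, when filling the wire-format message (the message struct here is illustrative, not the driver's actual cluster_msg).

```c
#include <linux/types.h>

struct cluster_msg_sketch {
	__le32 raid_slot;		/* little-endian on the wire */
};

static void fill_raid_slot(struct cluster_msg_sketch *cmsg,
			   struct md_rdev *rdev)
{
	int raid_slot = -1;		/* CPU-endian working copy */

	if (rdev)
		raid_slot = rdev->desc_nr;

	if (raid_slot >= 0)		/* compare in CPU byte order */
		cmsg->raid_slot = cpu_to_le32(raid_slot);
	else
		pr_warn("md-cluster: No good device id found to send\n");
}
```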
-
- 13 Oct 2015 (6 commits)
-
Committed by Guoqing Jiang
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
-
Committed by Guoqing Jiang
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
-
Committed by Guoqing Jiang
During past testing, the node occasionally received a message that was sent from itself. This case should not happen in theory, but it is better to guard against it in case something goes wrong.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
-
Committed by Guoqing Jiang
Since the slot will be set within __sendmsg, we can remove the redundant code in resync_info_update.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
-
Committed by Guoqing Jiang
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
-
Committed by Goldwyn Rodrigues
The receive daemon prints kernel messages for every network message received. This would fill the kernel message log with unnecessary messages. Remove the pr_info() messages.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
-
- 12 Oct 2015 (7 commits)
-
Committed by Goldwyn Rodrigues
Adding a disk worked incorrectly with the new reload code. Fix it:
- No operation should be performed on an rdev marked as Candidate.
- After a metadata update operation, kick the disk if its role is 0xfffe; otherwise clear the Candidate bit and continue with the regular change check.
- Save the mode of the lock resource to check whether the token lock is already locked, because it can be taken twice while adding a disk; however, unlock_comm() must be called only once.
- add_new_disk() is called by the node initiating the --add operation. If it needs to be canceled, call add_new_disk_cancel(). The operation is completed by md_update_sb(), which will write the metadata and unlock the communication.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
-
Committed by Goldwyn Rodrigues
Resync or recovery must be performed by only one node at a time. A DLM lock resource, resync_lockres, provides the mutual exclusion so that only one node performs the recovery/resync at a time. If a node is unable to get resync_lockres because recovery is being performed by another node, it sets MD_RECOVERY_NEEDED so as to schedule the recovery in the future. Also remove the debug message in resync_info_update() used during development.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
-
Committed by Goldwyn Rodrigues
md_reload_sb is too simplistic, and it needs to explicitly determine the changes made by the writing node. There are multiple areas where a simple reload could fail. Instead, read the superblock of one of the "good" rdevs and update the necessary information:
- Read the superblock into a newly allocated page, by temporarily swapping out rdev->sb_page and calling ->load_super.
- If that fails, return.
- If it succeeds, call check_sb_changes, which:
  1. Iterates over the list of active devices and checks the matching dev_roles[] value.
     - If that is 'faulty', the device must be marked as faulty: call md_error to mark it. Make sure not to set CHANGE_DEVS and wake up mddev->thread, or it would initiate a resync process, which is the responsibility of the "primary" node. Clear the Blocked bit and call remove_and_add_spares() to hot-remove the device.
     - If the device is 'spare', call remove_and_add_spares() to get the number of spares added in this operation, and reduce mddev->degraded to mark the array as not degraded.
  2. Resets recovery_cp.
- Read the rest of the rdevs to update recovery_offset. If recovery_offset equals MaxSector, call spare_active() to set the device In_sync.
This required that recovery_offset be initialized to MaxSector, as opposed to zero, so as to communicate the end of sync for an rdev.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
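A hedged skeleton of the per-rdev decision described above; the structure follows the commit text, while the exact role values, helper calls and field names are assumptions for illustration.

```c
/* Sketch only, not the driver's check_sb_changes(): walk the local
 * rdevs and act on the roles the writing node recorded in the sb of
 * a known-good rdev. */
static void check_sb_changes_sketch(struct mddev *mddev, struct md_rdev *good)
{
	struct mdp_superblock_1 *sb = page_address(good->sb_page);
	struct md_rdev *rdev;

	rdev_for_each(rdev, mddev) {
		unsigned int role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]);

		if (test_bit(Faulty, &rdev->flags))
			continue;

		if (role == 0xfffe) {			/* writer marked it faulty */
			md_error(mddev, rdev);		/* but don't kick off a resync here */
			clear_bit(Blocked, &rdev->flags);
			remove_and_add_spares(mddev, rdev);	/* hot remove */
		} else if (role == 0xffff) {		/* writer turned it into a spare */
			if (remove_and_add_spares(mddev, rdev))
				mddev->degraded--;	/* array no longer degraded */
		}
	}
	mddev->recovery_cp = le64_to_cpu(sb->resync_offset);	/* step 2 */
}
```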
-
Committed by Goldwyn Rodrigues
When the suspended_area is deleted, the suspended processes must be woken up in order to complete their I/O.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
-
Committed by Guoqing Jiang
Previously, the BITMAP_NEEDS_SYNC message was sent when a resync aborted, but a resync can abort for different reasons, and not all of them require another node to take over the resync ownership. It is better to send the BITMAP_NEEDS_SYNC message only when the node is leaving the cluster with a dirty bitmap. We also need to ensure the dlm connection is ok.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Goldwyn Rodrigues
Suspending the entire device for resync could take too long, so resync in small chunks. The cluster's resync window (32M) is maintained in r1conf as cluster_sync_low and cluster_sync_high and processed in raid1's sync_request(). If the current resync is outside the cluster resync window (see the sketch below):
1. Set cluster_sync_low to curr_resync_completed.
2. Check if the sync will fit in the new window; if not, issue a wait_barrier() and set cluster_sync_low to sector_nr.
3. Set cluster_sync_high to cluster_sync_low + resync_window.
4. Send a message to all nodes so they may add it to their suspension list.
bitmap_cond_end_sync is modified to allow forcing a sync, in order to get curr_resync_completed up to date with the sector passed.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
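A hedged sketch of the window advance as it might sit in sync_request(); the constant, helper name and barrier handling are illustrative simplifications of the steps above.

```c
/* ~32M expressed in 512-byte sectors; illustrative constant */
#define CLUSTER_RESYNC_WINDOW_SECTORS	((32 * 1024 * 1024) >> 9)

static void advance_cluster_window(struct r1conf *conf, struct mddev *mddev,
				   sector_t sector_nr)
{
	if (sector_nr + CLUSTER_RESYNC_WINDOW_SECTORS < conf->cluster_sync_high)
		return;		/* still inside the advertised window */

	conf->cluster_sync_low = mddev->curr_resync_completed;	/* step 1 */
	if (sector_nr >= conf->cluster_sync_low + CLUSTER_RESYNC_WINDOW_SECTORS)
		conf->cluster_sync_low = sector_nr;	/* step 2: window would not fit */
	conf->cluster_sync_high = conf->cluster_sync_low +
				  CLUSTER_RESYNC_WINDOW_SECTORS;	/* step 3 */

	/* step 4: tell every node to suspend I/O to [low, high) */
	md_cluster_ops->resync_info_update(mddev, conf->cluster_sync_low,
					   conf->cluster_sync_high);
}
```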
-
Committed by Goldwyn Rodrigues
process_suspend_info, which handles the RESYNCING request, must not reply until all writes that were initiated before the request arrived have completed. As a by-product, all process_* functions now take mddev as their first argument, making them uniform.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
- 01 Sep 2015 (3 commits)
-
Committed by NeilBrown
md_setup_cluster already calls try_module_get(), so this try_module_get isn't needed. Also, there is no matching module_put (except in the error path), so this leaves an unbalanced module count.
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Guoqing Jiang
In gather_all_resync_info, we need to read the disk bitmap sb and check whether it needs recovery.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-
Committed by Guoqing Jiang
Introduce the MD_CLUSTER_BEGIN_JOIN_CLUSTER flag to make sure complete(&cinfo->completion) is only invoked when the node joins the cluster. Otherwise a node failure could also trigger the complete, and it doesn't make sense to do it then.
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
-