md-cluster: remove a disk asynchronously from cluster environment

For clustered RAID, if a disk cannot be reached on one node, the other nodes receive a REMOVE message for that disk. On a receiving node, we can't call md_kick_rdev_from_array to remove the disk from the array synchronously, since the disk might still be busy on that node. So set a ClusterRemove flag on the disk, then let the md thread do the removal later.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
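The flag-then-defer pattern described above can be modeled outside the kernel. The following is a minimal userspace sketch, not the kernel code: `struct disk`, `request_remove`, `test_and_clear`, and `process_removals` are hypothetical stand-ins for `struct md_rdev`, the REMOVE message handler, `test_and_clear_bit`, and the loop added to `md_check_recovery`, and C11 atomics stand in for the kernel's bit operations.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
enum { CLUSTER_REMOVE = 0 };	/* models the ClusterRemove flag bit */

struct disk {
	atomic_ulong flags;	/* models rdev->flags */
	int raid_disk;		/* < 0 means not an active array member */
	bool in_array;		/* set to false once the disk is kicked */
};

/* Message-handler side: only mark the disk, never remove it here. */
static void request_remove(struct disk *d)
{
	atomic_fetch_or(&d->flags, 1UL << CLUSTER_REMOVE);
}

/* Models test_and_clear_bit(): clears the bit, returns its old value. */
static bool test_and_clear(struct disk *d, int bit)
{
	unsigned long mask = 1UL << bit;

	return atomic_fetch_and(&d->flags, ~mask) & mask;
}

/*
 * Worker side: kick every flagged disk that is no longer active.
 * Returns the number of disks kicked in this pass.
 */
static int process_removals(struct disk *disks, size_t n)
{
	int kicked = 0;

	for (size_t i = 0; i < n; i++) {
		if (test_and_clear(&disks[i], CLUSTER_REMOVE) &&
		    disks[i].raid_disk < 0) {
			disks[i].in_array = false;
			kicked++;
		}
	}
	return kicked;
}
```

Calling `request_remove` on a disk leaves `in_array` untouched; the disk only leaves the array on the next `process_removals` pass. Because the sketch mirrors the order of the condition in the diff, the flag is cleared before `raid_disk` is checked, so in this model a request against a still-active disk (`raid_disk >= 0`) is dropped rather than retried.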

659b254f · Guoqing Jiang · NeilBrown · ac277c6a

Showing 3 changed files with 18 additions and 2 deletions

drivers/md/md-cluster.c +5 -2

drivers/md/md.c +12 -0

drivers/md/md.h +1 -0

--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -440,8 +440,11 @@ static void process_remove_disk(struct mddev *mddev, struct cluster_msg *msg)
 	struct md_rdev *rdev = md_find_rdev_nr_rcu(mddev,
 						   le32_to_cpu(msg->raid_slot));

-	if (rdev)
-		md_kick_rdev_from_array(rdev);
+	if (rdev) {
+		set_bit(ClusterRemove, &rdev->flags);
+		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+		md_wakeup_thread(mddev->thread);
+	}
 	else
 		pr_warn("%s: %d Could not find disk(%d) to REMOVE\n",
 			__func__, __LINE__, le32_to_cpu(msg->raid_slot));

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8318,6 +8318,18 @@ void md_check_recovery(struct mddev *mddev)
 			goto unlock;
 		}

+		if (mddev_is_clustered(mddev)) {
+			struct md_rdev *rdev;
+			/* kick the device if another node issued a
+			 * remove disk.
+			 */
+			rdev_for_each(rdev, mddev) {
+				if (test_and_clear_bit(ClusterRemove, &rdev->flags) &&
+						rdev->raid_disk < 0)
+					md_kick_rdev_from_array(rdev);
+			}
+		}
+
 		if (!mddev->external) {
 			int did_change = 0;
 			spin_lock(&mddev->lock);

--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -183,6 +183,7 @@ enum flag_bits {
 				 * Usually, this device should be faster
 				 * than other devices in the array
 				 */
+	ClusterRemove,
 };

 #define BB_LEN_MASK	(0x00000000000001FFULL)