- 22 April 2015, 2 commits
Committed by Goldwyn Rodrigues
When "re-add" is writted to /sys/block/mdXX/md/dev-YYY/state, the clustered md: 1. Sends RE_ADD message with the desc_nr. Nodes receiving the message clear the Faulty bit in their respective rdev->flags. 2. The node initiating re-add, gathers the bitmaps of all nodes and copies them into the local bitmap. It does not clear the bitmap from which it is copying. 3. Initiating node schedules a md recovery to sync the devices. Signed-off-by: NGuoqing Jiang <gqjiang@suse.com> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NNeilBrown <neilb@suse.de>
Committed by Goldwyn Rodrigues
This adds "remove" capabilities for the clustered environment. When a user initiates removal of a device from the array, a REMOVE message with disk number in the array is sent to all the nodes which kick the respective device in their own array. This facilitates the removal of failed devices. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NNeilBrown <neilb@suse.de>
- 21 March 2015, 1 commit
Committed by Goldwyn Rodrigues
A --cluster-confirm without an --add (by another node) can crash the kernel. Fix it by guarding the confirm with a state.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
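One way to picture the guard, as a sketch with made-up names (the actual kernel state and error handling differ):

```c
enum add_state { NO_ADD_IN_PROGRESS, ADD_IN_PROGRESS };

struct cluster_info {
    enum add_state add_state;
    /* ...candidate-device fields, valid only while ADD_IN_PROGRESS... */
};

/* A stray --cluster-confirm is rejected instead of dereferencing
 * candidate-device state that was never set up by an --add. */
static int process_cluster_confirm(struct cluster_info *ci)
{
    if (ci->add_state != ADD_IN_PROGRESS)
        return -1;                      /* no matching --add: bail out */

    /* ...safe to complete the add here... */
    ci->add_state = NO_ADD_IN_PROGRESS;
    return 0;
}
```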
- 23 February 2015, 7 commits
Committed by Goldwyn Rodrigues
Algorithm:

1. Node 1 issues "mdadm --manage /dev/mdX --add /dev/sdYY", which issues ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CLUSTER_ADD).
2. Node 1 sends a NEWDISK message with the uuid and slot number.
3. Other nodes issue kobject_uevent_env with the uuid and slot number (steps 4 and 5 could be a udev rule).
4. In userspace, each node searches for the disk, perhaps using blkid -t SUB_UUID="".
5. Other nodes issue one of the following, depending on whether the disk was found:
   ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and disc.number set to slot number)
   ioctl(CLUSTERED_DISK_NACK)
6. Other nodes drop their lock on no-new-devs (CR) if the device is found.
7. Node 1 attempts an EX lock on no-new-devs.
8. If node 1 gets the lock, it sends METADATA_UPDATED after unmarking the disk as SpareLocal.
9. If it does not get the no-new-dev lock, it fails the operation and sends METADATA_UPDATED.
10. Other nodes learn whether the device was added by re-reading the superblock after receiving the METADATA_UPDATED message.

Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
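A userspace-side sketch of step 5, choosing between the candidate add and the NACK. The ioctl and flag names come from the commit text; the header paths and the wrapper itself are assumptions that depend on a kernel carrying the clustering patches:

```c
#include <sys/ioctl.h>
#include <linux/raid/md_p.h>    /* MD_DISK_CANDIDATE (clustered kernels) */
#include <linux/raid/md_u.h>    /* ADD_NEW_DISK, CLUSTERED_DISK_NACK */

/* After the udev-driven search: offer the found disk as a candidate in
 * the announced slot, or NACK the add if the disk is not present here. */
static int respond_to_newdisk(int md_fd, int found, int slot)
{
    if (found) {
        mdu_disk_info_t info = {
            .number = slot,
            .state  = (1 << MD_DISK_CANDIDATE),  /* tentative add */
        };
        return ioctl(md_fd, ADD_NEW_DISK, &info);
    }
    return ioctl(md_fd, CLUSTERED_DISK_NACK);
}
```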
Committed by Goldwyn Rodrigues
If a resync is going on, all nodes must suspend writes to the affected range. This is recorded in the suspend_info/suspend_list. If an I/O falls within the range of any entry on the suspend list, should_suspend will return 1.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
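A compact sketch of that range check; the list layout follows the commit text loosely, and should_suspend's real kernel signature differs:

```c
#include <stdint.h>

struct suspend_info {
    uint64_t lo, hi;                    /* suspended sector range */
    struct suspend_info *next;
};

/* Return 1 if the I/O range [lo, hi] overlaps any suspended range. */
static int should_suspend(const struct suspend_info *list,
                          uint64_t lo, uint64_t hi)
{
    for (const struct suspend_info *s = list; s; s = s->next)
        if (hi >= s->lo && lo <= s->hi)
            return 1;
    return 0;
}
```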
Committed by Goldwyn Rodrigues
When a resync is initiated, a RESYNCING message is sent to all active nodes with the range (lo,hi). When the resync is over, a RESYNCING message is sent with (0,0). A high sector value of zero indicates that the resync is over.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
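The (lo,hi) convention in sketch form, with an assumed broadcast primitive standing in for the real messaging code:

```c
#include <stdint.h>

struct resync_msg {
    uint64_t lo, hi;            /* resync window; hi == 0 means "done" */
};

void cluster_broadcast(const struct resync_msg *msg);  /* assumed transport */

/* Announce an active resync window to all active nodes. */
static void announce_resync(uint64_t lo, uint64_t hi)
{
    struct resync_msg msg = { .lo = lo, .hi = hi };

    cluster_broadcast(&msg);
}

/* A high sector of zero signals that the resync is over. */
static void resync_finished(void)
{
    announce_resync(0, 0);
}
```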
Committed by Goldwyn Rodrigues
The superblock update sequence is:

- request to send a message
- make changes to the superblock
- send messages telling everyone that the superblock has changed
- other nodes all read the superblock
- other nodes all ack the messages
- the updating node releases the "I'm sending a message" resource

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
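The six steps map naturally onto a lock-update-broadcast-ack sequence; here is a sketch with hypothetical helpers standing in for the DLM token lock and the md superblock and messaging code:

```c
int  lock_token(void);             /* 1. the "I'm sending a message" resource */
void unlock_token(void);
void write_superblock(void);
void broadcast_metadata_updated(void);
void wait_for_all_acks(void);

static int metadata_update(void)
{
    if (lock_token())              /* 1. request to send a message */
        return -1;
    write_superblock();            /* 2. make changes to the superblock */
    broadcast_metadata_updated();  /* 3. tell everyone it changed */
    wait_for_all_acks();           /* 4-5. peers re-read the sb, then ack */
    unlock_token();                /* 6. release the sending resource */
    return 0;
}
```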
Committed by Goldwyn Rodrigues
When a node joins, it does not know of other nodes performing resync. So, each node keeps its resync information in its LVB. When a new node joins, it reads the LVB of each "online" bitmap.

[TODO] The new node attempts to get the PW lock on the other bitmap; if it is successful, it reads the bitmap and performs the resync (if required) on that node's behalf. If the node does not get the PW lock, it requests CR and reads the LVB for the resync information.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
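A sketch of that join-time scan, including the PW-then-CR fallback; all helpers and the LVB layout are assumptions:

```c
#include <stdint.h>

struct resync_lvb {
    uint64_t lo, hi;            /* resync window recorded in the LVB */
};

int  try_lock_pw(int slot);     /* nonzero if the PW lock was granted */
void lock_cr(int slot);
void unlock(int slot);
void read_lvb(int slot, struct resync_lvb *out);
void resync_range(uint64_t lo, uint64_t hi);

static void gather_peer_resync_info(int nslots)
{
    struct resync_lvb lvb;

    for (int slot = 0; slot < nslots; slot++) {
        if (try_lock_pw(slot)) {
            /* we own this bitmap: resync on the other node's behalf */
            read_lvb(slot, &lvb);
            if (lvb.hi)
                resync_range(lvb.lo, lvb.hi);
        } else {
            /* bitmap is in use: take CR and just read the window */
            lock_cr(slot);
            read_lvb(slot, &lvb);
        }
        unlock(slot);
    }
}
```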
Committed by Goldwyn Rodrigues
DLM offers callbacks when a node fails and lock remastery is performed:

1. recover_prep: called when DLM discovers a node is down
2. recover_slot: called when DLM identifies the node and recovery can start
3. recover_done: called when all nodes have completed recover_slot

recover_slot() and recover_done() are also called when a node joins initially, in order to inform the node of its slot number. These slot numbers start from one, so we subtract one to make them start from zero, which the cluster-md code uses.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
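The three callbacks hang off struct dlm_lockspace_ops (linux/dlm.h); the callback bodies below are illustrative, but the ops layout and the one-based-to-zero-based slot adjustment follow the commit text:

```c
#include <linux/dlm.h>
#include <linux/printk.h>

struct md_cluster_info {
    int slot_number;            /* zero-based slot used by cluster-md */
};

static void md_recover_prep(void *arg)
{
    /* DLM discovered a node is down; pause lock activity (sketch). */
}

static void md_recover_slot(void *arg, struct dlm_slot *slot)
{
    /* DLM identified the failed node; recovery can start. */
    pr_info("md-cluster: recovering slot %d\n", slot->slot - 1);
}

static void md_recover_done(void *arg, struct dlm_slot *slots,
                            int num_slots, int our_slot,
                            uint32_t generation)
{
    struct md_cluster_info *cinfo = arg;

    /* DLM slots start from one; cluster-md counts from zero. */
    cinfo->slot_number = our_slot - 1;
}

static const struct dlm_lockspace_ops md_ls_ops = {
    .recover_prep = md_recover_prep,
    .recover_slot = md_recover_slot,
    .recover_done = md_recover_done,
};
```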
Committed by Goldwyn Rodrigues
This allows dynamic registration of cluster hooks.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
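The usual kernel pattern for such hooks is a single registered operations table; a sketch, with the table contents and error value only illustrative (the real table grew with the commits above):

```c
#include <stddef.h>

struct md_cluster_operations {
    int (*join)(void);
    int (*leave)(void);
    /* ...resync, metadata and device-management hooks... */
};

static struct md_cluster_operations *md_cluster_ops;

/* A cluster module registers its hook table at load time. */
int register_md_cluster_operations(struct md_cluster_operations *ops)
{
    if (md_cluster_ops)
        return -1;              /* a provider is already registered */
    md_cluster_ops = ops;
    return 0;
}

int unregister_md_cluster_operations(void)
{
    md_cluster_ops = NULL;
    return 0;
}
```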