  1. 13 Oct, 2015 4 commits
  2. 12 Oct, 2015 7 commits
    • md-cluster: Fix adding of new disk with new reload code · dbb64f86
      Goldwyn Rodrigues committed
      Adding the disk worked incorrectly with the new reload code. Fix it:
      
       - No operation should be performed on rdev marked as Candidate
       - After a metadata update operation, kick disk if role is 0xfffe
         else clear Candidate bit and continue with the regular change check.
       - Save the mode of the lock resource to check whether the token lock is
         already held, because it can be taken twice while adding a disk.
         However, unlock_comm() must be called only once.
       - add_new_disk() is called by the node initiating the --add operation.
         If it needs to be canceled, call add_new_disk_cancel(). The operation
         is completed by md_update_sb() which will write and unlock the
         communication.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    • md-cluster: Perform resync/recovery under a DLM lock · c186b128
      Goldwyn Rodrigues committed
      Resync or recovery must be performed by only one node at a time.
      A DLM lock resource, resync_lockres, provides the mutual exclusion.

      If a node is unable to get resync_lockres because recovery is
      being performed by another node, it sets MD_RECOVER_NEEDED so as
      to schedule recovery in the future.
      
      Remove the debug message in resync_info_update()
      used during development.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
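
The try-lock-or-defer behaviour described in c186b128 can be sketched in plain C. This is a userspace model under stated assumptions, not the kernel code: `struct resync_lock`, `struct node`, and the three functions are hypothetical names, the DLM resource is reduced to a single owner field, and the `recover_needed` flag stands in for MD_RECOVER_NEEDED.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the DLM resync_lockres: owner is -1 when free. */
struct resync_lock { int owner; };

/* Hypothetical per-node state; only the deferral flag is modelled. */
struct node { int id; bool recover_needed; };

/* Non-blocking attempt to take the lock; returns true on success. */
static bool resync_trylock(struct resync_lock *l, int node_id)
{
    if (l->owner != -1 && l->owner != node_id)
        return false;
    l->owner = node_id;
    return true;
}

/* Release the lock, but only if this node actually holds it. */
static void resync_unlock(struct resync_lock *l, int node_id)
{
    if (l->owner == node_id)
        l->owner = -1;
}

/* Returns true if this node may resync now; otherwise records that a
 * recovery must be scheduled later (models setting MD_RECOVER_NEEDED). */
static bool try_start_resync(struct resync_lock *l, struct node *n)
{
    if (!resync_trylock(l, n->id)) {
        n->recover_needed = true;
        return false;
    }
    n->recover_needed = false;
    return true;
}
```

A node that loses the race does not block; it only leaves the flag set so that a later pass retries once the owner has released the lock.
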
    • md-cluster: Improve md_reload_sb to be less error prone · 70bcecdb
      Goldwyn Rodrigues committed
      md_reload_sb is too simplistic and it explicitly needs to determine
      the changes made by the writing node. However, there are multiple areas
      where a simple reload could fail.
      
      Instead, read the superblock of one of the "good" rdevs and update
      the necessary information:
      
      - Read the superblock into a newly allocated page, by temporarily
        swapping out rdev->sb_page and calling ->load_super.
      - If that fails, return.
      - If it succeeds, call check_sb_changes, which:
        1. Iterates over the list of active devices and checks the matching
           dev_roles[] value.
           If that is 'faulty', the device must be marked as faulty:
            - Call md_error() to mark the device as faulty. Make sure
              not to set CHANGE_DEVS and wake up mddev->thread, or else
              it would initiate a resync process, which is the responsibility
              of the "primary" node.
            - Clear the Blocked bit.
            - Call remove_and_add_spares() to hot remove the device.
           If the device is 'spare':
            - Call remove_and_add_spares() to get the number of spares
              added in this operation.
            - Reduce mddev->degraded to mark the array as not degraded.
        2. Resets recovery_cp.
      - Read the rest of the rdevs to update recovery_offset. If recovery_offset
        is equal to MaxSector, call spare_active() to set it In_sync.
      
      This required that recovery_offset be initialized to MaxSector, as
      opposed to zero, so as to communicate the end of sync for an rdev.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
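
Two of the checks listed in 70bcecdb (the faulty-role scan and the MaxSector test) can be illustrated with a small model. Everything here is an assumption made for illustration: `struct rdev_model`, `check_roles()` and `activate_synced()` are hypothetical, `MAX_SECTOR` merely stands in for the kernel's MaxSector, and the spare handling and locking of the real code are left out. The 0xfffe role value is the one quoted in the dbb64f86 entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ROLE_FAULTY 0xfffeu      /* role value quoted in the commit log */
#define MAX_SECTOR  UINT64_MAX   /* stand-in for the kernel's MaxSector */

/* Hypothetical, trimmed-down view of an rdev. */
struct rdev_model {
    unsigned desc_nr;
    bool faulty;
    bool in_sync;
    uint64_t recovery_offset;
};

/* Compare each device with dev_roles[] from the freshly read superblock.
 * Returns how many devices were newly marked faulty. */
static size_t check_roles(struct rdev_model *devs, size_t ndevs,
                          const uint16_t *dev_roles, size_t nroles)
{
    size_t newly_faulty = 0;

    for (size_t i = 0; i < ndevs; i++) {
        struct rdev_model *d = &devs[i];

        if (d->desc_nr >= nroles || d->faulty)
            continue;
        if (dev_roles[d->desc_nr] == ROLE_FAULTY) {
            d->faulty = true;    /* the kernel path goes through md_error() */
            d->in_sync = false;
            newly_faulty++;
        }
    }
    return newly_faulty;
}

/* A recovery_offset equal to MAX_SECTOR signals that sync has finished,
 * so the device can be treated as in sync. Returns the number activated. */
static size_t activate_synced(struct rdev_model *devs, size_t ndevs)
{
    size_t activated = 0;

    for (size_t i = 0; i < ndevs; i++) {
        struct rdev_model *d = &devs[i];

        if (!d->faulty && !d->in_sync && d->recovery_offset == MAX_SECTOR) {
            d->in_sync = true;
            activated++;
        }
    }
    return activated;
}
```
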
    • md-cluster: Wake up suspended process · b8ca846e
      Goldwyn Rodrigues committed
      When the suspended_area is deleted, the suspended processes
      must be woken up in order to complete their I/O.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    • md-cluster: send BITMAP_NEEDS_SYNC when node is leaving cluster · 09995411
      Guoqing Jiang committed
      Previously, the BITMAP_NEEDS_SYNC message was sent when the resync
      aborted, but a resync can abort for different reasons, and not all
      of them require another node to take over resync ownership.

      It is better to send the BITMAP_NEEDS_SYNC message only when
      the node is leaving the cluster with a dirty bitmap. We also need
      to ensure that the DLM connection is OK.
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md-cluster: Use a small window for resync · c40f341f
      Goldwyn Rodrigues committed
      Suspending the entire device for resync could take too long. Resync
      in small chunks.
      
      The cluster's resync window (32M) is maintained in r1conf as
      cluster_sync_low and cluster_sync_high and processed in
      raid1's sync_request(). If the current resync is outside the cluster
      resync window:

      1. Set cluster_sync_low to curr_resync_completed.
      2. Check if the sync will fit in the new window; if not, issue a
         wait_barrier() and set cluster_sync_low to sector_nr.
      3. Set cluster_sync_high to cluster_sync_low + resync_window.
      4. Send a message to all nodes so they may add it to their suspension
         list.

      bitmap_cond_end_sync is modified to allow forcing a sync in order
      to bring curr_resync_completed up to date with the sector passed.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
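
The four window steps from c40f341f can be followed in a standalone sketch. Assumptions: `struct sync_window` and `advance_window()` are hypothetical names, the 32M window is expressed in 512-byte sectors, the `nr_sectors` parameter is an illustrative way to express "will the sync fit", and the message broadcast is reduced to a counter. The real code would call wait_barrier() where the comment indicates.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32 MiB expressed in 512-byte sectors. */
#define WINDOW_SECTORS ((32u * 1024u * 1024u) / 512u)

/* Hypothetical model of cluster_sync_low / cluster_sync_high. */
struct sync_window {
    uint64_t low;
    uint64_t high;
    unsigned messages_sent;   /* counts step 4 broadcasts */
};

/* Move the window if [sector_nr, sector_nr + nr_sectors) falls outside it.
 * Returns true when the window was moved and a message would be sent. */
static bool advance_window(struct sync_window *w, uint64_t sector_nr,
                           uint64_t nr_sectors, uint64_t curr_resync_completed)
{
    if (sector_nr >= w->low && sector_nr + nr_sectors <= w->high)
        return false;                       /* still inside the window */

    /* Step 1: start the new window at the completed position. */
    w->low = curr_resync_completed;

    /* Step 2: if the request does not fit, restart at sector_nr.
     * The kernel would issue wait_barrier() before doing so. */
    if (sector_nr < w->low || sector_nr + nr_sectors > w->low + WINDOW_SECTORS)
        w->low = sector_nr;

    /* Step 3: the window is always WINDOW_SECTORS wide. */
    w->high = w->low + WINDOW_SECTORS;

    /* Step 4: tell the other nodes about the new range. */
    w->messages_sent++;
    return true;
}
```

Only a window move triggers a broadcast, so the other nodes suspend writes to at most 32M of the device at a time.
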
    • md-cluster: complete all write requests before adding suspend_info · 9ed38ff5
      Goldwyn Rodrigues committed
      process_suspend_info, which handles the RESYNCING request, must not
      reply until all writes which were initiated before the request arrived
      have completed.

      As a by-product, all process_* functions now take mddev as their
      first argument, making them uniform.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
  3. 01 Sep, 2015 12 commits
  4. 24 Jul, 2015 1 commit
  5. 22 Apr, 2015 3 commits
  6. 21 Mar, 2015 3 commits
  7. 23 Feb, 2015 10 commits
    • Add new disk to clustered array · 1aee41f6
      Goldwyn Rodrigues committed
      Algorithm:
      1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
         ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CLUSTER_ADD)
      2. Node 1 sends NEWDISK with uuid and slot number
      3. Other nodes issue kobject_uevent_env with uuid and slot number
         (Steps 4,5 could be a udev rule)
      4. In userspace, the node searches for the disk, perhaps
         using blkid -t SUB_UUID=""
      5. Other nodes issue either of the following depending on whether the disk
         was found:
         ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
               disc.number set to slot number)
         ioctl(CLUSTERED_DISK_NACK)
      6. Other nodes drop lock on no-new-devs (CR) if device is found
      7. Node 1 attempts EX lock on no-new-devs
      8. If node 1 gets the lock, it sends METADATA_UPDATED after unmarking the disk
         as SpareLocal
      9. If not (it fails to get the no-new-devs lock), it fails the operation and
         sends METADATA_UPDATED
      10. Other nodes learn whether the device was added by reading the
          superblock again after receiving the METADATA_UPDATED message.
      Signed-off-by: Lidong Zhong <lzhong@suse.com>
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
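
Steps 6 to 9 of 1aee41f6 amount to a vote expressed through lock modes: node 1 can obtain EX on no-new-devs only if every other node has dropped its CR. A minimal sketch of that decision, assuming hypothetical names (`enum peer_reply`, `cr_holders_left()`, `add_disk_succeeds()`) and ignoring all messaging:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what each other node did in step 5. */
enum peer_reply { PEER_DISK_FOUND, PEER_DISK_NACK };

/* A peer that found the disk drops its CR lock on no-new-devs (step 6);
 * a peer that NACKs keeps holding it. Returns the remaining CR holders. */
static size_t cr_holders_left(const enum peer_reply *replies, size_t n)
{
    size_t holders = 0;

    for (size_t i = 0; i < n; i++)
        if (replies[i] == PEER_DISK_NACK)
            holders++;
    return holders;
}

/* Steps 7 to 9: EX can only be granted when no peer still holds CR. */
static bool add_disk_succeeds(const enum peer_reply *replies, size_t n)
{
    return cr_holders_left(replies, n) == 0;
}
```

A single NACK is enough to make the add fail, which matches the requirement that every node must be able to see the new disk.
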
    • Suspend writes in RAID1 if within range · 589a1c49
      Goldwyn Rodrigues committed
      If there is a resync going on, all nodes must suspend writes to the
      range. This is recorded in the suspend_info/suspend_list.
      
      If there is an I/O within the ranges of any of the suspend_info,
      should_suspend will return 1.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
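
The range test behind should_suspend in 589a1c49 is an interval-overlap check. The sketch below assumes half-open sector ranges [lo, hi) and stores the suspend entries in a plain array; the `struct suspend_range` layout and this function signature are illustrative, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: sectors [lo, hi) are being resynced by another node. */
struct suspend_range {
    int slot;
    uint64_t lo;
    uint64_t hi;
};

/* Returns 1 if the I/O range [lo, hi) overlaps any suspended range. */
static int should_suspend(const struct suspend_range *list, size_t n,
                          uint64_t lo, uint64_t hi)
{
    for (size_t i = 0; i < n; i++)
        if (hi > list[i].lo && lo < list[i].hi)
            return 1;
    return 0;
}
```
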
    • Resync start/Finish actions · e59721cc
      Goldwyn Rodrigues committed
      When a RESYNC_START message arrives, the node removes the entry
      with the current slot number and adds the range to the
      suspend_list.
      
      Similarly, when a RESYNC_FINISHED message is received, the node clears
      the entry with respect to the bitmap number.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
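
The bookkeeping in e59721cc (replace the slot's entry on start, drop it on finish) can be modelled with a table indexed by slot number. This is a simplification made for illustration: the commit text describes a list, and `MAX_SLOTS`, `struct suspend_table` and the handler names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 8   /* hypothetical upper bound on cluster nodes */

struct suspend_entry {
    bool active;
    uint64_t lo;
    uint64_t hi;
};

/* One entry per slot, so "remove the old entry, then add" is one store. */
struct suspend_table {
    struct suspend_entry by_slot[MAX_SLOTS];
};

/* RESYNC_START: replace whatever the slot had with the new range. */
static bool on_resync_start(struct suspend_table *t, unsigned slot,
                            uint64_t lo, uint64_t hi)
{
    if (slot >= MAX_SLOTS || lo >= hi)
        return false;
    t->by_slot[slot] = (struct suspend_entry){ true, lo, hi };
    return true;
}

/* RESYNC_FINISHED: clear the entry for the slot, if there is one. */
static bool on_resync_finished(struct suspend_table *t, unsigned slot)
{
    if (slot >= MAX_SLOTS || !t->by_slot[slot].active)
        return false;
    t->by_slot[slot].active = false;
    return true;
}

/* Number of slots that currently have a suspended range. */
static size_t active_entries(const struct suspend_table *t)
{
    size_t n = 0;

    for (size_t i = 0; i < MAX_SLOTS; i++)
        if (t->by_slot[i].active)
            n++;
    return n;
}
```
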
    • Send RESYNCING while performing resync start/stop · 965400eb
      Goldwyn Rodrigues committed
      When a resync is initiated, a RESYNCING message is sent to all active
      nodes with the range (lo,hi). When the resync is over, a RESYNCING
      message is sent with (0,0). A high sector value of zero indicates
      that the resync is over.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
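
The (lo,hi) convention from 965400eb, where a high value of zero means "finished", can be written down as follows. The `struct resyncing_msg` layout and helper names are assumptions for illustration; the actual wire format is not shown in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical payload of a RESYNCING message. */
struct resyncing_msg {
    int slot;
    uint64_t lo;
    uint64_t hi;
};

/* Message announcing that [lo, hi) is being resynced by this slot. */
static struct resyncing_msg resync_begin(int slot, uint64_t lo, uint64_t hi)
{
    struct resyncing_msg m = { slot, lo, hi };
    return m;
}

/* Message announcing that the resync is over: the range is (0,0). */
static struct resyncing_msg resync_end(int slot)
{
    struct resyncing_msg m = { slot, 0, 0 };
    return m;
}

/* Receivers only need to look at hi to tell the two cases apart. */
static bool resync_is_over(const struct resyncing_msg *m)
{
    return m->hi == 0;
}
```
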
    • Reload superblock if METADATA_UPDATED is received · 1d7e3e96
      Goldwyn Rodrigues committed
      Re-reads the devices by invalidating the cache.
      Since we don't write to faulty devices, a faulty device is detected using
      the events recorded in the devices. If a device's event count is older than
      that of the mddev, it is marked as faulty.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
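
The event-count comparison described in 1d7e3e96 can be sketched as a scan over the member devices. `struct member` and `mark_stale_members()` are hypothetical names, and the array-wide event count is passed in as a plain integer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical member device: the event count read from its superblock. */
struct member {
    uint64_t sb_events;
    bool faulty;
};

/* A device that was not written during the last update carries an older
 * event count than the array; mark such devices faulty.
 * Returns the number of devices newly marked. */
static size_t mark_stale_members(struct member *devs, size_t n,
                                 uint64_t array_events)
{
    size_t marked = 0;

    for (size_t i = 0; i < n; i++) {
        if (!devs[i].faulty && devs[i].sb_events < array_events) {
            devs[i].faulty = true;
            marked++;
        }
    }
    return marked;
}
```
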
    • metadata_update sends message to other nodes · 293467aa
      Goldwyn Rodrigues committed
         - request to send a message
         - make changes to superblock
         - send messages telling everyone that the superblock has changed
         - other nodes all read the superblock
         - other nodes all ack the messages
         - the updating node releases the "I'm sending a message" resource.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    • Communication Framework: Sending functions · 601b515c
      Goldwyn Rodrigues committed
      The sending part is split into two functions to ensure
      atomicity of operations such as the MD superblock update.
      Signed-off-by: Lidong Zhong <lzhong@suse.com>
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    • Communication Framework: Receiving · 4664680c
      Goldwyn Rodrigues committed
      1. receive status

         sender                         receiver                   receiver
         ACK:CR                         ACK:CR                     ACK:CR

      2. sender gets EX of TOKEN
         sender gets EX of MESSAGE

         sender                         receiver                   receiver
         TOKEN:EX                       ACK:CR                     ACK:CR
         MESSAGE:EX
         ACK:CR

      3. sender writes LVB
         sender down-converts MESSAGE from EX to CR
         sender tries to get EX of ACK
         [ wait until all receivers have *processed* the MESSAGE ]

                                        [ triggered by bast of ACK ]
                                        receiver gets CR of MESSAGE
                                        receiver reads LVB
                                        receiver processes the message
                                        [ wait finish ]
                                        receiver releases ACK

         sender                         receiver                   receiver
         TOKEN:EX                       MESSAGE:CR                 MESSAGE:CR
         MESSAGE:CR
         ACK:EX

      4. sender down-converts ACK from EX to CR
         sender releases MESSAGE
         sender releases TOKEN
                                        receiver up-converts to EX of MESSAGE
                                        receiver gets CR of ACK
                                        receiver releases MESSAGE

         sender                         receiver                   receiver
         ACK:CR                         ACK:CR                     ACK:CR
      Signed-off-by: Lidong Zhong <lzhong@suse.com>
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
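
The lock hand-offs in 4664680c can be traced with a simplified, sequential model. It is only a sketch under several assumptions: three nodes, a hypothetical `struct cluster` holding each node's mode on TOKEN, MESSAGE and ACK, and a `convert()` helper that fails instead of blocking. The receivers' up-convert of MESSAGE to EX in step 4 is left out, because a sequential model cannot represent a blocking conversion.

```c
#include <stdbool.h>
#include <string.h>

enum mode { NL, CR, EX };              /* null, concurrent read, exclusive */
enum res { TOKEN, MESSAGE, ACK, NRES };

#define NODES   3
#define LVB_LEN 64

/* Hypothetical cluster state: held[node][resource] is the granted mode. */
struct cluster {
    enum mode held[NODES][NRES];
    char lvb[LVB_LEN];                 /* lock value block of MESSAGE */
    char seen[NODES][LVB_LEN];         /* what each receiver read */
};

/* Step 1: every node starts out holding ACK in CR mode. */
static void cluster_init(struct cluster *c)
{
    memset(c, 0, sizeof(*c));          /* NL is 0, so all locks start free */
    for (int i = 0; i < NODES; i++)
        c->held[i][ACK] = CR;
}

/* Grant `want` if compatible: CR coexists with CR, EX with nothing. */
static bool convert(struct cluster *c, int node, enum res r, enum mode want)
{
    if (want != NL) {
        for (int i = 0; i < NODES; i++) {
            enum mode h = c->held[i][r];

            if (i != node && h != NL && (want == EX || h == EX))
                return false;
        }
    }
    c->held[node][r] = want;
    return true;
}

static bool send_message(struct cluster *c, int sender, const char *msg)
{
    /* Step 2: sender takes TOKEN and MESSAGE exclusively. */
    if (!convert(c, sender, TOKEN, EX))
        return false;
    if (!convert(c, sender, MESSAGE, EX)) {
        convert(c, sender, TOKEN, NL);
        return false;
    }

    /* Step 3: write the LVB, let receivers read it and release ACK. */
    strncpy(c->lvb, msg, LVB_LEN - 1);
    c->lvb[LVB_LEN - 1] = '\0';
    convert(c, sender, MESSAGE, CR);
    for (int i = 0; i < NODES; i++) {
        if (i == sender)
            continue;
        convert(c, i, MESSAGE, CR);
        memcpy(c->seen[i], c->lvb, LVB_LEN);
        convert(c, i, ACK, NL);
    }
    if (!convert(c, sender, ACK, EX))  /* succeeds once all have released */
        return false;

    /* Step 4: everyone returns to the initial ACK:CR state. */
    convert(c, sender, ACK, CR);
    convert(c, sender, MESSAGE, NL);
    convert(c, sender, TOKEN, NL);
    for (int i = 0; i < NODES; i++) {
        if (i == sender)
            continue;
        convert(c, i, ACK, CR);
        convert(c, i, MESSAGE, NL);
    }
    return true;
}
```

Because the sender can only obtain ACK in EX mode after every receiver has released its CR, gaining that lock doubles as the acknowledgement that all receivers processed the message.
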
    • Perform resync for cluster node failure · 4b26a08a
      Goldwyn Rodrigues committed
      If bitmap_copy_slot returns hi>0, we need to perform resync.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    • Initiate recovery on node failure · e94987db
      Goldwyn Rodrigues committed
      The DLM informs us of a node failure, along with the DLM slot number.
      The bit corresponding to the slot number is set in
      cluster_info->recovery_map and the recovery thread is woken up.
      
      The recovery thread:
      1. Derives the slot number from the recovery_map
      2. Locks the bitmap corresponding to the slot
      3. Copies the set bits to the node-local bitmap
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
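
The recovery path from e94987db, together with the hi>0 test from 4b26a08a, can be modelled in userspace. All names here are hypothetical (`node_failed()`, `next_failed_slot()`, `copy_slot_bitmap()`, `struct node_bitmap`); the per-slot bitmap lock from step 2 is omitted, and the return value of the copy is assumed to be the position just past the highest dirty bit.

```c
#include <stddef.h>
#include <stdint.h>

#define BITMAP_WORDS 4

/* Hypothetical write-intent bitmap: one bit per chunk. */
struct node_bitmap {
    uint64_t bits[BITMAP_WORDS];
};

/* Failure notification: remember the failed slot in the recovery map. */
static void node_failed(uint32_t *recovery_map, unsigned slot)
{
    if (slot < 32)
        *recovery_map |= UINT32_C(1) << slot;
}

/* Recovery thread, step 1: take the lowest pending slot, or -1 if none. */
static int next_failed_slot(uint32_t *recovery_map)
{
    for (int s = 0; s < 32; s++) {
        if (*recovery_map & (UINT32_C(1) << s)) {
            *recovery_map &= ~(UINT32_C(1) << s);
            return s;
        }
    }
    return -1;
}

/* Step 3: copy the failed node's set bits into the local bitmap.
 * Returns hi, one past the highest dirty bit; hi > 0 means resync. */
static uint64_t copy_slot_bitmap(struct node_bitmap *local,
                                 const struct node_bitmap *failed)
{
    uint64_t hi = 0;

    for (size_t w = 0; w < BITMAP_WORDS; w++) {
        uint64_t v = failed->bits[w];

        local->bits[w] |= v;
        for (unsigned b = 0; b < 64; b++)
            if (v & (UINT64_C(1) << b))
                hi = (uint64_t)w * 64 + b + 1;
    }
    return hi;
}
```
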