1. 05 Sep 2009, 1 commit
      dm raid1: do not allow log_failure variable to unset after being set · d2b69864
Jonathan Brassow committed
      This patch fixes a bug which was triggering a case where the primary leg
      could not be changed on failure even when the mirror was in-sync.
      
      The case involves the failure of the primary device along with
      the transient failure of the log device.  The problem is that
      bios can be put on the 'failures' list (due to log failure)
      before 'fail_mirror' is called due to the primary device failure.
      Normally, this is fine, but if the log device failure is transient,
      a subsequent iteration of the work thread, 'do_mirror', will
      reset 'log_failure'.  The 'do_failures' function then resets
      the 'in_sync' variable when processing bios on the failures list.
      The 'in_sync' variable is what is used to determine if the
      primary device can be switched in the event of a failure.  Since
      this has been reset, the primary device is incorrectly assumed
      to be not switchable.
      
      The case has been seen in the cluster mirror context, where one
      machine realizes the log device is dead before the other machines.
      As the responsibilities of the server migrate from one node to
      another (because the mirror is being reconfigured due to the failure),
      the new server may think for a moment that the log device is fine -
      thus resetting the 'log_failure' variable.
      
In any case, it is inappropriate for us to reset the 'log_failure'
      variable.  The above bug simply illustrates that it can actually
      hurt us.
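The sticky-flag rule the patch enforces can be sketched in plain C. This is a user-space model of the described behavior; the struct and function names are hypothetical, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mirror-set state in dm-raid1. */
struct mirror_state {
    bool log_failure;   /* must be sticky once set */
};

/* Called with the latest log status on each pass of the work thread.
 * The fix: a transiently healthy log may never clear the flag. */
static void update_log_status(struct mirror_state *ms, bool log_ok)
{
    if (!log_ok)
        ms->log_failure = true;
    /* deliberately no else branch: once set, log_failure stays set */
}
```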
      
      Cc: stable@kernel.org
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2. 24 Jul 2009, 2 commits
3. 22 Jun 2009, 1 commit
4. 15 Apr 2009, 1 commit
5. 03 Apr 2009, 2 commits
6. 06 Jan 2009, 3 commits
7. 14 Nov 2008, 1 commit
8. 30 Oct 2008, 1 commit
9. 22 Oct 2008, 3 commits
10. 10 Oct 2008, 1 commit
      dm raid1: kcopyd should stop on error if errors handled · f7c83e2e
Jonathan Brassow committed
      dm-raid1 is setting the 'DM_KCOPYD_IGNORE_ERROR' flag unconditionally
      when assigning kcopyd work.  kcopyd is responsible for copying an
      assigned section of disk to one or more other disks.  The
      'DM_KCOPYD_IGNORE_ERROR' flag affects kcopyd in the following way:
      
      When not set:
      kcopyd will immediately stop the copy operation when an error is
      encountered.
      
      When set:
      kcopyd will try to proceed regardless of errors and try to continue
      copying any remaining amount.
      
      Since dm-raid1 tracks regions of the address space that are (or
      are not) in sync and it now has the ability to handle these
      errors, we can safely enable this optimization.  This optimization
      is conditional on whether mirror error handling has been enabled.
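The conditional flag choice can be modeled in a few lines of C. The flag's bit value and the helper's name here are illustrative, not the kernel's definitions:

```c
#include <assert.h>

/* Illustrative bit value; the real definition lives in the kernel. */
#define DM_KCOPYD_IGNORE_ERROR 0x1u

/* Sketch of the fixed behavior: only tell kcopyd to press on past
 * errors when the mirror cannot handle those errors itself. */
static unsigned int kcopyd_flags(int errors_handled)
{
    return errors_handled ? 0u : DM_KCOPYD_IGNORE_ERROR;
}
```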
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
11. 25 Apr 2008, 8 commits
12. 29 Mar 2008, 1 commit
13. 20 Feb 2008, 1 commit
14. 14 Feb 2008, 1 commit
15. 08 Feb 2008, 5 commits
      dm raid1: report fault status · af195ac8
Jonathan Brassow committed
      This patch adds extra information to the mirror status output, so that
      it can be determined which device(s) have failed.  For each mirror device,
      a character is printed indicating the most severe error encountered.  The
      characters are:
       *    A => Alive - No failures
       *    D => Dead - A write failure occurred leaving mirror out-of-sync
 *    S => Sync - A synchronization failure occurred, mirror out-of-sync
       *    R => Read - A read failure occurred, mirror data unaffected
      This allows userspace to properly reconfigure the mirror set.
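The severity ordering implied by the list above can be sketched as a small C helper. The enum bit values are illustrative; the names follow the patch description:

```c
#include <assert.h>

/* Bit values are illustrative; the names follow the patch description. */
enum {
    DM_RAID1_WRITE_ERROR = 1 << 0,
    DM_RAID1_SYNC_ERROR  = 1 << 1,
    DM_RAID1_READ_ERROR  = 1 << 2,
};

/* Most severe error wins: D > S > R, with 'A' when no errors recorded. */
static char status_char(unsigned int error_bits)
{
    if (error_bits & DM_RAID1_WRITE_ERROR)
        return 'D';
    if (error_bits & DM_RAID1_SYNC_ERROR)
        return 'S';
    if (error_bits & DM_RAID1_READ_ERROR)
        return 'R';
    return 'A';
}
```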
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      dm raid1: handle read failures · 06386bbf
Jonathan Brassow committed
      This patch gives the ability to respond-to/record device failures
      that happen during read operations.  It also adds the ability to
      read from mirror devices that are not the primary if they are
      in-sync.
      
      There are essentially two read paths in mirroring; the direct path
      and the queued path.  When a read request is mapped, if the region
      is 'in-sync' the direct path is taken; otherwise the queued path
      is taken.
      
      If the direct path is taken, we must record bio information so that
      if the read fails we can retry it.  We then discover the status of
      a direct read through mirror_end_io.  If the read has failed, we will
      mark the device from which the read was attempted as failed (so we
      don't try to read from it again), restore the bio and try again.
      
      If the queued path is taken, we discover the results of the read
      from 'read_callback'.  If the device failed, we will mark the device
      as failed and attempt the read again if there is another device
      where this region is known to be 'in-sync'.
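The retry step common to both paths can be modeled in user-space C. The structure and function names are hypothetical, written only to illustrate the "mark failed, pick another in-sync mirror" logic:

```c
#include <assert.h>

struct mirror {
    int alive;
    int in_sync;   /* region known to be in-sync on this device */
};

/* Sketch of the retry step: mark the failed device so it is not read
 * from again, then pick another alive, in-sync mirror for the retry.
 * Returns the chosen mirror index, or -1 if the read must fail. */
static int retry_read(struct mirror m[], int nr, int failed)
{
    m[failed].alive = 0;
    for (int i = 0; i < nr; i++)
        if (m[i].alive && m[i].in_sync)
            return i;
    return -1;
}
```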
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      dm raid1: fix EIO after log failure · b80aa7a0
Jonathan Brassow committed
      This patch adds the ability to requeue write I/O to
      core device-mapper when there is a log device failure.
      
If a write to the log produces an error, the pending writes are
      put on the "failures" list.  Since the log is marked as failed,
      they will stay on the failures list until a suspend happens.
      
      Suspends come in two phases, presuspend and postsuspend.  We must
      make sure that all the writes on the failures list are requeued
      in the presuspend phase (a requirement of dm core).  This means
      that recovery must be complete (because writes may be delayed
      behind it) and the failures list must be requeued before we
      return from presuspend.
      
The mechanisms to ensure that recovery is complete (or stopped) were
      already in place, but needed to be moved from postsuspend to
      presuspend.  We rely on 'flush_workqueue' to ensure that the
      mirror thread is complete and therefore, has requeued all writes
      in the failures list.
      
      Because we are using flush_workqueue, we must ensure that no
      additional 'queue_work' calls will produce additional I/O
      that we need to requeue (because once we return from
      presuspend, we are unable to do anything about it).  'queue_work'
      is called in response to the following functions:
      - complete_resync_work = NA, recovery is stopped
      - rh_dec (mirror_end_io) = NA, only calls 'queue_work' if it
                                 is ready to recover the region
                                 (recovery is stopped) or it needs
                                 to clear the region in the log*
                                 **this doesn't get called while
                                 suspending**
      - rh_recovery_end = NA, recovery is stopped
      - rh_recovery_start = NA, recovery is stopped
      - write_callback = 1) Writes w/o failures simply call
                         bio_endio -> mirror_end_io -> rh_dec
                         (see rh_dec above)
                         2) Writes with failures are put on
                         the failures list and queue_work is
                         called**
                         ** write_callbacks don't happen
                         during suspend **
      - do_failures = NA, 'queue_work' not called if suspending
      - add_mirror (initialization) = NA, only done on mirror creation
      - queue_bio = NA, 1) delayed I/O scheduled before flush_workqueue
                    is called.  2) No more I/Os are being issued.
                    3) Re-attempted READs can still be handled.
                    (Write completions are handled through rh_dec/
                    write_callback - mention above - and do not
                    use queue_bio.)
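The requeue hand-off that presuspend must complete can be sketched as a simple list drain. This is a user-space model with hypothetical names, not kernel code:

```c
#include <assert.h>
#include <stddef.h>

struct bio_node {
    int id;
    struct bio_node *next;
};

/* Drain the 'failures' list into a requeue list handed back to dm
 * core; returns how many bios were moved.  The move is LIFO, which
 * is fine for a sketch of the hand-off. */
static int requeue_failures(struct bio_node **failures,
                            struct bio_node **requeue)
{
    int moved = 0;
    while (*failures) {
        struct bio_node *b = *failures;
        *failures = b->next;
        b->next = *requeue;
        *requeue = b;
        moved++;
    }
    return moved;
}
```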
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      dm raid1: handle recovery failures · 8f0205b7
Jonathan Brassow committed
      This patch adds the calls to 'fail_mirror' if an error occurs during
      mirror recovery (aka resynchronization).  'fail_mirror' is responsible
      for recording the type of error by mirror device and ensuring an event
      gets raised for the purpose of notifying userspace.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      dm raid1: handle write failures · 72f4b314
Jonathan Brassow committed
      This patch gives mirror the ability to handle device failures
      during normal write operations.
      
      The 'write_callback' function is called when a write completes.
      If all the writes failed or succeeded, we report failure or
      success respectively.  If some of the writes failed, we call
      fail_mirror; which increments the error count for the device, notes
the type of error encountered (DM_RAID1_WRITE_ERROR), and
      selects a new primary (if necessary).  Note that the primary
      device can never change while the mirror is not in-sync (IOW,
      while recovery is happening.)  This means that the scenario
      where a failed write changes the primary and gives
      recovery_complete a chance to misread the primary never happens.
      The fact that the primary can change has necessitated the change
      to the default_mirror field.  We need to protect against reading
      garbage while the primary changes.  We then add the bio to a new
      list in the mirror set, 'failures'.  For every bio in the 'failures'
      list, we call a new function, '__bio_mark_nosync', where we mark
      the region 'not-in-sync' in the log and properly set the region
state to RH_NOSYNC.  Userspace must also be notified of the
      failure.  This is done by 'raising an event' (dm_table_event()).
      If fail_mirror is called in process context the event can be raised
      right away.  If in interrupt context, the event is deferred to the
      kmirrord thread - which raises the event if 'event_waiting' is set.
      
      Backwards compatibility is maintained by ignoring errors if
      the DM_FEATURES_HANDLE_ERRORS flag is not present.
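The decision in write_callback can be sketched as a classification function. This is a model of the description above; in particular, treating partial failure as success when error handling is off is an assumption of this sketch, not a statement of the exact kernel code path:

```c
#include <assert.h>

enum write_result { WR_SUCCESS, WR_FAILURE, WR_PARTIAL };

/* Sketch of the write_callback decision: all-succeeded or all-failed
 * reports success or failure; a partial failure triggers fail_mirror
 * and the 'failures' list.  Without the handle-errors feature, errors
 * are ignored for backwards compatibility (assumption of this model). */
static enum write_result classify_write(int nr_mirrors, int nr_errors,
                                        int handle_errors)
{
    if (nr_errors == 0)
        return WR_SUCCESS;
    if (!handle_errors)
        return WR_SUCCESS;          /* backwards compatible: ignore errors */
    if (nr_errors == nr_mirrors)
        return WR_FAILURE;
    return WR_PARTIAL;              /* fail_mirror + 'failures' list */
}
```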
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16. 20 Oct 2007, 4 commits
17. 10 Oct 2007, 1 commit
18. 20 Jul 2007, 1 commit
19. 13 Jul 2007, 2 commits
      dm raid1: handle log failure · fc1ff958
Jonathan Brassow committed
      When writing to a mirror, the log must be updated first.  Failure
      to update the log could result in the log not properly reflecting
      the state of the mirror if the machine should crash.
      
      We change the return type of the rh_flush function to give us
      the ability to check if a log write was successful.  If the
      log write was unsuccessful, we fail the writes to avoid the
      case where the log does not properly reflect the state of the
      mirror.
      
A follow-up patch - which is dependent on the ability to
requeue I/Os to core device-mapper - will requeue the I/Os
for retry (allowing the mirror to be reconfigured).
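The log-first write rule can be modeled in a short C sketch. The function and parameter names are hypothetical stand-ins for rh_flush and the write pass:

```c
#include <assert.h>
#include <errno.h>

/* Sketch: the log flush (modeled by log_flush_ok) now reports failure,
 * and the writes are failed rather than issued against a log whose
 * state is unknown. */
static int do_write_pass(int log_flush_ok, int *writes_issued)
{
    if (!log_flush_ok) {
        *writes_issued = 0;   /* fail the writes; log may be stale */
        return -EIO;
    }
    *writes_issued = 1;
    return 0;
}
```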
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dm raid1: handle resync failures · f44db678
Jonathan Brassow committed
      Device-mapper mirroring currently takes a best effort approach to
      recovery - failures during mirror synchronization are completely ignored.
      This means that regions are marked 'in-sync' and 'clean' and removed
      from the hash list.  Future reads and writes that query the region
      will incorrectly interpret the region as in-sync.
      
      This patch handles failures during the recovery process.  If a failure
      occurs, the region is marked as 'not-in-sync' (aka RH_NOSYNC) and added
      to a new list 'failed_recovered_regions'.
      
      Regions on the 'failed_recovered_regions' list are not marked as 'clean'
      upon removal from the list.  Furthermore, if the DM_RAID1_HANDLE_ERRORS
      flag is set, the region is marked as 'not-in-sync'.  This action prevents
      any future read-balancing from choosing an invalid device because of the
      'not-in-sync' status.
      
      If "handle_errors" is not specified when creating a mirror (leaving the
      DM_RAID1_HANDLE_ERRORS flag unset), failures will be ignored exactly as they
      would be without this patch.  This is to preserve backwards compatibility with
      user-space tools, such as 'pvmove'.  However, since future read-balancing
      policies will rely on the correct sync status of a region, a user must choose
      "handle_errors" when using read-balancing.
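The two completion behaviors can be summarized in a small C model. The region states mirror the names in the patch description; the function is a hypothetical sketch, not the kernel's recovery path:

```c
#include <assert.h>

enum region_state { RH_CLEAN, RH_NOSYNC };

/* Sketch of recovery completion under this patch: with
 * DM_RAID1_HANDLE_ERRORS set, a failed region stays not-in-sync so
 * read balancing avoids it; without the flag, the old best-effort
 * behavior (failure ignored, region marked clean) is preserved. */
static enum region_state recovery_outcome(int success, int handle_errors)
{
    if (success)
        return RH_CLEAN;
    return handle_errors ? RH_NOSYNC : RH_CLEAN;
}
```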
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>