1. 22 Oct, 2008 (3 commits)
  2. 10 Oct, 2008 (1 commit)
    • dm raid1: kcopyd should stop on error if errors handled · f7c83e2e
      Committed by Jonathan Brassow
      dm-raid1 is setting the 'DM_KCOPYD_IGNORE_ERROR' flag unconditionally
      when assigning kcopyd work.  kcopyd is responsible for copying an
      assigned section of disk to one or more other disks.  The
      'DM_KCOPYD_IGNORE_ERROR' flag affects kcopyd in the following way:
      
      When not set:
      kcopyd will immediately stop the copy operation when an error is
      encountered.
      
      When set:
      kcopyd will try to proceed regardless of errors and try to continue
      copying any remaining amount.
      
      Since dm-raid1 tracks which regions of the address space are (or
      are not) in sync, and it now has the ability to handle these
      errors, we can safely let kcopyd stop on the first error.  This
      optimization is conditional on mirror error handling being enabled;
      otherwise the flag is still set and errors are ignored as before.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
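The effect of the flag can be illustrated with a small userspace model. This is only a sketch in Python, not kernel code: `kcopyd_flags` and `copy_chunks` are hypothetical helpers, and the bit value given to `DM_KCOPYD_IGNORE_ERROR` is chosen for the model only.

```python
from typing import List, Tuple

DM_KCOPYD_IGNORE_ERROR = 1 << 0  # bit value chosen for this model only


def kcopyd_flags(errors_handled: bool) -> int:
    """Ignore copy errors only when the mirror cannot handle them itself."""
    return 0 if errors_handled else DM_KCOPYD_IGNORE_ERROR


def copy_chunks(chunk_ok: List[bool], flags: int) -> Tuple[int, int]:
    """Model one copy job; return (chunks attempted, chunks that failed)."""
    attempted = failed = 0
    for ok in chunk_ok:
        attempted += 1
        if not ok:
            failed += 1
            if not flags & DM_KCOPYD_IGNORE_ERROR:
                break  # flag clear: stop at the first error
    return attempted, failed
```

With error handling enabled the job stops at the first bad chunk; without it, every chunk is still attempted.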
  3. 25 Apr, 2008 (8 commits)
  4. 29 Mar, 2008 (1 commit)
  5. 20 Feb, 2008 (1 commit)
  6. 14 Feb, 2008 (1 commit)
  7. 08 Feb, 2008 (5 commits)
    • dm raid1: report fault status · af195ac8
      Committed by Jonathan Brassow
      This patch adds extra information to the mirror status output, so that
      it can be determined which device(s) have failed.  For each mirror device,
      a character is printed indicating the most severe error encountered.  The
      characters are:
       *    A => Alive - No failures
       *    D => Dead - A write failure occurred leaving mirror out-of-sync
       *    S => Sync - A synchronization failure occurred, mirror out-of-sync
       *    R => Read - A read failure occurred, mirror data unaffected
      This allows userspace to properly reconfigure the mirror set.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
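A minimal sketch of how one status character per device could be derived, assuming severity follows the order of the list in the message (write, then sync, then read). The `MirrorError` names and both helper functions are hypothetical and exist only for this Python model.

```python
from enum import IntFlag
from typing import Iterable


class MirrorError(IntFlag):
    """Error types recorded per mirror device (names are hypothetical)."""
    READ = 1
    SYNC = 2
    WRITE = 4


def device_status_char(errors: MirrorError) -> str:
    """Return the character for the most severe error seen on one device."""
    if errors & MirrorError.WRITE:
        return "D"  # Dead: a write failure left the mirror out-of-sync
    if errors & MirrorError.SYNC:
        return "S"  # Sync: a synchronization failure occurred
    if errors & MirrorError.READ:
        return "R"  # Read: a read failure, mirror data unaffected
    return "A"      # Alive: no failures


def status_string(devices: Iterable[MirrorError]) -> str:
    """Concatenate one character per mirror device, in device order."""
    return "".join(device_status_char(e) for e in devices)
```

A device that has seen both a read and a write failure reports `D`, since only the most severe error is shown.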
    • dm raid1: handle read failures · 06386bbf
      Committed by Jonathan Brassow
      This patch gives the ability to respond-to/record device failures
      that happen during read operations.  It also adds the ability to
      read from mirror devices that are not the primary if they are
      in-sync.
      
      There are essentially two read paths in mirroring: the direct path
      and the queued path.  When a read request is mapped, if the region
      is 'in-sync' the direct path is taken; otherwise the queued path
      is taken.
      
      If the direct path is taken, we must record bio information so that
      if the read fails we can retry it.  We then discover the status of
      a direct read through mirror_end_io.  If the read has failed, we will
      mark the device from which the read was attempted as failed (so we
      don't try to read from it again), restore the bio and try again.
      
      If the queued path is taken, we discover the results of the read
      from 'read_callback'.  If the device failed, we will mark the device
      as failed and attempt the read again if there is another device
      where this region is known to be 'in-sync'.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
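The retry behaviour described above can be modelled in a few lines. This is a simplified Python sketch, not the kernel implementation: the `Mirror` class and `read_with_retry` are hypothetical, and the two read paths are collapsed into a single loop.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Mirror:
    name: str
    in_sync: bool         # is the region being read in-sync on this device?
    failed: bool = False  # set once a read from this device has failed


def read_with_retry(mirrors: List[Mirror],
                    do_read: Callable[[Mirror], bool]) -> Optional[str]:
    """Try each usable mirror in turn; return the name of the one that served
    the read, or None if no in-sync device could."""
    for mirror in mirrors:
        if mirror.failed or not mirror.in_sync:
            continue  # never read from a failed or out-of-sync device
        if do_read(mirror):
            return mirror.name
        mirror.failed = True  # do not try this device again
    return None
```

A failed read marks only the device it was attempted on, so a later read skips that device and goes straight to the next in-sync one.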
    • dm raid1: fix EIO after log failure · b80aa7a0
      Committed by Jonathan Brassow
      This patch adds the ability to requeue write I/O to
      core device-mapper when there is a log device failure.
      
      If a write to the log produces an error, the pending writes are
      put on the "failures" list.  Since the log is marked as failed,
      they will stay on the failures list until a suspend happens.
      
      Suspends come in two phases, presuspend and postsuspend.  We must
      make sure that all the writes on the failures list are requeued
      in the presuspend phase (a requirement of dm core).  This means
      that recovery must be complete (because writes may be delayed
      behind it) and the failures list must be requeued before we
      return from presuspend.
      
      The mechanisms to ensure recovery is complete (or stopped) were
      already in place, but needed to be moved from postsuspend to
      presuspend.  We rely on 'flush_workqueue' to ensure that the
      mirror thread is complete and therefore, has requeued all writes
      in the failures list.
      
      Because we are using flush_workqueue, we must ensure that no
      additional 'queue_work' calls will produce additional I/O
      that we need to requeue (because once we return from
      presuspend, we are unable to do anything about it).  'queue_work'
      is called in response to the following functions:
      - complete_resync_work = NA, recovery is stopped
      - rh_dec (mirror_end_io) = NA, only calls 'queue_work' if it
                                 is ready to recover the region
                                 (recovery is stopped) or it needs
                                 to clear the region in the log**
                                 ** this doesn't get called while
                                 suspending **
      - rh_recovery_end = NA, recovery is stopped
      - rh_recovery_start = NA, recovery is stopped
      - write_callback = 1) Writes w/o failures simply call
                         bio_endio -> mirror_end_io -> rh_dec
                         (see rh_dec above)
                         2) Writes with failures are put on
                         the failures list and queue_work is
                         called**
                         ** write_callbacks don't happen
                         during suspend **
      - do_failures = NA, 'queue_work' not called if suspending
      - add_mirror (initialization) = NA, only done on mirror creation
      - queue_bio = NA, 1) delayed I/O scheduled before flush_workqueue
                    is called.  2) No more I/Os are being issued.
                    3) Re-attempted READs can still be handled.
                    (Write completions are handled through rh_dec/
                    write_callback - mentioned above - and do not
                    use queue_bio.)
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
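The ordering requirement (everything on the failures list is requeued before presuspend returns) can be shown with a toy model. This Python sketch makes strong simplifying assumptions: `MirrorSet` is hypothetical, the worker thread is replaced by a direct call, and that direct call stands in for `flush_workqueue`.

```python
from typing import List


class MirrorSet:
    """Toy model of a mirror set whose log device has failed."""

    def __init__(self) -> None:
        self.failures: List[str] = []  # writes held back after a log failure
        self.requeued: List[str] = []  # writes handed back to dm core
        self.suspending = False

    def log_write_failed(self, bio: str) -> None:
        """A log write failed: hold the pending write on the failures list."""
        self.failures.append(bio)

    def do_failures(self) -> None:
        """Worker step: writes stay on the list until a suspend is under way."""
        if not self.suspending:
            return
        while self.failures:
            self.requeued.append(self.failures.pop(0))

    def presuspend(self) -> None:
        """Must not return until every held write has been requeued."""
        self.suspending = True
        self.do_failures()  # stand-in for flush_workqueue() on the worker
```

Before the suspend the worker leaves the list untouched; once `presuspend` returns the list is empty and the writes are back with the caller in their original order.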
    • dm raid1: handle recovery failures · 8f0205b7
      Committed by Jonathan Brassow
      This patch adds the calls to 'fail_mirror' if an error occurs during
      mirror recovery (aka resynchronization).  'fail_mirror' is responsible
      for recording the type of error by mirror device and ensuring an event
      gets raised for the purpose of notifying userspace.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm raid1: handle write failures · 72f4b314
      Committed by Jonathan Brassow
      This patch gives mirror the ability to handle device failures
      during normal write operations.
      
      The 'write_callback' function is called when a write completes.
      If all the writes failed or all succeeded, we report failure or
      success respectively.  If only some of the writes failed, we call
      fail_mirror, which increments the error count for the device, notes
      the type of error encountered (DM_RAID1_WRITE_ERROR), and
      selects a new primary (if necessary).

      Note that the primary device can never change while the mirror is
      not in-sync (in other words, while recovery is happening).  This
      means that the scenario where a failed write changes the primary
      and gives recovery_complete a chance to misread the primary never
      happens.  Because the primary can now change, the default_mirror
      field had to change as well: we need to protect against reading
      garbage while the primary changes.

      We then add the bio to a new list in the mirror set, 'failures'.
      For every bio in the 'failures' list, we call a new function,
      '__bio_mark_nosync', which marks the region 'not-in-sync' in the
      log and sets the region state to RH_NOSYNC.

      Userspace must also be notified of the failure.  This is done by
      'raising an event' (dm_table_event()).  If fail_mirror is called
      in process context, the event can be raised right away.  If it is
      called in interrupt context, the event is deferred to the kmirrord
      thread, which raises the event if 'event_waiting' is set.
      
      Backwards compatibility is maintained by ignoring errors if
      the DM_FEATURES_HANDLE_ERRORS flag is not present.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
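The three outcomes of a completed write can be summarised in a small decision function. This Python sketch follows the wording of the message rather than the kernel source: the function signature, the `"deferred"` label, and the returned list of devices are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def write_callback(results: Dict[str, bool],
                   handle_errors: bool) -> Tuple[str, List[str]]:
    """Classify one completed write.

    results maps each mirror device to True if its copy of the write succeeded.
    Returns (outcome, devices passed to fail_mirror in this model).
    """
    failed = [dev for dev, ok in results.items() if not ok]
    if not failed:
        return "success", []
    if len(failed) == len(results):
        return "failure", []
    if not handle_errors:
        # Backwards compatibility: partial failures are ignored.
        return "success", []
    # Partial failure: the failed devices are recorded and the bio is parked
    # on the 'failures' list so its region can be marked not-in-sync.
    return "deferred", failed
```

Only the partial-failure case with error handling enabled produces work for the failures list; the other cases complete the write immediately.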
  8. 20 Oct, 2007 (4 commits)
  9. 10 Oct, 2007 (1 commit)
  10. 20 Jul, 2007 (1 commit)
  11. 13 Jul, 2007 (5 commits)
    • dm raid1: handle log failure · fc1ff958
      Committed by Jonathan Brassow
      When writing to a mirror, the log must be updated first.  Failure
      to update the log could result in the log not properly reflecting
      the state of the mirror if the machine should crash.
      
      We change the return type of the rh_flush function to give us
      the ability to check if a log write was successful.  If the
      log write was unsuccessful, we fail the writes to avoid the
      case where the log does not properly reflect the state of the
      mirror.
      
      A follow-up patch - which is dependent on the ability to
      requeue I/O's to core device-mapper - will requeue the I/O's
      for retry (allowing the mirror to be reconfigured.)
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
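The "log first, then data" rule can be sketched as follows. This is a Python model under stated assumptions: `do_writes` and the `flush_log` callback are hypothetical stand-ins, with `flush_log` returning 0 on success in the way the changed rh_flush return type is described.

```python
import errno
from typing import Callable, Dict, List


def do_writes(bios: List[str], flush_log: Callable[[], int]) -> Dict[str, int]:
    """Flush the log before issuing writes; fail every write if the flush fails.

    Returns a per-bio status: 0 means the write may be issued, a negative
    errno-style value means it was failed without touching the mirror.
    """
    if flush_log() != 0:
        # The log may not reflect the mirror state, so no write goes ahead.
        return {bio: -errno.EIO for bio in bios}
    return {bio: 0 for bio in bios}
```

Failing the writes keeps the on-disk log from claiming a region is clean when a crash could have left it half-written.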
    • dm raid1: handle resync failures · f44db678
      Committed by Jonathan Brassow
      Device-mapper mirroring currently takes a best effort approach to
      recovery - failures during mirror synchronization are completely ignored.
      This means that regions are marked 'in-sync' and 'clean' and removed
      from the hash list.  Future reads and writes that query the region
      will incorrectly interpret the region as in-sync.
      
      This patch handles failures during the recovery process.  If a failure
      occurs, the region is marked as 'not-in-sync' (aka RH_NOSYNC) and added
      to a new list 'failed_recovered_regions'.
      
      Regions on the 'failed_recovered_regions' list are not marked as 'clean'
      upon removal from the list.  Furthermore, if the DM_RAID1_HANDLE_ERRORS
      flag is set, the region is marked as 'not-in-sync'.  This action prevents
      any future read-balancing from choosing an invalid device because of the
      'not-in-sync' status.
      
      If "handle_errors" is not specified when creating a mirror (leaving the
      DM_RAID1_HANDLE_ERRORS flag unset), failures will be ignored exactly as they
      would be without this patch.  This is to preserve backwards compatibility with
      user-space tools, such as 'pvmove'.  However, since future read-balancing
      policies will rely on the correct sync status of a region, a user must choose
      "handle_errors" when using read-balancing.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
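How a region ends up after a recovery attempt, as described above, can be tabulated with a short model. This Python sketch is an interpretation of the message, not the kernel logic: `RegionResult` and `finish_recovery` are hypothetical names.

```python
from typing import NamedTuple


class RegionResult(NamedTuple):
    in_sync_in_log: bool  # what later reads and writes will believe
    marked_clean: bool    # whether the region was marked 'clean' on removal


def finish_recovery(success: bool, handle_errors: bool) -> RegionResult:
    """Model the state a region is left in once recovery has finished."""
    if success:
        return RegionResult(in_sync_in_log=True, marked_clean=True)
    # The region went through 'failed_recovered_regions': never marked clean.
    # Only with handle_errors set is it also recorded as not-in-sync.
    return RegionResult(in_sync_in_log=not handle_errors, marked_clean=False)
```

The case to notice is a failed recovery without "handle_errors": the region is still reported as in-sync, which is why read-balancing requires the flag.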
    • dm raid1: clear region outside spinlock · 943317ef
      Committed by Jonathan Brassow
      A clear_region function is permitted to block (in practice, rare) but gets
      called in rh_update_states() with a spinlock held.
      
      The bits being marked and cleared by the above functions are used
      to update the on-disk log, but are never read directly.  We can
      perform these operations outside the spinlock since the
      bits are only changed within one thread viz.
         - mark_region in rh_inc()
         - clear_region in rh_update_states().
      
      So, we grab the clean_regions list items via list_splice() within the
      spinlock and defer clear_region() until we iterate over the list for
      deletion - similar to how the recovered_regions list is already handled.
      We then move the flush() call down to ensure it encapsulates the changes
      which are done by the later calls to clear_region().
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
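The splice-then-process pattern is not specific to the kernel, so it can be sketched in userspace. In this Python model a `threading.Lock` stands in for the spinlock, swapping the list reference stands in for list_splice(), and the `RegionHash` class with its callbacks is hypothetical.

```python
import threading
from typing import Callable, List


class RegionHash:
    def __init__(self, clear_region: Callable[[int], None],
                 flush: Callable[[], None]) -> None:
        self.lock = threading.Lock()  # stand-in for the region hash spinlock
        self.clean_regions: List[int] = []
        self.clear_region = clear_region
        self.flush = flush

    def update_states(self) -> int:
        # Detach the whole list while holding the lock (the list_splice step).
        with self.lock:
            clean, self.clean_regions = self.clean_regions, []
        # The lock is released here, so clear_region is free to block.
        for region in clean:
            self.clear_region(region)
        # Flush last, so it covers the clears that were just made.
        self.flush()
        return len(clean)
```

Holding the lock only for the swap keeps the critical section constant-time regardless of how many regions are waiting to be cleared.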
    • dm raid1: fix status · c95bc206
      Committed by Milan Broz
      Fix mirror status line broken in dm-log-report-fault-status.patch:
        - space missing between two words
        - placeholder ("0") required for compatibility with a subsequent patch
        - incorrect offset parameter
      
      Cc: stable@kernel.org
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dm: remove duplicate module name from error msgs · 0cd33124
      Committed by Alasdair G Kergon
      Remove explicit module name from messages as the macro now includes it
      automatically.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 10 May, 2007 (6 commits)
  13. 09 Dec, 2006 (2 commits)
  14. 22 Nov, 2006 (1 commit)