  1. 20 Oct, 2007 4 commits
  2. 10 Oct, 2007 1 commit
  3. 20 Jul, 2007 1 commit
  4. 13 Jul, 2007 5 commits
    •
      dm raid1: handle log failure · fc1ff958
      Jonathan Brassow committed
      When writing to a mirror, the log must be updated first.  Failure
      to update the log could result in the log not properly reflecting
      the state of the mirror if the machine should crash.
      
      We change the return type of the rh_flush function to give us
      the ability to check if a log write was successful.  If the
      log write was unsuccessful, we fail the writes to avoid the
      case where the log does not properly reflect the state of the
      mirror.
      
      A follow-up patch - which is dependent on the ability to
      requeue I/O's to core device-mapper - will requeue the I/O's
      for retry (allowing the mirror to be reconfigured.)
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
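The idea in the message above, a flush that reports whether the log write succeeded so the caller can fail the queued writes, can be sketched in user-space C. This is a minimal model under stated assumptions: `sketch_log`, `sketch_rh_flush` and `sketch_do_writes` are hypothetical names for illustration and are not the kernel's types or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dirty log; not the kernel's structure. */
struct sketch_log {
    bool device_failed;   /* simulates a log device that cannot be written */
    int flush_count;      /* number of successful flushes */
};

/* Flush the log and report the outcome: 0 on success, -EIO on failure. */
int sketch_rh_flush(struct sketch_log *log)
{
    assert(log != NULL);
    if (log->device_failed)
        return -EIO;
    log->flush_count++;
    return 0;
}

/*
 * Process nr_writes queued writes. The log is flushed first; if that
 * fails, every write is failed with -EIO instead of being sent to the
 * mirror. Returns the number of writes that were dispatched.
 */
size_t sketch_do_writes(struct sketch_log *log, int *results, size_t nr_writes)
{
    int err = sketch_rh_flush(log);
    size_t dispatched = 0;

    for (size_t i = 0; i < nr_writes; i++) {
        if (err) {
            results[i] = -EIO;   /* log is stale: do not touch the mirror */
            continue;
        }
        results[i] = 0;          /* would be issued to the mirror legs here */
        dispatched++;
    }
    return dispatched;
}
```

The point of the sketch is the ordering: the flush result is checked before any write is dispatched, so the log can never claim a state the mirror does not have.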
    •
      dm raid1: handle resync failures · f44db678
      Jonathan Brassow committed
      Device-mapper mirroring currently takes a best effort approach to
      recovery - failures during mirror synchronization are completely ignored.
      This means that regions are marked 'in-sync' and 'clean' and removed
      from the hash list.  Future reads and writes that query the region
      will incorrectly interpret the region as in-sync.
      
      This patch handles failures during the recovery process.  If a failure
      occurs, the region is marked as 'not-in-sync' (aka RH_NOSYNC) and added
      to a new list 'failed_recovered_regions'.
      
      Regions on the 'failed_recovered_regions' list are not marked as 'clean'
      upon removal from the list.  Furthermore, if the DM_RAID1_HANDLE_ERRORS
      flag is set, the region is marked as 'not-in-sync'.  This action prevents
      any future read-balancing from choosing an invalid device because of the
      'not-in-sync' status.
      
      If "handle_errors" is not specified when creating a mirror (leaving the
      DM_RAID1_HANDLE_ERRORS flag unset), failures will be ignored exactly as they
      would be without this patch.  This is to preserve backwards compatibility with
      user-space tools, such as 'pvmove'.  However, since future read-balancing
      policies will rely on the correct sync status of a region, a user must choose
      "handle_errors" when using read-balancing.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
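A simplified model of the decision described in this message may help. It tracks only the two log bits the text mentions ("clean" and "in-sync") and assumes, as the message states, that a failed recovery is never marked clean and is marked not-in-sync only when error handling was requested. All names (`recov_region`, `recov_end`) are hypothetical; the real code works on region lists rather than a single structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of what the log records for one region. */
struct recov_region {
    bool log_clean;     /* region recorded as clean */
    bool log_in_sync;   /* region recorded as in-sync */
};

/*
 * Finish recovery of one region.
 *  - success:                    mark clean and in-sync
 *  - failure, handle_errors set: leave not clean, mark not-in-sync
 *  - failure, flag unset:        leave not clean, but keep reporting
 *                                in-sync (failure ignored, as before)
 */
void recov_end(struct recov_region *reg, bool success, bool handle_errors)
{
    assert(reg != NULL);
    if (success) {
        reg->log_clean = true;
        reg->log_in_sync = true;
        return;
    }
    /* Failed regions are never marked clean. */
    reg->log_clean = false;
    reg->log_in_sync = !handle_errors;
}
```

With the flag set, a read-balancing policy that consults the in-sync bit would skip the failed region; without it, the sync status stays as optimistic as it was before the patch.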
    •
      dm raid1: clear region outside spinlock · 943317ef
      Jonathan Brassow committed
      A clear_region function is permitted to block (in practice, rare) but gets
      called in rh_update_states() with a spinlock held.
      
      The bits being marked and cleared by the above functions are used
      to update the on-disk log, but are never read directly.  We can
      perform these operations outside the spinlock since the
      bits are only changed within one thread viz.
         - mark_region in rh_inc()
         - clear_region in rh_update_states().
      
      So, we grab the clean_regions list items via list_splice() within the
      spinlock and defer clear_region() until we iterate over the list for
      deletion - similar to how the recovered_regions list is already handled.
      We then move the flush() call down to ensure it encapsulates the changes
      which are done by the later calls to clear_region().
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
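The pattern described above (detach the whole list while holding the lock, then do the possibly blocking work after dropping it) can be sketched as follows. This is a user-space illustration, not the kernel code: the spinlock is modelled by a plain `lock_held` flag so the invariant can be asserted, the list is a simple singly linked list instead of `list_head`, and all `splice_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct splice_node {
    int region;
    struct splice_node *next;
};

struct splice_rh {
    bool lock_held;                    /* stand-in for the spinlock */
    struct splice_node *clean_regions; /* shared list, protected by the lock */
    int cleared;                       /* regions cleared in the log */
    int flushes;                       /* log flushes issued */
};

/* May block in the real code, so it must never run with the lock held. */
void splice_clear_region(struct splice_rh *rh, int region)
{
    assert(!rh->lock_held);
    (void)region;
    rh->cleared++;
}

void splice_update_states(struct splice_rh *rh)
{
    struct splice_node *clean;

    /* Under the lock: only detach the list onto a local head. */
    rh->lock_held = true;
    clean = rh->clean_regions;
    rh->clean_regions = NULL;
    rh->lock_held = false;

    /* Outside the lock: the work that is allowed to block. */
    for (; clean != NULL; clean = clean->next)
        splice_clear_region(rh, clean->region);

    /* Flush last so it covers the clears done above. */
    rh->flushes++;
}
```

The assertion inside `splice_clear_region` encodes the rule the commit enforces; moving the clear back under the lock would trip it.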
    •
      dm raid1: fix status · c95bc206
      Milan Broz committed
      Fix mirror status line broken in dm-log-report-fault-status.patch:
        - space missing between two words
        - placeholder ("0") required for compatibility with a subsequent patch
        - incorrect offset parameter
      
      Cc: stable@kernel.org
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      dm: remove duplicate module name from error msgs · 0cd33124
      Alasdair G Kergon committed
      Remove explicit module name from messages as the macro now includes it
      automatically.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 10 May, 2007 6 commits
  6. 09 Dec, 2006 2 commits
  7. 22 Nov, 2006 1 commit
  8. 09 Nov, 2006 1 commit
  9. 03 Oct, 2006 1 commit
  10. 28 Aug, 2006 1 commit
    •
      [PATCH] dm: Fix deadlock under high i/o load in raid1 setup. · c06aad85
      Daniel Kobras committed
      On an nForce4-equipped machine with two SATA disks in a raid1 setup using dmraid,
      we experienced frequent deadlock of the system under high i/o load.  'cat
      /dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
      after a few GB, 'cp' would be left in 'D' state along with kjournald and
      kmirrord.  The functions cp and kjournald were blocked in did vary, but
      kmirrord's wchan always pointed to 'mempool_alloc()'.  We've seen this pattern
      on 2.6.15 and 2.6.17 kernels.  http://lkml.org/lkml/2005/4/20/142 indicates
      that this problem has been around even before.
      
      So much for the facts, here's my interpretation: mempool_alloc() first tries
      to atomically allocate the requested memory, or falls back to hand out
      preallocated chunks from the mempool.  If both fail, it puts the calling
      process (kmirrord in this case) on a private waitqueue until somebody refills
      the pool.  But the only 'somebody' is kmirrord itself, so we have a
      deadlock.
      
      I worked around this problem by falling back to a (blocking) kmalloc in the
      case where kmirrord would previously have ended up on the waitqueue.  This defeats part of
      the benefits of using the mempool, but at least keeps the system running.  And
      it could be done with a two-line change.  Note that mempool_alloc() clears the
      GFP_NOIO flag internally, and only uses it to decide whether to wait or return
      an error if immediate allocation fails, so the attached patch doesn't change
      behaviour in the non-deadlocking case.  Patch is against current git
      (2.6.18-rc4), but should apply to earlier versions as well.  I've tested on
      2.6.15, where this patch makes the difference between random lockup and a
      stable system.
      Signed-off-by: Daniel Kobras <kobras@linux.de>
      Acked-by: Alasdair G Kergon <agk@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
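The workaround described above (try the pool without waiting, and fall back to an ordinary blocking allocation instead of sleeping on the pool's waitqueue) can be modelled in user space. This is only a sketch under assumptions: `fb_pool` is a toy pool whose reserve is a counter, `malloc` stands in for both the atomic attempt and the blocking `kmalloc`, and none of the `fb_*` names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy pool: a count of preallocated elements plus a failure switch. */
struct fb_pool {
    int reserve_left;        /* preallocated elements still available */
    bool fast_alloc_fails;   /* simulates memory pressure on the fast path */
};

/* Non-waiting pool allocation: returns NULL instead of sleeping. */
void *fb_pool_alloc_nowait(struct fb_pool *pool, size_t size)
{
    assert(pool != NULL);
    if (!pool->fast_alloc_fails)
        return malloc(size);          /* immediate allocation worked */
    if (pool->reserve_left > 0) {
        pool->reserve_left--;         /* hand out a reserved element */
        return malloc(size);          /* (modelled with malloc here) */
    }
    return NULL;                      /* real mempool would wait here */
}

/*
 * Allocate a region. If the pool is exhausted, fall back to a plain
 * blocking allocation rather than waiting for a refill that only the
 * caller itself could provide.
 */
void *fb_region_alloc(struct fb_pool *pool, size_t size, bool *used_fallback)
{
    void *reg = fb_pool_alloc_nowait(pool, size);

    *used_fallback = false;
    if (reg == NULL) {
        reg = malloc(size);           /* stand-in for a blocking kmalloc */
        *used_fallback = true;
    }
    return reg;
}
```

As the message notes, this gives up part of the pool's guarantee in exchange for forward progress: the fallback can still fail or stall under memory pressure, but it no longer waits on a refill that cannot happen.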
  11. 27 Jun, 2006 5 commits
  12. 28 Mar, 2006 2 commits
    •
      [PATCH] dm: remove SECTOR_FORMAT · 4ee218cd
      Andrew Morton committed
      We don't know what type sector_t has.  Sometimes it's unsigned long, sometimes
      it's unsigned long long.  For example on ppc64 it's unsigned long with
      CONFIG_LBD=n and on x86_64 it's unsigned long long with CONFIG_LBD=n.
      
      The way to handle all of this is to always use unsigned long long and to
      always typecast the sector_t when printing it.
      Acked-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
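The rule stated above, always print with the `unsigned long long` format and cast the value at the call site, looks like this in plain C. The typedef `demo_sector_t` is a hypothetical stand-in for a type whose width varies by configuration; user-space `snprintf` is used here in place of the kernel's print functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a sector type whose underlying type may be
 * unsigned long on one configuration and unsigned long long on another.
 */
typedef unsigned long demo_sector_t;

/*
 * Format a sector number. The cast makes the argument match "%llu"
 * regardless of how demo_sector_t is defined.
 */
int demo_format_sector(char *buf, size_t len, demo_sector_t sector)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%llu", (unsigned long long)sector);
}
```

Because the cast widens to the largest unsigned integer type involved, the same format string stays correct if the typedef later changes.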
    •
      [PATCH] drivers/md/dm-raid1.c: Fix inconsistent mirroring after interrupted recovery · 930d332a
      Jun'ichi Nomura committed
      dm-mirror has a potential data corruption problem: while the on-disk log shows
      that all disk contents are in-sync, actual contents of the disks are not
      synchronized.  This problem occurs if initial recovery (synching) is
      interrupted and resumed.
      
      Attached patch fixes this problem.
      
      Background:
      
      rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN
      (in-sync), which results in the corresponding bit of clean_bits being set.
      
      This is harmful if on-disk log is used and the map is removed/suspended
      before the initial sync is completed.  The clean_bits is written down to
      the on-disk log at the map removal, and, upon resume, it's read and copied
      to sync_bits.  Since the recovery process refers to the sync_bits to find a
      region to be recovered, the region whose state was changed from RH_NOSYNC
      to RH_CLEAN is no longer recovered.
      
      If you haven't applied dm-raid1-read-balancing.patch proposed in dm-devel
      some time ago, the contents of the mirrored disk are silently corrupted.  If
      you have, balanced read may get bogus data from out-of-sync disks.
      
      The patch keeps RH_NOSYNC state unchanged.  It will be changed to
      RH_RECOVERING when recovery starts and gets reclaimed when the recovery
      completes.  So it doesn't leak the region hash entry.
      
      Description:
      
      Keep RH_NOSYNC state unchanged when I/O on the region completes.
      
      The RH_NOSYNC region will be changed to RH_RECOVERING when recovery starts
      on the region and gets reclaimed when the recovery completes.  So it doesn't
      leak the region hash entry.
      
      Alasdair said:
      
        I've analysed the relevant part of the state machine and I believe that
        the patch is correct.
      
        (Further work on this code is still needed - this patch has the
        side-effect of holding onto memory unnecessarily for long periods of time
        under certain workloads - but better that than corrupting data.)
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Acked-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
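The behaviour change in this message ("keep RH_NOSYNC state unchanged when I/O on the region completes") can be illustrated with a small state model. It is a sketch only: the enum and the `nosync_*` names are hypothetical, and the assumption that only a dirty region transitions to clean when its last pending I/O finishes is one reading of the description, not a copy of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum nosync_state { NS_CLEAN, NS_DIRTY, NS_NOSYNC, NS_RECOVERING };

struct nosync_region {
    enum nosync_state state;
    int pending;       /* in-flight I/Os on this region */
    bool clean_bit;    /* bit that would later be written to the log */
};

/* Called when one I/O on the region completes. */
void nosync_rh_dec(struct nosync_region *reg)
{
    assert(reg != NULL && reg->pending > 0);
    if (--reg->pending > 0)
        return;                     /* other I/O still in flight */

    if (reg->state == NS_DIRTY) {
        reg->state = NS_CLEAN;      /* normal write completion */
        reg->clean_bit = true;
    }
    /*
     * NS_NOSYNC (and NS_RECOVERING) are left untouched, so an
     * out-of-sync region never gets its clean bit set merely because
     * a write to it finished.
     */
}
```

In this model an out-of-sync region keeps its status until recovery handles it, which is what stops the stale clean bit from reaching the on-disk log.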
  13. 27 Mar, 2006 1 commit
  14. 07 Jan, 2006 1 commit
  15. 23 Nov, 2005 1 commit
  16. 09 Oct, 2005 1 commit
  17. 10 Sep, 2005 1 commit
    •
      [PATCH] dm: fix rh_dec()/rh_inc() race in dm-raid1.c · 844e8d90
      Jun'ichi Nomura committed
      Fix another bug in dm-raid1.c in which a dirty region may stay on, or be
      moved to, the clean list and be freed while in use.
      
      It happens as follows:
      
         CPU0                                   CPU1
         ------------------------------------------------------------------------------
         rh_dec()
           if (atomic_dec_and_test(pending))
              <the region is still marked dirty>
                                                rh_inc()
                                                  if the region is clean
                                                     mark the region dirty
                                                     and remove from clean list
              mark the region clean
              and move to clean list
                                                        atomic_inc(pending)
      
      At this stage, the region is on the clean list and will be mistakenly reclaimed
      by rh_update_states() later.
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
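The interleaving in the table above can be replayed deterministically on a single thread to see the bad end state. This sketch illustrates the bug only, not the fix: the `race_*` names and the three-field region are hypothetical simplifications, and each statement corresponds to one row of the CPU0/CPU1 table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct race_region {
    int pending;          /* in-flight I/O count */
    bool dirty;           /* region state: dirty or clean */
    bool on_clean_list;   /* candidate for reclaim */
};

/* Replay the racy ordering from the table, one step at a time. */
void race_replay(struct race_region *reg)
{
    assert(reg != NULL);

    /* CPU0, rh_dec(): last reference dropped; region is still dirty. */
    bool cpu0_saw_zero = (--reg->pending == 0);

    /* CPU1, rh_inc(): only acts if the region is clean, so with a
     * still-dirty region this step changes nothing. */
    if (!reg->dirty) {
        reg->dirty = true;
        reg->on_clean_list = false;
    }

    /* CPU0, rh_dec() continues: mark clean and move to the clean list. */
    if (cpu0_saw_zero) {
        reg->dirty = false;
        reg->on_clean_list = true;
    }

    /* CPU1, rh_inc() continues: the new I/O is now counted. */
    reg->pending++;
}
```

Starting from a dirty region with one pending I/O, the replay ends with the region on the clean list while `pending` is 1, which is exactly the state that lets it be reclaimed while still in use.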
  18. 05 Aug, 2005 1 commit
  19. 08 Jul, 2005 1 commit
  20. 17 Apr, 2005 1 commit
    •
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds committed
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!