1. 24 October 2015, 3 commits
    • md/raid10: fix the 'new' raid10 layout to work correctly. · 8bce6d35
      Committed by NeilBrown
      In Linux 3.9 we introduced a new 'far' layout for RAID10 which was
      supposed to rotate the replicas differently and so provide better
      resilience.  In particular it could survive more combinations of 2
      drive failures.
      
      Unfortunately, due to a coding error, this sometimes did what was
      wanted, sometimes improved less than we hoped, and sometimes - in
      very unlikely circumstances - put multiple replicas on the same
      device, so the redundancy was harmed.
      
      No public user-space tool has created arrays using this layout so it
      is very unlikely that zero-redundancy arrays actually exist.  Probably
      no arrays using any form of the new layout exist.  But we cannot be
      certain.
      
      So use another bit in the 'layout' number and introduce a bug-fixed
      version of the layout.  Also, when assembling an array that has a
      zero-redundancy layout, give a warning.  (A sketch of the layout-bit
      decoding follows this entry.)
      Reported-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
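
      For illustration, here is a minimal self-contained C model of how an
      extra bit in the RAID10 'layout' word can select a bug-fixed 'far'
      variant.  The near/far/offset bit positions follow the documented md
      layout word; the variant encoding in bits 17+ and all names are
      assumptions made for this sketch, not the kernel's exact code.

        #include <stdio.h>

        struct geo {
            int near_copies;
            int far_copies;
            int far_offset;
            int far_set_size;   /* devices spanned by one 'far' replica set */
        };

        /* Decode a layout word: bits 0-7 near copies, bits 8-15 far
         * copies, bit 16 'far offset'; bits 17+ pick the variant here. */
        static int setup_geo(struct geo *g, int layout, int raid_disks)
        {
            g->near_copies = layout & 255;
            g->far_copies  = (layout >> 8) & 255;
            g->far_offset  = layout & (1 << 16);
            if (g->far_copies == 0)
                return -1;

            switch (layout >> 17) {
            case 0:   /* original layout: replica sets span all devices */
                g->far_set_size = raid_disks;
                break;
            case 1:   /* the buggy "improved" layout: still assembles,
                       * but warn that its redundancy may be zero */
                fprintf(stderr, "warning: zero-redundancy raid10 layout\n");
                g->far_set_size = raid_disks / g->far_copies;
                break;
            case 2:   /* bug-fixed layout, selected by the newly used bit */
                g->far_set_size = g->far_copies * g->near_copies;
                break;
            default:
                return -1;   /* not a valid layout */
            }
            return 0;
        }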
    • md/raid10: don't clear bitmap bit when bad-block-list write fails. · c340702c
      Committed by NeilBrown
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK to clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either the data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe, i.e. move the
      call to close_write() until just before calling bio_endio(),
      and recheck the 'is array degraded' status before making that
      call.  (The reordering is sketched after this entry.)
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 95af587e ("md/raid10: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R10BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-by: Nate Dailey <nate.dailey@stratus.com>
      Fixes: bd870a16 ("md/raid10:  Handle write errors by updating badblock log.")
      Signed-off-by: NeilBrown <neilb@suse.com>
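
      The essence of the reordering, as a small self-contained C model
      (helper names such as submit_bbl_update() are invented for this
      sketch; close_write() stands in for the step that clears the bitmap
      bit).  The raid1 fix in the next entry makes the same change.

        #include <stdio.h>

        struct write_ctx { int bbl_write_pending; };

        static void close_write(struct write_ctx *w)       { (void)w; puts("clear bitmap bit"); }
        static void submit_bbl_update(struct write_ctx *w) { (void)w; puts("write bad-block list"); }
        static void bio_done(struct write_ctx *w)          { (void)w; puts("bio_endio"); }

        /* Before the fix: the bitmap bit was cleared as soon as the data
         * writes finished, even though the bad-block-list write could
         * still fail afterwards, leaving a failed block marked in-sync. */
        static void end_write_buggy(struct write_ctx *w)
        {
            close_write(w);              /* clears the bitmap bit too early */
            if (w->bbl_write_pending)
                submit_bbl_update(w);    /* may fail; too late to undo */
            else
                bio_done(w);
        }

        /* After the fix: defer close_write() until the write is known to
         * be safe, i.e. just before the final completion, once the
         * bad-block-list write succeeded (or the device went faulty,
         * which is also "safe"). */
        static void end_write_fixed(struct write_ctx *w)
        {
            if (w->bbl_write_pending) {
                submit_bbl_update(w);    /* its completion re-enters here */
                w->bbl_write_pending = 0;
                return;
            }
            close_write(w);              /* the write really is safe now */
            bio_done(w);
        }

        int main(void)
        {
            struct write_ctx w = { .bbl_write_pending = 1 };
            end_write_buggy(&w);    /* demonstrates the wrong ordering */
            w.bbl_write_pending = 1;
            end_write_fixed(&w);    /* bad-block list written first...   */
            end_write_fixed(&w);    /* ...then bitmap clear + completion */
            return 0;
        }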
    • md/raid1: don't clear bitmap bit when bad-block-list write fails. · bd8688a1
      Committed by NeilBrown
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK to clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either the data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe, i.e. move the
      call to close_write() until just before calling bio_endio(),
      and recheck the 'is array degraded' status before making that
      call (the same reordering sketched after the raid10 entry above).
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R1BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-and-tested-by: Nate Dailey <nate.dailey@stratus.com>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Fixes: cd5ff9a1 ("md/raid1:  Handle write errors by updating badblock log.")
      Signed-off-by: NeilBrown <neilb@suse.com>
  2. 21 October 2015, 2 commits
  3. 14 October 2015, 2 commits
  4. 10 October 2015, 1 commit
  5. 09 October 2015, 2 commits
    • dm cache: fix NULL pointer when switching from cleaner policy · 2bffa150
      Committed by Joe Thornber
      The cleaner policy doesn't make use of the per-cache-block hint
      space in the metadata (unlike the other policies).  When switching
      from the cleaner policy to mq or smq, a NULL pointer crash (in
      dm_tm_new_block) was observed.  The crash was caused by bugs in
      dm-cache-metadata.c when trying to skip creation of the hint btree.
      
      The minimal fix is to change the hint size for the cleaner policy
      to 4 bytes (the only hint size supported), as sketched below.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
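
      The gist of the one-line change, paraphrased with an illustrative
      struct (field names here are stand-ins, not dm's exact types):

        /* Conceptual sketch: a cache policy advertises how many bytes of
         * per-cache-block hint metadata it stores. */
        struct policy_type {
            const char *name;
            unsigned    hint_size;   /* bytes of hint data per cache block */
        };

        /* The fix: have the cleaner policy advertise 4 bytes (the only
         * hint size the metadata code supports) so dm-cache-metadata.c
         * never takes the buggy "skip the hint btree" path that crashed
         * in dm_tm_new_block. */
        static struct policy_type cleaner_policy = {
            .name      = "cleaner",
            .hint_size = 4,          /* previously 0 */
        };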
    • crash in md-raid1 and md-raid10 due to incorrect list manipulation · a452744b
      Committed by Mikulas Patocka
      Commit 55ce74d4 ("md/raid1: ensure device failure recorded
      before write request returns") causes a crash in the LVM2
      testsuite test shell/lvchange-raid.sh.  For me the crash is 100%
      reproducible.
      
      The reason for the crash is that the newly added code in raid1d
      moves the list from conf->bio_end_io_list to tmp, then tests whether
      tmp is non-empty, and then incorrectly pops the bio from
      conf->bio_end_io_list (which is empty because the list was already
      moved).  A paraphrase of the bug appears after this entry.
      
      Raid-10 has a similar bug.
      
      Kernel Fault: Code=15 regs=000000006ccb8640 (Addr=0000000100000000)
      CPU: 3 PID: 1930 Comm: mdX_raid1 Not tainted 4.2.0-rc5-bisect+ #35
      task: 000000006cc1f258 ti: 000000006ccb8000 task.ti: 000000006ccb8000
      
           YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
      PSW: 00001000000001001111111000001111 Not tainted
      r00-03  000000ff0804fe0f 000000001059d000 000000001059f818 000000007f16be38
      r04-07  000000001059d000 000000007f16be08 0000000000200200 0000000000000001
      r08-11  000000006ccb8260 000000007b7934d0 0000000000000001 0000000000000000
      r12-15  000000004056f320 0000000000000000 0000000000013dd0 0000000000000000
      r16-19  00000000f0d00ae0 0000000000000000 0000000000000000 0000000000000001
      r20-23  000000000800000f 0000000042200390 0000000000000000 0000000000000000
      r24-27  0000000000000001 000000000800000f 000000007f16be08 000000001059d000
      r28-31  0000000100000000 000000006ccb8560 000000006ccb8640 0000000000000000
      sr00-03  0000000000249800 0000000000000000 0000000000000000 0000000000249800
      sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      
      IASQ: 0000000000000000 0000000000000000 IAOQ: 000000001059f61c 000000001059f620
       IIR: 0f8010c6    ISR: 0000000000000000  IOR: 0000000100000000
       CPU:        3   CR30: 000000006ccb8000 CR31: 0000000000000000
       ORIG_R28: 000000001059d000
       IAOQ[0]: call_bio_endio+0x34/0x1a8 [raid1]
       IAOQ[1]: call_bio_endio+0x38/0x1a8 [raid1]
       RP(r2): raid_end_bio_io+0x88/0x168 [raid1]
      Backtrace:
       [<000000001059f818>] raid_end_bio_io+0x88/0x168 [raid1]
       [<00000000105a4f64>] raid1d+0x144/0x1640 [raid1]
       [<000000004017fd5c>] kthread+0x144/0x160
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Fixes: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
      Fixes: 95af587e ("md/raid10: ensure device failure recorded before write request returns.")
      Signed-off-by: NeilBrown <neilb@suse.com>
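
      A paraphrase of the buggy loop (simplified, with a stub struct; not
      the verbatim raid1d code) using the kernel's list helpers:

        #include <linux/list.h>

        struct r1bio_stub { struct list_head retry_list; };

        static void flush_bio_end_io_list(struct list_head *bio_end_io_list)
        {
            struct r1bio_stub *r1_bio;
            LIST_HEAD(tmp);

            /* Move every entry onto the local 'tmp' list. */
            list_splice_init(bio_end_io_list, &tmp);

            while (!list_empty(&tmp)) {
                /* The buggy version popped from the now-EMPTY source:
                 *   r1_bio = list_first_entry(bio_end_io_list, ...);
                 * dereferencing the bare list head as if it were a bio. */
                r1_bio = list_first_entry(&tmp, struct r1bio_stub,
                                          retry_list);  /* fixed: use tmp */
                list_del(&r1_bio->retry_list);
                /* ... raid_end_bio_io(r1_bio) ... */
            }
        }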
  6. 06 October 2015, 1 commit
  7. 03 October 2015, 1 commit
    • dm raid: fix round up of default region size · 042745ee
      Committed by Mikulas Patocka
      Commit 3a0f9aae ("dm raid: round region_size to power of two")
      intended to make sure that the default region size is a power of two.
      However, the logic in that commit is incorrect and sets the variable
      region_size to 0 or 1, depending on whether min_region_size is a power
      of two.
      
      Fix this logic, using roundup_pow_of_two(), so that region_size is
      properly rounded up to the next power of two (illustrated below).
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Fixes: 3a0f9aae ("dm raid: round region_size to power of two")
      Cc: stable@vger.kernel.org # v3.8+
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
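
      To illustrate, a standalone userspace model of the corrected
      rounding (the kernel's roundup_pow_of_two() is reimplemented here
      for the sketch; the 3000-sector input is an arbitrary example):

        #include <stdio.h>
        #include <stdint.h>

        /* Userspace stand-in for the kernel's roundup_pow_of_two(). */
        static uint64_t roundup_pow2(uint64_t x)
        {
            uint64_t p = 1;
            while (p < x)
                p <<= 1;
            return p;
        }

        int main(void)
        {
            uint64_t min_region_size = 3000;   /* example, in sectors */

            /* The buggy logic (paraphrased) shifted the value down to 0
             * or 1 instead of producing a usable size; the fix rounds up
             * to the next power of two: */
            uint64_t region_size = roundup_pow2(min_region_size);

            printf("%llu -> %llu\n",
                   (unsigned long long)min_region_size,
                   (unsigned long long)region_size);   /* 3000 -> 4096 */
            return 0;
        }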
  8. 02 October 2015, 8 commits
  9. 01 October 2015, 1 commit
    • dm: fix AB-BA deadlock in __dm_destroy() · 2a708cff
      Committed by Junichi Nomura
      __dm_destroy() takes the io_barrier SRCU lock (dm_get_live_table)
      and suspend_lock in reverse order.  Doing so can cause an AB-BA
      deadlock:
      
        __dm_destroy                    dm_swap_table
        ---------------------------------------------------
                                        mutex_lock(suspend_lock)
        dm_get_live_table()
          srcu_read_lock(io_barrier)
                                        dm_sync_table()
                                          synchronize_srcu(io_barrier)
                                            .. waiting for dm_put_live_table()
        mutex_lock(suspend_lock)
          .. waiting for suspend_lock
      
      Fix this by taking the locks in the proper order (modeled in the
      sketch after this entry).
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Fixes: ab7c7bb6 ("dm: hold suspend_lock while suspending device during device deletion")
      Acked-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
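
      A generic model of the fix using pthreads (the SRCU read lock is
      approximated by a plain mutex, and the names are illustrative, not
      dm's code): once every path acquires suspend_lock before the table
      lock, the AB-BA cycle above cannot form.

        #include <pthread.h>

        static pthread_mutex_t suspend_lock = PTHREAD_MUTEX_INITIALIZER;
        static pthread_mutex_t table_lock   = PTHREAD_MUTEX_INITIALIZER; /* ~io_barrier */

        /* dm_swap_table()-like path: suspend_lock first, then the table. */
        static void swap_table_path(void)
        {
            pthread_mutex_lock(&suspend_lock);
            pthread_mutex_lock(&table_lock);
            /* ... install the new table ... */
            pthread_mutex_unlock(&table_lock);
            pthread_mutex_unlock(&suspend_lock);
        }

        /* __dm_destroy()-like path, fixed: the same order as above,
         * instead of taking the table first and suspend_lock second. */
        static void destroy_path(void)
        {
            pthread_mutex_lock(&suspend_lock);
            pthread_mutex_lock(&table_lock);
            /* ... tear down the live table ... */
            pthread_mutex_unlock(&table_lock);
            pthread_mutex_unlock(&suspend_lock);
        }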
  10. 15 September 2015, 1 commit
    • dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE · 586b286b
      Committed by Mike Snitzer
      Setting the dm-crypt device's max_segment_size to PAGE_SIZE is an
      unfortunate constraint that is required to avoid the potential for
      exceeding dm-crypt's underlying device's max_segments limits -- due to
      crypt_alloc_buffer() possibly allocating pages for the encryption bio
      that are not as physically contiguous as the original bio.
      
      It is interesting to note that this problem was already fixed back
      in 2007 via commit 91e10625 ("dm crypt: use bio_add_page").  But
      Linux 4.0 commit cf2f1abf ("dm crypt: don't allocate pages for a
      partial request") regressed dm-crypt back to _not_ using
      bio_add_page().  And given that dm-crypt's CPU parallelization
      changes all depend on commit cf2f1abf's abandoning of the more
      complex IO fragment processing that dm-crypt previously had, we
      cannot easily go back to using bio_add_page().
      
      So, all said, the cleanest way to resolve this issue is to fix
      dm-crypt to properly constrain the original bios entering dm-crypt,
      so that the encryption bios dm-crypt generates from them are always
      compatible with the underlying device's max_segments queue limits.
      (The shape of the fix is sketched after this entry.)
      
      It should be noted that technically Linux 4.3 does _not_ need this fix
      because of the block core's new late bio-splitting capability.  But, it
      is reasoned, there is little to be gained by having the block core split
      the encrypted bio that is composed of PAGE_SIZE segments.  That said, in
      the future we may revert this change.
      
      Fixes: cf2f1abf ("dm crypt: don't allocate pages for a partial request")
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=104421
      Suggested-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.0+
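
      The shape of the fix, paraphrased (dm targets can adjust queue
      limits through an io_hints hook; this sketch conveys the idea and
      is not a verbatim copy of dm-crypt.c):

        /* Cap segment size so any bio entering dm-crypt has segments no
         * larger than one page, matching what crypt_alloc_buffer() can
         * guarantee for the pages of the encryption bio it builds. */
        static void crypt_io_hints(struct dm_target *ti,
                                   struct queue_limits *limits)
        {
            limits->max_segment_size = PAGE_SIZE;
        }

        /* ... wired up in the target_type: .io_hints = crypt_io_hints */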
  11. 14 September 2015, 1 commit
  12. 12 September 2015, 1 commit
  13. 01 September 2015, 16 commits