1. 07 March 2018 (1 commit)
    • dm raid: fix incorrect sync_ratio when degraded · da1e1488
      Committed by Jonathan Brassow
      Upstream commit 4102d9de ("dm raid: fix rs_get_progress()
      synchronization state/ratio") in combination with commit 7c29744e
      ("dm raid: simplify rs_get_progress()") introduced a regression by
      incorrectly reporting a sync_ratio of 0 for degraded raid sets.  This
      caused lvm2 to fail to repair raid legs automatically.
      
      Fix by identifying the degraded state by checking the MD_RECOVERY_INTR
      flag and returning mddev->recovery_cp in case it is set.
      
      MD sets recovery = [ MD_RECOVERY_RECOVER MD_RECOVERY_INTR
      MD_RECOVERY_NEEDED ] when a RAID member fails.  It then shuts down any
      sync thread that is running and leaves us with all MD_RECOVERY_* flags
      cleared.  The bug occurs if a status is requested in the short time it
      takes to shut down any sync thread and clear the flags, because we were
      keying in on the MD_RECOVERY_NEEDED flag, understanding it to be the
      initial phase of a "recover" sync thread.  However, this
      interpretation is incorrect if MD_RECOVERY_INTR is also set.
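
      A minimal sketch of the corrected keying logic, assuming a small
      helper around the fields named above (the helper name and the
      fallthrough value are illustrative, not the verbatim upstream hunk):

        /* Hedged sketch of how the reported sync position could be chosen. */
        static sector_t sync_position_sketch(struct mddev *mddev)
        {
                if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
                        /* degraded: the sync thread was interrupted */
                        return mddev->recovery_cp;

                if (test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
                        /* genuine start of a "recover" sync thread */
                        return 0;

                return mddev->curr_resync_completed;
        }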
      
      This also explains why the bug only happened when automatic repair was
      enabled and not with a normal 'manual' repair method.  It is
      impossible to react quickly enough to hit the problematic window
      without it being automated.
      
      Fix passes automatic repair tests.
      
      Fixes: 7c29744e ("dm raid: simplify rs_get_progress()")
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  2. 17 January 2018 (1 commit)
  3. 14 December 2017 (5 commits)
  4. 12 December 2017 (1 commit)
    • md: introduce new personality function start() · d5d885fd
      Committed by Song Liu
      In do_md_run(), md threads should not wake up until the array is fully
      initialized in md_run(). However, in raid5_run(), raid5-cache may wake
      up mddev->thread to flush stripes that need to be written back. This
      design doesn't break anything badly right now, but it could lead to
      bad bugs in the future.
      
      This patch tries to resolve this problem by splitting the start-up
      work into two personality functions, run() and start().  Tasks that
      do not require the md threads should go into run(), while tasks that
      require the md threads go into start().
      
      r5l_load_log() is moved to raid5_start(), so it is not called until
      the md threads are started in do_md_run().
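
      A rough sketch of the resulting split, assuming the new hook sits in
      struct md_personality next to run() and is invoked by do_md_run()
      once md_run() has returned (helper naming and error handling are
      simplified and illustrative):

        /*
         * Hedged sketch of the split; only the two relevant hooks are
         * shown and the error handling is simplified.
         */
        struct md_personality_sketch {
                int (*run)(struct mddev *mddev);   /* must not wake md threads */
                int (*start)(struct mddev *mddev); /* may wake md threads */
        };

        static int do_md_run_sketch(struct mddev *mddev)
        {
                int err = md_run(mddev);                 /* calls pers->run() */

                if (!err && mddev->pers->start)
                        err = mddev->pers->start(mddev); /* e.g. raid5_start() -> r5l_load_log() */

                return err;
        }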
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  5. 08 December 2017 (11 commits)
  6. 17 November 2017 (1 commit)
  7. 11 November 2017 (1 commit)
  8. 02 November 2017 (1 commit)
    • md: always hold reconfig_mutex when calling mddev_suspend() · 4d5324f7
      Committed by NeilBrown
      Most often mddev_suspend() is called with reconfig_mutex held.  Make
      this a requirement in preparation for a subsequent patch.  Also
      require reconfig_mutex to be held for mddev_resume(), partly for
      symmetry and partly to guarantee no races with incr/decr of
      mddev->suspend.
      
      Taking the mutex in r5c_disable_writeback_async() is
      a little tricky as this is called from a work queue
      via log->disable_writeback_work, and flush_work()
      is called on that while holding ->reconfig_mutex.
      If the work item hasn't run before flush_work()
      is called, the work function will not be able to
      get the mutex.
      
      So we use mddev_trylock() inside the wait_event() call, and have that
      abort when conf->log is set to NULL, which happens before
      flush_work() is called.
      We wait in mddev->sb_wait and ensure this is woken when any of the
      conditions change.  This requires waking mddev->sb_wait in
      mddev_unlock().  This is only likely to trigger extra wake_ups of
      threads that needn't be woken when metadata is being written, and
      that doesn't happen often enough for the cost to be noticeable.
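
      A condensed sketch of that wait/trylock dance, paraphrased from the
      description above rather than copied from the patch (the exact wait
      condition and journal-mode update may differ in the real code):

        static void disable_writeback_sketch(struct r5conf *conf, struct mddev *mddev)
        {
                bool locked = false;

                /*
                 * Abort once conf->log is NULL (set before flush_work() runs),
                 * otherwise keep retrying mddev_trylock() inside the wait.
                 */
                wait_event(mddev->sb_wait,
                           conf->log == NULL || (locked = mddev_trylock(mddev)));

                if (locked) {
                        mddev_suspend(mddev);
                        conf->log->r5c_journal_mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
                        mddev_resume(mddev);
                        mddev_unlock(mddev);
                }
        }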
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
  9. 17 October 2017 (1 commit)
  10. 06 October 2017 (1 commit)
    • dm raid: fix incorrect status output at the end of a "recover" process · 41dcf197
      Committed by Jonathan Brassow
      There are three important fields that indicate the overall health and
      status of an array: dev_health, sync_ratio, and sync_action.  They tell
      us the condition of the devices in the array, and the degree to which
      the array is synchronized.
      
      This commit fixes a condition that is reported incorrectly.  When a member
      of the array is being rebuilt or a new device is added, the "recover"
      process is used to synchronize it with the rest of the array.  When the
      process is complete, but the sync thread hasn't yet been reaped, it is
      possible for the state of MD to be:
       mddev->recovery = [ MD_RECOVERY_RUNNING MD_RECOVERY_RECOVER MD_RECOVERY_DONE ]
       curr_resync_completed = <max dev size> (but not MaxSector)
       and all rdevs to be In_sync.
      This causes the 'array_in_sync' output parameter that is passed to
      rs_get_progress() to be computed incorrectly and reported as 'false' --
      or not in-sync.  This in turn causes the dev_health status characters to
      be reported as all 'a', rather than the proper 'A'.
      
      This can cause erroneous output for several seconds at a time, just
      when tools will want to be checking the condition due to events that
      are raised at the end of a sync process.  Fix this by properly
      calculating the
      'array_in_sync' return parameter in rs_get_progress().
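
      A hedged sketch of the corrected decision, built from the MD state
      listed above (helper name and the exact condition are illustrative,
      not the literal rs_get_progress() hunk):

        /* Treat a finished-but-unreaped "recover" as in sync. */
        static bool array_in_sync_sketch(struct mddev *mddev, sector_t resync_max_sectors)
        {
                if (test_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
                    test_bit(MD_RECOVERY_DONE, &mddev->recovery) &&
                    mddev->curr_resync_completed >= resync_max_sectors)
                        return true;    /* recover finished; sync thread not reaped yet */

                return mddev->recovery_cp >= resync_max_sectors;
        }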
      
      Also, remove an unnecessary intermediate 'recovery_cp' variable in
      rs_get_progress().
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  11. 28 September 2017 (1 commit)
  12. 26 July 2017 (4 commits)
  13. 30 June 2017 (1 commit)
  14. 24 June 2017 (1 commit)
    • dm raid: fix oops on upgrading to extended superblock format · c4d097d1
      Committed by Heinz Mauelshagen
      When a RAID set was created on dm-raid version < 1.9.0 (old RAID
      superblock format), all of the new 1.9.0 members of the superblock are
      uninitialized (zero) -- including the device sectors member needed to
      support shrinking.
      
      All the other accesses to superblock fields new in 1.9.0 were reviewed
      and verified to be properly guarded against invalid use.  The 'sectors'
      member was the only one used when the superblock version is < 1.9.
      
      Don't access the superblock's >= 1.9.0 'sectors' member unconditionally.
      Also add respective comments.
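
      A hedged illustration of that kind of guard (flag and variable names
      are approximations, not quoted from the patch):

        /* Only trust 'sectors' if the on-disk superblock has the 1.9.0 layout. */
        if (le32_to_cpu(sb->compat_features) & FEATURE_FLAG_SUPPORTS_V190)
                dev_sectors = le64_to_cpu(sb->sectors);
        /* else: keep the pre-1.9.0 sizing, as before */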
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  15. 09 April 2017 (1 commit)
  16. 31 March 2017 (1 commit)
    • dm raid: fix NULL pointer dereference for raid1 without bitmap · 7a0c5c5b
      Committed by Dmitry Bilunov
      Commit 4257e085 ("dm raid: support to change bitmap region size")
      introduced a bitmap resize call during the preresume phase.  A user
      can create a DM device with the "raid" target configured as raid1
      with no metadata
      devices to hold superblock/bitmap info. It can be achieved using the
      following sequence:
      
        truncate -s 32M /dev/shm/raid-test
        LOOP=$(losetup --show -f /dev/shm/raid-test)
        dmsetup create raid-test-linear0 --table "0 1024 linear $LOOP 0"
        dmsetup create raid-test-linear1 --table "0 1024 linear $LOOP 1024"
        dmsetup create raid-test --table "0 1024 raid raid1 1 2048 2 - /dev/mapper/raid-test-linear0 - /dev/mapper/raid-test-linear1"
      
      This results in the following crash:
      
      [ 4029.110216] device-mapper: raid: Ignoring chunk size parameter for RAID 1
      [ 4029.110217] device-mapper: raid: Choosing default region size of 4MiB
      [ 4029.111349] md/raid1:mdX: active with 2 out of 2 mirrors
      [ 4029.114770] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [ 4029.114802] IP: bitmap_resize+0x25/0x7c0 [md_mod]
      [ 4029.114816] PGD 0
      …
      [ 4029.115059] Hardware name: Aquarius Pro P30 S85 BUY-866/B85M-E, BIOS 2304 05/25/2015
      [ 4029.115079] task: ffff88015cc29a80 task.stack: ffffc90001a5c000
      [ 4029.115097] RIP: 0010:bitmap_resize+0x25/0x7c0 [md_mod]
      [ 4029.115112] RSP: 0018:ffffc90001a5fb68 EFLAGS: 00010246
      [ 4029.115127] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000000
      [ 4029.115146] RDX: 0000000000000000 RSI: 0000000000000400 RDI: 0000000000000000
      [ 4029.115166] RBP: ffffc90001a5fc28 R08: 0000000800000000 R09: 00000008ffffffff
      [ 4029.115185] R10: ffffea0005661600 R11: ffff88015cc29a80 R12: ffff88021231f058
      [ 4029.115204] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      [ 4029.115223] FS:  00007fe73a6b4740(0000) GS:ffff88021ea80000(0000) knlGS:0000000000000000
      [ 4029.115245] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4029.115261] CR2: 0000000000000030 CR3: 0000000159a74000 CR4: 00000000001426e0
      [ 4029.115281] Call Trace:
      [ 4029.115291]  ? raid_iterate_devices+0x63/0x80 [dm_raid]
      [ 4029.115309]  ? dm_table_all_devices_attribute.isra.23+0x41/0x70 [dm_mod]
      [ 4029.115329]  ? dm_table_set_restrictions+0x225/0x2d0 [dm_mod]
      [ 4029.115346]  raid_preresume+0x81/0x2e0 [dm_raid]
      [ 4029.115361]  dm_table_resume_targets+0x47/0xe0 [dm_mod]
      [ 4029.115378]  dm_resume+0xa8/0xd0 [dm_mod]
      [ 4029.115391]  dev_suspend+0x123/0x250 [dm_mod]
      [ 4029.115405]  ? table_load+0x350/0x350 [dm_mod]
      [ 4029.115419]  ctl_ioctl+0x1c2/0x490 [dm_mod]
      [ 4029.115433]  dm_ctl_ioctl+0xe/0x20 [dm_mod]
      [ 4029.115447]  do_vfs_ioctl+0x8d/0x5a0
      [ 4029.115459]  ? ____fput+0x9/0x10
      [ 4029.115470]  ? task_work_run+0x79/0xa0
      [ 4029.115481]  SyS_ioctl+0x3c/0x70
      [ 4029.115493]  entry_SYSCALL_64_fastpath+0x13/0x94
      
      The raid_preresume() function incorrectly assumes that the raid_set has
      a bitmap enabled if RT_FLAG_RS_BITMAP_LOADED is set.  But
      RT_FLAG_RS_BITMAP_LOADED is getting set in __load_dirty_region_bitmap()
      even if there is no bitmap present (and bitmap_load() happily returns 0
      even if a bitmap isn't present).  So the only way forward in the
      near-term is to check if the bitmap is present by seeing if
      mddev->bitmap is not NULL after bitmap_load() has been called.
      
      By doing so, the above NULL pointer dereference is avoided.
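
      A hedged sketch of the resulting guard in raid_preresume() (the
      chunk-size argument and error handling are illustrative):

        if (test_bit(RT_FLAG_RS_BITMAP_LOADED, &rs->runtime_flags) &&
            mddev->bitmap) {
                /* a bitmap really exists, so resizing it cannot oops */
                r = bitmap_resize(mddev->bitmap, mddev->dev_sectors,
                                  new_chunksize_bytes /* illustrative */, 0);
        }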
      
      Fixes: 4257e085 ("dm raid: support to change bitmap region size")
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: Dmitry Bilunov <kmeaw@yandex-team.ru>
      Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru>
      Acked-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  17. 28 March 2017 (1 commit)
    • dm raid: add raid4/5/6 journal write-back support via journal_mode option · 6e53636f
      Committed by Heinz Mauelshagen
      Commit 63c32ed4 ("dm raid: add raid4/5/6 journaling support") added
      journal support to close the raid4/5/6 "write hole" -- in terms of
      writethrough caching.
      
      Introduce a "journal_mode" feature and use the new
      r5c_journal_mode_set() API to add support for switching the journal
      device's cache mode between write-through (the current default) and
      write-back.
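
      A short sketch of how the new option could map onto that API (the
      parsing helper is illustrative; r5c_journal_mode_set() and the mode
      constants are the md raid5-cache names):

        static int set_journal_mode_sketch(struct raid_set *rs, const char *arg)
        {
                int mode;

                if (!strcasecmp(arg, "writethrough"))
                        mode = R5C_JOURNAL_MODE_WRITE_THROUGH;
                else if (!strcasecmp(arg, "writeback"))
                        mode = R5C_JOURNAL_MODE_WRITE_BACK;
                else
                        return -EINVAL;

                return r5c_journal_mode_set(&rs->md, mode);
        }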
      
      NOTE: If the journal device is not layered on resilient storage and
      it fails, write-through mode will cause the "write hole" to reoccur.
      But if the journal fails while in write-back mode, it will cause
      data loss for any dirty cache entries unless resilient storage is
      used for the journal.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  18. 27 March 2017 (1 commit)
    • dm raid: fix table line argument order in status · 4464e36e
      Committed by Heinz Mauelshagen
      Commit 3a1c1ef2 ("dm raid: enhance status interface and fixup
      takeover/raid0") added new table line arguments and introduced an
      ordering flaw.  The sequence of the raid10_copies and raid10_format
      raid parameters got reversed, which causes lvm2 userspace to fail by
      falsely assuming a changed table line.
      
      Sequence those 2 parameters as before so that old lvm2 can function
      properly with new kernels by adjusting the table line output as
      documented in Documentation/device-mapper/dm-raid.txt.
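
      A hedged sketch of the intended emit order in the status path (flag
      and variable names are approximate; the point is simply that
      raid10_copies precedes raid10_format, matching dm-raid.txt):

        if (test_bit(__CTR_FLAG_RAID10_COPIES, &rs->ctr_flags))
                DMEMIT(" raid10_copies %u", raid10_copies);     /* first */
        if (test_bit(__CTR_FLAG_RAID10_FORMAT, &rs->ctr_flags))
                DMEMIT(" raid10_format %s", raid10_format);     /* second */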
      
      Also, add the missing version 1.10.1 highlight to the documentation.
      
      Fixes: 3a1c1ef2 ("dm raid: enhance status interface and fixup takeover/raid0")
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  19. 01 March 2017 (3 commits)
    • dm raid: bump the target version · 2664f3c9
      Committed by Mike Snitzer
      This version bump reflects that the reshape corruption fix (commit
      92a39f6cc "dm raid: fix data corruption on reshape request") is
      present.
      
      Done as a separate fix because the above referenced commit is marked for
      stable and target version bumps in a stable@ fix are a recipe for the
      fix to never get backported to stable@ kernels (because of target
      version number conflicts).
      
      Also, move RESUME_STAY_FROZEN_FLAGS up with the rest of the _FLAGS
      definitions now that we don't need to worry about stable@ conflicts
      as a result of missing context.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: fix data corruption on reshape request · d36a1954
      Committed by Heinz Mauelshagen
      The lvm2 sequence to manage dm-raid constructor flags that trigger a
      rebuild or a reshape is defined as:
      
      1) load table with flags (e.g. rebuild/delta_disks/data_offset)
      2) clear out the flags in lvm2 metadata
      3) store the lvm2 metadata, reload the table to reset the flags
         previously established during the initial load (1) -- in order to
         prevent repeatedly requesting a rebuild or a reshape on activation
      
      Currently, loading an inactive table with rebuild/reshape flags
      specified will cause dm-raid to rebuild/reshape on resume and thus start
      updating the raid metadata (about the progress).  When the second
      table reload (the one meant to reset the flags) occurs, the
      constructor accesses the volatile progress state kept in the raid
      superblocks.  Because the active mapping
      is still processing the rebuild/reshape, that position will be stale by
      the time the device is resumed.
      
      In the reshape case, this causes data corruption by processing already
      reshaped stripes again.  In the rebuild case, it does _not_ cause data
      corruption but instead involves superfluous rebuilds.
      
      Fix by keeping the raid set frozen during the first resume and then
      allow the rebuild/reshape during the second resume.
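
      A hedged sketch of the idea, borrowing the RESUME_STAY_FROZEN_FLAGS
      name from the follow-up commit below (the exact placement inside
      raid_resume() is paraphrased):

        /*
         * Stay frozen across the first resume if the table requests a
         * rebuild/reshape, so the reload that clears the flags reads
         * stable progress state from the superblocks.
         */
        if (!(rs->ctr_flags & RESUME_STAY_FROZEN_FLAGS))
                clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);

        mddev_resume(mddev);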
      
      Fixes: 9dbd1aa3 ("dm raid: add reshaping support to the target")
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.8+
    • dm raid: fix raid "check" regression due to improper cleanup in raid_message() · ad470472
      Committed by Mike Snitzer
      While cleaning up awkward branching in raid_message() a raid set "check"
      regression was introduced because "check" needs both MD_RECOVERY_SYNC
      and MD_RECOVERY_REQUESTED flags set.
      
      Fix this regression by explicitly setting both flags for the "check"
      case (as is also done for the "repair" case; redundant set_bit()s
      are perfectly fine because they add clarity to what is needed in
      response to both messages, and this isn't fast-path code).
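
      A hedged sketch of the message handling after the fix (surrounding
      raid_message() code is omitted and simplified):

        /* Both messages need SYNC + REQUESTED; "check" additionally sets CHECK. */
        if (!strcasecmp(argv[0], "check")) {
                set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
                set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
                set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
        } else if (!strcasecmp(argv[0], "repair")) {
                set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
                set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
        }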
      
      Fixes: 105db599 ("dm raid: cleanup awkward branching in raid_message() option processing")
      Reported-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  20. 25 January 2017 (2 commits)