1. 06 Oct, 2017 1 commit
    • dm raid: fix incorrect status output at the end of a "recover" process · 41dcf197
      Jonathan Brassow committed
      There are three important fields that indicate the overall health and
      status of an array: dev_health, sync_ratio, and sync_action.  They tell
      us the condition of the devices in the array, and the degree to which
      the array is synchronized.
      
      This commit fixes a condition that is reported incorrectly.  When a member
      of the array is being rebuilt or a new device is added, the "recover"
      process is used to synchronize it with the rest of the array.  When the
      process is complete, but the sync thread hasn't yet been reaped, it is
      possible for the state of MD to be:
       mddev->recovery = [ MD_RECOVERY_RUNNING MD_RECOVERY_RECOVER MD_RECOVERY_DONE ]
       curr_resync_completed = <max dev size> (but not MaxSector)
       and all rdevs to be In_sync.
      This causes the 'array_in_sync' output parameter that is passed to
      rs_get_progress() to be computed incorrectly and reported as 'false' --
      or not in-sync.  This in turn causes the dev_health status characters to
      be reported as all 'a', rather than the proper 'A'.
      
      This can cause erroneous output for several seconds at a time when tools
      will want to be checking the condition due to events that are raised at
      the end of a sync process.  Fix this by properly calculating the
      'array_in_sync' return parameter in rs_get_progress().
      
      Also, remove an unnecessary intermediate 'recovery_cp' variable in
      rs_get_progress().
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
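      A minimal userspace sketch of the condition described above, assuming a
      reduced model of the MD state.  The struct md_state type, the bit values
      and the exact comparison are illustrative assumptions; only the flag and
      field names come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values are illustrative assumptions, not the kernel's. */
enum {
    MD_RECOVERY_RUNNING = 1u << 0,
    MD_RECOVERY_RECOVER = 1u << 1,
    MD_RECOVERY_DONE    = 1u << 2,
};

#define MAX_SECTOR UINT64_MAX /* stands in for MaxSector */

/* Hypothetical, reduced view of the state named in the commit message. */
struct md_state {
    unsigned int recovery;          /* MD_RECOVERY_* flags */
    uint64_t curr_resync_completed; /* progress in sectors */
    uint64_t dev_sectors;           /* per-device size in sectors */
};

/*
 * Report in-sync when progress says so, including the window in which the
 * recover process is done but the sync thread has not been reaped yet.
 */
static bool array_in_sync(const struct md_state *s)
{
    assert(s != NULL);
    if (s->curr_resync_completed == MAX_SECTOR)
        return true;
    return (s->recovery & MD_RECOVERY_DONE) != 0 &&
           s->curr_resync_completed >= s->dev_sectors;
}
```

      With all three flags set and curr_resync_completed equal to the device
      size, this model reports the array as in sync, which corresponds to the
      'A' health characters the fix restores.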
  2. 28 Sep, 2017 1 commit
  3. 26 Jul, 2017 4 commits
  4. 30 Jun, 2017 1 commit
  5. 24 Jun, 2017 1 commit
    • dm raid: fix oops on upgrading to extended superblock format · c4d097d1
      Heinz Mauelshagen committed
      When a RAID set was created on dm-raid version < 1.9.0 (old RAID
      superblock format), all of the new 1.9.0 members of the superblock are
      uninitialized (zero) -- including the device sectors member needed to
      support shrinking.
      
      All the other accesses to superblock fields new in 1.9.0 were reviewed
      and verified to be properly guarded against invalid use.  The 'sectors'
      member was the only one used when the superblock version is < 1.9.
      
      Don't access the superblock's >= 1.9.0 'sectors' member unconditionally.
      Also add respective comments.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
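      A short sketch of guarding a field that only exists in the newer
      superblock format.  The struct sb_model type, the SB_FEATURE_V190 bit and
      the fallback parameter are hypothetical; the real superblock layout is
      not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit marking a superblock written by dm-raid >= 1.9.0. */
#define SB_FEATURE_V190 (1u << 0)

/* Hypothetical, reduced superblock. */
struct sb_model {
    uint32_t compat_features; /* zero on old-format superblocks */
    uint64_t sectors;         /* only meaningful when SB_FEATURE_V190 is set */
};

static bool sb_has_v190_fields(const struct sb_model *sb)
{
    assert(sb != NULL);
    return (sb->compat_features & SB_FEATURE_V190) != 0;
}

/* Use the superblock's size only when the field is valid, else the fallback. */
static uint64_t device_sectors(const struct sb_model *sb, uint64_t fallback)
{
    return sb_has_v190_fields(sb) ? sb->sectors : fallback;
}
```

      An old-format superblock leaves 'sectors' at zero, so reading it
      unconditionally would yield a device size of 0; the guard keeps the
      caller's own value instead.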
  6. 09 Apr, 2017 1 commit
  7. 31 Mar, 2017 1 commit
    • dm raid: fix NULL pointer dereference for raid1 without bitmap · 7a0c5c5b
      Dmitry Bilunov committed
      Commit 4257e085 ("dm raid: support to change bitmap region size")
      introduced a bitmap resize call during preresume phase. User can create
      a DM device with "raid" target configured as raid1 with no metadata
      devices to hold superblock/bitmap info. It can be achieved using the
      following sequence:
      
        truncate -s 32M /dev/shm/raid-test
        LOOP=$(losetup --show -f /dev/shm/raid-test)
        dmsetup create raid-test-linear0 --table "0 1024 linear $LOOP 0"
        dmsetup create raid-test-linear1 --table "0 1024 linear $LOOP 1024"
        dmsetup create raid-test --table "0 1024 raid raid1 1 2048 2 - /dev/mapper/raid-test-linear0 - /dev/mapper/raid-test-linear1"
      
      This results in the following crash:
      
      [ 4029.110216] device-mapper: raid: Ignoring chunk size parameter for RAID 1
      [ 4029.110217] device-mapper: raid: Choosing default region size of 4MiB
      [ 4029.111349] md/raid1:mdX: active with 2 out of 2 mirrors
      [ 4029.114770] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [ 4029.114802] IP: bitmap_resize+0x25/0x7c0 [md_mod]
      [ 4029.114816] PGD 0
      …
      [ 4029.115059] Hardware name: Aquarius Pro P30 S85 BUY-866/B85M-E, BIOS 2304 05/25/2015
      [ 4029.115079] task: ffff88015cc29a80 task.stack: ffffc90001a5c000
      [ 4029.115097] RIP: 0010:bitmap_resize+0x25/0x7c0 [md_mod]
      [ 4029.115112] RSP: 0018:ffffc90001a5fb68 EFLAGS: 00010246
      [ 4029.115127] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000000
      [ 4029.115146] RDX: 0000000000000000 RSI: 0000000000000400 RDI: 0000000000000000
      [ 4029.115166] RBP: ffffc90001a5fc28 R08: 0000000800000000 R09: 00000008ffffffff
      [ 4029.115185] R10: ffffea0005661600 R11: ffff88015cc29a80 R12: ffff88021231f058
      [ 4029.115204] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      [ 4029.115223] FS:  00007fe73a6b4740(0000) GS:ffff88021ea80000(0000) knlGS:0000000000000000
      [ 4029.115245] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4029.115261] CR2: 0000000000000030 CR3: 0000000159a74000 CR4: 00000000001426e0
      [ 4029.115281] Call Trace:
      [ 4029.115291]  ? raid_iterate_devices+0x63/0x80 [dm_raid]
      [ 4029.115309]  ? dm_table_all_devices_attribute.isra.23+0x41/0x70 [dm_mod]
      [ 4029.115329]  ? dm_table_set_restrictions+0x225/0x2d0 [dm_mod]
      [ 4029.115346]  raid_preresume+0x81/0x2e0 [dm_raid]
      [ 4029.115361]  dm_table_resume_targets+0x47/0xe0 [dm_mod]
      [ 4029.115378]  dm_resume+0xa8/0xd0 [dm_mod]
      [ 4029.115391]  dev_suspend+0x123/0x250 [dm_mod]
      [ 4029.115405]  ? table_load+0x350/0x350 [dm_mod]
      [ 4029.115419]  ctl_ioctl+0x1c2/0x490 [dm_mod]
      [ 4029.115433]  dm_ctl_ioctl+0xe/0x20 [dm_mod]
      [ 4029.115447]  do_vfs_ioctl+0x8d/0x5a0
      [ 4029.115459]  ? ____fput+0x9/0x10
      [ 4029.115470]  ? task_work_run+0x79/0xa0
      [ 4029.115481]  SyS_ioctl+0x3c/0x70
      [ 4029.115493]  entry_SYSCALL_64_fastpath+0x13/0x94
      
      The raid_preresume() function incorrectly assumes that the raid_set has
      a bitmap enabled if RT_FLAG_RS_BITMAP_LOADED is set.  But
      RT_FLAG_RS_BITMAP_LOADED is getting set in __load_dirty_region_bitmap()
      even if there is no bitmap present (and bitmap_load() happily returns 0
      even if a bitmap isn't present).  So the only way forward in the
      near-term is to check if the bitmap is present by seeing if
      mddev->bitmap is not NULL after bitmap_load() has been called.
      
      Doing so avoids the NULL pointer dereference shown above.
      
      Fixes: 4257e085 ("dm raid: support to change bitmap region size")
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: Dmitry Bilunov <kmeaw@yandex-team.ru>
      Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru>
      Acked-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
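      The check described above can be sketched in userspace as follows.  The
      *_model types and preresume_resize() are hypothetical stand-ins; the
      point is only that a "loaded" flag alone does not prove a bitmap exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced bitmap and array state. */
struct bitmap_model {
    unsigned long chunks;
};

struct mddev_model {
    struct bitmap_model *bitmap; /* NULL when the array has no bitmap */
};

/* Stand-in for a resize routine that dereferences its argument. */
static int bitmap_resize_model(struct bitmap_model *b, unsigned long chunks)
{
    b->chunks = chunks;
    return 0;
}

/* Resize only if a bitmap actually exists, whatever the flag claims. */
static int preresume_resize(struct mddev_model *md, bool loaded_flag,
                            unsigned long chunks)
{
    assert(md != NULL);
    if (!loaded_flag || md->bitmap == NULL)
        return 0; /* nothing to resize */
    return bitmap_resize_model(md->bitmap, chunks);
}
```

      With the bitmap pointer left NULL and the flag set, the call returns
      without touching the bitmap, which is the case the raid1 table in the
      reproducer exercises.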
  8. 28 Mar, 2017 1 commit
    • dm raid: add raid4/5/6 journal write-back support via journal_mode option · 6e53636f
      Heinz Mauelshagen committed
      Commit 63c32ed4 ("dm raid: add raid4/5/6 journaling support") added
      journal support to close the raid4/5/6 "write hole" -- in terms of
      writethrough caching.
      
      Introduce a "journal_mode" feature and use the new
      r5c_journal_mode_set() API to add support for switching the journal
      device's cache mode between write-through (the current default) and
      write-back.
      
      NOTE: If the journal device is not layered on resilient storage and it
      fails, write-through mode will cause the "write hole" to reoccur.  But
      if the journal fails while in write-back mode it will cause data loss
      for any dirty cache entries unless resilient storage is used for the
      journal.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
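      A small sketch of selecting between the two cache modes and of the
      failure consequences the NOTE describes.  The option spellings
      "writethrough" and "writeback", the enum and both helpers are
      assumptions made for illustration; this is not the dm-raid parser.

```c
#include <assert.h>
#include <string.h>

enum journal_mode {
    JOURNAL_WRITE_THROUGH, /* the default per the commit message */
    JOURNAL_WRITE_BACK,
    JOURNAL_MODE_INVALID,
};

/* Map an option string to a mode; the accepted spellings are assumptions. */
static enum journal_mode parse_journal_mode(const char *arg)
{
    assert(arg != NULL);
    if (strcmp(arg, "writethrough") == 0)
        return JOURNAL_WRITE_THROUGH;
    if (strcmp(arg, "writeback") == 0)
        return JOURNAL_WRITE_BACK;
    return JOURNAL_MODE_INVALID;
}

/* Effect of losing a non-resilient journal device, as the NOTE describes. */
static const char *journal_loss_effect(enum journal_mode mode)
{
    switch (mode) {
    case JOURNAL_WRITE_THROUGH:
        return "write hole reoccurs";
    case JOURNAL_WRITE_BACK:
        return "dirty cache entries are lost";
    default:
        return "unknown mode";
    }
}
```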
  9. 27 Mar, 2017 1 commit
    • dm raid: fix table line argument order in status · 4464e36e
      Heinz Mauelshagen committed
      Commit 3a1c1ef2 ("dm raid: enhance status interface and fixup
      takeover/raid0") added new table line arguments and introduced an
      ordering flaw.  The sequence of the raid10_copies and raid10_format
      raid parameters got reversed which causes lvm2 userspace to fail by
      falsely assuming a changed table line.
      
      Sequence those 2 parameters as before so that old lvm2 can function
      properly with new kernels by adjusting the table line output as
      documented in Documentation/device-mapper/dm-raid.txt.
      
      Also, add missing version 1.10.1 highlight to the documentation.
      
      Fixes: 3a1c1ef2 ("dm raid: enhance status interface and fixup takeover/raid0")
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  10. 01 Mar, 2017 3 commits
    • dm raid: bump the target version · 2664f3c9
      Mike Snitzer committed
      This version bump reflects that the reshape corruption fix (commit
      92a39f6cc "dm raid: fix data corruption on reshape request") is
      present.
      
      Done as a separate fix because the above referenced commit is marked for
      stable and target version bumps in a stable@ fix are a recipe for the
      fix to never get backported to stable@ kernels (because of target
      version number conflicts).
      
      Also, move RESUME_STAY_FROZEN_FLAGS up with the rest of the _FLAGS
      definitions now that we don't need to worry about stable@ conflicts as a
      result of missing context.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: fix data corruption on reshape request · d36a1954
      Heinz Mauelshagen committed
      The lvm2 sequence to manage dm-raid constructor flags that trigger a
      rebuild or a reshape is defined as:
      
      1) load table with flags (e.g. rebuild/delta_disks/data_offset)
      2) clear out the flags in lvm2 metadata
      3) store the lvm2 metadata, reload the table to reset the flags
         previously established during the initial load (1) -- in order to
         prevent repeatedly requesting a rebuild or a reshape on activation
      
      Currently, loading an inactive table with rebuild/reshape flags
      specified will cause dm-raid to rebuild/reshape on resume and thus start
      updating the raid metadata (about the progress).  When the second table
      reload, to reset the flags, occurs the constructor accesses the volatile
      progress state kept in the raid superblocks.  Because the active mapping
      is still processing the rebuild/reshape, that position will be stale by
      the time the device is resumed.
      
      In the reshape case, this causes data corruption by processing already
      reshaped stripes again.  In the rebuild case, it does _not_ cause data
      corruption but instead involves superfluous rebuilds.
      
      Fix by keeping the raid set frozen during the first resume and then
      allow the rebuild/reshape during the second resume.
      
      Fixes: 9dbd1aa3 ("dm raid: add reshaping support to the target")
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.8+
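      One way to read the fix is as a gate on the resume path: a table loaded
      with rebuild/reshape flags stays frozen, and the reloaded table with the
      flags cleared is allowed to proceed.  The sketch below models only that
      gate.  The bit values, STAY_FROZEN_FLAGS, struct table_model and
      may_sync_on_resume() are assumptions, and the set of flags the kernel
      actually checks may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Constructor flags from step 1 of the lvm2 sequence; bit values assumed. */
enum {
    CTR_FLAG_REBUILD     = 1u << 0,
    CTR_FLAG_DELTA_DISKS = 1u << 1,
    CTR_FLAG_DATA_OFFSET = 1u << 2,
};

/* Hypothetical mask of flags that keep the raid set frozen on resume. */
#define STAY_FROZEN_FLAGS \
    (CTR_FLAG_REBUILD | CTR_FLAG_DELTA_DISKS | CTR_FLAG_DATA_OFFSET)

/* Hypothetical view of a loaded table. */
struct table_model {
    unsigned int ctr_flags;
};

/* Returns true if rebuild/reshape may start when this table is resumed. */
static bool may_sync_on_resume(const struct table_model *t)
{
    assert(t != NULL);
    return (t->ctr_flags & STAY_FROZEN_FLAGS) == 0;
}
```

      In this model the first resume (flags still set) leaves the set frozen,
      so no progress is written to the superblocks before the second table
      load reads them.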
    • dm raid: fix raid "check" regression due to improper cleanup in raid_message() · ad470472
      Mike Snitzer committed
      While cleaning up awkward branching in raid_message() a raid set "check"
      regression was introduced because "check" needs both MD_RECOVERY_SYNC
      and MD_RECOVERY_REQUESTED flags set.
      
      Fix this regression by explicitly setting both flags for the "check"
      case, as is also done for the "repair" case.  The redundant set_bit()s
      are perfectly fine: they make clear what is needed in response to each
      message, and this isn't fast path code.
      
      Fixes: 105db599 ("dm raid: cleanup awkward branching in raid_message() option processing")
      Reported-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
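      A userspace sketch of the flag handling described above.  The bit
      values and apply_sync_message() are illustrative, and MD_RECOVERY_CHECK
      is an additional flag assumed here to tell "check" apart from "repair";
      the commit message itself names only the other two.

```c
#include <assert.h>
#include <string.h>

/* Bit values are illustrative assumptions. */
enum {
    MD_RECOVERY_SYNC      = 1u << 0,
    MD_RECOVERY_REQUESTED = 1u << 1,
    MD_RECOVERY_CHECK     = 1u << 2,
};

/* ORs the flags a message needs into *recovery; returns -1 if unknown. */
static int apply_sync_message(const char *msg, unsigned int *recovery)
{
    assert(msg != NULL && recovery != NULL);
    if (strcmp(msg, "check") == 0) {
        *recovery |= MD_RECOVERY_CHECK;
        *recovery |= MD_RECOVERY_REQUESTED;
        *recovery |= MD_RECOVERY_SYNC;
    } else if (strcmp(msg, "repair") == 0) {
        *recovery |= MD_RECOVERY_REQUESTED;
        *recovery |= MD_RECOVERY_SYNC;
    } else {
        return -1;
    }
    return 0;
}
```

      Setting the two shared flags in each branch is redundant but keeps every
      message's requirements visible in one place.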
  11. 25 Jan, 2017 6 commits
    • M
    • H
      977f1a0a
    • dm raid: use read_disk_sb() throughout · e2568465
      Heinz Mauelshagen committed
      For consistency, call read_disk_sb() from
      attempt_restore_of_faulty_devices() instead
      of calling sync_page_io() directly.
      
      Explicitly set device to faulty on superblock read error.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: add raid4/5/6 journaling support · 63c32ed4
      Heinz Mauelshagen committed
      Add md raid4/5/6 journaling support (upstream commit bac624f3 started
      the implementation) which closes the write hole (i.e. non-atomic updates
      to stripes) using a dedicated journal device.
      
      Background:
      raid4/5/6 stripes hold N data payloads per stripe plus one parity payload
      (raid4/5) or two P/Q syndrome payloads (raid6) in an in-memory stripe cache.
      Parity or P/Q syndromes used to recover any data payloads in case of a disk
      failure are calculated from the N data payloads and need to be updated on the
      different component devices of the raid device.  Those are non-atomic,
      persistent updates.  Hence a crash can cause failure to update all stripe
      payloads persistently and thus cause data loss during stripe recovery.
      This problem gets addressed by writing whole stripe cache entries (together with
      journal metadata) to a persistent journal entry on a dedicated journal device.
      Only if that journal entry is written successfully, the stripe cache entry is
      updated on the component devices of the raid device (i.e. writethrough type).
      In case of a crash, the entry can be recovered from the journal and be written
      again thus ensuring consistent stripe payload suitable to data recovery.
      
      Future dependencies:
      once upstream supports writeback caching with journaling (it is being
      worked on to compensate for the throughput implications of the
      writethrough overhead), an additional patch based on this one will
      support it in dm-raid.
      
      Journal resilience related remarks:
      because stripes are recovered from the journal in case of a crash, the
      journal device had better be resilient.  Resilience becomes mandatory with
      future writeback support, because losing the working set in the log
      means data loss as opposed to writethrough, where the loss of the
      journal device 'only' reintroduces the write hole.
      
      Fix comment on data offsets in parse_dev_params() and initialize
      new_data_offset as well.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
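      The background above can be illustrated with single XOR parity as used
      by raid4/5 (raid6 P/Q syndromes are not covered).  The sketch computes
      parity over N data payloads and rebuilds a lost payload from the
      survivors.  PAYLOAD and both function names are illustrative choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD 4 /* bytes per payload, tiny for illustration */

/* XOR all n data payloads into parity (raid4/5 style single parity). */
static void compute_parity(uint8_t data[][PAYLOAD], size_t n,
                           uint8_t parity[PAYLOAD])
{
    for (size_t b = 0; b < PAYLOAD; b++) {
        parity[b] = 0;
        for (size_t i = 0; i < n; i++)
            parity[b] ^= data[i][b];
    }
}

/* Rebuild payload 'lost' from the surviving payloads plus parity. */
static void rebuild_payload(uint8_t data[][PAYLOAD], size_t n, size_t lost,
                            const uint8_t parity[PAYLOAD],
                            uint8_t out[PAYLOAD])
{
    assert(lost < n);
    for (size_t b = 0; b < PAYLOAD; b++) {
        out[b] = parity[b];
        for (size_t i = 0; i < n; i++)
            if (i != lost)
                out[b] ^= data[i][b];
    }
}
```

      If a crash lands between updating a data payload and updating its
      parity, rebuild_payload() returns wrong bytes for any other payload in
      that stripe.  That is the write hole which journaling the whole stripe
      first is meant to close.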
    • dm raid: be prepared to accept arbitrary '- -' tuples · 50c4feb9
      Heinz Mauelshagen committed
      During raid set resize checks and setting up the recovery offset in case a raid
      set grows, calculated rd->md.dev_sectors is compared to rs->dev[0].rdev.sectors.
      
      Device 0 may not be defined in case userspace passes in '- -' for it
      (lvm2 doesn't do that so far), thus its device sectors can't be taken
      authoritatively in this comparison and another valid device must be used
      to retrieve the device size.
      
      Use mddev->dev_sectors in checking for ongoing recovery for the same reason.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
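      A sketch of taking the device size from the first defined device
      instead of assuming slot 0 is populated.  struct dev_model and
      first_defined_sectors() are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-slot device state. */
struct dev_model {
    bool defined;     /* false when userspace passed '- -' for this slot */
    uint64_t sectors; /* device size in sectors */
};

/* Return the size of the first defined device, or 0 if none is defined. */
static uint64_t first_defined_sectors(const struct dev_model *devs, size_t n)
{
    assert(devs != NULL);
    for (size_t i = 0; i < n; i++)
        if (devs[i].defined)
            return devs[i].sectors;
    return 0;
}
```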
    • dm raid: fix transient device failure processing · c63ede3b
      Heinz Mauelshagen committed
      This fix addresses the following 3 failure scenarios:
      
      1) If a (transiently) inaccessible metadata device is being passed into the
      constructor (e.g. a device tuple '254:4 254:5'), it is processed as if
      '- -' was given.  This erroneously results in a status table line containing
      '- -', which mistakenly differs from what has been passed in.  As a result,
      userspace libdevmapper treats the device tuple as separate from the RAID device
      and thus does not process the dependencies properly.
      
      2) A false health status char 'A' instead of 'D' is emitted on the
      status info line for the meta/data device tuple in this metadata device
      failure case.
      
      3) If the metadata device is accessible when passed into the constructor
      but the data device (partially) isn't, that leg may be set faulty by the
      raid personality on access to the (partially) unavailable leg.  A restore
      attempted in a second raid device resume on such a failed leg (status char 'D')
      fails even after the (partially unavailable) leg has returned.
      
      Fixes for aforementioned failure scenarios:
      
      - don't release passed in devices in the constructor thus allowing the
        status table line to e.g. contain '254:4 254:5' rather than '- -'
      
      - emit device status char 'D' rather than 'A' for the device tuple
        with the failed metadata device on the status info line
      
      - when attempting to restore faulty devices in a second resume, allow the
        device hot remove function to succeed by setting the device to not in-sync
      
      In case userspace intentionally passes '- -' into the constructor to avoid that
      device tuple (e.g. to split off a raid1 leg temporarily for later re-addition),
      the status table line will correctly show '- -' and the status info line will
      provide a '-' device health character for the non-defined device tuple.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
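      The health characters discussed in this log ('A', 'a', 'D' and '-') can
      be summarised in a small sketch.  struct tuple_model and health_char()
      are hypothetical, and the precedence of the checks is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state of one metadata/data device tuple. */
struct tuple_model {
    bool defined;     /* false when '- -' was passed into the constructor */
    bool meta_failed; /* metadata device inaccessible */
    bool data_failed; /* data device set faulty */
    bool in_sync;
};

/* Pick the status info character for one device tuple. */
static char health_char(const struct tuple_model *t)
{
    assert(t != NULL);
    if (!t->defined)
        return '-';
    if (t->meta_failed || t->data_failed)
        return 'D';
    return t->in_sync ? 'A' : 'a';
}
```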
  12. 09 Dec, 2016 3 commits
    • md: separate flags for superblock changes · 2953079c
      Shaohua Li committed
      The mddev->flags are used for different purposes.  In many places we
      check or change the flags without masking unrelated flags, so we could
      check or change unrelated flags by accident.  Most of these uses are for
      superblock writes, so separate the superblock-related flags.  This should
      make the code clearer and also fix real bugs.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
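      A sketch of why a separate word for superblock flags helps: an unmasked
      "is anything pending?" test can no longer be confused by unrelated
      flags.  The flag names, the sb_flags member and sb_write_needed() are
      hypothetical illustrations, not the md definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MD_MISC_FLAG = 1u << 0 };      /* hypothetical unrelated flag */
enum { SB_CHANGE_PENDING = 1u << 0 }; /* hypothetical superblock flag */

/* Hypothetical array state with the two flag words split apart. */
struct mddev_model {
    unsigned long flags;    /* general purpose flags */
    unsigned long sb_flags; /* superblock related flags only */
};

/* A whole-word test is safe because only superblock flags live here. */
static bool sb_write_needed(const struct mddev_model *m)
{
    assert(m != NULL);
    return m->sb_flags != 0;
}
```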
    • dm raid: fix discard support regression · 11e29684
      Heinz Mauelshagen committed
      Commit ecbfb9f1 ("dm raid: add raid level takeover support") moved the
      configure_discard_support() call from raid_ctr() to raid_preresume().
      
      Enabling/disabling discard _must_ happen during table load (through the
      .ctr hook).  Fix this regression by moving the
      configure_discard_support() call back to raid_ctr().
      
      Fixes: ecbfb9f1 ("dm raid: add raid level takeover support")
      Cc: stable@vger.kernel.org # 4.8+
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: don't allow "write behind" with raid4/5/6 · affa9d28
      Heinz Mauelshagen committed
      Remove CTR_FLAG_MAX_WRITE_BEHIND from raid4/5/6's valid ctr flags.
      
      Only the md raid1 personality supports setting a maximum number
      of "write behind" write IOs on any legs set to "write mostly".
      "write mostly" enhances throughput with slow links/disks.
      
      Technically the "write behind" value is a write intent bitmap
      property only being respected by the raid1 personality.  It allows a
      maximum number of "write behind" writes to any "write mostly" raid1
      mirror legs to be delayed and avoids reads from such legs.
      
      No other MD personalities supported via dm-raid make use of "write
      behind", thus setting this property is superfluous; it wouldn't cause
      harm but it is correct to reject it.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
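      Per-level validation of constructor flags can be sketched as a mask
      check.  The bit values, the enum raid_level type, CTR_FLAG_REGION_SIZE
      as an example of a commonly valid flag, and both helpers are assumptions
      for illustration.

```c
#include <assert.h>

enum raid_level {
    LEVEL_RAID1 = 1,
    LEVEL_RAID4 = 4,
    LEVEL_RAID5 = 5,
    LEVEL_RAID6 = 6,
};

/* Bit values are illustrative assumptions. */
enum {
    CTR_FLAG_REGION_SIZE      = 1u << 0, /* example flag valid for all levels */
    CTR_FLAG_MAX_WRITE_BEHIND = 1u << 1, /* only meaningful for raid1 */
};

/* Flags accepted for a given level in this model. */
static unsigned int valid_ctr_flags(enum raid_level level)
{
    unsigned int valid = CTR_FLAG_REGION_SIZE;

    if (level == LEVEL_RAID1)
        valid |= CTR_FLAG_MAX_WRITE_BEHIND;
    return valid;
}

/* Returns 0 if every requested flag is valid for the level, else -1. */
static int check_ctr_flags(enum raid_level level, unsigned int requested)
{
    return (requested & ~valid_ctr_flags(level)) ? -1 : 0;
}
```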
  13. 21 Nov, 2016 1 commit
    • dm raid: correct error messages on old metadata validation · 453c2a89
      Heinz Mauelshagen committed
      When target 1.9.1 gets takeover/reshape requests on devices with old superblock
      format not supporting such conversions and rejects them in super_init_validation(),
      it logs a bogus error message (e.g. Reshape when a takeover is requested).
      
      While at it, add messages for disk adding/removing and stripe sectors
      reshape requests, use the newer rs_{takeover,reshape}_requested() API,
      address a raid10 false positive in checking array positions and
      remove rs_set_new() because device members are already set properly.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  14. 18 Oct, 2016 1 commit
  15. 12 Oct, 2016 1 commit
  16. 17 Aug, 2016 4 commits
    • dm raid: support raid0 with missing metadata devices · 9e7d9367
      Heinz Mauelshagen committed
      The raid0 MD personality does not start a raid0 array with any of its
      data devices missing.
      
      dm-raid was removing data/metadata device pairs unconditionally if it
      failed to read a superblock off the respective metadata device of such
      pair, resulting in failure to start arrays with the raid0 personality.
      
      Avoid removing any data/metadata device pairs in case of raid0
      (e.g. lvm2 segment type 'raid0_meta') thus allowing MD to start the
      array.
      
      Also, avoid region size validation for raid0.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: enhance attempt_restore_of_faulty_devices() to support more devices · a3c06a38
      Heinz Mauelshagen committed
      attempt_restore_of_faulty_devices() is limited to 64 devices when it should
      support the new maximum of 253 when identifying any failed devices.  It clears
      any revivable devices via an MD personality hot remove and add cycle to allow
      for their recovery.
      
      Address by using existing functions to retrieve and update all failed
      devices' bitfield members in the dm raid superblocks on all RAID devices
      and check for any devices to clear in it.
      
      While at it, don't call attempt_restore_of_faulty_devices() for any MD
      personality not providing disk hot add/remove methods (i.e. raid0 now),
      because such personalities don't support reviving of failed disks.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
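      Tracking up to 253 failed devices needs more than one 64-bit word.  The
      sketch below spreads the bitfield over an array of words.  struct
      failed_devices and the helper names are hypothetical; the on-disk
      superblock layout is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RAID_DEVICES 253
#define FAILED_WORDS ((MAX_RAID_DEVICES + 63) / 64) /* 64-bit words needed */

/* Hypothetical failed-devices bitfield covering every possible device. */
struct failed_devices {
    uint64_t word[FAILED_WORDS];
};

static void mark_failed(struct failed_devices *f, unsigned int dev)
{
    assert(f != NULL && dev < MAX_RAID_DEVICES);
    f->word[dev / 64] |= UINT64_C(1) << (dev % 64);
}

static void clear_failed(struct failed_devices *f, unsigned int dev)
{
    assert(f != NULL && dev < MAX_RAID_DEVICES);
    f->word[dev / 64] &= ~(UINT64_C(1) << (dev % 64));
}

static bool is_failed(const struct failed_devices *f, unsigned int dev)
{
    assert(f != NULL && dev < MAX_RAID_DEVICES);
    return (f->word[dev / 64] >> (dev % 64)) & 1;
}
```

      A single 64-bit word could only describe devices 0 to 63, which matches
      the old limit the commit message mentions.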
    • dm raid: fix restoring of failed devices regression · 31e10a41
      Heinz Mauelshagen committed
      'lvchange --refresh RaidLV' causes a mapped device suspend/resume cycle
      aiming at device restore and resync after transient device failures.  This
      failed because flag RT_FLAG_RS_RESUMED was always cleared in the suspend path,
      thus the device restore wasn't performed in the resume path.
      
      Solve by removing RT_FLAG_RS_RESUMED from the suspend path and resume
      unconditionally.  Also, remove superfluous comment from raid_resume().
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: fix frozen recovery regression · a4423287
      Heinz Mauelshagen committed
      On LVM2 conversions via lvconvert(8), the target keeps mapped devices in
      frozen state when requesting RAID devices be resynchronized.  This
      applies to e.g. adding legs to a raid1 device or taking over from raid0
      to raid4 when the rebuild flag's set on the new raid1 legs or the added
      dedicated parity stripe.
      
      Also, fix frozen recovery for reshaping as well.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  17. 04 Aug, 2016 2 commits
  18. 03 Aug, 2016 1 commit
  19. 19 Jul, 2016 6 commits