1. 26 Feb 2014 (1 commit)
  2. 25 Feb 2014 (1 commit)
    • dm thin: fix the error path for the thin device constructor · 1acacc07
      Committed by Mike Snitzer
      dm_pool_close_thin_device() must be called if dm_set_target_max_io_len()
      fails in thin_ctr().  Otherwise __pool_destroy() will fail because the
      pool will still have an open thin device:
      
       device-mapper: thin metadata: attempt to close pmd when 1 device(s) are still open
       device-mapper: thin: __pool_destroy: dm_pool_metadata_close() failed.
      
      Also, an error code must be established when failing thin_ctr() because
      the pool is in fail_io mode.
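      A minimal sketch of the fixed error path in thin_ctr() (label names and
      surrounding code are illustrative, not necessarily those of the patch):

	if (get_pool_mode(tc->pool) == PM_FAIL) {
		ti->error = "Couldn't open thin device, Pool is in fail mode";
		r = -EINVAL;			/* error code was previously left unset */
		goto bad_thin_open;
	}

	r = dm_set_target_max_io_len(ti, tc->pool->sectors_per_block);
	if (r)
		goto bad_target_max_io_len;	/* new: must close the thin device */
	...

      bad_target_max_io_len:
	dm_pool_close_thin_device(tc->td);	/* previously skipped on this path */
      bad_thin_open:
	__pool_dec(tc->pool);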
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
  3. 18 Feb 2014 (5 commits)
    • dm raid1: fix immutable biovec related BUG when retrying read bio · f3a44fe0
      Committed by Mikulas Patocka
      When restoring bi_end_io, increase bi_remaining before retrying the bio
      to avoid BUG_ON(atomic_read(&bio->bi_remaining) <= 0) in bio_endio().
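      The retry path, close to the mainline change (mirror_end_io() in
      drivers/md/dm-raid1.c; simplified):

	dm_bio_restore(bd, bio);
	bio_record->details.bi_bdev = NULL;
	atomic_inc(&bio->bi_remaining);	/* bio_endio() already consumed one count */

	queue_bio(ms, bio, rw);
	return DM_ENDIO_INCOMPLETE;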
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm io: fix I/O to multiple destinations · d73f9907
      Committed by Mikulas Patocka
      Commit 003b5c57 ("block: Convert drivers
      to immutable biovecs") broke dm-mirror due to dm-io breakage.
      
      dm-io had three possible iterators (DM_IO_PAGE_LIST, DM_IO_BVEC,
      DM_IO_VMA) that iterate over pages where the I/O should be performed.
      
      The switch to immutable biovecs changed the DM_IO_BVEC iterator to
      DM_IO_BIO.  Before this change the iterator stored the pointer to a bio
      vector in the dpages structure.  The iterator incremented the pointer in
      the dpages structure as it advanced over the pages.  After the immutable
      biovecs change, the DM_IO_BIO iterator stores a pointer to the bio in
      the dpages structure and uses bio_advance to change the bio as it
      advances.
      
      The problem is that the function dispatch_io stores the content of the
      dpages structure into the variable old_pages and restores it before
      issuing I/O to each of the devices.  Before the change, the statement
      "*dp = old_pages;" restored the iterator to its starting position.
      After the change, struct dpages holds a pointer to the bio, thus the
      statement "*dp = old_pages;" doesn't restore the iterator.
      
      Consequently, in the context of dm-mirror, only the first mirror leg is
      written correctly; the kernel locks up when trying to write the other
      mirror legs because the number of sectors to write in the where->count
      variable doesn't match the number of sectors returned by the iterator.
      
      This patch fixes the bug by partially reverting the original patch - it
      changes the code so that struct dpages again holds a pointer to the bio
      vector, and the statement "*dp = old_pages;" restores the iterator
      correctly.
      
      The field "context_u" holds the offset from the beginning of the current
      bio vector entry, just like the "bio->bi_iter.bi_bvec_done" field.
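      In outline, dispatch_io() does the following; whether the restore works
      depends entirely on what struct dpages holds (simplified sketch):

	struct dpages old_pages = *dp;

	for (i = 0; i < num_regions; i++) {
		*dp = old_pages;	/* rewinds only if dpages points at the
					 * bio vector, not at the bio itself */
		if (where[i].count || (rw & REQ_FLUSH))
			do_region(rw, i, where + i, dp, io);
	}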
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm thin: avoid metadata commit if a pool's thin devices haven't changed · 4d1662a3
      Committed by Mike Snitzer
      Commit 905e51b3 ("dm thin: commit outstanding data every second")
      introduced a periodic commit.  This commit occurs regardless of whether
      any thin devices have made changes.
      
      Fix the periodic commit to check if any of a pool's thin devices have
      changed using dm_pool_changed_this_transaction().
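      In process_deferred_bios() the periodic commit is now gated roughly like
      this (sketch; helper names as used by dm-thin around this time):

	if (bio_list_empty(&bios) &&
	    !(dm_pool_changed_this_transaction(pool->pmd) &&
	      need_commit_due_to_time(pool)))
		return;		/* nothing new to commit */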
      Reported-by: Alexander Larsson <alexl@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
    • dm cache: do not add migration to completed list before unhooking bio · 80ae49aa
      Committed by Mike Snitzer
      When completing an overwrite bio, in overwrite_endio(), the associated
      migration should not be added to the 'completed_migrations' until the
      bio's fields are restored with dm_unhook_bio().
      
      Otherwise, do_worker() can race to process 'completed_migrations' before
      dm_unhook_bio() -- so the bio's bi_end_io is incorrect.  This is
      unlikely to cause any problems given the current code but should be
      fixed on the basis of correctness.
      
      Also, the cache's spinlock only needs to be held when manipulating the
      'completed_migrations' list -- other changes don't need protection.
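      In outline, the fixed completion handler (simplified from
      drivers/md/dm-cache-target.c; some fields elided):

	static void overwrite_endio(struct bio *bio, int err)
	{
		struct dm_cache_migration *mg = bio->bi_private;
		struct cache *cache = mg->cache;
		size_t pb_data_size = get_per_bio_data_size(cache);
		struct per_bio_data *pb = get_per_bio_data(bio, pb_data_size);
		unsigned long flags;

		dm_unhook_bio(&pb->hook_info, bio);	/* restore bi_end_io first */

		if (err)
			mg->err = true;

		spin_lock_irqsave(&cache->lock, flags);	/* lock only the list update */
		list_add_tail(&mg->list, &cache->completed_migrations);
		spin_unlock_irqrestore(&cache->lock, flags);

		wake_worker(cache);
	}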
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
    • dm cache: move hook_info into common portion of per_bio_data structure · c6eda5e8
      Committed by Mike Snitzer
      Commit c9d28d5d ("dm cache: promotion optimisation for writes")
      incorrectly placed the 'hook_info' member in the writethrough-only
      portion of the per_bio_data structure.
      
      Given that the overwrite optimization may be used for writeback, the
      'hook_info' member must be placed above the 'cache' member of the
      per_bio_data structure.  Any members above 'cache' are available from
      both writeback and writethrough modes' per_bio_data structure.
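      Roughly, the corrected layout (sketch of the struct; field list
      abbreviated):

	struct per_bio_data {
		bool tick:1;
		unsigned req_nr:2;
		struct dm_deferred_entry *all_io_entry;
		struct dm_hook_info hook_info;	/* moved above 'cache': valid in both modes */

		/* writethrough-only fields; invalid in writeback mode */
		struct cache *cache;
		dm_cblock_t cblock;
		struct dm_bio_details bio_details;
	};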
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org # 3.13+
  4. 13 Feb 2014 (1 commit)
    • md/raid5: Fix CPU hotplug callback registration · 789b5e03
      Committed by Oleg Nesterov
      Subsystems that want to register CPU hotplug callbacks, as well as perform
      initialization for the CPUs that are already online, often do it as shown
      below:
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	put_online_cpus();
      
      This is wrong, since it is prone to ABBA deadlocks involving the
      cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
      with CPU hotplug operations).
      
      Interestingly, the raid5 code can actually prevent double initialization and
      hence can use the following simplified form of callback registration:
      
      	register_cpu_notifier(&foobar_cpu_notifier);
      
      	get_online_cpus();
      
      	for_each_online_cpu(cpu)
      		init_cpu(cpu);
      
      	put_online_cpus();
      
      A hotplug operation that occurs between registering the notifier and calling
      get_online_cpus() won't disrupt anything, because the code takes care to
      perform the memory allocations only once.
      
      So reorganize the code in raid5 this way to fix the deadlock with callback
      registration.
      
      Cc: linux-raid@vger.kernel.org
      Cc: stable@vger.kernel.org (v2.6.32+)
      Fixes: 36d1c647
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      [Srivatsa: Fixed the unregister_cpu_notifier() deadlock, added the
      free_scratch_buffer() helper to condense code further and wrote the changelog.]
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  5. 11 Feb 2014 (1 commit)
  6. 05 Feb 2014 (1 commit)
    • md/raid1: restore ability for check and repair to fix read errors. · 1877db75
      Committed by NeilBrown
      commit 30bc9b53
          md/raid1: fix bio handling problems in process_checks()
      
      moved the bio_reset() to a point before where BIO_UPTODATE is checked,
      so that the check now always reports that the bio is uptodate, even if
      it is not.
      
      This causes process_checks() to sometimes treat read errors as
      successful matches so the good data isn't written out.
      
      This patch preserves the flag until it is needed.
      
      Bug was introduced in 3.11, but backported to 3.10-stable (as it fixed
      an even worse bug).  So suitable for any -stable since 3.10.
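      The fix, in outline (process_checks() in drivers/md/raid1.c; 'b' is the
      bio being reused; simplified):

	int uptodate = test_bit(BIO_UPTODATE, &b->bi_flags);

	bio_reset(b);			/* unconditionally sets BIO_UPTODATE */
	if (!uptodate)
		clear_bit(BIO_UPTODATE, &b->bi_flags);	/* re-assert the read error */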
      Reported-and-tested-by: Michael Tokarev <mjt@tls.msk.ru>
      Cc: stable@vger.kernel.org (3.10+)
      Fixes: 30bc9b53
      Signed-off-by: NeilBrown <neilb@suse.de>
  7. 30 Jan 2014 (3 commits)
  8. 22 Jan 2014 (3 commits)
    • dm log userspace: allow mark requests to piggyback on flush requests · 5066a4df
      Committed by Dongmao Zhang
      In a cluster environment, cluster writes have poor performance because
      userspace_flush() has to contact a userspace program (cmirrord) for
      clear/mark/flush requests.  Both mark and flush requests require
      cmirrord to communicate the message to all the cluster nodes for each
      flush call.  This behaviour is really slow.
      
      To address this we now merge mark and flush requests together to reduce
      the kernel-userspace-kernel time.  We allow a new directive,
      "integrated_flush", that can be used to instruct the kernel log code to
      combine flush and mark requests when directed by userspace.  If not
      directed by userspace (due to an older version of the userspace code
      perhaps), the kernel will function as it did previously - preserving
      backwards compatibility.  Additionally, flush requests are performed
      lazily when only clear requests exist.
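      A rough sketch of the new flush logic (helper names and signatures are
      illustrative, based on the description above, not the exact patch):

	if (!lc->integrated_flush) {
		/* old behaviour: send the marks, then a separate flush */
		r = flush_by_group(lc, &mark_list, 0);
		if (!r)
			r = userspace_do_request(lc, lc->uuid, DM_ULOG_FLUSH,
						 NULL, 0, NULL, NULL);
	} else {
		/* piggyback: one message carries both the marks and the flush */
		r = flush_by_group(lc, &mark_list, 1);
	}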
      Signed-off-by: Dongmao Zhang <dmzhang@suse.com>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • md/raid5: close recently introduced race in stripe_head management. · 7da9d450
      Committed by NeilBrown
      As release_stripe and __release_stripe decrement ->count and then
      manipulate ->lru both under ->device_lock, it is important that
      get_active_stripe() increments ->count and clears ->lru also under
      ->device_lock.
      
      However we currently list_del_init ->lru under the lock, but increment
      the ->count outside the lock.  This can lead to races and list
      corruption.
      
      So move the atomic_inc(&sh->count) up inside the ->device_lock
      protected region.
      
      Note that we still increment ->count without device lock in the case
      where get_free_stripe() was called, and in fact don't take
      ->device_lock at all in that path.
      This is safe because if the stripe_head can be found by
      get_free_stripe(), then the hash lock assures us that no-one else could
      possibly be calling release_stripe() at the same time.
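      The fixed section of get_active_stripe(), in outline (simplified):

	spin_lock(&conf->device_lock);
	if (!atomic_read(&sh->count)) {
		if (!test_bit(STRIPE_HANDLE, &sh->state))
			atomic_inc(&conf->active_stripes);
		list_del_init(&sh->lru);
	}
	atomic_inc(&sh->count);		/* now inside ->device_lock */
	spin_unlock(&conf->device_lock);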
      
      Fixes: 566c09c5
      Cc: stable@vger.kernel.org (3.13)
      Reported-and-tested-by: Ian Kumlien <ian.kumlien@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • dm space map metadata: fix bug in resizing of thin metadata · fca02843
      Committed by Joe Thornber
      This bug was introduced in commit 7e664b3d ("dm space map metadata:
      fix extending the space map").
      
      When extending a dm-thin metadata volume we:
      
      - Switch the space map into a simple bootstrap mode, which allocates
        all space linearly from the newly added space.
      - Add new bitmap entries for the new space
      - Increment the reference counts for those newly allocated bitmap
        entries
      - Commit changes to disk
      - Switch back out of bootstrap mode.
      
      However, the disk commit may itself allocate space; if so, this fact is
      lost when switching out of bootstrap mode.
      
      The bug exhibited itself as an error when the bitmap_root, with an
      erroneous ref count of 0, was subsequently decremented as part of a
      later disk commit.  This would cause the disk commit to fail, and thinp
      to enter read_only mode.  The metadata was not damaged (thin_check
      passed).
      
      The fix is to put the increments and the commit into a loop, repeating
      until a commit allocates no extra space.  In practice this loop only
      runs twice.
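      The shape of the fix (sm_metadata_extend(), heavily simplified):

	do {
		for (i = old_len; !r && i < smm->begin; i++)
			r = sm_ll_inc(&smm->ll, i, &ev);	/* ref the new entries */
		old_len = smm->begin;

		r = sm_metadata_commit(smm);	/* may itself allocate blocks */
	} while (!r && old_len != smm->begin);	/* until a commit allocates nothing */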
      
      With this fix the following device mapper testsuite test passes:
       dmtest run --suite thin-provisioning -n thin_remove_works_after_resize
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # depends on commit 7e664b3d
  9. 17 Jan 2014 (1 commit)
    • dm cache: add policy name to status output · 2e68c4e6
      Committed by Mike Snitzer
      The cache's policy may have been established using the "default" alias,
      which is currently the "mq" policy but the default policy may change in
      the future.  It is useful to know exactly which policy is being used.
      
      Add a 'real' member to the dm_cache_policy_type structure and have the
      "default" dm_cache_policy_type point to the real "mq"
      dm_cache_policy_type.  Update dm_cache_policy_get_name() to check
      whether 'real' is set and, if so, report the name of the real policy
      (not the alias).
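      A sketch of the change (struct and lookup simplified; some fields
      abbreviated):

	struct dm_cache_policy_type {
		struct list_head list;
		char name[CACHE_POLICY_NAME_SIZE];
		unsigned version[CACHE_POLICY_VERSION_SIZE];

		/* set only on an alias such as "default"; points at the real policy */
		struct dm_cache_policy_type *real;

		struct module *owner;
	};

	const char *dm_cache_policy_get_name(struct dm_cache_policy *p)
	{
		struct dm_cache_policy_type *t = p->private;

		if (t->real)
			t = t->real;	/* report the real policy, not the alias */

		return t->name;
	}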
      Requested-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  10. 16 Jan 2014 (3 commits)
    • dm thin: fix pool feature parsing · 74aa45c3
      Committed by Mike Snitzer
      Commit 787a996c ("dm thin: add error_if_no_space feature")
      mistakenly forgot to increase the number of feature args supported.
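      The fix is a one-liner in the pool feature argument table (sketch):

	static struct dm_arg _args[] = {
		{0, 4, "Invalid number of pool feature arguments"},	/* was {0, 3, ...} */
	};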
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • md/raid5: fix long-standing problem with bitmap handling on write failure. · 9f97e4b1
      Committed by NeilBrown
      Before a write starts we set a bit in the write-intent bitmap.
      When the write completes we clear that bit if the write was successful
      to all devices.  However if the write wasn't fully successful we
      should not clear the bit.  If the faulty drive is subsequently
      re-added, the fact that the bit is still set ensures that we will
      re-write the data that is missing.
      
      This logic is mediated by the STRIPE_DEGRADED flag - we only clear the
      bitmap bit when this flag is not set.
      Currently we correctly set the flag if a write starts when some
      devices are failed or missing.  But we do *not* set the flag if some
      device failed during the write attempt.
      This is wrong and can result in clearing the bit inappropriately.
      
      So: set the flag when a write fails.
      
      This bug has been present since bitmaps were introduced, so the fix is
      suitable for any -stable kernel.
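      The essence of the fix (write-error path in raid5_end_write_request(),
      sketch):

	if (!uptodate) {
		set_bit(STRIPE_DEGRADED, &sh->state);	/* keep the bitmap bit set */
		set_bit(WriteErrorSeen, &rdev->flags);
	}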
      Reported-by: Ethan Wilson <ethan.wilson@shiftmail.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: check command validity early in md_ioctl(). · cb335f88
      Committed by Nicolas Schichan
      Verify that the cmd parameter passed to md_ioctl() is valid before
      doing anything.
      
      This fixes mddev->hold_active being set to 0 when an invalid ioctl
      command is passed to md_ioctl() before the array has been configured.
      
      Clearing mddev->hold_active in that case can lead to a livelock
      situation when an invalid ioctl number is given to md_ioctl() by a
      process when the mddev is currently being opened by another process:
      
      Process 1				Process 2
      ---------				---------
      
      md_alloc()
        mddev_find()
        -> returns a new mddev with
           hold_active == UNTIL_IOCTL
        add_disk()
        -> sends KOBJ_ADD uevent
      
					(sees KOBJ_ADD uevent for device)
					md_open()
					md_ioctl(INVALID_IOCTL)
					-> returns ENODEV and clears
					   mddev->hold_active
					md_release()
					  md_put()
					  -> deletes the mddev as
					     hold_active is 0
      
      md_open()
        mddev_find()
        -> returns a newly
          allocated mddev with
          mddev->gendisk == NULL
      -> returns with ERESTARTSYS
         (kernel restarts the open syscall)
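      The shape of the added check (sketch; the accepted-command list is
      abridged):

	static int md_ioctl_valid(unsigned int cmd)
	{
		switch (cmd) {
		case ADD_NEW_DISK:
		case GET_ARRAY_INFO:
		case GET_DISK_INFO:
		case RUN_ARRAY:
		/* ... every other recognised md ioctl ... */
			return 0;
		default:
			return -ENOTTY;
		}
	}

	/* first thing in md_ioctl(), before hold_active can be cleared */
	err = md_ioctl_valid(cmd);
	if (err)
		return err;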
      Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
      Signed-off-by: NeilBrown <neilb@suse.de>
  11. 15 Jan 2014 (5 commits)
    • dm sysfs: fix a module unload race · 2995fa78
      Committed by Mikulas Patocka
      This reverts commit be35f486 ("dm: wait until embedded kobject is
      released before destroying a device") and provides an improved fix.
      
      The kobject release code that calls the completion must be placed in a
      non-module file, otherwise there is a module unload race (if the process
      calling dm_kobject_release is preempted and the DM module unloaded after
      the completion is triggered, but before dm_kobject_release returns).
      
      To fix this race, this patch moves the completion code to dm-builtin.c
      which is always compiled directly into the kernel if BLK_DEV_DM is
      selected.
      
      The patch introduces a new dm_kobject_holder structure, its purpose is
      to keep the completion and kobject in one place, so that it can be
      accessed from non-module code without the need to export the layout of
      struct mapped_device to that code.
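      The new holder and the built-in release hook, close to the actual patch:

	struct dm_kobject_holder {
		struct kobject kobj;
		struct completion completion;
	};

	static inline struct completion *dm_get_completion_from_kobject(struct kobject *kobj)
	{
		return &container_of(kobj, struct dm_kobject_holder, kobj)->completion;
	}

	/* dm-builtin.c: compiled into the kernel, so it can never be unloaded */
	void dm_kobject_release(struct kobject *kobj)
	{
		complete(dm_get_completion_from_kobject(kobj));
	}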
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • dm snapshot: use dm-bufio prefetch · 55b082e6
      Committed by Mikulas Patocka
      This patch modifies dm-snapshot so that it prefetches the buffers when
      loading the exceptions.
      
      The number of buffers read ahead is specified in the DM_PREFETCH_CHUNKS
      macro.  The current value for DM_PREFETCH_CHUNKS (12) was found to
      provide the best performance on a single 15k SCSI spindle.  In the
      future we may modify this default or make it configurable.
      
      Also, introduce the function dm_bufio_set_minimum_buffers to set the
      minimum number of internal buffers that bufio keeps before it starts
      freeing them.  dm-bufio may
      hold more buffers if enough memory is available.  There is no guarantee
      that the specified number of buffers will be available - if you need a
      guarantee, use the argument reserved_buffers for
      dm_bufio_client_create.
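      How a client might use the new call together with prefetch (values
      illustrative; calls as declared in drivers/md/dm-bufio.h):

	#define DM_PREFETCH_CHUNKS	12

	/* keep at least this many buffers before bufio starts freeing */
	dm_bufio_set_minimum_buffers(client, DM_PREFETCH_CHUNKS * 2);

	dm_bufio_prefetch(client, chunk, DM_PREFETCH_CHUNKS);	/* async read-ahead */
	area = dm_bufio_read(client, chunk, &bp);		/* likely already cached */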
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm snapshot: use dm-bufio · 55494bf2
      Committed by Mikulas Patocka
      Use dm-bufio for initial loading of the exceptions.
      Introduce a new function dm_bufio_forget that frees the given buffer.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm snapshot: prepare for switch to using dm-bufio · 2cadabd5
      Committed by Mikulas Patocka
      Change the functions get_exception, read_exception and insert_exceptions
      so that ps->area is passed as an argument.
      
      This patch doesn't change any functionality, but it refactors the code
      to allow for a cleaner switch over to using dm-bufio.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm snapshot: use GFP_KERNEL when initializing exceptions · 119bc547
      Committed by Mikulas Patocka
      The list of initial exceptions is loaded in the target constructor.  We
      are allowed to allocate memory with GFP_KERNEL at this point.  So,
      change alloc_completed_exception to use GFP_KERNEL when being called
      from the constructor.
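      This is essentially the whole fix (sketch):

	static struct dm_exception *alloc_completed_exception(gfp_t gfp)
	{
		struct dm_exception *e;

		e = kmem_cache_alloc(exception_cache, gfp);
		if (!e && gfp == GFP_NOIO)
			e = kmem_cache_alloc(exception_cache, GFP_ATOMIC);

		return e;
	}

	/* constructor path: sleeping allocations are fine here */
	e = alloc_completed_exception(GFP_KERNEL);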
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  12. 14 Jan 2014 (10 commits)
    • md: ensure metadata is written after raid level change. · 830778a1
      Committed by NeilBrown
      level_store() currently does not make sure the metadata is
      updated to reflect the new raid level.  It simply sets MD_CHANGE_DEVS.
      
      Any level with a ->thread will quickly notice this and update the
      metadata.  However RAID0 and Linear do not have a thread so no
      metadata update happens until the array is stopped.  At that point the
      metadata is written.
      
      This is later than we would like.  While the delay doesn't risk any
      data it can cause confusion.  So if there is no md thread, immediately
      update the metadata after a level change.
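      The fix, in outline (end of level_store() in drivers/md/md.c; sketch):

	set_bit(MD_CHANGE_DEVS, &mddev->flags);
	if (!mddev->thread)
		md_update_sb(mddev, 1);	/* no md thread to write it out for us */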
      Reported-by: Richard Michael <rmichael@edgeofthenet.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: avoid fullsync when not necessary. · 0b59bb64
      Committed by NeilBrown
      This is the raid10 equivalent of
      
      commit 4f0a5e01
          MD RAID1: Further conditionalize 'fullsync'
      
      If a device in a newly assembled array is not fully recovered we
      currently do a full resync, but we don't need to.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: allow a partially recovered device to be hot-added to an array. · 7eb41885
      Committed by NeilBrown
      When adding a new device into an array it is normally important to
      clear any stale data from ->recovery_offset else the new device may
      not be recovered properly.
      
      However when re-adding a device which is known to be nearly in-sync,
      this is not needed and can be detrimental.  The (bitmap-based)
      resync will still happen, and further recovery is only needed from
      where-ever it was already up to.
      
      So if save_raid_disk is set, signifying a re-add, don't clear
      ->recovery_offset.
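      The change, in outline (device-add path in drivers/md/md.c; sketch):

	if (rdev->saved_raid_disk < 0)
		rdev->recovery_offset = 0;	/* fresh device: recover everything */
	/* else: a re-add; keep ->recovery_offset for the bitmap-based resync */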
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: Change handling of save_raid_disk and metadata update during recovery. · f466722c
      Committed by NeilBrown
      Since commit d70ed2e4
         MD: Allow restarting an interrupted incremental recovery.
      
      we don't write out the metadata to devices while they are recovering.
      This had a good reason, but has unfortunate consequences.  This patch
      changes things to make them work better.
      
      At issue is what happens if the array is shut down while a recovery is
      happening, particularly a bitmap-guided recovery.
      Ideally the recovery should pick up where it left off.
      However the metadata cannot represent the state "A recovery is in
      process which is guided by the bitmap".
      
      Before the above mentioned commit, we wrote metadata to the device
      which said "this is being recovered and it is up to <here>".  So after
      a restart, a full recovery (not bitmap-guided) would happen from
      where-ever it was up to.
      
      After the commit the metadata wasn't updated so it still said "This
      device is fully in sync with <this> event count".  That leads to a
      bitmap-based recovery following the whole bitmap, which should be a
      lot less work than a full recovery from some starting point.  So this
      was an improvement.
      
      However, updating some metadata but not all of it leads to other
      problems.  In particular, the metadata written to the fully-up-to-date
      device records that the array has all devices present (even though some
      are recovering).  So on restart, mdadm wants to find all devices and
      expects them to have current event counts.
      Obviously it doesn't (some have old event counts) so (when assembling
      with --incremental) it waits indefinitely for the rest of the expected
      devices.
      
      It really is wrong not to update all the metadata together; doing a
      partial update is bound to cause confusion.
      Instead, we should make it possible to record the truth in the
      metadata.  i.e. we need to be able to record that a device is being
      recovered based on the bitmap.
      We already have a Feature flag to say that recovery is happening.  We
      now add another one to say that it is a bitmap-based recovery.
      
      With this we can remove the code that disables the write-out of
      metadata on some devices.
      
      So this patch:
       - moves the setting of 'saved_raid_disk' from add_new_disk to
         the validate_super methods.  This makes sure it is always set
         properly, both when adding a new device to an array, and when
         assembling an array from a collection of devices.
       - Adds a metadata flag MD_FEATURE_RECOVERY_BITMAP which is only
         used if MD_FEATURE_RECOVERY_OFFSET is set, and records that a
         bitmap-based recovery is allowed (see the sketch after this list).
         This is only present in v1.x metadata. v0.90 doesn't support
         devices which are in the middle of recovery at all.
       - Only skips writing metadata to Faulty devices.
      
       - Also allows rdev state to be set to "-insync" via sysfs.
         This can be used for external-metadata arrays.  When the
         'role' is set the device is assumed to be in-sync.  If, after
         setting the role, we set the state to "-insync", the role is
         moved to saved_raid_disk which effectively says the device is
         partly in-sync with that slot and needs a bitmap recovery.
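      A sketch of the new flag and its use when validating a v1.x superblock
      (flag value as in the mainline patch; surrounding logic simplified):

	/* md_p.h */
	#define MD_FEATURE_RECOVERY_BITMAP	128

	/* super_1_validate(), sketch */
	rdev->saved_raid_disk = role;
	if (sb->feature_map & cpu_to_le32(MD_FEATURE_RECOVERY_OFFSET)) {
		rdev->recovery_offset = le64_to_cpu(sb->recovery_offset);
		if (!(sb->feature_map &
		      cpu_to_le32(MD_FEATURE_RECOVERY_BITMAP)))
			rdev->saved_raid_disk = -1;	/* not bitmap-guided */
	}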
      
      Cc: Andrei Warkentin <andreiw@vmware.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: fix problem when adding device to read-only array with bitmap. · 8313b8e5
      Committed by NeilBrown
      If an array is started degraded, and then the missing device
      is found it can be re-added and a minimal bitmap-based recovery
      will bring it fully up-to-date.
      
      If the array is read-only a recovery would not be allowed.
      But also if the array is read-only and the missing device was
      present very recently, then there could be no need for any
      recovery at all, so we simply include the device in the read-only
      array without any recovery.
      
      However... if the missing device was removed a little longer ago
      it could be missing some updates, but if a bitmap is present it will
      be conditionally accepted pending a bitmap-based update.  We don't
      currently detect this case properly and will include that old
      device in the read-only array with no recovery even though it really
      needs a recovery.
      
      This patch keeps track of whether a bitmap-based-recovery is really
      needed or not in the new Bitmap_sync rdev flag.  If that is set,
      then the device will not be added to a read-only array.
      
      Cc: Andrei Warkentin <andreiw@vmware.com>
      Fixes: d70ed2e4
      Cc: stable@vger.kernel.org (3.2+)
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: fix bug when raid10 recovery fails to recover a block. · e8b84915
      Committed by NeilBrown
      commit e875ecea
          md/raid10 record bad blocks as needed during recovery.
      
      added code to the "cannot recover this block" path to record a bad
      block rather than fail the whole recovery.
      Unfortunately this new case was placed *after* the r10bio was freed
      rather than *before*, yet it still uses the r10bio.  This will crash
      with a NULL dereference.
      
      So move the freeing of r10bio down where it is safe.
      
      Cc: stable@vger.kernel.org (v3.1+)
      Fixes: e875ecea
      Reported-by: Damian Nowak <spam@nowaker.net>
      URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: fix a recently broken BUG_ON(). · 5af9bef7
      Committed by NeilBrown
      commit 6d183de4
          md/raid5: fix newly-broken locking in get_active_stripe.
      
      simplified a BUG_ON, but removed too much so now it sometimes fires
      when it shouldn't.
      
      When the STRIPE_EXPANDING flag is set, the stripe_head might be on a
      special list while multiple stripe_heads are collected, or it might
      not be on any list, even a 'free' list when the refcount is zero.  As
      long as STRIPE_EXPANDING is set, it will be found and added back to a
      list eventually.
      
      So both of the BUG_ONs which test for the ->lru being empty or not
      need to avoid the case where STRIPE_EXPANDING is set.
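      Both checks now take the form (per the description above; shown here for
      the empty-list case):

	BUG_ON(list_empty(&sh->lru) &&
	       !test_bit(STRIPE_EXPANDING, &sh->state));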
      
      The patch which broke this was marked for -stable, so this patch needs
      to be applied to any branch that received 6d183de4
      
      Fixes: 6d183de4
      Cc: stable@vger.kernel.org (any release to which above was applied)
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1: fix request counting bug in new 'barrier' code. · 41a336e0
      Committed by NeilBrown
      The new iobarrier implementation in raid1 (which keeps normal writes
      and resync activity separate) counts every request that is not before
      the current resync point in either next_window_requests or
      current_window_requests.
      It flags that the request is counted by setting ->start_next_window.
      
      allow_barrier follows this model exactly and decrements one of the
      *_window_requests if and only if ->start_next_window is set.
      
      However wait_barrier(), which increments *_window_requests, uses a
      slightly different test for setting ->start_next_window (which is set
      from the return value of this function).
      So there is a possibility of the counts getting out of sync, and this
      leads to the resync hanging.
      
      So change wait_barrier() to return a non-zero value in exactly the
      same cases that it increments *_window_requests.
      
      The bug was introduced in 3.13-rc1.
      Reported-by: Bruno Wolff III <bruno@wolff.to>
      URL: https://bugzilla.kernel.org/show_bug.cgi?id=68061
      Fixes: 79ef3a8a
      Cc: majianpeng <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: fix two bugs in handling of known-bad-blocks. · b50c259e
      Committed by NeilBrown
      If we discover a bad block when reading we split the request and
      potentially read some of it from a different device.
      
      The code path of this has two bugs in RAID10.
      1/ we get a spin_lock with _irq, but unlock without _irq!!
      2/ The calculation of 'sectors_handled' is wrong, as can be clearly
         seen by comparison with raid1.c
      
      This leads to at least 2 warnings and a probable crash if a RAID10
      array ever has known bad blocks.
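      In outline, the two fixes look like this (sketch; the sectors_handled
      arithmetic mirrors raid1.c):

	spin_lock_irq(&conf->device_lock);
	...
	spin_unlock_irq(&conf->device_lock);	/* bug 1: was plain spin_unlock() */

	/* bug 2: count what this submission actually covered */
	sectors_handled = r10_bio->sector + max_sectors - bio->bi_iter.bi_sector;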
      
      Cc: stable@vger.kernel.org (v3.1+)
      Fixes: 856e08e2
      Reported-by: Damian Nowak <spam@nowaker.net>
      URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: Fix possible confusion when multiple write errors occur. · 1cc03eb9
      Committed by NeilBrown
      commit 5d8c71f9
          md: raid5 crash during degradation
      
      Fixed a crash in an overly simplistic way which could leave
      R5_WriteError or R5_MadeGood set in the stripe cache for devices
      for which those flags are no longer relevant.
      When those devices are removed and spares added the flags are still
      set and can cause incorrect behaviour.
      
      commit 14a75d3e
          md/raid5: preferentially read from replacement device if possible.
      
      Fixed the same bug in a more effective way, so we can now revert
      the original commit.
      Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
      Cc: stable@vger.kernel.org (3.2+ - 3.2 will need a different fix though)
      Fixes: 5d8c71f9
      Signed-off-by: NeilBrown <neilb@suse.de>
  13. 13 Jan 2014 (1 commit)
  14. 10 Jan 2014 (2 commits)
    • dm cache: add block sizes and total cache blocks to status output · 6a388618
      Committed by Mike Snitzer
      Improve cache_status to emit:
      <metadata block size> <#used metadata blocks>/<#total metadata blocks>
      <cache block size> <#used cache blocks>/<#total cache blocks>
      ...
      
      Adding the block sizes allows for easier calculation of the overall size
      of both the metadata and cache devices.  Adding <#total cache blocks>
      provides useful context for how much of the cache is used.
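      For example, with invented values, a status line under the new format
      might begin:

       8 128/2048 512 190/8192 ...

      i.e. 8-sector metadata blocks (128 of 2048 used) and 512-sector cache
      blocks (190 of 8192 used).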
      
      Unfortunately these additions to the status will require updates to
      users' scripts that monitor the cache status.  But these changes help
      provide more comprehensive information about the cache device and will
      simplify tools that are being developed to manage dm-cache devices --
      because they won't need to issue 3 operations to cobble together the
      information that we can easily provide via a single status ioctl.
      
      While updating the status documentation in cache.txt, spaces were
      converted to tabs.
      Requested-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
    • dm btree: add dm_btree_find_lowest_key · f164e690
      Committed by Joe Thornber
      dm_btree_find_lowest_key is the reciprocal of dm_btree_find_highest_key.
      Factor out common code for dm_btree_find_{highest,lowest}_key.
      
      dm_btree_find_lowest_key is needed for an upcoming DM target, so it
      is best to get this interface in place.
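      Expected use, assuming it mirrors dm_btree_find_highest_key's calling
      convention (one result key per btree level; negative errno on failure):

	uint64_t keys[1];	/* one slot per level; this tree has one level */
	int r;

	r = dm_btree_find_lowest_key(info, root, keys);
	if (r < 0)
		return r;
	/* keys[0] now holds the lowest key in the tree */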
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  15. 09 Jan 2014 (2 commits)