1. 08 Jan, 2014 (1 commit)
  2. 07 Jan, 2014 (20 commits)
    • dm snapshot: call destroy_work_on_stack() to pair with INIT_WORK_ONSTACK() · c1a64160
      Committed by Chuansheng Liu
      When CONFIG_DEBUG_OBJECTS_WORK is defined, destroy_work_on_stack() must be
      called to free the debug object and so pair with INIT_WORK_ONSTACK().
      Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      c1a64160
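      A minimal sketch of the on-stack work pattern this pairing applies to (the
      struct and function names here are illustrative, not the dm-snapshot code):

      #include <linux/kernel.h>
      #include <linux/workqueue.h>
      #include <linux/completion.h>

      struct onstack_req {
              struct work_struct work;
              struct completion done;
      };

      static void onstack_req_fn(struct work_struct *ws)
      {
              struct onstack_req *req = container_of(ws, struct onstack_req, work);

              /* ... perform the deferred work ... */
              complete(&req->done);
      }

      static void run_onstack_work(struct workqueue_struct *wq)
      {
              struct onstack_req req;

              /* With CONFIG_DEBUG_OBJECTS_WORK=y this registers a debug object. */
              INIT_WORK_ONSTACK(&req.work, onstack_req_fn);
              init_completion(&req.done);
              queue_work(wq, &req.work);
              wait_for_completion(&req.done);
              /* The pairing call: free the debug object before the stack frame goes away. */
              destroy_work_on_stack(&req.work);
      }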
    • dm cache policy mq: introduce three promotion threshold tunables · 78e03d69
      Committed by Joe Thornber
      Internally the mq policy maintains a promotion threshold variable.  If
      the hit count of a block not in the cache goes above this threshold it
      gets promoted to the cache.
      
      This patch introduces three new tunables that allow you to tweak the
      promotion threshold by adding a small value.  These adjustments depend
      on the io type:
      
         read_promote_adjustment:    READ io, default 4
         write_promote_adjustment:   WRITE io, default 8
         discard_promote_adjustment: READ/WRITE io to a discarded block, default 1
      
      If you're trying to quickly warm a new cache device you may wish to
      reduce these to encourage promotion.  Remember to switch them back to
      their defaults after the cache fills though.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      78e03d69
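      A schematic C sketch of how such a per-IO-type adjustment could feed the
      promotion decision (illustrative only, not the mq policy's actual code;
      the constants mirror the defaults listed above):

      #include <linux/types.h>

      enum io_kind { IO_READ, IO_WRITE, IO_TO_DISCARDED_BLOCK };

      static bool should_promote(unsigned hit_count, unsigned promote_threshold,
                                 enum io_kind kind)
      {
              unsigned adjustment;

              switch (kind) {
              case IO_TO_DISCARDED_BLOCK:
                      adjustment = 1;         /* discard_promote_adjustment default */
                      break;
              case IO_WRITE:
                      adjustment = 8;         /* write_promote_adjustment default */
                      break;
              default:
                      adjustment = 4;         /* read_promote_adjustment default */
                      break;
              }

              /* Smaller adjustments lower the bar, encouraging promotion. */
              return hit_count > promote_threshold + adjustment;
      }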
    • dm thin: fix set_pool_mode exposed pool operation races · 8b64e881
      Committed by Mike Snitzer
      The pool mode must not be switched until after the corresponding pool
      process_* methods have been established.  Otherwise, because
      set_pool_mode() isn't interlocked with the IO path for performance
      reasons, the IO path can end up executing process_* operations that
      don't match the mode.  This patch eliminates problems like the following
      (as seen on really fast PCIe SSD storage when transitioning the pool's
      mode from PM_READ_ONLY to PM_WRITE):
      
      kernel: device-mapper: thin: 253:2: reached low water mark for data device: sending event.
      kernel: device-mapper: thin: 253:2: no free data space available.
      kernel: device-mapper: thin: 253:2: switching pool to read-only mode
      kernel: device-mapper: thin: 253:2: switching pool to write mode
      kernel: ------------[ cut here ]------------
      kernel: WARNING: CPU: 11 PID: 7564 at drivers/md/dm-thin.c:995 handle_unserviceable_bio+0x146/0x160 [dm_thin_pool]()
      ...
      kernel: Workqueue: dm-thin do_worker [dm_thin_pool]
      kernel: 00000000000003e3 ffff880308831cc8 ffffffff8152ebcb 00000000000003e3
      kernel: 0000000000000000 ffff880308831d08 ffffffff8104c46c ffff88032502a800
      kernel: ffff880036409000 ffff88030ec7ce00 0000000000000001 00000000ffffffc3
      kernel: Call Trace:
      kernel: [<ffffffff8152ebcb>] dump_stack+0x49/0x5e
      kernel: [<ffffffff8104c46c>] warn_slowpath_common+0x8c/0xc0
      kernel: [<ffffffff8104c4ba>] warn_slowpath_null+0x1a/0x20
      kernel: [<ffffffffa001e2c6>] handle_unserviceable_bio+0x146/0x160 [dm_thin_pool]
      kernel: [<ffffffffa001f276>] process_bio_read_only+0x136/0x180 [dm_thin_pool]
      kernel: [<ffffffffa0020b75>] process_deferred_bios+0xc5/0x230 [dm_thin_pool]
      kernel: [<ffffffffa0020d31>] do_worker+0x51/0x60 [dm_thin_pool]
      kernel: [<ffffffff81067823>] process_one_work+0x183/0x490
      kernel: [<ffffffff81068c70>] worker_thread+0x120/0x3a0
      kernel: [<ffffffff81068b50>] ? manage_workers+0x160/0x160
      kernel: [<ffffffff8106e86e>] kthread+0xce/0xf0
      kernel: [<ffffffff8106e7a0>] ? kthread_freezable_should_stop+0x70/0x70
      kernel: [<ffffffff8153b3ec>] ret_from_fork+0x7c/0xb0
      kernel: [<ffffffff8106e7a0>] ? kthread_freezable_should_stop+0x70/0x70
      kernel: ---[ end trace 3f00528e08ffa55c ]---
      kernel: device-mapper: thin: pool mode is PM_WRITE not PM_READ_ONLY like expected!?
      
      dm-thin.c:995 was the WARN_ON_ONCE(get_pool_mode(pool) != PM_READ_ONLY);
      at the top of handle_unserviceable_bio().  And as the additional
      debugging message conveys: the pool mode was _not_ PM_READ_ONLY as
      expected; it was already PM_WRITE, yet pool->process_bio was still set
      to process_bio_read_only().
      
      Also, while fixing this up, reduce logging of redundant pool mode
      transitions by checking new_mode is different from old_mode.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      8b64e881
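      A minimal sketch of the ordering the fix enforces (types and names are
      illustrative, not the actual dm-thin definitions): install the process_*
      methods for the new mode first, then publish the mode, so the unlocked IO
      path never observes a mode that disagrees with the installed methods.

      struct bio;
      enum pool_mode { PM_WRITE, PM_READ_ONLY, PM_FAIL };

      struct pool_sketch {
              enum pool_mode mode;
              void (*process_bio)(struct pool_sketch *pool, struct bio *bio);
      };

      static void process_bio_normal(struct pool_sketch *pool, struct bio *bio) { }
      static void process_bio_read_only_sketch(struct pool_sketch *pool, struct bio *bio) { }
      static void process_bio_fail_sketch(struct pool_sketch *pool, struct bio *bio) { }

      static void set_pool_mode_sketch(struct pool_sketch *pool, enum pool_mode new_mode)
      {
              if (new_mode == pool->mode)
                      return;                 /* also avoids logging redundant transitions */

              /* 1. Establish the process_* method that matches the new mode... */
              switch (new_mode) {
              case PM_WRITE:
                      pool->process_bio = process_bio_normal;
                      break;
              case PM_READ_ONLY:
                      pool->process_bio = process_bio_read_only_sketch;
                      break;
              case PM_FAIL:
                      pool->process_bio = process_bio_fail_sketch;
                      break;
              }

              /* 2. ...and only then publish the mode the unlocked IO path reads. */
              pool->mode = new_mode;
      }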
    • dm thin: eliminate the no_free_space flag · 6d16202b
      Committed by Mike Snitzer
      The pool's error_if_no_space flag can easily serve the same purpose that
      no_free_space did, namely: control whether handle_unserviceable_bio()
      will error a bio or requeue it.
      
      This is cleaner since error_if_no_space is established when the pool's
      features are processed during table load.  So it avoids managing the
      no_free_space flag by taking the pool's spinlock.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      6d16202b
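      A rough sketch of the resulting decision (illustrative; requeue_until_resume()
      is a hypothetical stand-in, not the actual dm-thin helper): error_if_no_space,
      fixed at table load, decides the fate of an unserviceable bio, so no separately
      locked no_free_space flag is needed.

      #include <linux/types.h>
      #include <linux/bio.h>

      struct pool_features_sketch {
              bool error_if_no_space;
      };

      void requeue_until_resume(struct bio *bio);     /* hypothetical helper */

      static void handle_unserviceable_bio_sketch(struct pool_features_sketch *pf,
                                                  struct bio *bio)
      {
              if (pf->error_if_no_space)
                      bio_io_error(bio);              /* fail the IO immediately */
              else
                      requeue_until_resume(bio);      /* hold it until the pool is resized/resumed */
      }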
    • dm thin: add error_if_no_space feature · 787a996c
      Committed by Mike Snitzer
      If the pool runs out of data or metadata space, the pool can either
      queue or error the IO destined to the data device.  The default is to
      queue the IO until more space is added.
      
      An admin may now configure the pool to error IO when no space is
      available by setting the 'error_if_no_space' feature when loading the
      thin-pool table.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      787a996c
    • dm thin: requeue bios to DM core if no_free_space and in read-only mode · 8c0f0e8c
      Committed by Mike Snitzer
      Now that we switch the pool to read-only mode when the data device runs
      out of space, active writers get IO errors once we resume after resizing
      the data device.
      
      If no_free_space is set, save bios to the 'retry_on_resume_list' and
      requeue them on resume (once the data or metadata device may have been
      resized).
      
      With this patch the resize_io test passes again (on slower storage):
       dmtest run --suite thin-provisioning -n /resize_io/
      
      Later patches fix some subtle races associated with the pool mode
      transitions done as part of the pool's -ENOSPC handling.  These races
      are exposed on fast storage (e.g. PCIe SSD).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      8c0f0e8c
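      A rough sketch of the deferral pattern described above, with illustrative
      names (not the actual dm-thin code): save bios on a list under a spinlock,
      then reissue them when the device is resumed.

      #include <linux/bio.h>
      #include <linux/blkdev.h>
      #include <linux/spinlock.h>

      struct thin_sketch {
              spinlock_t lock;
              struct bio_list retry_on_resume_list;
      };

      static void defer_until_resume(struct thin_sketch *tc, struct bio *bio)
      {
              unsigned long flags;

              spin_lock_irqsave(&tc->lock, flags);
              bio_list_add(&tc->retry_on_resume_list, bio);
              spin_unlock_irqrestore(&tc->lock, flags);
      }

      static void reissue_on_resume(struct thin_sketch *tc)
      {
              struct bio *bio;
              struct bio_list bios;
              unsigned long flags;

              bio_list_init(&bios);
              spin_lock_irqsave(&tc->lock, flags);
              bio_list_merge(&bios, &tc->retry_on_resume_list);
              bio_list_init(&tc->retry_on_resume_list);
              spin_unlock_irqrestore(&tc->lock, flags);

              while ((bio = bio_list_pop(&bios)))
                      generic_make_request(bio);      /* hand the saved bios back for retry */
      }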
    • dm thin: cleanup and improve no space handling · 399caddf
      Committed by Mike Snitzer
      Factor out_of_data_space() out of alloc_data_block().  Eliminate the use
      of 'no_free_space' as a latch in alloc_data_block() -- this is no longer
      needed now that we switch to read-only mode when we run out of data or
      metadata space.  In a later patch, the 'no_free_space' flag will be
      eliminated entirely (in favor of checking metadata rather than relying
      on a transient flag).
      
      Move the no-metadata-space handling into metadata_operation_failed().  Set
      no_free_space when metadata space is exhausted too.  This is useful because
      it offers consistency for the following patch, which will requeue data IOs
      if no_free_space is set.
      
      Also, rename no_space() to retry_bios_on_resume().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      399caddf
    • 6f7f51d4
    • dm thin: handle metadata failures more consistently · b5330655
      Committed by Joe Thornber
      Introduce metadata_operation_failed() wrappers, around set_pool_mode(),
      to assist with improving the consistency of how metadata failures are
      handled.  Logging is improved and metadata operation failures trigger
      read-only mode immediately.
      
      Also, eliminate redundant set_pool_mode() calls in the two
      alloc_data_block() caller's error paths.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      b5330655
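      A sketch of the wrapper idea (illustrative names, not the exact dm-thin
      code): every metadata failure is logged with its context and immediately
      drops the pool into read-only mode, so callers no longer need their own
      set_pool_mode() calls.

      #include <linux/device-mapper.h>

      struct thin_pool_sketch;                        /* opaque stand-in for dm-thin's struct pool */

      void switch_pool_to_read_only(struct thin_pool_sketch *pool);   /* hypothetical wrapper around set_pool_mode() */

      static void metadata_operation_failed_sketch(struct thin_pool_sketch *pool,
                                                   const char *op, int r)
      {
              DMERR_LIMIT("metadata operation '%s' failed: error = %d", op, r);
              switch_pool_to_read_only(pool);
      }

      /* Callers then collapse to a single check, e.g.:
       *
       *      r = dm_pool_commit_metadata(pool->pmd);
       *      if (r)
       *              metadata_operation_failed_sketch(pool, "dm_pool_commit_metadata", r);
       */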
    • dm thin: factor out check_low_water_mark and use bools · 88a6621b
      Committed by Joe Thornber
      Factor check_low_water_mark() out of alloc_data_block().
      Change a couple unsigned flags in the pool structure to bool.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      88a6621b
    • dm thin: add mappings to end of prepared_* lists · daec338b
      Committed by Mike Snitzer
      Mappings could be processed in descending logical block order,
      particularly if buffered IO is used.  This could adversely affect the
      latency of IO processing.  Fix this by adding mappings to the end of the
      'prepared_mappings' and 'prepared_discards' lists.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      daec338b
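      The gist of the fix, sketched with illustrative types: append (FIFO) so
      mappings are processed in the order they were prepared, rather than
      prepend (LIFO), which reverses it.

      #include <linux/list.h>

      struct mapping_sketch {
              struct list_head list;
      };

      static void queue_prepared_mapping(struct list_head *prepared_mappings,
                                         struct mapping_sketch *m)
      {
              list_add_tail(&m->list, prepared_mappings);     /* what the fix uses: oldest first */
              /* list_add(&m->list, prepared_mappings) would put the newest mapping first. */
      }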
    • dm thin: use bool rather than unsigned for flags in structures · 7f214665
      Committed by Mike Snitzer
      Also, move 'err' member in dm_thin_new_mapping structure to eliminate 4
      byte hole (reduces size from 88 bytes to 80).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      7f214665
    • dm persistent data: cleanup dm-thin specific references in text · 10343180
      Committed by Mike Snitzer
      DM's persistent-data library is now used by multiple targets, so
      exclusive references to "pool" or "thin provisioning" need to be
      cleaned up.  Adjust Kconfig's DM_DEBUG_BLOCK_STACK_TRACING text
      and remove "pool" from a block manager error message.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      10343180
    • dm space map metadata: limit errors in sm_metadata_new_block · c46985e2
      Committed by Mike Snitzer
      The "unable to allocate new metadata block" error can be a particularly
      verbose error if there is a systemic issue with the metadata device.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      c46985e2
    • dm delay: use per-bio data instead of a mempool and slab cache · 42065460
      Committed by Mikulas Patocka
      Starting with commit c0820cf5 ("dm: introduce per_bio_data"),
      device mapper has the capability to pre-allocate a target-specific
      structure with the bio.
      
      This patch changes dm-delay to use this facility instead of a slab cache
      and mempool.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      42065460
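      A rough sketch of the per-bio-data facility being switched to; the delay
      target's structure and fields shown here are illustrative assumptions, but
      ti->per_bio_data_size and dm_per_bio_data() are the DM core hooks the
      commit refers to.

      #include <linux/device-mapper.h>
      #include <linux/jiffies.h>
      #include <linux/list.h>

      struct delay_info_sketch {
              struct list_head list;
              unsigned long expires;
      };

      static int delay_ctr_sketch(struct dm_target *ti, unsigned argc, char **argv)
      {
              /* Ask DM core to pre-allocate this much target data with every bio,
               * replacing the target's private slab cache + mempool. */
              ti->per_bio_data_size = sizeof(struct delay_info_sketch);
              return 0;
      }

      static int delay_map_sketch(struct dm_target *ti, struct bio *bio)
      {
              struct delay_info_sketch *info =
                      dm_per_bio_data(bio, sizeof(struct delay_info_sketch));

              INIT_LIST_HEAD(&info->list);    /* use the embedded data, no allocation needed */
              info->expires = jiffies;
              return DM_MAPIO_REMAPPED;
      }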
    • dm table: remove unused buggy code that extends the targets array · 57a2f238
      Committed by Mikulas Patocka
      A device mapper table is allocated in the following way:
      * The function dm_table_create is called; it gets the number of targets
        as an argument and allocates a targets array accordingly.
      * For each target, we call dm_table_add_target.
      
      If we add more targets than were specified in dm_table_create, the
      function dm_table_add_target reallocates the targets array.  However,
      this reallocation code is wrong - it moves the targets array to a new
      location, while some target constructors hold pointers to the array in
      the old location.
      
      The following DM target drivers save the pointer to the target
      structure, so they corrupt memory if the target array is moved:
      multipath, raid, mirror, snapshot, stripe, switch, thin, verity.
      
      Under normal circumstances, the reallocation function is not called
      (because dm_table_create is called with the correct number of targets),
      so the buggy reallocation code is not used.
      
      Prior to the fix "dm table: fail dm_table_create on dm_round_up
      overflow", the reallocation code could only be used in case the user
      specifies too large a value in param->target_count, such as 0xffffffff.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      57a2f238
    • dm thin: fix discard support to a previously shared block · 19fa1a67
      Committed by Joe Thornber
      If a snapshot is created and later deleted the origin dm_thin_device's
      snapshotted_time will have been updated to reflect the snapshot's
      creation time.  The 'shared' flag in the dm_thin_lookup_result struct
      returned from dm_thin_find_block() is an approximation based on
      snapshotted_time -- this is done to avoid O(n), or worse, time
      complexity.  In this case, the shared flag would be true.
      
      But because the 'shared' flag reflects an approximation a block can be
      incorrectly assumed to be shared (e.g. false positive for 'shared'
      because the snapshot no longer exists).  This could result in discards
      issued to a thin device not being passed down to the pool's underlying
      data device.
      
      To fix this we double check that a thin block is really still in-use
      after a mapping is removed using dm_pool_block_is_used().  If the
      reference count for a block is now zero the discard is allowed to be
      passed down.
      
      Also add a 'definitely_not_shared' member to the dm_thin_new_mapping
      structure -- reflects that the 'shared' flag in the response from
      dm_thin_find_block() can only be held as definitive if false is
      returned.
      
      Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1043527
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      19fa1a67
    • dm thin: initialize dm_thin_new_mapping returned by get_next_mapping · 16961b04
      Committed by Mike Snitzer
      As additional members are added to the dm_thin_new_mapping structure,
      care should be taken to make sure they get initialized before use.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
      16961b04
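      A sketch of one way to guarantee that, assuming a pool-owned, pre-allocated
      mapping object as described above (illustrative types, not the actual
      dm-thin code): zero the whole structure whenever it is handed out, so any
      member added later starts in a known state.

      #include <linux/string.h>
      #include <linux/types.h>

      struct new_mapping_sketch {
              bool quiesced, prepared, pass_discard;
              int err;
      };

      struct mapping_pool_sketch {
              struct new_mapping_sketch *next_mapping;
      };

      static struct new_mapping_sketch *get_next_mapping_sketch(struct mapping_pool_sketch *pool)
      {
              struct new_mapping_sketch *m = pool->next_mapping;

              memset(m, 0, sizeof(*m));       /* every field, present and future, starts zeroed */
              pool->next_mapping = NULL;
              return m;
      }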
  3. 14 Dec, 2013 (2 commits)
    • dm array: fix a reference counting bug in shadow_ablock · ed9571f0
      Committed by Joe Thornber
      An old array block could have its reference count decremented below
      zero when it is being replaced in the btree by a new array block.
      
      The fix is to increment the old ablock's reference count just before
      inserting a new ablock into the btree.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.9+
      ed9571f0
    • dm space map: disallow decrementing a reference count below zero · 5b564d80
      Committed by Joe Thornber
      The old behaviour, returning -EINVAL if a ref_count of 0 would be
      decremented, was removed in commit f722063e ("dm space map: optimise
      sm_ll_dec and sm_ll_inc").  To fix this regression we return an error
      code from the mutator function pointer passed to sm_ll_mutate() and have
      dec_ref_count() return -EINVAL if the old ref_count is 0.
      
      Add a DMERR to reflect the potential seriousness of this error.
      
      Also, add missing dm_tm_unlock() to sm_ll_mutate()'s error path.
      
      With this fix the following dmts regression test now passes:
       dmtest run --suite cache -n /metadata_use_kernel/
      
      The next patch fixes the higher-level dm-array code that exposed this
      regression.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.12+
      5b564d80
  4. 11 Dec, 2013 (12 commits)
    • dm stats: initialize read-only module parameter · 76f5bee5
      Committed by Mikulas Patocka
      The module parameter stats_current_allocated_bytes in dm-mod is
      read-only.  This parameter informs the user about memory
      consumption.  It is not supposed to be changed by the user.
      
      However, despite being read-only, this parameter can be set on
      modprobe or insmod command line:
      modprobe dm-mod stats_current_allocated_bytes=12345
      
      The kernel doesn't expect this variable to be non-zero at module
      initialization, and if the user sets it, the result is a warning.
      
      This patch initializes the variable in the module init routine, so
      that any user-supplied value is ignored.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.12+
      76f5bee5
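      A generic sketch of the pattern (not the dm-stats code itself, and the
      parameter name is illustrative): a read-only module parameter that only
      reports state is re-initialized in the init routine, so a value passed on
      the modprobe command line cannot leak into the accounting.

      #include <linux/module.h>

      static unsigned long reported_allocated_bytes;
      module_param_named(current_allocated_bytes, reported_allocated_bytes, ulong, 0444);

      static int __init example_init(void)
      {
              /* Ignore anything from "modprobe ... current_allocated_bytes=12345". */
              reported_allocated_bytes = 0;
              return 0;
      }
      module_init(example_init);

      static void __exit example_exit(void) { }
      module_exit(example_exit);
      MODULE_LICENSE("GPL");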
    • dm bufio: initialize read-only module parameters · 4cb57ab4
      Committed by Mikulas Patocka
      Some module parameters in dm-bufio are read-only. These parameters
      inform the user about memory consumption. They are not supposed to be
      changed by the user.
      
      However, despite being read-only, these parameters can be set on
      modprobe or insmod command line, for example:
      modprobe dm-bufio current_allocated_bytes=12345
      
      The kernel doesn't expect these variables to be non-zero at module
      initialization, and if the user sets them, the result is a BUG.
      
      This patch initializes the variables in the module init routine, so that
      user-supplied values are ignored.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.2+
      4cb57ab4
    • dm cache: actually resize cache · 08844800
      Committed by Vincent Pelletier
      Commit f494a9c6 ("dm cache: cache shrinking support") broke cache
      resizing support.
      
      dm_cache_resize() is called with cache->cache_size before it gets
      updated to new_size, so it is a no-op.  But the dm-cache superblock is
      updated with the new_size even though the backing dm-array is not
      resized.  Fix this by passing the new_size to dm_cache_resize().
      Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      08844800
    • dm cache policy mq: fix promotions to occur as expected · af95e7a6
      Committed by Joe Thornber
      Micro benchmarks that repeatedly issued IO to a single block were
      failing to cause a promotion from the origin device to the cache.  Fix
      this by not updating the stats during map() if -EWOULDBLOCK will be
      returned.
      
      The mq policy will only update stats, consider migration, etc, once per
      tick period (a unit of time established between dm-cache core and the
      policies).
      
      When the IO thread calls the policy's map method, if it would like to
      migrate the associated block it returns -EWOULDBLOCK, the IO then gets
      handed over to a worker thread which handles the migration.  The worker
      thread calls map again, to check the migration is still needed (avoids a
      race among other things).  *BUT*, before this fix, if we were still in
      the same tick period the stats were already updated by the previous map
      call -- so the migration would no longer be requested.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      af95e7a6
    • dm thin: allow pool in read-only mode to transition to read-write mode · 9b7aaa64
      Committed by Joe Thornber
      A thin-pool may be in read-only mode because the pool's data or metadata
      space was exhausted.  To allow for recovery, by adding more space to the
      pool, we must allow a pool to transition from PM_READ_ONLY to PM_WRITE
      mode.  Otherwise, running out of space will render the pool permanently
      read-only.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      9b7aaa64
    • dm thin: re-establish read-only state when switching to fail mode · 5383ef3a
      Committed by Joe Thornber
      If the thin-pool transitioned to fail mode and the thin-pool's table
      were then reloaded for some reason, the new table's default pool mode
      would be read-write, though it will transition to fail mode during resume.
      
      When the pool mode transitions directly from PM_WRITE to PM_FAIL we need
      to re-establish the intermediate read-only state in both the metadata
      and persistent-data block manager (as is usually done with the normal
      pool mode transition sequence: PM_WRITE -> PM_READ_ONLY -> PM_FAIL).
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      5383ef3a
    • dm thin: always fallback the pool mode if commit fails · 020cc3b5
      Committed by Joe Thornber
      Rename commit_or_fallback() to commit().  Now all previous calls to
      commit() will trigger the pool mode to fall back if the commit fails.
      
      Also, check the error returned from commit() in alloc_data_block().
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      020cc3b5
    • dm thin: switch to read-only mode if metadata space is exhausted · 4a02b34e
      Committed by Mike Snitzer
      Switch the thin pool to read-only mode in alloc_data_block() if
      dm_pool_alloc_data_block() fails because the pool's metadata space is
      exhausted.
      
      Differentiate between data and metadata space in messages about no
      free space available.
      
      This issue was noticed with the device-mapper-test-suite using:
      dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/
      
      The quantity of errors logged in this case must be reduced.
      
      before patch:
      
      device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      <snip ... these repeat for a _very_ long while ... >
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: 253:4: commit failed: error = -28
      device-mapper: thin: 253:4: switching pool to read-only mode
      
      after patch:
      
      device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: 253:4: no free metadata space available.
      device-mapper: thin: 253:4: switching pool to read-only mode
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
      4a02b34e
    • dm thin: switch to read only mode if a mapping insert fails · fafc7a81
      Committed by Joe Thornber
      Switch the thin pool to read-only mode when dm_thin_insert_block() fails
      since there is little reason to expect the cause of the failure to be
      resolved without further action by user space.
      
      This issue was noticed with the device-mapper-test-suite using:
      dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/
      
      The quantity of errors logged in this case must be reduced.
      
      before patch:
      
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      <snip ... these repeat for a long while ... >
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: thin: 253:4: no free metadata space available.
      device-mapper: thin: 253:4: switching pool to read-only mode
      
      after patch:
      
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: 253:4: dm_thin_insert_block() failed: error = -28
      device-mapper: thin: 253:4: switching pool to read-only mode
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      fafc7a81
    • dm space map metadata: return on failure in sm_metadata_new_block · f62b6b8f
      Committed by Mike Snitzer
      Commit 2fc48021 ("dm persistent
      metadata: add space map threshold callback") introduced a regression
      to the metadata block allocation path that resulted in errors being
      ignored.  This regression was uncovered by running the following
      device-mapper-test-suite test:
      dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/
      
      The ignored error codes in sm_metadata_new_block() could crash the
      kernel through use of either the dm-thin or dm-cache targets, e.g.:
      
      device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
      device-mapper: space map metadata: unable to allocate new metadata block
      general protection fault: 0000 [#1] SMP
      ...
      Workqueue: dm-thin do_worker [dm_thin_pool]
      task: ffff880035ce2ab0 ti: ffff88021a054000 task.ti: ffff88021a054000
      RIP: 0010:[<ffffffffa0331385>]  [<ffffffffa0331385>] metadata_ll_load_ie+0x15/0x30 [dm_persistent_data]
      RSP: 0018:ffff88021a055a68  EFLAGS: 00010202
      RAX: 003fc8243d212ba0 RBX: ffff88021a780070 RCX: ffff88021a055a78
      RDX: ffff88021a055a78 RSI: 0040402222a92a80 RDI: ffff88021a780070
      RBP: ffff88021a055a68 R08: ffff88021a055ba4 R09: 0000000000000010
      R10: 0000000000000000 R11: 00000002a02e1000 R12: ffff88021a055ad4
      R13: 0000000000000598 R14: ffffffffa0338470 R15: ffff88021a055ba4
      FS:  0000000000000000(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007f467c0291b8 CR3: 0000000001a0b000 CR4: 00000000000007e0
      Stack:
       ffff88021a055ab8 ffffffffa0332020 ffff88021a055b30 0000000000000001
       ffff88021a055b30 0000000000000000 ffff88021a055b18 0000000000000000
       ffff88021a055ba4 ffff88021a055b98 ffff88021a055ae8 ffffffffa033304c
      Call Trace:
       [<ffffffffa0332020>] sm_ll_lookup_bitmap+0x40/0xa0 [dm_persistent_data]
       [<ffffffffa033304c>] sm_metadata_count_is_more_than_one+0x8c/0xc0 [dm_persistent_data]
       [<ffffffffa0333825>] dm_tm_shadow_block+0x65/0x110 [dm_persistent_data]
       [<ffffffffa0331b00>] sm_ll_mutate+0x80/0x300 [dm_persistent_data]
       [<ffffffffa0330e60>] ? set_ref_count+0x10/0x10 [dm_persistent_data]
       [<ffffffffa0331dba>] sm_ll_inc+0x1a/0x20 [dm_persistent_data]
       [<ffffffffa0332270>] sm_disk_new_block+0x60/0x80 [dm_persistent_data]
       [<ffffffff81520036>] ? down_write+0x16/0x40
       [<ffffffffa001e5c4>] dm_pool_alloc_data_block+0x54/0x80 [dm_thin_pool]
       [<ffffffffa001b23c>] alloc_data_block+0x9c/0x130 [dm_thin_pool]
       [<ffffffffa001c27e>] provision_block+0x4e/0x180 [dm_thin_pool]
       [<ffffffffa001fe9a>] ? dm_thin_find_block+0x6a/0x110 [dm_thin_pool]
       [<ffffffffa001c57a>] process_bio+0x1ca/0x1f0 [dm_thin_pool]
       [<ffffffff8111e2ed>] ? mempool_free+0x8d/0xa0
       [<ffffffffa001d755>] process_deferred_bios+0xc5/0x230 [dm_thin_pool]
       [<ffffffffa001d911>] do_worker+0x51/0x60 [dm_thin_pool]
       [<ffffffff81067872>] process_one_work+0x182/0x3b0
       [<ffffffff81068c90>] worker_thread+0x120/0x3a0
       [<ffffffff81068b70>] ? manage_workers+0x160/0x160
       [<ffffffff8106eb2e>] kthread+0xce/0xe0
       [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
       [<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0
       [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
       [<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0
       [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org # v3.10+
      f62b6b8f
    • dm table: fail dm_table_create on dm_round_up overflow · 5b2d0657
      Committed by Mikulas Patocka
      The dm_round_up function may overflow to zero.  In this case,
      dm_table_create() must fail rather than go on to allocate an empty array
      with alloc_targets().
      
      This fixes a possible memory corruption that could be caused by passing
      too large a number in "param->target_count".
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      5b2d0657
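      A sketch of the overflow scenario (illustrative code, not the exact
      dm-table logic; the granularity constant and error value are assumptions):
      rounding a huge target_count up to the allocation granularity can wrap to
      zero, and a zero-sized targets array must be rejected rather than allocated.

      #include <linux/errno.h>

      static inline unsigned int round_up_sketch(unsigned int n, unsigned int granularity)
      {
              return ((n + granularity - 1) / granularity) * granularity;     /* may wrap to 0 */
      }

      static int table_create_sketch(unsigned int num_targets)
      {
              num_targets = round_up_sketch(num_targets, 16);
              if (!num_targets)
                      return -EINVAL;         /* e.g. 0xffffffff wrapped; fail instead of
                                                 allocating an empty targets array */

              /* ... proceed to allocate the targets array ... */
              return 0;
      }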
    • dm snapshot: avoid snapshot space leak on crash · 230c83af
      Committed by Mikulas Patocka
      There is a possible leak of snapshot space in the case of a crash.
      
      The reason for the space leak is that chunks in the snapshot device are
      allocated sequentially, but they are finished (and stored in the metadata)
      out of order, depending on the order in which copying finished.
      
      For example, suppose that the metadata contains the following records:
      SUPERBLOCK
      METADATA (blocks 0 ... 250)
      DATA 0
      DATA 1
      DATA 2
      ...
      DATA 250
      
      Now suppose that you allocate 10 new data blocks, 251-260. Suppose that
      copying of these blocks finishes out of order (block 260 finished first
      and block 251 finished last). Now the snapshot device looks like
      this:
      SUPERBLOCK
      METADATA (blocks 0 ... 250, 260, 259, 258, 257, 256)
      DATA 0
      DATA 1
      DATA 2
      ...
      DATA 250
      DATA 251
      DATA 252
      DATA 253
      DATA 254
      DATA 255
      METADATA (blocks 255, 254, 253, 252, 251)
      DATA 256
      DATA 257
      DATA 258
      DATA 259
      DATA 260
      
      Now, if the machine crashes after writing the first metadata block but
      before writing the second metadata block, the space for areas DATA 250-255
      is leaked: it contains no valid data and will never be used in the
      future.
      
      This patch makes dm-snapshot complete exceptions in the same order they
      were allocated, thus fixing this bug.
      
      Note: when backporting this patch to the stable kernel, change the version
      field in the following way:
      * if version in the stable kernel is {1, 11, 1}, change it to {1, 12, 0}
      * if version in the stable kernel is {1, 10, 0} or {1, 10, 1}, change it
        to {1, 10, 2}
      Userspace reads the version to determine if the bug was fixed, so the
      version change is needed.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      230c83af
  5. 28 Nov, 2013 (3 commits)
  6. 25 Nov, 2013 (1 commit)
  7. 19 Nov, 2013 (1 commit)
    • md/raid5: Use conf->device_lock protect changing of multi-thread resources. · 60aaf933
      Committed by majianpeng
      When we change group_thread_cnt via its sysfs entry, the kernel can oops.
      
      The kernel messages are:
      [  135.299021] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [  135.299073] IP: [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
      [  135.299107] PGD 0
      [  135.299122] Oops: 0000 [#1] SMP
      [  135.299144] Modules linked in: netconsole e1000e ptp pps_core
      [  135.299188] CPU: 3 PID: 2225 Comm: md0_raid5 Not tainted 3.12.0+ #24
      [  135.299214] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
      [  135.299255] task: ffff8800b9638f80 ti: ffff8800b77a4000 task.ti: ffff8800b77a4000
      [  135.299283] RIP: 0010:[<ffffffff815188ab>]  [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
      [  135.299323] RSP: 0018:ffff8800b77a5c48  EFLAGS: 00010002
      [  135.299344] RAX: ffff880037bb5c70 RBX: 0000000000000000 RCX: 0000000000000008
      [  135.299371] RDX: ffff880037bb5cb8 RSI: 0000000000000001 RDI: ffff880037bb5c00
      [  135.299398] RBP: ffff8800b77a5d08 R08: 0000000000000001 R09: 0000000000000000
      [  135.299425] R10: ffff8800b77a5c98 R11: 00000000ffffffff R12: ffff880037bb5c00
      [  135.299452] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880037bb5c70
      [  135.299479] FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
      [  135.299510] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  135.299532] CR2: 0000000000000000 CR3: 0000000001c0b000 CR4: 00000000000407e0
      [  135.299559] Stack:
      [  135.299570]  ffff8800b77a5c88 ffffffff8107383e ffff8800b77a5c88 ffff880037a64300
      [  135.299611]  000000000000ec08 ffff880037bb5cb8 ffff8800b77a5c98 ffffffffffffffd8
      [  135.299654]  000000000000ec08 ffff880037bb5c60 ffff8800b77a5c98 ffff8800b77a5c98
      [  135.299696] Call Trace:
      [  135.299711]  [<ffffffff8107383e>] ? __wake_up+0x4e/0x70
      [  135.299733]  [<ffffffff81518f88>] raid5d+0x4c8/0x680
      [  135.299756]  [<ffffffff817174ed>] ? schedule_timeout+0x15d/0x1f0
      [  135.299781]  [<ffffffff81524c9f>] md_thread+0x11f/0x170
      [  135.299804]  [<ffffffff81069cd0>] ? wake_up_bit+0x40/0x40
      [  135.299826]  [<ffffffff81524b80>] ? md_rdev_init+0x110/0x110
      [  135.299850]  [<ffffffff81069656>] kthread+0xc6/0xd0
      [  135.299871]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
      [  135.299899]  [<ffffffff81722ffc>] ret_from_fork+0x7c/0xb0
      [  135.299923]  [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70
      [  135.299951] Code: ff ff ff 0f 84 d7 fe ff ff e9 5c fe ff ff 66 90 41 8b b4 24 d8 01 00 00 45 31 ed 85 f6 0f 8e 7b fd ff ff 49 8b 9c 24 d0 01 00 00 <48> 3b 1b 49 89 dd 0f 85 67 fd ff ff 48 8d 43 28 31 d2 eb 17 90
      [  135.300005] RIP  [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440
      [  135.300005]  RSP <ffff8800b77a5c48>
      [  135.300005] CR2: 0000000000000000
      [  135.300005] ---[ end trace 504854e5bb7562ed ]---
      [  135.300005] Kernel panic - not syncing: Fatal exception
      
      This is because raid5d() can be running when the multi-thread
      resources are changed via sysfs, so we need to provide locking.
      
      mddev->device_lock is suitable, but we cannot simply call
      alloc_thread_groups() under this lock as we cannot allocate memory
      while holding a spinlock.
      So change alloc_thread_groups() to allocate and return the data
      structures; raid5_store_group_thread_cnt() can then take the lock
      while updating the pointers to the data structures.
      
      This fixes a bug introduced in 3.12 and so is suitable for the 3.12.x
      stable series.
      
      Fixes: b721420e
      Cc: stable@vger.kernel.org (3.12)
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Shaohua Li <shli@kernel.org>
      60aaf933
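      A sketch of the pattern the fix adopts (illustrative names and types, not
      the actual raid5 code): build the new thread-group structures with no lock
      held, then take device_lock only for the pointer swap that raid5d() races
      against.

      #include <linux/errno.h>
      #include <linux/slab.h>
      #include <linux/spinlock.h>

      struct worker_group_sketch {
              int stripe_cnt;
      };

      struct r5conf_sketch {
              spinlock_t device_lock;
              struct worker_group_sketch *worker_groups;
              int group_cnt;
      };

      static int store_group_thread_cnt_sketch(struct r5conf_sketch *conf, int new_cnt)
      {
              struct worker_group_sketch *new_groups, *old_groups;

              /* May sleep: done before any spinlock is taken. */
              new_groups = kcalloc(new_cnt, sizeof(*new_groups), GFP_NOIO);
              if (!new_groups)
                      return -ENOMEM;

              spin_lock_irq(&conf->device_lock);
              old_groups = conf->worker_groups;
              conf->worker_groups = new_groups;       /* publish atomically w.r.t. raid5d() */
              conf->group_cnt = new_cnt;
              spin_unlock_irq(&conf->device_lock);

              kfree(old_groups);                      /* free the old structures outside the lock */
              return 0;
      }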