1. 28 Feb, 2014 1 commit
    • M
      dm thin: allow metadata space larger than supported to go unused · 7d48935e
      Mike Snitzer authored
      It was always intended that a user could provide a thin metadata device
      that is larger than the max supported by the on-disk format.  The extra
      space would just go unused.
      
      Unfortunately that never worked.  If the user attempted to use a larger
      metadata device on creation they would get an error like the following:
      
       device-mapper: space map common: space map too large
       device-mapper: transaction manager: couldn't create metadata space map
       device-mapper: thin metadata: tm_create_with_sm failed
       device-mapper: table: 252:17: thin-pool: Error creating metadata object
       device-mapper: ioctl: error adding target to table
      
      Fix this by allowing the initial metadata space map creation to cap its
      size at the max number of blocks supported (DM_SM_METADATA_MAX_BLOCKS).
      get_metadata_dev_size() must also impose DM_SM_METADATA_MAX_BLOCKS (via
      THIN_METADATA_MAX_SECTORS), otherwise extending metadata would cap at
      THIN_METADATA_MAX_SECTORS_WARNING (which is larger than supported).
      
      Also, the calculation for THIN_METADATA_MAX_SECTORS didn't account for
      the sizeof the disk_bitmap_header.  So the supported maximum metadata
      size is a bit smaller (reduced from 33423360 to 33292800 sectors).
      
      Lastly, remove the "excess space will not be used" warning message from
      get_metadata_dev_size(); it resulted in printing the warning multiple
      times.  Factor out warn_if_metadata_device_too_big(), call it from
      pool_ctr() and maybe_resize_metadata_dev().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      7d48935e
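The two sector counts quoted in this commit message can be reproduced with a little arithmetic. The sketch below is only an illustration: the layout constants (4096-byte metadata blocks, 512-byte sectors, 255 index entries, 2 bits of allocation state per block, a 16-byte bitmap header) are assumptions chosen because they yield exactly 33423360 and 33292800 sectors; they are not stated in the text above, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout constants (not stated in the commit message). */
#define SECTOR_SIZE          512u
#define METADATA_BLOCK_SIZE  4096u  /* bytes per metadata block */
#define INDEX_ENTRIES        255u   /* bitmap blocks addressable from one index block */
#define ENTRIES_PER_BYTE     4u     /* 2 bits of allocation state per block */
#define BITMAP_HEADER_SIZE   16u    /* assumed size of the on-disk bitmap header */

/*
 * Maximum metadata size in sectors. 'header_bytes' is the part of each
 * bitmap block that does not hold allocation entries.
 */
static uint64_t max_metadata_sectors(uint32_t header_bytes)
{
    uint64_t entries_per_bitmap =
        (uint64_t)(METADATA_BLOCK_SIZE - header_bytes) * ENTRIES_PER_BYTE;
    uint64_t max_blocks = INDEX_ENTRIES * entries_per_bitmap;

    return max_blocks * (METADATA_BLOCK_SIZE / SECTOR_SIZE);
}

/* Cap a device size at the supported maximum; any extra space goes unused. */
static uint64_t usable_metadata_sectors(uint64_t dev_sectors)
{
    uint64_t max = max_metadata_sectors(BITMAP_HEADER_SIZE);

    assert(max > 0);
    return dev_sectors > max ? max : dev_sectors;
}
```

Passing 0 for the header size models the old calculation that ignored the header; passing the assumed 16 bytes models the corrected one.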
  2. 25 Feb, 2014 1 commit
    • M
      dm thin: fix the error path for the thin device constructor · 1acacc07
      Mike Snitzer authored
      dm_pool_close_thin_device() must be called if dm_set_target_max_io_len()
      fails in thin_ctr().  Otherwise __pool_destroy() will fail because the
      pool will still have an open thin device:
      
       device-mapper: thin metadata: attempt to close pmd when 1 device(s) are still open
       device-mapper: thin: __pool_destroy: dm_pool_metadata_close() failed.
      
      Also, must establish error code if failing thin_ctr() because the pool
      is in fail_io mode.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
      1acacc07
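The error-path rule described in this commit (close whatever was opened before a later step failed, and always return a real error code) might look roughly like the following user-space sketch. The types and helpers here are hypothetical stand-ins, not the kernel's thin_ctr() or its metadata API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pool; it only counts open thin devices. */
struct pool {
    int  open_thin_devices;
    bool fail_io;
};

static int  open_thin_device(struct pool *p)  { p->open_thin_devices++; return 0; }
static void close_thin_device(struct pool *p) { p->open_thin_devices--; }

/* Simulated step that may fail after the device has already been opened. */
static int set_max_io_len(bool should_fail) { return should_fail ? -EINVAL : 0; }

static int thin_ctr_sketch(struct pool *p, bool io_len_fails)
{
    int r;

    assert(p);

    if (p->fail_io)
        return -EINVAL;   /* an explicit error code is set on this path */

    r = open_thin_device(p);
    if (r)
        return r;

    r = set_max_io_len(io_len_fails);
    if (r)
        goto bad_close;   /* undo the open so the pool can be destroyed later */

    return 0;

bad_close:
    close_thin_device(p);
    return r;
}
```

The point of the unwind label is that a failed constructor leaves the open-device count exactly where it started.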
  3. 18 Feb, 2014 1 commit
  4. 16 Jan, 2014 1 commit
  5. 07 Jan, 2014 13 commits
    • M
      dm thin: fix set_pool_mode exposed pool operation races · 8b64e881
      Mike Snitzer authored
      The pool mode must not be switched until after the corresponding pool
      process_* methods have been established.  Otherwise, because
      set_pool_mode() isn't interlocked with the IO path for performance
      reasons, the IO path can end up executing process_* operations that
      don't match the mode.  This patch eliminates problems like the following
      (as seen on really fast PCIe SSD storage when transitioning the pool's
      mode from PM_READ_ONLY to PM_WRITE):
      
      kernel: device-mapper: thin: 253:2: reached low water mark for data device: sending event.
      kernel: device-mapper: thin: 253:2: no free data space available.
      kernel: device-mapper: thin: 253:2: switching pool to read-only mode
      kernel: device-mapper: thin: 253:2: switching pool to write mode
      kernel: ------------[ cut here ]------------
      kernel: WARNING: CPU: 11 PID: 7564 at drivers/md/dm-thin.c:995 handle_unserviceable_bio+0x146/0x160 [dm_thin_pool]()
      ...
      kernel: Workqueue: dm-thin do_worker [dm_thin_pool]
      kernel: 00000000000003e3 ffff880308831cc8 ffffffff8152ebcb 00000000000003e3
      kernel: 0000000000000000 ffff880308831d08 ffffffff8104c46c ffff88032502a800
      kernel: ffff880036409000 ffff88030ec7ce00 0000000000000001 00000000ffffffc3
      kernel: Call Trace:
      kernel: [<ffffffff8152ebcb>] dump_stack+0x49/0x5e
      kernel: [<ffffffff8104c46c>] warn_slowpath_common+0x8c/0xc0
      kernel: [<ffffffff8104c4ba>] warn_slowpath_null+0x1a/0x20
      kernel: [<ffffffffa001e2c6>] handle_unserviceable_bio+0x146/0x160 [dm_thin_pool]
      kernel: [<ffffffffa001f276>] process_bio_read_only+0x136/0x180 [dm_thin_pool]
      kernel: [<ffffffffa0020b75>] process_deferred_bios+0xc5/0x230 [dm_thin_pool]
      kernel: [<ffffffffa0020d31>] do_worker+0x51/0x60 [dm_thin_pool]
      kernel: [<ffffffff81067823>] process_one_work+0x183/0x490
      kernel: [<ffffffff81068c70>] worker_thread+0x120/0x3a0
      kernel: [<ffffffff81068b50>] ? manage_workers+0x160/0x160
      kernel: [<ffffffff8106e86e>] kthread+0xce/0xf0
      kernel: [<ffffffff8106e7a0>] ? kthread_freezable_should_stop+0x70/0x70
      kernel: [<ffffffff8153b3ec>] ret_from_fork+0x7c/0xb0
      kernel: [<ffffffff8106e7a0>] ? kthread_freezable_should_stop+0x70/0x70
      kernel: ---[ end trace 3f00528e08ffa55c ]---
      kernel: device-mapper: thin: pool mode is PM_WRITE not PM_READ_ONLY like expected!?
      
      dm-thin.c:995 was the WARN_ON_ONCE(get_pool_mode(pool) != PM_READ_ONLY);
      at the top of handle_unserviceable_bio().  And as the additional
      debugging I had conveys: the pool mode was _not_ PM_READ_ONLY like
      expected, it was already PM_WRITE, yet pool->process_bio was still set
      to process_bio_read_only().
      
      Also, while fixing this up, reduce logging of redundant pool mode
      transitions by checking new_mode is different from old_mode.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      8b64e881
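The ordering rule from this commit (establish the process_* handlers first, publish the mode last, skip redundant transitions) can be illustrated with a small single-threaded sketch. The struct and function names are simplified stand-ins, and real concurrent code would additionally need appropriate memory ordering, which this sketch does not model.

```c
#include <assert.h>

enum pool_mode { PM_WRITE, PM_READ_ONLY, PM_FAIL };

struct pool;
typedef int (*process_fn)(struct pool *);

struct pool {
    enum pool_mode mode;     /* read by the IO path without taking a lock */
    process_fn process_bio;  /* handler the IO path will invoke */
};

/* Each handler reports the mode it was written for. */
static int process_bio_write(struct pool *p)     { (void)p; return PM_WRITE; }
static int process_bio_read_only(struct pool *p) { (void)p; return PM_READ_ONLY; }
static int process_bio_fail(struct pool *p)      { (void)p; return PM_FAIL; }

static void set_pool_mode_sketch(struct pool *p, enum pool_mode new_mode)
{
    if (p->mode == new_mode && p->process_bio)
        return;  /* redundant transition: nothing to do, nothing to log */

    /* Step 1: establish the handler that belongs to the new mode. */
    switch (new_mode) {
    case PM_WRITE:     p->process_bio = process_bio_write;     break;
    case PM_READ_ONLY: p->process_bio = process_bio_read_only; break;
    case PM_FAIL:      p->process_bio = process_bio_fail;      break;
    }
    assert(p->process_bio);

    /* Step 2: only now publish the mode. */
    p->mode = new_mode;
}
```

With this ordering, an observer that sees the new mode also sees a handler that matches it, which is the mismatch the WARN_ON_ONCE in the trace above was catching.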
    • M
      dm thin: eliminate the no_free_space flag · 6d16202b
      Mike Snitzer authored
      The pool's error_if_no_space flag can easily serve the same purpose that
      no_free_space did, namely: control whether handle_unserviceable_bio()
      will error a bio or requeue it.
      
      This is cleaner since error_if_no_space is established when the pool's
      features are processed during table load.  So it avoids managing the
      no_free_space flag by taking the pool's spinlock.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      6d16202b
    • M
      dm thin: add error_if_no_space feature · 787a996c
      Mike Snitzer authored
      If the pool runs out of data or metadata space, the pool can either
      queue or error the IO destined to the data device.  The default is to
      queue the IO until more space is added.
      
      An admin may now configure the pool to error IO when no space is
      available by setting the 'error_if_no_space' feature when loading the
      thin-pool table.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      787a996c
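The queue-or-error choice described here reduces to a single decision on a feature flag that is fixed at table load. A minimal sketch, with hypothetical type and function names rather than the kernel's own:

```c
#include <assert.h>
#include <stdbool.h>

enum bio_outcome { BIO_ERRORED, BIO_QUEUED_FOR_RETRY };

/* Established once, when the pool's features are parsed at table load. */
struct pool_features {
    bool error_if_no_space;
};

/* Decide what happens to IO that cannot be serviced for lack of space. */
static enum bio_outcome handle_no_space_bio(const struct pool_features *pf)
{
    assert(pf);
    return pf->error_if_no_space ? BIO_ERRORED : BIO_QUEUED_FOR_RETRY;
}
```

Because the flag never changes after load, reading it needs no locking, which matches the motivation given for dropping the separate no_free_space flag.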
    • M
      dm thin: requeue bios to DM core if no_free_space and in read-only mode · 8c0f0e8c
      Mike Snitzer authored
      Now that we switch the pool to read-only mode when the data device runs
      out of space it causes active writers to get IO errors once we resume
      after resizing the data device.
      
      If no_free_space is set, save bios to the 'retry_on_resume_list' and
      requeue them on resume (once the data or metadata device may have been
      resized).
      
      With this patch the resize_io test passes again (on slower storage):
       dmtest run --suite thin-provisioning -n /resize_io/
      
      Later patches fix some subtle races associated with the pool mode
      transitions done as part of the pool's -ENOSPC handling.  These races
      are exposed on fast storage (e.g. PCIe SSD).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      8c0f0e8c
    • M
      dm thin: cleanup and improve no space handling · 399caddf
      Mike Snitzer authored
      Factor out_of_data_space() out of alloc_data_block().  Eliminate the use
      of 'no_free_space' as a latch in alloc_data_block() -- this is no longer
      needed now that we switch to read-only mode when we run out of data or
      metadata space.  In a later patch, the 'no_free_space' flag will be
      eliminated entirely (in favor of checking metadata rather than relying
      on a transient flag).
      
      Move no metadata space handling into metadata_operation_failed().  Set
      no_free_space when metadata space is exhausted too.  This is useful,
      because it offers consistency, for the following patch that will requeue
      data IOs if no_free_space.
      
      Also, rename no_space() to retry_bios_on_resume().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      399caddf
    • M
      6f7f51d4
    • J
      dm thin: handle metadata failures more consistently · b5330655
      Joe Thornber authored
      Introduce metadata_operation_failed() wrappers, around set_pool_mode(),
      to assist with improving the consistency of how metadata failures are
      handled.  Logging is improved and metadata operation failures trigger
      read-only mode immediately.
      
      Also, eliminate redundant set_pool_mode() calls in the two
      alloc_data_block() caller's error paths.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      b5330655
    • J
      dm thin: factor out check_low_water_mark and use bools · 88a6621b
      Joe Thornber authored
      Factor check_low_water_mark() out of alloc_data_block().
      Change a couple of unsigned flags in the pool structure to bool.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      88a6621b
    • M
      dm thin: add mappings to end of prepared_* lists · daec338b
      Mike Snitzer authored
      Mappings could be processed in descending logical block order,
      particularly if buffered IO is used.  This could adversely affect the
      latency of IO processing.  Fix this by adding mappings to the end of the
      'prepared_mappings' and 'prepared_discards' lists.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      daec338b
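The effect of adding to the head versus the end of a list is easy to see with a toy list. This sketch uses a hand-rolled singly linked list with hypothetical names; it is not the kernel's list API, only an illustration of why tail insertion preserves the order in which mappings were prepared.

```c
#include <assert.h>
#include <stddef.h>

struct mapping {
    unsigned long virt_block;
    struct mapping *next;
};

struct mapping_list {
    struct mapping *head, *tail;
};

/* Push at the front: the most recently prepared mapping is processed first. */
static void add_head(struct mapping_list *l, struct mapping *m)
{
    assert(l && m);
    m->next = l->head;
    l->head = m;
    if (!l->tail)
        l->tail = m;
}

/* Append at the back: mappings are processed in the order they were prepared. */
static void add_tail(struct mapping_list *l, struct mapping *m)
{
    assert(l && m);
    m->next = NULL;
    if (l->tail)
        l->tail->next = m;
    else
        l->head = m;
    l->tail = m;
}
```

If blocks 1, 2, 3 are prepared in that order, walking from the head yields 1, 2, 3 after tail insertion and 3, 2, 1 after head insertion, which is the descending order the commit message describes.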
    • J
    • M
      dm thin: use bool rather than unsigned for flags in structures · 7f214665
      Mike Snitzer authored
      Also, move 'err' member in dm_thin_new_mapping structure to eliminate 4
      byte hole (reduces size from 88 bytes to 80).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      7f214665
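Moving a small member to close a padding hole can be demonstrated with two toy layouts. These structs are hypothetical and much smaller than dm_thin_new_mapping; the exact sizes depend on the ABI (on a typical LP64 target they are 32 and 24 bytes), so the sketch only relies on the reordered layout never being larger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layouts, not the real dm_thin_new_mapping. */
struct with_hole {
    void *a;
    int   err;    /* followed by padding wherever 'b' needs stricter alignment */
    void *b;
    bool  flag;
};

struct reordered {
    void *a;
    void *b;
    int   err;    /* now sits next to 'flag' at the tail */
    bool  flag;
};

/* Bytes saved by the reordering on the current ABI (may be zero). */
static size_t bytes_saved(void)
{
    assert(sizeof(struct reordered) <= sizeof(struct with_hole));
    return sizeof(struct with_hole) - sizeof(struct reordered);
}
```

Grouping members from largest to smallest alignment is a common way to find such savings by hand.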
    • J
      dm thin: fix discard support to a previously shared block · 19fa1a67
      Joe Thornber authored
      If a snapshot is created and later deleted the origin dm_thin_device's
      snapshotted_time will have been updated to reflect the snapshot's
      creation time.  The 'shared' flag in the dm_thin_lookup_result struct
      returned from dm_thin_find_block() is an approximation based on
      snapshotted_time -- this is done to avoid O(n), or worse, time
      complexity.  In this case, the shared flag would be true.
      
      But because the 'shared' flag reflects an approximation a block can be
      incorrectly assumed to be shared (e.g. false positive for 'shared'
      because the snapshot no longer exists).  This could result in discards
      issued to a thin device not being passed down to the pool's underlying
      data device.
      
      To fix this we double check that a thin block is really still in-use
      after a mapping is removed using dm_pool_block_is_used().  If the
      reference count for a block is now zero the discard is allowed to be
      passed down.
      
      Also add a 'definitely_not_shared' member to the dm_thin_new_mapping
      structure -- reflects that the 'shared' flag in the response from
      dm_thin_find_block() can only be held as definitive if false is
      returned.
      
      Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1043527
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      19fa1a67
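The double check described in this commit can be modelled with a reference count per data block. In this sketch, which uses hypothetical names and a deliberately simplified model, the time-based 'shared' hint is trusted only when it says "not shared"; when it says "shared", the reference count after removing the mapping decides whether the discard may be passed down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a per-block reference count stands in for the space map. */
struct data_block {
    unsigned ref_count;
};

/* Remove one mapping to the block, as a discard of that range would. */
static void remove_mapping(struct data_block *b)
{
    assert(b->ref_count > 0);
    b->ref_count--;
}

static bool block_is_used(const struct data_block *b)
{
    return b->ref_count > 0;
}

/*
 * 'maybe_shared' is the approximation: false is definitive, while true
 * must be double checked against the reference count after removal.
 * Returns true if the discard may be passed down to the data device.
 */
static bool discard_and_check_passdown(struct data_block *b, bool maybe_shared)
{
    remove_mapping(b);
    if (!maybe_shared)
        return true;
    return !block_is_used(b);
}
```

A block whose only snapshot was deleted has a count of one but still looks shared; after the mapping is removed the count drops to zero, so the discard is passed down instead of being withheld.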
    • M
      dm thin: initialize dm_thin_new_mapping returned by get_next_mapping · 16961b04
      Mike Snitzer authored
      As additional members are added to the dm_thin_new_mapping structure
      care should be taken to make sure they get initialized before use.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
      16961b04
  6. 11 Dec, 2013 5 commits
    • J
      dm thin: allow pool in read-only mode to transition to read-write mode · 9b7aaa64
      Joe Thornber authored
      A thin-pool may be in read-only mode because the pool's data or metadata
      space was exhausted.  To allow for recovery, by adding more space to the
      pool, we must allow a pool to transition from PM_READ_ONLY to PM_WRITE
      mode.  Otherwise, running out of space will render the pool permanently
      read-only.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      9b7aaa64
    • J
      dm thin: re-establish read-only state when switching to fail mode · 5383ef3a
      Joe Thornber authored
      If the thin-pool transitioned to fail mode and the thin-pool's table
      were reloaded for some reason: the new table's default pool mode would
      be read-write, though it will transition to fail mode during resume.
      
      When the pool mode transitions directly from PM_WRITE to PM_FAIL we need
      to re-establish the intermediate read-only state in both the metadata
      and persistent-data block manager (as is usually done with the normal
      pool mode transition sequence: PM_WRITE -> PM_READ_ONLY -> PM_FAIL).
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      5383ef3a
    • J
      dm thin: always fallback the pool mode if commit fails · 020cc3b5
      Joe Thornber authored
      Rename commit_or_fallback() to commit().  Now all previous calls to
      commit() will trigger the pool mode to fallback if the commit fails.
      
      Also, check the error returned from commit() in alloc_data_block().
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      020cc3b5
    • M
      dm thin: switch to read-only mode if metadata space is exhausted · 4a02b34e
      Mike Snitzer authored
      Switch the thin pool to read-only mode in alloc_data_block() if
      dm_pool_alloc_data_block() fails because the pool's metadata space is
      exhausted.
      
      Differentiate between data and metadata space in messages about no
      free space available.
      
      This issue was noticed with the device-mapper-test-suite using:
      dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/
      
      The quantity of errors logged in this case must be reduced.
      
      before patch:
      
      device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      <snip ... these repeat for a _very_ long while ... >
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: 253:4: commit failed: error = -28
      device-mapper: thin: 253:4: switching pool to read-only mode
      
      after patch:
      
      device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: 253:4: no free metadata space available.
      device-mapper: thin: 253:4: switching pool to read-only mode
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
      4a02b34e
    • J
      dm thin: switch to read only mode if a mapping insert fails · fafc7a81
      Joe Thornber authored
      Switch the thin pool to read-only mode when dm_thin_insert_block() fails
      since there is little reason to expect the cause of the failure to be
      resolved without further action by user space.
      
      This issue was noticed with the device-mapper-test-suite using:
      dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/
      
      The quantity of errors logged in this case must be reduced.
      
      before patch:
      
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      <snip ... these repeat for a long while ... >
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: thin: 253:4: no free metadata space available.
      device-mapper: thin: 253:4: switching pool to read-only mode
      
      after patch:
      
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: 253:4: dm_thin_insert_block() failed: error = -28
      device-mapper: thin: 253:4: switching pool to read-only mode
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      fafc7a81
  7. 24 Nov, 2013 2 commits
    • K
      block: Generic bio chaining · 196d38bc
      Kent Overstreet authored
      This adds a generic mechanism for chaining bio completions. This is
      going to be used for a bio_split() replacement, and it turns out to be
      very useful in a fair amount of driver code - a fair number of drivers
      were implementing this in their own roundabout ways, often painfully.
      
      Note that this means it's no longer valid to call bio_endio() more than once
      on the same bio! This can cause problems for drivers that save/restore
      bi_end_io. Arguably they shouldn't be saving/restoring bi_end_io at all
      - in all but the simplest cases they'd be better off just cloning the
      bio, and immutable biovecs is making bio cloning cheaper. But for now,
      we add a bio_endio_nodec() for these cases.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      196d38bc
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet authored
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
      4f024f37
  8. 23 Sep, 2013 1 commit
    • M
      dm thin: do not expose non-zero discard limits if discards disabled · b60ab990
      Mike Snitzer authored
      Fix issue where the block layer would stack the discard limits of the
      pool's data device even if the "ignore_discard" pool feature was
      specified.
      
      The pool and thin device(s) still had discards disabled because the
      QUEUE_FLAG_DISCARD request_queue flag wasn't set.  But to avoid user
      confusion when "ignore_discard" is used: both the pool device and the
      thin device(s) have zeroes for all discard limits.
      
      Also, always set discard_zeroes_data_unsupported in targets because they
      should never advertise the 'discard_zeroes_data' capability (even if the
      pool's data device supports it).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      b60ab990
  9. 06 Sep, 2013 3 commits
  10. 23 Aug, 2013 1 commit
  11. 20 May, 2013 1 commit
    • A
      dm thin: fix metadata dev resize detection · 610bba8b
      Alasdair G Kergon authored
      Fix detection of the need to resize the dm thin metadata device.
      
      The code incorrectly tried to extend the metadata device when it
      didn't need to due to a merging error with patch 24347e95 ("dm thin:
      detect metadata device resizing").
      
        device-mapper: transaction manager: couldn't open metadata space map
        device-mapper: thin metadata: tm_open_with_sm failed
        device-mapper: thin: aborting transaction failed
        device-mapper: thin: switching pool to failure mode
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      610bba8b
  12. 10 May, 2013 4 commits
  13. 21 Mar, 2013 2 commits
    • J
      dm thin: fix non power of two discard granularity calc · 58051b94
      Joe Thornber authored
      Fix a discard granularity calculation to work for non power of 2 block sizes.
      
      In order for thinp to passdown discard bios to the underlying data
      device, the data device must have a discard granularity that is a
      factor of the thinp block size.  Originally this check was done by
      using bitops since the block_size was known to be a power of two.
      
      Introduced by commit f13945d7
      ("dm thin: support a non power of 2 discard_granularity").
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      58051b94
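The "is a factor of" requirement can be checked with a bit mask only when the sizes involved are powers of two; a remainder test works in general. The sketch below contrasts the two with hypothetical helper names; it is an illustration of the pitfall, not the code changed by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask test: only meaningful when 'granularity' is a power of two. */
static bool is_factor_mask(uint64_t block_size, uint32_t granularity)
{
    return (block_size & (granularity - 1)) == 0;
}

/* Remainder test: works for any non-zero granularity. */
static bool is_factor_mod(uint64_t block_size, uint32_t granularity)
{
    assert(granularity != 0);
    return block_size % granularity == 0;
}
```

For example, with a block size of 384 sectors and a granularity of 192 sectors, 192 divides 384 evenly, yet the mask test reports a mismatch because 192 is not a power of two. With power-of-two values such as 128 and 64 the two tests agree.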
    • J
      dm thin: fix discard corruption · f046f89a
      Joe Thornber authored
      Fix a bug in dm_btree_remove that could leave leaf values with incorrect
      reference counts.  The effect of this was that removal of a shared block
      could result in the space maps thinking the block was no longer used.
      More concretely, if you have a thin device and a snapshot of it, sending
      a discard to a shared region of the thin could corrupt the snapshot.
      
      Thinp uses a 2-level nested btree to store its mappings.  This first
      level is indexed by thin device, and the second level by logical
      block.
      
      Often when we're removing an entry in this mapping tree we need to
      rebalance nodes, which can involve shadowing them, possibly creating a
      copy if the block is shared.  If we do create a copy then children of
      that node need to have their reference counts incremented.  In this
      way reference counts percolate down the tree as shared trees diverge.
      
      The rebalance functions were incrementing the children at the
      appropriate time, but they were always assuming the children were
      internal nodes.  This meant the leaf values (in our case packed
      block/flags entries) were not being incremented.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      f046f89a
  14. 02 Mar, 2013 4 commits
    • J
      dm thin: remove cells from stack · 025b9685
      Joe Thornber authored
      This patch takes advantage of the new bio-prison interface where the
      memory is now passed in rather than using a mempool in bio-prison.
      This allows the map function to avoid performing potentially-blocking
      allocations that could lead to deadlocks: We want to avoid the cell
      allocation that is done in bio_detain.
      
      (The potential for mempool deadlocks still remains in other functions
      that use bio_detain.)
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      025b9685
    • J
      dm bio prison: pass cell memory in · 6beca5eb
      Joe Thornber authored
      Change the dm_bio_prison interface so that instead of allocating memory
      internally, dm_bio_detain is supplied with a pre-allocated cell each
      time it is called.
      
      This enables a subsequent patch to move the allocation of the struct
      dm_bio_prison_cell outside the thin target's mapping function so it can
      no longer block there.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      6beca5eb
    • M
      dm kcopyd: introduce configurable throttling · df5d2e90
      Mikulas Patocka authored
      This patch allows the administrator to reduce the rate at which kcopyd
      issues I/O.
      
      Each module that uses kcopyd acquires a throttle parameter that can be
      set in /sys/module/*/parameters.
      
      We maintain a history of kcopyd usage by each module in the variables
      io_period and total_period in struct dm_kcopyd_throttle. The actual
      kcopyd activity is calculated as a percentage of time equal to
      "(100 * io_period / total_period)".  This is compared with the user-defined
      throttle percentage threshold and if it is exceeded, we sleep.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      df5d2e90
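The throttle decision described here is a percentage comparison. A minimal sketch follows: io_period and total_period are the names used in the commit message, while the struct name, the 'throttle' member, and the helper functions are assumptions made for illustration, and the time units are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

struct throttle_sketch {
    unsigned throttle;      /* user-set percentage threshold, 0..100 */
    unsigned io_period;     /* time spent with kcopyd IO in flight */
    unsigned total_period;  /* total time observed */
};

/* Activity as a percentage of observed time: 100 * io_period / total_period. */
static unsigned activity_percent(const struct throttle_sketch *t)
{
    if (t->total_period == 0)
        return 0;  /* no history yet */
    return 100u * t->io_period / t->total_period;
}

/* Sleep when measured activity exceeds the configured threshold. */
static bool should_sleep(const struct throttle_sketch *t)
{
    assert(t->io_period <= t->total_period);
    return activity_percent(t) > t->throttle;
}
```

With a threshold of 50, a history of 60 busy units out of 100 leads to a sleep, while 40 out of 100 does not.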
    • A
      dm: rename request variables to bios · 55a62eef
      Alasdair G Kergon authored
      Use 'bio' in the name of variables and functions that deal with
      bios rather than 'request' to avoid confusion with the normal
      block layer use of 'request'.
      
      No functional changes.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      55a62eef