1. 02 5月, 2014 1 次提交
  2. 05 4月, 2014 1 次提交
    • J
      dm cache: fix a lock-inversion · 0596661f
      Joe Thornber 提交于
      When suspending a cache the policy is walked and the individual policy
      hints written to the metadata via sync_metadata().  This led to this
      lock order:
      
            policy->lock
              cache_metadata->root_lock
      
      When loading the cache target the policy is populated while the metadata
      lock is held:
      
            cache_metadata->root_lock
               policy->lock
      
      Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by
      ensuring the cache_metadata root_lock is held whilst all the hints are
      written, rather than being repeatedly locked while policy->lock is held
      (as was the case with each callout that policy_walk_mappings() made to
      the old save_hint() method).
      
      Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove
      locking correctness") build option.  However, it is not clear how the
      LOCKDEP reported paths can lead to a deadlock since the two paths,
      suspending a target and loading a target, never occur at the same time.
      But that doesn't mean the same lock-inversion couldn't have occurred
      elsewhere.
      Reported-by: NMarian Csontos <mcsontos@redhat.com>
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      0596661f
  3. 28 3月, 2014 2 次提交
  4. 13 3月, 2014 2 次提交
    • H
      dm cache: fix access beyond end of origin device · e893fba9
      Heinz Mauelshagen 提交于
      In order to avoid wasting cache space a partial block at the end of the
      origin device is not cached.  Unfortunately, the check for such a
      partial block at the end of the origin device was flawed.
      
      Fix accesses beyond the end of the origin device that occured due to
      attempted promotion of an undetected partial block by:
      
      - initializing the per bio data struct to allow cache_end_io to work properly
      - recognizing access to the partial block at the end of the origin device
      - avoiding out of bounds access to the discard bitset
      
      Otherwise, users can experience errors like the following:
      
       attempt to access beyond end of device
       dm-5: rw=0, want=20971520, limit=20971456
       ...
       device-mapper: cache: promotion failed; couldn't copy block
      Signed-off-by: NHeinz Mauelshagen <heinzm@redhat.com>
      Acked-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      e893fba9
    • H
      dm cache: fix truncation bug when copying a block to/from >2TB fast device · 8b9d9666
      Heinz Mauelshagen 提交于
      During demotion or promotion to a cache's >2TB fast device we must not
      truncate the cache block's associated sector to 32bits.  The 32bit
      temporary result of from_cblock() caused a 32bit multiplication when
      calculating the sector of the fast device in issue_copy_real().
      
      Use an intermediate 64bit type to store the 32bit from_cblock() to allow
      for proper 64bit multiplication.
      
      Here is an example of how this bug manifests on an ext4 filesystem:
      
       EXT4-fs error (device dm-0): ext4_mb_generate_buddy:756: group 17136, 32768 clusters in bitmap, 30688 in gd; block bitmap corrupt.
       JBD2: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
      Signed-off-by: NHeinz Mauelshagen <heinzm@redhat.com>
      Acked-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      8b9d9666
  5. 28 2月, 2014 1 次提交
  6. 18 2月, 2014 2 次提交
    • M
      dm cache: do not add migration to completed list before unhooking bio · 80ae49aa
      Mike Snitzer 提交于
      When completing an overwrite bio, in overwrite_endio(), the associated
      migration should not be added to the 'completed_migrations' until the
      bio's fields are restored with dm_unhook_bio().
      
      Otherwise, do_worker() can race to process 'completed_migrations' before
      dm_unhook_bio() -- so the bio's bi_end_io is incorrect.  This is
      unlikely to cause any problems given the current code but should be
      fixed on the basis of correctness.
      
      Also, the cache's spinlock only needs to be held when manipulating the
      'completed_migrations' list -- other changes don't need protection.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NJoe Thornber <ejt@redhat.com>
      80ae49aa
    • M
      dm cache: move hook_info into common portion of per_bio_data structure · c6eda5e8
      Mike Snitzer 提交于
      Commit c9d28d5d ("dm cache: promotion optimisation for writes")
      incorrectly placed the 'hook_info' member in the writethrough-only
      portion of the per_bio_data structure.
      
      Given that the overwrite optimization may be used for writeback the
      'hook_info' member must be placed above the 'cache' member of the
      per_bio_data structure.  Any members above 'cache' are available from
      both writeback and writethrough modes' per_bio_data structure.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NJoe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org # 3.13+
      c6eda5e8
  7. 17 1月, 2014 1 次提交
    • M
      dm cache: add policy name to status output · 2e68c4e6
      Mike Snitzer 提交于
      The cache's policy may have been established using the "default" alias,
      which is currently the "mq" policy but the default policy may change in
      the future.  It is useful to know exactly which policy is being used.
      
      Add a 'real' member to the dm_cache_policy_type structure and have the
      "default" dm_cache_policy_type point to the real "mq"
      dm_cache_policy_type.  Update dm_cache_policy_get_name() to check if
      real is set, if so report the name of the real policy (not the alias).
      Requested-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      2e68c4e6
  8. 10 1月, 2014 1 次提交
    • M
      dm cache: add block sizes and total cache blocks to status output · 6a388618
      Mike Snitzer 提交于
      Improve cache_status to emit:
      <metadata block size> <#used metadata blocks>/<#total metadata blocks>
      <cache block size> <#used cache blocks>/<#total cache blocks>
      ...
      
      Adding the block sizes allows for easier calculation of the overall size
      of both the metadata and cache devices.  Adding <#total cache blocks>
      provides useful context for how much of the cache is used.
      
      Unfortunately these additions to the status will require updates to
      users' scripts that monitor the cache status.  But these changes help
      provide more comprehensive information about the cache device and will
      simplify tools that are being developed to manage dm-cache devices --
      because they won't need to issue 3 operations to cobble together the
      information that we can easily provide via a single status ioctl.
      
      While updating the status documentation in cache.txt spaces were
      tabify'd.
      Requested-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NJoe Thornber <ejt@redhat.com>
      6a388618
  9. 11 12月, 2013 1 次提交
  10. 04 12月, 2013 1 次提交
  11. 24 11月, 2013 2 次提交
    • K
      block: Generic bio chaining · 196d38bc
      Kent Overstreet 提交于
      This adds a generic mechanism for chaining bio completions. This is
      going to be used for a bio_split() replacement, and it turns out to be
      very useful in a fair amount of driver code - a fair number of drivers
      were implementing this in their own roundabout ways, often painfully.
      
      Note that this means it's no longer to call bio_endio() more than once
      on the same bio! This can cause problems for drivers that save/restore
      bi_end_io. Arguably they shouldn't be saving/restoring bi_end_io at all
      - in all but the simplest cases they'd be better off just cloning the
      bio, and immutable biovecs is making bio cloning cheaper. But for now,
      we add a bio_endio_nodec() for these cases.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      196d38bc
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet 提交于
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  12. 13 11月, 2013 1 次提交
  13. 12 11月, 2013 3 次提交
    • J
      dm cache: add cache block invalidation support · 65790ff9
      Joe Thornber 提交于
      Cache block invalidation is removing an entry from the cache without
      writing it back.  Cache blocks can be invalidated via the
      'invalidate_cblocks' message, which takes an arbitrary number of cblock
      ranges:
         invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
      
      E.g.
         dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      65790ff9
    • J
      dm cache: add passthrough mode · 2ee57d58
      Joe Thornber 提交于
      "Passthrough" is a dm-cache operating mode (like writethrough or
      writeback) which is intended to be used when the cache contents are not
      known to be coherent with the origin device.  It behaves as follows:
      
      * All reads are served from the origin device (all reads miss the cache)
      * All writes are forwarded to the origin device; additionally, write
        hits cause cache block invalidates
      
      This mode decouples cache coherency checks from cache device creation,
      largely to avoid having to perform coherency checks while booting.  Boot
      scripts can create cache devices in passthrough mode and put them into
      service (mount cached filesystems, for example) without having to worry
      about coherency.  Coherency that exists is maintained, although the
      cache will gradually cool as writes take place.
      
      Later, applications can perform coherency checks, the nature of which
      will depend on the type of the underlying storage.  If coherency can be
      verified, the cache device can be transitioned to writethrough or
      writeback mode while still warm; otherwise, the cache contents can be
      discarded prior to transitioning to the desired operating mode.
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: NMorgan Mears <Morgan.Mears@netapp.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      2ee57d58
    • J
      dm cache: cache shrinking support · f494a9c6
      Joe Thornber 提交于
      Allow a cache to shrink if the blocks being removed from the cache are
      not dirty.
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      f494a9c6
  14. 10 11月, 2013 8 次提交
  15. 23 8月, 2013 3 次提交
  16. 11 7月, 2013 1 次提交
  17. 10 5月, 2013 6 次提交
  18. 05 4月, 2013 2 次提交
    • M
      dm cache: reduce bio front_pad size in writeback mode · 19b0092e
      Mike Snitzer 提交于
      A recent patch to fix the dm cache target's writethrough mode extended
      the bio's front_pad to include a 1056-byte struct dm_bio_details.
      Writeback mode doesn't need this, so this patch reduces the
      per_bio_data_size to 16 bytes in this case instead of 1096.
      
      The dm_bio_details structure was added in "dm cache: fix writes to
      cache device in writethrough mode" which fixed commit e2e74d61 ("dm
      cache: fix race in writethrough implementation").  In writeback mode
      we avoid allocating the writethrough-specific members of the
      per_bio_data structure (the dm_bio_details structure included).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      19b0092e
    • D
      dm cache: fix writes to cache device in writethrough mode · b844fe69
      Darrick J. Wong 提交于
      The dm-cache writethrough strategy introduced by commit e2e74d61
      ("dm cache: fix race in writethrough implementation") issues a bio to
      the origin device, remaps and then issues the bio to the cache device.
      This more conservative in-series approach was selected to favor
      correctness over performance (of the previous parallel writethrough).
      However, this in-series implementation that reuses the same bio to write
      both the origin and cache device didn't take into account that the block
      layer's req_bio_endio() modifies a completing bio's bi_sector and
      bi_size.  So the new writethrough strategy needs to preserve these bio
      fields, and restore them before submission to the cache device,
      otherwise nothing gets written to the cache (because bi_size is 0).
      
      This patch adds a struct dm_bio_details field to struct per_bio_data,
      and uses dm_bio_record() and dm_bio_restore() to ensure the bio is
      restored before reissuing to the cache device.  Adding such a large
      structure to the per_bio_data is not ideal but we can improve this
      later, for now correctness is the important thing.
      
      This problem initially went unnoticed because the dm-cache test-suite
      uses a linear DM device for the dm-cache device's origin device.
      Writethrough worked as expected because DM submits a *clone* of the
      original bio, so the original bio which was reused for the cache was
      never touched.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      b844fe69
  19. 21 3月, 2013 1 次提交
    • M
      dm cache: policy ignore hints if generated by different version · ea2dd8c1
      Mike Snitzer 提交于
      When reading the dm cache metadata from disk, ignore the policy hints
      unless they were generated by the same major version number of the same
      policy module.
      
      The hints are considered to be private data belonging to the specific
      module that generated them and there is no requirement for them to make
      sense to different versions of the policy that generated them.
      Policy modules are all required to work fine if no previous hints are
      supplied (or if existing hints are lost).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      ea2dd8c1