1. 07 Mar 2014, 1 commit
  2. 06 Mar 2014, 1 commit
    • dm thin: ensure user takes action to validate data and metadata consistency · 07f2b6e0
      Committed by Mike Snitzer
      If a thin metadata operation fails, the current transaction will abort,
      potentially causing data loss for IO layers up the stack (e.g.
      filesystems).  As such, set THIN_METADATA_NEEDS_CHECK_FLAG in the
      thin metadata's superblock which:
      1) requires the user verify the thin metadata is consistent (e.g. use
         thin_check, etc)
      2) suggests the user verify the thin data is consistent (e.g. use fsck)
      
      The only way to clear the superblock's THIN_METADATA_NEEDS_CHECK_FLAG is
      to run thin_repair.
      
      On metadata operation failure: abort current metadata transaction, set
      pool in read-only mode, and now set the needs_check flag.
      
      As part of this change, constraints are introduced or relaxed:
      * don't allow a pool to transition to write mode if needs_check is set
      * don't allow data or metadata space to be resized if needs_check is set
      * if a thin pool's metadata space is exhausted, the kernel will now
        force the user to take the pool offline for repair before it will
        allow the metadata space to be extended.
      
      Also, update Documentation to include information about when the thin
      provisioning target commits metadata, how it handles metadata failures
      and running out of space.
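
      As a hedged sketch of the repair flow (device names hypothetical;
      thin_check and thin_repair are from the thin-provisioning-tools
      package):

         # with the pool offline, verify the metadata device
         thin_check /dev/mapper/pool_tmeta

         # if needed, write repaired metadata to a spare device; running
         # thin_repair is what allows THIN_METADATA_NEEDS_CHECK_FLAG to be
         # cleared
         thin_repair -i /dev/mapper/pool_tmeta -o /dev/mapper/pool_tmeta_spare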
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
  3. 17 Jan 2014, 1 commit
    • dm cache: add policy name to status output · 2e68c4e6
      Committed by Mike Snitzer
      The cache's policy may have been established using the "default" alias,
      which is currently the "mq" policy but the default policy may change in
      the future.  It is useful to know exactly which policy is being used.
      
      Add a 'real' member to the dm_cache_policy_type structure and have the
      "default" dm_cache_policy_type point to the real "mq"
      dm_cache_policy_type.  Update dm_cache_policy_get_name() to check
      whether real is set and, if so, report the name of the real policy (not
      the alias).
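
      For illustration (device name hypothetical), the policy actually in
      use shows up in the status line even when the table was loaded with
      the "default" alias:

         dmsetup status my_cache    # status reports "mq", not "default"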
      Requested-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  4. 10 Jan 2014, 1 commit
    • dm cache: add block sizes and total cache blocks to status output · 6a388618
      Committed by Mike Snitzer
      Improve cache_status to emit:
      <metadata block size> <#used metadata blocks>/<#total metadata blocks>
      <cache block size> <#used cache blocks>/<#total cache blocks>
      ...
      
      Adding the block sizes allows for easier calculation of the overall size
      of both the metadata and cache devices.  Adding <#total cache blocks>
      provides useful context for how much of the cache is used.
      
      Unfortunately these additions to the status will require updates to
      users' scripts that monitor the cache status.  But these changes help
      provide more comprehensive information about the cache device and will
      simplify tools that are being developed to manage dm-cache devices --
      because they won't need to issue 3 operations to cobble together the
      information that we can easily provide via a single status ioctl.
      
      While updating the status documentation in cache.txt, spaces were
      converted to tabs.
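
      As a hedged illustration (all numbers hypothetical), the new counts
      appear at the front of the status line:

         dmsetup status my_cache
         0 409600 cache 8 86/2048 512 32/512 ...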
      Requested-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
  5. 07 Jan 2014, 2 commits
    • dm cache policy mq: introduce three promotion threshold tunables · 78e03d69
      Committed by Joe Thornber
      Internally the mq policy maintains a promotion threshold variable.  If
      the hit count of a block not in the cache goes above this threshold it
      gets promoted to the cache.
      
      This patch introduces three new tunables that allow you to tweak the
      promotion threshold by adding a small value.  These adjustments depend
      on the io type:
      
         read_promote_adjustment:    READ io, default 4
         write_promote_adjustment:   WRITE io, default 8
         discard_promote_adjustment: READ/WRITE io to a discarded block, default 1
      
      If you're trying to quickly warm a new cache device you may wish to
      reduce these to encourage promotion.  Remember to switch them back to
      their defaults after the cache fills though.
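
      A hedged warm-up sketch (device name hypothetical; this assumes the
      tunables are adjustable at runtime through the target's message
      interface, like other mq tunables):

         dmsetup message my_cache 0 read_promote_adjustment 1
         dmsetup message my_cache 0 write_promote_adjustment 2
         # ... once the cache has filled, restore the defaults
         dmsetup message my_cache 0 read_promote_adjustment 4
         dmsetup message my_cache 0 write_promote_adjustment 8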
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm thin: add error_if_no_space feature · 787a996c
      Committed by Mike Snitzer
      If the pool runs out of data or metadata space, the pool can either
      queue or error the IO destined to the data device.  The default is to
      queue the IO until more space is added.
      
      An admin may now configure the pool to error IO when no space is
      available by setting the 'error_if_no_space' feature when loading the
      thin-pool table.
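
      A hedged table sketch (devices and sizes hypothetical; the feature
      count precedes the feature list in the thin-pool target line):

         dmsetup create pool --table \
           "0 20971520 thin-pool /dev/meta /dev/data 128 32768 1 error_if_no_space"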
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
  6. 11 Dec 2013, 1 commit
  7. 13 Nov 2013, 1 commit
  8. 12 Nov 2013, 2 commits
    • dm cache: add cache block invalidation support · 65790ff9
      Committed by Joe Thornber
      Cache block invalidation is removing an entry from the cache without
      writing it back.  Cache blocks can be invalidated via the
      'invalidate_cblocks' message, which takes an arbitrary number of cblock
      ranges:
         invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
      
      E.g.
         dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm cache: add passthrough mode · 2ee57d58
      Committed by Joe Thornber
      "Passthrough" is a dm-cache operating mode (like writethrough or
      writeback) which is intended to be used when the cache contents are not
      known to be coherent with the origin device.  It behaves as follows:
      
      * All reads are served from the origin device (all reads miss the cache)
      * All writes are forwarded to the origin device; additionally, write
        hits cause cache block invalidates
      
      This mode decouples cache coherency checks from cache device creation,
      largely to avoid having to perform coherency checks while booting.  Boot
      scripts can create cache devices in passthrough mode and put them into
      service (mount cached filesystems, for example) without having to worry
      about coherency.  Coherency that exists is maintained, although the
      cache will gradually cool as writes take place.
      
      Later, applications can perform coherency checks, the nature of which
      will depend on the type of the underlying storage.  If coherency can be
      verified, the cache device can be transitioned to writethrough or
      writeback mode while still warm; otherwise, the cache contents can be
      discarded prior to transitioning to the desired operating mode.
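
      A hedged sketch of booting in passthrough mode and switching to
      writethrough once coherency is verified (devices, sizes and names
      hypothetical):

         dmsetup create cached_vol --table \
           "0 41943040 cache /dev/fast_meta /dev/fast /dev/slow 512 1 passthrough default 0"

         # after the coherency check passes, reload while still warm
         dmsetup reload cached_vol --table \
           "0 41943040 cache /dev/fast_meta /dev/fast /dev/slow 512 1 writethrough default 0"
         dmsetup suspend cached_vol && dmsetup resume cached_vol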
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Morgan Mears <Morgan.Mears@netapp.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  9. 10 Nov 2013, 2 commits
    • dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty() · 01911c19
      Committed by Joe Thornber
      There are now two multiqueues for in-cache blocks: a clean one and a
      dirty one.
      
      writeback_work comes from the dirty one.  Demotions come from the clean
      one.
      
      There are two benefits:
      - Performance improvement, since demoting a clean block is a noop.
      - The cache cleans itself when io load is light.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm crypt: add TCW IV mode for old CBC TCRYPT containers · ed04d981
      Committed by Milan Broz
      dm-crypt can already activate TCRYPT (TrueCrypt compatible) containers
      in LRW or XTS block encryption mode.
      
      TCRYPT containers prior to version 4.1 use CBC mode with some additional
      tweaks; this patch adds support for these containers.
      
      This new mode is implemented using special IV generator named TCW
      (TrueCrypt IV with whitening).  TCW IV only supports containers that are
      encrypted with one cipher (Tested with AES, Twofish, Serpent, CAST5 and
      TripleDES).
      
      While this mode is legacy and is known to be vulnerable to some
      watermarking attacks (e.g. revealing the existence of a hidden disk), it
      can still be useful for activating old containers without 3rd party
      software or for independent forensic analysis of such containers.
      
      (Both the userspace and kernel code are independent implementations
      based on the format documentation; they completely avoid use of the
      original source code.)
      
      The TCW IV generator uses two additional keys: Kw (whitening seed, size
      is always 16 bytes - TCW_WHITENING_SIZE) and Kiv (IV seed, size is
      always the IV size of the selected cipher).  These keys are concatenated
      at the end of the main encryption key provided in mapping table.
      
      While whitening is completely independent from IV, it is implemented
      inside IV generator for simplification.
      
      The whitening value is always 16 bytes long and is calculated per sector
      from the provided Kw as an initial seed, XORed with the sector number
      and mixed with the CRC32 algorithm.  The resulting value is XORed with
      the ciphertext sector content.

      The IV is calculated from the provided Kiv as the initial IV seed,
      XORed with the sector number.
      
      The detailed calculation can be found in the TrueCrypt documentation
      for versions < 4.1 and will also be described on the dm-crypt site; see:
      http://code.google.com/p/cryptsetup/wiki/DMCrypt
      
      Experimental support for activating these containers is already
      present in the git devel branch of cryptsetup.
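
      A hedged activation sketch (device and name hypothetical; this assumes
      a cryptsetup build carrying the experimental TCRYPT support mentioned
      above):

         # activate a pre-4.1 TrueCrypt container; CBC containers use the TCW IV
         cryptsetup open --type tcrypt /dev/sdb1 old_container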
      Signed-off-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  10. 06 Sep 2013, 1 commit
    • dm: add statistics support · fd2ed4d2
      Committed by Mikulas Patocka
      Support the collection of I/O statistics on user-defined regions of
      a DM device.  If no regions are defined, no statistics are collected, so
      there isn't any performance impact.  Only bio-based DM devices are
      currently supported.
      
      Each user-defined region specifies a starting sector, length and step.
      Individual statistics will be collected for each step-sized area within
      the range specified.
      
      The I/O statistics counters for each step-sized area of a region are
      in the same format as /sys/block/*/stat or /proc/diskstats but extra
      counters (12 and 13) are provided: total time spent reading and
      writing in milliseconds.  All these counters may be accessed by sending
      the @stats_print message to the appropriate DM device via dmsetup.
      
      The creation of DM statistics will allocate memory via kmalloc or fall
      back to using vmalloc space.  At most, 1/4 of the overall system
      memory may be allocated by DM statistics.  The admin can see how much
      memory is used by reading
      /sys/module/dm_mod/parameters/stats_current_allocated_bytes
      
      See Documentation/device-mapper/statistics.txt for more details.
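
      A hedged usage sketch (device name hypothetical; syntax as documented
      in statistics.txt):

         # cover the whole device, subdivided into 100 step-sized areas
         dmsetup message vol 0 @stats_create - /100
         # the create message returns a region id (e.g. 0); print its counters
         dmsetup message vol 0 @stats_print 0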
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  11. 23 Aug 2013, 3 commits
  12. 11 Jul 2013, 1 commit
    • dm: add switch target · 9d0eb0ab
      Committed by Jim Ramsay
      dm-switch is a new target that maps IO to underlying block devices
      efficiently when there are a large number of fixed-size address regions
      but no simple pattern that would allow a compact mapping representation
      such as dm-stripe.
      
      Though we have developed this target for a specific storage device, Dell
      EqualLogic, we have made an effort to keep it as general purpose as
      possible in the hope that others may benefit.
      
      Originally developed by Jim Ramsay. Simplified by Mikulas Patocka.
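
      A hedged sketch of the constructor and remapping message (devices,
      sizes and mappings hypothetical):

         # two paths, 2048-sector regions, no optional args
         dmsetup create my_switch --table \
           "0 41943040 switch 2 2048 0 /dev/sda 0 /dev/sdb 0"
         # point region 0 at path 1 and region 7 at path 0
         dmsetup message my_switch 0 set_region_mappings 0:1 7:0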
      Signed-off-by: Jim Ramsay <jim_ramsay@dell.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  13. 26 Jun 2013, 1 commit
    • MD: Remember the last sync operation that was performed · c4a39551
      Committed by Jonathan Brassow
      
      This patch adds a field to the mddev structure to track the last
      sync operation that was performed.  This is especially useful when
      it comes to what is recorded in mismatch_cnt in sysfs.  If the
      last operation was "data-check", then it reports the number of
      discrepancies found by the user-initiated check.  If it was a
      "repair" operation, then it reports the number of discrepancies
      repaired, and so on.
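
      A hedged illustration through MD's sysfs files (array name
      hypothetical):

         # start a user-initiated check, then read the discrepancy count
         echo check > /sys/block/md0/md/sync_action
         cat /sys/block/md0/md/mismatch_cnt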
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  14. 14 Jun 2013, 1 commit
    • DM RAID: Add ability to restore transiently failed devices on resume · 9092c02d
      Committed by Jonathan Brassow
      
      This patch adds code to the resume function to check over the devices
      in the RAID array.  If any are found to be marked as failed and their
      superblocks can be read, an attempt is made to reintegrate them into
      the array.  This allows the user to refresh the array with a simple
      suspend and resume of the array - rather than having to load a
      completely new table, allocate and initialize all the structures and
      throw away the old instantiation.
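
      The refresh described above reduces to a simple cycle (device name
      hypothetical):

         dmsetup suspend my_raid
         dmsetup resume my_raid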
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  15. 28 May 2013, 1 commit
  16. 24 Apr 2013, 1 commit
    • DM RAID: Add message/status support for changing sync action · be83651f
      Committed by Jonathan Brassow
      
      This patch adds a message interface to dm-raid to allow the user to more
      finely control the sync actions being performed by the MD driver.  This
      gives the user the ability to initiate "check" and "repair" (i.e. scrubbing).
      Two additional fields have been appended to the status output to provide more
      information about the type of sync action occurring and the results of those
      actions, specifically: <sync_action> and <mismatch_cnt>.  These new fields
      will always be populated.  This is essentially the device-mapper way of doing
      what MD controls through the 'sync_action' sysfs file and shows through the
      'mismatch_cnt' sysfs file.
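
      A hedged sketch of the interface (device name hypothetical):

         # kick off scrubbing, then watch <sync_action> and <mismatch_cnt>
         # in the trailing status fields
         dmsetup message my_raid 0 check
         dmsetup status my_raid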
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  17. 02 Mar 2013, 3 commits
  18. 26 Feb 2013, 1 commit
    • DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms · fe5d2f4a
      Committed by Jonathan Brassow
      
      Until now, dm-raid.c only supported the "near" algorithm of MD's RAID10
      implementation.  This patch adds support for the "far" and "offset"
      algorithms, but only with the improved redundancy that is brought with
      the introduction of the 'use_far_sets' bit, which shifts copied stripes
      according to smaller sets vs the entire array.  That is, the 17th bit
      of the 'layout' variable that defines the RAID10 implementation will
      always be set.   (More information on how the 'layout' variable selects
      the RAID10 algorithm can be found in the opening comments of
      drivers/md/raid10.c.)
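
      A hedged constructor sketch selecting the "far" layout (devices and
      sizes hypothetical; raid10_format and raid10_copies are dm-raid table
      parameters):

         dmsetup create my_raid10 --table \
           "0 41943040 raid raid10 5 128 raid10_format far raid10_copies 2 \
            4 - /dev/sda - /dev/sdb - /dev/sdc - /dev/sdd"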
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  19. 24 Jan 2013, 1 commit
    • DM-RAID: Fix RAID10's check for sufficient redundancy · 55ebbb59
      Committed by Jonathan Brassow
      Before attempting to activate a RAID array, it is checked for sufficient
      redundancy.  That is, we make sure that there are not too many failed
      devices - or devices specified for rebuild - to undermine our ability to
      activate the array.  The current code performs this check twice - once to
      ensure there were not too many devices specified for rebuild by the user
      ('validate_rebuild_devices') and again after possibly experiencing a failure
      to read the superblock ('analyse_superblocks').  Neither of these checks
      is sufficient.  The first check is done properly but with insufficient
      information about the possible failure state of the devices to make a good
      determination if the array can be activated.  The second check is simply
      done wrong in the case of RAID10 because it doesn't account for the
      independence of the stripes (i.e. mirror sets).  The solution is to use the
      properly written check ('validate_rebuild_devices'), but perform the check
      after the superblocks have been read and we know which devices have failed.
      This gives us one check instead of two and performs it in a location where
      it can be done right.
      
      Only RAID10 was affected and it was affected in the following ways:
      - the code did not properly catch the condition where a user specified
        a device for rebuild that already had a failed device in the same mirror
        set.  (This condition would, however, be caught at a deeper level in MD.)
      - the code triggers a false positive and denies activation when devices in
        independent mirror sets have failed - counting the failures as though they
        were all in the same set.
      
      The most likely place this error was introduced (or this patch should have
      been included) is in commit 4ec1e369 - first introduced in v3.7-rc1.
      Consequently this fix should also go in v3.7.y, however there is a
      small conflict on the .version in raid_target, so I'll submit a
      separate patch to -stable.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  20. 11 Oct 2012, 1 commit
  21. 01 Aug 2012, 1 commit
  22. 27 Jul 2012, 3 commits
  23. 03 Jul 2012, 1 commit
  24. 03 Jun 2012, 1 commit
    • dm thin: provide userspace access to pool metadata · cc8394d8
      Committed by Joe Thornber
      This patch implements two new messages that can be sent to the thin
      pool target, allowing it to take a snapshot of the _metadata_.  This
      read-only snapshot can be accessed by userland, concurrently with the
      live target.
      
      Only one metadata snapshot can be held at a time.  The pool's status
      line will give the block location for the current msnap.
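
      A hedged sketch of the message flow (pool name hypothetical; the
      reserve/release message names follow the thin-pool target's documented
      interface):

         # take the metadata snapshot; its root block appears in the status
         dmsetup message pool 0 reserve_metadata_snap
         dmsetup status pool
         # ... inspect with thin_dump -m as shown below, then release
         dmsetup message pool 0 release_metadata_snap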
      
      Since version 0.1.5 of the userland thin provisioning tools, the
      thin_dump program displays the msnap as follows:
      
          thin_dump -m <msnap root> <metadata dev>
      
      Available here: https://github.com/jthornber/thin-provisioning-tools
      
      Now that userland can access the metadata we can do various things
      that have traditionally been kernel side tasks:
      
           i) Incremental backups.
      
           By using metadata snapshots we can work out what blocks have
           changed over time.  Combined with data snapshots we can ensure
           the data doesn't change while we back it up.
      
           A short proof of concept script can be found here:
      
           https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb
      
           ii) Migration of thin devices from one pool to another.
      
           iii) Merging snapshots back into an external origin.
      
           iv) Asynchronous replication.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  25. 29 Mar 2012, 5 commits
  26. 07 Mar 2012, 1 commit
  27. 21 Feb 2012, 1 commit