1. 08 Mar 2017 (2 commits)
  2. 11 Jun 2016 (1 commit)
  3. 11 Mar 2016 (1 commit)
  4. 10 Dec 2015 (2 commits)
    • dm verity: add support for forward error correction · a739ff3f
      Sami Tolvanen authored
      Add support for correcting corrupted blocks using Reed-Solomon.
      
      This code uses RS(255, N) interleaved across data and hash
      blocks. Each error-correcting block covers N bytes evenly
      distributed across the combined total data, so that each byte is a
      maximum distance away from the others. This makes it possible to
      recover from several consecutive corrupted blocks with relatively
      small space overhead.
      
      In addition, using verity hashes to locate erasures nearly doubles
      the effectiveness of error correction. Being able to detect
      corrupted blocks also improves performance, because only corrupted
      blocks need to be corrected.
      
      For a 2 GiB partition, RS(255, 253) (two parity bytes for each
      253-byte block) can correct up to 16 MiB of consecutive corrupted
      blocks if erasures can be located, and 8 MiB if they cannot, with
      16 MiB of space overhead. (A short worked check of these numbers
      follows this entry.)
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
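      A quick check of the figures above, assuming RS(255, 253) as stated
      (253 data bytes plus 2 parity bytes per codeword) interleaved evenly
      across a 2 GiB data area; the arithmetic below is illustrative and
      not part of the original patch:

      #include <stdio.h>

      int main(void)
      {
          const unsigned long long data_bytes = 2ULL << 30; /* 2 GiB partition */
          const unsigned n = 253;              /* data bytes per codeword   */
          const unsigned parity = 255 - n;     /* 2 parity bytes per codeword */

          /* Every 253 data bytes need 2 bytes of FEC parity. */
          unsigned long long fec_bytes = data_bytes / n * parity;

          /*
           * Each codeword's 253 bytes are spread evenly across the data
           * area, so a contiguous run of L corrupted bytes touches about
           * L / (data_bytes / n) bytes of any one codeword.  With erasures
           * located (via the verity hashes) a codeword corrects up to 2
           * bad bytes; without, only 1.
           */
          unsigned long long max_with_erasures = 2 * (data_bytes / n);
          unsigned long long max_without_erasures = data_bytes / n;

          printf("FEC space overhead:            %llu MiB\n", fec_bytes >> 20);
          printf("correctable run (erasures):    %llu MiB\n", max_with_erasures >> 20);
          printf("correctable run (no erasures): %llu MiB\n", max_without_erasures >> 20);
          return 0;
      }

      This reproduces the 16 MiB overhead, 16 MiB (with erasures) and
      8 MiB (without) figures quoted in the commit message.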
    • dm verity: move dm-verity.c to dm-verity-target.c · 03045cba
      Sami Tolvanen authored
      Prepare for extending dm-verity with an optional object.  Follows the
      naming convention used by other DM targets (e.g. dm-cache and dm-era).
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  5. 24 Oct 2015 (1 commit)
    • raid5: add basic stripe log · f6bed0ef
      Shaohua Li authored
      This introduces a simple log for raid5. Data/parity written to the
      raid array is first written to the log, then to the raid array
      disks. If a crash happens, we can recover data from the log. This
      can speed up raid resync and fix the write hole issue.
      
      The log structure is pretty simple. Data/metadata are stored in
      block units, generally 4k. There is only one type of metadata
      block, and it can track three types of data: stripe data, stripe
      parity and flush blocks. The MD superblock points to the last valid
      metadata block. Each metadata block has a checksum and sequence
      number, so recovery can scan the log correctly. We store a checksum
      of the stripe data/parity in the metadata block, so metadata and
      stripe data/parity can be written to the log disk together;
      otherwise, the metadata write would have to wait until the stripe
      data/parity write is finished.
      
      For stripe data, the metadata block records the stripe data sector
      and size. Currently the size is always 4k. This metadata record
      could be made simpler if we just fix the write hole (e.g. we could
      record the data of a stripe's different disks together), but this
      format can be extended to support caching in the future, which must
      record data addresses and sizes.
      
      For stripe parity, the metadata block records the stripe sector.
      Its size is 4k (for raid5) or 8k (for raid6). We always store the P
      parity first. This format should work for caching too.
      
      A flush block indicates that a stripe is on the raid array disks.
      Fixing the write hole doesn't need this type of metadata; it is for
      the caching extension. (A simplified sketch of such a metadata block
      follows this entry.)
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
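      As an illustration of the description above, here is a simplified,
      hypothetical layout for such a 4k log metadata block; the field
      names and widths are invented for this sketch, the real on-disk
      format is defined in the kernel sources:

      #include <stdint.h>
      #include <stdio.h>

      enum log_payload_type {
          LOG_PAYLOAD_DATA,    /* 4k of stripe data at a given sector */
          LOG_PAYLOAD_PARITY,  /* 4k (raid5) or 8k (raid6) of parity  */
          LOG_PAYLOAD_FLUSH,   /* stripe has reached the array disks  */
      };

      struct log_payload {
          uint16_t type;       /* enum log_payload_type                 */
          uint16_t pad;
          uint32_t sectors;    /* payload size in 512-byte sectors      */
          uint64_t sector;     /* stripe sector this payload belongs to */
          uint32_t checksum;   /* checksum of the payload data itself, so
                                  metadata and data can be submitted together */
          uint32_t pad2;
      };

      struct log_meta_block {
          uint32_t magic;
          uint32_t checksum;   /* checksum of this metadata block      */
          uint64_t seq;        /* sequence number checked by recovery  */
          uint64_t position;   /* log position of this block           */
          uint32_t meta_size;  /* bytes of metadata actually used      */
          uint32_t pad;
          struct log_payload payload[]; /* records fill the rest of 4k */
      };

      int main(void)
      {
          printf("header %zu bytes, record %zu bytes, %zu records per 4k block\n",
                 sizeof(struct log_meta_block), sizeof(struct log_payload),
                 (4096 - sizeof(struct log_meta_block)) / sizeof(struct log_payload));
          return 0;
      }

      Recovery then scans forward from the last valid metadata block named
      by the MD superblock, verifying each block's checksum and sequence
      number as it goes.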
  6. 12 Jun 2015 (1 commit)
    • dm cache: add stochastic-multi-queue (smq) policy · 66a63635
      Joe Thornber authored
      The stochastic-multi-queue (smq) policy addresses some of the problems
      with the current multiqueue (mq) policy.
      
      Memory usage
      ------------
      
      The mq policy uses a lot of memory; 88 bytes per cache block on a
      64-bit machine.
      
      SMQ uses 28-bit indexes rather than pointers to implement its data
      structures.  It avoids storing an explicit hit count for each block.
      Instead of a pre-cache it has a 'hotspot' queue, which uses a quarter
      of the entries (each hotspot block covers a larger area than a single
      cache block).
      
      All of this means smq uses ~25 bytes per cache block.  Still a lot of
      memory, but a substantial improvement nonetheless.  (A quick
      back-of-the-envelope comparison follows this entry.)
      
      Level balancing
      ---------------
      
      MQ places entries in different levels of the multiqueue structures
      based on their hit count (~ln(hit count)).  This means the bottom
      levels generally have the most entries, and the top ones have very
      few.  Having unbalanced levels like this reduces the efficacy of the
      multiqueue.
      
      SMQ does not maintain a hit count; instead it swaps hit entries
      with the least recently used entry from the level above.  The
      overall ordering is a side effect of this stochastic process.  With
      this scheme we can decide how many entries occupy each multiqueue
      level, resulting in better promotion/demotion decisions.
      
      Adaptability
      ------------
      
      The MQ policy maintains a hit count for each cache block.  For a
      different block to get promoted to the cache its hit count has to
      exceed the lowest hit count currently in the cache.  This means it
      can take a long time for the cache to adapt between varying IO
      patterns.  Periodically degrading the hit counts could help with
      this, but I haven't found a nice general solution.
      
      SMQ doesn't maintain hit counts, so a lot of this problem just goes
      away.  In addition it tracks performance of the hotspot queue, which
      is used to decide which blocks to promote.  If the hotspot queue is
      performing badly then it starts moving entries more quickly between
      levels.  This lets it adapt to new IO patterns very quickly.
      
      Performance
      -----------
      
      In my tests SMQ shows substantially better performance than MQ.  Once
      this matures a bit more I'm sure it'll become the default policy.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
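      Plugging the per-block figures above into an example cache shows the
      difference in scale; the 1 TiB cache and 64 KiB block size below are
      made-up example numbers, only the 88 and ~25 bytes per block come
      from the commit message:

      #include <stdio.h>

      int main(void)
      {
          /* Example cache geometry (illustrative values only). */
          const unsigned long long cache_size = 1ULL << 40;  /* 1 TiB fast device   */
          const unsigned long long block_size = 64ULL << 10; /* 64 KiB cache blocks */
          const unsigned long long nr_blocks = cache_size / block_size;

          /* Per-cache-block policy overhead quoted above. */
          const unsigned long long mq_per_block = 88;
          const unsigned long long smq_per_block = 25;

          printf("%llu cache blocks\n", nr_blocks);
          printf("mq  policy metadata: ~%llu MiB\n", nr_blocks * mq_per_block >> 20);
          printf("smq policy metadata: ~%llu MiB\n", nr_blocks * smq_per_block >> 20);
          return 0;
      }

      For this example that is roughly 1408 MiB of policy metadata with mq
      versus about 400 MiB with smq.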
  7. 16 Apr 2015 (1 commit)
    • dm: add log writes target · 0e9cebe7
      Josef Bacik authored
      Introduce a new target that is meant for file system developers to test file
      system integrity at particular points in the life of a file system.  We capture
      all write requests and associated data and log them to a separate device
      for later replay.  There is a userspace utility to do this replay.  The
      idea behind this is to give file system developers a tool to verify that
      the file system is always consistent.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Zach Brown <zab@zabbo.net>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  8. 23 Feb 2015 (1 commit)
  9. 28 Mar 2014 (1 commit)
    • dm: add era target · eec40579
      Joe Thornber authored
      dm-era is a target that behaves similarly to the linear target.  In
      addition it keeps track of which blocks were written within a
      user-defined period of time called an 'era'.  Each era target
      instance maintains the current era as a monotonically increasing
      32-bit counter.  (A small conceptual sketch of this tracking follows
      this entry.)
      
      Use cases include tracking changed blocks for backup software, and
      partially invalidating the contents of a cache to restore cache
      coherency after rolling back a vendor snapshot.
      
      dm-era is primarily expected to be paired with the dm-cache target.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
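      A conceptual sketch of the era idea described above; this is only an
      in-memory toy (the real target persists its metadata with the
      persistent-data library and is driven through dm messages), but it
      shows what 'which blocks were written since era E' means:

      #include <stdint.h>
      #include <stdio.h>

      #define NR_BLOCKS 16

      static uint32_t current_era = 1;
      static uint32_t written_in_era[NR_BLOCKS]; /* 0 = never written */

      static void write_block(unsigned block)
      {
          written_in_era[block] = current_era;   /* stamp with the current era */
      }

      static void advance_era(void)
      {
          current_era++;                         /* e.g. after taking a backup */
      }

      static void blocks_changed_since(uint32_t era)
      {
          for (unsigned b = 0; b < NR_BLOCKS; b++)
              if (written_in_era[b] >= era)
                  printf("block %u changed (era %u)\n", b, written_in_era[b]);
      }

      int main(void)
      {
          write_block(3);
          write_block(7);
          advance_era();      /* era 2 begins */
          write_block(7);
          write_block(12);

          /* Backup software: what changed since the last backup (era 2)? */
          blocks_changed_since(2);
          return 0;
      }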
  10. 15 Jan 2014 (1 commit)
    • dm sysfs: fix a module unload race · 2995fa78
      Mikulas Patocka authored
      This reverts commit be35f486 ("dm: wait until embedded kobject is
      released before destroying a device") and provides an improved fix.
      
      The kobject release code that calls the completion must be placed in a
      non-module file, otherwise there is a module unload race (if the process
      calling dm_kobject_release is preempted and the DM module unloaded after
      the completion is triggered, but before dm_kobject_release returns).
      
      To fix this race, this patch moves the completion code to dm-builtin.c
      which is always compiled directly into the kernel if BLK_DEV_DM is
      selected.
      
      The patch introduces a new dm_kobject_holder structure whose purpose
      is to keep the completion and kobject in one place, so that they can
      be accessed from non-module code without the need to export the
      layout of struct mapped_device to that code.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
  11. 06 Sep 2013 (1 commit)
    • dm: add statistics support · fd2ed4d2
      Mikulas Patocka authored
      Support the collection of I/O statistics on user-defined regions of
      a DM device.  If no regions are defined, no statistics are
      collected, so there is no performance impact.  Only bio-based DM
      devices are currently supported.
      
      Each user-defined region specifies a starting sector, length and
      step.  Individual statistics will be collected for each step-sized
      area within the range specified (illustrated by the short sketch
      after this entry).
      
      The I/O statistics counters for each step-sized area of a region are
      in the same format as /sys/block/*/stat or /proc/diskstats but extra
      counters (12 and 13) are provided: total time spent reading and
      writing in milliseconds.  All these counters may be accessed by sending
      the @stats_print message to the appropriate DM device via dmsetup.
      
      The creation of DM statistics will allocate memory via kmalloc or
      fall back to using vmalloc space.  At most, 1/4 of the overall
      system memory may be allocated by DM statistics.  The admin can see
      how much memory is used by reading
      /sys/module/dm_mod/parameters/stats_current_allocated_bytes.
      
      See Documentation/device-mapper/statistics.txt for more details.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
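      A short sketch of how a region's step carves it into areas; the
      region parameters below are made up for illustration, but the
      start/length/step terms are the ones defined above and each
      step-sized area keeps its own set of counters:

      #include <stdio.h>

      int main(void)
      {
          /* Example region: start sector, length and step, in sectors. */
          const unsigned long long start = 1024, len = 8192, step = 2048;
          const unsigned long long nr_areas = (len + step - 1) / step;

          printf("region splits into %llu step-sized areas\n", nr_areas);

          /* Account a few example I/Os to the area they fall in. */
          const unsigned long long sectors[] = { 1024, 3000, 5000, 9215 };
          for (size_t i = 0; i < sizeof(sectors) / sizeof(sectors[0]); i++) {
              unsigned long long area = (sectors[i] - start) / step;
              printf("I/O at sector %llu -> counters of area %llu\n",
                     sectors[i], area);
          }
          return 0;
      }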
  12. 11 Jul 2013 (1 commit)
    • dm: add switch target · 9d0eb0ab
      Jim Ramsay authored
      dm-switch is a new target that maps IO to underlying block devices
      efficiently when there is a large number of fixed-sized address
      regions but no simple pattern that would allow for a compact mapping
      representation such as dm-stripe (see the lookup sketch after this
      entry).
      
      Though we have developed this target for a specific storage device, Dell
      EqualLogic, we have made an effort to keep it as general purpose as
      possible in the hope that others may benefit.
      
      Originally developed by Jim Ramsay. Simplified by Mikulas Patocka.
      Signed-off-by: Jim Ramsay <jim_ramsay@dell.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
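      A minimal sketch of the mapping idea (not the target's actual code
      or its compact page-table encoding, and the region size and table
      contents below are invented): one table entry per fixed-size region
      names the underlying device, so the lookup is a simple indexed read
      rather than a formula as in dm-stripe:

      #include <stdio.h>

      #define REGION_SECTORS 1024ULL  /* fixed region size, in sectors */
      #define NR_REGIONS 8            /* toy address space             */

      /* One entry per region, naming the underlying device it maps to. */
      static const unsigned region_to_dev[NR_REGIONS] = {
          0, 2, 2, 1, 0, 3, 1, 2
      };

      static unsigned map_sector(unsigned long long sector)
      {
          unsigned long long region = sector / REGION_SECTORS;
          return region_to_dev[region % NR_REGIONS];
      }

      int main(void)
      {
          unsigned long long sectors[] = { 0, 1500, 3100, 7000 };
          for (size_t i = 0; i < sizeof(sectors) / sizeof(sectors[0]); i++)
              printf("sector %llu -> device %u\n",
                     sectors[i], map_sector(sectors[i]));
          return 0;
      }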
  13. 24 Mar 2013 (1 commit)
  14. 02 Mar 2013 (3 commits)
  15. 13 Oct 2012 (1 commit)
  16. 29 Mar 2012 (1 commit)
  17. 01 Nov 2011 (2 commits)
    • dm: add thin provisioning target · 991d9fa0
      Joe Thornber authored
      Initial EXPERIMENTAL implementation of device-mapper thin provisioning
      with snapshot support.  The 'thin' target is used to create instances of
      the virtual devices that are hosted in the 'thin-pool' target.  The
      thin-pool target provides data sharing among devices.  This sharing is
      made possible using the persistent-data library in the previous patch.
      
      The main highlight of this implementation, compared to the previous
      implementation of snapshots, is that it allows many virtual devices to
      be stored on the same data volume, simplifying administration and
      allowing sharing of data between volumes (thus reducing disk usage).
      
      Another big feature is support for arbitrary depth of recursive
      snapshots (snapshots of snapshots of snapshots ...).  The previous
      implementation of snapshots did this by chaining together lookup tables,
      and so performance was O(depth).  This new implementation uses a single
      data structure so we don't get this degradation with depth.
      
      For further information and examples of how to use this, please
      read Documentation/device-mapper/thin-provisioning.txt, and see the
      conceptual lookup sketch after this entry.
      Signed-off-by: Joe Thornber <thornber@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
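      A conceptual sketch of why snapshot depth no longer matters; the
      real pool stores its mappings in on-disk B-trees from the
      persistent-data library with copy-on-write sharing, and the flat
      table below is only a toy, but it shows the shape of the lookup:
      every device, snapshot or not, is resolved through one structure
      keyed by (device id, virtual block), never through a chain of
      parent snapshots:

      #include <stdio.h>

      struct mapping {
          unsigned dev_id;                 /* thin device id           */
          unsigned long long virt_block;   /* block within that device */
          unsigned long long data_block;   /* shared pool data block   */
      };

      static const struct mapping pool[] = {
          /* origin device 1 */
          { 1, 0, 100 }, { 1, 1, 101 }, { 1, 2, 102 },
          /* snapshot device 2 shares most data blocks with device 1;
             block 1 was overwritten after the snapshot was taken */
          { 2, 0, 100 }, { 2, 1, 205 }, { 2, 2, 102 },
      };

      static long long lookup(unsigned dev_id, unsigned long long virt_block)
      {
          for (size_t i = 0; i < sizeof(pool) / sizeof(pool[0]); i++)
              if (pool[i].dev_id == dev_id && pool[i].virt_block == virt_block)
                  return (long long)pool[i].data_block;
          return -1;   /* unprovisioned */
      }

      int main(void)
      {
          printf("origin   block 1 -> data block %lld\n", lookup(1, 1));
          printf("snapshot block 1 -> data block %lld\n", lookup(2, 1));
          printf("snapshot block 2 -> data block %lld (shared)\n", lookup(2, 2));
          return 0;
      }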
    • dm: add bufio · 95d402f0
      Mikulas Patocka authored
      The dm-bufio interface allows you to do cached I/O on devices,
      holding recently-read blocks in memory and performing delayed writes
      (a tiny user-space analogue follows this entry).
      
      We don't use the buffer cache or page cache already present in the
      kernel, because:
      * we need to handle block sizes larger than a page
      * we can't allocate memory to perform reads or we'd have deadlocks
      
      Currently, when a cache is required, we limit its size to a fraction of
      available memory.  Usage can be viewed and changed in
      /sys/module/dm_bufio/parameters/ .
      
      The first user is thin provisioning, but more dm users are planned.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
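      The real interface is declared in the kernel's dm-bufio header; the
      sketch below is only a tiny user-space analogue with invented names,
      but it shows the basic idea of holding recently-read blocks and
      delaying the write-back of dirty ones until eviction:

      #include <stdio.h>
      #include <string.h>

      #define BLOCK_SIZE 8192          /* may be larger than a page      */
      #define CACHE_SLOTS 4            /* cache limited to a few buffers */

      static char device[16][BLOCK_SIZE]; /* stand-in for the backing device */

      struct buffer {
          long block;                  /* -1 = slot unused */
          int dirty;
          char data[BLOCK_SIZE];
      };

      static struct buffer cache[CACHE_SLOTS] = { { -1 }, { -1 }, { -1 }, { -1 } };

      /* Return a cached buffer for 'block', reading it on a miss and
         writing back whatever dirty buffer gets evicted (the delayed write). */
      static struct buffer *bufio_read(long block)
      {
          struct buffer *b = &cache[block % CACHE_SLOTS];

          if (b->block == block)
              return b;                               /* cache hit */
          if (b->block >= 0 && b->dirty)              /* delayed write-back */
              memcpy(device[b->block], b->data, BLOCK_SIZE);
          memcpy(b->data, device[block], BLOCK_SIZE); /* read in new block */
          b->block = block;
          b->dirty = 0;
          return b;
      }

      int main(void)
      {
          struct buffer *b = bufio_read(5);
          strcpy(b->data, "hello");
          b->dirty = 1;            /* marked dirty; written back on eviction */

          bufio_read(9);           /* maps to the same slot, flushes block 5 */
          printf("device block 5 now contains: %s\n", device[5]);
          return 0;
      }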
  18. 24 Mar 2011 (1 commit)
  19. 14 Jan 2011 (1 commit)
    • dm: raid456 basic support · 9d09e663
      NeilBrown authored
      This patch is the skeleton for the DM target that will be
      the bridge from DM to MD (initially RAID456 and later RAID1).  It
      provides a way to use device-mapper interfaces to the MD RAID456
      drivers.
      
      As with all device-mapper targets, the nominal public interfaces are the
      constructor (CTR) tables and the status outputs (both STATUSTYPE_INFO
      and STATUSTYPE_TABLE).  The CTR table looks like the following:
      
      1: <s> <l> raid \
      2:	<raid_type> <#raid_params> <raid_params> \
      3:	<#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>
      
      Line 1 contains the standard first three arguments to any device-mapper
      target - the start, length, and target type fields.  The target type in
      this case is "raid".
      
      Line 2 contains the arguments that define the particular raid
      type/personality/level, the required arguments for that raid type, and
      any optional arguments.  Possible raid types include: raid4, raid5_la,
      raid5_ls, raid5_rs, raid6_zr, raid6_nr, and raid6_nc.  (again, raid1 is
      planned for the future.)  The list of required and optional parameters
      is the same for all the current raid types.  The required parameters are
      positional, while the optional parameters are given as key/value pairs.
      The possible parameters are as follows:
       <chunk_size>		Chunk size in sectors.
       [[no]sync]		Force/Prevent RAID initialization
       [rebuild <idx>]	Rebuild the drive indicated by the index
       [daemon_sleep <ms>]	Time between bitmap daemon work to clear bits
       [min_recovery_rate <kB/sec/disk>]	Throttle RAID initialization
       [max_recovery_rate <kB/sec/disk>]	Throttle RAID initialization
       [max_write_behind <value>]		See '-write-behind=' (man mdadm)
       [stripe_cache <sectors>]		Stripe cache size for higher RAIDs
      
      Line 3 contains the list of devices that compose the array in
      metadata/data device pairs.  If the metadata is stored separately, a '-'
      is given for the metadata device position.  If a drive has failed or is
      missing at creation time, a '-' can be given for both the metadata and
      data drives for a given position.
      
      Examples:
      # RAID4 - 4 data drives, 1 parity
      # No metadata devices specified to hold superblock/bitmap info
      # Chunk size of 1MiB
      # (Lines separated for easy reading)
      0 1960893648 raid \
      	raid4 1 2048 \
      	5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
      
      # RAID4 - 4 data drives, 1 parity (no metadata devices)
      # Chunk size of 1MiB, force RAID initialization,
      #	min recovery rate at 20 kiB/sec/disk
      0 1960893648 raid \
              raid4 4 2048 min_recovery_rate 20 sync \
              5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
      
      Performing a 'dmsetup table' should display the CTR table used to
      construct the mapping (with possible reordering of optional
      parameters).
      
      Performing a 'dmsetup status' will yield information on the state and
      health of the array.  The output is as follows:
      1: <s> <l> raid \
      2:	<raid_type> <#devices> <1 health char for each dev> <resync_ratio>
      
      Line 1 is standard DM output.  Line 2 is best shown by example:
      	0 1960893648 raid raid4 5 AAAAA 2/490221568
      Here we can see the RAID type is raid4, there are 5 devices - all
      of which are 'A'live, and the array is 2/490221568 of the way
      through its recovery.
      
      Cc: linux-raid@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  20. 29 Oct 2009 (1 commit)
  21. 16 Oct 2009 (1 commit)
  22. 22 Jun 2009 (3 commits)
  23. 31 Mar 2009 (2 commits)
    • md/raid6: move raid6 data processing to raid6_pq.ko · f701d589
      Dan Williams authored
      Move the raid6 data processing routines into a standalone module
      (raid6_pq) to prepare them to be called from async_tx wrappers and other
      non-md drivers/modules.  This precludes a circular dependency of raid456
      needing the async modules for data processing while those modules in
      turn depend on raid456 for the base level synchronous raid6 routines.
      
      To support this move:
      1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h
      2/ The raid6_call, recovery calls, and table symbols are exported
      3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test
         to compile
      (A minimal illustration of the P/Q parity these routines compute
      follows this entry.)
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
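      As a reminder of what the moved routines actually compute, here is a
      textbook-style illustration (not the optimized kernel code) of the P
      and Q parity for a tiny stripe: P is the XOR of the data blocks and
      Q is a syndrome over GF(2^8) with the usual RAID-6 polynomial:

      #include <stdint.h>
      #include <stdio.h>

      #define NDISKS 4   /* data disks                     */
      #define NBYTES 8   /* bytes per block in this sketch */

      /* Multiply by {02} in GF(2^8), reduction polynomial 0x11d. */
      static uint8_t gf_mul2(uint8_t v)
      {
          return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
      }

      int main(void)
      {
          uint8_t data[NDISKS][NBYTES], p[NBYTES], q[NBYTES];

          for (int d = 0; d < NDISKS; d++)        /* made-up stripe data */
              for (int i = 0; i < NBYTES; i++)
                  data[d][i] = (uint8_t)(d * 16 + i);

          for (int i = 0; i < NBYTES; i++) {
              /* Horner's rule, highest disk index first:
                 Q = g^(n-1)*D[n-1] ^ ... ^ g*D[1] ^ D[0], P = XOR of all. */
              uint8_t pv = 0, qv = 0;
              for (int d = NDISKS - 1; d >= 0; d--) {
                  pv ^= data[d][i];
                  qv = gf_mul2(qv);
                  qv ^= data[d][i];
              }
              p[i] = pv;
              q[i] = qv;
          }

          printf("P:");
          for (int i = 0; i < NBYTES; i++) printf(" %02x", p[i]);
          printf("\nQ:");
          for (int i = 0; i < NBYTES; i++) printf(" %02x", q[i]);
          printf("\n");
          return 0;
      }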
    • cleanup drivers/md/Makefile · 2a40a8ae
      Christoph Hellwig authored
      Use the -y variables instead of the old -objs so we can easily add
      conditional objects to the modules.  Also always use += to add
      subobjects, to avoid problems when additional objects are placed
      elsewhere in the file.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
  24. 06 Jan 2009 (2 commits)
    • dm snapshot: split out exception store implementations · 4db6bfe0
      Alasdair G Kergon authored
      Move the existing snapshot exception store implementations out into
      separate files.  Later patches will place these behind a new
      interface in preparation for alternative implementations.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: add name and uuid to sysfs · 784aae73
      Milan Broz authored
      Implement a simple read-only sysfs entry for device-mapper block
      devices.
      
      This patch adds a simple sysfs directory named "dm" under the block
      device properties and implements
      	- name attribute (string containing the mapped device name)
      	- uuid attribute (string containing the UUID, or an empty string if not set)
      
      The kobject is embedded in the mapped_device struct, so no
      additional memory allocation is needed for initializing the sysfs
      entry.
      
      While processing a sysfs attribute we need to lock the mapped
      device; this is done by a new function, dm_get_from_kobj, which
      returns the md associated with the kobject and increases its usage
      count.
      
      Each 'show attribute' function is responsible for its own locking.
      (A minimal sketch of this embedded-kobject pattern follows this
      entry.)
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
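      A minimal, plain-C sketch of the embedded-kobject pattern described
      above; the kernel's real kobject, attribute and sysfs machinery are
      simplified away here, and the function bodies are invented, but the
      shape is the same: the kobject lives inside mapped_device, and
      container_of() gets us back from the kobject handed to a show
      function to the device itself:

      #include <stddef.h>
      #include <stdio.h>

      struct kobject { const char *name; };   /* stand-in for the kernel type */

      struct mapped_device {
          int holders;             /* usage count                      */
          const char *name;        /* exported as the 'name' attribute */
          const char *uuid;        /* exported as the 'uuid' attribute */
          struct kobject kobj;     /* embedded: no separate allocation */
      };

      /* Recover the structure that embeds 'ptr' as its 'member'. */
      #define container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

      /* Analogue of dm_get_from_kobj(): kobject -> md, taking a reference. */
      static struct mapped_device *dm_get_from_kobj(struct kobject *kobj)
      {
          struct mapped_device *md = container_of(kobj, struct mapped_device, kobj);
          md->holders++;           /* the real code also guards against teardown */
          return md;
      }

      /* Analogue of the read-only 'name' show function. */
      static void dm_attr_name_show(struct kobject *kobj, char *buf, size_t len)
      {
          struct mapped_device *md = dm_get_from_kobj(kobj);
          snprintf(buf, len, "%s\n", md->name);
          md->holders--;           /* drop the reference when done */
      }

      int main(void)
      {
          struct mapped_device md = { 0, "vg0-home", "LVM-abc123", { "dm" } };
          char buf[64];

          dm_attr_name_show(&md.kobj, buf, sizeof(buf));
          printf("name attribute: %s", buf);
          return 0;
      }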
  25. 22 Oct 2008 (1 commit)
  26. 05 Jun 2008 (2 commits)
  27. 25 Apr 2008 (2 commits)
  28. 20 Oct 2007 (2 commits)