1. 20 11月, 2014 3 次提交
    • M
      dm: enhance internal suspend and resume interface · ffcc3936
      Mike Snitzer 提交于
      Rename dm_internal_{suspend,resume} to dm_internal_{suspend,resume}_fast
      -- dm-stats will continue using these methods to avoid all the extra
      suspend/resume logic that is not needed in order to quickly flush IO.
      
      Introduce dm_internal_suspend_noflush() variant that actually calls the
      mapped_device's target callbacks -- otherwise target-specific hooks are
      avoided (e.g. dm-thin's thin_presuspend and thin_postsuspend).  Common
      code between dm_internal_{suspend_noflush,resume} and
      dm_{suspend,resume} was factored out as __dm_{suspend,resume}.
      
      Update dm_internal_{suspend_noflush,resume} to always take and release
      the mapped_device's suspend_lock.  Also update dm_{suspend,resume} to be
      aware of potential for DM_INTERNAL_SUSPEND_FLAG to be set and respond
      accordingly by interruptibly waiting for the DM_INTERNAL_SUSPEND_FLAG to
      be cleared.  Add lockdep annotation to dm_suspend() and dm_resume().
      
      The existing DM_SUSPEND_FLAG remains unchanged.
      DM_INTERNAL_SUSPEND_FLAG is set by dm_internal_suspend_noflush() and
      cleared by dm_internal_resume().
      
      Both DM_SUSPEND_FLAG and DM_INTERNAL_SUSPEND_FLAG may be set if a device
      was already suspended when dm_internal_suspend_noflush() was called --
      this can be thought of as a "nested suspend".  A "nested suspend" can
      occur with legacy userspace dm-thin code that might suspend all active
      thin volumes before suspending the pool for resize.
      
      But otherwise, in the normal dm-thin-pool suspend case moving forward:
      the thin-pool will have DM_SUSPEND_FLAG set and all active thins from
      that thin-pool will have DM_INTERNAL_SUSPEND_FLAG set.
      
      Also add DM_INTERNAL_SUSPEND_FLAG to status report.  This new
      DM_INTERNAL_SUSPEND_FLAG state is being reported to assist with
      debugging (e.g. 'dmsetup info' will report an internally suspended
      device accordingly).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NJoe Thornber <ejt@redhat.com>
      ffcc3936
    • M
      dm: add presuspend_undo hook to target_type · d67ee213
      Mike Snitzer 提交于
      The DM thin-pool target now must undo the changes performed during
      pool_presuspend() so introduce presuspend_undo hook in target_type.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NJoe Thornber <ejt@redhat.com>
      d67ee213
    • M
      dm: return earlier from dm_blk_ioctl if target doesn't implement .ioctl · 4d341d82
      Mike Snitzer 提交于
      No point checking if the device is suspended if the current target
      doesn't even implement .ioctl
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      4d341d82
  2. 11 11月, 2014 4 次提交
  3. 06 10月, 2014 3 次提交
    • B
      dm: allow active and inactive tables to share dm_devs · 86f1152b
      Benjamin Marzinski 提交于
      Until this change, when loading a new DM table, DM core would re-open
      all of the devices in the DM table.  Now, DM core will avoid redundant
      device opens (and closes when destroying the old table) if the old
      table already has a device open using the same mode.  This is achieved
      by managing reference counts on the table_devices that DM core now
      stores in the mapped_device structure (rather than in the dm_table
      structure).  So a mapped_device's active and inactive dm_tables' dm_dev
      lists now just point to the dm_devs stored in the mapped_device's
      table_devices list.
      
      This improvement in DM core's device reference counting has the
      side-effect of fixing a long-standing limitation of the multipath
      target: a DM multipath table couldn't include any paths that were unusable
      (failed).  For example: if all paths have failed and you add a new,
      working, path to the table; you can't use it since the table load would
      fail due to it still containing failed paths.  Now a re-load of a
      multipath table can include failed devices and when those devices become
      active again they can be used instantly.
      
      The device list code in dm.c isn't a straight copy/paste from the code in
      dm-table.c, but it's very close (aside from some variable renames).  One
      subtle difference is that find_table_device for the tables_devices list
      will only match devices with the same name and mode.  This is because we
      don't want to upgrade a device's mode in the active table when an
      inactive table is loaded.
      
      Access to the mapped_device structure's tables_devices list requires a
      mutex (tables_devices_lock), so that tables cannot be created and
      destroyed concurrently.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      86f1152b
    • J
      dm: use bioset_create_nobvec() · 3d8aab2d
      Junichi Nomura 提交于
      Since DM core uses bio_clone_fast() for both bio-based and request-based
      DM devices there is no need for DM's bioset to have a bvec mempool.
      
      With this patch, on arch with 4KB page for example, memory usage will be
      reduced by 64KB for each bio-based DM device and 1MB for each
      request-based DM device.
      
      For example, when you create 10,000 bio-based DM devices and 1,000
      request-based DM devices, memory usage of biovec under no load is:
        # grep biovec /proc/slabinfo
      
        biovec-256        418068 418068   4096  ...
        biovec-128             0      0   2048  ...
        biovec-64              0      0   1024  ...
        biovec-16              0      0    256  ...
      
      With this patch series applied, the usage becomes:
        # grep biovec /proc/slabinfo
      
        biovec-256           116    116   4096  ...
        biovec-128             0      0   2048  ...
        biovec-64              0      0   1024  ...
        biovec-16              0      0    256  ...
      
      So 4096 * (418068 - 116) = 1.6GB of memory is saved in this example.
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      3d8aab2d
    • J
      dm: remove nr_iovecs parameter from alloc_tio() · 99778273
      Junichi Nomura 提交于
      alloc_tio() uses bio_alloc_bioset() to allocate a clone-bio for a bio.
      alloc_tio() takes the number of bvecs to allocate for the clone-bio.
      However, with v3.14's immutable biovec changes DM now uses
      __bio_clone_fast() and no longer needs to allocate bvecs.
      
      In practice, the 'nr_iovecs' passed to alloc_tio() is always effectively
      0.  __clone_and_map_simple_bio() looked like it was passing non-zero
      nr_iovecs, but its value was always within the range of inline bvecs and
      no allocation actually happened.  If allocation happened, the BUG_ON() in
      __bio_clone_fast() would've triggered.
      
      Remove the nr_iovecs parameter from alloc_tio() to prevent possible
      future bio_alloc_bioset() mis-use of a new bioset interface that will no
      longer allow bvecs to be allocated.
      
      Also fix extra whitespace before the __bio_clone_fast() call in
      __clone_and_map_simple_bio().
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      99778273
  4. 11 7月, 2014 1 次提交
    • M
      dm: allocate a special workqueue for deferred device removal · acfe0ad7
      Mikulas Patocka 提交于
      The commit 2c140a24 ("dm: allow remove to be deferred") introduced a
      deferred removal feature for the device mapper.  When this feature is
      used (by passing a flag DM_DEFERRED_REMOVE to DM_DEV_REMOVE_CMD ioctl)
      and the user tries to remove a device that is currently in use, the
      device will be removed automatically in the future when the last user
      closes it.
      
      Device mapper used the system workqueue to perform deferred removals.
      However, some targets (dm-raid1, dm-mpath, dm-stripe) flush work items
      scheduled for the system workqueue from their destructor.  If the
      destructor itself is called from the system workqueue during deferred
      removal, it introduces a possible deadlock - the workqueue tries to flush
      itself.
      
      Fix this possible deadlock by introducing a new workqueue for deferred
      removals.  We allocate just one workqueue for all dm targets.  The
      ability of dm targets to process IOs isn't dependent on deferred removal
      of unused targets, so a deadlock due to shared workqueue isn't possible.
      
      Also, cleanup local_init() to eliminate potential for returning success
      on failure.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.13+
      acfe0ad7
  5. 04 6月, 2014 4 次提交
  6. 18 4月, 2014 1 次提交
  7. 16 4月, 2014 1 次提交
    • J
      block: remove struct request buffer member · b4f42e28
      Jens Axboe 提交于
      This was used in the olden days, back when onions were proper
      yellow. Basically it mapped to the current buffer to be
      transferred. With highmem being added more than a decade ago,
      most drivers map pages out of a bio, and rq->buffer isn't
      pointing at anything valid.
      
      Convert old style drivers to just use bio_data().
      
      For the discard payload use case, just reference the page
      in the bio.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      b4f42e28
  8. 28 3月, 2014 4 次提交
  9. 15 1月, 2014 1 次提交
    • M
      dm sysfs: fix a module unload race · 2995fa78
      Mikulas Patocka 提交于
      This reverts commit be35f486 ("dm: wait until embedded kobject is
      released before destroying a device") and provides an improved fix.
      
      The kobject release code that calls the completion must be placed in a
      non-module file, otherwise there is a module unload race (if the process
      calling dm_kobject_release is preempted and the DM module unloaded after
      the completion is triggered, but before dm_kobject_release returns).
      
      To fix this race, this patch moves the completion code to dm-builtin.c
      which is always compiled directly into the kernel if BLK_DEV_DM is
      selected.
      
      The patch introduces a new dm_kobject_holder structure, its purpose is
      to keep the completion and kobject in one place, so that it can be
      accessed from non-module code without the need to export the layout of
      struct mapped_device to that code.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      2995fa78
  10. 08 1月, 2014 2 次提交
    • M
      dm: wait until embedded kobject is released before destroying a device · be35f486
      Mikulas Patocka 提交于
      There may be other parts of the kernel holding a reference on the dm
      kobject.  We must wait until all references are dropped before
      deallocating the mapped_device structure.
      
      The dm_kobject_release method signals that all references are dropped
      via completion.  But dm_kobject_release doesn't free the kobject (which
      is embedded in the mapped_device structure).
      
      This is the sequence of operations:
      * when destroying a DM device, call kobject_put from dm_sysfs_exit
      * wait until all users stop using the kobject, when it happens the
        release method is called
      * the release method signals the completion and should return without
        delay
      * the dm device removal code that waits on the completion continues
      * the dm device removal code drops the dm_mod reference the device had
      * the dm device removal code frees the mapped_device structure that
        contains the kobject
      
      Using kobject this way should avoid the module unload race that was
      mentioned at the beginning of this thread:
      https://lkml.org/lkml/2014/1/4/83Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      be35f486
    • M
      dm: remove pointless kobject comparison in dm_get_from_kobject · 1ddd641d
      Mikulas Patocka 提交于
      The comparison is always true and the compiler optimizes it out anyway.
      
      Milan offered additional context relative to the original commit
      784aae73 ("dm: add name and uuid to sysfs") which introduced the code:
      "I think it is just relict of some experiments before I committed this
      simple embedded sysfs kobj handling".
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Acked-by: NMilan Broz <gmazyland@gmail.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      1ddd641d
  11. 24 11月, 2013 2 次提交
    • K
      dm: Refactor for new bio cloning/splitting · 1c3b13e6
      Kent Overstreet 提交于
      We need to convert the dm code to the new bvec_iter primitives which
      respect bi_bvec_done; they also allow us to drastically simplify dm's
      bio splitting code.
      
      Also, it's no longer necessary to save/restore the bvec array anymore -
      driver conversions for immutable bvecs are done, so drivers should never
      be modifying it.
      
      Also kill bio_sector_offset(), dm was the only user and it doesn't make
      much sense anymore.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      1c3b13e6
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet 提交于
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  12. 10 11月, 2013 1 次提交
    • M
      dm: allow remove to be deferred · 2c140a24
      Mikulas Patocka 提交于
      This patch allows the removal of an open device to be deferred until
      it is closed.  (Previously such a removal attempt would fail.)
      
      The deferred remove functionality is enabled by setting the flag
      DM_DEFERRED_REMOVE in the ioctl structure on DM_DEV_REMOVE or
      DM_REMOVE_ALL ioctl.
      
      On return from DM_DEV_REMOVE, the flag DM_DEFERRED_REMOVE indicates if
      the device was removed immediately or flagged to be removed on close -
      if the flag is clear, the device was removed.
      
      On return from DM_DEV_STATUS and other ioctls, the flag
      DM_DEFERRED_REMOVE is set if the device is scheduled to be removed on
      closure.
      
      A device that is scheduled to be deleted can be revived using the
      message "@cancel_deferred_remove". This message clears the
      DMF_DEFERRED_REMOVE flag so that the device won't be deleted on close.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      2c140a24
  13. 23 9月, 2013 3 次提交
  14. 20 9月, 2013 1 次提交
    • M
      dm mpath: disable WRITE SAME if it fails · f84cb8a4
      Mike Snitzer 提交于
      Workaround the SCSI layer's problematic WRITE SAME heuristics by
      disabling WRITE SAME in the DM multipath device's queue_limits if an
      underlying device disabled it.
      
      The WRITE SAME heuristics, with both the original commit 5db44863
      ("[SCSI] sd: Implement support for WRITE SAME") and the updated commit
      66c28f97 ("[SCSI] sd: Update WRITE SAME heuristics"), default to enabling
      WRITE SAME(10) even without successfully determining it is supported.
      After the first failed WRITE SAME the SCSI layer will disable WRITE SAME
      for the device (by setting sdkp->device->no_write_same which results in
      'max_write_same_sectors' in device's queue_limits to be set to 0).
      
      When a device is stacked ontop of such a SCSI device any changes to that
      SCSI device's queue_limits do not automatically propagate up the stack.
      As such, a DM multipath device will not have its WRITE SAME support
      disabled.  This causes the block layer to continue to issue WRITE SAME
      requests to the mpath device which causes paths to fail and (if mpath IO
      isn't configured to queue when no paths are available) it will result in
      actual IO errors to the upper layers.
      
      This fix doesn't help configurations that have additional devices
      stacked ontop of the mpath device (e.g. LVM created linear DM devices
      ontop).  A proper fix that restacks all the queue_limits from the bottom
      of the device stack up will need to be explored if SCSI will continue to
      use this model of optimistically allowing op codes and then disabling
      them after they fail for the first time.
      
      Before this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: failing WRITE SAME IO with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      device-mapper: multipath: Failing path 8:112.
      end_request: I/O error, dev dm-6, sector 4616
      dm-6: WRITE SAME failed. Manually zeroing.
      end_request: I/O error, dev dm-6, sector 4616
      end_request: I/O error, dev dm-6, sector 5640
      end_request: I/O error, dev dm-6, sector 6664
      end_request: I/O error, dev dm-6, sector 7688
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      end_request: I/O error, dev dm-6, sector 524296
      Aborting journal on device dm-6-8.
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      33553920
      
      After this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: WRITE SAME I/O failed with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      0
      
      It should be noted that WRITE SAME support wasn't enabled in DM
      multipath until v3.10.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org # 3.10+
      f84cb8a4
  15. 06 9月, 2013 2 次提交
    • M
      dm: add statistics support · fd2ed4d2
      Mikulas Patocka 提交于
      Support the collection of I/O statistics on user-defined regions of
      a DM device.  If no regions are defined no statistics are collected so
      there isn't any performance impact.  Only bio-based DM devices are
      currently supported.
      
      Each user-defined region specifies a starting sector, length and step.
      Individual statistics will be collected for each step-sized area within
      the range specified.
      
      The I/O statistics counters for each step-sized area of a region are
      in the same format as /sys/block/*/stat or /proc/diskstats but extra
      counters (12 and 13) are provided: total time spent reading and
      writing in milliseconds.  All these counters may be accessed by sending
      the @stats_print message to the appropriate DM device via dmsetup.
      
      The creation of DM statistics will allocate memory via kmalloc or
      fallback to using vmalloc space.  At most, 1/4 of the overall system
      memory may be allocated by DM statistics.  The admin can see how much
      memory is used by reading
      /sys/module/dm_mod/parameters/stats_current_allocated_bytes
      
      See Documentation/device-mapper/statistics.txt for more details.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      fd2ed4d2
    • M
      dm ioctl: increase granularity of type_lock when loading table · 00c4fc3b
      Mike Snitzer 提交于
      Hold the mapped device's type_lock before calling populate_table() since
      it is where the table's type is determined based on the specified
      targets.  There is no need to allow concurrent table loads to race to
      establish the table's targets or type.
      
      This eliminates the need to grab the lock in dm_table_set_type().
      
      Also verify that the type_lock is held in both dm_set_md_type() and
      dm_get_md_type().
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      00c4fc3b
  16. 23 8月, 2013 1 次提交
  17. 11 7月, 2013 3 次提交
    • M
      dm: optimize reorder structure · 2a7faeb1
      Mikulas Patocka 提交于
      This reorder actually improves performance by 20% (from 39.1s to 32.8s)
      on x86-64 quad core Opteron.
      
      I have no explanation for this, possibly it makes some other entries are
      better cache-aligned.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      2a7faeb1
    • M
      dm: optimize use SRCU and RCU · 83d5e5b0
      Mikulas Patocka 提交于
      This patch removes "io_lock" and "map_lock" in struct mapped_device and
      "holders" in struct dm_table and replaces these mechanisms with
      sleepable-rcu.
      
      Previously, the code would call "dm_get_live_table" and "dm_table_put" to
      get and release table. Now, the code is changed to call "dm_get_live_table"
      and "dm_put_live_table". dm_get_live_table locks sleepable-rcu and
      dm_put_live_table unlocks it.
      
      dm_get_live_table_fast/dm_put_live_table_fast can be used instead of
      dm_get_live_table/dm_put_live_table. These *_fast functions use
      non-sleepable RCU, so the caller must not block between them.
      
      If the code changes active or inactive dm table, it must call
      dm_sync_table before destroying the old table.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      83d5e5b0
    • H
      dm mpath: fix ioctl deadlock when no paths · 6c182cd8
      Hannes Reinecke 提交于
      When multipath needs to retry an ioctl the reference to the
      current live table needs to be dropped. Otherwise a deadlock
      occurs when all paths are down:
      - dm_blk_ioctl takes a reference to the current table
        and spins in multipath_ioctl().
      - A new table is being loaded, but upon resume the process
        hangs in dm_table_destroy() waiting for references to
        drop to zero.
      
      With this patch the reference to the old table is dropped
      prior to retry, thereby avoiding the deadlock.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      6c182cd8
  18. 07 5月, 2013 1 次提交
  19. 19 4月, 2013 1 次提交
  20. 02 3月, 2013 1 次提交
    • A
      dm: add target num_write_bios fn · b0d8ed4d
      Alasdair G Kergon 提交于
      Add a num_write_bios function to struct target.
      
      If an instance of a target sets this, it will be queried before the
      target's mapping function is called on a write bio, and the response
      controls the number of copies of the write bio that the target will
      receive.
      
      This provides a convenient way for a target to send the same data to
      more than one device.  The new cache target uses this in writethrough
      mode, to send the data both to the cache and the backing device.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      b0d8ed4d