1. 28 Mar, 2014 1 commit
  2. 15 Jan, 2014 1 commit
    • M
      dm sysfs: fix a module unload race · 2995fa78
      Mikulas Patocka committed
      This reverts commit be35f486 ("dm: wait until embedded kobject is
      released before destroying a device") and provides an improved fix.
      
      The kobject release code that calls the completion must be placed in a
      non-module file, otherwise there is a module unload race (if the process
      calling dm_kobject_release is preempted and the DM module unloaded after
      the completion is triggered, but before dm_kobject_release returns).
      
      To fix this race, this patch moves the completion code to dm-builtin.c
      which is always compiled directly into the kernel if BLK_DEV_DM is
      selected.
      
      The patch introduces a new dm_kobject_holder structure.  Its purpose is
      to keep the completion and kobject in one place, so that they can be
      accessed from non-module code without the need to export the layout of
      struct mapped_device to that code.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
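The holder idea from the commit above can be modelled in userspace. This is a minimal sketch under stated assumptions, not the kernel code: `fake_kobject`, `fake_completion`, `fake_mapped_device`, `holder_of`, `holder_release` and `fake_kobject_put` are hypothetical stand-ins. It only shows that a release callback can reach the completion knowing nothing beyond the holder's own layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct kobject and struct completion. */
struct fake_kobject {
	int refcount;
	void (*release)(struct fake_kobject *kobj);
};

struct fake_completion {
	int done;
};

/* Models the idea of dm_kobject_holder: kobject and completion side by side. */
struct kobject_holder {
	struct fake_kobject kobj;
	struct fake_completion completion;
};

/* A larger private structure embedding the holder (stands in for mapped_device). */
struct fake_mapped_device {
	int private_state;
	struct kobject_holder holder;
};

/* Recover the holder from a pointer to its embedded kobject. */
#define holder_of(ptr) \
	((struct kobject_holder *)((char *)(ptr) - offsetof(struct kobject_holder, kobj)))

/* Release callback: it needs only the layout of struct kobject_holder. */
static void holder_release(struct fake_kobject *kobj)
{
	holder_of(kobj)->completion.done = 1;
}

/* Drop one reference; the last put invokes the release callback. */
static void fake_kobject_put(struct fake_kobject *kobj)
{
	if (--kobj->refcount == 0)
		kobj->release(kobj);
}
```

In this model the release callback never touches `fake_mapped_device`, which is the property that lets the real callback live in always-built-in code.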
  3. 08 Jan, 2014 2 commits
    • M
      dm: wait until embedded kobject is released before destroying a device · be35f486
      Mikulas Patocka committed
      There may be other parts of the kernel holding a reference on the dm
      kobject.  We must wait until all references are dropped before
      deallocating the mapped_device structure.
      
      The dm_kobject_release method signals that all references are dropped
      via completion.  But dm_kobject_release doesn't free the kobject (which
      is embedded in the mapped_device structure).
      
      This is the sequence of operations:
      * when destroying a DM device, call kobject_put from dm_sysfs_exit
      * wait until all users stop using the kobject; when that happens, the
        release method is called
      * the release method signals the completion and should return without
        delay
      * the dm device removal code that waits on the completion continues
      * the dm device removal code drops the dm_mod reference the device had
      * the dm device removal code frees the mapped_device structure that
        contains the kobject
      
      Using kobject this way should avoid the module unload race that was
      mentioned at the beginning of this thread:
      https://lkml.org/lkml/2014/1/4/83
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • M
      dm: remove pointless kobject comparison in dm_get_from_kobject · 1ddd641d
      Mikulas Patocka committed
      The comparison is always true and the compiler optimizes it out anyway.
      
      Milan offered additional context relative to the original commit
      784aae73 ("dm: add name and uuid to sysfs") which introduced the code:
      "I think it is just relict of some experiments before I committed this
      simple embedded sysfs kobj handling".
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  4. 24 Nov, 2013 2 commits
    • K
      dm: Refactor for new bio cloning/splitting · 1c3b13e6
      Kent Overstreet committed
      We need to convert the dm code to the new bvec_iter primitives which
      respect bi_bvec_done; they also allow us to drastically simplify dm's
      bio splitting code.
      
      Also, it's no longer necessary to save/restore the bvec array -
      driver conversions for immutable bvecs are done, so drivers should never
      be modifying it.
      
      Also kill bio_sector_offset(), dm was the only user and it doesn't make
      much sense anymore.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet committed
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
  5. 10 Nov, 2013 1 commit
    • M
      dm: allow remove to be deferred · 2c140a24
      Mikulas Patocka committed
      This patch allows the removal of an open device to be deferred until
      it is closed.  (Previously such a removal attempt would fail.)
      
      The deferred remove functionality is enabled by setting the flag
      DM_DEFERRED_REMOVE in the ioctl structure on DM_DEV_REMOVE or
      DM_REMOVE_ALL ioctl.
      
      On return from DM_DEV_REMOVE, the flag DM_DEFERRED_REMOVE indicates if
      the device was removed immediately or flagged to be removed on close -
      if the flag is clear, the device was removed.
      
      On return from DM_DEV_STATUS and other ioctls, the flag
      DM_DEFERRED_REMOVE is set if the device is scheduled to be removed on
      closure.
      
      A device that is scheduled to be deleted can be revived using the
      message "@cancel_deferred_remove". This message clears the
      DMF_DEFERRED_REMOVE flag so that the device won't be deleted on close.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
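The deferred-remove behaviour described in the commit above can be sketched as a small state machine. This is a userspace model under stated assumptions, not the ioctl implementation: `model_dev`, `model_remove`, `model_close` and `model_cancel_deferred_remove` are hypothetical names, and the `deferred_remove` field merely plays the role of the DMF_DEFERRED_REMOVE flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device; not the kernel's struct mapped_device. */
struct model_dev {
	int open_count;
	bool deferred_remove;	/* plays the role of DMF_DEFERRED_REMOVE */
	bool removed;
};

enum remove_result { REMOVED_NOW, REMOVE_DEFERRED, REMOVE_BUSY };

/* Remove request; allow_deferred models setting DM_DEFERRED_REMOVE on the ioctl. */
static enum remove_result model_remove(struct model_dev *d, bool allow_deferred)
{
	if (d->open_count == 0) {
		d->removed = true;
		d->deferred_remove = false;
		return REMOVED_NOW;
	}
	if (!allow_deferred)
		return REMOVE_BUSY;	/* the old behaviour: removal of an open device fails */
	d->deferred_remove = true;
	return REMOVE_DEFERRED;
}

/* The last close performs the removal if one was deferred. */
static void model_close(struct model_dev *d)
{
	if (d->open_count > 0 && --d->open_count == 0 && d->deferred_remove)
		d->removed = true;
}

/* Models the "@cancel_deferred_remove" message. */
static void model_cancel_deferred_remove(struct model_dev *d)
{
	d->deferred_remove = false;
}
```

In this sketch a caller distinguishes "removed immediately" from "flagged for removal on close" by the return value, much as the commit describes the flag doing on return from DM_DEV_REMOVE.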
  6. 23 Sep, 2013 3 commits
  7. 20 Sep, 2013 1 commit
    • M
      dm mpath: disable WRITE SAME if it fails · f84cb8a4
      Mike Snitzer committed
      Work around the SCSI layer's problematic WRITE SAME heuristics by
      disabling WRITE SAME in the DM multipath device's queue_limits if an
      underlying device disabled it.
      
      The WRITE SAME heuristics, with both the original commit 5db44863
      ("[SCSI] sd: Implement support for WRITE SAME") and the updated commit
      66c28f97 ("[SCSI] sd: Update WRITE SAME heuristics"), default to enabling
      WRITE SAME(10) even without successfully determining it is supported.
      After the first failed WRITE SAME the SCSI layer will disable WRITE SAME
      for the device (by setting sdkp->device->no_write_same, which results in
      'max_write_same_sectors' in the device's queue_limits being set to 0).
      
      When a device is stacked on top of such a SCSI device, any changes to that
      SCSI device's queue_limits do not automatically propagate up the stack.
      As such, a DM multipath device will not have its WRITE SAME support
      disabled.  This causes the block layer to continue to issue WRITE SAME
      requests to the mpath device which causes paths to fail and (if mpath IO
      isn't configured to queue when no paths are available) it will result in
      actual IO errors to the upper layers.
      
      This fix doesn't help configurations that have additional devices
      stacked on top of the mpath device (e.g. LVM-created linear DM devices
      on top).  A proper fix that restacks all the queue_limits from the bottom
      of the device stack up will need to be explored if SCSI will continue to
      use this model of optimistically allowing op codes and then disabling
      them after they fail for the first time.
      
      Before this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: failing WRITE SAME IO with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      device-mapper: multipath: Failing path 8:112.
      end_request: I/O error, dev dm-6, sector 4616
      dm-6: WRITE SAME failed. Manually zeroing.
      end_request: I/O error, dev dm-6, sector 4616
      end_request: I/O error, dev dm-6, sector 5640
      end_request: I/O error, dev dm-6, sector 6664
      end_request: I/O error, dev dm-6, sector 7688
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      end_request: I/O error, dev dm-6, sector 524296
      Aborting journal on device dm-6-8.
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      33553920
      
      After this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: WRITE SAME I/O failed with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      0
      
      It should be noted that WRITE SAME support wasn't enabled in DM
      multipath until v3.10.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org # 3.10+
  8. 06 Sep, 2013 2 commits
    • M
      dm: add statistics support · fd2ed4d2
      Mikulas Patocka committed
      Support the collection of I/O statistics on user-defined regions of
      a DM device.  If no regions are defined no statistics are collected so
      there isn't any performance impact.  Only bio-based DM devices are
      currently supported.
      
      Each user-defined region specifies a starting sector, length and step.
      Individual statistics will be collected for each step-sized area within
      the range specified.
      
      The I/O statistics counters for each step-sized area of a region are
      in the same format as /sys/block/*/stat or /proc/diskstats but extra
      counters (12 and 13) are provided: total time spent reading and
      writing in milliseconds.  All these counters may be accessed by sending
      the @stats_print message to the appropriate DM device via dmsetup.
      
      The creation of DM statistics will allocate memory via kmalloc or
      fall back to using vmalloc space.  At most, 1/4 of the overall system
      memory may be allocated by DM statistics.  The admin can see how much
      memory is used by reading
      /sys/module/dm_mod/parameters/stats_current_allocated_bytes
      
      See Documentation/device-mapper/statistics.txt for more details.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
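The region/step arithmetic from the statistics commit above can be illustrated as follows. This is a sketch assuming a region is described by a starting sector, a length and a non-zero step; `stats_region`, `region_num_areas` and `region_area_index` are hypothetical names, and the rounding-up of a shorter final area is an assumption of this model rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a user-defined region, in sectors. */
struct stats_region {
	uint64_t start;
	uint64_t len;
	uint64_t step;	/* assumed to be non-zero */
};

/* Number of step-sized areas; in this model the last area may be shorter. */
static uint64_t region_num_areas(const struct stats_region *r)
{
	return (r->len + r->step - 1) / r->step;
}

/* Index of the area containing a sector, or -1 if the sector is outside the region. */
static int64_t region_area_index(const struct stats_region *r, uint64_t sector)
{
	if (sector < r->start || sector - r->start >= r->len)
		return -1;
	return (int64_t)((sector - r->start) / r->step);
}
```

For example, a region starting at sector 1000 with length 2500 and step 1000 would have three areas under this model, the last covering only 500 sectors.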
    • M
      dm ioctl: increase granularity of type_lock when loading table · 00c4fc3b
      Mike Snitzer committed
      Hold the mapped device's type_lock before calling populate_table() since
      it is where the table's type is determined based on the specified
      targets.  There is no need to allow concurrent table loads to race to
      establish the table's targets or type.
      
      This eliminates the need to grab the lock in dm_table_set_type().
      
      Also verify that the type_lock is held in both dm_set_md_type() and
      dm_get_md_type().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  9. 23 Aug, 2013 1 commit
  10. 11 Jul, 2013 3 commits
    • M
      dm: optimize reorder structure · 2a7faeb1
      Mikulas Patocka committed
      This reorder actually improves performance by 20% (from 39.1s to 32.8s)
      on x86-64 quad core Opteron.
      
      I have no explanation for this; possibly it makes some other entries
      better cache-aligned.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • M
      dm: optimize use SRCU and RCU · 83d5e5b0
      Mikulas Patocka committed
      This patch removes "io_lock" and "map_lock" in struct mapped_device and
      "holders" in struct dm_table and replaces these mechanisms with
      sleepable-rcu.
      
      Previously, the code would call "dm_get_live_table" and "dm_table_put" to
      get and release the table. Now, the code calls "dm_get_live_table"
      and "dm_put_live_table". dm_get_live_table locks sleepable-rcu and
      dm_put_live_table unlocks it.
      
      dm_get_live_table_fast/dm_put_live_table_fast can be used instead of
      dm_get_live_table/dm_put_live_table. These *_fast functions use
      non-sleepable RCU, so the caller must not block between them.
      
      If the code changes the active or inactive dm table, it must call
      dm_sync_table before destroying the old table.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • H
      dm mpath: fix ioctl deadlock when no paths · 6c182cd8
      Hannes Reinecke committed
      When multipath needs to retry an ioctl the reference to the
      current live table needs to be dropped. Otherwise a deadlock
      occurs when all paths are down:
      - dm_blk_ioctl takes a reference to the current table
        and spins in multipath_ioctl().
      - A new table is being loaded, but upon resume the process
        hangs in dm_table_destroy() waiting for references to
        drop to zero.
      
      With this patch the reference to the old table is dropped
      prior to retry, thereby avoiding the deadlock.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  11. 07 May, 2013 1 commit
  12. 19 Apr, 2013 1 commit
  13. 02 Mar, 2013 9 commits
    • A
      dm: add target num_write_bios fn · b0d8ed4d
      Alasdair G Kergon committed
      Add a num_write_bios function to struct target.
      
      If an instance of a target sets this, it will be queried before the
      target's mapping function is called on a write bio, and the response
      controls the number of copies of the write bio that the target will
      receive.
      
      This provides a convenient way for a target to send the same data to
      more than one device.  The new cache target uses this in writethrough
      mode, to send the data both to the cache and the backing device.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
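The optional-callback pattern described in the num_write_bios commit above might look like this in miniature. It is a sketch only: `model_target`, `copies_for_write` and `cache_like_num_write_bios` are hypothetical names, and the real hook's signature may differ. It shows a core that asks the target how many copies of a write it wants and falls back to one when the hook is unset.

```c
#include <assert.h>
#include <stddef.h>

struct model_target;

/* Hypothetical hook type: how many copies of a write the target wants. */
typedef unsigned (*num_write_bios_fn)(struct model_target *ti);

struct model_target {
	num_write_bios_fn num_write_bios;	/* optional; NULL means one copy */
	int writethrough;			/* example per-target state */
};

/* Core-side query, made before the target's map function would be called. */
static unsigned copies_for_write(struct model_target *ti)
{
	if (ti->num_write_bios)
		return ti->num_write_bios(ti);
	return 1;
}

/* A cache-like target: two copies in writethrough mode (cache and backing device). */
static unsigned cache_like_num_write_bios(struct model_target *ti)
{
	return ti->writethrough ? 2 : 1;
}
```

Keeping the hook optional means targets that never duplicate writes need no changes in this model.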
    • J
      dm: merge io_pool and tio_pool · 5f015204
      Jun'ichi Nomura committed
      This patch merges io_pool and tio_pool into io_pool and cleans up
      related functions.
      
      Though device-mapper used to have 2 pools of objects for each dm device,
      the use of the bioset front_pad for per-bio data has shrunk the number of
      pools to 1 for both bio-based and request-based device types.
      (See c0820cf5 "dm: introduce per_bio_data" and
       94818742 "dm: Use bioset's front_pad for dm_rq_clone_bio_info")
      
      So dm no longer has to maintain 2 different pointers.
      
      No functional changes.
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • J
      dm: remove unused _rq_bio_info_cache · 23e5083b
      Jun'ichi Nomura committed
      Remove _rq_bio_info_cache, which is no longer used.
      No functional changes.
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • M
      dm: fix limits initialization when there are no data devices · 87eb5b21
      Mike Christie committed
      dm_calculate_queue_limits will first reset the provided limits to
      defaults using blk_set_stacking_limits, thereby defeating the purpose of
      retaining the original live table's limits -- as was intended via commit
      3ae70656 ("dm: retain table limits when
      swapping to new table with no devices").
      
      Fix this improper limits initialization (in the no data devices case) by
      avoiding the call to dm_calculate_queue_limits.
      
      [patch header revised by Mike Snitzer]
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v3.6+
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • A
      dm: refactor bio cloning · e4c93811
      Alasdair G Kergon committed
      Refactor part of the bio splitting and cloning code to try to make it
      easier to understand.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • A
      dm: rename bio cloning functions · 14fe594d
      Alasdair G Kergon committed
      Rename functions involved in splitting and cloning bios.
      
      The sequence of functions is now:
        (1) __split_and_process* - entry point that selects the processing strategy
        (2) __send* - prepare the details for each bio needed and loop through them
        (3) __clone_and_map* - creates a clone and maps it
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • A
      dm: rename request variables to bios · 55a62eef
      Alasdair G Kergon committed
      Use 'bio' in the name of variables and functions that deal with
      bios rather than 'request' to avoid confusion with the normal
      block layer use of 'request'.
      
      No functional changes.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • A
      dm: clean up clone_bio · bd2a49b8
      Alasdair G Kergon committed
      Remove the no-longer-used struct bio_set argument from clone_bio and split_bvec.
      Use tio->ti in __map_bio() instead of passing in ti.
      Factor out some code for setting up cloned bios.
      Take target_request_nr as a parameter to alloc_tio().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • J
      dm: do not replace bioset for request based dm · 16245bdc
      Jun'ichi Nomura committed
      This patch fixes a regression introduced in v3.8, which causes oops
      like this when dm-multipath is used:
      
      general protection fault: 0000 [#1] SMP
      RIP: 0010:[<ffffffff810fe754>]  [<ffffffff810fe754>] mempool_free+0x24/0xb0
      Call Trace:
        <IRQ>
        [<ffffffff81187417>] bio_put+0x97/0xc0
        [<ffffffffa02247a5>] end_clone_bio+0x35/0x90 [dm_mod]
        [<ffffffff81185efd>] bio_endio+0x1d/0x30
        [<ffffffff811f03a3>] req_bio_endio.isra.51+0xa3/0xe0
        [<ffffffff811f2f68>] blk_update_request+0x118/0x520
        [<ffffffff811f3397>] blk_update_bidi_request+0x27/0xa0
        [<ffffffff811f343c>] blk_end_bidi_request+0x2c/0x80
        [<ffffffff811f34d0>] blk_end_request+0x10/0x20
        [<ffffffffa000b32b>] scsi_io_completion+0xfb/0x6c0 [scsi_mod]
        [<ffffffffa000107d>] scsi_finish_command+0xbd/0x120 [scsi_mod]
        [<ffffffffa000b12f>] scsi_softirq_done+0x13f/0x160 [scsi_mod]
        [<ffffffff811f9fd0>] blk_done_softirq+0x80/0xa0
        [<ffffffff81044551>] __do_softirq+0xf1/0x250
        [<ffffffff8142ee8c>] call_softirq+0x1c/0x30
        [<ffffffff8100420d>] do_softirq+0x8d/0xc0
        [<ffffffff81044885>] irq_exit+0xd5/0xe0
        [<ffffffff8142f3e3>] do_IRQ+0x63/0xe0
        [<ffffffff814257af>] common_interrupt+0x6f/0x6f
        <EOI>
        [<ffffffffa021737c>] srp_queuecommand+0x8c/0xcb0 [ib_srp]
        [<ffffffffa0002f18>] scsi_dispatch_cmd+0x148/0x310 [scsi_mod]
        [<ffffffffa000a38e>] scsi_request_fn+0x31e/0x520 [scsi_mod]
        [<ffffffff811f1e57>] __blk_run_queue+0x37/0x50
        [<ffffffff811f1f69>] blk_delay_work+0x29/0x40
        [<ffffffff81059003>] process_one_work+0x1c3/0x5c0
        [<ffffffff8105b22e>] worker_thread+0x15e/0x440
        [<ffffffff8106164b>] kthread+0xdb/0xe0
        [<ffffffff8142db9c>] ret_from_fork+0x7c/0xb0
      
      The regression was introduced by the change
      c0820cf5 "dm: introduce per_bio_data", where dm started to replace
      bioset during table replacement.
      For bio-based dm, it is good because clone bios do not exist during the
      table replacement.
      For request-based dm, however, (not-yet-mapped) clone bios may stay in
      request queue and survive during the table replacement.
      So freeing the old bioset could cause the oops in bio_put().
      
      Since the size of front_pad may change only with bio-based dm,
      it is not necessary to replace bioset for request-based dm.
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Acked-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  14. 28 Feb, 2013 2 commits
  15. 31 Jan, 2013 1 commit
  16. 14 Jan, 2013 1 commit
    • T
      block: add missing block_bio_complete() tracepoint · 3a366e61
      Tejun Heo committed
      bio completion didn't kick block_bio_complete TP.  Only dm was
      explicitly triggering the TP on IO completion.  This makes
      block_bio_complete TP useless for tracers which want to know about
      bios, and all other bio based drivers skip generating blktrace
      completion events.
      
      This patch makes all bio completions via bio_endio() generate
      block_bio_complete TP.
      
      * Explicit trace_block_bio_complete() invocation removed from dm and
        the trace point is unexported.
      
      * @rq dropped from trace_block_bio_complete().  bios may fly around
        w/o queue associated.  Verifying and accessing the associated queue
        belongs to TP probes.
      
      * blktrace now gets both request and bio completions.  Make it ignore
        bio completions if request completion path is happening.
      
      This makes all bio based drivers generate blktrace completion events
      properly and makes the block_bio_complete TP actually useful.
      
      v2: With this change, block_bio_complete TP could be invoked on sg
          commands which have bio's with %NULL bi_bdev.  Update TP
          assignment code to check whether bio->bi_bdev is %NULL before
          dereferencing.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Original-patch-by: Namhyung Kim <namhyung@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  17. 22 Dec, 2012 4 commits
  18. 23 Nov, 2012 1 commit
  19. 13 Oct, 2012 1 commit
    • M
      dm: store dm_target_io in bio front_pad · dba14160
      Mikulas Patocka committed
      Use the recently-added bio front_pad field to allocate struct dm_target_io.
      
      Prior to this patch, dm_target_io was allocated from a mempool. For each
      dm_target_io, there is exactly one bio allocated from a bioset.
      
      This patch merges these two allocations into one allocation: we create a
      bioset with front_pad equal to the size of dm_target_io so that every
      bio allocated from the bioset has sizeof(struct dm_target_io) bytes
      before it. We allocate a bio and use the bytes before the bio as
      dm_target_io.
      
      _tio_cache is removed and the tio_pool mempool is now only used for
      request-based devices.
      
      This idea was introduced by Kent Overstreet.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: tj@kernel.org
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Bill Pemberton <wfp5p@viridian.itc.virginia.edu>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
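The single-allocation trick described in the front_pad commit above can be demonstrated in userspace. This is a sketch under stated assumptions: `model_bio`, `model_tio`, `TIO_FRONT_PAD`, `model_bio_alloc`, `tio_from_bio` and `model_bio_free` are hypothetical names, with `calloc` standing in for a bioset. Embedding the bio as the last member of the per-target structure is one way to obtain the "bytes before the bio" layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio. */
struct model_bio {
	int sector;
};

/* Per-target data; the bio is the last member so the pad sits directly before it. */
struct model_tio {
	int target_nr;
	struct model_bio clone;
};

/* Bytes that must precede each bio so the whole model_tio fits in one allocation. */
#define TIO_FRONT_PAD offsetof(struct model_tio, clone)

/* Allocate front_pad bytes followed by a bio; return a pointer to the bio. */
static struct model_bio *model_bio_alloc(size_t front_pad)
{
	char *mem = calloc(1, front_pad + sizeof(struct model_bio));

	return mem ? (struct model_bio *)(mem + front_pad) : NULL;
}

/* Step back from the bio to the structure that contains it. */
static struct model_tio *tio_from_bio(struct model_bio *bio)
{
	return (struct model_tio *)((char *)bio - offsetof(struct model_tio, clone));
}

/* Free the allocation by undoing the front_pad offset. */
static void model_bio_free(struct model_bio *bio, size_t front_pad)
{
	free((char *)bio - front_pad);
}
```

With this layout one allocation serves both objects, so the separate per-target pool has nothing left to allocate in the bio-based case.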
  20. 27 Sep, 2012 2 commits
    • M
      dm: retain table limits when swapping to new table with no devices · 3ae70656
      Mike Snitzer committed
      Add a safety net that will re-use the DM device's existing limits in the
      event that DM device has a temporary table that doesn't have any
      component devices.  This is to reduce the chance that requests not
      respecting the hardware limits will reach the device.
      
      DM recalculates queue limits based only on devices which currently exist
      in the table.  This creates a problem in the event all devices are
      temporarily removed such as all paths being lost in multipath.  DM will
      reset the limits to the maximum permissible, which can then assemble
      requests which exceed the limits of the paths when the paths are
      restored.  The request will fail the blk_rq_check_limits() test when
      sent to a path with lower limits, and will be retried without end by
      multipath.  This became a much bigger issue after v3.6 commit fe86cdce
      ("block: do not artificially constrain max_sectors for stacking
      drivers").
      Reported-by: David Jeffery <djeffery@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
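The safety net from the commit above can be sketched with a single limit. This is a simplified model, not the block layer's stacking code: `model_limits` and `stacked_limits` are hypothetical names, and only a `max_sectors`-like value is tracked. Limits are the minimum over the table's devices, except that a table with no devices reuses the live limits instead of resetting to the maximum.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, single-field stand-in for struct queue_limits. */
struct model_limits {
	unsigned max_sectors;
};

/*
 * Compute limits for a new table from its devices' max_sectors values.
 * If the table has no devices and live limits exist, keep the live limits.
 */
static struct model_limits stacked_limits(const unsigned *dev_max, size_t ndevs,
					  const struct model_limits *live)
{
	struct model_limits l;
	size_t i;

	if (ndevs == 0 && live)
		return *live;		/* safety net: retain existing limits */

	l.max_sectors = UINT_MAX;	/* most permissive default */
	for (i = 0; i < ndevs; i++)
		if (dev_max[i] < l.max_sectors)
			l.max_sectors = dev_max[i];
	return l;
}
```

Without the early return, a temporarily empty table would yield `UINT_MAX` here, which corresponds to the oversized requests the commit message describes.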
    • M
      dm: handle requests beyond end of device instead of using BUG_ON · ba1cbad9
      Mike Snitzer committed
      The access beyond the end of device BUG_ON that was introduced to
      dm_request_fn via commit 29e4013d ("dm: implement
      REQ_FLUSH/FUA support for request-based dm") was an overly
      drastic (but simple) response to this situation.
      
      I have received a report that this BUG_ON was hit and now think
      it would be better to use dm_kill_unmapped_request() to fail the clone
      and original request with -EIO.
      
      map_request() will assign the valid target returned by
      dm_table_find_target to tio->ti.  But when the target
      isn't valid tio->ti is never assigned (because map_request isn't
      called); so add a check for tio->ti != NULL to dm_done().
      Reported-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: stable@vger.kernel.org # v2.6.37+
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
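The change in the commit above, failing the request with -EIO rather than hitting BUG_ON and guarding the completion path against an unassigned target, can be modelled roughly as below. All names (`model_target`, `model_request`, `find_target`, `model_dispatch`, `model_target_hook_runs`) are hypothetical, and the control flow is a simplification of the request-based path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical target covering sectors [begin, begin + len). */
struct model_target {
	uint64_t begin;
	uint64_t len;
};

/* Hypothetical request state; ti stays NULL if no target was ever assigned. */
struct model_request {
	const struct model_target *ti;
	int error;
};

/* Look up the target covering a sector; NULL means beyond the end of the device. */
static const struct model_target *find_target(const struct model_target *t,
					      size_t n, uint64_t sector)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (sector >= t[i].begin && sector - t[i].begin < t[i].len)
			return &t[i];
	return NULL;
}

/* Dispatch: fail with -EIO instead of aborting when no target is valid. */
static void model_dispatch(struct model_request *rq, const struct model_target *t,
			   size_t n, uint64_t sector)
{
	rq->ti = find_target(t, n, sector);
	rq->error = rq->ti ? 0 : -EIO;
}

/* Completion: the target's hook may only run if a target was assigned. */
static int model_target_hook_runs(const struct model_request *rq)
{
	return rq->ti != NULL;
}
```

The NULL check in the completion helper mirrors the reasoning in the message: a request that never reached a valid target has nothing for the target-specific completion code to act on.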