1. 27 Sep, 2014 4 commits
  2. 02 Jul, 2014 1 commit
    • bio-integrity: add "bip_max_vcnt" into struct bio_integrity_payload · cbcd1054
      Committed by Gu Zheng
      Commit 08778795 ("block: Fix nr_vecs for inline integrity vectors") from
      Martin introduced the function bip_integrity_vecs() (which gets the usable
      vectors) to fix the nr_vecs issue for inline integrity vectors that was
      reported by David Milburn.

      But it seems that bip_integrity_vecs() will return the wrong number if the
      bio is not based on any bio_set for some reason (bio->bi_pool == NULL),
      because in that case bip_inline_vecs[0] is malloced directly. So
      here we add bip_max_vcnt to record the number of vector slots, and
      clean up the function bip_integrity_vecs().
      Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
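As a rough illustration of the idea in this commit (not the kernel code itself), the sketch below records the slot count in the payload, so capacity can be checked without knowing how the vector array was allocated. The struct layout, `struct ivec`, and `payload_add()` are hypothetical simplifications; only the field name `bip_max_vcnt` comes from the commit message.

```c
#include <assert.h>

/* Hypothetical, simplified integrity vector: only a length is tracked. */
struct ivec {
    unsigned int len;
};

/* Simplified payload; bip_max_vcnt mirrors the field named in the commit. */
struct payload {
    unsigned short bip_vcnt;      /* slots currently in use */
    unsigned short bip_max_vcnt;  /* total slots available in bip_vec */
    struct ivec *bip_vec;         /* inline or separately allocated array */
};

/* Append one vector; returns len on success, 0 if every slot is used. */
static unsigned int payload_add(struct payload *p, unsigned int len)
{
    assert(p && p->bip_vec);

    if (p->bip_vcnt >= p->bip_max_vcnt)
        return 0;

    p->bip_vec[p->bip_vcnt++].len = len;
    return len;
}
```

Because the capacity travels with the payload, the check works the same way whether the array came from a pool or from a direct allocation.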
  3. 25 Jun, 2014 2 commits
  4. 23 Apr, 2014 1 commit
  5. 09 Apr, 2014 1 commit
  6. 02 Apr, 2014 1 commit
  7. 11 Feb, 2014 1 commit
  8. 10 Feb, 2014 1 commit
  9. 24 Nov, 2013 13 commits
    • block: Kill bio_pair_split() · 4b1faf93
      Committed by Kent Overstreet
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
    • block: Introduce new bio_split() · 20d0189b
      Committed by Kent Overstreet
      The new bio_split() can split arbitrary bios - it's not restricted to
      single page bios, like the old bio_split() (previously renamed to
      bio_pair_split()). It also has different semantics - it doesn't allocate
      a struct bio_pair, leaving it up to the caller to handle completions.
      
      Then convert the existing bio_pair_split() users to the new bio_split()
      - and also nvme, which was open coding bio splitting.
      
      (We have to take that BUG_ON() out of bio_integrity_trim() because this
      bio_split() needs to use it, and there's no reason it has to be used on
      bios marked as cloned; BIO_CLONED doesn't seem to have clearly
      documented semantics anyways.)
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Neil Brown <neilb@suse.de>
    • block: Rename bio_split() -> bio_pair_split() · ee67891b
      Committed by Kent Overstreet
      This is prep work for introducing a more general bio_split().
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Sage Weil <sage@inktank.com>
    • block: Generic bio chaining · 196d38bc
      Committed by Kent Overstreet
      This adds a generic mechanism for chaining bio completions. This is
      going to be used for a bio_split() replacement, and it turns out to be
      very useful in a fair amount of driver code - a fair number of drivers
      were implementing this in their own roundabout ways, often painfully.
      
      Note that this means it's no longer possible to call bio_endio() more than once
      on the same bio! This can cause problems for drivers that save/restore
      bi_end_io. Arguably they shouldn't be saving/restoring bi_end_io at all
      - in all but the simplest cases they'd be better off just cloning the
      bio, and immutable biovecs is making bio cloning cheaper. But for now,
      we add a bio_endio_nodec() for these cases.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
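The chaining idea described in this commit can be sketched in user space as a per-bio count of outstanding completions. This is a minimal single-threaded model, not the kernel implementation: the struct layout and the names `bio_setup()`, `bio_complete()`, `bio_chain_sketch()`, and `count_end_io()` are assumptions made for illustration, and a real implementation would need an atomic counter.

```c
#include <assert.h>
#include <stddef.h>

struct bio;
typedef void (*end_io_fn)(struct bio *bio, int error);

/* Hypothetical, simplified bio: just enough state to model completion. */
struct bio {
    int remaining;       /* outstanding completions; a fresh bio starts at 1 */
    int error;           /* first error seen, 0 if none */
    end_io_fn end_io;    /* runs once, when remaining drops to 0 */
    void *private_data;
};

static void bio_setup(struct bio *bio, end_io_fn fn, void *private_data)
{
    bio->remaining = 1;
    bio->error = 0;
    bio->end_io = fn;
    bio->private_data = private_data;
}

/* Drop one completion; the handler runs only when nothing is outstanding. */
static void bio_complete(struct bio *bio, int error)
{
    assert(bio->remaining > 0);  /* completing a finished bio is a bug */

    if (error && !bio->error)
        bio->error = error;
    if (--bio->remaining > 0)
        return;                  /* chained children are still in flight */
    if (bio->end_io)
        bio->end_io(bio, bio->error);
}

/* Handler installed on a chained child: forward the completion to the parent. */
static void chain_end_io(struct bio *child, int error)
{
    bio_complete((struct bio *)child->private_data, error);
}

/* Make the parent's completion wait for the child as well. */
static void bio_chain_sketch(struct bio *child, struct bio *parent)
{
    child->private_data = parent;
    child->end_io = chain_end_io;
    parent->remaining++;
}

/* Example handler: counts how many times it was invoked. */
static void count_end_io(struct bio *bio, int error)
{
    (void)error;
    ++*(int *)bio->private_data;
}
```

In this model each bio must be completed exactly once, which is why completing the same bio twice trips the assertion; that mirrors the restriction the commit message warns about.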
    • dm: Refactor for new bio cloning/splitting · 1c3b13e6
      Committed by Kent Overstreet
      We need to convert the dm code to the new bvec_iter primitives which
      respect bi_bvec_done; they also allow us to drastically simplify dm's
      bio splitting code.
      
      Also, it's no longer necessary to save/restore the bvec array anymore -
      driver conversions for immutable bvecs are done, so drivers should never
      be modifying it.
      
      Also kill bio_sector_offset(), dm was the only user and it doesn't make
      much sense anymore.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
    • block: Add bio_clone_fast() · 59d276fe
      Committed by Kent Overstreet
      bio_clone() just got more expensive - however, most users of bio_clone()
      don't actually need to modify the biovec. If they aren't modifying the
      biovec, and they can guarantee that the original bio isn't freed before
      the clone (also true in most cases), we can just point the clone at the
      original bio's biovec.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
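A minimal user-space sketch of the sharing described above, assuming a simplified bio with its own iterator: the clone copies only the iterator and points at the original's vector array. The struct layouts, the `owns_vec` flag, and the helpers `clone_fast()` and `bio_release()` are hypothetical names for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdlib.h>

struct bio_vec {
    unsigned int bv_len;
    unsigned int bv_offset;
};

struct bvec_iter {
    unsigned int bi_size;       /* bytes remaining */
    unsigned int bi_idx;        /* current index into the vector array */
    unsigned int bi_bvec_done;  /* bytes completed in the current vector */
};

struct bio {
    struct bio_vec *bi_io_vec;  /* may be shared with another bio */
    struct bvec_iter bi_iter;   /* private position, cheap to copy */
    int owns_vec;               /* hypothetical: nonzero if this bio frees the array */
};

/* Share the source's vector array instead of copying it. */
static void clone_fast(struct bio *clone, const struct bio *src)
{
    assert(src->bi_io_vec != NULL);

    clone->bi_io_vec = src->bi_io_vec;
    clone->bi_iter = src->bi_iter;
    clone->owns_vec = 0;        /* the original must outlive the clone */
}

/* Only the owner frees the array; a clone just drops its reference. */
static void bio_release(struct bio *bio)
{
    if (bio->owns_vec)
        free(bio->bi_io_vec);
    bio->bi_io_vec = NULL;
}
```

The clone can move its own iterator freely, but it must not write to the shared array and must be released before the original, which matches the two conditions stated in the commit message.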
    • block: Kill bio_iovec_idx(), __bio_iovec() · f619d254
      Committed by Kent Overstreet
      bio_iovec_idx() and __bio_iovec() don't have any valid uses anymore -
      previous users have been converted to bio_iovec_iter() or other methods.
      
      __BVEC_END() has to go too - the bvec array can't be used directly for
      the last biovec because we might only be using the first portion of it,
      we have to iterate over the bvec array with bio_for_each_segment() which
      checks against the current value of bi_iter.bi_size.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
    • block: Kill bio_segments()/bi_vcnt usage · 458b76ed
      Committed by Kent Overstreet
      When we start sharing biovecs, keeping bi_vcnt accurate for splits is
      going to be error prone - and unnecessary, if we refactor some code.
      
      So bio_segments() has to go - but most of the existing users just needed
      to know if the bio had multiple segments, which is easier - add a
      bio_multiple_segments() for them.
      
      (Two of the current uses of bio_segments() are going to go away in a
      couple patches, but the current implementation of bio_segments() is
      unsafe as soon as we start doing driver conversions for immutable
      biovecs - so implement a dumb version for bisectability, it'll go away
      in a couple patches)
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com>
      Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
    • bio-integrity: Convert to bvec_iter · d57a5f7c
      Committed by Kent Overstreet
      The bio integrity is also stored in a bvec array, so if we use the bvec
      iter code we just added, the integrity code won't need to implement its
      own iteration stuff (bio_integrity_mark_head(), bio_integrity_mark_tail()).
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
    • block: Immutable bio vecs · 4550dd6c
      Committed by Kent Overstreet
      This adds a mechanism by which we can advance a bio by an arbitrary
      number of bytes without modifying the biovec: bio->bi_iter.bi_bvec_done
      indicates the number of bytes completed in the current bvec.
      
      Various driver code still needs to be updated to not refer to the bvec
      directly before we can use this for interesting things, like efficient
      bio splitting.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Cc: drbd-user@lists.linbit.com
      Cc: nbd-general@lists.sourceforge.net
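The mechanism this commit describes can be modelled in a few lines of user-space C: the iterator records how far into the current vector we are, so advancing by an arbitrary byte count never writes to the vector array. The field names follow the commit text; the simplified structs and the helpers `iter_advance()` and `iter_bvec()` are assumptions for illustration rather than the kernel's own functions.

```c
#include <assert.h>

struct bio_vec {
    unsigned int bv_len;     /* bytes in this segment */
    unsigned int bv_offset;  /* offset of the data within its page */
};

struct bvec_iter {
    unsigned int bi_size;       /* bytes remaining in the bio */
    unsigned int bi_idx;        /* index of the current bvec */
    unsigned int bi_bvec_done;  /* bytes already completed in the current bvec */
};

/* Advance the iterator by `bytes`; the bvec array is only read. */
static void iter_advance(const struct bio_vec *bv, struct bvec_iter *iter,
                         unsigned int bytes)
{
    assert(bytes <= iter->bi_size);

    while (bytes) {
        unsigned int left = bv[iter->bi_idx].bv_len - iter->bi_bvec_done;
        unsigned int step = bytes < left ? bytes : left;

        bytes -= step;
        iter->bi_size -= step;
        iter->bi_bvec_done += step;

        /* Finished this bvec: move to the start of the next one. */
        if (iter->bi_bvec_done == bv[iter->bi_idx].bv_len) {
            iter->bi_bvec_done = 0;
            iter->bi_idx++;
        }
    }
}

/* Build the current segment as seen through the iterator. */
static struct bio_vec iter_bvec(const struct bio_vec *bv,
                                const struct bvec_iter *iter)
{
    struct bio_vec cur;
    unsigned int left = bv[iter->bi_idx].bv_len - iter->bi_bvec_done;

    cur.bv_offset = bv[iter->bi_idx].bv_offset + iter->bi_bvec_done;
    cur.bv_len = left < iter->bi_size ? left : iter->bi_size;
    return cur;
}
```

Because all position state lives in the iterator, two iterators can walk the same array independently, which is what makes the later cheap cloning and splitting possible.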
    • block: Convert bio_for_each_segment() to bvec_iter · 7988613b
      Committed by Kent Overstreet
      More prep work for immutable biovecs - with immutable bvecs drivers
      won't be able to use the biovec directly, they'll need to use helpers
      that take into account bio->bi_iter.bi_bvec_done.
      
      This updates callers for the new usage without changing the
      implementation yet.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Cc: Jim Paris <jim@jtan.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com>
      Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
      Cc: support@lsi.com
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: drbd-user@lists.linbit.com
      Cc: nbd-general@lists.sourceforge.net
      Cc: cbe-oss-dev@lists.ozlabs.org
      Cc: xen-devel@lists.xensource.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: DL-MPTFusionLinux@lsi.com
      Cc: linux-scsi@vger.kernel.org
      Cc: devel@driverdev.osuosl.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: cluster-devel@redhat.com
      Cc: linux-mm@kvack.org
      Acked-by: Geoff Levand <geoff@infradead.org>
    • block: Convert bio_iovec() to bvec_iter · a4ad39b1
      Committed by Kent Overstreet
      For immutable biovecs, we'll be introducing a new bio_iovec() that uses
      our new bvec iterator to construct a biovec, taking into account
      bvec_iter->bi_bvec_done - this patch updates existing users for the new
      usage.
      
      Some of the existing users really do need a pointer into the bvec array
      - those uses are all going to be removed, but we'll need the
      functionality from immutable to remove them - so for now rename the
      existing bio_iovec() -> __bio_iovec(), and it'll be removed in a couple
      patches.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
    • block: Abstract out bvec iterator · 4f024f37
      Committed by Kent Overstreet
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
  10. 09 Nov, 2013 1 commit
  11. 25 Oct, 2013 1 commit
    • blk-mq: new multi-queue block IO queueing mechanism · 320ae51f
      Committed by Jens Axboe
      Linux currently has two models for block devices:
      
      - The classic request_fn based approach, where drivers use struct
        request units for IO. The block layer provides various helper
        functionalities to let drivers share code, things like tag
        management, timeout handling, queueing, etc.
      
      - The "stacked" approach, where a driver squeezes in between the
        block layer and IO submitter. Since this bypasses the IO stack,
        drivers generally have to manage everything themselves.
      
      With drivers being written for new high IOPS devices, the classic
      request_fn based driver doesn't work well enough. The design dates
      back to when both SMP and high IOPS were rare. It has problems with
      scaling to bigger machines, and runs into scaling issues even on
      smaller machines when you have IOPS in the hundreds of thousands
      per device.
      
      The stacked approach is then most often selected as the model
      for the driver. But this means that everybody has to re-invent
      everything, and along with that we get all the problems again
      that the shared approach solved.
      
      This commit introduces blk-mq, block multi queue support. The
      design is centered around per-cpu queues for queueing IO, which
      then funnel down into x number of hardware submission queues.
      We might have a 1:1 mapping between the two, or it might be
      an N:M mapping. That all depends on what the hardware supports.
      
      blk-mq provides various helper functions, which include:
      
      - Scalable support for request tagging. Most devices need to
        be able to uniquely identify a request both in the driver and
        to the hardware. The tagging uses per-cpu caches for freed
        tags, to enable cache hot reuse.
      
      - Timeout handling without tracking request on a per-device
        basis. Basically the driver should be able to get a notification,
        if a request happens to fail.
      
      - Optional support for non 1:1 mappings between issue and
        submission queues. blk-mq can redirect IO completions to the
        desired location.
      
      - Support for per-request payloads. Drivers almost always need
        to associate a request structure with some driver private
        command structure. Drivers can tell blk-mq this at init time,
        and then any request handed to the driver will have the
        required size of memory associated with it.
      
      - Support for merging of IO, and plugging. The stacked model
        gets neither of these. Even for high IOPS devices, merging
        sequential IO reduces per-command overhead and thus
        increases bandwidth.
      
      For now, this is provided as a potential 3rd queueing model, with
      the hope being that, as it matures, it can replace both the classic
      and stacked model. That would get us back to having just 1 real
      model for block devices, leaving the stacked approach to dm/md
      devices (as it was originally intended).
      
      Contributions in this patch from the following people:
      
      Shaohua Li <shli@fusionio.com>
      Alexander Gordeev <agordeev@redhat.com>
      Christoph Hellwig <hch@infradead.org>
      Mike Christie <michaelc@cs.wisc.edu>
      Matias Bjorling <m@bjorling.me>
      Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
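To make the 1:1 versus N:M mapping between per-CPU queues and hardware submission queues concrete, here is a small sketch of one possible mapping policy. The function `hw_queue_for_cpu()` and the contiguous-block policy are assumptions chosen for illustration; the commit message does not specify how blk-mq assigns CPUs to hardware queues.

```c
#include <assert.h>

/*
 * Hypothetical policy: pick the hardware queue for a given CPU.
 * With at least as many hardware queues as CPUs the mapping is 1:1;
 * otherwise contiguous blocks of CPUs share one hardware queue.
 */
static unsigned int hw_queue_for_cpu(unsigned int cpu, unsigned int nr_cpus,
                                     unsigned int nr_hw_queues)
{
    assert(nr_cpus > 0 && nr_hw_queues > 0 && cpu < nr_cpus);

    if (nr_hw_queues >= nr_cpus)
        return cpu;                        /* 1:1 mapping */

    return cpu * nr_hw_queues / nr_cpus;   /* N:M mapping */
}
```

With 8 CPUs and 2 hardware queues, this policy sends CPUs 0 to 3 to queue 0 and CPUs 4 to 7 to queue 1, so neighbouring CPUs share a submission queue.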
  12. 08 Jul, 2013 1 commit
  13. 24 Mar, 2013 12 commits
    • bio-integrity: Add explicit field for owner of bip_buf · 29ed7813
      Committed by Kent Overstreet
      This was the only real user of BIO_CLONED, which didn't have very clear
      semantics. Convert to its own flag so we can get rid of BIO_CLONED.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
    • block: Add an explicit bio flag for bios that own their bvec · a38352e0
      Committed by Kent Overstreet
      This is for the new bio splitting code. When we split a bio, if the
      split occurred on a bvec boundary we reuse the bvec for the new bio. But
      that means bio_free() can't free it, hence the explicit flag.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: Tejun Heo <tj@kernel.org>
    • block: Add bio_alloc_pages() · a0787606
      Committed by Kent Overstreet
      More utility code to replace stuff that's getting open coded.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
    • block: Add bio_for_each_segment_all() · d74c6d51
      Committed by Kent Overstreet
      __bio_for_each_segment() iterates bvecs from the specified index
      instead of bio->bi_idx.  Currently, the only usage is to walk all the
      bvecs after the bio has been advanced, by specifying index 0.
      
      For immutable bvecs, we need to split these apart;
      bio_for_each_segment() is going to have a different implementation.
      This will also help document the intent of code that's using it -
      bio_for_each_segment_all() is only legal to use for code that owns the
      bio.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Neil Brown <neilb@suse.de>
      CC: Boaz Harrosh <bharrosh@panasas.com>
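The distinction this commit draws can be sketched with two loop macros over a simplified pre-iterator bio: one starts at the current position, the other always walks the whole array. The macro names `for_each_segment` and `for_each_segment_all`, the trimmed structs, and the two helper functions are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>

struct bio_vec {
    unsigned int bv_len;
};

/* Simplified bio from before the iterator was split out. */
struct bio {
    struct bio_vec *bi_io_vec;
    unsigned short bi_vcnt;  /* number of bvecs in the array */
    unsigned short bi_idx;   /* current position; moves as the bio advances */
};

/* Walk the segments that are still pending, starting at bi_idx. */
#define for_each_segment(bvl, bio, i)                              \
    for ((i) = (bio)->bi_idx, (bvl) = &(bio)->bi_io_vec[(i)];      \
         (i) < (bio)->bi_vcnt; (i)++, (bvl)++)

/* Walk every segment from index 0; only valid for the bio's owner. */
#define for_each_segment_all(bvl, bio, i)                          \
    for ((i) = 0, (bvl) = (bio)->bi_io_vec;                        \
         (i) < (bio)->bi_vcnt; (i)++, (bvl)++)

static unsigned int pending_bytes(struct bio *bio)
{
    struct bio_vec *bvl;
    unsigned int i, total = 0;

    for_each_segment(bvl, bio, i)
        total += bvl->bv_len;
    return total;
}

static unsigned int all_bytes(struct bio *bio)
{
    struct bio_vec *bvl;
    unsigned int i, total = 0;

    for_each_segment_all(bvl, bio, i)
        total += bvl->bv_len;
    return total;
}
```

An owner freeing pages at completion time would want the second form, since by then the position has usually moved past the start of the array.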
    • block: Add bio_copy_data() · 16ac3d63
      Committed by Kent Overstreet
      This gets open coded quite a bit and it's tricky to get right, so make a
      generic version and convert some existing users over to it instead.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
    • block: Add submit_bio_wait(), remove from md · 9e882242
      Committed by Kent Overstreet
      Random cleanup - this code was duplicated and it's not really specific
      to md.

      Also added the ability to return the actual error code.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      Acked-by: Tejun Heo <tj@kernel.org>
    • block: Add bio_end_sector() · f73a1c7d
      Committed by Kent Overstreet
      Just a little convenience macro - main reason to add it now is preparing
      for immutable bio vecs, it'll reduce the size of the patch that puts
      bi_sector/bi_size/bi_idx into a struct bvec_iter.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
      CC: Jiri Kosina <jkosina@suse.cz>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Neil Brown <neilb@suse.de>
      CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
      CC: Heiko Carstens <heiko.carstens@de.ibm.com>
      CC: linux-s390@vger.kernel.org
      CC: Chris Mason <chris.mason@fusionio.com>
      CC: Steven Whitehouse <swhiteho@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
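A convenience macro of this kind might look like the sketch below, which assumes 512-byte sectors and a bio that stores its start sector and byte count directly. The trimmed `struct bio`, the `sector_t` typedef, and the helper `bios_are_contiguous()` are simplifications for illustration, not copied from the kernel headers.

```c
#include <assert.h>

typedef unsigned long long sector_t;

/* Simplified bio: only the fields the macro needs. */
struct bio {
    sector_t bi_sector;    /* first sector of the IO */
    unsigned int bi_size;  /* remaining IO size in bytes */
};

/* Size in 512-byte sectors. */
#define bio_sectors(bio)    ((bio)->bi_size >> 9)

/* First sector past the end of the IO. */
#define bio_end_sector(bio) ((bio)->bi_sector + bio_sectors(bio))

/* Typical use: does `next` start exactly where `prev` ends? */
static int bios_are_contiguous(const struct bio *prev, const struct bio *next)
{
    assert(prev && next);
    return bio_end_sector(prev) == next->bi_sector;
}
```

Hiding the field accesses behind a macro means callers do not need to change when the fields later move into a separate iterator struct, which is the motivation the commit message gives.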
    • block: Add bio_advance() · 054bdf64
      Committed by Kent Overstreet
      This is prep work for immutable bio vecs; we first want to centralize
      where bvecs are modified.

      Next two patches convert some existing code to use this function.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
    • block: Convert integrity to bvec_alloc_bs() · 9f060e22
      Committed by Kent Overstreet
      This adds a pointer to the bvec array to struct bio_integrity_payload,
      instead of the bvecs always being inline; then the bvecs are allocated
      with bvec_alloc_bs().
      
      Changed bvec_alloc_bs() and bvec_free_bs() to take a pointer to a
      mempool instead of the bioset, so that bio integrity can use a different
      mempool for its bvecs, and thus avoid a potential deadlock.
      
      This is eventually for immutable bio vecs - immutable bvecs aren't
      useful if we still have to copy them, hence the need for the pointer.
      Less code is always nice too, though.
      
      Also, bio_integrity_alloc() was using fs_bio_set if no bio_set was
      specified. This was wrong - using the bio_set doesn't protect us from
      memory allocation failures, because we just used kmalloc for the
      bio_integrity_payload. But it does introduce the possibility of
      deadlock, if for some reason we weren't supposed to be using fs_bio_set.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
    • block: Fix a buffer overrun in bio_integrity_split() · 6fda981c
      Committed by Kent Overstreet
      bio_integrity_split() seemed to be confusing pointers and arrays -
      bip_vec in bio_integrity_payload was an array appended to the end of the
      payload, so the bio_vecs in struct bio_pair should have come after the
      bio_integrity_payload they're for.
      
      Fix it by making bip_vec a pointer to the inline vecs - a later patch is
      going to make more use of this pointer.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
    • block: Avoid deadlocks with bio allocation by stacking drivers · df2cb6da
      Committed by Kent Overstreet
      Previously, if we ever try to allocate more than once from the same bio
      set while running under generic_make_request() (i.e. a stacking block
      driver), we risk deadlock.
      
      This is because of the code in generic_make_request() that converts
      recursion to iteration; any bios we submit won't actually be submitted
      (so they can complete and eventually be freed) until after we return -
      this means if we allocate a second bio, we're blocking the first one
      from ever being freed.
      
      Thus if enough threads call into a stacking block driver at the same
      time with bios that need multiple splits, and the bio_set's reserve gets
      used up, we deadlock.
      
      This can be worked around in the driver code - we could check if we're
      running under generic_make_request(), then mask out __GFP_WAIT when we
      go to allocate a bio, and if the allocation fails punt to workqueue and
      retry the allocation.
      
      But this is tricky and not a generic solution. This patch solves it for
      all users by inverting the previously described technique. We allocate a
      rescuer workqueue for each bio_set, and then in the allocation code if
      there are bios on current->bio_list we would be blocking, we punt them
      to the rescuer workqueue to be submitted.
      
      This guarantees forward progress for bio allocations under
      generic_make_request() provided each bio is submitted before allocating
      the next, and provided the bios are freed after they complete.
      
      Note that this doesn't do anything for allocation from other mempools.
      Instead of allocating per bio data structures from a mempool, code
      should use bio_set's front_pad.
      
      Tested it by forcing the rescue codepath to be taken (by disabling the
      first GFP_NOWAIT attempt), and then ran it with bcache (which does a lot
      of arbitrary bio splitting) and verified that the rescuer was being
      invoked.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Muthukumar Ratty <muthur@gmail.com>
    • block: Reorder struct bio_set · 57fb233f
      Committed by Kent Overstreet
      This is prep work for the next patch, which embeds a struct bio_list in
      struct bio_set.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>