1. 24 11月, 2013 10 次提交
    • K
      dm: Refactor for new bio cloning/splitting · 1c3b13e6
      Kent Overstreet 提交于
      We need to convert the dm code to the new bvec_iter primitives which
      respect bi_bvec_done; they also allow us to drastically simplify dm's
      bio splitting code.
      
      Also, it's no longer necessary to save/restore the bvec array anymore -
      driver conversions for immutable bvecs are done, so drivers should never
      be modifying it.
      
      Also kill bio_sector_offset(), dm was the only user and it doesn't make
      much sense anymore.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      1c3b13e6
    • K
      block: Add bio_clone_fast() · 59d276fe
      Kent Overstreet 提交于
      bio_clone() just got more expensive - however, most users of bio_clone()
      don't actually need to modify the biovec. If they aren't modifying the
      biovec, and they can guarantee that the original bio isn't freed before
      the clone (also true in most cases), we can just point the clone at the
      original bio's biovec.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      59d276fe
    • K
      block: Refactor bio_clone_bioset() for immutable biovecs · bdb53207
      Kent Overstreet 提交于
      bio_clone() needs to produce a bio that's suitable for the caller to
      munge with the biovec. Part of the immutable biovec patch series is
      fixing stuff up so that submitting partially completed bios is safe and
      works: thus, we now need bio_clone() on a partially completed bio to
      produce a bio for which bi_idx and bi_bvec done are 0 - like they would
      be if the caller had just allocated a new bio.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      bdb53207
    • K
      block: Convert drivers to immutable biovecs · 003b5c57
      Kent Overstreet 提交于
      Now that we've got a mechanism for immutable biovecs -
      bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
      respect it instead of using the bvec array directly.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      003b5c57
    • K
      block: Kill bio_segments()/bi_vcnt usage · 458b76ed
      Kent Overstreet 提交于
      When we start sharing biovecs, keeping bi_vcnt accurate for splits is
      going to be error prone - and unnecessary, if we refactor some code.
      
      So bio_segments() has to go - but most of the existing users just needed
      to know if the bio had multiple segments, which is easier - add a
      bio_multiple_segments() for them.
      
      (Two of the current uses of bio_segments() are going to go away in a
      couple patches, but the current implementation of bio_segments() is
      unsafe as soon as we start doing driver conversions for immutable
      biovecs - so implement a dumb version for bisectability, it'll go away
      in a couple patches)
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com>
      Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      458b76ed
    • K
      block: Convert bio_copy_data() to bvec_iter · 1cb9dda4
      Kent Overstreet 提交于
      Our fancy new bvec iterator makes code like this much easier to write.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      1cb9dda4
    • K
      block: Immutable bio vecs · 4550dd6c
      Kent Overstreet 提交于
      This adds a mechanism by which we can advance a bio by an arbitrary
      number of bytes without modifying the biovec: bio->bi_iter.bi_bvec_done
      indicates the number of bytes completed in the current bvec.
      
      Various driver code still needs to be updated to not refer to the bvec
      directly before we can use this for interesting things, like efficient
      bio splitting.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Cc: drbd-user@lists.linbit.com
      Cc: nbd-general@lists.sourceforge.net
      4550dd6c
    • K
      block: Convert bio_for_each_segment() to bvec_iter · 7988613b
      Kent Overstreet 提交于
      More prep work for immutable biovecs - with immutable bvecs drivers
      won't be able to use the biovec directly, they'll need to use helpers
      that take into account bio->bi_iter.bi_bvec_done.
      
      This updates callers for the new usage without changing the
      implementation yet.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Cc: Jim Paris <jim@jtan.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com>
      Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
      Cc: support@lsi.com
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: drbd-user@lists.linbit.com
      Cc: nbd-general@lists.sourceforge.net
      Cc: cbe-oss-dev@lists.ozlabs.org
      Cc: xen-devel@lists.xensource.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: DL-MPTFusionLinux@lsi.com
      Cc: linux-scsi@vger.kernel.org
      Cc: devel@driverdev.osuosl.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: cluster-devel@redhat.com
      Cc: linux-mm@kvack.org
      Acked-by: NGeoff Levand <geoff@infradead.org>
      7988613b
    • K
      block: Convert bio_iovec() to bvec_iter · a4ad39b1
      Kent Overstreet 提交于
      For immutable biovecs, we'll be introducing a new bio_iovec() that uses
      our new bvec iterator to construct a biovec, taking into account
      bvec_iter->bi_bvec_done - this patch updates existing users for the new
      usage.
      
      Some of the existing users really do need a pointer into the bvec array
      - those uses are all going to be removed, but we'll need the
      functionality from immutable to remove them - so for now rename the
      existing bio_iovec() -> __bio_iovec(), and it'll be removed in a couple
      patches.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      a4ad39b1
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet 提交于
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  2. 19 11月, 2013 1 次提交
  3. 09 11月, 2013 1 次提交
  4. 25 9月, 2013 1 次提交
  5. 22 8月, 2013 1 次提交
    • R
      [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal · 35dc2483
      Roland Dreier 提交于
      There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
      leads to one process writing data into the address space of some other
      random unrelated process if the ioctl is interrupted by a signal.
      What happens is the following:
      
       - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
         underlying SCSI command will transfer data from the SCSI device to
         the buffer provided in the ioctl)
      
       - Before the command finishes, a signal is sent to the process waiting
         in the ioctl.  This will end up waking up the sg_ioctl() code:
      
      		result = wait_event_interruptible(sfp->read_wait,
      			(srp_done(sfp, srp) || sdp->detached));
      
         but neither srp_done() nor sdp->detached is true, so we end up just
         setting srp->orphan and returning to userspace:
      
      		srp->orphan = 1;
      		write_unlock_irq(&sfp->rq_list_lock);
      		return result;	/* -ERESTARTSYS because signal hit process */
      
         At this point the original process is done with the ioctl and
         blithely goes ahead handling the signal, reissuing the ioctl, etc.
      
       - Eventually, the SCSI command issued by the first ioctl finishes and
         ends up in sg_rq_end_io().  At the end of that function, we run through:
      
      	write_lock_irqsave(&sfp->rq_list_lock, iflags);
      	if (unlikely(srp->orphan)) {
      		if (sfp->keep_orphan)
      			srp->sg_io_owned = 0;
      		else
      			done = 0;
      	}
      	srp->done = done;
      	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
      
      	if (likely(done)) {
      		/* Now wake up any sg_read() that is waiting for this
      		 * packet.
      		 */
      		wake_up_interruptible(&sfp->read_wait);
      		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
      		kref_put(&sfp->f_ref, sg_remove_sfp);
      	} else {
      		INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
      		schedule_work(&srp->ew.work);
      	}
      
         Since srp->orphan *is* set, we set done to 0 (assuming the
         userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
         ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
         to run in a workqueue.
      
       - In workqueue context we go through sg_rq_end_io_usercontext() ->
         sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
         bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().
      
         The key point here is that we are doing copy_to_user() on a
         workqueue -- that is, we're on a kernel thread with current->mm
         equal to whatever random previous user process was scheduled before
         this kernel thread.  So we end up copying whatever data the SCSI
         command returned to the virtual address of the buffer passed into
         the original ioctl, but it's quite likely we do this copying into a
         different address space!
      
      As suggested by James Bottomley <James.Bottomley@hansenpartnership.com>,
      add a check for current->mm (which is NULL if we're on a kernel thread
      without a real userspace address space) in bio_uncopy_user(), and skip
      the copy if we're on a kernel thread.
      
      There's no reason that I can think of for any caller of bio_uncopy_user()
      to want to do copying on a kernel thread with a random active userspace
      address space.
      
      Huge thanks to Costa Sapuntzakis <costa@purestorage.com> for the
      original pointer to this bug in the sg code.
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      Tested-by: NDavid Milburn <dmilburn@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      35dc2483
  6. 09 8月, 2013 1 次提交
    • T
      cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/ · 8af01f56
      Tejun Heo 提交于
      The names of the two struct cgroup_subsys_state accessors -
      cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
      The former clashes with the type name and the latter doesn't even
      indicate it's somehow related to cgroup.
      
      We're about to revamp large portion of cgroup API, so, let's rename
      them so that they're less awkward.  Most per-controller usages of the
      accessors are localized in accessor wrappers and given the amount of
      scheduled changes, this isn't gonna add any noticeable headache.
      
      Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
      to task_css().  This patch is pure rename.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      8af01f56
  7. 08 5月, 2013 1 次提交
  8. 19 4月, 2013 1 次提交
  9. 24 3月, 2013 10 次提交
    • K
      block: Add an explicit bio flag for bios that own their bvec · a38352e0
      Kent Overstreet 提交于
      This is for the new bio splitting code. When we split a bio, if the
      split occured on a bvec boundry we reuse the bvec for the new bio. But
      that means bio_free() can't free it, hence the explicit flag.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: NTejun Heo <tj@kernel.org>
      a38352e0
    • K
      block: Add bio_alloc_pages() · a0787606
      Kent Overstreet 提交于
      More utility code to replace stuff that's getting open coded.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      a0787606
    • K
      block: Convert some code to bio_for_each_segment_all() · cb34e057
      Kent Overstreet 提交于
      More prep work for immutable bvecs:
      
      A few places in the code were either open coding or using the wrong
      version - fix.
      
      After we introduce the bvec iter, it'll no longer be possible to modify
      the biovec through bio_for_each_segment_all() - it doesn't increment a
      pointer to the current bvec, you pass in a struct bio_vec (not a
      pointer) which is updated with what the current biovec would be (taking
      into account bi_bvec_done and bi_size).
      
      So because of that it's more worthwhile to be consistent about
      bio_for_each_segment()/bio_for_each_segment_all() usage.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Alexander Viro <viro@zeniv.linux.org.uk>
      cb34e057
    • K
      block: Add bio_for_each_segment_all() · d74c6d51
      Kent Overstreet 提交于
      __bio_for_each_segment() iterates bvecs from the specified index
      instead of bio->bv_idx.  Currently, the only usage is to walk all the
      bvecs after the bio has been advanced by specifying 0 index.
      
      For immutable bvecs, we need to split these apart;
      bio_for_each_segment() is going to have a different implementation.
      This will also help document the intent of code that's using it -
      bio_for_each_segment_all() is only legal to use for code that owns the
      bio.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Neil Brown <neilb@suse.de>
      CC: Boaz Harrosh <bharrosh@panasas.com>
      d74c6d51
    • K
      block: Add bio_copy_data() · 16ac3d63
      Kent Overstreet 提交于
      This gets open coded quite a bit and it's tricky to get right, so make a
      generic version and convert some existing users over to it instead.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      16ac3d63
    • K
      block: Add submit_bio_wait(), remove from md · 9e882242
      Kent Overstreet 提交于
      Random cleanup - this code was duplicated and it's not really specific
      to md.
      
      Also added the ability to return the actual error code.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      Acked-by: NTejun Heo <tj@kernel.org>
      9e882242
    • K
      block: Change bio_split() to respect the current value of bi_idx · 5b83636a
      Kent Overstreet 提交于
      In the current code bio_split() won't be seeing partially completed bios
      so this doesn't change any behaviour, but this makes the code a bit
      clearer as to what bio_split() actually requires.
      
      The immediate purpose of the patch is removing unnecessary bi_idx
      references, but the end goal is to allow partial completed bios to be
      submitted, which along with immutable biovecs enables effecient bio
      splitting.
      
      Some of the callers were (double) checking that bios could be split, so
      update their checks too.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
      CC: Neil Brown <neilb@suse.de>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
      5b83636a
    • K
      block: Add bio_advance() · 054bdf64
      Kent Overstreet 提交于
      This is prep work for immutable bio vecs; we first want to centralize
      where bvecs are modified.
      
      Next two patches convert some existing code to use this function.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      054bdf64
    • K
      block: Convert integrity to bvec_alloc_bs() · 9f060e22
      Kent Overstreet 提交于
      This adds a pointer to the bvec array to struct bio_integrity_payload,
      instead of the bvecs always being inline; then the bvecs are allocated
      with bvec_alloc_bs().
      
      Changed bvec_alloc_bs() and bvec_free_bs() to take a pointer to a
      mempool instead of the bioset, so that bio integrity can use a different
      mempool for its bvecs, and thus avoid a potential deadlock.
      
      This is eventually for immutable bio vecs - immutable bvecs aren't
      useful if we still have to copy them, hence the need for the pointer.
      Less code is always nice too, though.
      
      Also, bio_integrity_alloc() was using fs_bio_set if no bio_set was
      specified. This was wrong - using the bio_set doesn't protect us from
      memory allocation failures, because we just used kmalloc for the
      bio_integrity_payload. But it does introduce the possibility of
      deadlock, if for some reason we weren't supposed to be using fs_bio_set.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
      9f060e22
    • K
      block: Avoid deadlocks with bio allocation by stacking drivers · df2cb6da
      Kent Overstreet 提交于
      Previously, if we ever try to allocate more than once from the same bio
      set while running under generic_make_request() (i.e. a stacking block
      driver), we risk deadlock.
      
      This is because of the code in generic_make_request() that converts
      recursion to iteration; any bios we submit won't actually be submitted
      (so they can complete and eventually be freed) until after we return -
      this means if we allocate a second bio, we're blocking the first one
      from ever being freed.
      
      Thus if enough threads call into a stacking block driver at the same
      time with bios that need multiple splits, and the bio_set's reserve gets
      used up, we deadlock.
      
      This can be worked around in the driver code - we could check if we're
      running under generic_make_request(), then mask out __GFP_WAIT when we
      go to allocate a bio, and if the allocation fails punt to workqueue and
      retry the allocation.
      
      But this is tricky and not a generic solution. This patch solves it for
      all users by inverting the previously described technique. We allocate a
      rescuer workqueue for each bio_set, and then in the allocation code if
      there are bios on current->bio_list we would be blocking, we punt them
      to the rescuer workqueue to be submitted.
      
      This guarantees forward progress for bio allocations under
      generic_make_request() provided each bio is submitted before allocating
      the next, and provided the bios are freed after they complete.
      
      Note that this doesn't do anything for allocation from other mempools.
      Instead of allocating per bio data structures from a mempool, code
      should use bio_set's front_pad.
      
      Tested it by forcing the rescue codepath to be taken (by disabling the
      first GFP_NOWAIT) attempt, and then ran it with bcache (which does a lot
      of arbitrary bio splitting) and verified that the rescuer was being
      invoked.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NMuthukumar Ratty <muthur@gmail.com>
      df2cb6da
  10. 14 1月, 2013 1 次提交
    • T
      block: add missing block_bio_complete() tracepoint · 3a366e61
      Tejun Heo 提交于
      bio completion didn't kick block_bio_complete TP.  Only dm was
      explicitly triggering the TP on IO completion.  This makes
      block_bio_complete TP useless for tracers which want to know about
      bios, and all other bio based drivers skip generating blktrace
      completion events.
      
      This patch makes all bio completions via bio_endio() generate
      block_bio_complete TP.
      
      * Explicit trace_block_bio_complete() invocation removed from dm and
        the trace point is unexported.
      
      * @rq dropped from trace_block_bio_complete().  bios may fly around
        w/o queue associated.  Verifying and accessing the assocaited queue
        belongs to TP probes.
      
      * blktrace now gets both request and bio completions.  Make it ignore
        bio completions if request completion path is happening.
      
      This makes all bio based drivers generate blktrace completion events
      properly and makes the block_bio_complete TP actually useful.
      
      v2: With this change, block_bio_complete TP could be invoked on sg
          commands which have bio's with %NULL bi_bdev.  Update TP
          assignment code to check whether bio->bi_bdev is %NULL before
          dereferencing.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Original-patch-by: NNamhyung Kim <namhyung@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3a366e61
  11. 23 10月, 2012 1 次提交
  12. 28 9月, 2012 1 次提交
  13. 20 9月, 2012 1 次提交
  14. 09 9月, 2012 6 次提交
    • K
      block: Add bio_clone_bioset(), bio_clone_kmalloc() · bf800ef1
      Kent Overstreet 提交于
      Previously, there was bio_clone() but it only allocated from the fs bio
      set; as a result various users were open coding it and using
      __bio_clone().
      
      This changes bio_clone() to become bio_clone_bioset(), and then we add
      bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
      the functionality the last patch adedd.
      
      This will also help in a later patch changing how bio cloning works.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: Boaz Harrosh <bharrosh@panasas.com>
      CC: Jeff Garzik <jeff@garzik.org>
      Acked-by: NJeff Garzik <jgarzik@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      bf800ef1
    • K
      block: Consolidate bio_alloc_bioset(), bio_kmalloc() · 3f86a82a
      Kent Overstreet 提交于
      Previously, bio_kmalloc() and bio_alloc_bioset() behaved slightly
      different because there was some almost-duplicated code - this fixes
      some of that.
      
      The important change is that previously bio_kmalloc() always set
      bi_io_vec = bi_inline_vecs, even if nr_iovecs == 0 - unlike
      bio_alloc_bioset(). This would cause bio_has_data() to return true; I
      don't know if this resulted in any actual bugs but it was certainly
      wrong.
      
      bio_kmalloc() and bio_alloc_bioset() also have different arbitrary
      limits on nr_iovecs - 1024 (UIO_MAXIOV) for bio_kmalloc(), 256
      (BIO_MAX_PAGES) for bio_alloc_bioset(). This patch doesn't fix that, but
      at least they're enforced closer together and hopefully they will be
      fixed in a later patch.
      
      This'll also help with some future cleanups - there are a fair number of
      functions that allocate bios (e.g. bio_clone()), and now they don't have
      to be duplicated for bio_alloc(), bio_alloc_bioset(), and bio_kmalloc().
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      v7: Re-add dropped comments, improv patch description
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3f86a82a
    • K
      block: Kill bi_destructor · 4254bba1
      Kent Overstreet 提交于
      Now that we've got generic code for freeing bios allocated from bio
      pools, this isn't needed anymore.
      
      This patch also makes bio_free() static, since without bi_destructor
      there should be no need for it to be called anywhere else.
      
      bio_free() is now only called from bio_put, so we can refactor those a
      bit - move some code from bio_put() to bio_free() and kill the redundant
      bio->bi_next = NULL.
      
      v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
      v6: BIO_KMALLOC_POOL now NULL, drop bio_free's EXPORT_SYMBOL
      v7: No #define BIO_KMALLOC_POOL anymore
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4254bba1
    • K
      block: Add bio_reset() · f44b48c7
      Kent Overstreet 提交于
      Reusing bios is something that's been highly frowned upon in the past,
      but driver code keeps doing it anyways. If it's going to happen anyways,
      we should provide a generic method.
      
      This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
      was open coding it, by doing a bio_init() and resetting bi_destructor.
      
      This required reordering struct bio, but the block layer is not yet
      nearly fast enough for any cacheline effects to matter here.
      
      v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
      bio->bi_flags are saved.
      v6: Further commenting verbosity, per Tejun
      v9: Add a function comment
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f44b48c7
    • K
      block: Ues bi_pool for bio_integrity_alloc() · 1e2a410f
      Kent Overstreet 提交于
      Now that bios keep track of where they were allocated from,
      bio_integrity_alloc_bioset() becomes redundant.
      
      Remove bio_integrity_alloc_bioset() and drop bio_set argument from the
      related functions and make them use bio->bi_pool.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      1e2a410f
    • K
      block: Generalized bio pool freeing · 395c72a7
      Kent Overstreet 提交于
      With the old code, when you allocate a bio from a bio pool you have to
      implement your own destructor that knows how to find the bio pool the
      bio was originally allocated from.
      
      This adds a new field to struct bio (bi_pool) and changes
      bio_alloc_bioset() to use it. This makes various bio destructors
      unnecessary, so they're then deleted.
      
      v6: Explain the temporary if statement in bio_put
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: Nicholas Bellinger <nab@linux-iscsi.org>
      CC: Lars Ellenberg <lars.ellenberg@linbit.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      395c72a7
  15. 09 8月, 2012 1 次提交
  16. 04 8月, 2012 1 次提交
  17. 11 5月, 2012 1 次提交
    • B
      bio allocation failure due to bio_get_nr_vecs() · f908ee94
      Bernd Schubert 提交于
      The number of bio_get_nr_vecs() is passed down via bio_alloc() to
      bvec_alloc_bs(), which fails the bio allocation if
      nr_iovecs > BIO_MAX_PAGES. For the underlying caller this causes an
      unexpected bio allocation failure.
      Limiting to queue_max_segments() is not sufficient, as max_segments
      also might be very large.
      
      bvec_alloc_bs(gfp_mask, nr_iovecs, ) => NULL when nr_iovecs  > BIO_MAX_PAGES
      bio_alloc_bioset(gfp_mask, nr_iovecs, ...)
      bio_alloc(GFP_NOIO, nvecs)
      xfs_alloc_ioend_bio()
      Signed-off-by: NBernd Schubert <bernd.schubert@itwm.fraunhofer.de>
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f908ee94