1. 24 Mar 2013, 2 commits
    • block: Convert integrity to bvec_alloc_bs() · 9f060e22
      Committed by Kent Overstreet
      This adds a pointer to the bvec array to struct bio_integrity_payload,
      instead of the bvecs always being inline; then the bvecs are allocated
      with bvec_alloc_bs().
      
      Changed bvec_alloc_bs() and bvec_free_bs() to take a pointer to a
      mempool instead of the bioset, so that bio integrity can use a different
      mempool for its bvecs, and thus avoid a potential deadlock.
      
      This is eventually for immutable bio vecs - immutable bvecs aren't
      useful if we still have to copy them, hence the need for the pointer.
      Less code is always nice too, though.
      
      Also, bio_integrity_alloc() was using fs_bio_set if no bio_set was
      specified. This was wrong - using the bio_set doesn't protect us from
      memory allocation failures, because we just used kmalloc for the
      bio_integrity_payload. But it does introduce the possibility of
      deadlock, if for some reason we weren't supposed to be using fs_bio_set.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
      9f060e22
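      A minimal sketch of the resulting shape, assuming simplified struct and
      pool names (illustrative only, not the exact kernel definitions):

      #include <linux/bio.h>
      #include <linux/mempool.h>

      /* Sketch: the integrity payload now points at its bvec array
       * instead of embedding it inline. */
      struct integrity_payload_sketch {
              struct bio      *bip_bio;   /* owning bio */
              struct bio_vec  *bip_vec;   /* filled by a bvec_alloc_bs()-style helper */
              unsigned int    bip_vcnt;
      };

      /* Sketch: integrity bvecs come from their own mempool rather than the
       * bioset's bvec pool, so an integrity allocation cannot deadlock
       * against a data bvec allocation from the same bioset.
       * 'integrity_bvec_pool' is a hypothetical name for that pool. */
      static struct bio_vec *alloc_integrity_bvecs(mempool_t *integrity_bvec_pool,
                                                   gfp_t gfp)
      {
              return mempool_alloc(integrity_bvec_pool, gfp);
      }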
    • block: Avoid deadlocks with bio allocation by stacking drivers · df2cb6da
      Committed by Kent Overstreet
      Previously, if we ever try to allocate more than once from the same bio
      set while running under generic_make_request() (i.e. a stacking block
      driver), we risk deadlock.
      
      This is because of the code in generic_make_request() that converts
      recursion to iteration; any bios we submit won't actually be submitted
      (so they can complete and eventually be freed) until after we return -
      this means if we allocate a second bio, we're blocking the first one
      from ever being freed.
      
      Thus if enough threads call into a stacking block driver at the same
      time with bios that need multiple splits, and the bio_set's reserve gets
      used up, we deadlock.
      
      This can be worked around in the driver code - we could check if we're
      running under generic_make_request(), then mask out __GFP_WAIT when we
      go to allocate a bio, and if the allocation fails punt to workqueue and
      retry the allocation.
      
      But this is tricky and not a generic solution. This patch solves it for
      all users by inverting the previously described technique: we allocate a
      rescuer workqueue for each bio_set, and then in the allocation code, if
      there are bios on current->bio_list that we would be blocking, we punt
      them to the rescuer workqueue to be submitted.
      
      This guarantees forward progress for bio allocations under
      generic_make_request() provided each bio is submitted before allocating
      the next, and provided the bios are freed after they complete.
      
      Note that this doesn't do anything for allocation from other mempools.
      Instead of allocating per bio data structures from a mempool, code
      should use bio_set's front_pad.
      
      Tested it by forcing the rescue codepath to be taken (by disabling the
      first GFP_NOWAIT attempt), and then ran it with bcache (which does a lot
      of arbitrary bio splitting) and verified that the rescuer was being
      invoked.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Muthukumar Ratty <muthur@gmail.com>
      df2cb6da
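      A minimal sketch of the punting step described above, assuming the
      bio_set grows a rescuer workqueue plus a lock-protected rescue list
      (the struct and field names here are assumptions, not verified against
      the patch):

      #include <linux/bio.h>
      #include <linux/sched.h>
      #include <linux/spinlock.h>
      #include <linux/workqueue.h>

      struct bio_set_sketch {
              spinlock_t              rescue_lock;
              struct bio_list         rescue_list;
              struct work_struct      rescue_work;
              struct workqueue_struct *rescue_workqueue;
      };

      static void punt_bios_to_rescuer_sketch(struct bio_set_sketch *bs)
      {
              struct bio_list punt;
              struct bio *bio;

              /* only meaningful while running under generic_make_request() */
              if (!current->bio_list)
                      return;

              bio_list_init(&punt);

              /* Grab everything the caller already queued on current->bio_list;
               * those bios would otherwise sit unsubmitted until we return
               * from the allocation. */
              while ((bio = bio_list_pop(current->bio_list)))
                      bio_list_add(&punt, bio);

              spin_lock(&bs->rescue_lock);
              bio_list_merge(&bs->rescue_list, &punt);
              spin_unlock(&bs->rescue_lock);

              /* The rescuer thread submits them, so they can complete and
               * release their resources even while we block in the mempool
               * allocation. */
              queue_work(bs->rescue_workqueue, &bs->rescue_work);
      }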
  2. 14 Jan 2013, 1 commit
    • block: add missing block_bio_complete() tracepoint · 3a366e61
      Committed by Tejun Heo
      bio completion didn't kick the block_bio_complete TP.  Only dm was
      explicitly triggering the TP on IO completion.  This makes the
      block_bio_complete TP useless for tracers that want to know about
      bios, and all other bio based drivers skip generating blktrace
      completion events.
      
      This patch makes all bio completions via bio_endio() generate
      block_bio_complete TP.
      
      * Explicit trace_block_bio_complete() invocation removed from dm and
        the trace point is unexported.
      
      * @rq dropped from trace_block_bio_complete().  bios may fly around
        w/o queue associated.  Verifying and accessing the associated queue
        belongs to TP probes.
      
      * blktrace now gets both request and bio completions.  Make it ignore
        bio completions if the request completion path is happening.
      
      This makes all bio based drivers generate blktrace completion events
      properly and makes the block_bio_complete TP actually useful.
      
      v2: With this change, block_bio_complete TP could be invoked on sg
          commands which have bio's with %NULL bi_bdev.  Update TP
          assignment code to check whether bio->bi_bdev is %NULL before
          dereferencing.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Original-patch-by: Namhyung Kim <namhyung@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3a366e61
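      A sketch of where the tracepoint now fires, assuming the two-argument
      form implied above (the request argument dropped); not the verbatim
      kernel code:

      #include <linux/bio.h>
      #include <trace/events/block.h>

      static void bio_endio_sketch(struct bio *bio, int error)
      {
              if (error)
                      clear_bit(BIO_UPTODATE, &bio->bi_flags);
              else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
                      error = -EIO;

              /* Every bio completion now hits the tracepoint; per the v2 note,
               * the TP assignment code must tolerate a NULL bio->bi_bdev
               * (e.g. sg commands). */
              trace_block_bio_complete(bio, error);

              if (bio->bi_end_io)
                      bio->bi_end_io(bio, error);
      }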
  3. 23 Oct 2012, 1 commit
  4. 28 Sep 2012, 1 commit
  5. 20 Sep 2012, 1 commit
  6. 09 Sep 2012, 6 commits
    • block: Add bio_clone_bioset(), bio_clone_kmalloc() · bf800ef1
      Committed by Kent Overstreet
      Previously, there was bio_clone() but it only allocated from the fs bio
      set; as a result various users were open coding it and using
      __bio_clone().
      
      This changes bio_clone() to become bio_clone_bioset(), and then we add
      bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
      the functionality the last patch added.
      
      This will also help in a later patch changing how bio cloning works.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: Boaz Harrosh <bharrosh@panasas.com>
      CC: Jeff Garzik <jeff@garzik.org>
      Acked-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bf800ef1
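      Roughly how the wrappers read (a sketch, assuming the convention from
      this series that a NULL bio_set selects the kmalloc path):

      #include <linux/bio.h>

      static inline struct bio *bio_clone_sketch(struct bio *bio, gfp_t gfp_mask)
      {
              /* old bio_clone() behaviour: clone from the global fs bio set */
              return bio_clone_bioset(bio, gfp_mask, fs_bio_set);
      }

      static inline struct bio *bio_clone_kmalloc_sketch(struct bio *bio,
                                                         gfp_t gfp_mask)
      {
              /* no bio_set: the clone's memory comes straight from kmalloc */
              return bio_clone_bioset(bio, gfp_mask, NULL);
      }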
    • block: Consolidate bio_alloc_bioset(), bio_kmalloc() · 3f86a82a
      Committed by Kent Overstreet
      Previously, bio_kmalloc() and bio_alloc_bioset() behaved slightly
      differently because there was some almost-duplicated code - this fixes
      some of that.
      
      The important change is that previously bio_kmalloc() always set
      bi_io_vec = bi_inline_vecs, even if nr_iovecs == 0 - unlike
      bio_alloc_bioset(). This would cause bio_has_data() to return true; I
      don't know if this resulted in any actual bugs but it was certainly
      wrong.
      
      bio_kmalloc() and bio_alloc_bioset() also have different arbitrary
      limits on nr_iovecs - 1024 (UIO_MAXIOV) for bio_kmalloc(), 256
      (BIO_MAX_PAGES) for bio_alloc_bioset(). This patch doesn't fix that, but
      at least they're enforced closer together and hopefully they will be
      fixed in a later patch.
      
      This'll also help with some future cleanups - there are a fair number of
      functions that allocate bios (e.g. bio_clone()), and now they don't have
      to be duplicated for bio_alloc(), bio_alloc_bioset(), and bio_kmalloc().
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      v7: Re-add dropped comments, improve patch description
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3f86a82a
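      A sketch of the consolidated entry points: both become thin wrappers
      around bio_alloc_bioset(), distinguished only by whether a bio_set is
      passed (an illustration of the shape, not verbatim kernel source):

      #include <linux/bio.h>

      static inline struct bio *bio_alloc_sketch(gfp_t gfp_mask,
                                                 unsigned int nr_iovecs)
      {
              /* allocate from the global fs bio set */
              return bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
      }

      static inline struct bio *bio_kmalloc_sketch(gfp_t gfp_mask,
                                                   unsigned int nr_iovecs)
      {
              /* NULL bio_set: plain kmalloc-backed bio */
              return bio_alloc_bioset(gfp_mask, nr_iovecs, NULL);
      }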
    • block: Kill bi_destructor · 4254bba1
      Committed by Kent Overstreet
      Now that we've got generic code for freeing bios allocated from bio
      pools, this isn't needed anymore.
      
      This patch also makes bio_free() static, since without bi_destructor
      there should be no need for it to be called anywhere else.
      
      bio_free() is now only called from bio_put, so we can refactor those a
      bit - move some code from bio_put() to bio_free() and kill the redundant
      bio->bi_next = NULL.
      
      v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
      v6: BIO_KMALLOC_POOL now NULL, drop bio_free's EXPORT_SYMBOL
      v7: No #define BIO_KMALLOC_POOL anymore
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4254bba1
    • block: Add bio_reset() · f44b48c7
      Committed by Kent Overstreet
      Reusing bios is something that's been highly frowned upon in the past,
      but driver code keeps doing it anyways. If it's going to happen anyways,
      we should provide a generic method.
      
      This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
      was open coding it, by doing a bio_init() and resetting bi_destructor.
      
      This required reordering struct bio, but the block layer is not yet
      nearly fast enough for any cacheline effects to matter here.
      
      v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
      bio->bi_flags are saved.
      v6: Further commenting verbosity, per Tejun
      v9: Add a function comment
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f44b48c7
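      The idea behind bio_reset(), as a sketch (the exact BIO_RESET_BITS /
      BIO_RESET_BYTES constants are taken on trust from the description
      above):

      #include <linux/bio.h>
      #include <linux/string.h>

      static void bio_reset_sketch(struct bio *bio)
      {
              /* keep the flag bits above BIO_RESET_BITS: allocation and
               * ownership state that must survive reuse */
              unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);

              /* wipe only the resettable front of struct bio */
              memset(bio, 0, BIO_RESET_BYTES);

              bio->bi_flags = flags | (1 << BIO_UPTODATE);
      }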
    • block: Use bi_pool for bio_integrity_alloc() · 1e2a410f
      Committed by Kent Overstreet
      Now that bios keep track of where they were allocated from,
      bio_integrity_alloc_bioset() becomes redundant.
      
      Remove bio_integrity_alloc_bioset() and drop bio_set argument from the
      related functions and make them use bio->bi_pool.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1e2a410f
    • block: Generalized bio pool freeing · 395c72a7
      Committed by Kent Overstreet
      With the old code, when you allocate a bio from a bio pool you have to
      implement your own destructor that knows how to find the bio pool the
      bio was originally allocated from.
      
      This adds a new field to struct bio (bi_pool) and changes
      bio_alloc_bioset() to use it. This makes various bio destructors
      unnecessary, so they're then deleted.
      
      v6: Explain the temporary if statement in bio_put
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: Nicholas Bellinger <nab@linux-iscsi.org>
      CC: Lars Ellenberg <lars.ellenberg@linbit.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      395c72a7
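      A sketch of the generalized freeing path: the bio remembers its owning
      bio_set in bi_pool, so generic code can return it to the right mempool
      without per-driver destructors (the bio_pool member name is an
      assumption about that era's struct bio_set):

      #include <linux/bio.h>
      #include <linux/mempool.h>
      #include <linux/slab.h>

      static void bio_free_sketch(struct bio *bio)
      {
              struct bio_set *bs = bio->bi_pool;

              if (bs)
                      mempool_free(bio, bs->bio_pool);  /* came from a bioset */
              else
                      kfree(bio);                       /* bio_kmalloc()'d bio */
      }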
  7. 09 Aug 2012, 1 commit
  8. 04 Aug 2012, 1 commit
  9. 11 May 2012, 1 commit
    • bio allocation failure due to bio_get_nr_vecs() · f908ee94
      Committed by Bernd Schubert
      The number returned by bio_get_nr_vecs() is passed down via bio_alloc()
      to bvec_alloc_bs(), which fails the bio allocation if
      nr_iovecs > BIO_MAX_PAGES. For the underlying caller this causes an
      unexpected bio allocation failure.
      Limiting the value to queue_max_segments() is not sufficient, as
      max_segments also might be very large.
      
      bvec_alloc_bs(gfp_mask, nr_iovecs, ) => NULL when nr_iovecs  > BIO_MAX_PAGES
      bio_alloc_bioset(gfp_mask, nr_iovecs, ...)
      bio_alloc(GFP_NOIO, nvecs)
      xfs_alloc_ioend_bio()
      Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f908ee94
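      A sketch of bio_get_nr_vecs() with this clamp applied, together with the
      overflow-safe arithmetic described in the Feb 2012 entry further down;
      illustrative, not verbatim:

      #include <linux/bio.h>
      #include <linux/blkdev.h>

      static int bio_get_nr_vecs_sketch(struct block_device *bdev)
      {
              struct request_queue *q = bdev_get_queue(bdev);
              unsigned int nr_pages;

              /* divide by (PAGE_SIZE >> 9) instead of shifting sectors to
               * bytes, and add 1 instead of rounding up, so the result
               * cannot overflow or come out as zero */
              nr_pages = min_t(unsigned int, queue_max_segments(q),
                               queue_max_sectors(q) / (PAGE_SIZE >> 9) + 1);

              /* never ask bio_alloc() for more vecs than it supports */
              return min_t(unsigned int, nr_pages, BIO_MAX_PAGES);
      }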
  10. 07 Mar 2012, 1 commit
    • block: implement bio_associate_current() · 852c788f
      Committed by Tejun Heo
      IO scheduling and cgroup are tied to the issuing task via the io_context
      and cgroup of %current.  Unfortunately, there are cases where IOs need
      to be routed via a different task, which causes scheduling and cgroup
      limit enforcement to be applied completely incorrectly.
      
      For example, all bios delayed by blk-throttle end up being issued by a
      delayed work item, get assigned the io_context of whichever worker task
      happens to serve the work item, and are dumped into the default block
      cgroup.  This is doubly confusing, as bios which aren't delayed end up
      in the correct cgroup, and it makes using blk-throttle and cfq propio
      together impossible.
      
      Any code which punts IO issuing to another task is affected, and this
      pattern is getting more and more common (e.g. btrfs).  As both io_context
      and cgroup are firmly tied to the task, including the userland-visible
      APIs to manipulate them, it makes a lot of sense to match up tasks to
      bios.
      
      This patch implements bio_associate_current(), which associates the
      specified bio with %current.  The bio records the associated ioc and
      blkcg at that point, and the block layer uses the recorded ones
      regardless of which task actually ends up issuing the bio.  Releasing
      the bio puts the associated ioc and blkcg.
      
      It grabs and remembers the ioc and blkcg instead of the task itself
      because the task may already be dead by the time the bio is issued,
      making its ioc and blkcg inaccessible, and those two are all the block
      layer cares about.
      
      elevator_set_req_fn() is updated so that the bio for which elvdata is
      being allocated is available to the elevator.
      
      This doesn't update block cgroup policies yet.  Further patches will
      implement the support.
      
      -v2: #ifdef CONFIG_BLK_CGROUP added around bio->bi_ioc dereference in
           rq_ioc() to fix build breakage.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      852c788f
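      A usage sketch: a driver that defers submission to a worker tags the bio
      with the issuing task's ioc/blkcg before handing it off. The deferral
      glue below is hypothetical; only bio_associate_current() and
      generic_make_request() are the calls this commit is about:

      #include <linux/bio.h>
      #include <linux/blkdev.h>
      #include <linux/slab.h>
      #include <linux/workqueue.h>

      struct deferred_bio {
              struct work_struct work;
              struct bio *bio;
      };

      static void deferred_submit(struct work_struct *work)
      {
              struct deferred_bio *d = container_of(work, struct deferred_bio, work);

              /* issued by the worker, but accounted to the original task */
              generic_make_request(d->bio);
              kfree(d);
      }

      static void defer_bio(struct bio *bio, struct workqueue_struct *wq)
      {
              struct deferred_bio *d = kmalloc(sizeof(*d), GFP_NOIO);

              if (!d) {
                      bio_endio(bio, -ENOMEM);
                      return;
              }

              /* record %current's ioc and blkcg while we still are that task */
              bio_associate_current(bio);

              d->bio = bio;
              INIT_WORK(&d->work, deferred_submit);
              queue_work(wq, &d->work);
      }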
  11. 29 Feb 2012, 1 commit
  12. 09 Feb 2012, 1 commit
    • bio: don't overflow in bio_get_nr_vecs() · 5abebfdd
      Committed by Kent Overstreet
      There were two places bio_get_nr_vecs() could overflow:
      
      First, it did a left shift to convert from sectors to bytes immediately
      before dividing by PAGE_SIZE.  If PAGE_SIZE ever was less than 512 a great
      many things would break, so dividing by PAGE_SIZE >> 9 is safe and will
      generate smaller code too.
      
      The nastier overflow was in the DIV_ROUND_UP() (that's what the code was
      effectively doing, anyways).  If n + d overflowed, the whole thing would
      return 0 which breaks things rather effectively.
      
      bio_get_nr_vecs() doesn't claim to give an exact value anyways, so the
      DIV_ROUND_UP() is silly; we could do a straight divide, except that if a
      device's queue_max_sectors was less than PAGE_SIZE we'd return 0.  So we
      just add 1; this should always be safe - things will break badly if
      bio_get_nr_vecs() returns > BIO_MAX_PAGES (bio_alloc() will suddenly start
      failing), but it's queue_max_segments that must guard against this: if
      queue_max_sectors is what prevents it from happening, things are going to
      explode on architectures with a different PAGE_SIZE.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5abebfdd
  13. 16 Nov 2011, 1 commit
  14. 24 Oct 2011, 1 commit
    • block: Remove the control of complete cpu from bio. · 9562ad9a
      Committed by Tao Ma
      bio originally had the functionality to set the completion cpu, but
      it is broken.
      
      Christoph said that "This code is unused, and from the all the
      discussions lately pretty obviously broken.  The only thing keeping
      it serves is creating more confusion and possibly more bugs."
      
      And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine
      with leaving cpu control to the request based drivers, they are the
      only ones that can toggle the setting anyway".
      
      So this patch removes all the work of controlling the completion cpu
      from a bio.
      
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9562ad9a
  15. 28 May 2011, 1 commit
  16. 27 May 2011, 1 commit
  17. 31 Mar 2011, 1 commit
  18. 23 Mar 2011, 1 commit
  19. 17 Mar 2011, 1 commit
  20. 08 Mar 2011, 1 commit
  21. 10 Nov 2010, 2 commits
  22. 08 Aug 2010, 1 commit
    • block: unify flags for struct bio and struct request · 7b6d91da
      Committed by Christoph Hellwig
      Remove the current bio flags and reuse the request flags for the bio, too.
      This makes it easier to trace the type of I/O from the filesystem
      down to the block driver.  There were two flags in the bio that were
      missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD.  Also I've
      renamed two request flags that had a superfluous RW in them.
      
      Note that the flags live in bio.h despite having the REQ_ name - as
      blkdev.h includes bio.h, that is the only way to go for now.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      7b6d91da
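      A small sketch of what the unification buys: the same REQ_* bit can be
      tested on a bio's bi_rw and on a request's cmd_flags (REQ_SYNC is used
      here purely as an example):

      #include <linux/bio.h>
      #include <linux/blkdev.h>

      static bool bio_is_sync_sketch(struct bio *bio)
      {
              return (bio->bi_rw & REQ_SYNC) != 0;
      }

      static bool rq_is_sync_flag_sketch(struct request *rq)
      {
              return (rq->cmd_flags & REQ_SYNC) != 0;
      }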
  23. 08 Mar 2010, 1 commit
  24. 03 Mar 2010, 1 commit
  25. 01 Mar 2010, 1 commit
  26. 26 Feb 2010, 1 commit
  27. 05 Feb 2010, 1 commit
  28. 28 Jan 2010, 1 commit
  29. 19 Jan 2010, 1 commit
  30. 04 Dec 2009, 1 commit
  31. 26 Nov 2009, 1 commit
    • block: add helpers to run flush_dcache_page() against a bio and a request's pages · 2d4dc890
      Committed by Ilya Loginov
      The mtdblock driver doesn't call flush_dcache_page for pages in a
      request, which causes problems on architectures where the icache doesn't
      fill from the dcache or where dcache aliases exist.  This patch fixes
      that.
      
      The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
      pointless empty cache-thrashing loops on architectures for which
      flush_dcache_page() is a no-op.  The new helpers flush the pages on
      architectures where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is equal to 1 and
      do nothing otherwise.
      
      See "fix mtd_blkdevs problem with caches on some architectures" discussion
      on LKML for more information.
      Signed-off-by: Ilya Loginov <isloginov@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Peter Horton <phorton@bitbox.co.uk>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      2d4dc890
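      A sketch of the bio-side helper: walk each segment and flush its page,
      compiled only where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is 1 (the iterator
      style matches the pre-immutable-biovec era this log covers):

      #include <linux/bio.h>
      #include <linux/highmem.h>      /* flush_dcache_page() */

      #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
      static void bio_flush_dcache_pages_sketch(struct bio *bio)
      {
              struct bio_vec *bvec;
              int i;

              bio_for_each_segment(bvec, bio, i)
                      flush_dcache_page(bvec->bv_page);
      }
      #endif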
  32. 02 Nov 2009, 2 commits