1. 24 11月, 2013 2 次提交
    • K
      block: Immutable bio vecs · 4550dd6c
      Kent Overstreet 提交于
      This adds a mechanism by which we can advance a bio by an arbitrary
      number of bytes without modifying the biovec: bio->bi_iter.bi_bvec_done
      indicates the number of bytes completed in the current bvec.
      
      Various driver code still needs to be updated to not refer to the bvec
      directly before we can use this for interesting things, like efficient
      bio splitting.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Cc: drbd-user@lists.linbit.com
      Cc: nbd-general@lists.sourceforge.net
      4550dd6c
    • K
      block: Convert bio_for_each_segment() to bvec_iter · 7988613b
      Kent Overstreet 提交于
      More prep work for immutable biovecs - with immutable bvecs drivers
      won't be able to use the biovec directly, they'll need to use helpers
      that take into account bio->bi_iter.bi_bvec_done.
      
      This updates callers for the new usage without changing the
      implementation yet.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Cc: Jim Paris <jim@jtan.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com>
      Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
      Cc: support@lsi.com
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: drbd-user@lists.linbit.com
      Cc: nbd-general@lists.sourceforge.net
      Cc: cbe-oss-dev@lists.ozlabs.org
      Cc: xen-devel@lists.xensource.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: DL-MPTFusionLinux@lsi.com
      Cc: linux-scsi@vger.kernel.org
      Cc: devel@driverdev.osuosl.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: cluster-devel@redhat.com
      Cc: linux-mm@kvack.org
      Acked-by: NGeoff Levand <geoff@infradead.org>
      7988613b
  2. 20 11月, 2013 1 次提交
  3. 25 10月, 2013 3 次提交
    • J
      blk-mq: new multi-queue block IO queueing mechanism · 320ae51f
      Jens Axboe 提交于
      Linux currently has two models for block devices:
      
      - The classic request_fn based approach, where drivers use struct
        request units for IO. The block layer provides various helper
        functionalities to let drivers share code, things like tag
        management, timeout handling, queueing, etc.
      
      - The "stacked" approach, where a driver squeezes in between the
        block layer and IO submitter. Since this bypasses the IO stack,
        driver generally have to manage everything themselves.
      
      With drivers being written for new high IOPS devices, the classic
      request_fn based driver doesn't work well enough. The design dates
      back to when both SMP and high IOPS was rare. It has problems with
      scaling to bigger machines, and runs into scaling issues even on
      smaller machines when you have IOPS in the hundreds of thousands
      per device.
      
      The stacked approach is then most often selected as the model
      for the driver. But this means that everybody has to re-invent
      everything, and along with that we get all the problems again
      that the shared approach solved.
      
      This commit introduces blk-mq, block multi queue support. The
      design is centered around per-cpu queues for queueing IO, which
      then funnel down into x number of hardware submission queues.
      We might have a 1:1 mapping between the two, or it might be
      an N:M mapping. That all depends on what the hardware supports.
      
      blk-mq provides various helper functions, which include:
      
      - Scalable support for request tagging. Most devices need to
        be able to uniquely identify a request both in the driver and
        to the hardware. The tagging uses per-cpu caches for freed
        tags, to enable cache hot reuse.
      
      - Timeout handling without tracking request on a per-device
        basis. Basically the driver should be able to get a notification,
        if a request happens to fail.
      
      - Optional support for non 1:1 mappings between issue and
        submission queues. blk-mq can redirect IO completions to the
        desired location.
      
      - Support for per-request payloads. Drivers almost always need
        to associate a request structure with some driver private
        command structure. Drivers can tell blk-mq this at init time,
        and then any request handed to the driver will have the
        required size of memory associated with it.
      
      - Support for merging of IO, and plugging. The stacked model
        gets neither of these. Even for high IOPS devices, merging
        sequential IO reduces per-command overhead and thus
        increases bandwidth.
      
      For now, this is provided as a potential 3rd queueing model, with
      the hope being that, as it matures, it can replace both the classic
      and stacked model. That would get us back to having just 1 real
      model for block devices, leaving the stacked approach to dm/md
      devices (as it was originally intended).
      
      Contributions in this patch from the following people:
      
      Shaohua Li <shli@fusionio.com>
      Alexander Gordeev <agordeev@redhat.com>
      Christoph Hellwig <hch@infradead.org>
      Mike Christie <michaelc@cs.wisc.edu>
      Matias Bjorling <m@bjorling.me>
      Jeff Moyer <jmoyer@redhat.com>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      320ae51f
    • C
      block: remove request ref_count · 71fe07d0
      Christoph Hellwig 提交于
      This reference count has been around since before git history, but the only
      place where it's used is in blk_execute_rq, and ther it is entirely useless
      as it is incremented before submitting the request and decremented in the
      end_io handler before waking up the submitter thread.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      71fe07d0
    • J
      block: make rq->cmd_flags be 64-bit · 5953316d
      Jens Axboe 提交于
      We have officially run out of flags in a 32-bit space. Extend it
      to 64-bit even on 32-bit archs.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5953316d
  4. 22 9月, 2013 1 次提交
  5. 07 5月, 2013 1 次提交
  6. 24 4月, 2013 1 次提交
    • J
      block: fix max discard sectors limit · 871dd928
      James Bottomley 提交于
      linux-v3.8-rc1 and later support for plug for blkdev_issue_discard with
      commit 0cfbcafc
      (block: add plug for blkdev_issue_discard )
      
      For example,
      1) DISCARD rq-1 with size size 4GB
      2) DISCARD rq-2 with size size 1GB
      
      If these 2 discard requests get merged, final request size will be 5GB.
      
      In this case, request's __data_len field may overflow as it can store
      max 4GB(unsigned int).
      
      This issue was observed while doing mkfs.f2fs on 5GB SD card:
      https://lkml.org/lkml/2013/4/1/292
      
      Info: sector size = 512
      Info: total sectors = 11370496 (in 512bytes)
      Info: zone aligned segment0 blkaddr: 512
      [  257.789764] blk_update_request: bio idx 0 >= vcnt 0
      
      mkfs process gets stuck in D state and I see the following in the dmesg:
      
      [  257.789733] __end_that: dev mmcblk0: type=1, flags=122c8081
      [  257.789764]   sector 4194304, nr/cnr 2981888/4294959104
      [  257.789764]   bio df3840c0, biotail df3848c0, buffer   (null), len
      1526726656
      [  257.789764] blk_update_request: bio idx 0 >= vcnt 0
      [  257.794921] request botched: dev mmcblk0: type=1, flags=122c8081
      [  257.794921]   sector 4194304, nr/cnr 2981888/4294959104
      [  257.794921]   bio df3840c0, biotail df3848c0, buffer   (null), len
      1526726656
      
      This patch fixes this issue.
      Reported-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Tested-by: NMax Filippov <jcmvbkbc@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      871dd928
  7. 23 3月, 2013 1 次提交
    • L
      block: add runtime pm helpers · 6c954667
      Lin Ming 提交于
      Add runtime pm helper functions:
      
      void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
        - Initialization function for drivers to call.
      
      int blk_pre_runtime_suspend(struct request_queue *q)
        - If any requests are in the queue, mark last busy and return -EBUSY.
          Otherwise set q->rpm_status to RPM_SUSPENDING and return 0.
      
      void blk_post_runtime_suspend(struct request_queue *q, int err)
        - If the suspend succeeded then set q->rpm_status to RPM_SUSPENDED.
          Otherwise set it to RPM_ACTIVE and mark last busy.
      
      void blk_pre_runtime_resume(struct request_queue *q)
        - Set q->rpm_status to RPM_RESUMING.
      
      void blk_post_runtime_resume(struct request_queue *q, int err)
        - If the resume succeeded then set q->rpm_status to RPM_ACTIVE
          and call __blk_run_queue, then mark last busy and autosuspend.
          Otherwise set q->rpm_status to RPM_SUSPENDED.
      
      The idea and API is designed by Alan Stern and described here:
      http://marc.info/?l=linux-scsi&m=133727953625963&w=2Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Signed-off-by: NAaron Lu <aaron.lu@intel.com>
      Acked-by: NAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6c954667
  8. 11 1月, 2013 1 次提交
  9. 10 1月, 2013 1 次提交
  10. 19 12月, 2012 1 次提交
    • L
      blk: avoid divide-by-zero with zero discard granularity · 59771079
      Linus Torvalds 提交于
      Commit 8dd2cb7e ("block: discard granularity might not be power of
      2") changed a couple of 'binary and' operations into modulus operations.
      Which turned the harmless case of a zero discard_granularity into a
      possible divide-by-zero.
      
      The code also had a much more subtle bug: it was doing the modulus of a
      value in bytes using 'sector_t'.  That was always conceptually wrong,
      but didn't actually matter back when the code assumed a power-of-two
      granularity: we only looked at the low bits anyway.
      
      But with potentially arbitrary sector numbers, using a 'sector_t' to
      express bytes is very very wrong: depending on configuration it limits
      the starting offset of the device to just 32 bits, and any overflow
      would result in a wrong value if the modulus wasn't a power-of-two.
      
      So re-write the code to not only protect against the divide-by-zero, but
      to do the starting sector arithmetic in sectors, and using the proper
      types.
      
      [ For any mathematicians out there: it also looks monumentally stupid to
        do the 'modulo granularity' operation *twice*, never mind having a "+
        granularity" in the second modulus op.
      
        But that's the easiest way to avoid negative values or overflow, and
        it is how the original code was done. ]
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Reported-by: NDoug Anderson <dianders@chromium.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Shaohua Li <shli@fusionio.com>
      Acked-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      59771079
  11. 15 12月, 2012 1 次提交
  12. 06 12月, 2012 3 次提交
    • B
      block: Make blk_cleanup_queue() wait until request_fn finished · 24faf6f6
      Bart Van Assche 提交于
      Some request_fn implementations, e.g. scsi_request_fn(), unlock
      the queue lock internally. This may result in multiple threads
      executing request_fn for the same queue simultaneously. Keep
      track of the number of active request_fn calls and make sure that
      blk_cleanup_queue() waits until all active request_fn invocations
      have finished. A block driver may start cleaning up resources
      needed by its request_fn as soon as blk_cleanup_queue() finished,
      so blk_cleanup_queue() must wait for all outstanding request_fn
      invocations to finish.
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Reported-by: NChanho Min <chanho.min@lge.com>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      24faf6f6
    • B
      block: Avoid that request_fn is invoked on a dead queue · c246e80d
      Bart Van Assche 提交于
      A block driver may start cleaning up resources needed by its
      request_fn as soon as blk_cleanup_queue() finished, so request_fn
      must not be invoked after draining finished. This is important
      when blk_run_queue() is invoked without any requests in progress.
      As an example, if blk_drain_queue() and scsi_run_queue() run in
      parallel, blk_drain_queue() may have finished all requests after
      scsi_run_queue() has taken a SCSI device off the starved list but
      before that last function has had a chance to run the queue.
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Chanho Min <chanho.min@lge.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c246e80d
    • B
      block: Rename queue dead flag · 3f3299d5
      Bart Van Assche 提交于
      QUEUE_FLAG_DEAD is used to indicate that queuing new requests must
      stop. After this flag has been set queue draining starts. However,
      during the queue draining phase it is still safe to invoke the
      queue's request_fn, so QUEUE_FLAG_DYING is a better name for this
      flag.
      
      This patch has been generated by running the following command
      over the kernel source tree:
      
      git grep -lEw 'blk_queue_dead|QUEUE_FLAG_DEAD' |
          xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g'      \
              -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g';                \
      sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \
          include/linux/blkdev.h;                                       \
      sed -i.tmp -e 's/ DEAD/ DYING/g' -e 's/dead queue/a dying queue/' \
          -e 's/Dead queue/A dying queue/' block/blk-core.c
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Chanho Min <chanho.min@lge.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3f3299d5
  13. 20 9月, 2012 3 次提交
  14. 09 8月, 2012 1 次提交
  15. 03 8月, 2012 1 次提交
    • A
      block: Add blk_bio_map_sg() helper · 85b9f66a
      Asias He 提交于
      Add a helper to map a bio to a scatterlist, modelled after
      blk_rq_map_sg.
      
      This helper is useful for any driver that wants to create
      a scatterlist from its ->make_request_fn method.
      
      Changes in v2:
       - Use __blk_segment_map_sg to avoid duplicated code
       - Add cocbook style function comment
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: virtualization@lists.linux-foundation.org
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
      Signed-off-by: NAsias He <asias@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      85b9f66a
  16. 02 8月, 2012 1 次提交
    • P
      block: split discard into aligned requests · c6e66634
      Paolo Bonzini 提交于
      When a disk has large discard_granularity and small max_discard_sectors,
      discards are not split with optimal alignment.  In the limit case of
      discard_granularity == max_discard_sectors, no request could be aligned
      correctly, so in fact you might end up with no discarded logical blocks
      at all.
      
      Another example that helps showing the condition in the patch is with
      discard_granularity == 64, max_discard_sectors == 128.  A request that is
      submitted for 256 sectors 2..257 will be split in two: 2..129, 130..257.
      However, only 2 aligned blocks out of 3 are included in the request;
      128..191 may be left intact and not discarded.  With this patch, the
      first request will be truncated to ensure good alignment of what's left,
      and the split will be 2..127, 128..255, 256..257.  The patch will also
      take into account the discard_alignment.
      
      At most one extra request will be introduced, because the first request
      will be reduced by at most granularity-1 sectors, and granularity
      must be less than max_discard_sectors.  Subsequent requests will run
      on round_down(max_discard_sectors, granularity) sectors, as in the
      current code.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Tested-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c6e66634
  17. 31 7月, 2012 2 次提交
  18. 27 6月, 2012 1 次提交
    • T
      blkcg: implement per-blkg request allocation · a051661c
      Tejun Heo 提交于
      Currently, request_queue has one request_list to allocate requests
      from regardless of blkcg of the IO being issued.  When the unified
      request pool is used up, cfq proportional IO limits become meaningless
      - whoever grabs the next request being freed wins the race regardless
      of the configured weights.
      
      This can be easily demonstrated by creating a blkio cgroup w/ very low
      weight, put a program which can issue a lot of random direct IOs there
      and running a sequential IO from a different cgroup.  As soon as the
      request pool is used up, the sequential IO bandwidth crashes.
      
      This patch implements per-blkg request_list.  Each blkg has its own
      request_list and any IO allocates its request from the matching blkg
      making blkcgs completely isolated in terms of request allocation.
      
      * Root blkcg uses the request_list embedded in each request_queue,
        which was renamed to @q->root_rl from @q->rq.  While making blkcg rl
        handling a bit harier, this enables avoiding most overhead for root
        blkcg.
      
      * Queue fullness is properly per request_list but bdi isn't blkcg
        aware yet, so congestion state currently just follows the root
        blkcg.  As writeback isn't aware of blkcg yet, this works okay for
        async congestion but readahead may get the wrong signals.  It's
        better than blkcg completely collapsing with shared request_list but
        needs to be improved with future changes.
      
      * After this change, each block cgroup gets a full request pool making
        resource consumption of each cgroup higher.  This makes allowing
        non-root users to create cgroups less desirable; however, note that
        allowing non-root users to directly manage cgroups is already
        severely broken regardless of this patch - each block cgroup
        consumes kernel memory and skews IO weight (IO weights are not
        hierarchical).
      
      v2: queue-sysfs.txt updated and patch description udpated as suggested
          by Vivek.
      
      v3: blk_get_rl() wasn't checking error return from
          blkg_lookup_create() and may cause oops on lookup failure.  Fix it
          by falling back to root_rl on blkg lookup failures.  This problem
          was spotted by Rakesh Iyer <rni@google.com>.
      
      v4: Updated to accomodate 458f27a9 "block: Avoid missed wakeup in
          request waitqueue".  blk_drain_queue() now wakes up waiters on all
          blkg->rl on the target queue.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a051661c
  19. 25 6月, 2012 2 次提交
    • T
      block: prepare for multiple request_lists · 5b788ce3
      Tejun Heo 提交于
      Request allocation is about to be made per-blkg meaning that there'll
      be multiple request lists.
      
      * Make queue full state per request_list.  blk_*queue_full() functions
        are renamed to blk_*rl_full() and takes @rl instead of @q.
      
      * Rename blk_init_free_list() to blk_init_rl() and make it take @rl
        instead of @q.  Also add @gfp_mask parameter.
      
      * Add blk_exit_rl() instead of destroying rl directly from
        blk_release_queue().
      
      * Add request_list->q and make request alloc/free functions -
        blk_free_request(), [__]freed_request(), __get_request() - take @rl
        instead of @q.
      
      This patch doesn't introduce any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5b788ce3
    • T
      block: add q->nr_rqs[] and move q->rq.elvpriv to q->nr_rqs_elvpriv · 8a5ecdd4
      Tejun Heo 提交于
      Add q->nr_rqs[] which currently behaves the same as q->rq.count[] and
      move q->rq.elvpriv to q->nr_rqs_elvpriv.  blk_drain_queue() is updated
      to use q->nr_rqs[] instead of q->rq.count[].
      
      These counters separates queue-wide request statistics from the
      request list and allow implementation of per-queue request allocation.
      
      While at it, properly indent fields of struct request_list.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8a5ecdd4
  20. 15 6月, 2012 1 次提交
    • A
      block: Drop dead function blk_abort_queue() · 76aaa510
      Asias He 提交于
      This function was only used by btrfs code in btrfs_abort_devices()
      (seems in a wrong way).
      
      It was removed in commit d07eb911,
      So, Let's remove the dead code to avoid any confusion.
      
      Changes in v2: update commit log, btrfs_abort_devices() was removed
      already.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-kernel@vger.kernel.org
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: linux-btrfs@vger.kernel.org
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: NAsias He <asias@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      76aaa510
  21. 14 5月, 2012 1 次提交
    • R
      Fix blkdev.h build errors when BLOCK=n · 85fd0bc9
      Russell King 提交于
      I see builds failing with:
      
        CC [M]  drivers/mmc/host/dw_mmc.o
      In file included from drivers/mmc/host/dw_mmc.c:15:
      include/linux/blkdev.h:1404: warning: 'struct task_struct' declared inside parameter list
      include/linux/blkdev.h:1404: warning: its scope is only this definition or declaration, which is probably not what you want
      include/linux/blkdev.h:1408: warning: 'struct task_struct' declared inside parameter list
      include/linux/blkdev.h:1413: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'blk_needs_flush_plug'
      make[4]: *** [drivers/mmc/host/dw_mmc.o] Error 1
      
      This is because dw_mmc.c includes linux/blkdev.h as the very first file,
      and when CONFIG_BLOCK=n, blkdev.h omits all includes.
      
      As it requires linux/sched.h even when CONFIG_BLOCK=n, move this out of
      the #ifdef.
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      85fd0bc9
  22. 20 4月, 2012 4 次提交
    • T
      blkcg: mass rename of blkcg API · 3c798398
      Tejun Heo 提交于
      During the recent blkcg cleanup, most of blkcg API has changed to such
      extent that mass renaming wouldn't cause any noticeable pain.  Take
      the chance and cleanup the naming.
      
      * Rename blkio_cgroup to blkcg.
      
      * Drop blkio / blkiocg prefixes and consistently use blkcg.
      
      * Rename blkio_group to blkcg_gq, which is consistent with io_cq but
        keep the blkg prefix / variable name.
      
      * Rename policy method type and field names to signify they're dealing
        with policy data.
      
      * Rename blkio_policy_type to blkcg_policy.
      
      This patch doesn't cause any functional change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3c798398
    • T
      blkcg: implement per-queue policy activation · a2b1693b
      Tejun Heo 提交于
      All blkcg policies were assumed to be enabled on all request_queues.
      Due to various implementation obstacles, during the recent blkcg core
      updates, this was temporarily implemented as shooting down all !root
      blkgs on elevator switch and policy [de]registration combined with
      half-broken in-place root blkg updates.  In addition to being buggy
      and racy, this meant losing all blkcg configurations across those
      events.
      
      Now that blkcg is cleaned up enough, this patch replaces the temporary
      implementation with proper per-queue policy activation.  Each blkcg
      policy should call the new blkcg_[de]activate_policy() to enable and
      disable the policy on a specific queue.  blkcg_activate_policy()
      allocates and installs policy data for the policy for all existing
      blkgs.  blkcg_deactivate_policy() does the reverse.  If a policy is
      not enabled for a given queue, blkg printing / config functions skip
      the respective blkg for the queue.
      
      blkcg_activate_policy() also takes care of root blkg creation, and
      cfq_init_queue() and blk_throtl_init() are updated accordingly.
      
      This replaces blkcg_bypass_{start|end}() and update_root_blkg_pd()
      unnecessary.  Dropped.
      
      v2: cfq_init_queue() was returning uninitialized @ret on root_group
          alloc failure if !CONFIG_CFQ_GROUP_IOSCHED.  Fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a2b1693b
    • T
      blkcg: add request_queue->root_blkg · 03d8e111
      Tejun Heo 提交于
      With per-queue policy activation, root blkg creation will be moved to
      blkcg core.  Add q->root_blkg in preparation.  For blk-throtl, this
      replaces throtl_data->root_tg; however, cfq needs to keep
      cfqd->root_group for !CONFIG_CFQ_GROUP_IOSCHED.
      
      This is to prepare for per-queue policy activation and doesn't cause
      any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      03d8e111
    • T
      blkcg: remove static policy ID enums · 8bd435b3
      Tejun Heo 提交于
      Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate
      @pol->plid dynamically on registration.  The maximum number of blkcg
      policies which can be registered at the same time is defined by
      BLKCG_MAX_POLS constant added to include/linux/blkdev.h.
      
      Note that blkio_policy_register() now may fail.  Policy init functions
      updated accordingly and unnecessary ifdefs removed from cfq_init().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8bd435b3
  23. 30 3月, 2012 1 次提交
  24. 07 3月, 2012 5 次提交
    • T
      blkcg: drop unnecessary RCU locking · c875f4d0
      Tejun Heo 提交于
      Now that blkg additions / removals are always done under both q and
      blkcg locks, the only places RCU locking is necessary are
      blkg_lookup[_create]() for lookup w/o blkcg lock.  This patch drops
      unncessary RCU locking replacing it with plain blkcg locking as
      necessary.
      
      * blkiocg_pre_destroy() already perform proper locking and don't need
        RCU.  Dropped.
      
      * blkio_read_blkg_stats() now uses blkcg->lock instead of RCU read
        lock.  This isn't a hot path.
      
      * Now unnecessary synchronize_rcu() from queue exit paths removed.
        This makes q->nr_blkgs unnecessary.  Dropped.
      
      * RCU annotation on blkg->q removed.
      
      -v2: Vivek pointed out that blkg_lookup_create() still needs to be
           called under rcu_read_lock().  Updated.
      
      -v3: After the update, stats_lock locking in blkio_read_blkg_stats()
           shouldn't be using _irq variant as it otherwise ends up enabling
           irq while blkcg->lock is locked.  Fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c875f4d0
    • T
      blkcg: let blkcg core manage per-queue blkg list and counter · 03aa264a
      Tejun Heo 提交于
      With the previous patch to move blkg list heads and counters to
      request_queue and blkg, logic to manage them in both policies are
      almost identical and can be moved to blkcg core.
      
      This patch moves blkg link logic into blkg_lookup_create(), implements
      common blkg unlink code in blkg_destroy(), and updates
      blkg_destory_all() so that it's policy specific and can skip root
      group.  The updated blkg_destroy_all() is now used to both clear queue
      for bypassing and elv switching, and release all blkgs on q exit.
      
      This patch introduces a race window where policy [de]registration may
      race against queue blkg clearing.  This can only be a problem on cfq
      unload and shouldn't be a real problem in practice (and we have many
      other places where this race already exists).  Future patches will
      remove these unlikely races.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      03aa264a
    • T
      blkcg: move per-queue blkg list heads and counters to queue and blkg · 4eef3049
      Tejun Heo 提交于
      Currently, specific policy implementations are responsible for
      maintaining list and number of blkgs.  This duplicates code
      unnecessarily, and hinders factoring common code and providing blkcg
      API with better defined semantics.
      
      After this patch, request_queue hosts list heads and counters and blkg
      has list nodes for both policies.  This patch only relocates the
      necessary fields and the next patch will actually move management code
      into blkcg core.
      
      Note that request_queue->blkg_list[] and ->nr_blkgs[] are hardcoded to
      have 2 elements.  This is to avoid include dependency and will be
      removed by the next patch.
      
      This patch doesn't introduce any behavior change.
      
      -v2: Now unnecessary conditional on CONFIG_BLK_CGROUP_MODULE removed
           as pointed out by Vivek.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4eef3049
    • T
      blkcg: clear all request_queues on blkcg policy [un]registrations · 923adde1
      Tejun Heo 提交于
      Keep track of all request_queues which have blkcg initialized and turn
      on bypass and invoke blkcg_clear_queue() on all before making changes
      to blkcg policies.
      
      This is to prepare for moving blkg management into blkcg core.  Note
      that this uses more brute force than necessary.  Finer grained shoot
      down will be implemented later and given that policy [un]registration
      almost never happens on running systems (blk-throtl can't be built as
      a module and cfq usually is the builtin default iosched), this
      shouldn't be a problem for the time being.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      923adde1
    • T
      block: implement blk_queue_bypass_start/end() · d732580b
      Tejun Heo 提交于
      Rename and extend elv_queisce_start/end() to
      blk_queue_bypass_start/end() which are exported and supports nesting
      via @q->bypass_depth.  Also add blk_queue_bypass() to test bypass
      state.
      
      This will be further extended and used for blkio_group management.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d732580b