1. 24 8月, 2017 1 次提交
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig 提交于
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74d46992
  2. 18 8月, 2017 2 次提交
  3. 14 6月, 2017 2 次提交
  4. 09 6月, 2017 1 次提交
  5. 01 11月, 2016 1 次提交
  6. 08 6月, 2016 2 次提交
  7. 05 1月, 2016 5 次提交
  8. 18 12月, 2015 1 次提交
  9. 23 10月, 2015 2 次提交
  10. 29 7月, 2015 1 次提交
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig 提交于
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4246a0b6
  11. 24 7月, 2015 1 次提交
  12. 17 6月, 2015 1 次提交
  13. 06 6月, 2015 1 次提交
    • B
      xen/block: add multi-page ring support · 86839c56
      Bob Liu 提交于
      Extend xen/block to support multi-page ring, so that more requests can be
      issued by using more than one pages as the request ring between blkfront
      and backend.
      As a result, the performance can get improved significantly.
      
      We got some impressive improvements on our highend iscsi storage cluster
      backend. If using 64 pages as the ring, the IOPS increased about 15 times
      for the throughput testing and above doubled for the latency testing.
      
      The reason was the limit on outstanding requests is 32 if use only one-page
      ring, but in our case the iscsi lun was spread across about 100 physical
      drives, 32 was really not enough to keep them busy.
      
      Changes in v2:
       - Rebased to 4.0-rc6.
       - Document on how multi-page ring feature working to linux io/blkif.h.
      
      Changes in v3:
       - Remove changes to linux io/blkif.h and follow the protocol defined
         in io/blkif.h of XEN tree.
       - Rebased to 4.1-rc3
      
      Changes in v4:
       - Turn to use 'ring-page-order' and 'max-ring-page-order'.
       - A few comments from Roger.
      
      Changes in v5:
       - Clarify with 4k granularity to comment
       - Address more comments from Roger
      Signed-off-by: NBob Liu <bob.liu@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      86839c56
  14. 27 4月, 2015 2 次提交
  15. 07 4月, 2015 1 次提交
  16. 28 1月, 2015 2 次提交
  17. 02 10月, 2014 1 次提交
  18. 12 2月, 2014 1 次提交
  19. 08 2月, 2014 4 次提交
  20. 03 2月, 2014 1 次提交
  21. 31 1月, 2014 1 次提交
    • Z
      xen/grant-table: Avoid m2p_override during mapping · 08ece5bb
      Zoltan Kiss 提交于
      The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
      for blkback and future netback patches it just cause a lock contention, as
      those pages never go to userspace. Therefore this series does the following:
      - the original functions were renamed to __gnttab_[un]map_refs, with a new
        parameter m2p_override
      - based on m2p_override either they follow the original behaviour, or just set
        the private flag and call set_phys_to_machine
      - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
        m2p_override false
      - a new function gnttab_[un]map_refs_userspace provides the old behaviour
      
      It also removes a stray space from page.h and change ret to 0 if
      XENFEAT_auto_translated_physmap, as that is the only possible return value
      there.
      
      v2:
      - move the storing of the old mfn in page->index to gnttab_map_refs
      - move the function header update to a separate patch
      
      v3:
      - a new approach to retain old behaviour where it needed
      - squash the patches into one
      
      v4:
      - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
      - clear page->private before doing anything with the page, so m2p_find_override
        won't race with this
      
      v5:
      - change return value handling in __gnttab_[un]map_refs
      - remove a stray space in page.h
      - add detail why ret = 0 now at some places
      
      v6:
      - don't pass pfn to m2p* functions, just get it locally
      Signed-off-by: NZoltan Kiss <zoltan.kiss@citrix.com>
      Suggested-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      08ece5bb
  22. 24 11月, 2013 1 次提交
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet 提交于
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  23. 09 11月, 2013 1 次提交
  24. 25 6月, 2013 1 次提交
  25. 22 6月, 2013 1 次提交
  26. 18 6月, 2013 1 次提交
    • K
      xen/blkback: Check for insane amounts of request on the ring (v6). · 8e3f8755
      Konrad Rzeszutek Wilk 提交于
      Check that the ring does not have an insane amount of requests
      (more than there could fit on the ring).
      
      If we detect this case we will stop processing the requests
      and wait until the XenBus disconnects the ring.
      
      The existing check RING_REQUEST_CONS_OVERFLOW which checks for how
      many responses we have created in the past (rsp_prod_pvt) vs
      requests consumed (req_cons) and whether said difference is greater or
      equal to the size of the ring, does not catch this case.
      
      Wha the condition does check if there is a need to process more
      as we still have a backlog of responses to finish. Note that both
      of those values (rsp_prod_pvt and req_cons) are not exposed on the
      shared ring.
      
      To understand this problem a mini crash course in ring protocol
      response/request updates is in place.
      
      There are four entries: req_prod and rsp_prod; req_event and rsp_event
      to track the ring entries. We are only concerned about the first two -
      which set the tone of this bug.
      
      The req_prod is a value incremented by frontend for each request put
      on the ring. Conversely the rsp_prod is a value incremented by the backend
      for each response put on the ring (rsp_prod gets set by rsp_prod_pvt when
      pushing the responses on the ring).  Both values can
      wrap and are modulo the size of the ring (in block case that is 32).
      Please see RING_GET_REQUEST and RING_GET_RESPONSE for the more details.
      
      The culprit here is that if the difference between the
      req_prod and req_cons is greater than the ring size we have a problem.
      Fortunately for us, the '__do_block_io_op' loop:
      
      	rc = blk_rings->common.req_cons;
      	rp = blk_rings->common.sring->req_prod;
      
      	while (rc != rp) {
      
      		..
      		blk_rings->common.req_cons = ++rc; /* before make_response() */
      
      	}
      
      will loop up to the point when rc == rp. The macros inside of the
      loop (RING_GET_REQUEST) is smart and is indexing based on the modulo
      of the ring size. If the frontend has provided a bogus req_prod value
      we will loop until the 'rc == rp' - which means we could be processing
      already processed requests (or responses) often.
      
      The reason the RING_REQUEST_CONS_OVERFLOW is not helping here is
      b/c it only tracks how many responses we have internally produced
      and whether we would should process more. The astute reader will
      notice that the macro RING_REQUEST_CONS_OVERFLOW provides two
      arguments - more on this later.
      
      For example, if we were to enter this function with these values:
      
             	blk_rings->common.sring->req_prod =  X+31415 (X is the value from
      		the last time __do_block_io_op was called).
              blk_rings->common.req_cons = X
              blk_rings->common.rsp_prod_pvt = X
      
      The RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, blk_rings->common.req_cons)
      is doing:
      
      	req_cons - rsp_prod_pvt >= 32
      
      Which is,
      	X - X >= 32 or 0 >= 32
      
      And that is false, so we continue on looping (this bug).
      
      If we re-use said macro RING_REQUEST_CONS_OVERFLOW and pass in the rp
      instead (sring->req_prod) of rc, the this macro can do the check:
      
           req_prod - rsp_prov_pvt >= 32
      
      Which is,
             X + 31415 - X >= 32 , or 31415 >= 32
      
      which is true, so we can error out and break out of the function.
      
      Unfortunatly the difference between rsp_prov_pvt and req_prod can be
      at 32 (which would error out in the macro). This condition exists when
      the backend is lagging behind with the responses and still has not finished
      responding to all of them (so make_response has not been called), and
      the rsp_prov_pvt + 32 == req_cons. This ends up with us not being able
      to use said macro.
      
      Hence introducing a new macro called RING_REQUEST_PROD_OVERFLOW which does
      a simple check of:
      
          req_prod - rsp_prod_pvt > RING_SIZE
      
      And with the X values from above:
      
         X + 31415 - X > 32
      
      Returns true. Also not that if the ring is full (which is where
      the RING_REQUEST_CONS_OVERFLOW triggered), we would not hit the
      same condition:
      
         X + 32 - X > 32
      
      Which is false.
      
      Lets use that macro.
      Note that in v5 of this patchset the macro was different - we used an
      earlier version.
      
      Cc: stable@vger.kernel.org
      [v1: Move the check outside the loop]
      [v2: Add a pr_warn as suggested by David]
      [v3: Use RING_REQUEST_CONS_OVERFLOW as suggested by Jan]
      [v4: Move wake_up after kthread_stop as suggested by Jan]
      [v5: Use RING_REQUEST_PROD_OVERFLOW instead]
      [v6: Use RING_REQUEST_PROD_OVERFLOW - Jan's version]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: NJan Beulich <jbeulich@suse.com>
      
      gadsa
      8e3f8755
  27. 08 6月, 2013 1 次提交