1. 18 Jun 2014, 1 commit
  2. 06 Jun 2014, 2 commits
    • block: add blk_rq_set_block_pc() · f27b087b
      Authored by Jens Axboe
      With the optimizations around not clearing the full request at alloc
      time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC
      up to the user allocating the request.
      
      Add a blk_rq_set_block_pc() that sets the command type to
      REQ_TYPE_BLOCK_PC, and properly initializes the members associated
      with this type of request. Update callers to use this function instead
      of manipulating rq->cmd_type directly.
      
      Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed
      attempt.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      f27b087b
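
      A minimal caller sketch, modeled on scsi_execute() of that era (error
      handling and CDB setup are trimmed, so treat the surrounding code as
      illustrative rather than a drop-in):

          struct request *rq;

          rq = blk_get_request(q, WRITE, GFP_KERNEL);
          if (!rq)
                  return -ENOMEM;

          blk_rq_set_block_pc(rq);     /* REQ_TYPE_BLOCK_PC + member init */
          rq->timeout = 60 * HZ;
          /* fill rq->cmd[] / rq->cmd_len with the SCSI CDB here */
          blk_execute_rq(q, NULL, rq, 0);
          blk_put_request(rq);
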
    • block: add notion of a chunk size for request merging · 762380ad
      Authored by Jens Axboe
      Some drivers have different limits on what size a request should
      optimally be, depending on the offset of the request, much like
      dividing a device into chunks. Add a setting that allows the driver
      to inform the block layer of such a chunk size. The block layer will
      then prevent merging across the chunks.
      
      This is needed to optimally support NVMe with a non-zero stripe size.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      762380ad
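
      A hedged sketch of how a driver with, say, a 128KB internal stripe
      would use the new setting (the helper name comes from this commit; the
      call site is illustrative):

          /* 128KB chunks, expressed in 512-byte sectors; must be a power of two */
          blk_queue_chunk_sectors(q, 128 * 1024 >> 9);

      With this set, the block layer will not merge a request across a
      256-sector boundary, so a merged request never straddles two chunks.
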
  3. 05 Jun 2014, 2 commits
  4. 04 Jun 2014, 1 commit
  5. 29 May 2014, 2 commits
    • block: add queue flag for disabling SG merging · 05f1dd53
      Authored by Jens Axboe
      If devices are not SG starved, we waste a lot of time potentially
      collapsing SG segments. Enough that 1.5% of the CPU time goes
      to this, at only 400K IOPS. Add a queue flag, QUEUE_FLAG_NO_SG_MERGE,
      which just returns the number of vectors in a bio instead of looping
      over all segments and checking for collapsible ones.
      
      Add a BLK_MQ_F_SG_MERGE flag so that drivers can opt-in on the sg
      merging, if they so desire.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      05f1dd53
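
      Roughly how the two flags interact, per the description above (a
      sketch; the exact call sites in blk-mq are illustrative):

          /* driver side: opt back in to SG segment merging on a blk-mq queue */
          tag_set->flags |= BLK_MQ_F_SG_MERGE;

          /* block layer side: skip segment collapsing unless the driver opted in */
          if (!(tag_set->flags & BLK_MQ_F_SG_MERGE))
                  queue_flag_set_unlocked(QUEUE_FLAG_NO_SG_MERGE, q);
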
    • block: remove 'magic' from struct blk_plug · 4d92a9be
      Authored by Jens Axboe
      I don't think we've ever caught any bugs with this, and there's the
      list poisoning for the plug lists to catch uninitialized cases.
      So remove the magic member and save 8 bytes in the struct.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      4d92a9be
  6. 28 May 2014, 1 commit
  7. 14 May 2014, 1 commit
    • blk-mq: improve support for shared tags maps · 0d2602ca
      Authored by Jens Axboe
      This adds support for active queue tracking, meaning that the
      blk-mq tagging maintains a count of active users of a tag set.
      This allows us to maintain a notion of fairness between users,
      so that we can distribute the tag depth evenly without starving
      some users while allowing others to try unfair deep queues.
      
      If sharing of a tag set is detected, each hardware queue will
      track the depth of its own queue. And if this exceeds the total
      depth divided by the number of active queues, the user is actively
      throttled down.
      
      The active queue count is done lazily to avoid bouncing that data
      between submitter and completer. Each hardware queue gets marked
      active when it allocates its first tag, and gets marked inactive
      when 1) the last tag is cleared, and 2) the queue timeout grace
      period has passed.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      0d2602ca
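
      The fairness rule itself is simple arithmetic; an illustrative
      re-statement in C (this is not the actual blk-mq helper, just the rule
      described above):

          static bool may_queue(unsigned int total_depth,
                                unsigned int active_queues,
                                unsigned int my_active_tags)
          {
                  unsigned int fair_share;

                  if (active_queues <= 1)
                          return true;    /* tag set not shared: no throttling */

                  /* never let the per-queue share round down to zero */
                  fair_share = max(1U, total_depth / active_queues);
                  return my_active_tags < fair_share;
          }
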
  8. 09 May 2014, 1 commit
  9. 17 Apr 2014, 3 commits
  10. 16 Apr 2014, 2 commits
  11. 10 Apr 2014, 3 commits
    • block: fix regression with block enabled tagging · 360f92c2
      Authored by Jens Axboe
      Martin reported that his test system would not boot with
      current git, it oopsed with this:
      
      BUG: unable to handle kernel paging request at ffff88046c6c9e80
      IP: [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
      PGD 1ddf067 PUD 1de2067 PMD 47fc7d067 PTE 800000046c6c9060
      Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
      Modules linked in: sd_mod lpfc(+) scsi_transport_fc scsi_tgt oracleasm
      rpcsec_gss_krb5 ipv6 igb dca i2c_algo_bit i2c_core hwmon
      CPU: 3 PID: 87 Comm: kworker/u17:1 Not tainted 3.14.0+ #246
      Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
      Workqueue: events_unbound async_run_entry_fn
      task: ffff8802743c2150 ti: ffff880273d02000 task.ti: ffff880273d02000
      RIP: 0010:[<ffffffff812971e0>]  [<ffffffff812971e0>]
      blk_queue_start_tag+0x90/0x150
      RSP: 0018:ffff880273d03a58  EFLAGS: 00010092
      RAX: ffff88046c6c9e78 RBX: ffff880077208e78 RCX: 00000000fffc8da6
      RDX: 00000000fffc186d RSI: 0000000000000009 RDI: 00000000fffc8d9d
      RBP: ffff880273d03a88 R08: 0000000000000001 R09: ffff8800021c2410
      R10: 0000000000000005 R11: 0000000000015b30 R12: ffff88046c5bb8a0
      R13: ffff88046c5c0890 R14: 000000000000001e R15: 000000000000001e
      FS:  0000000000000000(0000) GS:ffff880277b00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff88046c6c9e80 CR3: 00000000018f6000 CR4: 00000000000407e0
      Stack:
       ffff880273d03a98 ffff880474b18800 0000000000000000 ffff880474157000
       ffff88046c5c0890 ffff880077208e78 ffff880273d03ae8 ffffffff813b9e62
       ffff880200000010 ffff880474b18968 ffff880474b18848 ffff88046c5c0cd8
      Call Trace:
       [<ffffffff813b9e62>] scsi_request_fn+0xf2/0x510
       [<ffffffff81293167>] __blk_run_queue+0x37/0x50
       [<ffffffff8129ac43>] blk_execute_rq_nowait+0xb3/0x130
       [<ffffffff8129ad24>] blk_execute_rq+0x64/0xf0
       [<ffffffff8108d2b0>] ? bit_waitqueue+0xd0/0xd0
       [<ffffffff813bba35>] scsi_execute+0xe5/0x180
       [<ffffffff813bbe4a>] scsi_execute_req_flags+0x9a/0x110
       [<ffffffffa01b1304>] sd_spinup_disk+0x94/0x460 [sd_mod]
       [<ffffffff81160000>] ? __unmap_hugepage_range+0x200/0x2f0
       [<ffffffffa01b2b9a>] sd_revalidate_disk+0xaa/0x3f0 [sd_mod]
       [<ffffffffa01b2fb8>] sd_probe_async+0xd8/0x200 [sd_mod]
       [<ffffffff8107703f>] async_run_entry_fn+0x3f/0x140
       [<ffffffff8106a1c5>] process_one_work+0x175/0x410
       [<ffffffff8106b373>] worker_thread+0x123/0x400
       [<ffffffff8106b250>] ? manage_workers+0x160/0x160
       [<ffffffff8107104e>] kthread+0xce/0xf0
       [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
       [<ffffffff815f0bac>] ret_from_fork+0x7c/0xb0
       [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
      Code: 48 0f ab 11 72 db 48 81 4b 40 00 00 10 00 89 83 08 01 00 00 48 89
      df 49 8b 04 24 48 89 1c d0 e8 f7 a8 ff ff 49 8b 85 28 05 00 00 <48> 89
      58 08 48 89 03 49 8d 85 28 05 00 00 48 89 43 08 49 89 9d
      RIP  [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
       RSP <ffff880273d03a58>
      CR2: ffff88046c6c9e80
      
      Martin bisected and found this to be the problem patch:
      
      	commit 6d113398
      	Author: Jan Kara <jack@suse.cz>
      	Date:   Mon Feb 24 16:39:54 2014 +0100
      
      	    block: Stop abusing rq->csd.list in blk-softirq
      
      and the problem was immediately apparent. The patch states that
      it is safe to reuse queuelist at completion time, since it is
      no longer used. However, that is not true if a device is using
      block enabled tagging. If that is the case, then the queuelist
      is reused to keep track of busy tags. If a device also ended
      up using softirq completions, we'd reuse ->queuelist for the
      IPI handling while block tagging was still using it. Boom.
      
      Fix this by adding a new ipi_list list head, and share the
      memory used with the request hash table. The hash table is
      never used after the request is moved to the dispatch list,
      which happens long before any potential completion of the
      request. Add a new request bit for this, so we don't have
      cases that check rq->hash while it could potentially have
      been reused for the IPI completion.
      Reported-by: Martin K. Petersen <martin.petersen@oracle.com>
      Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      360f92c2
    • block: add kblockd_schedule_delayed_work_on() · 8ab14595
      Authored by Jens Axboe
      Same function as kblockd_schedule_delayed_work(), but allow the
      caller to pass in a CPU that the work should be executed on. This
      just directly extends and maps into the workqueue API, and will
      be used to make the blk-mq mappings more strict.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      8ab14595
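
      Usage is a thin wrapper over queue_delayed_work_on(); a hedged example
      with an arbitrary CPU and delay:

          static struct delayed_work my_dwork;   /* INIT_DELAYED_WORK()'d elsewhere */

          /* run the work on CPU 2, ~10ms from now, on the kblockd workqueue */
          kblockd_schedule_delayed_work_on(2, &my_dwork, msecs_to_jiffies(10));
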
    • block: remove 'q' parameter from kblockd_schedule_*_work() · 59c3d45e
      Authored by Jens Axboe
      The queue parameter is never used, just get rid of it.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      59c3d45e
  12. 02 Apr 2014, 1 commit
  13. 25 Feb 2014, 1 commit
  14. 11 Feb 2014, 1 commit
    • blk-mq: rework flush sequencing logic · 18741986
      Authored by Christoph Hellwig
      Switch to using a preallocated flush_rq for blk-mq, similar to what's done
      with the old request path.  This allows us to set up the request properly
      with a tag from the actually allowed range and ->rq_disk as needed by
      some drivers.  To make life easier we also switch to dynamic allocation
      of ->flush_rq for the old path.
      
      This effectively reverts most of
      
          "blk-mq: fix for flush deadlock"
      
      and
      
          "blk-mq: Don't reserve a tag for flush request"
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      18741986
  15. 31 Jan 2014, 1 commit
  16. 09 Jan 2014, 1 commit
    • bcache/md: Use raid stripe size · c78afc62
      Authored by Kent Overstreet
      Now that we've got code for raid5/6 stripe awareness, bcache just needs
      to know about the stripes and when writing partial stripes is expensive
      - we probably don't want to enable this optimization for raid1 or 10,
      even though they have stripes. So add a flag to queue_limits.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      c78afc62
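
      A hedged sketch of both sides (the queue_limits field name is the one
      added by this commit; the bcache-side helper is hypothetical):

          /* raid5/6 side: publish the full-stripe size and mark partial
           * stripe writes as expensive */
          blk_queue_io_opt(q, stripe_size_bytes);
          q->limits.raid_partial_stripes_expensive = 1;

          /* bcache side: only enable the partial-stripe optimization when
           * the backing device asked for it */
          if (q->limits.raid_partial_stripes_expensive)
                  my_enable_partial_stripe_writeback(dc);   /* hypothetical */
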
  17. 24 Nov 2013, 2 commits
    • block: Immutable bio vecs · 4550dd6c
      Authored by Kent Overstreet
      This adds a mechanism by which we can advance a bio by an arbitrary
      number of bytes without modifying the biovec: bio->bi_iter.bi_bvec_done
      indicates the number of bytes completed in the current bvec.
      
      Various driver code still needs to be updated to not refer to the bvec
      directly before we can use this for interesting things, like efficient
      bio splitting.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Cc: drbd-user@lists.linbit.com
      Cc: nbd-general@lists.sourceforge.net
      4550dd6c
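
      A sketch of what the new field enables (the helper and byte count are
      illustrative):

          /* complete 'bytes' of IO without touching bio->bi_io_vec[]:
           * bi_iter.bi_sector and bi_iter.bi_size move forward, while
           * bi_iter.bi_bvec_done records how far into the current bvec
           * the bio now is */
          static void my_partial_advance(struct bio *bio, unsigned int bytes)
          {
                  bio_advance(bio, bytes);
          }
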
    • block: Convert bio_for_each_segment() to bvec_iter · 7988613b
      Authored by Kent Overstreet
      More prep work for immutable biovecs - with immutable bvecs drivers
      won't be able to use the biovec directly, they'll need to use helpers
      that take into account bio->bi_iter.bi_bvec_done.
      
      This updates callers for the new usage without changing the
      implementation yet.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Paul Clements <Paul.Clements@steeleye.com>
      Cc: Jim Paris <jim@jtan.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com>
      Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
      Cc: support@lsi.com
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: drbd-user@lists.linbit.com
      Cc: nbd-general@lists.sourceforge.net
      Cc: cbe-oss-dev@lists.ozlabs.org
      Cc: xen-devel@lists.xensource.com
      Cc: virtualization@lists.linux-foundation.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: DL-MPTFusionLinux@lsi.com
      Cc: linux-scsi@vger.kernel.org
      Cc: devel@driverdev.osuosl.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: cluster-devel@redhat.com
      Cc: linux-mm@kvack.org
      Acked-by: Geoff Levand <geoff@infradead.org>
      7988613b
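
      The shape of the conversion for a typical driver loop (a sketch; the
      byte-counting body is illustrative):

          /* before: index-based; bvec points into bio->bi_io_vec[] */
          static unsigned int count_bytes_old(struct bio *bio)
          {
                  struct bio_vec *bvec;
                  unsigned int bytes = 0;
                  int i;

                  bio_for_each_segment(bvec, bio, i)
                          bytes += bvec->bv_len;
                  return bytes;
          }

          /* after: iterator-based; bvec is a value built from bio->bi_iter,
           * so bi_iter.bi_bvec_done is honoured automatically */
          static unsigned int count_bytes_new(struct bio *bio)
          {
                  struct bio_vec bvec;
                  struct bvec_iter iter;
                  unsigned int bytes = 0;

                  bio_for_each_segment(bvec, bio, iter)
                          bytes += bvec.bv_len;
                  return bytes;
          }
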
  18. 20 Nov 2013, 1 commit
  19. 25 Oct 2013, 3 commits
    • blk-mq: new multi-queue block IO queueing mechanism · 320ae51f
      Authored by Jens Axboe
      Linux currently has two models for block devices:
      
      - The classic request_fn based approach, where drivers use struct
        request units for IO. The block layer provides various helper
        functionalities to let drivers share code, things like tag
        management, timeout handling, queueing, etc.
      
      - The "stacked" approach, where a driver squeezes in between the
        block layer and IO submitter. Since this bypasses the IO stack,
        drivers generally have to manage everything themselves.
      
      With drivers being written for new high IOPS devices, the classic
      request_fn based driver doesn't work well enough. The design dates
      back to when both SMP and high IOPS were rare. It has problems with
      scaling to bigger machines, and runs into scaling issues even on
      smaller machines when you have IOPS in the hundreds of thousands
      per device.
      
      The stacked approach is then most often selected as the model
      for the driver. But this means that everybody has to re-invent
      everything, and along with that we get all the problems again
      that the shared approach solved.
      
      This commit introduces blk-mq, block multi queue support. The
      design is centered around per-cpu queues for queueing IO, which
      then funnel down into some number of hardware submission queues.
      We might have a 1:1 mapping between the two, or it might be
      an N:M mapping. That all depends on what the hardware supports.
      
      blk-mq provides various helper functions, which include:
      
      - Scalable support for request tagging. Most devices need to
        be able to uniquely identify a request both in the driver and
        to the hardware. The tagging uses per-cpu caches for freed
        tags, to enable cache hot reuse.
      
      - Timeout handling without tracking requests on a per-device
        basis. Basically, the driver should be able to get a notification
        if a request happens to fail.
      
      - Optional support for non 1:1 mappings between issue and
        submission queues. blk-mq can redirect IO completions to the
        desired location.
      
      - Support for per-request payloads. Drivers almost always need
        to associate a request structure with some driver private
        command structure. Drivers can tell blk-mq this at init time,
        and then any request handed to the driver will have the
        required size of memory associated with it.
      
      - Support for merging of IO, and plugging. The stacked model
        gets neither of these. Even for high IOPS devices, merging
        sequential IO reduces per-command overhead and thus
        increases bandwidth.
      
      For now, this is provided as a potential 3rd queueing model, with
      the hope being that, as it matures, it can replace both the classic
      and stacked model. That would get us back to having just 1 real
      model for block devices, leaving the stacked approach to dm/md
      devices (as it was originally intended).
      
      Contributions in this patch from the following people:
      
      Shaohua Li <shli@fusionio.com>
      Alexander Gordeev <agordeev@redhat.com>
      Christoph Hellwig <hch@infradead.org>
      Mike Christie <michaelc@cs.wisc.edu>
      Matias Bjorling <m@bjorling.me>
      Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      320ae51f
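
      A hedged skeleton of the driver-facing shape, shown in the form the
      interface settled into shortly after this merge (blk_mq_tag_set,
      blk_mq_alloc_tag_set); the very first version registered through a
      blk_mq_reg structure and some callbacks changed signature later, so
      take the exact names as illustrative:

          /* per-request driver payload, allocated next to the request
           * because of .cmd_size below */
          struct my_cmd {
                  int result;
          };

          static int my_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq)
          {
                  struct my_cmd *cmd = blk_mq_rq_to_pdu(rq);

                  cmd->result = 0;
                  /* hand rq to the hardware; complete it later from the
                   * IRQ/completion path */
                  return BLK_MQ_RQ_QUEUE_OK;
          }

          static struct blk_mq_ops my_mq_ops = {
                  .queue_rq  = my_queue_rq,
                  .map_queue = blk_mq_map_queue,   /* default ctx -> hctx map */
          };

          static struct blk_mq_tag_set my_tag_set = {
                  .ops          = &my_mq_ops,
                  .nr_hw_queues = 1,               /* 1:1 or N:M, per hardware */
                  .queue_depth  = 64,
                  .cmd_size     = sizeof(struct my_cmd),
                  .flags        = BLK_MQ_F_SHOULD_MERGE,
          };

          /* in the driver's probe path: */
          struct request_queue *q;

          if (blk_mq_alloc_tag_set(&my_tag_set))
                  return -ENOMEM;
          q = blk_mq_init_queue(&my_tag_set);
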
    • block: remove request ref_count · 71fe07d0
      Authored by Christoph Hellwig
      This reference count has been around since before git history, but the only
      place where it's used is in blk_execute_rq, and there it is entirely useless
      as it is incremented before submitting the request and decremented in the
      end_io handler before waking up the submitter thread.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      71fe07d0
    • block: make rq->cmd_flags be 64-bit · 5953316d
      Authored by Jens Axboe
      We have officially run out of flags in a 32-bit space. Extend it
      to 64-bit even on 32-bit archs.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5953316d
  20. 22 Sep 2013, 1 commit
  21. 07 May 2013, 1 commit
  22. 24 Apr 2013, 1 commit
    • block: fix max discard sectors limit · 871dd928
      Authored by James Bottomley
      linux v3.8-rc1 and later add plugging support for blkdev_issue_discard
      via commit 0cfbcafc
      ("block: add plug for blkdev_issue_discard"), so consecutive discard
      requests can now be merged.
      
      For example:
      1) DISCARD rq-1 with size 4GB
      2) DISCARD rq-2 with size 1GB

      If these two discard requests get merged, the final request size will
      be 5GB.

      In this case, the request's __data_len field may overflow, since it can
      hold at most 4GB (it is an unsigned int).
      
      This issue was observed while doing mkfs.f2fs on 5GB SD card:
      https://lkml.org/lkml/2013/4/1/292
      
      Info: sector size = 512
      Info: total sectors = 11370496 (in 512bytes)
      Info: zone aligned segment0 blkaddr: 512
      [  257.789764] blk_update_request: bio idx 0 >= vcnt 0
      
      mkfs process gets stuck in D state and I see the following in the dmesg:
      
      [  257.789733] __end_that: dev mmcblk0: type=1, flags=122c8081
      [  257.789764]   sector 4194304, nr/cnr 2981888/4294959104
      [  257.789764]   bio df3840c0, biotail df3848c0, buffer   (null), len
      1526726656
      [  257.789764] blk_update_request: bio idx 0 >= vcnt 0
      [  257.794921] request botched: dev mmcblk0: type=1, flags=122c8081
      [  257.794921]   sector 4194304, nr/cnr 2981888/4294959104
      [  257.794921]   bio df3840c0, biotail df3848c0, buffer   (null), len
      1526726656
      
      This patch fixes this issue.
      Reported-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Tested-by: Max Filippov <jcmvbkbc@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      871dd928
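
      The overflow itself is plain 32-bit truncation; a worked example with
      the 4GB + 1GB requests from the description (the dmesg above shows a
      slightly different length because the card is not exactly 5GB):

          u64 merged   = 4ULL * SZ_1G + 1ULL * SZ_1G;   /* 5368709120 bytes */
          u32 data_len = merged;                        /* wraps to 1073741824 (1GB) */

      The fix caps how large a single discard request may grow, so the byte
      count always fits in the 32-bit __data_len field.
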
  23. 23 Mar 2013, 1 commit
    • block: add runtime pm helpers · 6c954667
      Authored by Lin Ming
      Add runtime pm helper functions:
      
      void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
        - Initialization function for drivers to call.
      
      int blk_pre_runtime_suspend(struct request_queue *q)
        - If any requests are in the queue, mark last busy and return -EBUSY.
          Otherwise set q->rpm_status to RPM_SUSPENDING and return 0.
      
      void blk_post_runtime_suspend(struct request_queue *q, int err)
        - If the suspend succeeded then set q->rpm_status to RPM_SUSPENDED.
          Otherwise set it to RPM_ACTIVE and mark last busy.
      
      void blk_pre_runtime_resume(struct request_queue *q)
        - Set q->rpm_status to RPM_RESUMING.
      
      void blk_post_runtime_resume(struct request_queue *q, int err)
        - If the resume succeeded then set q->rpm_status to RPM_ACTIVE
          and call __blk_run_queue, then mark last busy and autosuspend.
          Otherwise set q->rpm_status to RPM_SUSPENDED.
      
      The idea and API were designed by Alan Stern and are described here:
      http://marc.info/?l=linux-scsi&m=133727953625963&w=2
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6c954667
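
      A hedged sketch of a driver wiring these helpers into its runtime PM
      callbacks (my_dev_to_queue(), my_hw_suspend() and my_hw_resume() are
      hypothetical; error paths are trimmed):

          static int my_runtime_suspend(struct device *dev)
          {
                  struct request_queue *q = my_dev_to_queue(dev);
                  int err;

                  err = blk_pre_runtime_suspend(q);
                  if (err)                    /* -EBUSY: requests still queued */
                          return err;
                  err = my_hw_suspend(dev);
                  blk_post_runtime_suspend(q, err);
                  return err;
          }

          static int my_runtime_resume(struct device *dev)
          {
                  struct request_queue *q = my_dev_to_queue(dev);
                  int err;

                  blk_pre_runtime_resume(q);
                  err = my_hw_resume(dev);
                  blk_post_runtime_resume(q, err);
                  return err;
          }

          /* at probe time, once the queue exists: */
          blk_pm_runtime_init(q, dev);
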
  24. 11 Jan 2013, 1 commit
  25. 10 Jan 2013, 1 commit
  26. 19 Dec 2012, 1 commit
    • blk: avoid divide-by-zero with zero discard granularity · 59771079
      Authored by Linus Torvalds
      Commit 8dd2cb7e ("block: discard granularity might not be power of
      2") changed a couple of 'binary and' operations into modulus operations.
      Which turned the harmless case of a zero discard_granularity into a
      possible divide-by-zero.
      
      The code also had a much more subtle bug: it was doing the modulus of a
      value in bytes using 'sector_t'.  That was always conceptually wrong,
      but didn't actually matter back when the code assumed a power-of-two
      granularity: we only looked at the low bits anyway.
      
      But with potentially arbitrary sector numbers, using a 'sector_t' to
      express bytes is very very wrong: depending on configuration it limits
      the starting offset of the device to just 32 bits, and any overflow
      would result in a wrong value if the modulus wasn't a power-of-two.
      
      So re-write the code to not only protect against the divide-by-zero, but
      to do the starting sector arithmetic in sectors, and using the proper
      types.
      
      [ For any mathematicians out there: it also looks monumentally stupid to
        do the 'modulo granularity' operation *twice*, never mind having a "+
        granularity" in the second modulus op.
      
        But that's the easiest way to avoid negative values or overflow, and
        it is how the original code was done. ]
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Reported-by: Doug Anderson <dianders@chromium.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Shaohua Li <shli@fusionio.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      59771079
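
      An illustrative, simplified version of the guarded, sector-based
      arithmetic (the real helper is queue_limit_discard_alignment(), which
      additionally folds in discard_alignment):

          static unsigned int discard_misalignment(struct queue_limits *lim,
                                                   sector_t sector)
          {
                  unsigned int granularity;

                  if (!lim->discard_granularity)
                          return 0;               /* avoid divide-by-zero */

                  /* work in sectors, not bytes, so large offsets cannot
                   * overflow a 32-bit intermediate */
                  granularity = max(lim->discard_granularity >> 9, 1U);
                  return sector_div(sector, granularity) << 9;
          }
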
  27. 15 Dec 2012, 1 commit
  28. 06 Dec 2012, 2 commits
    • block: Make blk_cleanup_queue() wait until request_fn finished · 24faf6f6
      Authored by Bart Van Assche
      Some request_fn implementations, e.g. scsi_request_fn(), unlock
      the queue lock internally. This may result in multiple threads
      executing request_fn for the same queue simultaneously. Keep
      track of the number of active request_fn calls and make sure that
      blk_cleanup_queue() waits until all active request_fn invocations
      have finished. A block driver may start cleaning up resources
      needed by its request_fn as soon as blk_cleanup_queue() has finished,
      so blk_cleanup_queue() must wait for all outstanding request_fn
      invocations to finish.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reported-by: Chanho Min <chanho.min@lge.com>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      24faf6f6
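
      The accounting pattern described above, in illustrative form (the
      counter added by this commit is q->request_fn_active; the names of the
      surrounding helpers may differ between kernel versions):

          /* around every invocation of the driver's request_fn: */
          q->request_fn_active++;
          q->request_fn(q);            /* may drop and re-take queue_lock */
          q->request_fn_active--;

          /* blk_cleanup_queue() then drains the queue, looping until
           * request_fn_active (among the other in-flight counters) has
           * dropped to zero before returning to the driver */
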
    • block: Avoid that request_fn is invoked on a dead queue · c246e80d
      Authored by Bart Van Assche
      A block driver may start cleaning up resources needed by its
      request_fn as soon as blk_cleanup_queue() has finished, so request_fn
      must not be invoked after draining has finished. This is important
      when blk_run_queue() is invoked without any requests in progress.
      As an example, if blk_drain_queue() and scsi_run_queue() run in
      parallel, blk_drain_queue() may have finished all requests after
      scsi_run_queue() has taken a SCSI device off the starved list but
      before that last function has had a chance to run the queue.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Chanho Min <chanho.min@lge.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c246e80d