1. 21 6月, 2017 8 次提交
  2. 20 6月, 2017 1 次提交
    • G
      block: return on congested block device · 03a07c92
      Goldwyn Rodrigues 提交于
      A new bio operation flag REQ_NOWAIT is introduced to identify bio's
      orignating from iocb with IOCB_NOWAIT. This flag indicates
      to return immediately if a request cannot be made instead
      of retrying.
      
      Stacked devices such as md (the ones with make_request_fn hooks)
      currently are not supported because it may block for housekeeping.
      For example, an md can have a part of the device suspended.
      For this reason, only request based devices are supported.
      In the future, this feature will be expanded to stacked devices
      by teaching them how to handle the REQ_NOWAIT flags.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      03a07c92
  3. 19 6月, 2017 23 次提交
  4. 16 6月, 2017 1 次提交
  5. 13 6月, 2017 1 次提交
  6. 09 6月, 2017 3 次提交
    • C
      block: switch bios to blk_status_t · 4e4cbee9
      Christoph Hellwig 提交于
      Replace bi_error with a new bi_status to allow for a clear conversion.
      Note that device mapper overloaded bi_error with a private value, which
      we'll have to keep arround at least for now and thus propagate to a
      proper blk_status_t value.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4e4cbee9
    • C
      blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653
      Christoph Hellwig 提交于
      Use the same values for use for request completion errors as the return
      value from ->queue_rq.  BLK_STS_RESOURCE is special cased to cause
      a requeue, and all the others are completed as-is.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      fc17b653
    • C
      block: introduce new block status code type · 2a842aca
      Christoph Hellwig 提交于
      Currently we use nornal Linux errno values in the block layer, and while
      we accept any error a few have overloaded magic meanings.  This patch
      instead introduces a new  blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run - those drivers that have a
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return somethings that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low hanging
      fruite to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      2a842aca
  7. 08 6月, 2017 1 次提交
    • P
      block, bfq: access and cache blkg data only when safe · 8f9bebc3
      Paolo Valente 提交于
      In blk-cgroup, operations on blkg objects are protected with the
      request_queue lock. This is no more the lock that protects
      I/O-scheduler operations in blk-mq. In fact, the latter are now
      protected with a finer-grained per-scheduler-instance lock. As a
      consequence, although blkg lookups are also rcu-protected, blk-mq I/O
      schedulers may see inconsistent data when they access blkg and
      blkg-related objects. BFQ does access these objects, and does incur
      this problem, in the following case.
      
      The blkg_lookup performed in bfq_get_queue, being protected (only)
      through rcu, may happen to return the address of a copy of the
      original blkg. If this is the case, then the blkg_get performed in
      bfq_get_queue, to pin down the blkg, is useless: it does not prevent
      blk-cgroup code from destroying both the original blkg and all objects
      directly or indirectly referred by the copy of the blkg. BFQ accesses
      these objects, which typically causes a crash for NULL-pointer
      dereference of memory-protection violation.
      
      Some additional protection mechanism should be added to blk-cgroup to
      address this issue. In the meantime, this commit provides a quick
      temporary fix for BFQ: cache (when safe) blkg data that might
      disappear right after a blkg_lookup.
      
      In particular, this commit exploits the following facts to achieve its
      goal without introducing further locks.  Destroy operations on a blkg
      invoke, as a first step, hooks of the scheduler associated with the
      blkg. And these hooks are executed with bfqd->lock held for BFQ. As a
      consequence, for any blkg associated with the request queue an
      instance of BFQ is attached to, we are guaranteed that such a blkg is
      not destroyed, and that all the pointers it contains are consistent,
      while that instance is holding its bfqd->lock. A blkg_lookup performed
      with bfqd->lock held then returns a fully consistent blkg, which
      remains consistent until this lock is held. In more detail, this holds
      even if the returned blkg is a copy of the original one.
      
      Finally, also the object describing a group inside BFQ needs to be
      protected from destruction on the blkg_free of the original blkg
      (which invokes bfq_pd_free). This commit adds private refcounting for
      this object, to let it disappear only after no bfq_queue refers to it
      any longer.
      
      This commit also removes or updates some stale comments on locking
      issues related to blk-cgroup operations.
      Reported-by: NTomas Konir <tomas.konir@gmail.com>
      Reported-by: NLee Tibbert <lee.tibbert@gmail.com>
      Reported-by: NMarco Piazza <mpiazza@gmail.com>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Tested-by: NTomas Konir <tomas.konir@gmail.com>
      Tested-by: NLee Tibbert <lee.tibbert@gmail.com>
      Tested-by: NMarco Piazza <mpiazza@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      8f9bebc3
  8. 07 6月, 2017 2 次提交
    • S
      blk-throttle: set default latency baseline for harddisk · 6679a90c
      Shaohua Li 提交于
      hard disk IO latency varies a lot depending on spindle move. The latency
      range could be from several microseconds to several milliseconds. It's
      pretty hard to get the baseline latency used by io.low.
      
      We will use a different stragety here. The idea is only using IO with
      spindle move to determine if cgroup IO is in good state. For HD, if io
      latency is small (< 1ms), we ignore the IO. Such IO is likely from
      sequential IO, and is helpless to help determine if a cgroup's IO is
      impacted by other cgroups. With this, we only account IO with big
      latency. Then we can choose a hardcoded baseline latency for HD (4ms,
      which is typical IO latency with seek).  With all these settings, the
      io.low latency works for both HD and SSD.
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      6679a90c
    • J
      blk-throttle: fix NULL pointer dereference in throtl_schedule_pending_timer · a41b816c
      Joseph Qi 提交于
      I have encountered a NULL pointer dereference in
      throtl_schedule_pending_timer:
        [  413.735396] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
        [  413.735535] IP: [<ffffffff812ebbbf>] throtl_schedule_pending_timer+0x3f/0x210
        [  413.735643] PGD 22c8cf067 PUD 22cb34067 PMD 0
        [  413.735713] Oops: 0000 [#1] SMP
        ......
      
      This is caused by the following case:
        blk_throtl_bio
          throtl_schedule_next_dispatch  <= sq is top level one without parent
            throtl_schedule_pending_timer
              sq_to_tg(sq)->td->throtl_slice  <= sq_to_tg(sq) returns NULL
      
      Fix it by using sq_to_td instead of sq_to_tg(sq)->td, which will always
      return a valid td.
      
      Fixes: 297e3d85 ("blk-throttle: make throtl_slice tunable")
      Signed-off-by: NJoseph Qi <qijiang.qj@alibaba-inc.com>
      Reviewed-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      a41b816c