1. 03 11月, 2016 6 次提交
  2. 01 11月, 2016 3 次提交
  3. 28 10月, 2016 6 次提交
  4. 25 10月, 2016 1 次提交
  5. 19 10月, 2016 6 次提交
  6. 12 10月, 2016 2 次提交
  7. 11 10月, 2016 1 次提交
    • E
      latent_entropy: Mark functions with __latent_entropy · 0766f788
      Emese Revfy 提交于
      The __latent_entropy gcc attribute can be used only on functions and
      variables.  If it is on a function then the plugin will instrument it for
      gathering control-flow entropy. If the attribute is on a variable then
      the plugin will initialize it with random contents.  The variable must
      be an integer, an integer array type or a structure with integer fields.
      
      These specific functions have been selected because they are init
      functions (to help gather boot-time entropy), are called at unpredictable
      times, or they have variable loops, each of which provide some level of
      latent entropy.
      Signed-off-by: NEmese Revfy <re.emese@gmail.com>
      [kees: expanded commit message]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      0766f788
  8. 30 9月, 2016 1 次提交
  9. 24 9月, 2016 2 次提交
    • C
      blk-mq: skip unmapped queues in blk_mq_alloc_request_hctx · c8712c6a
      Christoph Hellwig 提交于
      This provides the caller a feedback that a given hctx is not mapped and thus
      no command can be sent on it.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c8712c6a
    • G
      cfq: fix starvation of asynchronous writes · 3932a86b
      Glauber Costa 提交于
      While debugging timeouts happening in my application workload (ScyllaDB), I have
      observed calls to open() taking a long time, ranging everywhere from 2 seconds -
      the first ones that are enough to time out my application - to more than 30
      seconds.
      
      The problem seems to happen because XFS may block on pending metadata updates
      under certain circumnstances, and that's confirmed with the following backtrace
      taken by the offcputime tool (iovisor/bcc):
      
          ffffffffb90c57b1 finish_task_switch
          ffffffffb97dffb5 schedule
          ffffffffb97e310c schedule_timeout
          ffffffffb97e1f12 __down
          ffffffffb90ea821 down
          ffffffffc046a9dc xfs_buf_lock
          ffffffffc046abfb _xfs_buf_find
          ffffffffc046ae4a xfs_buf_get_map
          ffffffffc046babd xfs_buf_read_map
          ffffffffc0499931 xfs_trans_read_buf_map
          ffffffffc044a561 xfs_da_read_buf
          ffffffffc0451390 xfs_dir3_leaf_read.constprop.16
          ffffffffc0452b90 xfs_dir2_leaf_lookup_int
          ffffffffc0452e0f xfs_dir2_leaf_lookup
          ffffffffc044d9d3 xfs_dir_lookup
          ffffffffc047d1d9 xfs_lookup
          ffffffffc0479e53 xfs_vn_lookup
          ffffffffb925347a path_openat
          ffffffffb9254a71 do_filp_open
          ffffffffb9242a94 do_sys_open
          ffffffffb9242b9e sys_open
          ffffffffb97e42b2 entry_SYSCALL_64_fastpath
          00007fb0698162ed [unknown]
      
      Inspecting my run with blktrace, I can see that the xfsaild kthread exhibit very
      high "Dispatch wait" times, on the dozens of seconds range and consistent with
      the open() times I have saw in that run.
      
      Still from the blktrace output, we can after searching a bit, identify the
      request that wasn't dispatched:
      
        8,0   11      152    81.092472813   804  A  WM 141698288 + 8 <- (8,1) 141696240
        8,0   11      153    81.092472889   804  Q  WM 141698288 + 8 [xfsaild/sda1]
        8,0   11      154    81.092473207   804  G  WM 141698288 + 8 [xfsaild/sda1]
        8,0   11      206    81.092496118   804  I  WM 141698288 + 8 (   22911) [xfsaild/sda1]
        <==== 'I' means Inserted (into the IO scheduler) ===================================>
        8,0    0   289372    96.718761435     0  D  WM 141698288 + 8 (15626265317) [swapper/0]
        <==== Only 15s later the CFQ scheduler dispatches the request ======================>
      
      As we can see above, in this particular example CFQ took 15 seconds to dispatch
      this request. Going back to the full trace, we can see that the xfsaild queue
      had plenty of opportunity to run, and it was selected as the active queue many
      times. It would just always be preempted by something else (example):
      
        8,0    1        0    81.117912979     0  m   N cfq1618SN / insert_request
        8,0    1        0    81.117913419     0  m   N cfq1618SN / add_to_rr
        8,0    1        0    81.117914044     0  m   N cfq1618SN / preempt
        8,0    1        0    81.117914398     0  m   N cfq767A  / slice expired t=1
        8,0    1        0    81.117914755     0  m   N cfq767A  / resid=40
        8,0    1        0    81.117915340     0  m   N / served: vt=1948520448 min_vt=1948520448
        8,0    1        0    81.117915858     0  m   N cfq767A  / sl_used=1 disp=0 charge=0 iops=1 sect=0
      
      where cfq767 is the xfsaild queue and cfq1618 corresponds to one of the ScyllaDB
      IO dispatchers.
      
      The requests preempting the xfsaild queue are synchronous requests. That's a
      characteristic of ScyllaDB workloads, as we only ever issue O_DIRECT requests.
      While it can be argued that preempting ASYNC requests in favor of SYNC is part
      of the CFQ logic, I don't believe that doing so for 15+ seconds is anyone's
      goal.
      
      Moreover, unless I am misunderstanding something, that breaks the expectation
      set by the "fifo_expire_async" tunable, which in my system is set to the
      default.
      
      Looking at the code, it seems to me that the issue is that after we make
      an async queue active, there is no guarantee that it will execute any request.
      
      When the queue itself tests if it cfq_may_dispatch() it can bail if it sees SYNC
      requests in flight. An incoming request from another queue can also preempt it
      in such situation before we have the chance to execute anything (as seen in the
      trace above).
      
      This patch sets the must_dispatch flag if we notice that we have requests
      that are already fifo_expired. This flag is always cleared after
      cfq_dispatch_request() returns from cfq_dispatch_requests(), so it won't pin
      the queue for subsequent requests (unless they are themselves expired)
      
      Care is taken during preempt to still allow rt requests to preempt us
      regardless.
      
      Testing my workload with this patch applied produces much better results.
      From the application side I see no timeouts, and the open() latency histogram
      generated by systemtap looks much better, with the worst outlier at 131ms:
      
      Latency histogram of xfs_buf_lock acquisition (microseconds):
       value |-------------------------------------------------- count
           0 |                                                     11
           1 |@@@@                                                161
           2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1966
           4 |@                                                    54
           8 |                                                     36
          16 |                                                      7
          32 |                                                      0
          64 |                                                      0
             ~
        1024 |                                                      0
        2048 |                                                      0
        4096 |                                                      1
        8192 |                                                      1
       16384 |                                                      2
       32768 |                                                      0
       65536 |                                                      0
      131072 |                                                      1
      262144 |                                                      0
      524288 |                                                      0
      Signed-off-by: NGlauber Costa <glauber@scylladb.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: linux-block@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: NGlauber Costa <glauber@scylladb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      3932a86b
  10. 23 9月, 2016 3 次提交
  11. 22 9月, 2016 4 次提交
  12. 21 9月, 2016 1 次提交
  13. 20 9月, 2016 2 次提交
    • V
      blk-throttle: Extend slice if throttle group is not empty · 164c80ed
      Vivek Goyal 提交于
      Right now, if slice is expired, we start a new slice. If a bio is
      queued, we keep on extending slice by throtle_slice interval (100ms).
      
      This worked well as long as pending timer function got executed with-in
      few milli seconds of scheduled time. But looks like with recent changes
      in timer subsystem, slack can be much longer depending on the expiry time
      of the scheduled timer.
      
      commit 500462a9 ("timers: Switch to a non-cascading wheel")
      
      This means, by the time timer function gets executed, it is possible the
      delay from scheduled time is more than 100ms. That means current code
      will conclude that existing slice has expired and a new one needs to
      be started. New slice will be 100ms by default and that will not be
      sufficient to meet rate requirement of group given the bio size and
      bio will not be dispatched and we will start a new timer function to
      wait. And when that timer expires, same process will repeat and we
      will wait again and this can easily be an infinite loop.
      
      Solve this issue by starting a new slice only if throttle gropup is
      empty. If it is not empty, that means there should be an active slice
      going on. Ideally it should not be expired but given the slack, it is
      possible that it has expired.
      Reported-by: NHou Tao <houtao1@huawei.com>
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      164c80ed
    • S
      block/softirq: Convert to hotplug state machine · 9a659f43
      Sebastian Andrzej Siewior 提交于
      Install the callbacks via the state machine.
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160906170457.32393-9-bigeasy@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      9a659f43
  14. 19 9月, 2016 1 次提交
  15. 17 9月, 2016 1 次提交
    • O
      sbitmap: randomize initial alloc_hint values · 98d95416
      Omar Sandoval 提交于
      In order to get good cache behavior from a sbitmap, we want each CPU to
      stick to its own cacheline(s) as much as possible. This might happen
      naturally as the bitmap gets filled up and the alloc_hint values spread
      out, but we really want this behavior from the start. blk-mq apparently
      intended to do this, but the code to do this was never wired up. Get rid
      of the dead code and make it part of the sbitmap library.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      98d95416