1. 10 Jan 2018, 13 commits
    • sched/cpufreq: Change the worker kthread to SCHED_DEADLINE · 794a56eb
      Juri Lelli committed
      The worker kthread needs to be able to change the frequency for all
      other threads.
      
      Make it special, just under STOP class.
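      For illustration, a minimal sketch of how the governor worker can be
      promoted at creation time, assuming the internal SCHED_FLAG_SUGOV flag
      and the sched_setattr_nocheck() helper used for this purpose (field
      values are illustrative, not the exact patch):

          struct sched_attr attr = {
                  .size         = sizeof(struct sched_attr),
                  .sched_policy = SCHED_DEADLINE,
                  .sched_flags  = SCHED_FLAG_SUGOV, /* special DL entity, no bandwidth accounting */
                  .sched_runtime  = 0,
                  .sched_deadline = 0,
                  .sched_period   = 0,
          };

          /* Promote the worker so it can preempt every class below STOP. */
          ret = sched_setattr_nocheck(thread, &attr);
          if (ret)
                  pr_warn("%s: failed to set SCHED_DEADLINE\n", __func__);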
      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Claudio Scordino <claudio@evidence.eu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@santannapisa.it>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: alessio.balsini@arm.com
      Cc: bristot@redhat.com
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: juri.lelli@redhat.com
      Cc: mathieu.poirier@linaro.org
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: rjw@rjwysocki.net
      Cc: rostedt@goodmis.org
      Cc: tkjos@android.com
      Cc: tommaso.cucinotta@santannapisa.it
      Cc: vincent.guittot@linaro.org
      Link: http://lkml.kernel.org/r/20171204102325.5110-4-juri.lelli@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      794a56eb
    • sched/deadline: Move CPU frequency selection triggering points · e0367b12
      Juri Lelli committed
      Since SCHED_DEADLINE doesn't track a utilization signal (but reserves a
      fraction of CPU bandwidth for tasks admitted to the system), there is no
      point in evaluating frequency changes during each tick event.
      
      Move frequency selection triggering points to where running_bw changes.
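      The idea, as a hedged sketch: kick cpufreq from the running_bw accounting
      helpers instead of from the tick (helper and flag names follow the
      deadline code of that era; the real helpers also carry overflow checks):

          static inline void add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
          {
                  dl_rq->running_bw += dl_bw;
                  /* Re-evaluate the frequency whenever the running bandwidth grows. */
                  cpufreq_update_util(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
          }

          static inline void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
          {
                  dl_rq->running_bw -= dl_bw;
                  /* ... and whenever it shrinks, so the CPU may slow down again. */
                  cpufreq_update_util(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
          }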
      Co-authored-by: Claudio Scordino <claudio@evidence.eu.com>
      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@santannapisa.it>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: alessio.balsini@arm.com
      Cc: bristot@redhat.com
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: juri.lelli@redhat.com
      Cc: mathieu.poirier@linaro.org
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: rjw@rjwysocki.net
      Cc: rostedt@goodmis.org
      Cc: tkjos@android.com
      Cc: tommaso.cucinotta@santannapisa.it
      Cc: vincent.guittot@linaro.org
      Link: http://lkml.kernel.org/r/20171204102325.5110-3-juri.lelli@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e0367b12
    • sched/cpufreq: Use the DEADLINE utilization signal · d4edd662
      Juri Lelli committed
      SCHED_DEADLINE tracks the active utilization signal with a per-dl_rq
      variable named running_bw.
      
      Make use of that to drive CPU frequency selection: add up the FAIR and
      DEADLINE contributions to get the required CPU capacity to handle both
      requirements (while RT still selects the maximum frequency).
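      Roughly, running_bw is converted to capacity units and summed with the
      CFS utilization; a sketch under that assumption (helper names as in the
      schedutil/sched.h code of the time, details illustrative):

          static inline unsigned long cpu_util_dl(struct rq *rq)
          {
                  /* running_bw is a fixed-point fraction; scale it to capacity units. */
                  return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
          }

          static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
          {
                  struct rq *rq = cpu_rq(sg_cpu->cpu);

                  /* FAIR + DEADLINE demand, clamped to the CPU's capacity. */
                  return min(cpu_util_cfs(rq) + cpu_util_dl(rq), sg_cpu->max);
          }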
      Co-authored-by: Claudio Scordino <claudio@evidence.eu.com>
      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@santannapisa.it>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: alessio.balsini@arm.com
      Cc: bristot@redhat.com
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: juri.lelli@redhat.com
      Cc: mathieu.poirier@linaro.org
      Cc: morten.rasmussen@arm.com
      Cc: patrick.bellasi@arm.com
      Cc: rjw@rjwysocki.net
      Cc: rostedt@goodmis.org
      Cc: tkjos@android.com
      Cc: tommaso.cucinotta@santannapisa.it
      Cc: vincent.guittot@linaro.org
      Link: http://lkml.kernel.org/r/20171204102325.5110-2-juri.lelli@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d4edd662
    • sched/deadline: Implement "runtime overrun signal" support · 34be3930
      Juri Lelli committed
      This patch adds the possibility of having a SIGXCPU signal delivered
      whenever there is a runtime overrun. The request is done through
      the sched_flags field within the sched_attr structure.
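      From userspace the request could look roughly like the sketch below; the
      struct layout and the SCHED_FLAG_DL_OVERRUN value are assumptions taken
      from the uapi headers of this era, so check <linux/sched.h> on the
      target kernel:

          #define _GNU_SOURCE
          #include <signal.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <sys/syscall.h>
          #include <unistd.h>

          struct sched_attr {
                  uint32_t size, sched_policy;
                  uint64_t sched_flags;
                  int32_t  sched_nice;
                  uint32_t sched_priority;
                  uint64_t sched_runtime, sched_deadline, sched_period;
          };

          #ifndef SCHED_DEADLINE
          #define SCHED_DEADLINE          6
          #endif
          #ifndef SCHED_FLAG_DL_OVERRUN
          #define SCHED_FLAG_DL_OVERRUN   0x04    /* assumed value, see uapi headers */
          #endif

          static void on_overrun(int sig) { (void)sig; /* runtime overrun notification */ }

          int main(void)
          {
                  struct sched_attr attr = {
                          .size           = sizeof(attr),
                          .sched_policy   = SCHED_DEADLINE,
                          .sched_flags    = SCHED_FLAG_DL_OVERRUN,
                          .sched_runtime  =  2 * 1000 * 1000,     /*  2 ms */
                          .sched_deadline = 10 * 1000 * 1000,     /* 10 ms */
                          .sched_period   = 10 * 1000 * 1000,
                  };

                  signal(SIGXCPU, on_overrun);
                  if (syscall(SYS_sched_setattr, 0, &attr, 0))
                          perror("sched_setattr");
                  /* ... periodic deadline work; SIGXCPU arrives on a runtime overrun ... */
                  return 0;
          }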
      
      Forward port of https://lkml.org/lkml/2009/10/16/170
      Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
      Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
      Link: http://lkml.kernel.org/r/1513077024-25461-1-git-send-email-claudio@evidence.eu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      34be3930
    • sched/fair: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache · 7332dec0
      Mel Gorman committed
      If waking from an idle CPU due to an interrupt then it's possible that
      the waker task will be pulled to wake on the current CPU. Unfortunately,
      depending on the type of interrupt and IRQ configuration, there may not
      be a strong relationship between the CPU an interrupt was delivered on
      and the CPU a task was running on. For example, the interrupts could all
      be delivered to CPUs on one particular node due to the machine topology
      or IRQ affinity configuration. Another example is an interrupt for an IO
      completion which can be delivered to any CPU where there is no guarantee
      the data is either cache hot or even local.
      
      This patch was motivated by the observation that an IO workload was
      being pulled cross-node on a frequent basis when IO completed.  From a
      wakeup latency perspective, it's still useful to know that an idle CPU is
      immediately available for use but let's only consider an automatic migration
      if the CPUs share cache to limit damage due to NUMA migrations. Migrations
      may still occur if wake_affine_weight determines it's appropriate.
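      The gist of the change, as a hedged sketch of the idle-CPU fast path
      (the real wake_affine_idle() carries more context than shown here):

          static bool wake_affine_idle(int this_cpu, int prev_cpu, int sync)
          {
                  /*
                   * Only treat the interrupted, idle CPU as a wakeup target when
                   * it shares cache with prev_cpu; a cross-node pull can cost far
                   * more than the wakeup latency it saves.
                   */
                  if (idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
                          return true;

                  if (sync && cpu_rq(this_cpu)->nr_running == 1)
                          return true;

                  return false;
          }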
      
      These are the throughput results for dbench running on ext4 comparing
      4.15-rc3 and this patch on a 2-socket machine where interrupts due to IO
      completions can happen on any CPU.
      
                                4.15.0-rc3             4.15.0-rc3
                                   vanilla            lessmigrate
      Hmean     1        854.64 (   0.00%)      865.01 (   1.21%)
      Hmean     2       1229.60 (   0.00%)     1274.44 (   3.65%)
      Hmean     4       1591.81 (   0.00%)     1628.08 (   2.28%)
      Hmean     8       1845.04 (   0.00%)     1831.80 (  -0.72%)
      Hmean     16      2038.61 (   0.00%)     2091.44 (   2.59%)
      Hmean     32      2327.19 (   0.00%)     2430.29 (   4.43%)
      Hmean     64      2570.61 (   0.00%)     2568.54 (  -0.08%)
      Hmean     128     2481.89 (   0.00%)     2499.28 (   0.70%)
      Stddev    1         14.31 (   0.00%)        5.35 (  62.65%)
      Stddev    2         21.29 (   0.00%)       11.09 (  47.92%)
      Stddev    4          7.22 (   0.00%)        6.80 (   5.92%)
      Stddev    8         26.70 (   0.00%)        9.41 (  64.76%)
      Stddev    16        22.40 (   0.00%)       20.01 (  10.70%)
      Stddev    32        45.13 (   0.00%)       44.74 (   0.85%)
      Stddev    64        93.10 (   0.00%)       93.18 (  -0.09%)
      Stddev    128      184.28 (   0.00%)      177.85 (   3.49%)
      
      Note the small increase in throughput for low thread counts but also
      note that the standard deviation for each sample during the test run is
      lower. The throughput figures for dbench can be misleading so the benchmark
      is actually modified to time the latency of the processing of one load
      file with many samples taken. The difference in latency is
      
                                 4.15.0-rc3             4.15.0-rc3
                                    vanilla            lessmigrate
      Amean      1         21.71 (   0.00%)       21.47 (   1.08%)
      Amean      2         30.89 (   0.00%)       29.58 (   4.26%)
      Amean      4         47.54 (   0.00%)       46.61 (   1.97%)
      Amean      8         82.71 (   0.00%)       82.81 (  -0.12%)
      Amean      16       149.45 (   0.00%)      145.01 (   2.97%)
      Amean      32       265.49 (   0.00%)      248.43 (   6.42%)
      Amean      64       463.23 (   0.00%)      463.55 (  -0.07%)
      Amean      128      933.97 (   0.00%)      935.50 (  -0.16%)
      Stddev     1          1.58 (   0.00%)        1.54 (   2.26%)
      Stddev     2          2.84 (   0.00%)        2.95 (  -4.15%)
      Stddev     4          6.78 (   0.00%)        6.85 (  -0.99%)
      Stddev     8         16.85 (   0.00%)       16.37 (   2.85%)
      Stddev     16        41.59 (   0.00%)       41.04 (   1.32%)
      Stddev     32       111.05 (   0.00%)      105.11 (   5.35%)
      Stddev     64       285.94 (   0.00%)      288.01 (  -0.72%)
      Stddev     128      803.39 (   0.00%)      809.73 (  -0.79%)
      
      It's a small improvement which is not surprising given that migrations that
      migrate to a different node are not that common. However, it is noticeable
      in the CPU migration statistics which are reduced by 24%.
      
      There was a query for v1 of this patch about NAS so here are the results
      for C-class using MPI for parallelisation on the same machine
      
      nas-mpi
                            4.15.0-rc3             4.15.0-rc3
                               vanilla                  noirq
      Time cg.C       24.25 (   0.00%)       23.17 (   4.45%)
      Time ep.C        8.22 (   0.00%)        8.29 (  -0.85%)
      Time ft.C       22.67 (   0.00%)       20.34 (  10.28%)
      Time is.C        1.42 (   0.00%)        1.47 (  -3.52%)
      Time lu.C       55.62 (   0.00%)       54.81 (   1.46%)
      Time mg.C        7.93 (   0.00%)        7.91 (   0.25%)
      
                4.15.0-rc3  4.15.0-rc3
                   vanilla  noirq-v1r1
      User         3799.96     3748.34
      System        672.10      626.15
      Elapsed        91.91       79.49
      
      lu.C sees a small gain, ft.C a large gain and ep.C and is.C see small
      regressions but in terms of absolute time, the difference is small and
      likely within run-to-run variance. System CPU usage is slightly reduced.
      
      schbench from Facebook was also requested. This is a bit of a mixed bag but
      it's important to note that this workload should not be heavily impacted
      by wakeups from interrupt context.
      
                                       4.15.0-rc3             4.15.0-rc3
                                          vanilla             noirq-v1r1
      Lat 50.00th-qrtle-1        41.00 (   0.00%)       41.00 (   0.00%)
      Lat 75.00th-qrtle-1        42.00 (   0.00%)       42.00 (   0.00%)
      Lat 90.00th-qrtle-1        43.00 (   0.00%)       44.00 (  -2.33%)
      Lat 95.00th-qrtle-1        44.00 (   0.00%)       46.00 (  -4.55%)
      Lat 99.00th-qrtle-1        57.00 (   0.00%)       58.00 (  -1.75%)
      Lat 99.50th-qrtle-1        59.00 (   0.00%)       59.00 (   0.00%)
      Lat 99.90th-qrtle-1        67.00 (   0.00%)       78.00 ( -16.42%)
      Lat 50.00th-qrtle-2        40.00 (   0.00%)       51.00 ( -27.50%)
      Lat 75.00th-qrtle-2        45.00 (   0.00%)       56.00 ( -24.44%)
      Lat 90.00th-qrtle-2        53.00 (   0.00%)       59.00 ( -11.32%)
      Lat 95.00th-qrtle-2        57.00 (   0.00%)       61.00 (  -7.02%)
      Lat 99.00th-qrtle-2        67.00 (   0.00%)       71.00 (  -5.97%)
      Lat 99.50th-qrtle-2        69.00 (   0.00%)       74.00 (  -7.25%)
      Lat 99.90th-qrtle-2        83.00 (   0.00%)       77.00 (   7.23%)
      Lat 50.00th-qrtle-4        51.00 (   0.00%)       51.00 (   0.00%)
      Lat 75.00th-qrtle-4        57.00 (   0.00%)       56.00 (   1.75%)
      Lat 90.00th-qrtle-4        60.00 (   0.00%)       59.00 (   1.67%)
      Lat 95.00th-qrtle-4        62.00 (   0.00%)       62.00 (   0.00%)
      Lat 99.00th-qrtle-4        73.00 (   0.00%)       72.00 (   1.37%)
      Lat 99.50th-qrtle-4        76.00 (   0.00%)       74.00 (   2.63%)
      Lat 99.90th-qrtle-4        85.00 (   0.00%)       78.00 (   8.24%)
      Lat 50.00th-qrtle-8        54.00 (   0.00%)       58.00 (  -7.41%)
      Lat 75.00th-qrtle-8        59.00 (   0.00%)       62.00 (  -5.08%)
      Lat 90.00th-qrtle-8        65.00 (   0.00%)       66.00 (  -1.54%)
      Lat 95.00th-qrtle-8        67.00 (   0.00%)       70.00 (  -4.48%)
      Lat 99.00th-qrtle-8        78.00 (   0.00%)       79.00 (  -1.28%)
      Lat 99.50th-qrtle-8        81.00 (   0.00%)       80.00 (   1.23%)
      Lat 99.90th-qrtle-8       116.00 (   0.00%)       83.00 (  28.45%)
      Lat 50.00th-qrtle-16       65.00 (   0.00%)       64.00 (   1.54%)
      Lat 75.00th-qrtle-16       77.00 (   0.00%)       71.00 (   7.79%)
      Lat 90.00th-qrtle-16       83.00 (   0.00%)       82.00 (   1.20%)
      Lat 95.00th-qrtle-16       87.00 (   0.00%)       87.00 (   0.00%)
      Lat 99.00th-qrtle-16       95.00 (   0.00%)       96.00 (  -1.05%)
      Lat 99.50th-qrtle-16       99.00 (   0.00%)      103.00 (  -4.04%)
      Lat 99.90th-qrtle-16      104.00 (   0.00%)      122.00 ( -17.31%)
      Lat 50.00th-qrtle-32       71.00 (   0.00%)       73.00 (  -2.82%)
      Lat 75.00th-qrtle-32       91.00 (   0.00%)       92.00 (  -1.10%)
      Lat 90.00th-qrtle-32      108.00 (   0.00%)      107.00 (   0.93%)
      Lat 95.00th-qrtle-32      118.00 (   0.00%)      115.00 (   2.54%)
      Lat 99.00th-qrtle-32      134.00 (   0.00%)      129.00 (   3.73%)
      Lat 99.50th-qrtle-32      138.00 (   0.00%)      133.00 (   3.62%)
      Lat 99.90th-qrtle-32      149.00 (   0.00%)      146.00 (   2.01%)
      Lat 50.00th-qrtle-39       83.00 (   0.00%)       81.00 (   2.41%)
      Lat 75.00th-qrtle-39      105.00 (   0.00%)      102.00 (   2.86%)
      Lat 90.00th-qrtle-39      120.00 (   0.00%)      119.00 (   0.83%)
      Lat 95.00th-qrtle-39      129.00 (   0.00%)      128.00 (   0.78%)
      Lat 99.00th-qrtle-39      153.00 (   0.00%)      149.00 (   2.61%)
      Lat 99.50th-qrtle-39      166.00 (   0.00%)      156.00 (   6.02%)
      Lat 99.90th-qrtle-39    12304.00 (   0.00%)    12848.00 (  -4.42%)
      
      When heavily loaded (e.g. 99.50th-qrtle-39 indicates 39 threads), there
      are small gains in many cases. Otherwise it depends on the quartile used
      where it can be bad -- e.g. 75.00th-qrtle-2. However, even these results
      are probably a coincidence. For this workload, much depends on what node
      the threads get placed on and their relative locality and not wakeups from
      interrupt context. A larger component on how it behaves would be automatic
      NUMA balancing where a fault incurred to measure locality would be a much
      larger contributor to latency than the wakeup path.
      
      These are the results from an almost identical machine that happened to run
      the same test.  They only differ in terms of storage which is irrelevant
      for this test.
      
                                       4.15.0-rc3             4.15.0-rc3
                                          vanilla             noirq-v1r1
      Lat 50.00th-qrtle-1        41.00 (   0.00%)       41.00 (   0.00%)
      Lat 75.00th-qrtle-1        42.00 (   0.00%)       42.00 (   0.00%)
      Lat 90.00th-qrtle-1        44.00 (   0.00%)       43.00 (   2.27%)
      Lat 95.00th-qrtle-1        53.00 (   0.00%)       45.00 (  15.09%)
      Lat 99.00th-qrtle-1        59.00 (   0.00%)       58.00 (   1.69%)
      Lat 99.50th-qrtle-1        60.00 (   0.00%)       59.00 (   1.67%)
      Lat 99.90th-qrtle-1        86.00 (   0.00%)       61.00 (  29.07%)
      Lat 50.00th-qrtle-2        52.00 (   0.00%)       41.00 (  21.15%)
      Lat 75.00th-qrtle-2        57.00 (   0.00%)       46.00 (  19.30%)
      Lat 90.00th-qrtle-2        60.00 (   0.00%)       53.00 (  11.67%)
      Lat 95.00th-qrtle-2        62.00 (   0.00%)       57.00 (   8.06%)
      Lat 99.00th-qrtle-2        73.00 (   0.00%)       68.00 (   6.85%)
      Lat 99.50th-qrtle-2        74.00 (   0.00%)       71.00 (   4.05%)
      Lat 99.90th-qrtle-2        90.00 (   0.00%)       75.00 (  16.67%)
      Lat 50.00th-qrtle-4        57.00 (   0.00%)       52.00 (   8.77%)
      Lat 75.00th-qrtle-4        60.00 (   0.00%)       58.00 (   3.33%)
      Lat 90.00th-qrtle-4        62.00 (   0.00%)       62.00 (   0.00%)
      Lat 95.00th-qrtle-4        65.00 (   0.00%)       65.00 (   0.00%)
      Lat 99.00th-qrtle-4        76.00 (   0.00%)       75.00 (   1.32%)
      Lat 99.50th-qrtle-4        77.00 (   0.00%)       77.00 (   0.00%)
      Lat 99.90th-qrtle-4        87.00 (   0.00%)       81.00 (   6.90%)
      Lat 50.00th-qrtle-8        59.00 (   0.00%)       57.00 (   3.39%)
      Lat 75.00th-qrtle-8        63.00 (   0.00%)       62.00 (   1.59%)
      Lat 90.00th-qrtle-8        66.00 (   0.00%)       67.00 (  -1.52%)
      Lat 95.00th-qrtle-8        68.00 (   0.00%)       70.00 (  -2.94%)
      Lat 99.00th-qrtle-8        79.00 (   0.00%)       80.00 (  -1.27%)
      Lat 99.50th-qrtle-8        80.00 (   0.00%)       84.00 (  -5.00%)
      Lat 99.90th-qrtle-8        84.00 (   0.00%)       90.00 (  -7.14%)
      Lat 50.00th-qrtle-16       65.00 (   0.00%)       65.00 (   0.00%)
      Lat 75.00th-qrtle-16       77.00 (   0.00%)       75.00 (   2.60%)
      Lat 90.00th-qrtle-16       84.00 (   0.00%)       83.00 (   1.19%)
      Lat 95.00th-qrtle-16       88.00 (   0.00%)       87.00 (   1.14%)
      Lat 99.00th-qrtle-16       97.00 (   0.00%)       96.00 (   1.03%)
      Lat 99.50th-qrtle-16      100.00 (   0.00%)      104.00 (  -4.00%)
      Lat 99.90th-qrtle-16      110.00 (   0.00%)      126.00 ( -14.55%)
      Lat 50.00th-qrtle-32       70.00 (   0.00%)       71.00 (  -1.43%)
      Lat 75.00th-qrtle-32       92.00 (   0.00%)       94.00 (  -2.17%)
      Lat 90.00th-qrtle-32      110.00 (   0.00%)      110.00 (   0.00%)
      Lat 95.00th-qrtle-32      121.00 (   0.00%)      118.00 (   2.48%)
      Lat 99.00th-qrtle-32      135.00 (   0.00%)      137.00 (  -1.48%)
      Lat 99.50th-qrtle-32      140.00 (   0.00%)      146.00 (  -4.29%)
      Lat 99.90th-qrtle-32      150.00 (   0.00%)      160.00 (  -6.67%)
      Lat 50.00th-qrtle-39       80.00 (   0.00%)       71.00 (  11.25%)
      Lat 75.00th-qrtle-39      102.00 (   0.00%)       91.00 (  10.78%)
      Lat 90.00th-qrtle-39      118.00 (   0.00%)      108.00 (   8.47%)
      Lat 95.00th-qrtle-39      128.00 (   0.00%)      117.00 (   8.59%)
      Lat 99.00th-qrtle-39      149.00 (   0.00%)      133.00 (  10.74%)
      Lat 99.50th-qrtle-39      160.00 (   0.00%)      139.00 (  13.12%)
      Lat 99.90th-qrtle-39    13808.00 (   0.00%)     4920.00 (  64.37%)
      
      Despite being nearly identical, it showed a variety of major gains so
      I'm not convinced that heavy emphasis should be placed on this particular
      workload in terms of evaluating this particular patch. Further evidence of
      this is the fact that testing on a UMA machine showed small gains/losses
      even though the patch should be a no-op on UMA.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171219085947.13136-2-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7332dec0
    • sched/fair: Correct obsolete comment about cpufreq_update_util() · 9783be2c
      Joel Fernandes committed
      Since the remote cpufreq callback work, the cpufreq_update_util() call can happen
      from remote CPUs. The comment about local CPUs is thus obsolete. Update it
      accordingly.
      Signed-off-by: Joel Fernandes <joelaf@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Android Kernel <kernel-team@android.com>
      Cc: Atish Patra <atish.patra@oracle.com>
      Cc: Chris Redpath <Chris.Redpath@arm.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: EAS Dev <eas-dev@lists.linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Ramussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Rohit Jain <rohit.k.jain@oracle.com>
      Cc: Saravana Kannan <skannan@quicinc.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vikram Mulukutla <markivx@codeaurora.org>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Link: http://lkml.kernel.org/r/20171215153944.220146-2-joelaf@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9783be2c
    • sched/fair: Remove impossible condition from find_idlest_group_cpu() · 18cec7e0
      Joel Fernandes committed
      find_idlest_group_cpu() goes through the CPUs of a group previously selected
      by find_idlest_group(). find_idlest_group() returns NULL if the local group
      is the selected one, and find_idlest_group_cpu() is not executed if the group
      to which 'cpu' belongs is chosen. So we're always guaranteed to call
      find_idlest_group_cpu() with a group to which 'cpu' is non-local.
      
      This makes one of the conditions in find_idlest_group_cpu() an impossible one,
      which we can get rid of.
      Signed-off-by: Joel Fernandes <joelaf@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Android Kernel <kernel-team@android.com>
      Cc: Atish Patra <atish.patra@oracle.com>
      Cc: Chris Redpath <Chris.Redpath@arm.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: EAS Dev <eas-dev@lists.linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Ramussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Rohit Jain <rohit.k.jain@oracle.com>
      Cc: Saravana Kannan <skannan@quicinc.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vikram Mulukutla <markivx@codeaurora.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20171215153944.220146-3-joelaf@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      18cec7e0
    • sched/cpufreq: Don't pass flags to sugov_set_iowait_boost() · 5083452f
      Viresh Kumar committed
      We are already passing sg_cpu as an argument to the sugov_set_iowait_boost()
      helper, and it can be used to retrieve the flags value. Get rid of
      the redundant argument.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: tkjos@android.com
      Link: http://lkml.kernel.org/r/4ec5562b1a87e146ebab11fb5dde1ca9c763a7fb.1513158452.git.viresh.kumar@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5083452f
    • sched/cpufreq: Initialize sg_cpu->flags to 0 · 6257e704
      Viresh Kumar committed
      Initializing sg_cpu->flags to SCHED_CPUFREQ_RT has no obvious benefit.
      The flags field wouldn't be used until the utilization update handler is
      called for the first time, and once that is called we will overwrite
      flags anyway.
      
      Initialize it to 0.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: morten.rasmussen@arm.com
      Cc: tkjos@android.com
      Link: http://lkml.kernel.org/r/763feda6424ced8486b25a0c52979634e6104478.1513158452.git.viresh.kumar@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6257e704
    • sched/fair: Consider RT/IRQ pressure in capacity_spare_wake() · f453ae22
      Joel Fernandes committed
      capacity_spare_wake() in the slow path influences choice of idlest groups,
      as we search for groups with maximum spare capacity. In scenarios where
      RT pressure is high, a suboptimal group can be chosen and hurt the
      performance of the task being woken up.
      
      Fix this by using capacity_of() instead of capacity_orig_of() in capacity_spare_wake().
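      In other words (hedged sketch; capacity_of() already has the RT/IRQ
      pressure subtracted from capacity_orig_of()):

          static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
          {
                  /* Spare capacity left after RT/IRQ pressure and the waking task. */
                  return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
          }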
      
      Test results showing the improvements from this change are below. More tests
      were also done by myself and Matt Fleming to ensure no degradation in
      different benchmarks.
      
      1) Rohit ran barrier.c test (details below) with following improvements:
      ------------------------------------------------------------------------
      This was Rohit's original use case for a patch he posted at [1]; however,
      his recent tests showed that my patch can replace his slow-path
      changes [1] and that there's no need to selectively scan/skip CPUs in
      find_idlest_group_cpu() in the slow path to get the improvement he sees.
      
      barrier.c (OpenMP code) is used as a micro-benchmark. It does a number of
      iterations with a barrier sync at the end of each for loop.
      
      Here barrier.c is run along with ping on CPUs 0 and 1 as:
      'ping -l 10000 -q -s 10 -f hostX'
      
      barrier.c can be found at:
      http://www.spinics.net/lists/kernel/msg2506955.html
      
      Following are the results for the iterations per second with this
      micro-benchmark (higher is better), on a 44-core, 2-socket, 88-thread
      Intel x86 machine:
      +--------+------------------+---------------------------+
      |Threads | Without patch    | With patch                |
      |        |                  |                           |
      +--------+--------+---------+-----------------+---------+
      |        | Mean   | Std Dev | Mean            | Std Dev |
      +--------+--------+---------+-----------------+---------+
      |1       | 539.36 | 60.16   | 572.54 (+6.15%) | 40.95   |
      |2       | 481.01 | 19.32   | 530.64 (+10.32%)| 56.16   |
      |4       | 474.78 | 22.28   | 479.46 (+0.99%) | 18.89   |
      |8       | 450.06 | 24.91   | 447.82 (-0.50%) | 12.36   |
      |16      | 436.99 | 22.57   | 441.88 (+1.12%) | 7.39    |
      |32      | 388.28 | 55.59   | 429.4  (+10.59%)| 31.14   |
      |64      | 314.62 | 6.33    | 311.81 (-0.89%) | 11.99   |
      +--------+--------+---------+-----------------+---------+
      
      2) ping+hackbench test on a bare-metal server (by Rohit)
      -----------------------------------------------------
      Here hackbench is run in threaded mode along with
      ping running on CPUs 0 and 1 as:
      'ping -l 10000 -q -s 10 -f hostX'
      
      This test is run on a 2-socket, 20-core, 40-thread Intel x86
      machine.
      The number of loops is 10000 and the runtime is in seconds (lower is better).
      
      +--------------+-----------------+--------------------------+
      |Task Groups   | Without patch   |  With patch              |
      |              +-------+---------+----------------+---------+
      |(Groups of 40)| Mean  | Std Dev |  Mean          | Std Dev |
      +--------------+-------+---------+----------------+---------+
      |1             | 0.851 | 0.007   |  0.828 (+2.77%)| 0.032   |
      |2             | 1.083 | 0.203   |  1.087 (-0.37%)| 0.246   |
      |4             | 1.601 | 0.051   |  1.611 (-0.62%)| 0.055   |
      |8             | 2.837 | 0.060   |  2.827 (+0.35%)| 0.031   |
      |16            | 5.139 | 0.133   |  5.107 (+0.63%)| 0.085   |
      |25            | 7.569 | 0.142   |  7.503 (+0.88%)| 0.143   |
      +--------------+-------+---------+----------------+---------+
      
      [1] https://patchwork.kernel.org/patch/9991635/
      
      Matt Fleming also ran several different hackbench tests and cyclictest
      to sanity-check that the patch doesn't harm other use cases.
      Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
      Tested-by: Rohit Jain <rohit.k.jain@oracle.com>
      Signed-off-by: Joel Fernandes <joelaf@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Atish Patra <atish.patra@oracle.com>
      Cc: Brendan Jackman <brendan.jackman@arm.com>
      Cc: Chris Redpath <Chris.Redpath@arm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Ramussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Saravana Kannan <skannan@quicinc.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vikram Mulukutla <markivx@codeaurora.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20171214212158.188190-1-joelaf@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f453ae22
    • sched/fair: Use 'unsigned long' for utilization, consistently · f01415fd
      Patrick Bellasi committed
      Utilization and capacity are tracked as 'unsigned long', however some
      functions using them return an 'int' which is ultimately assigned back to
      'unsigned long' variables.
      
      Since there is no scope for using a different, signed type,
      consolidate the signatures of functions returning utilization to always
      use the native type.
      
      This change improves code consistency, and it also benefits
      code paths where utilizations should be clamped by avoiding
      further type conversions or ugly type casts.
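      For illustration, a sketch of the kind of signature change involved
      (assuming the cpu_util() helper of that time):

          /* Was: static int cpu_util(int cpu); keep the native type instead. */
          static unsigned long cpu_util(int cpu)
          {
                  unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
                  unsigned long capacity = capacity_orig_of(cpu);

                  return min(util, capacity);
          }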
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Chris Redpath <chris.redpath@arm.com>
      Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
      Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@android.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20171205171018.9203-2-patrick.bellasi@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f01415fd
    • sched/core: Rework and clarify prepare_lock_switch() · 31cb1bc0
      rodrigosiqueira committed
      The prepare_lock_switch() function has an unused parameter, and its
      name is not descriptive. To improve readability and remove
      the extra parameter, make the following changes:
      
      * Move prepare_lock_switch() from kernel/sched/sched.h to
        kernel/sched/core.c, rename it to prepare_task(), and remove the
        unused parameter.
      
      * Split the smp_store_release() out from finish_lock_switch() into a
        function named finish_task() (see the sketch below).
      
      * Comment adjustments.
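      A hedged sketch of the resulting helpers (SMP case only; on UP both are
      empty):

          static inline void prepare_task(struct task_struct *next)
          {
          #ifdef CONFIG_SMP
                  /* Claim the incoming task; it is about to run on this CPU. */
                  next->on_cpu = 1;
          #endif
          }

          static inline void finish_task(struct task_struct *prev)
          {
          #ifdef CONFIG_SMP
                  /*
                   * Release ordering: once on_cpu drops to 0 the previous task
                   * may be picked up by another CPU, so everything done before
                   * this store must be visible there.
                   */
                  smp_store_release(&prev->on_cpu, 0);
          #endif
          }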
      Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171215140603.gxe5i2y6fg5ojfpp@smtp.gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      31cb1bc0
    • membarrier: Disable preemption when calling smp_call_function_many() · 54167607
      Mathieu Desnoyers committed
      smp_call_function_many() requires disabling preemption around the call.
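      Sketched fix, using the cpumask and IPI callback names from the
      membarrier implementation of the time (illustrative, not the exact diff):

          /* smp_call_function_many() must not be called preemptibly. */
          preempt_disable();
          smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
          preempt_enable();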
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: <stable@vger.kernel.org> # v4.14+
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171215192310.25293-1-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      54167607
  2. 05 Jan 2018, 2 commits
  3. 30 Dec 2017, 8 commits
    • timers: Invoke timer_start_debug() where it makes sense · fd45bb77
      Thomas Gleixner committed
      The timer start debug function is called before the proper timer base is
      set. As a consequence the trace data contains the stale CPU and flags
      values.
      
      Call the debug function after setting the new base and flags.
      
      Fixes: 500462a9 ("timers: Switch to a non-cascading wheel")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: rt@linutronix.de
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Link: https://lkml.kernel.org/r/20171222145337.792907137@linutronix.de
      fd45bb77
    • nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick() · 5d62c183
      Thomas Gleixner committed
      The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
      subsequently invokes tick_nohz_stop_sched_tick() are:
      
        if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
      
      If need_resched() is not set, but a timer softirq is pending then this is
      an indication that the softirq code punted and delegated the execution to
      softirqd. need_resched() is not true because the current interrupted task
      takes precedence over softirqd.
      
      Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
      timer interrupts because the timer wheel contains an expired timer, but
      softirqs are not yet executed. So it returns an immediate expiry request,
      which causes the timer to fire immediately again. Lather, rinse and
      repeat....
      
      Prevent that by adding a check for a pending timer soft interrupt to the
      conditions in tick_nohz_stop_sched_tick() which avoid calling
      get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
      prevents a repetitive programming of an already expired timer.
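      A hedged sketch of the added check; the helper below mirrors the idea,
      and it is consulted by the code that decides whether to query
      get_next_timer_interrupt() at all:

          static inline bool local_timer_softirq_pending(void)
          {
                  /* An expired timer is waiting for the softirq; keep the tick. */
                  return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
          }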
      Reported-by: Sebastian Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
      5d62c183
    • timers: Reinitialize per cpu bases on hotplug · 26456f87
      Thomas Gleixner committed
      The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
      them with potentially stale clk and next_expiry values, which can cause
      trouble when the CPU is plugged in.
      
      Add a prepare callback which forwards the clock, sets next_expiry to far in
      the future and resets the control flags to a known state.
      
      Set base->must_forward_clk so the first timer which is queued will try to
      forward the clock to current jiffies.
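      A sketch of such a prepare callback (field names as in the 4.15-era
      timer wheel; treat the details as illustrative):

          int timers_prepare_cpu(unsigned int cpu)
          {
                  struct timer_base *base;
                  int b;

                  for (b = 0; b < NR_BASES; b++) {
                          base = per_cpu_ptr(&timer_bases[b], cpu);
                          base->clk = jiffies;            /* forward the stale clock */
                          base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
                          base->is_idle = false;
                          base->must_forward_clk = true;  /* first enqueue re-forwards */
                  }
                  return 0;
          }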
      
      Fixes: 500462a9 ("timers: Switch to a non-cascading wheel")
      Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
      26456f87
    • timers: Use deferrable base independent of base::nohz_active · ced6d5c1
      Anna-Maria Gleixner committed
      During boot and before base::nohz_active is set in the timer bases, deferrable
      timers are enqueued into the standard timer base. This works correctly as
      long as base::nohz_active is false.
      
      Once base::nohz_active is set and a timer which was enqueued before that
      is accessed, the lock selector code chooses the lock of the deferred
      base. This causes unlocked access to the standard base, and in case the
      timer is removed it does not clear the pending flag in the standard base
      bitmap, which causes get_next_timer_interrupt() to return bogus values.
      
      To prevent that, the deferrable timers must be enqueued in the deferrable
      base, even when base::nohz_active is not set. Those deferrable timers also
      need to be expired unconditionally.
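      Conceptually the base selection becomes independent of nohz_active, as
      in this hedged sketch:

          static inline struct timer_base *get_timer_cpu_base(u32 tflags, u32 cpu)
          {
                  struct timer_base *base = per_cpu_ptr(&timer_bases[BASE_STD], cpu);

                  /*
                   * Deferrable timers always live in the deferrable base, so
                   * enqueue, dequeue and expiry agree on the base and its lock.
                   */
                  if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
                          base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu);

                  return base;
          }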
      
      Fixes: 500462a9 ("timers: Switch to a non-cascading wheel")
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: rt@linutronix.de
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Link: https://lkml.kernel.org/r/20171222145337.633328378@linutronix.de
      ced6d5c1
    • genirq/msi, x86/vector: Prevent reservation mode for non maskable MSI · bc976233
      Thomas Gleixner committed
      The new reservation mode for interrupts assigns a dummy vector when the
      interrupt is allocated and assigns a real vector when the interrupt is
      requested. The reservation mode prevents vector pressure when devices with
      a large amount of queues/interrupts are initialized, but only a minimal
      subset of those queues/interrupts is actually used.
      
      This mode has an issue with MSI interrupts which cannot be masked. If the
      driver is not careful or the hardware emits an interrupt before the device
      irq is requested by the driver, then the interrupt ends up on the dummy
      vector as a spurious interrupt which can cause malfunction of the device or
      in the worst case a lockup of the machine.
      
      Change the logic for the reservation mode so that the early activation of
      MSI interrupts checks whether:
      
       - the device is a PCI/MSI device
       - the reservation mode of the underlying irqdomain is activated
       - PCI/MSI masking is globally enabled
       - the PCI/MSI device uses either MSI-X, which supports masking, or
         MSI with the maskbit supported.
      
      If one of those conditions is false, then clear the reservation mode flag
      in the irq data of the interrupt and invoke irq_domain_activate_irq() with
      the reserve argument cleared. In the x86 vector code, clear the can_reserve
      flag in the vector allocation data so a subsequent free_irq() won't create
      the same situation again. The interrupt stays assigned to a real vector
      until pci_disable_msi() is invoked and all allocations are undone.
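      Put together, the early-activation check boils down to something like
      the following sketch (helper and field names approximate the genirq/MSI
      code of the time):

          static bool msi_check_reservation_mode(struct irq_domain *domain,
                                                 struct msi_domain_info *info,
                                                 struct device *dev)
          {
                  struct msi_desc *desc;

                  if (domain->bus_token != DOMAIN_BUS_PCI_MSI)
                          return false;
                  if (!(info->flags & MSI_FLAG_MUST_REACTIVATE))
                          return false;
                  if (IS_ENABLED(CONFIG_PCI_MSI) && pci_msi_ignore_mask)
                          return false;

                  /* MSI-X always supports masking; plain MSI only with the maskbit. */
                  desc = first_msi_entry(dev);
                  return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit;
          }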
      
      Fixes: 4900be83 ("x86/vector/msi: Switch to global reservation mode")
      Reported-by: Alexandru Chirvasitu <achirvasub@gmail.com>
      Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Alexandru Chirvasitu <achirvasub@gmail.com>
      Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: Mikael Pettersson <mikpelinux@gmail.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: Mihai Costache <v-micos@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-pci@vger.kernel.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: devel@linuxdriverproject.org
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Sakari Ailus <sakari.ailus@intel.com>,
      Cc: linux-media@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712291406420.1899@nanos
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712291409460.1899@nanos
      bc976233
    • genirq/irqdomain: Rename early argument of irq_domain_activate_irq() · 702cb0a0
      Thomas Gleixner committed
      The 'early' argument of irq_domain_activate_irq() is actually used to
      denote reservation mode. To avoid confusion, rename it before abuse
      happens.
      
      No functional change.
      
      Fixes: 72491643 ("genirq/irqdomain: Update irq_domain_ops.activate() signature")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexandru Chirvasitu <achirvasub@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: Mikael Pettersson <mikpelinux@gmail.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: Mihai Costache <v-micos@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-pci@vger.kernel.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: devel@linuxdriverproject.org
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Sakari Ailus <sakari.ailus@intel.com>,
      Cc: linux-media@vger.kernel.org
      702cb0a0
    • genirq: Introduce IRQD_CAN_RESERVE flag · 69790ba9
      Thomas Gleixner committed
      Add a new flag to mark interrupts which can use reservation mode. This is
      going to be used in subsequent patches to disable reservation mode for a
      certain class of MSI devices.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Alexandru Chirvasitu <achirvasub@gmail.com>
      Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: Mikael Pettersson <mikpelinux@gmail.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: Mihai Costache <v-micos@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-pci@vger.kernel.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: devel@linuxdriverproject.org
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Sakari Ailus <sakari.ailus@intel.com>,
      Cc: linux-media@vger.kernel.org
      69790ba9
    • genirq/msi: Handle reactivation only on success · da5dd9e8
      Thomas Gleixner committed
      When analyzing the fallout of the x86 vector allocation rework it turned
      out that the error handling in msi_domain_alloc_irqs() is broken.
      
      If MSI_FLAG_MUST_REACTIVATE is set for an MSI domain then it clears the
      activation flag for a successfully initialized msi descriptor. If a
      subsequent initialization fails then the error handling code path does not
      deactivate the interrupt because the activation flag got cleared.
      
      Move the clearing of the activation flag outside of the initialization loop
      so that an eventual failure can be cleaned up correctly.
      
      Fixes: 22d0b12f ("genirq/irqdomain: Add force reactivation flag to irq domains")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Alexandru Chirvasitu <achirvasub@gmail.com>
      Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: Mikael Pettersson <mikpelinux@gmail.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: Mihai Costache <v-micos@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-pci@vger.kernel.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: devel@linuxdriverproject.org
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Sakari Ailus <sakari.ailus@intel.com>,
      Cc: linux-media@vger.kernel.org
      
      da5dd9e8
  4. 28 Dec 2017, 9 commits
  5. 24 Dec 2017, 1 commit
    • pid: Handle failure to allocate the first pid in a pid namespace · c0ee5549
      Eric W. Biederman committed
      With the replacement of the pid bitmap and hashtable with an idr,
      alloc_pid started occasionally failing when allocating the first pid
      in a pid namespace.  Things were not completely reset, resulting in
      the first allocated pid getting the number 2 (not 1), which
      further resulted in ns->proc_mnt not getting set, eventually
      causing an oops in proc_flush_task.
      
      Oops: 0000 [#1] SMP
      CPU: 2 PID: 6743 Comm: trinity-c117 Not tainted 4.15.0-rc4-think+ #2
      RIP: 0010:proc_flush_task+0x8e/0x1b0
      RSP: 0018:ffffc9000bbffc40 EFLAGS: 00010286
      RAX: 0000000000000001 RBX: 0000000000000001 RCX: 00000000fffffffb
      RDX: 0000000000000000 RSI: ffffc9000bbffc50 RDI: 0000000000000000
      RBP: ffffc9000bbffc63 R08: 0000000000000000 R09: 0000000000000002
      R10: ffffc9000bbffb70 R11: ffffc9000bbffc64 R12: 0000000000000003
      R13: 0000000000000000 R14: 0000000000000003 R15: ffff8804c10d7840
      FS:  00007f7cb8965700(0000) GS:ffff88050a200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000003e21ae003 CR4: 00000000001606e0
      DR0: 00007fb1d6c22000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       ? release_task+0xaf/0x680
       release_task+0xd2/0x680
       ? wait_consider_task+0xb82/0xce0
       wait_consider_task+0xbe9/0xce0
       ? do_wait+0xe1/0x330
       do_wait+0x151/0x330
       kernel_wait4+0x8d/0x150
       ? task_stopped_code+0x50/0x50
       SYSC_wait4+0x95/0xa0
       ? rcu_read_lock_sched_held+0x6c/0x80
       ? syscall_trace_enter+0x2d7/0x340
       ? do_syscall_64+0x60/0x210
       do_syscall_64+0x60/0x210
       entry_SYSCALL64_slow_path+0x25/0x25
      RIP: 0033:0x7f7cb82603aa
      RSP: 002b:00007ffd60770bc8 EFLAGS: 00000246
       ORIG_RAX: 000000000000003d
      RAX: ffffffffffffffda RBX: 00007f7cb6cd4000 RCX: 00007f7cb82603aa
      RDX: 000000000000000b RSI: 00007ffd60770bd0 RDI: 0000000000007cca
      RBP: 0000000000007cca R08: 00007f7cb8965700 R09: 00007ffd607c7080
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffd60770bd0 R14: 00007f7cb6cd4058 R15: 00000000cccccccd
      Code: c1 e2 04 44 8b 60 30 48 8b 40 38 44 8b 34 11 48 c7 c2 60 3a f5 81 44 89 e1 4c 8b 68 58 e8 4b b4 77 00 89 44 24 14 48 8d 74 24 10 <49> 8b 7d 00 e8 b9 6a f9 ff 48 85 c0 74 1a 48 89 c7 48 89 44 24
      RIP: proc_flush_task+0x8e/0x1b0 RSP: ffffc9000bbffc40
      CR2: 0000000000000000
      ---[ end trace 53d67a6481059862 ]---
      
      Improve the quality of the implementation by resetting the place to
      start allocating pids on failure to allocate the first pid.
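      The fix amounts to rewinding the IDR cursor in the error path when it
      was the namespace's very first pid that failed, roughly (symbol names
      are those of the 4.15-era pid code, used here for illustration):

          /*
           * On failure to allocate the first pid in a namespace, rewind the
           * IDR cursor so the next attempt starts numbering at 1 again.
           */
          if (ns->pid_allocated == PIDNS_ADDING)
                  idr_set_cursor(&ns->idr, 0);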
      
      As improving the quality of the implementation is the goal, remove the now
      unnecessary disable_pid_allocations call when we fail to mount proc.
      
      Fixes: 95846ecf ("pid: replace pid bitmap implementation with IDR API")
      Fixes: 8ef047aa ("pid namespaces: make alloc_pid(), free_pid() and put_pid() work with struct upid")
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      c0ee5549
  6. 23 Dec 2017, 1 commit
    • arch, mm: Allow arch_dup_mmap() to fail · c10e83f5
      Thomas Gleixner committed
      In order to sanitize the LDT initialization on x86, arch_dup_mmap() must be
      allowed to fail. Fix up all instances.
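      For architectures without special handling the conversion is mechanical;
      a hedged sketch of the generic stub:

          /*
           * Generic stub: nothing arch-specific to duplicate, so report
           * success; dup_mmap() now propagates a non-zero return value as a
           * fork() failure.
           */
          static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
          {
                  return 0;
          }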
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: dan.j.williams@intel.com
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c10e83f5
  7. 21 Dec 2017, 6 commits