1. 09 April 2018 (2 commits)
    • cpuidle: menu: Refine idle state selection for running tick · 296bb1e5
      Authored by Rafael J. Wysocki
      If the tick isn't stopped, the target residency of the state selected
      by the menu governor may be greater than the actual time to the next
      tick and that means lost energy.
      
      To avoid that, make tick_nohz_get_sleep_length() return the current
      time to the next event (before stopping the tick) in addition to the
      estimated one via an extra pointer argument and make menu_select()
      use that value to refine the state selection when necessary.
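      In rough outline, the refinement in menu_select() looks like the sketch
      below (based on the description above, not the verbatim patch;
      find_shallower_state() is a hypothetical stand-in for the fallback loop):

              ktime_t delta_next;

              /* Get the predicted sleep length and, through the new pointer
               * argument, the time to the next event if the tick keeps
               * running. */
              data->next_timer_us =
                      ktime_to_us(tick_nohz_get_sleep_length(&delta_next));

              /* idx selected by the usual menu heuristics ... */

              if (!tick_nohz_tick_stopped() &&
                  drv->states[idx].target_residency > ktime_to_us(delta_next)) {
                      /* The running tick fires before the selected state's
                       * target residency elapses: pick a shallower state. */
                      idx = find_shallower_state(drv, dev, idx,
                                                 ktime_to_us(delta_next));
              }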
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    • sched: idle: Select idle state before stopping the tick · 554c8aa8
      Authored by Rafael J. Wysocki
      In order to address the issue with short idle duration predictions
      by the idle governor after the scheduler tick has been stopped,
      reorder the code in cpuidle_idle_call() so that the governor idle
      state selection runs before tick_nohz_idle_stop_tick() and use the
      "nohz" hint returned by cpuidle_select() to decide whether or not
      to stop the tick.
      
      This isn't straightforward, because menu_select() invokes
      tick_nohz_get_sleep_length() to get the time to the next timer
      event and the number returned by the latter comes from
      __tick_nohz_idle_stop_tick().  Fortunately, however, it is possible
      to compute that number without actually stopping the tick and with
      the help of the existing code.
      
      Namely, tick_nohz_get_sleep_length() can be made to call
      tick_nohz_next_event(), introduced earlier, to get the time to the
      next non-highres timer event.  If that happens, tick_nohz_next_event()
      need not be called by __tick_nohz_idle_stop_tick() again.
      
      If it turns out that the scheduler tick cannot be stopped going
      forward or the next timer event is too close for the tick to be
      stopped, tick_nohz_get_sleep_length() can simply return the time to
      the next event currently programmed into the corresponding clock
      event device.
      
      In addition to knowing the return value of tick_nohz_next_event(),
      however, tick_nohz_get_sleep_length() needs to know the time to the
      next highres timer event, but with the scheduler tick timer excluded,
      which can be computed with the help of hrtimer_get_next_event().
      
      The minimum of that number and the tick_nohz_next_event() return
      value is the total time to the next timer event with the assumption
      that the tick will be stopped.  It can be returned to the idle
      governor which can use it for predicting idle duration (under the
      assumption that the tick will be stopped) and deciding whether or
      not it makes sense to stop the tick before putting the CPU into the
      selected idle state.
      
      With the above, the sleep_length field in struct tick_sched is not
      necessary any more, so drop it.
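      Put together, the reworked helper looks roughly like this (a sketch
      following the description above; hrtimer_next_event_without() denotes
      the "next highres event with the tick timer excluded" computation):

      ktime_t tick_nohz_get_sleep_length(ktime_t *delta_timer)
      {
              struct clock_event_device *dev =
                      __this_cpu_read(tick_cpu_device.evtdev);
              struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
              int cpu = smp_processor_id();
              ktime_t now = ts->idle_entrytime;
              ktime_t next_event;

              /* Time to the next event currently programmed into the
               * clockevent device, for the "tick cannot be stopped" cases. */
              *delta_timer = ktime_sub(dev->next_event, now);

              if (!can_stop_idle_tick(cpu, ts))
                      return *delta_timer;

              next_event = tick_nohz_next_event(ts, cpu);
              if (!next_event)
                      return *delta_timer;

              /* Next highres timer, tick timer excluded, under the
               * assumption that the tick will be stopped. */
              next_event = min_t(u64, next_event,
                                 hrtimer_next_event_without(&ts->sched_timer));

              return ktime_sub(next_event, now);
      }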
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=199227
      Reported-by: Doug Smythies <dsmythies@telus.net>
      Reported-by: Thomas Ilsche <thomas.ilsche@tu-dresden.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
  2. 06 April 2018 (3 commits)
    • cpuidle: Return nohz hint from cpuidle_select() · 45f1ff59
      Authored by Rafael J. Wysocki
      Add a new pointer argument to cpuidle_select() and to the ->select
      cpuidle governor callback to allow a boolean value indicating
      whether or not the tick should be stopped before entering the
      selected state to be returned from there.
      
      Make the ladder governor ignore that pointer (to preserve its
      current behavior) and make the menu governor return 'false' through
      it if:
       (1) the idle exit latency is constrained at 0, or
       (2) the selected state is a polling one, or
       (3) the expected idle period duration is within the tick period
           range.
      
      In addition to that, the correction factor computations in the menu
      governor need to take the possibility that the tick may not be
      stopped into account to avoid artificially small correction factor
      values.  To that end, add a mechanism to record tick wakeups, as
      suggested by Peter Zijlstra, and use it to modify the menu_update()
      behavior when a tick wakeup occurs.  Namely, if the CPU is woken up by
      the tick and the return value of tick_nohz_get_sleep_length() is not
      within the tick boundary, the predicted idle duration is likely too
      short, so make menu_update() try to compensate for that by updating
      the governor statistics as though the CPU was idle for a long time.
      
      Since the value returned through the new argument pointer of
      cpuidle_select() is not used by its caller yet, this change by
      itself is not expected to alter the functionality of the code.
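      A sketch of the new plumbing (paraphrased from the description above,
      not the verbatim patch):

      int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
                         bool *stop_tick)
      {
              return cpuidle_curr_governor->select(drv, dev, stop_tick);
      }

      /* In menu_select(), for the three cases listed above: */
      if (latency_req == 0 ||
          (drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
          expected_interval < TICK_USEC)
              *stop_tick = false;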
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    • sched: idle: Do not stop the tick upfront in the idle loop · 2aaf709a
      Authored by Rafael J. Wysocki
      Push the decision whether or not to stop the tick somewhat deeper
      into the idle loop.
      
      Stopping the tick upfront leads to unpleasant outcomes in case the
      idle governor doesn't agree with the nohz code on the duration of the
      upcoming idle period.  Specifically, if the tick has been stopped and
      the idle governor predicts short idle, the situation is bad regardless
      of whether or not the prediction is accurate.  If it is accurate, the
      tick has been stopped unnecessarily which means excessive overhead.
      If it is not accurate, the CPU is likely to spend too much time in
      the (shallow, because short idle has been predicted) idle state
      selected by the governor [1].
      
      As the first step towards addressing this problem, change the code
      to make the tick stopping decision inside of the loop in do_idle().
      In particular, do not stop the tick in the cpu_idle_poll() code path.
      Also don't do that in tick_nohz_irq_exit() which doesn't really have
      enough information on whether or not to stop the tick.
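      The resulting shape of the loop in do_idle() is roughly (simplified):

      while (!need_resched()) {
              rmb();

              if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
                      /* Polling idle: the tick is left running. */
                      cpu_idle_poll();
              } else {
                      /* The tick-stop decision is now made in here, after
                       * the cpuidle governor has been consulted. */
                      cpuidle_idle_call();
              }
              arch_cpu_idle_exit();
      }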
      
      Link: https://marc.info/?l=linux-pm&m=150116085925208&w=2 # [1]
      Link: https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf
      Suggested-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    • time: tick-sched: Reorganize idle tick management code · 0e776768
      Authored by Rafael J. Wysocki
      Prepare the scheduler tick code for reworking the idle loop to
      avoid stopping the tick in some cases.
      
      The idea is to split the nohz idle entry call to decouple the idle
      time stats accounting and preparatory work from the actual tick stop
      code, in order to later be able to delay the tick stop once we reach
      more power-knowledgeable callers.
      
      Move away the tick_nohz_start_idle() invocation from
      __tick_nohz_idle_enter(), rename the latter to
      __tick_nohz_idle_stop_tick() and define tick_nohz_idle_stop_tick()
      as a wrapper around it for calling it from the outside.
      
      Make tick_nohz_idle_enter() only call tick_nohz_start_idle() instead
      of calling the entire __tick_nohz_idle_enter(), add another wrapper
      disabling and enabling interrupts around tick_nohz_idle_stop_tick()
      and make the current callers of tick_nohz_idle_enter() call it too
      to retain their current functionality.
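      A simplified sketch of the resulting entry points:

      void tick_nohz_idle_enter(void)
      {
              struct tick_sched *ts;

              local_irq_disable();
              ts = this_cpu_ptr(&tick_cpu_sched);
              ts->inidle = 1;
              tick_nohz_start_idle(ts);       /* stats/prep work only */
              local_irq_enable();
      }

      void tick_nohz_idle_stop_tick(void)
      {
              __tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched));
      }

      /* Wrapper disabling/enabling interrupts, for the current callers. */
      void tick_nohz_idle_stop_tick_protected(void)
      {
              local_irq_disable();
              tick_nohz_idle_stop_tick();
              local_irq_enable();
      }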
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
  3. 21 February 2018 (2 commits)
  4. 28 December 2017 (1 commit)
  5. 02 November 2017 (1 commit)
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Authored by Greg Kroah-Hartman
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boilerplate text.
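      For reference, the identifier is a single comment at the top of the
      file, in the comment style the file type expects:

      /* First line of a .c source file: */
      // SPDX-License-Identifier: GPL-2.0

      /* First line of a header or uapi header: */
      /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */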
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side-by-side results from the output
      of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few thousand files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based in part on an older version of FOSSology, so
      the two are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to
      have copy/paste license identifier errors, and have been fixed to
      reflect the correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types).  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 27 October 2017 (2 commits)
  7. 23 March 2017 (1 commit)
    • cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely · b7eaf1aa
      Authored by Rafael J. Wysocki
      The way the schedutil governor uses the PELT metric causes it to
      underestimate the CPU utilization in some cases.
      
      That can be easily demonstrated by running kernel compilation on
      a Sandy Bridge Intel processor, running turbostat in parallel with
      it and looking at the values written to the MSR_IA32_PERF_CTL
      register.  Namely, the expected result would be that when all CPUs
      were 100% busy, all of them would be requested to run in the maximum
      P-state, but observation shows that this clearly isn't the case.
      The CPUs run in the maximum P-state for a while and then are
      requested to run slower and go back to the maximum P-state after
      a while again.  That causes the actual frequency of the processor to
      visibly oscillate below the sustainable maximum in a jittery fashion
      which clearly is not desirable.
      
      That has been attributed to CPU utilization metric updates on task
      migration that cause the total utilization value for the CPU to be
      reduced by the utilization of the migrated task.  If that happens,
      the schedutil governor may see a CPU utilization reduction and will
      attempt to reduce the CPU frequency accordingly right away.  That
      may be premature, though, for example if the system is generally
      busy and there are other runnable tasks waiting to be run on that
      CPU already.
      
      This is unlikely to be an issue on systems where cpufreq policies are
      shared between multiple CPUs, because in those cases the policy
      utilization is computed as the maximum of the CPU utilization values
      over the whole policy and if that turns out to be low, reducing the
      frequency for the policy most likely is a good idea anyway.  On
      systems with one CPU per policy, however, it may affect performance
      adversely and even lead to increased energy consumption in some cases.
      
      On those systems it may be addressed by taking another utilization
      metric into consideration, like whether or not the CPU whose
      frequency is about to be reduced has been idle recently, because if
      that's not the case, the CPU is likely to be busy in the near future
      and its frequency should not be reduced.
      
      To that end, use the counter of idle calls in the timekeeping code.
      Namely, make the schedutil governor look at that counter for the
      current CPU every time before its frequency is about to be reduced.
      If the counter has not changed since the previous iteration of the
      governor computations for that CPU, the CPU has been busy for all
      that time and its frequency should not be decreased, so if the new
      frequency would be lower than the one set previously, the governor
      will skip the frequency update.
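      A sketch of the busy check, modeled on the description above (details
      may differ from the actual patch):

      static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
      {
              unsigned long idle_calls = tick_nohz_get_idle_calls();
              bool ret = idle_calls == sg_cpu->saved_idle_calls;

              sg_cpu->saved_idle_calls = idle_calls;
              /* No new idle entries: the CPU has been busy all along. */
              return ret;
      }

      /* In the update path: refuse to lower the frequency of a busy CPU. */
      if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq)
              next_f = sg_policy->next_freq;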
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Joel Fernandes <joelaf@google.com>
  8. 26 December 2016 (1 commit)
    • ktime: Get rid of the union · 2456e855
      Authored by Thomas Gleixner
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64-bit machines and in an endianness-optimized
      timespec variant for 32-bit machines. The Y2038 cleanup removed the
      timespec variant and switched everything to scalar nanoseconds. The
      union remained, but became completely pointless.
      
      Get rid of the union and just keep ktime_t as simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
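      In essence (sketch):

      /* Before: a union left over from the 32-bit timespec variant. */
      union ktime {
              s64     tv64;
      };
      typedef union ktime ktime_t;

      /* After: a plain scalar. */
      typedef s64     ktime_t;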
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
  9. 18 March 2016 (1 commit)
  10. 02 March 2016 (4 commits)
    • posix-cpu-timers: Migrate to use new tick dependency mask model · b7878300
      Authored by Frederic Weisbecker
      Instead of providing asynchronous checks for the nohz subsystem to verify
      the posix cpu timers tick dependency, migrate the latter to the new mask.
      
      In order to keep track of the running timers and expose the tick
      dependency accordingly, we must probe the timers queuing and dequeuing
      on threads and process lists.
      
      Unfortunately this implies both task- and signal-level dependencies. We
      should be able to further optimize this and merge it all into the
      task-level dependency, at the cost of a bit of complexity and maybe some
      overhead.
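      A simplified sketch of the dependency probe at timer-arming time
      (cpu_timer_tick_dep_set() is a hypothetical helper; the real probes sit
      in the arming, firing and clearing paths):

      static void cpu_timer_tick_dep_set(struct k_itimer *timer,
                                         struct task_struct *p)
      {
              if (CPUCLOCK_PERTHREAD(timer->it_clock))
                      tick_dep_set_task(p, TICK_DEP_BIT_POSIX_TIMER);
              else
                      tick_dep_set_signal(p->signal,
                                          TICK_DEP_BIT_POSIX_TIMER);
      }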
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • perf: Migrate perf to use new tick dependency mask model · 555e0c1e
      Authored by Frederic Weisbecker
      Instead of providing asynchronous checks for the nohz subsystem to verify
      the perf event tick dependency, migrate perf to the new mask.
      
      Perf needs the tick for two situations:
      
      1) Freq events. We could set the tick dependency when those are
      installed on a CPU context. But setting a global dependency on top of
      the global freq events accounting is much easier. If people want that
      to be optimized, we can still refine that at the per-CPU tick dependency
      level. This patch doesn't change the current behaviour anyway.
      
      2) Throttled events: this is a per-cpu dependency.
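      Sketched, the two sites look like this (simplified; names follow the
      description above rather than the exact patch):

      /* 1) Freq events: one global dependency over the global count. */
      static void account_freq_event(void)
      {
              if (atomic_inc_return(&nr_freq_events) == 1)
                      tick_dep_set(TICK_DEP_BIT_PERF_EVENTS);
      }

      /* 2) Throttled events: a per-CPU dependency while throttled. */
      tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);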
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • nohz: Use enum code for tick stop failure tracing message · e6e6cc22
      Authored by Frederic Weisbecker
      It makes nohz tracing more lightweight, standard and easier to parse.
      
      Examples:
      
             user_loop-2904  [007] d..1   517.701126: tick_stop: success=1 dependency=NONE
             user_loop-2904  [007] dn.1   518.021181: tick_stop: success=0 dependency=SCHED
          posix_timers-6142  [007] d..1  1739.027400: tick_stop: success=0 dependency=POSIX_TIMER
             user_loop-5463  [007] dN.1  1185.931939: tick_stop: success=0 dependency=PERF_EVENTS
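      In code, the tracepoint now takes a dependency mask value instead of a
      free-form string (sketch):

      trace_tick_stop(1, TICK_DEP_MASK_NONE);         /* tick stopped */
      trace_tick_stop(0, TICK_DEP_MASK_SCHED);        /* blocked by sched */
      trace_tick_stop(0, TICK_DEP_MASK_POSIX_TIMER);  /* blocked by timers */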
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • nohz: New tick dependency mask · d027d45d
      Authored by Frederic Weisbecker
      The tick dependency is evaluated on every IRQ and context switch. This
      consists of a batch of checks which determine whether it is safe to
      stop the tick or not. These checks are often split into many details:
      posix cpu timers, scheduler, sched clock, perf events... each of which
      is made of smaller details: posix cpu timers involve checking
      process-wide timers then thread-wide timers, and perf involves checking
      freq events then more per-CPU details.
      
      Checking this information asynchronously every time we update the full
      dynticks state brings avoidable overhead and a messy layout.
      
      Let's introduce instead tick dependency masks: one for system wide
      dependency (unstable sched clock, freq based perf events), one for CPU
      wide dependency (sched, throttling perf events), and task/signal level
      dependencies (posix cpu timers). The subsystems are responsible
      for setting and clearing their dependency through a set of APIs that will
      take care of concurrent dependency mask modifications and kick targets
      to restart the relevant CPU tick whenever needed.
      
      This new dependency engine stays beside the old one until all subsystems
      having a tick dependency are converted to it.
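      The dependency bits and the per-scope API look roughly like this
      (simplified):

      enum tick_dep_bits {
              TICK_DEP_BIT_POSIX_TIMER        = 0,
              TICK_DEP_BIT_PERF_EVENTS        = 1,
              TICK_DEP_BIT_SCHED              = 2,
              TICK_DEP_BIT_CLOCK_UNSTABLE     = 3
      };

      /* System-wide dependency (unstable sched clock, freq perf events). */
      extern void tick_dep_set(enum tick_dep_bits bit);
      extern void tick_dep_clear(enum tick_dep_bits bit);

      /* Per-CPU dependency (scheduler, throttled perf events). */
      extern void tick_dep_set_cpu(int cpu, enum tick_dep_bits bit);
      extern void tick_dep_clear_cpu(int cpu, enum tick_dep_bits bit);

      /* Task- and signal-level dependencies (posix cpu timers). */
      extern void tick_dep_set_task(struct task_struct *tsk,
                                    enum tick_dep_bits bit);
      extern void tick_dep_set_signal(struct signal_struct *signal,
                                      enum tick_dep_bits bit);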
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  11. 16 January 2016 (1 commit)
  12. 02 September 2015 (1 commit)
  13. 29 July 2015 (3 commits)
    • nohz: Remove useless argument on tick_nohz_task_switch() · de734f89
      Authored by Frederic Weisbecker
      Leftover from early code.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • nohz: Restart nohz full tick from irq exit · 73738a95
      Authored by Frederic Weisbecker
      Restart the tick when necessary from the irq exit path. It makes nohz
      full more flexible, simplifies the related IPIs, and doesn't bring
      significant overhead to irq exit.
      
      In a longer term view, it will allow us to piggyback the nohz kick
      on the scheduler IPI in the future instead of sending a dedicated IPI
      that often doubles the scheduler IPI on task wakeup. This will require
      more changes though including careful review of resched_curr() callers
      to include nohz full needs.
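      A simplified sketch of the irq exit re-evaluation:

      static void tick_nohz_full_update_tick(struct tick_sched *ts)
      {
              int cpu = smp_processor_id();

              if (!tick_nohz_full_cpu(cpu))
                      return;

              if (can_stop_full_tick())
                      tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
              else if (ts->tick_stopped)
                      tick_nohz_restart_sched_tick(ts, ktime_get());
      }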
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • nohz: Prevent tilegx network driver interrupts · 03f6199a
      Authored by Chris Metcalf
      Normally the tilegx networking shim sends irqs to all the cores
      to distribute the load of processing incoming-packet interrupts,
      so that you can get to multiple Gb's of traffic inbound.
      
      However, in nohz_full mode we don't want to interrupt the
      nohz_full cores by default, so we limit the set of cores we use
      to only the online housekeeping cores.
      
      To make client code easier to read, we introduce a new nohz_full
      accessor, housekeeping_cpumask(), which returns a pointer to the
      housekeeping_mask if nohz_full is enabled, and otherwise returns
      the cpu_possible_mask.
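      The accessor is roughly:

      static inline const struct cpumask *housekeeping_cpumask(void)
      {
      #ifdef CONFIG_NO_HZ_FULL
              if (tick_nohz_full_enabled())
                      return housekeeping_mask;
      #endif
              return cpu_possible_mask;
      }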
      Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  14. 08 July 2015 (2 commits)
    • tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build · 37b64a42
      Authored by Thomas Gleixner
      Making tick_broadcast_oneshot_control() independent from
      CONFIG_GENERIC_CLOCKEVENTS_BROADCAST broke the build for
      CONFIG_GENERIC_CLOCKEVENTS=n because the function is not defined
      there.
      
      Provide a proper stub inline.
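      The stub is trivial (sketch): with no clockevents there is no broadcast
      device, so the check never reports busy.

      static inline int tick_broadcast_oneshot_control(enum tick_broadcast_state state)
      {
              return 0;
      }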
      
      Fixes: f32dd117 'tick/broadcast: Make idle check independent from mode and config'
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • tick/broadcast: Make idle check independent from mode and config · f32dd117
      Authored by Thomas Gleixner
      Currently the broadcast busy check, which prevents the idle code from
      going into deep idle, works only in one shot mode.
      
      If NOHZ and HIGHRES are off (config or command line) there is no
      sanity check at all, so under certain conditions cpus are allowed to
      go into deep idle, where the local timer stops, and are not woken up
      again because there is no broadcast timer installed or a hrtimer based
      broadcast device is not evaluated.
      
      Move tick_broadcast_oneshot_control() into the common code and provide
      proper subfunctions for the various config combinations.
      
      The common check in tick_broadcast_oneshot_control() is for the C3STOP
      misfeature flag of the local clock event device. If it's not set, idle
      can proceed. If set, further checks are necessary.
      
      Provide checks for the trivial cases:
      
       - If broadcast is disabled in the config, then return busy
      
       - If oneshot mode (NOHZ/HIGHRES) is disabled in the config, return
         busy if the broadcast device is hrtimer based.
      
       - If oneshot mode is enabled in the config, call the original
         tick_broadcast_oneshot_control() function. That function needs
         extra checks which will be implemented in separate patches.
      
      [ Split out from a larger combo patch ]
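      A simplified sketch of the common entry point:

      int tick_broadcast_oneshot_control(enum tick_broadcast_state state)
      {
              struct tick_device *td = this_cpu_ptr(&tick_cpu_device);

              /* Local timer survives deep idle: nothing more to check. */
              if (!(td->evtdev->features & CLOCK_EVT_FEAT_C3STOP))
                      return 0;

              return __tick_broadcast_oneshot_control(state);
      }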
      Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
      Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
      Cc: Catalin Marinas <Catalin.Marinas@arm.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
  15. 19 May 2015 (1 commit)
  16. 07 May 2015 (1 commit)
  17. 03 April 2015 (4 commits)
  18. 02 April 2015 (1 commit)
  19. 01 April 2015 (4 commits)
  20. 16 February 2015 (1 commit)
    • PM / sleep: Make it possible to quiesce timers during suspend-to-idle · 124cf911
      Authored by Rafael J. Wysocki
      The efficiency of suspend-to-idle depends on being able to keep CPUs
      in the deepest available idle states for as much time as possible.
      Ideally, they should only be brought out of idle by system wakeup
      interrupts.
      
      However, timer interrupts occurring periodically prevent that from
      happening and it is not practical to chase all of the "misbehaving"
      timers in a whack-a-mole fashion.  A much more effective approach is
      to suspend the local ticks for all CPUs and the entire timekeeping
      along the lines of what is done during full suspend, which also
      helps to keep suspend-to-idle and full suspend reasonably similar.
      
      The idea is to suspend the local tick on each CPU executing
      cpuidle_enter_freeze() and to make the last of them suspend the
      entire timekeeping.  That should prevent timer interrupts from
      triggering until an IO interrupt wakes up one of the CPUs.  It
      needs to be done with interrupts disabled on all of the CPUs,
      though, because otherwise the suspended clocksource might be
      accessed by an interrupt handler which might lead to fatal
      consequences.
      
      Unfortunately, the existing ->enter callbacks provided by cpuidle
      drivers generally cannot be used for implementing that, because some
      of them re-enable interrupts temporarily and some idle entry methods
      cause interrupts to be re-enabled automatically on exit.  Also some
      of these callbacks manipulate local clock event devices of the CPUs
      which really shouldn't be done after suspending their ticks.
      
      To overcome that difficulty, introduce a new cpuidle state callback,
      ->enter_freeze, that will be guaranteed (1) to keep interrupts
      disabled all the time (and return with interrupts disabled) and (2)
      not to touch the CPU timer devices.  Modify cpuidle_enter_freeze() to
      look for the deepest available idle state with ->enter_freeze present
      and to make the CPU execute that callback with suspended tick (and the
      last of the online CPUs to execute it with suspended timekeeping).
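      The per-CPU freeze step can be sketched as follows (simplified); the
      last CPU to freeze suspends timekeeping as a whole:

      void tick_freeze(void)
      {
              raw_spin_lock(&tick_freeze_lock);

              tick_freeze_depth++;
              if (tick_freeze_depth == num_online_cpus())
                      timekeeping_suspend();
              else
                      tick_suspend_local();

              raw_spin_unlock(&tick_freeze_lock);
      }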
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
  21. 09 October 2014 (1 commit)
  22. 14 September 2014 (1 commit)
    • nohz: Move nohz full init call to tick init · a80e49e2
      Authored by Frederic Weisbecker
      This way we unbloat main.c a bit and, more importantly, we initialize
      nohz full after init_IRQ(). This dependency will be needed in further
      patches because nohz full needs irq work to raise its own IRQ.
      Information about the support for this ability on ARM64 is obtained in
      init_IRQ(), which initializes the pointer to __smp_call_function.
      
      Since tick_init() is called right after init_IRQ(), this is a good place
      to call tick_nohz_init() and prepare for that dependency.
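      The change itself is small (sketch):

      void __init tick_init(void)
      {
              tick_broadcast_init();
              tick_nohz_init();
      }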
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  23. 05 September 2014 (1 commit)
    • nohz: Restore NMI safe local irq work for local nohz kick · 40bea039
      Authored by Frederic Weisbecker
      The local nohz kick is currently used by perf, which needs it to be
      NMI-safe. A recent commit (7d1311b9), though, changed its implementation
      to fire the local kick using the remote kick API. It was convenient to
      make the code more generic, but the remote kick isn't NMI-safe.
      
      As a result:
      
      	WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
      	CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
      	0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
      	0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
      	0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
      	Call Trace:
      	<NMI>  [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a
      	[<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0
      	[<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20
      	[<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140
      	[<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90
      	[<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350
      	[<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0
      	[<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
      	[<ffffffff9a187934>] perf_event_overflow+0x14/0x20
      	[<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410
      	[<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130
      	[<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50
      	[<ffffffff9a007b72>] nmi_handle+0xd2/0x390
      	[<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390
      	[<ffffffff9a0d131b>] ? lock_release+0xab/0x330
      	[<ffffffff9a008062>] default_do_nmi+0x72/0x1c0
      	[<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200
      	[<ffffffff9a008268>] do_nmi+0xb8/0x100
      
      Let's fix this by restoring the use of local irq work for the nohz local
      kick.
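      Sketched, the restored local kick is simply:

      void tick_nohz_full_kick(void)
      {
              if (!tick_nohz_full_cpu(smp_processor_id()))
                      return;

              /* Queue the irq work on the local CPU: NMI-safe, unlike the
               * remote-kick API. */
              irq_work_queue(this_cpu_ptr(&nohz_full_kick_work));
      }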
      Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>