1. 23 May 2018 (1 commit)
  2. 22 May 2018 (2 commits)
    • cpufreq: schedutil: Cleanup and document iowait boost · fd7d5287
      Authored by Patrick Bellasi
      The iowait boosting code has recently been updated to add a progressive
      boosting behavior, which makes it less aggressive in boosting tasks
      doing only sporadic IO operations and thus more energy efficient, for
      example on mobile platforms.
      
      However, the current code is now a bit convoluted. Some functionality
      (e.g. the iowait boost reset) is replicated in different paths, and its
      documentation is slightly misaligned.

      Let's clean up the code by consolidating all the IO wait boosting
      related functionality within a few dedicated functions and better
      define their roles:
      
      - sugov_iowait_boost: set/increase the IO wait boost of a CPU
      - sugov_iowait_apply: apply/reduce the IO wait boost of a CPU
      
      Both of these functions are used at every sugov update, and they make
      use of a unified IO wait boost reset policy provided by:

      - sugov_iowait_reset: reset/disable the IO wait boost of a CPU
           if a CPU has not been updated for more than one tick
      
      This makes possible a cleaner and more self-contained design for the IO
      wait boosting code since the rest of the sugov update routines, both for
      single and shared frequency domains, follow the same template:
      
         /* Configure IO boost, if required */
         sugov_iowait_boost()
      
         /* Return here if freq change is in progress or throttled */
      
         /* Collect and aggregate utilization information */
         sugov_get_util()
         sugov_aggregate_util()
      
         /*
          * Add IO boost, if currently enabled, on top of the aggregated
          * utilization value
          */
         sugov_iowait_apply()
      
      As an extra bonus, let's also add documentation for the new
      functions and better align the in-code documentation.
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
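
      The template above maps onto a small per-update skeleton. The C sketch
      below illustrates that flow with stand-in types and declarations; the
      struct, the sugov_should_update_freq() and sugov_set_freq() names, and
      all of the signatures are assumptions made for this sketch, not the
      kernel implementation:

         #include <stdbool.h>
         #include <stdint.h>

         struct sg_cpu;  /* opaque stand-in for the per-CPU schedutil state */

         /* Stand-in declarations for the helpers named above. */
         bool sugov_should_update_freq(struct sg_cpu *sg, uint64_t now);
         void sugov_iowait_boost(struct sg_cpu *sg, uint64_t now,
                                 unsigned int flags);
         void sugov_get_util(struct sg_cpu *sg);
         unsigned long sugov_aggregate_util(struct sg_cpu *sg);
         unsigned long sugov_iowait_apply(struct sg_cpu *sg, uint64_t now,
                                          unsigned long util);
         void sugov_set_freq(struct sg_cpu *sg, unsigned long util);

         void sugov_update_one(struct sg_cpu *sg, uint64_t now,
                               unsigned int flags)
         {
                 unsigned long util;

                 /* Configure the IO boost, if this update requests one. */
                 sugov_iowait_boost(sg, now, flags);

                 /* Return here if a freq change is in progress or throttled. */
                 if (!sugov_should_update_freq(sg, now))
                         return;

                 /* Collect and aggregate utilization information. */
                 sugov_get_util(sg);
                 util = sugov_aggregate_util(sg);

                 /* Add the IO boost, if currently enabled, on top of it. */
                 util = sugov_iowait_apply(sg, now, util);

                 sugov_set_freq(sg, util);
         }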
    • cpufreq: schedutil: Fix iowait boost reset · 295f1a99
      Authored by Patrick Bellasi
      A more energy efficient update of the IO wait boosting mechanism has
      been introduced in:
      
         commit a5a0809b ("cpufreq: schedutil: Make iowait boost more energy efficient")
      
      where the boost value is expected to be:
      
       - doubled at each successive wakeup from IO,
         starting from the minimum frequency supported by a CPU

       - reset when a CPU is not updated for more than one tick,
         by either disabling the IO wait boost or resetting its value to the
         minimum frequency if this new update requires an IO boost.
      
      This approach is supposed to "ignore" boosting for sporadic wakeups
      from IO, while still getting the frequency boosted to the maximum to
      benefit long sequences of wakeups from IO operations.
      
      However, these assumptions are not always satisfied.
      For example, when an IO-boosted CPU enters idle for more than one tick
      and then wakes up after an IO wait, since in sugov_set_iowait_boost() we
      first check the IOWAIT flag, we keep doubling the iowait boost instead
      of restarting from the minimum frequency value.
      
      This misbehavior could happen mainly on non-shared frequency domains,
      thus defeating the energy efficiency optimization, but it can also
      happen on shared frequency domain systems.
      
      Let's fix this issue in sugov_set_iowait_boost() by:
       - first checking the IO wait boost reset conditions,
         possibly resetting the boost value
       - then applying the correct IO boost value,
         if required by the caller
      
      Fixes: a5a0809b ("cpufreq: schedutil: Make iowait boost more energy efficient")
      Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
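
      A minimal user-space model of the corrected ordering described above;
      the constants, types and names are simplified stand-ins rather than
      the kernel's, but the point is the same: the staleness check runs
      before the IOWAIT flag is looked at, so a boost left over from before
      a long idle period restarts from the minimum instead of doubling:

         #include <stdbool.h>
         #include <stdint.h>

         /* Simplified stand-ins; not the kernel's types or constants. */
         #define TICK_NS     4000000ULL  /* one tick, assuming HZ=250    */
         #define MIN_BOOST   200U        /* policy's minimum frequency   */
         #define MAX_BOOST   2000U       /* policy's maximum frequency   */
         #define FLAG_IOWAIT 0x1U        /* update follows an IO wakeup  */

         struct cpu_boost {
                 uint64_t last_update;       /* time of the previous update */
                 unsigned int iowait_boost;  /* 0 means "boost disabled"    */
         };

         void set_iowait_boost(struct cpu_boost *b, uint64_t now,
                               unsigned int flags)
         {
                 /* 1) First check staleness: no update for over one tick. */
                 bool stale = now - b->last_update > TICK_NS;

                 if (stale) {
                         /* Restart from the minimum (or disable the boost)
                          * rather than doubling a stale value. */
                         b->iowait_boost =
                                 (flags & FLAG_IOWAIT) ? MIN_BOOST : 0;
                 } else if (flags & FLAG_IOWAIT) {
                         /* 2) Fresh IO wakeup: keep growing the boost. */
                         b->iowait_boost = b->iowait_boost ?
                                 b->iowait_boost * 2 : MIN_BOOST;
                         if (b->iowait_boost > MAX_BOOST)
                                 b->iowait_boost = MAX_BOOST;
                 }

                 b->last_update = now;
         }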
  3. 15 May 2018 (2 commits)
    • cpufreq: schedutil: Don't set next_freq to UINT_MAX · ecd28842
      Authored by Viresh Kumar
      The schedutil governor sets sg_policy->next_freq to UINT_MAX on certain
      occasions to discard the cached value of next_freq:
      - In sugov_start(), when the schedutil governor is started for a group
        of CPUs.
      - And whenever we need to force a frequency update before the
        rate-limit duration has elapsed, which happens when:
        - there is an update of the cpufreq policy limits, or
        - the utilization of the DL scheduling class increases.
      
      As a result, get_next_freq() doesn't return the cached next_freq value
      but recalculates the next frequency instead.

      But giving special meaning to a particular frequency value makes the
      code less readable and error-prone. We recently fixed a bug where the
      UINT_MAX value was treated as a valid frequency in
      sugov_update_single().

      All we need is a flag that can be used to discard the value of
      sg_policy->next_freq, and we already have need_freq_update for that.
      Let's reuse it instead of setting next_freq to UINT_MAX.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
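
      A minimal sketch of the idea with simplified stand-in names (not the
      kernel's structures): an explicit flag next to the cached frequency
      replaces the UINT_MAX sentinel, and the paths that used to write
      UINT_MAX just set the flag instead:

         #include <stdbool.h>

         /* Simplified stand-in for the governor's cached state. */
         struct freq_cache {
                 unsigned int next_freq;   /* last frequency requested    */
                 bool need_freq_update;    /* discard the cache next time */
         };

         /* Instead of writing UINT_MAX into next_freq, just set the flag. */
         void force_freq_update(struct freq_cache *c)
         {
                 c->need_freq_update = true;
         }

         /* Return the cached frequency unless a refresh has been forced. */
         unsigned int next_frequency(struct freq_cache *c,
                                     unsigned int computed)
         {
                 if (!c->need_freq_update && computed == c->next_freq)
                         return c->next_freq;

                 c->need_freq_update = false;
                 c->next_freq = computed;
                 return computed;
         }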
    • Revert "cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily" · 1b04722c
      Authored by Dietmar Eggemann
      This reverts commit e2cabe48.
      
      Lifting the restriction that the sugov kthread is bound to the
      policy->related_cpus for a system with a slow-switching cpufreq driver
      which is able to perform DVFS from any CPU (e.g. cpufreq-dt) is not
      only not beneficial, it also harms Energy-Aware Scheduling (EAS) on
      systems with asymmetric CPU capacities (e.g. Arm big.LITTLE).

      The sugov kthread which does the update for the little CPUs could
      potentially run on a big CPU. It could prevent the big cluster from
      entering deeper idle states although all the tasks are running on the
      little cluster.
      
      Example: hikey960 w/ 4.16.0-rc6-+
               Arm big.LITTLE with per-cluster DVFS
      
      root@h960:~# cat /proc/cpuinfo | grep "^CPU part"
      CPU part        : 0xd03 (Cortex-A53, little cpu)
      CPU part        : 0xd03
      CPU part        : 0xd03
      CPU part        : 0xd03
      CPU part        : 0xd09 (Cortex-A73, big cpu)
      CPU part        : 0xd09
      CPU part        : 0xd09
      CPU part        : 0xd09
      
      root@h960:/sys/devices/system/cpu/cpufreq# ls
      policy0  policy4  schedutil
      
      root@h960:/sys/devices/system/cpu/cpufreq# cat policy*/related_cpus
      0 1 2 3
      4 5 6 7
      
      (1) w/o the revert:
      
      root@h960:~# ps -eo pid,class,rtprio,pri,psr,comm | awk 'NR == 1 ||
      /sugov/'
        PID CLS RTPRIO PRI PSR COMMAND
        1489 #6      0 140   1 sugov:0
        1490 #6      0 140   0 sugov:4
      
      The sugov kthread sugov:4 responsible for policy4 runs on cpu0. (In this
      case both sugov kthreads run on little cpus).
      
      cross policy (cluster) remote callback example:
      ...
      migration/1-14 [001] enqueue_task_fair: this_cpu=1 cpu_of(rq)=5
      migration/1-14 [001] sugov_update_shared: this_cpu=1 sg_cpu->cpu=5
                           sg_cpu->sg_policy->policy->related_cpus=4-7
        sugov:4-1490 [000] sugov_work: this_cpu=0
                           sg_cpu->sg_policy->policy->related_cpus=4-7
      ...
      
      The remote callback (this_cpu=1, target_cpu=5) is executed on cpu=0.
      
      (2) w/ the revert:
      
      root@h960:~# ps -eo pid,class,rtprio,pri,psr,comm | awk 'NR == 1 ||
      /sugov/'
        PID CLS RTPRIO PRI PSR COMMAND
        1491 #6      0 140   2 sugov:0
        1492 #6      0 140   4 sugov:4
      
      The sugov kthread sugov:4 responsible for policy4 runs on cpu4.
      
      cross policy (cluster) remote callback example:
      ...
      migration/1-14 [001] enqueue_task_fair: this_cpu=1 cpu_of(rq)=7
      migration/1-14 [001] sugov_update_shared: this_cpu=1 sg_cpu->cpu=7
                           sg_cpu->sg_policy->policy->related_cpus=4-7
        sugov:4-1492 [004] sugov_work: this_cpu=4
                           sg_cpu->sg_policy->policy->related_cpus=4-7
      ...
      
      The remote callback (this_cpu=1, target_cpu=7) is executed on cpu=4.
      
      Now the sugov kthread executes again on the policy (cluster) for which
      the Operating Performance Point (OPP) should be changed.
      This avoids the problem of an otherwise idle policy (cluster) running
      schedutil (the sugov kthread) on behalf of another one.
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
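
      The restriction the revert restores comes down to a single affinity
      call when the sugov worker thread is created. A hedged kernel-style
      fragment, not the literal revert diff; the wrapper function here is
      invented for illustration:

         #include <linux/cpufreq.h>
         #include <linux/kthread.h>

         /* Illustrative wrapper, not a kernel function: keep the policy's
          * DVFS worker on the CPUs it actually manages, so an otherwise
          * idle cluster is not woken up to run it. */
         static void bind_sugov_worker(struct task_struct *thread,
                                       struct cpufreq_policy *policy)
         {
                 kthread_bind_mask(thread, policy->related_cpus);
         }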
  4. 14 May 2018 (5 commits)
    • Linux 4.17-rc5 · 67b8d5c7
      Authored by Linus Torvalds
    • Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 66e1c94d
      Authored by Linus Torvalds
      Pull x86/pti updates from Thomas Gleixner:
       "A mixed bag of fixes and updates for the ghosts which are hunting us.
      
        The scheduler fixes have been pulled into that branch to avoid
        conflicts.
      
         - A set of fixes to address a kthread_parkme() race which caused
           lost wakeups and loss of state.
      
         - A deadlock fix for stop_machine() solved by moving the wakeups
           outside of the stopper_lock held region.
      
         - A set of Spectre V1 array access restrictions. The possible
           problematic spots were discovered by Dan Carpenter's new checks
           in smatch.
      
         - Removal of an unused file which was forgotten when the rest of that
           functionality was removed"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/vdso: Remove unused file
        perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr
        perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver
        perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()
        perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*
        perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[]
        sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
        sched/core: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
        sched/core: Introduce set_special_state()
        kthread, sched/wait: Fix kthread_parkme() completion issue
        kthread, sched/wait: Fix kthread_parkme() wait-loop
        sched/fair: Fix the update of blocked load when newly idle
        stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
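
      The Spectre-V1 fixes listed above all apply the same clamping pattern.
      A hedged illustration with an invented table and bound, not code taken
      from any of the patches:

         #include <linux/errno.h>
         #include <linux/kernel.h>
         #include <linux/nospec.h>

         /* Illustrative only: bound-check an attacker-influenced index,
          * then clamp it under speculation as well before indexing. */
         static int example_lookup(unsigned int idx)
         {
                 static const int table[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

                 if (idx >= ARRAY_SIZE(table))
                         return -EINVAL;

                 idx = array_index_nospec(idx, ARRAY_SIZE(table));

                 return table[idx];
         }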
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 86a4ac43
      Authored by Linus Torvalds
      Pull scheduler fix from Thomas Gleixner:
       "Revert the new NUMA aware placement approach which turned out to
        create more problems than it solved"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()"
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · baeda713
      Authored by Linus Torvalds
      Pull perf tooling fixes from Thomas Gleixner:
       "Another small set of perf tooling fixes and updates:
      
         - Revert "perf pmu: Fix pmu events parsing rule", as it broke Intel
           PT event description parsing (Arnaldo Carvalho de Melo)
      
         - Sync x86's cpufeatures.h and kvm UAPI headers with the kernel
           sources, suppressing the ABI drift warnings (Arnaldo Carvalho de
           Melo)
      
         - Remove duplicated entry for westmereep-dp in Intel's mapfile.csv
           (William Cohen)
      
         - Fix typo in 'perf bench numa' options description (Yisheng Xie)"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Revert "perf pmu: Fix pmu events parsing rule"
        tools headers kvm: Sync ARM UAPI headers with the kernel sources
        tools headers kvm: Sync uapi/linux/kvm.h with the kernel sources
        tools headers: Sync x86 cpufeatures.h with the kernel sources
        perf vendor events intel: Remove duplicated entry for westmereep-dp in mapfile.csv
        perf bench numa: Fix typo in options
    • Merge tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping · 0503fd65
      Authored by Linus Torvalds
      Pull dma-mapping fix from Christoph Hellwig:
       "Just one little fix from Jean to avoid a harmless but very annoying
        warning, especially for the drm code"
      
      * tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: silent unwanted warning "buffer is full"
  5. 13 May 2018 (3 commits)
  6. 12 May 2018 (27 commits)