1. 29 Jul, 2019 - 1 commit
  2. 15 May, 2019 - 1 commit
    • x86/cpu: Sanitize FAM6_ATOM naming · 1f1bc822
      Peter Zijlstra authored
      commit f2c4db1bd80720cd8cb2a5aa220d9bc9f374f04e upstream
      
      Going primarily by:
      
        https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors
      
      with additional information gleaned from other related pages; notably:
      
       - Bonnell shrink was called Saltwell
       - Moorefield is the Merrifield refresh, which makes it Airmont
      
      The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE
      
        for i in `git grep -l FAM6_ATOM` ; do
      	sed -i  -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g'		\
      		-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/'		\
      		-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g'		\
      		-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g'	\
      		-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g'		\
      		-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g'	\
      		-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g'	\
      		-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g'	\
      		-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g'	\
      		-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g'		\
      		-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
        done
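
      For reference, the resulting old-to-new mapping as a tiny standalone C
      table (illustrative only, derived from the sed rules above; not part of
      the patch):

        #include <stdio.h>

        /* Old Atom macro suffixes and their new UARCH[_SOCTYPE] names. */
        static const struct { const char *from, *to; } atom_renames[] = {
                { "ATOM_PINEVIEW",    "ATOM_BONNELL"         },
                { "ATOM_LINCROFT",    "ATOM_BONNELL_MID"     },
                { "ATOM_PENWELL",     "ATOM_SALTWELL_MID"    },
                { "ATOM_CLOVERVIEW",  "ATOM_SALTWELL_TABLET" },
                { "ATOM_CEDARVIEW",   "ATOM_SALTWELL"        },
                { "ATOM_SILVERMONT1", "ATOM_SILVERMONT"      },
                { "ATOM_SILVERMONT2", "ATOM_SILVERMONT_X"    },
                { "ATOM_MERRIFIELD",  "ATOM_SILVERMONT_MID"  },
                { "ATOM_MOOREFIELD",  "ATOM_AIRMONT_MID"     },
                { "ATOM_DENVERTON",   "ATOM_GOLDMONT_X"      },
                { "ATOM_GEMINI_LAKE", "ATOM_GOLDMONT_PLUS"   },
        };

        int main(void)
        {
                int i;

                for (i = 0; i < (int)(sizeof(atom_renames) / sizeof(atom_renames[0])); i++)
                        printf("%-20s -> %s\n", atom_renames[i].from, atom_renames[i].to);
                return 0;
        }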
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: dave.hansen@linux.intel.com
      Cc: len.brown@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f1bc822
  3. 10 Mar, 2019 - 1 commit
    • cpufreq: Use struct kobj_attribute instead of struct global_attr · 464b4279
      Viresh Kumar authored
      commit 625c85a62cb7d3c79f6e16de3cfa972033658250 upstream.
      
      The cpufreq_global_kobject is created using the kobject_create_and_add()
      helper, which assigns dynamic_kobj_ktype as the kobj_type, so the
      show/store routines are set to kobj_attr_show() and kobj_attr_store().
      
      These routines pass a struct kobj_attribute as the argument to the
      show/store callbacks, but all the cpufreq files created using the
      cpufreq_global_kobject expect that argument to be of type struct
      attribute. Things work fine currently, as no one accesses the "attr"
      argument, and we may not see issues even if the argument is used, since
      struct kobj_attribute has struct attribute as its first element and so
      both end up at the same address.
      
      But this is logically incorrect, so use struct kobj_attribute instead of
      struct global_attr in the cpufreq core and drivers, and make the
      show/store callbacks take struct kobj_attribute as their argument.
      
      This bug was caught by CFI Clang builds of the Android kernel, which
      catch mismatches in function prototypes for such callbacks.
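
      For comparison, the two callback shapes involved, sketched with
      simplified stand-in types (the real definitions live in the kernel
      headers of that era, so treat the field layout as approximate; the
      boost_show() name at the end is made up):

        #include <sys/types.h>                   /* ssize_t/size_t stand-ins */

        struct kobject;                          /* opaque for this sketch */
        struct attribute { const char *name; unsigned short mode; };  /* simplified */

        /* What kobj_attr_show()/kobj_attr_store() actually hand to callbacks: */
        struct kobj_attribute {
                struct attribute attr;
                ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,
                                char *buf);
                ssize_t (*store)(struct kobject *kobj, struct kobj_attribute *attr,
                                 const char *buf, size_t count);
        };

        /* What the cpufreq global attributes declared before this patch: */
        struct global_attr {
                struct attribute attr;
                ssize_t (*show)(struct kobject *kobj, struct attribute *attr,
                                char *buf);
                ssize_t (*store)(struct kobject *kobj, struct attribute *attr,
                                 const char *buf, size_t count);
        };

        /* After the patch, a global sysfs show callback is declared as: */
        ssize_t boost_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf);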
      Reported-by: Donghee Han <dh.han@samsung.com>
      Reported-by: Sangkyu Kim <skwith.kim@samsung.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      464b4279
  4. 06 Aug, 2018 - 1 commit
  5. 31 Jul, 2018 - 1 commit
  6. 19 Jul, 2018 - 1 commit
    • cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP · eea033d0
      Srinivas Pandruvada authored
      On HWP platforms with Turbo 3.0, the HWP capability max ratio is the
      maximum ratio of that particular core, which can differ from that of
      other cores. If we show the correct maximum frequency in cpufreq sysfs
      via cpuinfo_max_freq and scaling_max_freq, then users can know which
      cores can run faster and pin high-priority tasks to them.
      
      Currently the max turbo frequency across all cores is shown as the max
      frequency, even if some cores can't reach it even for a single-threaded
      workload.
      
      But it is possible that the max ratio in HWP capabilities is set to 0xFF
      or some other invalid high value (e.g. on one KBL NUC). Since the actual
      performance can never exceed the 1-core turbo frequency from MSR
      TURBO_RATIO_LIMIT, use that value as a bound check.
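
      A minimal sketch of that bound check (names and units are illustrative,
      not the driver's):

        /* Clamp a possibly bogus per-core HWP capability ratio (e.g. 0xFF) to
         * the 1-core limit from MSR TURBO_RATIO_LIMIT before reporting it. */
        static unsigned int core_max_khz(unsigned int hwp_cap_max_ratio,
                                         unsigned int turbo_1core_ratio,
                                         unsigned int khz_per_ratio)
        {
                unsigned int ratio = hwp_cap_max_ratio;

                if (ratio > turbo_1core_ratio)          /* bound check */
                        ratio = turbo_1core_ratio;

                return ratio * khz_per_ratio;   /* -> cpuinfo_max_freq / scaling_max_freq */
        }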
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      eea033d0
  7. 18 Jul, 2018 - 1 commit
  8. 02 Jul, 2018 - 1 commit
  9. 19 Jun, 2018 - 1 commit
    • cpufreq: intel_pstate: Fix scaling max/min limits with Turbo 3.0 · ff7c9917
      Srinivas Pandruvada authored
      When scaling max/min settings are changed, internally they are converted
      to a ratio using the maximum 1-core turbo frequency. This works fine
      when the 1-core max is the same on every core, but under Turbo 3.0 that
      will not be the case. For example:
      Core 0: max turbo pstate: 43 (4.3GHz)
      Core 1: max turbo pstate: 45 (4.5GHz)
      In this case the 1-core turbo ratio is the maximum across all cores, so
      it will be 45 (4.5GHz). Suppose scaling max is set to 4GHz (ratio 40)
      for all cores; then on Core 0 it will be
       = max_state * policy->max / max_freq
       = 43 * (4000000/4500000) = 38 (3.8GHz)
      which is 200MHz less than desired. On Core 1 it will be correctly set to
      ratio 40 (4GHz). The same holds true for the scaling min frequency
      limit. So this requires using the correct turbo max frequency for
      Core 0, which in this case is 4.3GHz, i.e. we need to adjust the per-CPU
      cpu->pstate.turbo_freq using the maximum HWP ratio of that core.
      
      This change uses the HWP capability of a core to adjust the max turbo
      frequency. But since Broadwell HWP doesn't use ratios in the HWP
      capabilities, we have to use the legacy max 1-core turbo ratio there.
      This is not a problem, as the HWP capabilities don't differ among cores
      on Broadwell. We do need to check for a non-Broadwell CPU model before
      applying this change, though.
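
      A worked sketch of the conversion with and without the per-core
      adjustment (the helper name is made up; only the arithmetic matters
      here):

        /* Convert a scaling_max_freq limit into a P-state ratio for one core. */
        static int limit_ratio(int core_max_ratio, unsigned int policy_max_khz,
                               unsigned int core_max_khz)
        {
                return core_max_ratio * policy_max_khz / core_max_khz;
        }

        /*
         * With the global 1-core turbo frequency (4.5GHz) used for Core 0:
         *   limit_ratio(43, 4000000, 4500000) == 38  -> 3.8GHz, 200MHz too low
         * With Core 0's own HWP max (4.3GHz):
         *   limit_ratio(43, 4000000, 4300000) == 40  -> 4.0GHz, as requested
         */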
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      ff7c9917
  10. 13 Jun, 2018 - 1 commit
    • treewide: Use array_size() in vzalloc() · fad953ce
      Kees Cook authored
      The vzalloc() function has no 2-factor argument form, so multiplication
      factors need to be wrapped in array_size(). This patch replaces cases of:
      
              vzalloc(a * b)
      
      with:
              vzalloc(array_size(a, b))
      
      as well as handling cases of:
      
              vzalloc(a * b * c)
      
      with:
      
              vzalloc(array3_size(a, b, c))
      
      This does, however, attempt to ignore constant size factors like:
      
              vzalloc(4 * 1024)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
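
      For instance, a typical conversion ends up looking like this (a made-up
      kernel-style example, not taken from any particular driver):

        #include <linux/overflow.h>
        #include <linux/types.h>
        #include <linux/vmalloc.h>

        struct item { u64 key; u64 val; };

        static struct item *alloc_items(size_t count)
        {
                /*
                 * Before: vzalloc(count * sizeof(struct item)) could wrap around
                 * silently.  array_size() saturates to SIZE_MAX on overflow, so
                 * the allocation fails instead of being undersized.
                 */
                return vzalloc(array_size(count, sizeof(struct item)));
        }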
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        vzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        vzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        vzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
        vzalloc(
      -	SIZE * COUNT
      +	array_size(COUNT, SIZE)
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        vzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        vzalloc(C1 * C2 * C3, ...)
      |
        vzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants.
      @@
      expression E1, E2;
      constant C1, C2;
      @@
      
      (
        vzalloc(C1 * C2, ...)
      |
        vzalloc(
      -	E1 * E2
      +	array_size(E1, E2)
        , ...)
      )
      Signed-off-by: Kees Cook <keescook@chromium.org>
      fad953ce
  11. 08 Jun, 2018 - 1 commit
  12. 06 Jun, 2018 - 3 commits
    • cpufreq: intel_pstate: New sysfs entry to control HWP boost · aaaece3d
      Srinivas Pandruvada authored
      A new attribute is added to intel_pstate sysfs to enable/disable
      HWP dynamic performance boost.
      Reported-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      aaaece3d
    • cpufreq: intel_pstate: HWP boost performance on IO wakeup · 52ccc431
      Srinivas Pandruvada authored
      This change uses the SCHED_CPUFREQ_IOWAIT flag to boost HWP performance.
      Since the SCHED_CPUFREQ_IOWAIT flag is set frequently, we don't start
      the boosting steps unless we see the flag in two consecutive ticks. This
      avoids boosting for the IO caused by regular system activity.
      
      To avoid synchronization issues, the actual processing of the flag is
      done in the callback on the local CPU.
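
      A simplified sketch of the "two consecutive ticks" filter (struct and
      field names are illustrative, not the driver's):

        struct iowait_filter {
                unsigned long long last_io_ns;  /* time of previous IOWAIT-flagged update */
                int do_boost;
        };

        /* Called from the scheduler's cpufreq update hook on the local CPU;
         * 'flags' is what the scheduler passes (SCHED_CPUFREQ_IOWAIT etc.). */
        static void hwp_iowait_check(struct iowait_filter *f, unsigned int flags,
                                     unsigned long long now_ns,
                                     unsigned long long tick_ns)
        {
                if (flags & SCHED_CPUFREQ_IOWAIT) {
                        /* Boost only if the previous IOWAIT hint also arrived
                         * within the last tick, i.e. two consecutive ticks. */
                        if (f->last_io_ns && now_ns - f->last_io_ns <= tick_ns)
                                f->do_boost = 1;
                        f->last_io_ns = now_ns;
                }
        }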
      Reported-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      52ccc431
    • cpufreq: intel_pstate: Add HWP boost utility and sched util hooks · e0efd5be
      Srinivas Pandruvada authored
      Add two utility functions: one to boost HWP up gradually and one to
      boost it back down to the default cached HWP request values.
      
      Boost up:
      Boost up raises the HWP request minimum value in steps. This minimum
      value can reach up to the HWP request maximum value, depending on how
      frequently the boost-up function is called. At most, boost up takes
      three steps to reach the maximum, depending on the current HWP request
      levels and HWP capabilities. For example, if the current settings are:
      If P0 (Turbo max) = P1 (Guaranteed max) = min
              No boost at all.
      If P0 (Turbo max) > P1 (Guaranteed max) = min
              Should result in a one-level boost, to P0 only.
      If P0 (Turbo max) = P1 (Guaranteed max) > min
              Should result in a two-level boost:
                      (min + P1)/2 and P1.
      If P0 (Turbo max) > P1 (Guaranteed max) > min
              Should result in a three-level boost:
                      (min + P1)/2, P1 and P0.
      We don't set any level between P0 and P1, as there is no guarantee that
      it would be honored.
      
      Boost down:
      After the system has been idle for a hold time of 3ms, the HWP request
      is reset to the default value from HWP init, or the user-modified one
      set via sysfs.
      
      Caching of HWP Request and Capabilities:
      Store the HWP request value last written to MSR_HWP_REQUEST and the
      value read from MSR_HWP_CAPABILITIES. This avoids reading MSRs in the
      boost utility functions.
      
      The limits calculated by these boost utility functions are based on the
      latest HWP request value, which can be modified by the setpolicy()
      callback. So if user space modifies the minimum perf value, that will be
      accounted for every time boost up is called. There can be contention
      with the user-modified minimum perf; in that case the user value takes
      precedence. For example, if the boost-up function is called via the
      scheduler tick callback just before the HWP_REQUEST MSR is updated from
      the setpolicy() callback, the cached MSR value is already the latest and
      the limits are updated based on the latest user limits, but on return
      the MSR write performed from the setpolicy() callback will update the
      HWP_REQUEST value. That value will then be used until the next time the
      boost-up function is called.
      
      In addition, add a variable to control HWP dynamic boosting. When HWP
      dynamic boost is active, set the HWP-specific update util hook. The
      contents of the utility hooks will be filled in by subsequent patches.
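
      A sketch of one boost-up step as described above (the function and
      parameter names are illustrative; the driver keeps these values in its
      per-CPU data):

        /*
         * One boost-up step of the HWP request minimum:
         *   min -> (min + P1)/2 -> P1 -> P0, skipping levels that coincide.
         */
        static int next_boost_min(int boost_min, int default_min,
                                  int p1_guaranteed, int p0_turbo_max)
        {
                int level1 = (default_min + p1_guaranteed) / 2;

                if (boost_min < level1)
                        return level1;
                if (boost_min < p1_guaranteed)
                        return p1_guaranteed;
                if (boost_min < p0_turbo_max)
                        return p0_turbo_max;    /* nothing between P1 and P0 is used */
                return boost_min;               /* already at the top: no boost */
        }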
      Reported-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      e0efd5be
  13. 15 May, 2018 - 1 commit
    • cpufreq: intel_pstate: allow trace in passive mode · 50e9ffab
      Doug Smythies authored
      Allow use of the trace_pstate_sample trace function when the
      intel_pstate driver is in passive mode. Since the core_busy and
      scaled_busy fields are not used in this mode, and it might be desirable
      to know which path through the driver was taken, intel_cpufreq_target or
      intel_cpufreq_fast_switch, re-task the core_busy field as a flag
      indicator.
      
      The user can then use the intel_pstate_tracer.py utility
      to summarize and plot the trace.
      
      Note: the core_busy field still goes by that name in
      include/trace/events/power.h and within the intel_pstate_tracer.py
      script and CSV file headers, but it is graphed as "performance" and is
      now called core_avg_perf in the intel_pstate driver.
      
      Sometimes, in passive mode, the driver is not called for
      many tens or even hundreds of seconds. The user
      needs to understand, and not be confused by, this limitation.
      Signed-off-by: Doug Smythies <dsmythies@telus.net>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      50e9ffab
  14. 10 Apr, 2018 - 1 commit
  15. 08 Feb, 2018 - 1 commit
  16. 12 Jan, 2018 - 2 commits
  17. 29 Aug, 2017 - 1 commit
  18. 18 Aug, 2017 - 1 commit
  19. 11 Aug, 2017 - 1 commit
  20. 10 Aug, 2017 - 2 commits
  21. 04 Aug, 2017 - 1 commit
    • cpufreq: intel_pstate: Improve IO performance with per-core P-states · 7bde2d50
      Srinivas Pandruvada authored
      In the current implementation, the response latency between seeing
      SCHED_CPUFREQ_IOWAIT set and the actual P-state adjustment can be up
      to 10ms.  It can be reduced by bumping up the P-state to the max at
      the time SCHED_CPUFREQ_IOWAIT is passed to intel_pstate_update_util().
      With this change, the IO performance improves significantly.
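
      Conceptually the change amounts to the following (a simplified sketch
      with made-up helper names; the real intel_pstate_update_util() records
      the IO-wait hint and applies the boost in its sampling path):

        /* Scheduler utilization callback: on an IO-waiter wakeup, go straight
         * to the maximum P-state instead of waiting up to 10ms for the next
         * sample. */
        static void update_util_hook(struct cpu_perf *cpu, unsigned int flags)
        {
                if (flags & SCHED_CPUFREQ_IOWAIT) {
                        set_pstate(cpu, cpu->max_pstate);       /* boost immediately */
                        return;
                }
                /* ...normal sampling and P-state selection... */
        }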
      
      For a simple "grep -r . linux" (here "linux" is the kernel source
      folder), with caches dropped every time, on a Broadwell Xeon workstation
      with per-core P-states, the user and system times are shorter by as much
      as 30% - 40%.
      
      The same performance difference was not observed on clients that don't
      support per-core P-states.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw: Changelog ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      7bde2d50
  22. 01 Aug, 2017 - 2 commits
    • sched: cpufreq: Allow remote cpufreq callbacks · 674e7541
      Viresh Kumar authored
      With Android UI and benchmarks the latency of cpufreq response to
      certain scheduling events can become very critical. Currently, callbacks
      into cpufreq governors are only made from the scheduler if the target
      CPU of the event is the same as the current CPU. This means there are
      certain situations where a target CPU may not run the cpufreq governor
      for some time.
      
      One testcase to show this behavior is where a task starts running on
      CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
      system is configured such that the new tasks should receive maximum
      demand initially, this should result in CPU0 increasing frequency
      immediately. But because of the above-mentioned limitation, this does
      not occur.
      
      This patch updates the scheduler core to call the cpufreq callbacks for
      remote CPUs as well.
      
      The schedutil, ondemand and conservative governors are updated to
      process cpufreq utilization update hooks called for remote CPUs where
      the remote CPU is managed by the cpufreq policy of the local CPU.
      
      The intel_pstate driver is updated to always reject remote callbacks.
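
      The governor-side decision reduces to a check along these lines (a
      self-contained model with made-up names, not the kernel code):

        #include <stdbool.h>

        struct policy_model {
                int this_cpu;                   /* CPU whose update hook is running */
                unsigned long managed_cpus;     /* bitmask of CPUs this policy manages */
        };

        /* Should the hook running on p->this_cpu handle an update aimed at
         * target_cpu?  Remote CPUs are accepted only if this policy manages
         * them; intel_pstate simply rejects every remote callback. */
        static bool should_process(const struct policy_model *p, int target_cpu)
        {
                if (target_cpu == p->this_cpu)
                        return true;            /* local update, as before */
                return (p->managed_cpus >> target_cpu) & 1;
        }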
      
      This is tested with a couple of use cases (Android: hackbench,
      recentfling, galleryfling, vellamo; Ubuntu: hackbench) on an ARM HiKey
      board (64-bit octa-core, single policy). Only galleryfling showed minor
      improvements, while the others didn't show much deviation.
      
      The reason is that this patch only targets a corner case, where all of
      the following must be true to improve performance, and that doesn't
      happen too often with these tests:
      
      - Task is migrated to another CPU.
      - The task has high demand, and should take the target CPU to higher
        OPPs.
      - And the target CPU doesn't call into the cpufreq governor until the
        next tick.
      
      Based on initial work from Steve Muckle.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Saravana Kannan <skannan@codeaurora.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      674e7541
    • cpufreq: intel_pstate: Drop INTEL_PSTATE_HWP_SAMPLING_INTERVAL · f5c13f44
      Rafael J. Wysocki authored
      After commit 62611cb9 (intel_pstate: delete scheduler hook in HWP
      mode) the INTEL_PSTATE_HWP_SAMPLING_INTERVAL is not used anywhere in
      the code, so drop it.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      f5c13f44
  23. 28 Jul, 2017 - 1 commit
    • cpufreq: intel_pstate: Drop ->get from intel_pstate structure · 22baebd4
      Rafael J. Wysocki authored
      The ->get callback in the intel_pstate structure was mostly there
      for the scaling_cur_freq sysfs attribute to work, but after commit
      f8475cef (x86: use common aperfmperf_khz_on_cpu() to calculate
      KHz using APERF/MPERF) that attribute uses arch_freq_get_on_cpu()
      provided by the x86 arch code on all processors supported by
      intel_pstate, so it doesn't need the ->get callback from the
      driver any more.
      
      Moreover, the very presence of the ->get callback in the intel_pstate
      structure causes the cpuinfo_cur_freq attribute to be present when
      intel_pstate operates in the active mode, which is bogus, because
      the role of that attribute is to return the current CPU frequency
      as seen by the hardware.  For intel_pstate, though, this is just an
      average frequency and not really current, but computed for the
      previous sampling interval (the actual current frequency may be
      way different at the point this value is obtained by reading from
      cpuinfo_cur_freq), and after commit 82b4e03e (intel_pstate: skip
      scheduler hook when in "performance" mode) the value in
      cpuinfo_cur_freq may be stale or just 0, depending on the driver's
      operation mode.  In fact, however, on the hardware supported by
      intel_pstate there is no way to read the current CPU frequency
      from it, so the cpuinfo_cur_freq attribute should not be present
      at all when this driver is in use.
      
      For this reason, drop intel_pstate_get() and clear the ->get
      callback pointer pointing to it, so that the cpuinfo_cur_freq is
      not present for intel_pstate in the active mode any more.
      
      Fixes: 82b4e03e (intel_pstate: skip scheduler hook when in "performance" mode)
      Reported-by: Huaisheng Ye <yehs1@lenovo.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      22baebd4
  24. 27 Jul, 2017 - 2 commits
    • cpufreq: intel_pstate: Drop ->update_util from pstate_funcs · c4f3f70c
      Rafael J. Wysocki authored
      All systems use the same P-state selection "powersave" algorithm
      in the active mode if HWP is not used, so there's no need to provide
      a pointer for it in struct pstate_funcs any more.
      
      Drop ->update_util from struct pstate_funcs and make
      intel_pstate_set_update_util_hook() use intel_pstate_update_util()
      directly.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      c4f3f70c
    • cpufreq: intel_pstate: Do not use PID-based P-state selection · 9d0ef7af
      Rafael J. Wysocki authored
      All systems with a defined ACPI preferred profile that are not
      "servers" have been using the load-based P-state selection algorithm
      in intel_pstate since 4.12-rc1 (mobile systems and laptops have been
      using it since 4.10-rc1) and no problems with it have been reported
      to date.  In particular, no regressions with respect to the PID-based
      P-state selection have been reported.  Also testing indicates that
      the P-state selection algorithm based on CPU load is generally on par
      with the PID-based algorithm performance-wise, and for some workloads
      it turns out to be better than the other one, while being more
      straightforward and easier to understand at the same time.
      
      Moreover, the PID-based P-state selection algorithm in intel_pstate
      is known to be unstable in some situations and generally problematic;
      the issues with it are hard to address and it has become a
      significant maintenance burden.
      
      For these reasons, make intel_pstate use the "powersave" P-state
      selection algorithm based on CPU load in the active mode on all
      systems and drop the PID-based P-state selection code along with
      all things related to it from the driver.  Also update the
      documentation accordingly.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      9d0ef7af
  25. 26 Jul, 2017 - 1 commit
  26. 14 Jul, 2017 - 1 commit
    • cpufreq: intel_pstate: Correct the busy calculation for KNL · 6e34e1f2
      Srinivas Pandruvada authored
      The busy percent calculated for the Knights Landing (KNL) platform
      is 1024 times smaller than the correct busy value.  This causes
      performance to get stuck at the lowest ratio.
      
      The scaling algorithm used for KNL is performance-based, but it still
      looks at the CPU load to set the scaled busy factor to 0 when the load
      is less than 1 percent. In this case, since the computed load is 1024x
      smaller than it should be, the scaled busy factor will always be 0,
      irrespective of how busy the CPU actually is.
      
      This needs a fix similar to the turbostat one in commit b2b34dfe
      (tools/power turbostat: KNL workaround for %Busy and Avg_MHz).
      
      For this reason, add one more processor-specific callback to specify an
      MPERF multiplier, represented as a number of bit positions by which to
      shift the value of that register to the left to compensate for its rate
      difference with respect to the TSC. This shift value is used during CPU
      busy calculations.
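
      In effect the busy calculation becomes something like this (a sketch; a
      shift of 10 corresponds to the 1024x factor above, and the names are
      illustrative):

        /* On KNL, MPERF advances 1024x slower than expected relative to the
         * TSC, so a per-model shift scales it back up before computing load. */
        static unsigned int busy_percent(unsigned long long delta_mperf,
                                         unsigned long long delta_tsc,
                                         unsigned int mperf_shift)
        {
                if (!delta_tsc)
                        return 0;
                return (unsigned int)(((delta_mperf << mperf_shift) * 100ULL) / delta_tsc);
        }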
      
      Fixes: ffb81056 (intel_pstate: Avoid getting stuck in high P-states when idle)
      Reported-and-tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
      [ rjw: Changelog ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      6e34e1f2
  27. 12 Jul, 2017 - 1 commit
    • cpufreq: intel_pstate: Fix ratio setting for min_perf_pct · d4436c0d
      Srinivas Pandruvada authored
      When the minimum performance limit percentage is set to the power-up
      default, it is possible that the minimum performance ratio is off by
      one.
      
      In the set_policy() callback the minimum ratio is calculated by applying
      global.min_perf_pct to turbo_ratio and rounding up, but the power-up
      default global.min_perf_pct is already rounded up to the next percent in
      min_perf_pct_min(). That results in two round-up operations, so for the
      default min_perf_pct one of them is not required.
      
      It is better to remove rounding up in min_perf_pct_min() as this
      matches the displayed min_perf_pct prior to commit c5a2ee7d
      (cpufreq: intel_pstate: Active mode P-state limits rework) in 4.12.
      
      For example, on a platform with a max turbo ratio of 37 and a minimum
      ratio of 10, min_perf_pct came out as 28 with the above commit. Before
      that commit it was 27, and it will be 27 again after this change.
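
      The double round-up can be reproduced with a few lines of standalone
      arithmetic (the exact expressions in the driver differ, but the rounding
      is the same):

        #include <stdio.h>

        #define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))

        int main(void)
        {
                int turbo = 37, min = 10;

                int pct_rounded = DIV_ROUND_UP(min * 100, turbo);       /* 28 */
                int pct_plain   = min * 100 / turbo;                    /* 27 */

                /* set_policy() rounds up again when converting back to a ratio: */
                printf("ratio from %d%%: %d\n", pct_rounded,
                       DIV_ROUND_UP(turbo * pct_rounded, 100));         /* 11: off by one */
                printf("ratio from %d%%: %d\n", pct_plain,
                       DIV_ROUND_UP(turbo * pct_plain, 100));           /* 10: correct   */
                return 0;
        }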
      
      Fixes: 1a4fe38a (cpufreq: intel_pstate: Remove max/min fractions to limit performance)
      Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      d4436c0d
  28. 05 Jul, 2017 - 1 commit
  29. 30 Jun, 2017 - 1 commit
  30. 27 Jun, 2017 - 2 commits
  31. 24 Jun, 2017 - 1 commit
    • cpufreq: intel_pstate: Remove max/min fractions to limit performance · 1a4fe38a
      Srinivas Pandruvada authored
      In the current model the max/min perf limits are kept as fractions: the
      current user space limits relative to the allowed max_freq, or to 100%
      for the global limits. This results in wrong ratio limit calculations
      because of rounding issues for some user space limits.
      
      Initially we tried to solve this by using more shift bits to increase
      precision, but there are still isolated cases with errors.
      
      This can be avoided by using ratios altogether. Since cpuinfo.max_freq
      is obtained by multiplying the scaling factor by the max ratio, we can
      easily keep the max/min limits as ratios rather than fractions.
      
      For example:
      if the max ratio = 36
      cpuinfo.max_freq = 36 * 100000 = 3600000
      
      Suppose user space sets a limit of 1200000, then we can calculate
      max ratio limit as
      = 36 * 1200000 / 3600000
      = 12
      This will be correct for any user limits.
      
      The other advantage is that we don't need to do any calculation in the
      fast path, as the ratio limit is already calculated via the set_policy()
      callback.
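
      The example above in code form (a trivial sketch, not the driver's
      set_policy() implementation):

        /* Keep the limit as a ratio, computed once in set_policy(). */
        static int max_ratio_limit(int max_ratio, unsigned int policy_max_khz,
                                   unsigned int cpuinfo_max_khz)
        {
                return max_ratio * policy_max_khz / cpuinfo_max_khz;
        }

        /* max_ratio_limit(36, 1200000, 3600000) == 12, matching the example. */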
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      1a4fe38a
  32. 05 Jun, 2017 - 1 commit
  33. 12 May, 2017 - 1 commit
    • intel_pstate: use updated msr-index.h HWP.EPP values · 3cedbc5a
      Len Brown authored
      intel_pstate exports sysfs attributes for setting and observing HWP.EPP.
      These attributes use strings to describe 4 operating states, and
      inside the driver, these strings are mapped to numerical register
      values.
      
      The authoritative mapping between the strings and numerical HWP.EPP
      values is now globally defined in msr-index.h, replacing the outdated
      mapping that was open-coded into intel_pstate.c.
      
      new old string
      --- --- ------
        0   0 performance
      128  64 balance_performance
      192 128 balance_power
      255 192 power
      
      Note that the HW and BIOS default value on most systems is 128, which
      intel_pstate will now call "balance_performance", while it used to call
      it "balance_power".
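
      The new string-to-value mapping, as a small table sketch (strings as
      exposed by intel_pstate, values taken from the table above):

        static const struct { const char *name; unsigned char epp; } epp_values[] = {
                { "performance",         0   },  /* most performance-biased          */
                { "balance_performance", 128 },  /* HW/BIOS default on most systems  */
                { "balance_power",       192 },
                { "power",               255 },  /* most power-saving-biased         */
        };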
      Signed-off-by: Len Brown <len.brown@intel.com>
      3cedbc5a