1. 03 June 2016, 1 commit
    • cpufreq: governor: Get rid of governor events · e788892b
      Committed by Rafael J. Wysocki
      The design of the cpufreq governor API is not very straightforward,
      as struct cpufreq_governor provides only one callback to be invoked
      from different code paths for different purposes.  The purpose it is
      invoked for is determined by its second "event" argument, causing it
      to act as a "callback multiplexer" of sorts.
      
      Unfortunately, that leads to extra complexity in governors, some of
      which implement the ->governor() callback as a switch statement
      that simply checks the event argument and invokes a separate function
      to handle that specific event.
      
      That extra complexity can be eliminated by replacing the all-purpose
      ->governor() callback with a family of callbacks that carry out specific
      governor operations: initialization and exit, start and stop, and policy
      limits updates.  That also turns out to reduce the code size, so do it.
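
      A minimal sketch of the shape of this change (simplified declarations,
      not the exact kernel definitions):

       struct cpufreq_policy;  /* opaque here */

       /* Before: one multiplexed callback, dispatched on an "event" code. */
       struct old_style_governor {
               int (*governor)(struct cpufreq_policy *policy, unsigned int event);
       };

       /* After: one dedicated callback per governor operation. */
       struct new_style_governor {
               int  (*init)(struct cpufreq_policy *policy);
               void (*exit)(struct cpufreq_policy *policy);
               int  (*start)(struct cpufreq_policy *policy);
               void (*stop)(struct cpufreq_policy *policy);
               void (*limits)(struct cpufreq_policy *policy);
       };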
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
  2. 30 May 2016, 4 commits
  3. 28 May 2016, 1 commit
    • remove lots of IS_ERR_VALUE abuses · 287980e4
      Committed by Arnd Bergmann
      Most users of IS_ERR_VALUE() in the kernel are wrong, as they
      pass an 'int' into a function that takes an 'unsigned long'
      argument. This happens to work because the type is sign-extended
      on 64-bit architectures before it gets converted into an
      unsigned type.
      
      However, anything that passes an 'unsigned short' or 'unsigned int'
      argument into IS_ERR_VALUE() is guaranteed to be broken, as are
      8-bit integers and types that are wider than 'unsigned long'.
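
      A small user-space illustration of the failure mode (simplified macro
      without unlikely(); the behaviour shown is for LP64, i.e. 64-bit Linux):

       #include <stdio.h>

       #define MAX_ERRNO 4095
       #define IS_ERR_VALUE(x) ((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

       int main(void)
       {
               int err = -22;           /* like -EINVAL: sign-extends when widened */
               unsigned int uerr = -22; /* same value, but zero-extends to 0xffffffea */

               printf("int:          %d\n", IS_ERR_VALUE(err));  /* 1: error detected */
               printf("unsigned int: %d\n", IS_ERR_VALUE(uerr)); /* 0: error missed   */
               return 0;
       }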
      
      Andrzej Hajda has already fixed a lot of the worst abusers that
      were causing actual bugs, but it would be nice to prevent any
      users that are not passing 'unsigned long' arguments.
      
      This patch changes all users of IS_ERR_VALUE() that I could find
      on 32-bit ARM randconfig builds and x86 allmodconfig. For the
      moment, this doesn't change the definition of IS_ERR_VALUE()
      because there are probably still architecture specific users
      elsewhere.
      
      Almost all the warnings I got are for files that are better off
      using 'if (err)' or 'if (err < 0)'.
      The only legitimate user I could find that we get a warning for
      is the (32-bit only) freescale fman driver, so I did not remove
      the IS_ERR_VALUE() there but changed the type to 'unsigned long'.
      For 9pfs, I just worked around one user whose calling conventions
      are so obscure that I did not dare change the behavior.
      
      I was using this definition for testing:
      
       #define IS_ERR_VALUE(x) ((unsigned long*)NULL == (typeof (x)*)NULL && \
             unlikely((unsigned long long)(x) >= (unsigned long long)(typeof(x))-MAX_ERRNO))
      
      which ends up making all 16-bit or wider types work correctly with
      the most plausible interpretation of what IS_ERR_VALUE() was supposed
      to return according to its users, but also causes a compile-time
      warning for any users that do not pass an 'unsigned long' argument.
      
      I suggested this approach earlier this year, but back then we ended
      up deciding to just fix the users that are obviously broken. After
      the initial warning that caused me to get involved in the discussion
      (fs/gfs2/dir.c) showed up again in the mainline kernel, Linus
      asked me to send the whole thing again.
      
      [ Updated the 9p parts as per Al Viro  - Linus ]
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Andrzej Hajda <a.hajda@samsung.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.org/lkml/2016/1/7/363
      Link: https://lkml.org/lkml/2016/5/27/486
      Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> # For nvmem part
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 18 May 2016, 4 commits
  5. 17 May 2016, 1 commit
  6. 13 May 2016, 5 commits
  7. 12 May 2016, 5 commits
  8. 11 May 2016, 2 commits
    • cpufreq: powernv: del_timer_sync when global and local pstate are equal · 0bc10b93
      Committed by Akshay Adiga
      When the global and local pstates are equal in a powernv_target_index()
      call, we don't queue a timer.  However, a timer may already be queued
      from an earlier call, which would make it fire one additional time for
      no purpose, so cancel any such pending timer with del_timer_sync().
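
      A sketch of the idea (simplified; the identifier names are approximate
      rather than taken verbatim from the driver):

       if (gpstate_idx != local_pstate_idx)
               /* Still ramping down the global pstate: (re)arm the timer. */
               queue_gpstate_timer(gpstates);
       else
               /* Nothing left to ramp down: cancel any pending timer. */
               del_timer_sync(&gpstates->timer);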
      Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: powernv: Move smp_call_function_any() out of irq safe block · 1fd3ff28
      Committed by Akshay Adiga
      Fix a WARN_ON triggered by calling smp_call_function_any() with
      interrupts disabled, introduced by the patch ('cpufreq: powernv:
      Ramp-down global pstate slower than local-pstate')
      https://patchwork.ozlabs.org/patch/612058/
      
       WARNING: CPU: 0 PID: 4 at kernel/smp.c:291
      smp_call_function_single+0x170/0x180
      
       Call Trace:
       [c0000007f648f9f0] [c0000007f648fa90] 0xc0000007f648fa90 (unreliable)
       [c0000007f648fa30] [c0000000001430e0] smp_call_function_any+0x170/0x1c0
       [c0000007f648fa90] [c0000000007b4b00]
      powernv_cpufreq_target_index+0xe0/0x250
       [c0000007f648fb00] [c0000000007ac9dc]
      __cpufreq_driver_target+0x20c/0x3d0
       [c0000007f648fbc0] [c0000000007b1b4c] od_dbs_timer+0xcc/0x260
       [c0000007f648fc10] [c0000000007b3024] dbs_work_handler+0x54/0xa0
       [c0000007f648fc50] [c0000000000c49a8] process_one_work+0x1d8/0x590
       [c0000007f648fce0] [c0000000000c4e08] worker_thread+0xa8/0x660
       [c0000007f648fd80] [c0000000000cca88] kthread+0x108/0x130
       [c0000007f648fe30] [c0000000000095e8] ret_from_kernel_thread+0x5c/0x74
      
      - Calling smp_call_function_any() with interrupts disabled (via
        spin_lock_irqsave) can deadlock, because smp_call_function_any()
        relies on an IPI completing.  smp_call_function_any() detects this
        condition, hence the WARN_ON.
      
      - The spinlock (gpstates->lock) is only used to synchronize access to
        global_pstate_info between the timer irq handler and target_index
        calls, and the timer irq handler only uses trylock, so it cannot
        deadlock.  Hence the spinlock does not need to be irq safe.
      
      - Since smp_call_function_any() is a blocking call and does not access
        global_pstate_info, the critical section can be shortened by moving
        smp_call_function_any() after the lock is released, as sketched below.
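
      A sketch of the reordering (simplified; set_pstate() and freq_data stand
      in for the driver's actual helper and argument):

       /* Before: the cross-CPU call runs inside the irq-disabled section. */
       spin_lock_irqsave(&gpstates->lock, flags);
       smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1);
       /* ... update global_pstate_info, maybe queue the ramp-down timer ... */
       spin_unlock_irqrestore(&gpstates->lock, flags);

       /* After: update the shared state under a plain spin_lock, then make
        * the blocking cross-CPU call with the lock dropped and irqs enabled. */
       spin_lock(&gpstates->lock);
       /* ... update global_pstate_info, maybe queue the ramp-down timer ... */
       spin_unlock(&gpstates->lock);
       smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1);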
      Reported-by: Abdul Haleem <abdhalee@linux.vnet.linux.com>
      Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  9. 10 May 2016, 1 commit
  10. 07 May 2016, 1 commit
  11. 06 May 2016, 1 commit
    • cpufreq: governor: Fix handling of special cases in dbs_update() · 9485e4ca
      Committed by Rafael J. Wysocki
      As reported in KBZ 69821:
      
      "With CONFIG_HZ_PERIODIC=y cpu stays at the lowest frequcency 800MHz
       even if usage goes to 100%, frequency does not scale up, the governor
       in use is ondemand. Neither works conservative. Performance and
       userspace governors work as expected.
      
       With CONFIG_NO_HZ_IDLE or CONFIG_NO_HZ_FULL cpu scales up with ondemand
       as expected."
      
      Analysis carried out by Chen Yu leads to the conclusion that the
      observed issue is due to idle_time in dbs_update() representing a
      negative number in which case the function will return 0 as the load
      (unless load is greater than 0 for another CPU sharing the policy),
      although that need not be the right choice.
      
      Indeed, idle_time representing a negative number means that during
      the last sampling interval the CPU was almost 100% busy on the rough
      average, so 100 should be returned as the load in that case.
      
      Modify the code accordingly and rearrange it to clarify the handling
      of all of the special cases in it.  While at it, also avoid returning
      zero as the load if time_elapsed is 0 (it doesn't really make sense
      to return 0 then).
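
      A simplified sketch of the resulting special-case handling (the real
      dbs_update() also tracks per-CPU prev_load and other corner cases):

       if (unlikely(!time_elapsed)) {
               /* Sampled again too soon: reuse the previous load, not 0. */
               load = j_cdbs->prev_load;
       } else if (time_elapsed >= idle_time) {
               load = 100 * (time_elapsed - idle_time) / time_elapsed;
       } else {
               /*
                * idle_time effectively negative: the CPU was busy for
                * (almost) the whole interval, so report full load.
                */
               load = (int)idle_time < 0 ? 100 : 0;
       }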
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=69821
      Tested-by: Chen Yu <yu.c.chen@intel.com>
      Tested-by: Timo Valtoaho <timo.valtoaho@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
  12. 05 May 2016, 4 commits
  13. 04 May 2016, 1 commit
    • intel_pstate: Fix intel_pstate_get() · 6d45b719
      Committed by Rafael J. Wysocki
      After commit 8fa520af "intel_pstate: Remove freq calculation from
      intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency()
      to compute the average frequency, which is problematic for two reasons.
      
      First, intel_pstate_get() may be invoked before the driver reads the
      CPU feedback registers for the first time and if that happens,
      get_avg_frequency() will attempt to divide by zero.
      
      Second, the get_avg_frequency() call in intel_pstate_get() is racy
      with respect to intel_pstate_sample() and it may end up returning
      completely meaningless values for this reason.
      
      Moreover, after commit 7349ec04 "intel_pstate: Move
      intel_pstate_calc_busy() into get_target_pstate_use_performance()"
      sample.core_pct_busy is never computed on Atom, but it is used in
      intel_pstate_adjust_busy_pstate() in that case too.
      
      To address those problems notice that if sample.core_pct_busy
      was used in the average frequency computation carried out by
      get_avg_frequency(), both the divide by zero problem and the
      race with respect to intel_pstate_sample() would be avoided.
      
      Accordingly, move the invocation of intel_pstate_calc_busy() from
      get_target_pstate_use_performance() to intel_pstate_update_util(),
      which also will take care of the uninitialized sample.core_pct_busy
      on Atom, and modify get_avg_frequency() to use sample.core_pct_busy
      as per the above.
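
      An illustrative (non-fixed-point) version of the reworked computation;
      the field names are approximate and the real driver uses its fixed-point
      helpers:

       static inline unsigned int get_avg_frequency(struct cpudata *cpu)
       {
               /*
                * core_pct_busy is the APERF/MPERF-derived busy percentage
                * computed by intel_pstate_calc_busy() at sample time, so no
                * feedback registers are read here and no division by a
                * not-yet-populated sample can occur.
                */
               return cpu->sample.core_pct_busy *
                      cpu->pstate.max_pstate_physical * cpu->pstate.scaling / 100;
       }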
      Reported-by: kernel test robot <ying.huang@linux.intel.com>
      Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4
      Fixes: 8fa520af "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()"
      Fixes: 7349ec04 "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()"
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  14. 02 May 2016, 1 commit
  15. 01 May 2016, 1 commit
  16. 28 April 2016, 7 commits
    • cpufreq: st: enable selective initialization based on the platform · 2482bc31
      Committed by Sudeep Holla
      The sti-cpufreq driver registers the cpufreq-dt driver unconditionally,
      which causes problems in a multi-platform build.  For example, on the
      Vexpress TC2 platform we get the following errors on boot:
      
      cpu cpu0: OPP-v2 not supported
      cpu cpu0: Not doing voltage scaling
      cpu: dev_pm_opp_of_cpumask_add_table: couldn't find opp table
      	for cpu:0, -19
      cpu cpu0: dev_pm_opp_get_max_volt_latency: Invalid regulator (-6)
      ...
      arm_big_little: bL_cpufreq_register: Failed registering platform driver:
      		vexpress-spc, err: -17
      
      The actual platform driver then fails to initialise, because cpufreq-dt
      has already been probed successfully, which is incorrect.  This can
      happen on any platform that does not use cpufreq-dt in a multi-platform
      build.
      
      This patch adds a check so that the driver is only initialised on the
      platforms it supports.
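
      A sketch of the added guard (the compatible strings and the init helper
      name are illustrative, not lifted from the driver):

       static int __init sti_cpufreq_init(void)
       {
               /* Not an ST platform: bail out before cpufreq-dt gets registered. */
               if (!of_machine_is_compatible("st,stih407") &&
                   !of_machine_is_compatible("st,stih410"))
                       return -ENODEV;

               return sti_cpufreq_do_init();   /* hypothetical existing init path */
       }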
      
      Fixes: ab0ea257 (cpufreq: st: Provide runtime initialised driver for ST's platforms)
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Lee Jones <lee.jones@linaro.org>
      Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: mvebu: Move cpufreq code into drivers/cpufreq/ · 9f123def
      Committed by Viresh Kumar
      Move the mvebu cpufreq bits into the drivers/cpufreq/ directory, where
      they really belong.
      
      Compile tested only.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: dt: Kill platform-data · eb96924a
      Committed by Viresh Kumar
      There are no more users of platform-data for the cpufreq-dt driver, so
      get rid of it.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: dt: Identify cpu-sharing for platforms without operating-points-v2 · 1530b996
      Committed by Viresh Kumar
      Existing platforms that do not support operating-points-v2 can
      explicitly tell the OPP core that some of the CPUs share OPP tables,
      with the help of dev_pm_opp_set_sharing_cpus().
      
      For such platforms, explicitly ask the OPP core for the list of CPUs
      sharing the OPP table with the current CPU device before falling back
      to platform data.
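
      A simplified sketch of the lookup order described above (variable names
      approximate):

       /* Prefer OPP-sharing information from operating-points-v2 bindings. */
       ret = dev_pm_opp_of_get_sharing_cpus(cpu_dev, policy->cpus);
       if (ret) {
               /*
                * No operating-points-v2: the platform may still have told the
                * OPP core about sharing via dev_pm_opp_set_sharing_cpus().
                * Only if that fails too do we fall back to platform data.
                */
               if (dev_pm_opp_get_sharing_cpus(cpu_dev, policy->cpus))
                       fallback_to_pdata = true;       /* illustrative flag */
       }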
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: governor: Change confusing struct field and variable names · b4f4b4b3
      Committed by Rafael J. Wysocki
      The name of the prev_cpu_wall field in struct cpu_dbs_info is
      confusing, because it doesn't represent wall time, but the previous
      update time as returned by get_cpu_idle_time() (that may be the
      current value of jiffies_64 in some cases, for example).
      
      Moreover, the names of some related variables in dbs_update() take
      that confusion further.
      
      Rename all of those things to make their names reflect the purpose
      more accurately.  While at it, drop unnecessary parens from one of
      the updated expressions.
      
      No functional changes.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Chen Yu <yu.c.chen@intel.com>
    • cpufreq: intel_pstate: Enable PPC enforcement for servers · 2b3ec765
      Committed by Srinivas Pandruvada
      For platforms which are controlled via a remote node manager, enable
      _PPC by default.  These platforms are mostly categorized as enterprise
      or performance servers, and they need to go through certification tests
      that exercise control via _PPC.  The relative risk of enabling this by
      default is low, as such systems are less likely to have a broken _PSS
      table.
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: intel_pstate: Adjust policy->max · 3be9200d
      Committed by Srinivas Pandruvada
      When policy->max is changed via _PPC or sysfs to a value above the max
      non-turbo frequency, it does not really change the resulting performance
      on some processors.  When policy->max corresponds to a P-State ratio
      above the turbo activation ratio, the processor can choose any P-State
      up to max turbo.  So the user or _PPC setting has no effect, but it can
      cause undesirable side effects like:
      - Showing a reduced max percentage in Intel P-State sysfs
      - Reduced max performance under certain boundary conditions:
      The requested max scaling frequency, whether set via _PPC or via cpufreq
      sysfs, is converted into a fixed-point max percent scale.  In the
      majority of cases this results in the correct max, but not 100% of the
      time.  If _PPC is requested at a point where the calculation leads to a
      lower max, this can result in a lower P-State than expected and will
      impact performance.
      Example of this condition using a Broadwell laptop with config TDP:
      
      ACPI _PSS table from a Broadwell laptop
      2301000 2300000 2200000 2000000 1900000 1800000 1700000 1500000 1400000
      1300000 1100000 1000000 900000 800000 600000 500000
      
      The actual results, obtained by disabling config TDP so that we get what
      is requested at or below 2300000 kHz:
      
      scaling_max_freq   Max Requested P-State   Resultant scaling max
      ------------------------------------------------------------------
      2400000            18                      2900000 (max turbo)
      2300000            17                      2300000 (max physical non turbo)
      2200000            15                      2100000
      2100000            15                      2100000
      2000000            13                      1900000
      1900000            13                      1900000
      1800000            12                      1800000
      1700000            11                      1700000
      1600000            10                      1600000
      1500000            f                       1500000
      1400000            e                       1400000
      1300000            d                       1300000
      1200000            c                       1200000
      1100000            a                       1000000
      1000000            a                       1000000
      900000             9                       900000
      800000             8                       800000
      700000             7                       700000
      600000             6                       600000
      500000             5                       500000
      ------------------------------------------------------------------
      
      Now set the config TDP level 1 ratio to 0x0b (equivalent to 1100000 kHz)
      in the BIOS (not every system will let you adjust this).  The turbo
      activation ratio will be set to one less than that, i.e. 0x0a, so any
      request above 1000000 kHz should land in the turbo region, assuming no
      thermal limits.
      Here _PPC will request a max of 1100000 kHz (which should still result
      in turbo, as this is above the turbo activation ratio, up to the max
      allowable turbo frequency), but the actual calculation results in a max
      ceiling P-State of 0x0a.  So under any load condition this driver will
      not request turbo P-States, which is a huge performance hit.
      
      When the config TDP feature is on and _PPC points to a frequency above
      the turbo activation ratio, the performance can still reach max turbo,
      so there is no need to treat this as a reduced frequency in the
      set_policy callback.
      
      With this change, when config TDP is active (detected by checking
      whether the physical max non-turbo ratio is higher than the current max
      non-turbo ratio), any request above the current max non-turbo frequency
      is treated as a request for full performance.
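
      A sketch of that check (names approximate, not the exact driver code):

       /*
        * max_pstate_physical > max_pstate means config TDP is active.  A
        * policy->max above the current max non-turbo frequency then already
        * lands in the turbo region, so treat it as full performance instead
        * of deriving a lower max percentage from it.
        */
       if (cpu->pstate.max_pstate_physical > cpu->pstate.max_pstate &&
           policy->max > cpu->pstate.max_pstate * cpu->pstate.scaling)
               policy->max = policy->cpuinfo.max_freq;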
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw : Minor cleanups ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>