1. 06 Aug, 2016 1 commit
    • cpufreq: powernv: Fix crash in gpstate_timer_handler() · 8e859467
      Akshay Adiga committed
      Commit 09ca4c9b (cpufreq: powernv: Replacing pstate_id with
      frequency table index) changes calc_global_pstate() to use
      cpufreq_table index instead of pstate_id.
      
      But in gpstate_timer_handler(), pstate_id was being passed instead
      of the cpufreq_table index, which caused index_to_pstate() to access
      out-of-bounds indices, leading to this crash.
      
      Add sanity checks for index and pstate to ensure that only valid
      pstate and index values are returned.
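
The sanity checks described above can be sketched in user-space C. This is a minimal illustration, not the driver's code: the table contents, the `NOMINAL_IDX` fallback, and both helper names are hypothetical. It shows why a pstate id passed where an index is expected runs off the table (a negative id becomes a huge unsigned index) and how a bounds check turns that into a safe fallback.

```c
#include <stdio.h>

/* Hypothetical stand-in for the driver's frequency table: each entry
 * pairs a frequency with the pstate id the hardware understands. */
struct freq_entry {
    unsigned int frequency_khz;
    int pstate_id;
};

static const struct freq_entry freq_table[] = {
    { 3500000,  0 },   /* index 0: highest frequency */
    { 3200000, -1 },   /* index 1: assumed nominal entry */
    { 2900000, -2 },
    { 2600000, -3 },   /* index 3: lowest frequency */
};

#define NR_PSTATES  ((unsigned int)(sizeof(freq_table) / sizeof(freq_table[0])))
#define NOMINAL_IDX 1u  /* assumed fallback used when an input is invalid */

/* Map a table index to a pstate id; fall back to nominal on a bad index. */
static int idx_to_pstate(unsigned int idx)
{
    if (idx >= NR_PSTATES) {
        fprintf(stderr, "index %u out of bounds, using nominal\n", idx);
        idx = NOMINAL_IDX;
    }
    return freq_table[idx].pstate_id;
}

/* Map a pstate id to a table index; fall back to nominal on a bad pstate. */
static unsigned int pstate_to_idx(int pstate)
{
    for (unsigned int i = 0; i < NR_PSTATES; i++)
        if (freq_table[i].pstate_id == pstate)
            return i;
    fprintf(stderr, "pstate %d not in table, using nominal\n", pstate);
    return NOMINAL_IDX;
}
```

With these helpers, `idx_to_pstate((unsigned int)-3)` no longer reads past the end of the table; it reports the bad index and returns the nominal pstate instead.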
      
      Call Trace:
      [c00000078d66b130] [c00000000011d224] __free_irq+0x234/0x360
      (unreliable)
      [c00000078d66b1c0] [c00000000011d44c] free_irq+0x6c/0xa0
      [c00000078d66b1f0] [c00000000006c4f8] opal_event_shutdown+0x88/0xd0
      [c00000078d66b230] [c000000000067a4c] opal_shutdown+0x1c/0x90
      [c00000078d66b260] [c000000000063a00] pnv_shutdown+0x20/0x40
      [c00000078d66b280] [c000000000021538] machine_restart+0x38/0x90
      [c00000078d66b310] [c000000000965ea0] panic+0x284/0x300
      [c00000078d66b3a0] [c00000000001f508] die+0x388/0x450
      [c00000078d66b430] [c000000000045a50] bad_page_fault+0xd0/0x140
      [c00000078d66b4a0] [c000000000008964] handle_page_fault+0x2c/0x30
         interrupt: 300 at gpstate_timer_handler+0x150/0x260
          LR = gpstate_timer_handler+0x130/0x260
      [c00000078d66b7f0] [c000000000132b58] call_timer_fn+0x58/0x1c0
      [c00000078d66b880] [c000000000132e20] expire_timers+0x130/0x1d0
      [c00000078d66b8f0] [c000000000133068] run_timer_softirq+0x1a8/0x230
      [c00000078d66b980] [c0000000000b535c] __do_softirq+0x18c/0x400
      [c00000078d66ba70] [c0000000000b5828] irq_exit+0xc8/0x100
      [c00000078d66ba90] [c00000000001e214] timer_interrupt+0xa4/0xe0
      [c00000078d66bac0] [c0000000000027d0] decrementer_common+0x150/0x180
         interrupt: 901 at arch_local_irq_restore+0x74/0x90
        0] [c000000000106b34] call_cpuidle+0x44/0x90
      [c00000078d66be50] [c00000000010708c] cpu_startup_entry+0x38c/0x460
      [c00000078d66bf20] [c00000000003d930] start_secondary+0x330/0x380
      [c00000078d66bf90] [c000000000008e6c] start_secondary_prolog+0x10/0x14
      
      Fixes: 09ca4c9b (cpufreq: powernv: Replacing pstate_id with frequency table index)
      Reported-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. 12 Jul, 2016 1 commit
  3. 07 Jul, 2016 2 commits
  4. 09 Jun, 2016 2 commits
  5. 11 May, 2016 2 commits
    • cpufreq: powernv: del_timer_sync when global and local pstate are equal · 0bc10b93
      Akshay Adiga committed
      When global and local pstate are equal in a powernv_target_index() call,
      we don't queue a timer. But we may already have a timer queued for the
      future, which could cause the timer to fire one additional time for no
      reason.
      Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: powernv: Move smp_call_function_any() out of irq safe block · 1fd3ff28
      Akshay Adiga committed
      Fix a WARN_ON caused by smp_call_function_any() when irq is disabled,
      because of changes made in the patch ('cpufreq: powernv: Ramp-down
       global pstate slower than local-pstate')
      https://patchwork.ozlabs.org/patch/612058/
      
       WARNING: CPU: 0 PID: 4 at kernel/smp.c:291
      smp_call_function_single+0x170/0x180
      
       Call Trace:
       [c0000007f648f9f0] [c0000007f648fa90] 0xc0000007f648fa90 (unreliable)
       [c0000007f648fa30] [c0000000001430e0] smp_call_function_any+0x170/0x1c0
       [c0000007f648fa90] [c0000000007b4b00]
      powernv_cpufreq_target_index+0xe0/0x250
       [c0000007f648fb00] [c0000000007ac9dc]
      __cpufreq_driver_target+0x20c/0x3d0
       [c0000007f648fbc0] [c0000000007b1b4c] od_dbs_timer+0xcc/0x260
       [c0000007f648fc10] [c0000000007b3024] dbs_work_handler+0x54/0xa0
       [c0000007f648fc50] [c0000000000c49a8] process_one_work+0x1d8/0x590
       [c0000007f648fce0] [c0000000000c4e08] worker_thread+0xa8/0x660
       [c0000007f648fd80] [c0000000000cca88] kthread+0x108/0x130
       [c0000007f648fe30] [c0000000000095e8] ret_from_kernel_thread+0x5c/0x74
      
      - Calling smp_call_function_any() with interrupts disabled (through
       spin_lock_irqsave) could cause a deadlock, as smp_call_function_any()
       relies on the IPI to complete. This is detected in the
       smp_call_function_any() call, hence the WARN_ON.
      
      - The spinlock (gpstates->lock) is only used to synchronize access to
       global_pstate_info between the timer irq handler and target_index
       calls, and the timer irq handler only does a trylock, so it cannot
       cause a deadlock. Hence the spinlock does not need to be irq safe.
      
      - As smp_call_function_any() is a blocking call and does not access
       global_pstate_info, the critical section can be reduced by moving
       smp_call_function_any() after the lock is released.
      Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  6. 28 Apr, 2016 2 commits
    • cpufreq: powernv: Ramp-down global pstate slower than local-pstate · eaa2c3ae
      Akshay Adiga committed
      The frequency transition latency from pmin to pmax is observed to be
      on the order of a few milliseconds, which usually incurs a performance
      penalty during sudden frequency ramp-up requests.
      
      This patch set solves this problem by using an entity called "global
      pstates". The global pstate is a chip-level entity, so the global entity
      (voltage) is managed across the cores. The local pstate is a core-level
      entity, so the local entity (frequency) is managed across threads.
      
      This patch brings down the global pstate at a slower rate than the local
      pstate. Holding the global pstate higher than the local pstate makes
      subsequent ramp-ups faster.
      
      A per-policy structure is maintained to keep track of the global and
      local pstate changes. The global pstate is brought down using a parabolic
      equation. The ramp-down time to pmin is set to ~5 seconds. To make sure
      that the global pstate is dropped at regular intervals, a timer is
      queued every 2 seconds during the ramp-down phase, which eventually
      brings the global pstate down to the local pstate.
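
The parabolic ramp-down can be sketched as follows. This is a rough illustration under stated assumptions, not the patch's code: the 5120 ms window and the shift by 18 are picked so that the percentage reaches exactly 100 at roughly 5 seconds, and the sketch works on table indices (0 is the highest frequency, `min_idx` the lowest) rather than raw pstate ids. Both function names are hypothetical.

```c
/* Assumed ramp-down window: (5120 * 5120) >> 18 == 100, so the ramp
 * completes at about 5 seconds. */
#define MAX_RAMP_DOWN_TIME_MS 5120u

/* Parabolic ramp: percentage of the drop completed after elapsed_ms. */
static unsigned int ramp_down_percent(unsigned int elapsed_ms)
{
    if (elapsed_ms > MAX_RAMP_DOWN_TIME_MS)
        elapsed_ms = MAX_RAMP_DOWN_TIME_MS;
    return (elapsed_ms * elapsed_ms) >> 18;
}

/* Compute the global index for this point in the ramp-down.
 * highest_idx: index of the highest local pstate requested so far
 * local_idx:   index of the current local pstate
 * min_idx:     index of pmin (the largest index in the table)
 * The global index drifts from highest_idx toward min_idx, but is
 * clamped so the global pstate never drops below the local pstate. */
static unsigned int calc_global_idx(unsigned int elapsed_ms,
                                    unsigned int highest_idx,
                                    unsigned int local_idx,
                                    unsigned int min_idx)
{
    unsigned int diff =
        ramp_down_percent(elapsed_ms) * (min_idx - highest_idx) / 100;

    if (highest_idx + diff >= local_idx)
        return local_idx;
    return highest_idx + diff;
}
```

Because the curve is quadratic, only a quarter of the drop has happened at the halfway point (2560 ms), so the global pstate stays high early in the window, when a fresh ramp-up request is most likely.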
      
      Iozone results show a fairly consistent performance boost.
      YCSB on redis shows improved max latencies in most cases.
      
      Iozone write/rewrite tests were run with file sizes of 200704 KB and
      401408 KB and different record sizes. The following table shows I/O
      operations per second with and without the patch.
      
      Iozone results (in op/sec, mean over 3 iterations)
      ----------------------------------------------------------------------
      file size-                      with            without         %
      recordsize-IOtype               patch           patch           change
      ----------------------------------------------------------------------
      200704-1-SeqWrite               1616532         1615425         0.06
      200704-1-Rewrite                2423195         2303130         5.21
      200704-2-SeqWrite               1628577         1602620         1.61
      200704-2-Rewrite                2428264         2312154         5.02
      200704-4-SeqWrite               1617605         1617182         0.02
      200704-4-Rewrite                2430524         2351238         3.37
      200704-8-SeqWrite               1629478         1600436         1.81
      200704-8-Rewrite                2415308         2298136         5.09
      200704-16-SeqWrite              1619632         1618250         0.08
      200704-16-Rewrite               2396650         2352591         1.87
      200704-32-SeqWrite              1632544         1598083         2.15
      200704-32-Rewrite               2425119         2329743         4.09
      200704-64-SeqWrite              1617812         1617235         0.03
      200704-64-Rewrite               2402021         2321080         3.48
      200704-128-SeqWrite             1631998         1600256         1.98
      200704-128-Rewrite              2422389         2304954         5.09
      200704-256-SeqWrite             1617065         1616962         0.00
      200704-256-Rewrite              2432539         2301980         5.67
      200704-512-SeqWrite             1632599         1598656         2.12
      200704-512-Rewrite              2429270         2323676         4.54
      200704-1024-SeqWrite            1618758         1616156         0.16
      200704-1024-Rewrite             2431631         2315889         4.99
      401408-1-SeqWrite               1631479         1608132         1.45
      401408-1-Rewrite                2501550         2459409         1.71
      401408-2-SeqWrite               1617095         1626069         -0.55
      401408-2-Rewrite                2507557         2443621         2.61
      401408-4-SeqWrite               1629601         1611869         1.10
      401408-4-Rewrite                2505909         2462098         1.77
      401408-8-SeqWrite               1617110         1626968         -0.60
      401408-8-Rewrite                2512244         2456827         2.25
      401408-16-SeqWrite              1632609         1609603         1.42
      401408-16-Rewrite               2500792         2451405         2.01
      401408-32-SeqWrite              1619294         1628167         -0.54
      401408-32-Rewrite               2510115         2451292         2.39
      401408-64-SeqWrite              1632709         1603746         1.80
      401408-64-Rewrite               2506692         2433186         3.02
      401408-128-SeqWrite             1619284         1627461         -0.50
      401408-128-Rewrite              2518698         2453361         2.66
      401408-256-SeqWrite             1634022         1610681         1.44
      401408-256-Rewrite              2509987         2446328         2.60
      401408-512-SeqWrite             1617524         1628016         -0.64
      401408-512-Rewrite              2504409         2442899         2.51
      401408-1024-SeqWrite            1629812         1611566         1.13
      401408-1024-Rewrite             2507620         2442968         2.64
      
      Tested with a YCSB workload (50% update + 50% read) over redis for
      1 million records and 1 million operations. Each test was carried out
      with a target number of operations per second and persistence disabled.
      
      Max latency (in us, mean over 5 iterations)
      ---------------------------------------------------------------
      op/s    Operation       with patch      without patch   %change
      ---------------------------------------------------------------
      15000   Read            61480.6         50261.4         22.32
      15000   cleanup         215.2           293.6           -26.70
      15000   update          25666.2         25163.8         2.00
      
      25000   Read            32626.2         89525.4         -63.56
      25000   cleanup         292.2           263.0           11.10
      25000   update          32293.4         90255.0         -64.22
      
      35000   Read            34783.0         33119.0         5.02
      35000   cleanup         321.2           395.8           -18.8
      35000   update          36047.0         38747.8         -6.97
      
      40000   Read            38562.2         42357.4         -8.96
      40000   cleanup         371.8           384.6           -3.33
      40000   update          27861.4         41547.8         -32.94
      
      45000   Read            42271.0         88120.6         -52.03
      45000   cleanup         263.6           383.0           -31.17
      45000   update          29755.8         81359.0         -63.43
      
      (test without target op/s)
      47659   Read            83061.4         136440.6        -39.12
      47659   cleanup         195.8           193.8           1.03
      47659   update          73429.4         124971.8        -41.24
      Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: powernv: Remove flag use-case of policy->driver_data · 2920e9ce
      Shilpasri G Bhat committed
      Commit 1b028984 ("cpufreq: powernv: Add sysfs attributes to show
      throttle stats") used policy->driver_data as a flag for one-time creation
      of throttle sysfs files. Instead, use kernfs_find_and_get() to check
      whether the attribute already exists. This is required because
      policy->driver_data is used for other purposes in a later patch.
      Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  7. 23 Mar, 2016 1 commit
    • cpufreq: powernv: Add sysfs attributes to show throttle stats · 1b028984
      Shilpasri G Bhat committed
      Create sysfs attributes to export throttle information in
      /sys/devices/system/cpu/cpuX/cpufreq/throttle_stats directory. The
      newly added sysfs files are as follows:
      
       1)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat
       2)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub-turbo_stat
       3)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/unthrottle
       4)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/powercap
       5)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overtemp
       6)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/supply_fault
       7)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/overcurrent
       8)/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/occ_reset
      
      Detailed explanation of each attribute is added to
      Documentation/ABI/testing/sysfs-devices-system-cpu
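
As a rough user-space sketch, the attributes listed above could be read like this. The two helper names are hypothetical, and the sketch assumes each file holds a single integer; the ABI document referenced above is the authority on the actual contents.

```c
#include <stdio.h>

/* Build the sysfs path of one throttle_stats attribute for a CPU.
 * Returns 0 on success, -1 if the path does not fit in buf. */
static int throttle_stat_path(char *buf, size_t len, int cpu,
                              const char *attr)
{
    int n = snprintf(buf, len,
                     "/sys/devices/system/cpu/cpu%d/cpufreq/throttle_stats/%s",
                     cpu, attr);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read one attribute as an integer (assumed format). Returns -1 if the
 * file is missing or unreadable, for example on a non-powernv machine. */
static long read_throttle_stat(int cpu, const char *attr)
{
    char path[256];
    long value = -1;
    FILE *f;

    if (throttle_stat_path(path, sizeof(path), cpu, attr) != 0)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, `read_throttle_stat(0, "overtemp")` would return the value exported for cpu0, or -1 where the throttle_stats directory does not exist.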
      Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  8. 22 Mar, 2016 1 commit
  9. 27 Feb, 2016 1 commit
  10. 05 Feb, 2016 4 commits
  11. 17 Dec, 2015 1 commit
  12. 26 Sep, 2015 1 commit
  13. 01 Sep, 2015 1 commit
  14. 28 Jul, 2015 5 commits
  15. 02 Apr, 2015 1 commit
    • cpufreq: powernv: Report cpu frequency throttling · 09a972d1
      Shilpasri G Bhat committed
      The power and thermal safety of the system is taken care of by an
      On-Chip-Controller (OCC), which is a real-time subsystem embedded within
      the POWER8 processor. The OCC continuously monitors the memory and core
      temperature, the total system power, and the state of the power supply
      and fan.
      
      The cpu frequency can be throttled by OCC for the following reasons:
      1)If a processor crosses its power and temperature limit then OCC will
        lower its Pmax to reduce the frequency and voltage.
      2)If OCC crashes then the system is forced to Psafe frequency.
      3)If OCC fails to recover then the kernel is not allowed to do any
        further frequency changes and the chip will remain in Psafe.
      
      The user can see a drop in performance when the frequency is throttled
      without being aware of the throttling. So detect and report such a
      condition, so that the user can check the OCC status and reboot the
      system, or check for power supply or fan failures.
      
      The current status of the core is read from the Power Management Status
      Register (PMSR) to check whether any of the throttling conditions has
      occurred, and the appropriate throttling message is reported.
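
The PMSR check can be sketched as a small decoder. The bit positions, the enum, and the function name below are assumptions made for illustration; the processor documentation and the driver source are the authority on the real register layout. The sketch maps the three throttling reasons listed above to three states, checking the most severe first.

```c
#include <stdint.h>

/* Assumed PMSR layout, for illustration only. */
#define PMSR_PSAFE_ENABLE   (UINT64_C(1) << 30) /* chip forced to Psafe */
#define PMSR_SPR_EM_DISABLE (UINT64_C(1) << 31) /* frequency changes locked */
#define PMSR_MAX(x)         ((int8_t)(((x) >> 32) & 0xFF)) /* current Pmax */

enum throttle_state {
    THROTTLE_NONE,         /* no throttling detected */
    THROTTLE_PMAX_LOWERED, /* OCC lowered Pmax (power or thermal limit) */
    THROTTLE_PSAFE,        /* OCC crashed, chip forced to Psafe */
    THROTTLE_FREQ_LOCKED,  /* OCC did not recover, no further changes */
};

/* Classify a PMSR value against the Pmax the platform advertised. */
static enum throttle_state check_throttle(uint64_t pmsr, int8_t expected_pmax)
{
    if (pmsr & PMSR_SPR_EM_DISABLE)
        return THROTTLE_FREQ_LOCKED;
    if (pmsr & PMSR_PSAFE_ENABLE)
        return THROTTLE_PSAFE;
    if (PMSR_MAX(pmsr) != expected_pmax)
        return THROTTLE_PMAX_LOWERED;
    return THROTTLE_NONE;
}
```

A caller would read the register on the target core, pass the value in, and print a message for any state other than `THROTTLE_NONE`.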
      Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  16. 29 Sep, 2014 2 commits
  17. 05 Aug, 2014 1 commit
  18. 17 May, 2014 1 commit
  19. 22 Apr, 2014 1 commit
  20. 07 Apr, 2014 2 commits
    • cpufreq: powernv: Use cpufreq_frequency_table.driver_data to store pstate ids · 0692c691
      Gautham R. Shenoy committed
      The .driver_data field in the cpufreq_frequency_table was supposed to
      be private to the drivers. However at some later point, it was being
      used to indicate if the particular frequency in the table is the
      BOOST_FREQUENCY. After patches [1] and [2], the .driver_data is once
      again private to the driver. Thus we can safely use
      cpufreq_frequency_table.driver_data to store pstate_ids instead of
      having to maintain a separate array powernv_pstate_ids[] for this
      purpose.
      
      [1]:
        Subject: cpufreq: don't print value of .driver_data from core
        From   : Viresh Kumar <viresh.kumar@linaro.org>
        url    : http://marc.info/?l=linux-pm&m=139601421504709&w=2
      
      [2]:
        Subject: cpufreq: create another field .flags in cpufreq_frequency_table
        From   : Viresh Kumar <viresh.kumar@linaro.org>
        url    : http://marc.info/?l=linux-pm&m=139601416804702&w=2
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: powernv: cpufreq driver for powernv platform · b3d627a5
      Vaidyanathan Srinivasan committed
      Backend driver to dynamically set voltage and frequency on
      IBM POWER non-virtualized platforms.  Power management SPRs
      are used to set the required PState.
      
      This driver works in conjunction with cpufreq governors
      like 'ondemand' to provide a demand based frequency and
      voltage setting on IBM POWER non-virtualized platforms.
      
      PState table is obtained from OPAL v3 firmware through device
      tree.
      
      powernv_cpufreq back-end driver would parse the relevant device-tree
      nodes and initialise the cpufreq subsystem on powernv platform.
      
      The code was originally written by svaidy@linux.vnet.ibm.com. Over
      time it was modified to accommodate bug-fixes as well as updates to
      the cpu-freq core. Relevant portions of the change logs corresponding
      to those modifications are noted below:
      
       * The policy->cpus needs to be populated in a hotplug-invariant
         manner instead of using cpu_sibling_mask() which varies with
         cpu-hotplug. This is because the cpufreq core code copies this
         content into policy->related_cpus mask which should not vary on
         cpu-hotplug. [Authored by srivatsa.bhat@linux.vnet.ibm.com]
      
       * Create a helper routine that can return the cpu-frequency for the
         corresponding pstate_id. Also, cache the values of the pstate_max,
         pstate_min and pstate_nominal and nr_pstates in a static structure
         so that they can be reused in the future to perform any
         validations. [Authored by ego@linux.vnet.ibm.com]
      
       * Create a driver attribute named cpuinfo_nominal_freq which creates
         a sysfs read-only file named cpuinfo_nominal_freq. Export the
         frequency corresponding to the nominal_pstate through this
         interface.
      
           Nominal frequency is the highest non-turbo frequency for the
         platform.  This is generally used for setting governor policies
         from user space for optimal energy efficiency. [Authored by
         ego@linux.vnet.ibm.com]
      
       * Implement a powernv_cpufreq_get(unsigned int cpu) method which will
         return the current operating frequency. Export this via the sysfs
         interface cpuinfo_cur_freq by setting powernv_cpufreq_driver.get to
         powernv_cpufreq_get(). [Authored by ego@linux.vnet.ibm.com]
      
      [Change log updated by ego@linux.vnet.ibm.com]
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>