1. 23 Feb, 2016: 1 commit
    • intel_pstate: Update frequencies of policy->cpus only from ->set_policy() · 41cfd64c
      Committed by Viresh Kumar
      The intel_pstate driver calls intel_pstate_hwp_set() from two separate
      paths: the ->set_policy() callback and the sysfs update path for the
      files in the /sys/devices/system/cpu/intel_pstate/ directory.
      
      An update via the sysfs path applies to all the CPUs managed by the
      driver (essentially all the online CPUs), whereas an update via the
      ->set_policy() callback applies only to the smaller group of CPUs
      managed by the policy for which ->set_policy() is called.
      
      Therefore, when called from the ->set_policy() callback,
      intel_pstate_hwp_set() should update the frequencies of only the CPUs
      in the policy->cpus mask.
      
      In order to do that, add a parameter (cpumask) to intel_pstate_hwp_set()
      and apply the frequency changes only to the concerned CPUs.
      
      For the ->set_policy() path, only policy->cpus matters, so the
      policy->rwsem lock taken by the core before calling ->set_policy() is
      enough to prevent races. The larger lock acquired by get_online_cpus()
      is required only for updates made through the sysfs files.
      
      Add another routine, intel_pstate_hwp_set_online_cpus(), and call it
      from the sysfs update paths.
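      The split between the two entry points can be sketched in userspace C.
      This is a hypothetical model, not the driver code: a 64-bit bitmask
      stands in for struct cpumask, a pthread mutex stands in for
      get_online_cpus()/put_online_cpus(), and hwp_request[] stands in for
      the per-CPU HWP request registers. Every name ending in _sketch is
      made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical userspace model, not intel_pstate code:
 *  - a uint64_t bitmask stands in for struct cpumask (CPU n = bit n),
 *  - hotplug_lock stands in for get_online_cpus()/put_online_cpus(),
 *  - hwp_request[] stands in for the per-CPU HWP request registers.
 */
#define SKETCH_NR_CPUS 64

static uint32_t hwp_request[SKETCH_NR_CPUS];
static uint64_t online_mask = 0xFull;           /* CPUs 0-3 online */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * ->set_policy() path: the caller already holds policy->rwsem, so only
 * the CPUs in @mask (policy->cpus) are touched and no hotplug lock is taken.
 */
static void hwp_set_sketch(uint64_t mask, uint32_t value)
{
    /* policy->cpus is expected to be a subset of the online CPUs */
    assert((mask & ~online_mask) == 0);

    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        if (mask & (1ull << cpu))
            hwp_request[cpu] = value;
}

/* sysfs path: every online CPU is updated while the hotplug lock is held. */
static void hwp_set_online_cpus_sketch(uint32_t value)
{
    pthread_mutex_lock(&hotplug_lock);
    hwp_set_sketch(online_mask, value);
    pthread_mutex_unlock(&hotplug_lock);
}
```

      In this model the ->set_policy() path calls hwp_set_sketch() with
      policy->cpus and relies on the policy->rwsem held by the core, so only
      the sysfs handlers pay for the hotplug lock.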
      
      This also fixes a recently reported lockdep warning, where policy->rwsem
      and the lock taken by get_online_cpus() could be acquired in either
      order, making an ABBA deadlock possible. The sequence of events leading
      to that was:
      
      intel_pstate_init(...)
      	...cpufreq_online(...)
      		down_write(&policy->rwsem); // Locks policy->rwsem
      		...
      		cpufreq_init_policy(policy);
      			...intel_pstate_hwp_set();
      				get_online_cpus(); // Temporarily locks cpu_hotplug.lock
      		...
      		up_write(&policy->rwsem);
      
      pm_suspend(...)
      	...disable_nonboot_cpus()
      		_cpu_down()
      			cpu_hotplug_begin(); // Locks cpu_hotplug.lock
      			__cpu_notify(CPU_DOWN_PREPARE, ...);
      				...cpufreq_offline_prepare();
      					down_write(&policy->rwsem); // Locks policy->rwsem
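      The two sequences above acquire the same pair of locks in opposite
      orders. A much-simplified, hypothetical model of the lock-order check
      lockdep performs shows why the report fires even if no deadlock
      actually occurs; the real lockdep builds a dependency graph over lock
      classes and tracks far more state than this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for the two locks in the sequences above. */
enum lock_class { POLICY_RWSEM, CPU_HOTPLUG_LOCK, NR_LOCK_CLASSES };

/* order_seen[a][b]: lock a was already held when lock b was acquired. */
static bool order_seen[NR_LOCK_CLASSES][NR_LOCK_CLASSES];
static bool held[NR_LOCK_CLASSES];
static bool abba_detected;

static void sketch_lock(enum lock_class c)
{
    for (int h = 0; h < NR_LOCK_CLASSES; h++) {
        if (!held[h] || h == (int)c)
            continue;
        /* The reverse order was recorded earlier: ABBA deadlock possible. */
        if (order_seen[c][h])
            abba_detected = true;
        order_seen[h][c] = true;
    }
    held[c] = true;
}

static void sketch_unlock(enum lock_class c)
{
    assert(held[c]);
    held[c] = false;
}
```

      With the patch applied, the ->set_policy() path no longer calls
      get_online_cpus() while holding policy->rwsem, so the boot sequence
      records no policy->rwsem -> cpu_hotplug.lock edge and the model stays
      quiet.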
      Reported-and-tested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. 12 Dec, 2015: 1 commit
  3. 10 Dec, 2015: 3 commits
  4. 26 Nov, 2015: 1 commit
  5. 24 Nov, 2015: 2 commits
  6. 19 Nov, 2015: 4 commits
  7. 02 Nov, 2015: 1 commit
  8. 17 Oct, 2015: 1 commit
    • cpufreq: intel_pstate: Fix intel_pstate powersave min_perf_pct value · 51443fbf
      Committed by Prarit Bhargava
      Systems that initialize the intel_pstate driver with the performance
      governor and then switch to the powersave governor do not transition to
      lower CPU frequencies until /sys/devices/system/cpu/intel_pstate/min_perf_pct
      is set to a low value.
      
      The behavior of governor switching changed after commit a0475992
      ("[cpufreq] intel_pstate: honor user space min_perf_pct override on
       resume").  The commit introduced tracking of performance percentage
      changes via sysfs in order to restore userspace changes during
      suspend/resume.  The problem occurs because the global values of the newly
      introduced max_sysfs_pct and min_sysfs_pct are not lowered when the
      governor changes, which causes the powersave governor to inherit the
      performance governor's settings.
      
      A simple change would have been to reset max_sysfs_pct to 100 and
      min_sysfs_pct to 0 on a governor change, which fixes the problem with
      governor switching.  However, since userspace must not be broken [1], the
      fix instead gives each governor its own limits storage area so that
      governor-specific changes are tracked.
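      A minimal sketch of the per-governor storage idea, assuming a
      trimmed-down limits structure. The structure name, field values, and
      helper names are hypothetical, chosen only to show that a sysfs
      override made under one governor is no longer inherited by the other.

```c
#include <assert.h>

/* Hypothetical, trimmed-down limits structure (illustrative values only). */
struct perf_limits_sketch {
    int min_perf_pct;
    int max_perf_pct;
    int min_sysfs_pct;
    int max_sysfs_pct;
};

enum governor_sketch { GOV_PERFORMANCE, GOV_POWERSAVE };

/* One storage area per governor instead of a single global set of limits. */
static struct perf_limits_sketch performance_limits = { 100, 100, 100, 100 };
static struct perf_limits_sketch powersave_limits = { 0, 100, 0, 100 };

/* Points at the storage area of the active governor. */
static struct perf_limits_sketch *limits = &performance_limits;

static void switch_governor_sketch(enum governor_sketch gov)
{
    limits = (gov == GOV_PERFORMANCE) ? &performance_limits : &powersave_limits;
}

/* A write to min_perf_pct is recorded only in the active governor's area. */
static void store_min_perf_pct_sketch(int pct)
{
    assert(pct >= 0 && pct <= 100);
    limits->min_sysfs_pct = pct;
    limits->min_perf_pct = pct < limits->max_perf_pct ? pct : limits->max_perf_pct;
}
```

      Because a suspend/resume cycle that switches governors only swaps the
      pointer, each governor's last userspace override is preserved.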
      
      I successfully tested this by booting with the performance governor and
      with the powersave governor as the default, and by switching between the
      two governors (while monitoring the /sys/devices/system/cpu/intel_pstate/
      values and the output of cpupower frequency-info).  Suspend/resume
      testing was performed by Doug Smythies.
      
      [1] Systems which suspend/resume using the unmaintained pm-utils package
      will always transition to the performance governor before the suspend and
      after the resume.  This means a system using the powersave governor will
      go from powersave to performance, then suspend/resume, performance to
      powersave.  The simple change during governor changes would have been
      overwritten when the governor changed before and after the suspend/resume.
      I have filed https://bugzilla.redhat.com/show_bug.cgi?id=1271225
      against Fedora to remove the 94cpufreq file that causes the problem.  Note
      that pm-utils is obsoleted by newer versions of systemd.
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  9. 16 Oct, 2015: 1 commit
  10. 15 Oct, 2015: 4 commits
  11. 10 Sep, 2015: 2 commits
  12. 07 Aug, 2015: 2 commits
  13. 01 Aug, 2015: 1 commit
  14. 27 Jul, 2015: 1 commit
  15. 17 Jul, 2015: 1 commit
  16. 06 Jul, 2015: 1 commit
  17. 17 Jun, 2015: 1 commit
    • intel_pstate: Fix overflow in busy_scaled due to long delay · 7180dddf
      Committed by Prarit Bhargava
      The kernel may delay interrupts for a long time, which can result in
      timers being delayed. If this occurs, the intel_pstate driver crashes
      with a divide-by-zero error:
      
      divide error: 0000 [#1] SMP
      Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul
       crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
      CPU: 113 PID: 0 Comm: swapper/113 Tainted: G        W   --------------   3.10.0-229.1.2.el7.x86_64 #1
      Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014
      task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000
      RIP: 0010:[<ffffffff814a9279>]  [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
      RSP: 0018:ffff883fff4e3db8  EFLAGS: 00010206
      RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d
      RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001
      R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5
      R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246
      FS:  0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071
       ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd
       ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100
      Call Trace:
       <IRQ>
      
       [<ffffffff81099dc1>] ? run_posix_cpu_timers+0x51/0x840
       [<ffffffff810b69dd>] ? trigger_load_balance+0x5d/0x200
       [<ffffffff814a9100>] ? pid_param_set+0x130/0x130
       [<ffffffff8107df56>] call_timer_fn+0x36/0x110
       [<ffffffff814a9100>] ? pid_param_set+0x130/0x130
       [<ffffffff8107fdcf>] run_timer_softirq+0x21f/0x320
       [<ffffffff81077b2f>] __do_softirq+0xef/0x280
       [<ffffffff816156dc>] call_softirq+0x1c/0x30
       [<ffffffff81015d95>] do_softirq+0x65/0xa0
       [<ffffffff81077ec5>] irq_exit+0x115/0x120
       [<ffffffff81616355>] smp_apic_timer_interrupt+0x45/0x60
       [<ffffffff81614a1d>] apic_timer_interrupt+0x6d/0x80
       <EOI>
      
       [<ffffffff814a9c32>] ? cpuidle_enter_state+0x52/0xc0
       [<ffffffff814a9c28>] ? cpuidle_enter_state+0x48/0xc0
       [<ffffffff814a9d65>] cpuidle_idle_call+0xc5/0x200
       [<ffffffff8101d14e>] arch_cpu_idle+0xe/0x30
       [<ffffffff810c67c1>] cpu_startup_entry+0xf1/0x290
       [<ffffffff8104228a>] start_secondary+0x1ba/0x230
      Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29
      RIP  [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
       RSP <ffff883fff4e3db8>
      
      The values of struct cpudata for CPU 113 were:
      
      struct cpudata {
        cpu = 113,
        timer = {
          entry = {
            next = 0x0,
            prev = 0xdead000000200200
          },
          expires = 8357799745,
          base = 0xffff883fe84ec001,
          function = 0xffffffff814a9100 <intel_pstate_timer_func>,
          data = 18446612406765768960,
      <snip>
          i_gain = 0,
          d_gain = 0,
          deadband = 0,
          last_err = 22489
        },
        last_sample_time = {
          tv64 = 4063132438017305
        },
        prev_aperf = 287326796397463,
        prev_mperf = 251427432090198,
        sample = {
          core_pct_busy = 23081,
          aperf = 2937407,
          mperf = 3257884,
          freq = 2524484,
          time = {
            tv64 = 4063149215234118
          }
        }
      }
      
      which results in a time between samples of sample.time - last_sample_time
      = 4063149215234118 - 4063132438017305 = 16777216813 ns, or about 16.777 seconds.
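      The arithmetic can be checked with a short userspace sketch. It assumes
      an 8-bit fixed-point shift for int_tofp() (SKETCH_FRAC_BITS is an
      assumed stand-in for the driver's FRAC_BITS) and that duration_us is
      the nanosecond delta divided by 1000; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point shift; the driver's FRAC_BITS is taken to be 8 here. */
#define SKETCH_FRAC_BITS 8

/* Nanoseconds between two ktime tv64 samples (later minus earlier). */
static int64_t sample_delta_ns(int64_t earlier_ns, int64_t later_ns)
{
    assert(later_ns >= earlier_ns);
    return later_ns - earlier_ns;
}

/*
 * Low 32 bits of int_tofp(duration_us): the value a 32-bit operand ends
 * up holding on the usual two's-complement targets.
 */
static int32_t tofp_as_s32(int64_t duration_us)
{
    return (int32_t)(uint32_t)((uint64_t)duration_us << SKETCH_FRAC_BITS);
}
```

      The delta is 2^24 microseconds, and shifting that left by 8 gives
      exactly 2^32, whose low 32 bits are all zero: the zero divisor
      described in the next paragraph.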
      
      The duration between reads of the APERF and MPERF registers overflowed an
      s32-sized integer in the call to div_fp() from intel_pstate_get_scaled_busy().
      As a result, int_tofp(duration_us) == 0, and the kernel attempts to divide by 0.
      
      While the kernel shouldn't be delaying for a long time, it can and does
      happen and the intel_pstate driver should not panic in this situation.  This
      patch changes the div_fp() function to use div64_s64() to allow for "long"
      division.  This will avoid the overflow condition on long delays.
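      The shape of the fix can be sketched in userspace C. The helper names
      are hypothetical, div64_s64_sketch() is a plain-C stand-in for the
      kernel's div64_s64(), and FRAC_BITS is again assumed to be 8. The point
      is only that narrowing the divisor to 32 bits loses int_tofp(2^24),
      while keeping both operands 64-bit does not.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the driver's fixed-point helpers with FRAC_BITS == 8. */
#define SKETCH_FRAC_BITS 8
#define sketch_int_tofp(X) ((int64_t)(X) << SKETCH_FRAC_BITS)

/* Userspace stand-in for the kernel's div64_s64(): signed 64/64 division. */
static int64_t div64_s64_sketch(int64_t dividend, int64_t divisor)
{
    return dividend / divisor;
}

/* Before (sketch): y is an int32_t, so int_tofp(2^24) would arrive as 0. */
static int32_t div_fp_old_sketch(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x << SKETCH_FRAC_BITS) / y);
}

/* After (sketch): both operands stay 64-bit up to the division. */
static int32_t div_fp_new_sketch(int64_t x, int64_t y)
{
    /* Left-shifting a negative value is undefined in C; inputs here are >= 0. */
    return (int32_t)div64_s64_sketch(x << SKETCH_FRAC_BITS, y);
}
```

      Both versions agree on small inputs (1/2 is 128 in this 8-bit fixed
      point), but only the 64-bit version can accept a 2^24 microsecond
      duration as a divisor.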
      
      [v2]: use div64_s64() in div_fp()
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  18. 10 Jun, 2015: 2 commits
  19. 03 Jun, 2015: 1 commit
    • x86/mm: Decouple <linux/vmalloc.h> from <asm/io.h> · d6472302
      Committed by Stephen Rothwell
      Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so
      remove it from there and fix up the resulting build problems
      triggered on x86 {64|32}-bit {def|allmod|allno}configs.
      
      The breakages were triggering in places where x86 builds relied
      on vmalloc() facilities but did not include <linux/vmalloc.h>
      explicitly and relied on the implicit inclusion via <asm/io.h>.
      
      Also add:
      
        - <linux/init.h> to <linux/io.h>
        - <asm/pgtable_types.h> to <asm/io.h>
      
      ... which were two other implicit header file dependencies.
      Suggested-by: David Miller <davem@davemloft.net>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      [ Tidied up the changelog. ]
      Acked-by: David Miller <davem@davemloft.net>
      Acked-by: Takashi Iwai <tiwai@suse.de>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Vinod Koul <vinod.koul@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: James E.J. Bottomley <JBottomley@odin.com>
      Cc: Jaroslav Kysela <perex@perex.cz>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Suma Ramars <sramars@cisco.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  20. 13 May, 2015: 1 commit
  21. 05 May, 2015: 1 commit
  22. 16 Apr, 2015: 2 commits
  23. 11 Apr, 2015: 2 commits
  24. 07 Feb, 2015: 1 commit
  25. 30 Jan, 2015: 2 commits