1. 15 December 2016, 3 commits
  2. 14 December 2016, 2 commits
  3. 13 December 2016, 1 commit
    • x86/smpboot: Make logical package management more robust · 9d85eb91
      Committed by Thomas Gleixner
      The logical package management has several issues:
      
       - The APIC ids provided by ACPI are not required to be the same as the
         initial APIC id which can be retrieved by CPUID. The APIC ids provided
         by ACPI are those which are written by the BIOS into the APIC. The
         initial id is set by hardware and can not be changed. The hardware
         provided ids contain the real hardware package information.
      
         Especially AMD sets the effective APIC id different from the hardware id
         as they need to reserve space for the IOAPIC ids starting at id 0.
      
         As a consequence those machines trigger the currently active firmware
         bug printouts in dmesg. These are obviously wrong.
      
        - Virtual machines have their own interesting ways of enumerating APICs and
         packages which are not reliably covered by the current implementation.
      
      The sizing of the mapping array has been tweaked to be generously large to
      handle systems which provide a wrong core count when HT is disabled so the
      whole magic which checks for space in the physical hotplug case is not
      needed anymore.
      
      Simplify the whole machinery and do the mapping when the CPU starts and the
      CPUID derived physical package information is available. This solves the
      observed problems on AMD machines and works for the virtualization issues
      as well.
      
       Remove the extra call from XEN cpu bringup code as it is no longer
      required.
      
      Fixes: d49597fd ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
       Reported-and-tested-by: Borislav Petkov <bp@suse.de>
       Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: M. Vefa Bicakci <m.v.b@runbox.com>
      Cc: xen-devel <xen-devel@lists.xen.org>
      Cc: Charles (Chas) Williams <ciwillia@brocade.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: stable@vger.kernel.org
       Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      9d85eb91
  4. 11 December 2016, 3 commits
    • x86/paravirt: Fix bool return type for PVOP_CALL() · 11f254db
      Committed by Peter Zijlstra
      Commit:
      
        3cded417 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()")
      
      introduced a paravirt op with bool return type [*]
      
      It turns out that the PVOP_CALL*() macros miscompile when rettype is
      bool. Code that looked like:
      
         83 ef 01                sub    $0x1,%edi
         ff 15 32 a0 d8 00       callq  *0xd8a032(%rip)        # ffffffff81e28120 <pv_lock_ops+0x20>
         84 c0                   test   %al,%al
      
      ended up looking like so after PVOP_CALL1() was applied:
      
         83 ef 01                sub    $0x1,%edi
         48 63 ff                movslq %edi,%rdi
         ff 14 25 20 81 e2 81    callq  *0xffffffff81e28120
         48 85 c0                test   %rax,%rax
      
      Note how it tests the whole of %rax, even though a typical bool return
      function only sets %al, like:
      
        0f 95 c0                setne  %al
        c3                      retq
      
      This is because ____PVOP_CALL() does:
      
      		__ret = (rettype)__eax;
      
      and while regular integer type casts truncate the result, a cast to
      bool tests for any !0 value. Fix this by explicitly truncating to
      sizeof(rettype) before casting.
      
      [*] The actual bug should've been exposed in commit:
            446f3dc8 ("locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests")
          but that didn't properly implement the paravirt call.
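       For reference, the C cast behaviour at play here can be reproduced in a
       tiny standalone program (this demonstrates only the language rule, not
       the PVOP_CALL*() macros themselves):

         #include <stdio.h>
         #include <stdbool.h>

         int main(void)
         {
                 unsigned long rax = 0x100;   /* low byte is 0, upper bits stale */

                 bool as_bool = (bool)rax;                   /* any non-zero -> 1 */
                 unsigned char as_u8 = (unsigned char)rax;   /* truncates -> 0    */
                 bool fixed = (bool)(unsigned char)rax;      /* truncate first    */

                 printf("%d %d %d\n", as_bool, as_u8, fixed); /* prints: 1 0 0 */
                 return 0;
         }

       With a stale value such as 0x100 in the register, the plain bool cast
       yields true even though the low byte is zero, which is exactly the
       mis-evaluation described above; truncating to the return type's size
       before the cast gives the intended result.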
       Reported-by: kernel test robot <xiaolong.ye@intel.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 3cded417 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()")
       Link: http://lkml.kernel.org/r/20161208154349.346057680@infradead.org
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      11f254db
    • x86/paravirt: Fix native_patch() · 45dbea5f
      Committed by Peter Zijlstra
      While chasing a regression I noticed we potentially patch the wrong
      code in native_patch().
      
       If we do not select the native code sequence, we must use the default
       patcher instead of falling through the switch case.
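       The control-flow bug is easiest to see in a stripped-down, standalone
       sketch (function and case names are purely illustrative, not the
       kernel's):

         #include <stdio.h>

         static int want_native_a(void) { return 0; }  /* pretend: not native */
         static int want_native_b(void) { return 1; }

         static int patch_native_a(void)      { return 10; }
         static int patch_native_b(void)      { return 20; }
         static int patch_default_insns(void) { return 99; }

         static int patch(int type)
         {
                 switch (type) {
                 case 0:                         /* e.g. queued_spin_unlock */
                         if (want_native_a())
                                 return patch_native_a();
                         goto patch_default;     /* the fix: no fall-through */
                 case 1:                         /* e.g. vcpu_is_preempted */
                         if (want_native_b())
                                 return patch_native_b();
                         goto patch_default;
                 default:
                 patch_default:
                         return patch_default_insns();
                 }
         }

         int main(void)
         {
                 /* Without the gotos, patch(0) would fall into case 1 and
                  * wrongly return 20; with them it returns the default, 99. */
                 printf("%d\n", patch(0));
                 return 0;
         }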
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel test robot <xiaolong.ye@intel.com>
      Fixes: 3cded417 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()")
       Link: http://lkml.kernel.org/r/20161208154349.270616999@infradead.org
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      45dbea5f
    • perf/x86: Fix exclusion of BTS and LBR for Goldmont · b0c1ef52
      Committed by Andi Kleen
      An earlier patch allowed enabling PT and LBR at the same
      time on Goldmont. However it also allowed enabling BTS and LBR
      at the same time, which is still not supported. Fix this by
      bypassing the check only for PT.
       Signed-off-by: Andi Kleen <ak@linux.intel.com>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: alexander.shishkin@intel.com
      Cc: kan.liang@intel.com
      Cc: <stable@vger.kernel.org>
      Fixes: ccbebba4 ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it")
       Link: http://lkml.kernel.org/r/20161209001417.4713-1-andi@firstfloor.org
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b0c1ef52
  5. 10 December 2016, 7 commits
    • x86/ldt: Make all size computations unsigned · 990e9dc3
      Committed by Thomas Gleixner
      ldt->size can never be negative. The helper functions take 'unsigned int'
      arguments which are assigned from ldt->size. The related user space
      user_desc struct member entry_number is unsigned as well.
      
      But ldt->size itself and a few local variables which are related to
      ldt->size are type 'int' which makes no sense whatsoever and results in
      typecasts which make the eyes bleed.
      
      Clean it up and convert everything which is related to ldt->size to
       unsigned int.
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      990e9dc3
    • x86/ldt: Make a size argument unsigned · 296dc580
      Committed by Dan Carpenter
      My static checker complains that we put an upper bound on the "size"
      argument but not a lower bound.  The checker is not smart enough to know
      the possible ranges of "old_mm->context.ldt->size" from
      init_new_context_ldt() so it thinks maybe it could be negative.
      
      Let's make it unsigned to silence the warning and future proof the code
      a bit.
       Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
       Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: kernel-janitors@vger.kernel.org
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
       Link: http://lkml.kernel.org/r/20161208105602.GA11382@elgon.mountain
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      296dc580
    • x86: Remove empty idle.h header · 34bc3560
      Committed by Thomas Gleixner
      One include less is always a good thing(tm). Good riddance.
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Signed-off-by: Borislav Petkov <bp@suse.de>
       Cc: Jiri Olsa <jolsa@redhat.com>
       Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.de
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      34bc3560
    • x86/amd: Simplify AMD E400 aware idle routine · 07c94a38
      Committed by Borislav Petkov
      Reorganize the E400 detection now that we have everything in place:
      switch the CPUs to broadcast mode after the LAPIC has been initialized
      and remove the facilities that were used previously on the idle path.
      
       Unfortunately static_cpu_has_bug() cannot be used in the E400 idle routine
      because alternatives have been applied when the actual detection happens,
      so the static switching does not take effect and the test will stay
      false. Use boot_cpu_has_bug() instead which is definitely an improvement
      over the RDMSR and the cpumask handling.
       Suggested-by: Thomas Gleixner <tglx@linutronix.de>
       Signed-off-by: Borislav Petkov <bp@suse.de>
       Cc: Jiri Olsa <jolsa@redhat.com>
       Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.de
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      07c94a38
    • x86/amd: Check for the C1E bug post ACPI subsystem init · e7ff3a47
      Committed by Thomas Gleixner
      AMD CPUs affected by the E400 erratum suffer from the issue that the
      local APIC timer stops when the CPU goes into C1E. Unfortunately there
      is no way to detect the affected CPUs on early boot. It's only possible
      to determine the range of possibly affected CPUs from the family/model
      range.
      
      The actual decision whether to enter C1E and thus cause the bug is done
      by the firmware and we need to detect that case late, after ACPI has
      been initialized.
      
      The current solution is to check in the idle routine whether the CPU is
      affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the
      K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is
      affected and the system is switched into forced broadcast mode.
      
       This is inefficient: on non-affected CPUs every entry to idle does
      the extra RDMSR.
      
      After doing some research it turns out that the bits are visible on the
      boot CPU right after the ACPI subsystem is initialized in the early
      boot process. So instead of polling for the bits in the idle loop, add
      a detection function after acpi_subsystem_init() and check for the MSR
      bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and
      the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it
      will stop in C1E state as well.
      
      The switch to broadcast mode cannot be done at this point because the
      boot CPU still uses HPET as a clockevent device and the local APIC timer
      is not yet calibrated and installed. The switch to broadcast mode on the
      affected CPUs needs to be done when the local APIC timer is actually set
      up.
      
       This allows us to clean up the amd_e400_idle() function in the next step.
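       Condensed into a sketch, the detection described above looks roughly
       like this (the function name is illustrative; the MSR, bit and bug
       constants are the existing kernel ones):

         /* Runs once on the boot CPU, after acpi_subsystem_init(). */
         static void amd_e400_c1e_detect(void)
         {
                 u64 msr;

                 if (!boot_cpu_has_bug(X86_BUG_AMD_E400))
                         return;

                 rdmsrl(MSR_K8_INT_PENDING_MSG, msr);
                 if (!(msr & K8_INTP_C1E_ACTIVE_MASK))
                         return;

                 /* Firmware enters C1E: the LAPIC timer (and maybe TSC) stops. */
                 set_cpu_bug(&boot_cpu_data, X86_BUG_AMD_APIC_C1E);
                 if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
                         mark_tsc_unstable("TSC halt in AMD C1E");
         }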
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Signed-off-by: Borislav Petkov <bp@suse.de>
       Cc: Jiri Olsa <jolsa@redhat.com>
       Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.de
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e7ff3a47
    • x86/bugs: Separate AMD E400 erratum and C1E bug · 3344ed30
      Committed by Thomas Gleixner
      The workaround for the AMD Erratum E400 (Local APIC timer stops in C1E
      state) is a two step process:
      
       - Selection of the E400 aware idle routine
      
       - Detection whether the platform is affected
      
      The idle routine selection happens for possibly affected CPUs depending on
       family/model/stepping information. This range of CPUs is not necessarily
      affected as the decision whether to enable the C1E feature is made by the
      firmware. Unfortunately there is no way to query this at early boot.
      
       The current implementation polls an MSR in the E400 aware idle routine to
       detect whether the CPU is affected. This is inefficient on non-affected
      CPUs because every idle entry has to do the MSR read.
      
      There is a better way to detect this before going idle for the first time
       which requires separating the bug flags:
      
        X86_BUG_AMD_E400 	- Selects the E400 aware idle routine and
        			  enables the detection
      			  
        X86_BUG_AMD_APIC_C1E  - Set when the platform is affected by E400
      
      Replace the current X86_BUG_AMD_APIC_C1E usage by the new X86_BUG_AMD_E400
      bug bit to select the idle routine which currently does an unconditional
      detection poll. X86_BUG_AMD_APIC_C1E is going to be used in later patches
      to remove the MSR polling and simplify the handling of this misfeature.
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Signed-off-by: Borislav Petkov <bp@suse.de>
       Cc: Jiri Olsa <jolsa@redhat.com>
       Link: http://lkml.kernel.org/r/20161209182912.2726-3-bp@alien8.de
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      3344ed30
    • x86/cpufeature: Provide helper to set bugs bits · a588b983
      Committed by Borislav Petkov
      Will be used in a later patch to set bug bits for bugs which need late
      detection.
       Signed-off-by: Borislav Petkov <bp@suse.de>
       Cc: Jiri Olsa <jolsa@redhat.com>
       Link: http://lkml.kernel.org/r/20161209182912.2726-2-bp@alien8.de
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a588b983
  6. 09 December 2016, 1 commit
  7. 06 December 2016, 3 commits
  8. 30 November 2016, 3 commits
    • sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable · 0a21fc12
      Committed by Ingo Molnar
      Right now CONFIG_SCHED_MC_PRIO has X86_INTEL_PSTATE as a dependency,
      which is not enabled by default and which hides the CONFIG_SCHED_MC_PRIO
      hardware-enabling feature.
      
      Select X86_INTEL_PSTATE instead, plus its dependency (CPU_FREQ), if the
      user enables CONFIG_SCHED_MC_PRIO=y.
      
      (Also align the CONFIG_SCHED_MC_PRIO Kconfig help text in standard style.)
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: bp@suse.de
      Cc: jolsa@redhat.com
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: rjw@rjwysocki.net
      Cc: linux-kernel@vger.kernel.org
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0a21fc12
    • sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO · de966cf4
      Committed by Tim Chen
      Rename CONFIG_SCHED_ITMT for Intel Turbo Boost Max Technology 3.0
      to CONFIG_SCHED_MC_PRIO.  This makes the configuration extensible
      in future to other architectures that wish to similarly establish
      CPU core priorities support in the scheduler.
      
      The description in Kconfig is updated to reflect this change with
      added details for better clarity.  The configuration is explicitly
      default-y, to enable the feature on CPUs that have this feature.
      
      It has no effect on non-TBM3 CPUs.
       Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bp@suse.de
      Cc: jolsa@redhat.com
      Cc: linux-acpi@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: rjw@rjwysocki.net
       Link: http://lkml.kernel.org/r/2b2ee29d93e3f162922d72d0165a1405864fbb23.1480444902.git.tim.c.chen@linux.intel.com
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      de966cf4
    • timekeeping: Ignore the bogus sleep time if pm_trace is enabled · ba58d102
      Committed by Chen Yu
      Power management suspend/resume tracing (ab)uses the RTC to store
      suspend/resume information persistently. As a consequence the RTC value is
      clobbered when timekeeping is resumed and tries to inject the sleep time.
      
      Commit a4f8f666 ("timekeeping: Cap array access in timekeeping_debug")
       plugged an out-of-bounds array access in the timekeeping debug code which
      was caused by the clobbered RTC value, but we still use the clobbered RTC
      value for sleep time injection into kernel timekeeping, which will result
      in random adjustments depending on the stored "hash" value.
      
      To prevent this keep track of the RTC clobbering and ignore the invalid RTC
      timestamp at resume. If the system resumed successfully clear the flag,
      which marks the RTC as unusable, warn the user about the RTC clobber and
      recommend to adjust the RTC with 'ntpdate' or 'rdate'.
      
       [jstultz: Fixed up pr_warn formatting, and implemented suggestions from Ingo]
      [ tglx: Rewrote changelog ]
      
      Originally-from: Thomas Gleixner <tglx@linutronix.de>
       Signed-off-by: Chen Yu <yu.c.chen@intel.com>
       Signed-off-by: John Stultz <john.stultz@linaro.org>
       Acked-by: Pavel Machek <pavel@ucw.cz>
       Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Xunlei Pang <xlpang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
       Link: http://lkml.kernel.org/r/1480372524-15181-3-git-send-email-john.stultz@linaro.org
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      ba58d102
  9. 28 November 2016, 7 commits
    • x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h> · a293b395
      Committed by Ingo Molnar
      asm/mutex.h is gone from the locking tree, which makes sched/core break the build.
      
      Use linux/mutex.h instead, which is the canonical method.
      
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: peterz@infradead.org
      Cc: jolsa@redhat.com
      Cc: rjw@rjwysocki.net
      Cc: bp@suse.de
      Cc: Thomas Gleixner <tglx@linutronix.de>
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a293b395
    • x86/build: Remove three unneeded genhdr-y entries · 9190e217
      Committed by Paul Bolle
      In x86's include/asm/Kbuild three entries are appended to the genhdr-y make
      variable:
      
          genhdr-y += unistd_32.h
          genhdr-y += unistd_64.h
          genhdr-y += unistd_x32.h
      
      The same entries are also appended to that variable in
      include/uapi/asm/Kbuild. So commit:
      
        10b63956 ("UAPI: Plumb the UAPI Kbuilds into the user header installation and checking")
      
      ... removed these three entries from include/asm/Kbuild. But, apparently, some
      merge conflict resolution re-added them.
      
      The net effect is, in short, that the genhdr-y make variable contains these
      file names twice and, as a consequence, that the corresponding headers get
      installed twice. And so the build prints:
      
        INSTALL usr/include/asm/ (65 files)
      
      ... while in reality only 62 files are installed in that directory.
      
      Nothing breaks because of all that, but it's a good idea to finally remove
      these unneeded entries nevertheless.
       Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
       Link: http://lkml.kernel.org/r/1480077707-2837-1-git-send-email-pebolle@tiscali.nl
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9190e217
    • x86/build: Don't use $(LINUXINCLUDE) twice · 06cbbac0
      Committed by Paul Bolle
      The make variable KBUILD_CFLAGS contains $(LINUXINCLUDE). But the build
      already picks up $(LINUXINCLUDE) from scripts/Makefile.lib. The net effect
      is that the (long) list of include directories is used twice.
      
      This is harmless but pointless. So stop using $(LINUXINCLUDE) twice.
       Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
       Link: http://lkml.kernel.org/r/1480077514-2586-1-git-send-email-pebolle@tiscali.nl
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      06cbbac0
    • x86/unwind: Fix guess-unwinder regression · 55f856e6
      Committed by Josh Poimboeuf
      My attempt at fixing some KASAN false positive warnings was rather brain
      dead, and it broke the guess unwinder.  With frame pointers disabled,
      /proc/<pid>/stack is broken:
      
        # cat /proc/1/stack
        [<ffffffffffffffff>] 0xffffffffffffffff
      
      Restore the code flow to more closely resemble its previous state, while
      still using READ_ONCE_NOCHECK() macros to silence KASAN false positives.
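       The resulting pattern, sketched (not the literal unwinder code): stack
       slots that may be uninitialized are read through READ_ONCE_NOCHECK() so
       KASAN stays quiet, while the surrounding loop logic follows the
       pre-regression flow:

         for (sp = first_frame; sp < stack_end; sp++) {
                 unsigned long addr = READ_ONCE_NOCHECK(*sp);

                 if (__kernel_text_address(addr))
                         return addr;    /* plausible return address */
         }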
       Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: c2d75e03 ("x86/unwind: Prevent KASAN false positive warnings in guess unwinder")
       Link: http://lkml.kernel.org/r/b824f92c2c22eca5ec95ac56bd2a7c84cf0b9df9.1480309971.git.jpoimboe@redhat.com
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      55f856e6
    • x86/build: Annotate die() with noreturn to fix build warning on clang · adee8705
      Committed by Peter Foley
      Fixes below warning with clang:
      
        In file included from ../arch/x86/tools/relocs_64.c:17:
        ../arch/x86/tools/relocs.c:977:6: warning: variable 'do_reloc' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
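       The fix pattern, shown as a minimal standalone example (the real die()
       in relocs.c has the same shape, but the lookup() helper below is made
       up purely for illustration):

         #include <stdarg.h>
         #include <stdio.h>
         #include <stdlib.h>

         static void __attribute__((noreturn)) die(const char *fmt, ...)
         {
                 va_list ap;

                 va_start(ap, fmt);
                 vfprintf(stderr, fmt, ap);
                 va_end(ap);
                 exit(1);
         }

         static int lookup(int sym)
         {
                 int do_reloc;                   /* assigned on every path ...   */

                 if (sym == 0)
                         do_reloc = 1;
                 else
                         die("bad symbol %d\n", sym);  /* ... as this never returns */

                 return do_reloc;
         }

         int main(void) { return lookup(0); }

       Without the noreturn annotation clang cannot prove that the else branch
       terminates, hence the -Wsometimes-uninitialized warning about do_reloc.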
       Signed-off-by: Peter Foley <pefoley2@pefoley.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
       Link: http://lkml.kernel.org/r/20161126222229.673-1-pefoley2@pefoley.com
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      adee8705
    • x86/platform/olpc: Fix resume handler build warning · 20ab6677
      Committed by Borislav Petkov
      Fix:
      
        arch/x86/platform/olpc/olpc-xo15-sci.c:199:12: warning: ‘xo15_sci_resume’
        defined but not used [-Wunused-function]
         static int xo15_sci_resume(struct device *dev)
                    ^
      
      which I see in randconfig builds here.
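       The usual shape of such a fix, sketched (the actual patch may use a
       different guard):

         #ifdef CONFIG_PM_SLEEP
         static int xo15_sci_resume(struct device *dev)
         {
                 /* re-arm the SCI wakeup source here */
                 return 0;
         }
         #endif

       An equivalent alternative is tagging the handler __maybe_unused so it
       is always compiled but the warning is suppressed when the PM ops that
       reference it are configured out.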
       Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
       Link: http://lkml.kernel.org/r/20161126142706.13602-1-bp@alien8.de
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      20ab6677
    • x86/boot/64: Optimize fixmap page fixup · 6248f456
      Committed by Borislav Petkov
      Single-stepping through head_64.S made me look at the fixmap page PTEs
      fixup loop:
      
      So we're going through the whole level2_fixmap_pgt 4K page, looking at
      whether PAGE_PRESENT is set in those PTEs and add the delta between
      where we're compiled to run and where we actually end up running.
      
      However, if that delta is 0 (most cases) we go through all those 512
      PTEs for no reason at all. Oh well, we add 0 but that's no reason to me.
      
      Skipping that useless fixup gives us a boot speedup of 0.004 seconds in
      my guest. Not a lot but considering how cheap it is, I'll take it. Here
      is the printk time difference:
      
      before:
        ...
        [    0.000000] tsc: Marking TSC unstable due to TSCs unsynchronized
        [    0.013590] Calibrating delay loop (skipped), value calculated using timer frequency..
      		8027.17 BogoMIPS (lpj=16054348)
        [    0.017094] pid_max: default: 32768 minimum: 301
        ...
      
      after:
        ...
        [    0.000000] tsc: Marking TSC unstable due to TSCs unsynchronized
        [    0.009587] Calibrating delay loop (skipped), value calculated using timer frequency..
      		8026.86 BogoMIPS (lpj=16053724)
        [    0.013090] pid_max: default: 32768 minimum: 301
        ...
      
      For the other two changes converting naked numbers to defines:
      
        # arch/x86/kernel/head_64.o:
      
         text    data     bss     dec     hex filename
         1124  290864    4096  296084   48494 head_64.o.before
         1124  290864    4096  296084   48494 head_64.o.after
      
      md5:
         87086e202588939296f66e892414ffe2  head_64.o.before.asm
         87086e202588939296f66e892414ffe2  head_64.o.after.asm
       Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
       Link: http://lkml.kernel.org/r/20161125111448.23623-1-bp@alien8.de
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6248f456
  10. 25 November 2016, 9 commits
    • x86/boot/64: Use defines for page size · 9b032d21
      Committed by Borislav Petkov
      ... instead of naked numbers like the rest of the asm does in this file.
      
      No code changed:
      
        # arch/x86/kernel/head_64.o:
      
         text    data     bss     dec     hex filename
         1124  290864    4096  296084   48494 head_64.o.before
         1124  290864    4096  296084   48494 head_64.o.after
      
      md5:
         87086e202588939296f66e892414ffe2  head_64.o.before.asm
         87086e202588939296f66e892414ffe2  head_64.o.after.asm
       Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
       Link: http://lkml.kernel.org/r/20161124210550.15025-1-bp@alien8.de
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9b032d21
    • x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU · d3d37d85
      Committed by Tim Chen
      Some Intel cores in a package can be boosted to a higher turbo frequency
      with ITMT 3.0 technology. The scheduler can use the asymmetric packing
      feature to move tasks to the more capable cores.
      
      If ITMT is enabled, add SD_ASYM_PACKING flag to the thread and core
      sched domains to enable asymmetric packing.
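       In sketch form (helper and variable names here are illustrative, not
       guaranteed to match the patch):

         static inline int x86_sched_itmt_flags(void)
         {
                 return sysctl_sched_itmt_enabled ? SD_ASYM_PACKING : 0;
         }

       The SMT and MC topology levels then include these flags, so the
       scheduler sees SD_ASYM_PACKING on the thread and core domains whenever
       ITMT is enabled.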
       Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
       Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: linux-pm@vger.kernel.org
      Cc: peterz@infradead.org
      Cc: jolsa@redhat.com
      Cc: rjw@rjwysocki.net
      Cc: linux-acpi@vger.kernel.org
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: bp@suse.de
       Link: http://lkml.kernel.org/r/9bbb885bedbef4eb50e197305eb16b160cff0831.1479844244.git.tim.c.chen@linux.intel.com
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      d3d37d85
    • x86/sysctl: Add sysctl for ITMT scheduling feature · f9793e34
      Committed by Tim Chen
      Intel Turbo Boost Max Technology 3.0 (ITMT) feature
      allows some cores to be boosted to higher turbo
      frequency than others.
      
       Add /proc/sys/kernel/sched_itmt_enabled so the operator
      can enable/disable scheduling of tasks that favor cores
      with higher turbo boost frequency potential.
      
       By default, a system that is ITMT-capable and single-socket has this
       feature turned on, as it is more likely to be lightly loaded and to
       operate in the turbo range.
      
      When there is a change in the ITMT scheduling operation
      desired, a rebuild of the sched domain is initiated
      so the scheduler can set up sched domains with appropriate
      flag to enable/disable ITMT scheduling operations.
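       A sketch of how such a knob is typically registered (handler and
       variable names here are illustrative and may not match the patch
       exactly):

         static struct ctl_table itmt_kern_table[] = {
                 {
                         .procname     = "sched_itmt_enabled",
                         .data         = &sysctl_sched_itmt_enabled,
                         .maxlen       = sizeof(unsigned int),
                         .mode         = 0644,
                         .proc_handler = sched_itmt_update_handler,
                         .extra1       = &zero,
                         .extra2       = &one,
                 },
                 {}
         };

       The handler then triggers the sched domain rebuild mentioned above when
       the value actually changes.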
       Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
       Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
       Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: linux-pm@vger.kernel.org
      Cc: peterz@infradead.org
      Cc: jolsa@redhat.com
      Cc: rjw@rjwysocki.net
      Cc: linux-acpi@vger.kernel.org
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: bp@suse.de
       Link: http://lkml.kernel.org/r/07cc62426a28bad57b01ab16bb903a9c84fa5421.1479844244.git.tim.c.chen@linux.intel.com
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      f9793e34
    • x86: Enable Intel Turbo Boost Max Technology 3.0 · 5e76b2ab
      Committed by Tim Chen
      On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum
      turbo frequencies of some cores in a CPU package may be higher than for
      the other cores in the same package.  In that case, better performance
      (and possibly lower energy consumption as well) can be achieved by
      making the scheduler prefer to run tasks on the CPUs with higher max
      turbo frequencies.
      
      To that end, set up a core priority metric to abstract the core
      preferences based on the maximum turbo frequency.  In that metric,
      the cores with higher maximum turbo frequencies are higher-priority
      than the other cores in the same package and that causes the scheduler
       to favor them when making load-balancing decisions using the asymmetric
      packing approach.  At the same time, the priority of SMT threads with a
      higher CPU number is reduced so as to avoid scheduling tasks on all of
      the threads that belong to a favored core before all of the other cores
      have been given a task to run.
      
      The priority metric will be initialized by the P-state driver with the
      help of the sched_set_itmt_core_prio() function.  The P-state driver
      will also determine whether or not ITMT is supported by the platform
      and will call sched_set_itmt_support() to indicate that.
       Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
       Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
       Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: linux-pm@vger.kernel.org
      Cc: peterz@infradead.org
      Cc: jolsa@redhat.com
      Cc: rjw@rjwysocki.net
      Cc: linux-acpi@vger.kernel.org
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: bp@suse.de
       Link: http://lkml.kernel.org/r/cd401ccdff88f88c8349314febdc25d51f7c48f7.1479844244.git.tim.c.chen@linux.intel.com
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5e76b2ab
    • x86/topology: Define x86's arch_update_cpu_topology · 7d25127c
      Committed by Tim Chen
      The scheduler calls arch_update_cpu_topology() to check whether the
      scheduler domains have to be rebuilt.
      
      So far x86 has no requirement for this, but the upcoming ITMT support
      makes this necessary.
      
      Request the rebuild when the x86 internal update flag is set.
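       In sketch form, the x86 implementation described above boils down to
       returning and clearing an internal flag (the variable name is
       illustrative):

         static bool x86_topology_update;

         int arch_update_cpu_topology(void)
         {
                 int retval = x86_topology_update;

                 x86_topology_update = false;
                 return retval;
         }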
       Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
       Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: linux-pm@vger.kernel.org
      Cc: peterz@infradead.org
      Cc: jolsa@redhat.com
      Cc: rjw@rjwysocki.net
      Cc: linux-acpi@vger.kernel.org
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: bp@suse.de
       Link: http://lkml.kernel.org/r/bfbf5591276ec60b2af2da798adc1060df1e2a5f.1479844244.git.tim.c.chen@linux.intel.com
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      7d25127c
    • KVM: x86: check for pic and ioapic presence before use · df492896
      Committed by Radim Krčmář
      Split irqchip allows pic and ioapic routes to be used without them being
      created, which results in NULL access.  Check for NULL and avoid it.
       (The setup is too racy for a nicer solution.)
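       The shape of the fix, sketched for the ioapic path (the pic path is
       analogous; the error value is illustrative):

         struct kvm_ioapic *ioapic = kvm->arch.vioapic;

         if (!ioapic)
                 return -1;      /* route exists, but no ioapic was created */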
      
      Found by syzkaller:
      
        general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
        Dumping ftrace buffer:
           (ftrace buffer empty)
        Modules linked in:
        CPU: 3 PID: 11923 Comm: kworker/3:2 Not tainted 4.9.0-rc5+ #27
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Workqueue: events irqfd_inject
        task: ffff88006a06c7c0 task.stack: ffff880068638000
        RIP: 0010:[...]  [...] __lock_acquire+0xb35/0x3380 kernel/locking/lockdep.c:3221
        RSP: 0000:ffff88006863ea20  EFLAGS: 00010006
        RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000000
        RDX: 0000000000000039 RSI: 0000000000000000 RDI: 1ffff1000d0c7d9e
        RBP: ffff88006863ef58 R08: 0000000000000001 R09: 0000000000000000
        R10: 00000000000001c8 R11: 0000000000000000 R12: ffff88006a06c7c0
        R13: 0000000000000001 R14: ffffffff8baab1a0 R15: 0000000000000001
        FS:  0000000000000000(0000) GS:ffff88006d100000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000004abdd0 CR3: 000000003e2f2000 CR4: 00000000000026e0
        Stack:
         ffffffff894d0098 1ffff1000d0c7d56 ffff88006863ecd0 dffffc0000000000
         ffff88006a06c7c0 0000000000000000 ffff88006863ecf8 0000000000000082
         0000000000000000 ffffffff815dd7c1 ffffffff00000000 ffffffff00000000
        Call Trace:
         [...] lock_acquire+0x2a2/0x790 kernel/locking/lockdep.c:3746
         [...] __raw_spin_lock include/linux/spinlock_api_smp.h:144
         [...] _raw_spin_lock+0x38/0x50 kernel/locking/spinlock.c:151
         [...] spin_lock include/linux/spinlock.h:302
         [...] kvm_ioapic_set_irq+0x4c/0x100 arch/x86/kvm/ioapic.c:379
         [...] kvm_set_ioapic_irq+0x8f/0xc0 arch/x86/kvm/irq_comm.c:52
         [...] kvm_set_irq+0x239/0x640 arch/x86/kvm/../../../virt/kvm/irqchip.c:101
         [...] irqfd_inject+0xb4/0x150 arch/x86/kvm/../../../virt/kvm/eventfd.c:60
         [...] process_one_work+0xb40/0x1ba0 kernel/workqueue.c:2096
         [...] worker_thread+0x214/0x18a0 kernel/workqueue.c:2230
         [...] kthread+0x328/0x3e0 kernel/kthread.c:209
         [...] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
       Reported-by: Dmitry Vyukov <dvyukov@google.com>
       Cc: stable@vger.kernel.org
       Fixes: 49df6397 ("KVM: x86: Split the APIC from the rest of IRQCHIP.")
       Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      df492896
    • KVM: x86: fix out-of-bounds accesses of rtc_eoi map · 81cdb259
      Committed by Radim Krčmář
      KVM was using arrays of size KVM_MAX_VCPUS with vcpu_id, but ID can be
       bigger than the maximal number of VCPUs, resulting in out-of-bounds
      access.
      
      Found by syzkaller:
      
        BUG: KASAN: slab-out-of-bounds in __apic_accept_irq+0xb33/0xb50 at addr [...]
        Write of size 1 by task a.out/27101
        CPU: 1 PID: 27101 Comm: a.out Not tainted 4.9.0-rc5+ #49
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
         [...]
        Call Trace:
         [...] __apic_accept_irq+0xb33/0xb50 arch/x86/kvm/lapic.c:905
         [...] kvm_apic_set_irq+0x10e/0x180 arch/x86/kvm/lapic.c:495
         [...] kvm_irq_delivery_to_apic+0x732/0xc10 arch/x86/kvm/irq_comm.c:86
         [...] ioapic_service+0x41d/0x760 arch/x86/kvm/ioapic.c:360
         [...] ioapic_set_irq+0x275/0x6c0 arch/x86/kvm/ioapic.c:222
         [...] kvm_ioapic_inject_all arch/x86/kvm/ioapic.c:235
         [...] kvm_set_ioapic+0x223/0x310 arch/x86/kvm/ioapic.c:670
         [...] kvm_vm_ioctl_set_irqchip arch/x86/kvm/x86.c:3668
         [...] kvm_arch_vm_ioctl+0x1a08/0x23c0 arch/x86/kvm/x86.c:3999
         [...] kvm_vm_ioctl+0x1fa/0x1a70 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3099
       Reported-by: Dmitry Vyukov <dvyukov@google.com>
       Cc: stable@vger.kernel.org
       Fixes: af1bae54 ("KVM: x86: bump KVM_MAX_VCPU_ID to 1023")
       Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
       Reviewed-by: David Hildenbrand <david@redhat.com>
       Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      81cdb259
    • KVM: x86: drop error recovery in em_jmp_far and em_ret_far · 2117d539
      Committed by Radim Krčmář
      em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
      bit mode, but syzkaller proved otherwise (and SDM agrees).
      Code segment was restored upon failure, but it was left uninitialized
      outside of long mode, which could lead to a leak of host kernel stack.
      We could have fixed that by always saving and restoring the CS, but we
      take a simpler approach and just break any guest that manages to fail
      as the error recovery is error-prone and modern CPUs don't need emulator
      for this.
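       Based on the description above, the simpler approach amounts to
       refusing the emulation instead of trying to unwind (a sketch, not the
       exact hunk; the surrounding em_ret_far() code is omitted):

         rc = assign_eip_far(ctxt, eip, &new_desc);
         /* Error handling is not implemented: give up rather than leak state. */
         if (rc != X86EMUL_CONTINUE)
                 return X86EMUL_UNHANDLEABLE;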
      
      Found by syzkaller:
      
        WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
        Kernel panic - not syncing: panic_on_warn set ...
      
        CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
         [...]
        Call Trace:
         [...] __dump_stack lib/dump_stack.c:15
         [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
         [...] panic+0x1b7/0x3a3 kernel/panic.c:179
         [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
         [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
         [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
         [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
         [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
         [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
         [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
         [...] complete_emulated_io arch/x86/kvm/x86.c:6870
         [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
         [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
         [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
         [...] vfs_ioctl fs/ioctl.c:43
         [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
         [...] SYSC_ioctl fs/ioctl.c:694
         [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
         [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
       Reported-by: Dmitry Vyukov <dvyukov@google.com>
       Cc: stable@vger.kernel.org
       Fixes: d1442d85 ("KVM: x86: Handle errors when RIP is set during far jumps")
       Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      2117d539
    • KVM: x86: fix out-of-bounds access in lapic · 444fdad8
      Committed by Radim Krčmář
      Cluster xAPIC delivery incorrectly assumed that dest_id <= 0xff.
      With enabled KVM_X2APIC_API_USE_32BIT_IDS in KVM_CAP_X2APIC_API, a
      userspace can send an interrupt with dest_id that results in
      out-of-bounds access.
      
      Found by syzkaller:
      
        BUG: KASAN: slab-out-of-bounds in kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 at addr ffff88003d9ca750
        Read of size 8 by task syz-executor/22923
        CPU: 0 PID: 22923 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
         [...]
        Call Trace:
         [...] __dump_stack lib/dump_stack.c:15
         [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
         [...] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
         [...] print_address_description mm/kasan/report.c:194
         [...] kasan_report_error mm/kasan/report.c:283
         [...] kasan_report+0x231/0x500 mm/kasan/report.c:303
         [...] __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:329
         [...] kvm_irq_delivery_to_apic_fast+0x11fa/0x1210 arch/x86/kvm/lapic.c:824
         [...] kvm_irq_delivery_to_apic+0x132/0x9a0 arch/x86/kvm/irq_comm.c:72
         [...] kvm_set_msi+0x111/0x160 arch/x86/kvm/irq_comm.c:157
         [...] kvm_send_userspace_msi+0x201/0x280 arch/x86/kvm/../../../virt/kvm/irqchip.c:74
         [...] kvm_vm_ioctl+0xba5/0x1670 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3015
         [...] vfs_ioctl fs/ioctl.c:43
         [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
         [...] SYSC_ioctl fs/ioctl.c:694
         [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
         [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
       Reported-by: Dmitry Vyukov <dvyukov@google.com>
       Cc: stable@vger.kernel.org
       Fixes: e45115b6 ("KVM: x86: use physical LAPIC array for logical x2APIC")
       Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
       Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      444fdad8
  11. 24 November 2016, 1 commit
    • x86/apic/uv: Silence a shift wrapping warning · c4597fd7
      Committed by Dan Carpenter
      'm_io' is stored in 6 bits so it's a number in the 0-63 range.  Static
      analysis tools complain that 1 << 63 will wrap so I have changed it to
      1ULL << m_io.
      
      This code is over three years old so presumably the bug doesn't happen
      very frequently in real life or someone would have complained by now.
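       For reference, the difference in plain C (standalone example):

         #include <stdio.h>

         int main(void)
         {
                 unsigned int m_io = 40;  /* any value above 31 shows the problem */

                 /* "1 << m_io" would shift a 32-bit int past bit 31, which is
                  * undefined behaviour; a 64-bit constant keeps the shift valid
                  * for the whole 0-63 range the 6-bit field allows. */
                 unsigned long long mask = 1ULL << m_io;

                 printf("%#llx\n", mask);
                 return 0;
         }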
       Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Nathan Zimmer <nzimmer@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-janitors@vger.kernel.org
      Fixes: b15cc4a1 ("x86, uv, uv3: Update x2apic Support for SGI UV3")
       Link: http://lkml.kernel.org/r/20161123221908.GA23997@mwanda
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c4597fd7