1. 19 December 2016 (4 commits)
  2. 18 December 2016 (3 commits)
    • x86/tsc: Limit the adjust value further · 8c9b9d87
      Thomas Gleixner committed
      An adjust value of 0x80000000, or anything larger, renders the TSC
      deadline timer dysfunctional.
      
      We do not yet have any information about this from Intel, but experimentation
      clearly shows that this is a 32/64-bit and sign-extension issue.
      
      If adjust values larger than that are actually required, which might be the
      case for physical CPU hotplug, then we need to disable the deadline timer
      on the affected package/CPUs and use the local APIC timer instead.
      
      That requires some surgery in the APIC setup code, so for now we just limit
      the ADJUST register value to the range known to work, and revisit this when
      Intel comes forth with proper information.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Kevin Stanton <kevin.b.stanton@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      8c9b9d87
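      To make the workaround concrete, here is a minimal sketch of the kind of
      clamping the message describes; the helper name and the exact bounds are
      illustrative, not the actual arch/x86/kernel/tsc_sync.c code:

          /* Illustrative sketch only: keep a proposed TSC_ADJUST value inside
           * the range that is known not to break the TSC deadline timer. */
          static long long tsc_adjust_limit(long long adj)
          {
                  const long long lim = 0x7fffffffLL; /* 0x80000000 and up are known bad */

                  if (adj > lim)
                          return lim;
                  if (adj < -lim)
                          return -lim;
                  return adj;
          }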
    • x86/tsc: Annotate printouts as firmware bug · 16588f65
      Thomas Gleixner committed
      Make it more obvious that the BIOS is screwed up.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Kevin Stanton <kevin.b.stanton@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      16588f65
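      The kernel's standard way to tag such messages is the FW_BUG prefix from
      <linux/printk.h>, which expands to "[Firmware Bug]: ". A small usage
      sketch (the message text and the variables are made up for illustration,
      not taken from the patch):

          /* cpu, cur_adj and bootval are illustrative variables. */
          pr_warn(FW_BUG "TSC ADJUST differs: CPU%u has %lld, boot value was %lld\n",
                  cpu, cur_adj, bootval);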
    • x86/floppy: Use designated initializers · ffc7dc8d
      Kees Cook committed
      Prepare to mark sensitive kernel structures for randomization by making
      sure they're using designated initializers. These were identified during
      allyesconfig builds of x86, arm, and arm64, with most initializer fixes
      extracted from grsecurity.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161217213705.GA1248@beast
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ffc7dc8d
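      For context, this is what such a conversion looks like in general; the
      struct and functions below are illustrative stand-ins, not the actual
      floppy driver code:

          struct fd_ops {
                  int  (*open)(int drive);
                  void (*release)(int drive);
          };

          static int  demo_open(int drive)    { return drive; }
          static void demo_release(int drive) { (void)drive; }

          /* Positional initializer: silently wrong if the field order ever
           * changes, e.g. under structure layout randomization. */
          static const struct fd_ops ops_positional = { demo_open, demo_release };

          /* Designated initializer: members are named, so layout changes are safe. */
          static const struct fd_ops ops_designated = {
                  .open    = demo_open,
                  .release = demo_release,
          };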
  3. 17 December 2016 (2 commits)
  4. 15 December 2016 (9 commits)
  5. 14 December 2016 (4 commits)
  6. 13 December 2016 (3 commits)
    • x86/smpboot: Make logical package management more robust · 9d85eb91
      Thomas Gleixner committed
      The logical package management has several issues:
      
       - The APIC ids provided by ACPI are not required to be the same as the
         initial APIC id which can be retrieved by CPUID. The APIC ids provided
         by ACPI are those which are written by the BIOS into the APIC. The
         initial id is set by hardware and can not be changed. The hardware
         provided ids contain the real hardware package information.
      
         Especially AMD sets the effective APIC id different from the hardware id
         as they need to reserve space for the IOAPIC ids starting at id 0.
      
         As a consequence those machines trigger the currently active firmware
         bug printouts in dmesg. These printouts are obviously wrong.
      
       - Virtual machines have their own interesting ways of enumerating APICs and
         packages, which are not reliably covered by the current implementation.
      
      The sizing of the mapping array has been tweaked to be generously large to
      handle systems which provide a wrong core count when HT is disabled, so the
      whole magic which checks for space in the physical hotplug case is not
      needed anymore.
      
      Simplify the whole machinery and do the mapping when the CPU starts and the
      CPUID derived physical package information is available. This solves the
      observed problems on AMD machines and works for the virtualization issues
      as well.
      
      Remove the extra call from the XEN cpu bringup code as it is no longer
      required.
      
      Fixes: d49597fd ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
      Reported-and-tested-by: Borislav Petkov <bp@suse.de>
      Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: M. Vefa Bicakci <m.v.b@runbox.com>
      Cc: xen-devel <xen-devel@lists.xen.org>
      Cc: Charles (Chas) Williams <ciwillia@brocade.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      9d85eb91
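      A rough sketch of the simplified scheme (map the package when the CPU
      comes up and its CPUID-derived topology is known); the helper and the
      lookup table are illustrative, not the actual smpboot.c code:

          /* Illustrative only: assign a logical package id at CPU bringup. */
          static void sketch_map_logical_package(unsigned int cpu)
          {
                  struct cpuinfo_x86 *c = &cpu_data(cpu);
                  unsigned int phys = c->phys_proc_id;      /* from CPUID */

                  if (logical_of_phys[phys] < 0)            /* hypothetical table */
                          logical_of_phys[phys] = nr_logical_packages++;

                  c->logical_proc_id = logical_of_phys[phys];
          }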
    • x86/ldt: use vfree_atomic() to free ldt entries · 8d5341a6
      Andrey Ryabinin committed
      vfree() is going to use a sleeping lock.  free_ldt_struct() may be called
      with preemption disabled, therefore we must use vfree_atomic() here.
      
      E.g. call trace:
      	vfree()
      	free_ldt_struct()
      	destroy_context_ldt()
      	__mmdrop()
      	finish_task_switch()
      	schedule_tail()
      	ret_from_fork()
      
      Link: http://lkml.kernel.org/r/1479474236-4139-7-git-send-email-hch@lst.de
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Jisheng Zhang <jszhang@marvell.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: John Dias <joaodias@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8d5341a6
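      A simplified sketch of the resulting pattern; the real free_ldt_struct()
      also invokes the paravirt free hook, which is omitted here:

          /* Large LDTs are vmalloc'ed, so freeing them from a context with
           * preemption disabled must use the deferred vfree_atomic(). */
          static void free_ldt_struct(struct ldt_struct *ldt)
          {
                  if (!ldt)
                          return;

                  if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
                          vfree_atomic(ldt->entries);
                  else
                          free_page((unsigned long)ldt->entries);
                  kfree(ldt);
          }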
    • mm: remove x86-only restriction of movable_node · 39fa104d
      Reza Arbab committed
      In commit c5320926 ("mem-hotplug: introduce movable_node boot
      option"), the memblock allocation direction is changed to bottom-up and
      then back to top-down like this:
      
      1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node().
      2. memblock_set_bottom_up(false), called by x86's numa_init().
      
      Even though (1) occurs in generic mm code, it is wrapped by #ifdef
      CONFIG_MOVABLE_NODE, which depends on X86_64.
      
      This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches,
      things will be unbalanced.  (1) will happen for them, but (2) will not.
      
      This toggle was added in the first place because x86 has a delay between
      adding memblocks and marking them as hotpluggable.  Since other arches
      do this marking either immediately or not at all, they do not require
      the bottom-up toggle.
      
      So, resolve things by moving (1) from cmdline_parse_movable_node() to
      x86's setup_arch(), immediately after the movable_node parameter has
      been parsed.
      
      Link: http://lkml.kernel.org/r/1479160961-25840-3-git-send-email-arbab@linux.vnet.ibm.com
      Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alistair Popple <apopple@au1.ibm.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      39fa104d
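      In outline, the toggle ends up next to the code that needs it; a hedged
      sketch of the x86 side, with the placement inside setup_arch() abbreviated:

          /* arch/x86/kernel/setup.c, abbreviated sketch */
          void __init setup_arch(char **cmdline_p)
          {
                  /* ... early setup ... */
                  parse_early_param();

                  /*
                   * x86 marks hotpluggable memblocks only later, in numa_init(),
                   * so allocate bottom-up until that marking has happened.
                   */
                  if (movable_node_is_enabled())
                          memblock_set_bottom_up(true);

                  /* ... remainder of setup_arch() ... */
          }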
  7. 11 December 2016 (3 commits)
    • x86/paravirt: Fix bool return type for PVOP_CALL() · 11f254db
      Peter Zijlstra committed
      Commit:
      
        3cded417 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()")
      
      introduced a paravirt op with bool return type [*]
      
      It turns out that the PVOP_CALL*() macros miscompile when rettype is
      bool. Code that looked like:
      
         83 ef 01                sub    $0x1,%edi
         ff 15 32 a0 d8 00       callq  *0xd8a032(%rip)        # ffffffff81e28120 <pv_lock_ops+0x20>
         84 c0                   test   %al,%al
      
      ended up looking like so after PVOP_CALL1() was applied:
      
         83 ef 01                sub    $0x1,%edi
         48 63 ff                movslq %edi,%rdi
         ff 14 25 20 81 e2 81    callq  *0xffffffff81e28120
         48 85 c0                test   %rax,%rax
      
      Note how it tests the whole of %rax, even though a typical bool return
      function only sets %al, like:
      
        0f 95 c0                setne  %al
        c3                      retq
      
      This is because ____PVOP_CALL() does:
      
      		__ret = (rettype)__eax;
      
      and while regular integer type casts truncate the result, a cast to
      bool tests for any !0 value. Fix this by explicitly truncating to
      sizeof(rettype) before casting.
      
      [*] The actual bug should've been exposed in commit:
            446f3dc8 ("locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests")
          but that didn't properly implement the paravirt call.
      Reported-by: kernel test robot <xiaolong.ye@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 3cded417 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()")
      Link: http://lkml.kernel.org/r/20161208154349.346057680@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      11f254db
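      The underlying C behaviour can be reproduced in isolation: a cast to bool
      checks the whole value, while truncating to the callee's return width
      first discards the stale upper bits. A standalone illustration (the real
      fix does this generically via sizeof(rettype) inside the macro):

          #include <stdbool.h>
          #include <stdio.h>

          int main(void)
          {
                  /* Pretend %rax holds stale garbage in the upper bits, 0 in %al. */
                  unsigned long rax = 0xffffffffffffff00UL;

                  bool broken = (bool)rax;                /* any set bit -> true  */
                  bool fixed  = (bool)(unsigned char)rax; /* look at %al -> false */

                  printf("broken=%d fixed=%d\n", broken, fixed);
                  return 0;
          }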
    • x86/paravirt: Fix native_patch() · 45dbea5f
      Peter Zijlstra committed
      While chasing a regression I noticed we potentially patch the wrong
      code in native_patch().
      
      If we do not select the native code sequence, we must use the default
      patcher, not fall through to the next switch case.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel test robot <xiaolong.ye@intel.com>
      Fixes: 3cded417 ("x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()")
      Link: http://lkml.kernel.org/r/20161208154349.270616999@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      45dbea5f
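      The shape of the fix, shown as a fragment of the switch in native_patch();
      this is a sketch reconstructed from the description above, not the
      verbatim patch:

          case PARAVIRT_PATCH(pv_lock_ops.vcpu_is_preempted):
                  if (pv_is_native_vcpu_is_preempted()) {
                          start = start_pv_lock_ops_vcpu_is_preempted;
                          end   = end_pv_lock_ops_vcpu_is_preempted;
                          goto patch_site;
                  }
                  goto patch_default;   /* previously fell through to the next case */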
    • perf/x86: Fix exclusion of BTS and LBR for Goldmont · b0c1ef52
      Andi Kleen committed
      An earlier patch allowed enabling PT and LBR at the same
      time on Goldmont. However, it also allowed enabling BTS and LBR
      at the same time, which is still not supported. Fix this by
      bypassing the check only for PT.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: alexander.shishkin@intel.com
      Cc: kan.liang@intel.com
      Cc: <stable@vger.kernel.org>
      Fixes: ccbebba4 ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it")
      Link: http://lkml.kernel.org/r/20161209001417.4713-1-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b0c1ef52
  8. 10 December 2016 (7 commits)
    • x86/ldt: Make all size computations unsigned · 990e9dc3
      Thomas Gleixner committed
      ldt->size can never be negative. The helper functions take 'unsigned int'
      arguments which are assigned from ldt->size. The related user space
      user_desc struct member entry_number is unsigned as well.
      
      But ldt->size itself and a few local variables which are related to
      ldt->size are type 'int' which makes no sense whatsoever and results in
      typecasts which make the eyes bleed.
      
      Clean it up and convert everything which is related to ldt->size to
      unsigned int.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      990e9dc3
    • x86/ldt: Make a size argument unsigned · 296dc580
      Dan Carpenter committed
      My static checker complains that we put an upper bound on the "size"
      argument but not a lower bound.  The checker is not smart enough to know
      the possible ranges of "old_mm->context.ldt->size" from
      init_new_context_ldt() so it thinks maybe it could be negative.
      
      Let's make it unsigned to silence the warning and future proof the code
      a bit.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: kernel-janitors@vger.kernel.org
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161208105602.GA11382@elgon.mountain
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      296dc580
    • x86: Remove empty idle.h header · 34bc3560
      Thomas Gleixner committed
      One include less is always a good thing(tm). Good riddance.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      34bc3560
    • x86/amd: Simplify AMD E400 aware idle routine · 07c94a38
      Borislav Petkov committed
      Reorganize the E400 detection now that we have everything in place:
      switch the CPUs to broadcast mode after the LAPIC has been initialized
      and remove the facilities that were used previously on the idle path.
      
      Unfortunately static_cpu_has_bug() cannot be used in the E400 idle routine
      because alternatives have already been applied by the time the actual
      detection happens, so the static switching does not take effect and the
      test will stay false. Use boot_cpu_has_bug() instead, which is still a
      definite improvement over the RDMSR and the cpumask handling.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-5-bp@alien8.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      07c94a38
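      A simplified sketch of the idle path after this reorganization: the
      per-entry MSR read is gone and only a cheap flag test remains (interrupt
      handling around the broadcast exit is omitted here):

          static void amd_e400_idle(void)
          {
                  /* Detection already ran at boot; boot_cpu_has_bug() is a
                   * plain bitmap test, unlike the old per-entry RDMSR. */
                  if (!boot_cpu_has_bug(X86_BUG_AMD_APIC_C1E)) {
                          default_idle();
                          return;
                  }

                  tick_broadcast_enter();   /* LAPIC timer stops in C1E */
                  default_idle();
                  tick_broadcast_exit();
          }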
    • x86/amd: Check for the C1E bug post ACPI subsystem init · e7ff3a47
      Thomas Gleixner committed
      AMD CPUs affected by the E400 erratum suffer from the issue that the
      local APIC timer stops when the CPU goes into C1E. Unfortunately there
      is no way to detect the affected CPUs on early boot. It's only possible
      to determine the range of possibly affected CPUs from the family/model
      range.
      
      The actual decision whether to enter C1E and thus cause the bug is done
      by the firmware and we need to detect that case late, after ACPI has
      been initialized.
      
      The current solution is to check in the idle routine whether the CPU is
      affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the
      K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is
      affected and the system is switched into forced broadcast mode.
      
      This is inefficient: on non-affected CPUs every entry to idle does the
      extra RDMSR.
      
      After doing some research it turns out that the bits are visible on the
      boot CPU right after the ACPI subsystem is initialized in the early
      boot process. So instead of polling for the bits in the idle loop, add
      a detection function after acpi_subsystem_init() and check for the MSR
      bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and
      the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it
      will stop in C1E state as well.
      
      The switch to broadcast mode cannot be done at this point because the
      boot CPU still uses HPET as a clockevent device and the local APIC timer
      is not yet calibrated and installed. The switch to broadcast mode on the
      affected CPUs needs to be done when the local APIC timer is actually set
      up.
      
      This allows the amd_e400_idle() function to be cleaned up in the next step.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e7ff3a47
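      A hedged sketch of the late detection step described above; the function
      name is hypothetical and error paths are omitted, but the MSR and the
      bug/feature bits are the ones named in the text:

          /* Hypothetical name; meant to run once after acpi_subsystem_init(). */
          static void sketch_amd_e400_c1e_detect(void)
          {
                  u32 lo, hi;

                  if (!boot_cpu_has_bug(X86_BUG_AMD_E400))
                          return;

                  rdmsr(MSR_K8_INT_PENDING_MSG, lo, hi);
                  if (!(lo & K8_INTP_C1E_ACTIVE_MASK))
                          return;

                  /* Platform enabled C1E: flag it, and distrust a TSC that halts. */
                  set_cpu_bug(&boot_cpu_data, X86_BUG_AMD_APIC_C1E);
                  if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
                          mark_tsc_unstable("TSC halts in C1E");
          }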
    • x86/bugs: Separate AMD E400 erratum and C1E bug · 3344ed30
      Thomas Gleixner committed
      The workaround for the AMD Erratum E400 (Local APIC timer stops in C1E
      state) is a two step process:
      
       - Selection of the E400 aware idle routine
      
       - Detection whether the platform is affected
      
      The idle routine selection happens for possibly affected CPUs depending on
      family/model/stepping information. This range of CPUs is not necessarily
      affected, as the decision whether to enable the C1E feature is made by the
      firmware. Unfortunately there is no way to query this at early boot.
      
      The current implementation polls an MSR in the E400 aware idle routine to
      detect whether the CPU is affected. This is inefficient on non-affected
      CPUs because every idle entry has to do the MSR read.
      
      There is a better way to detect this before going idle for the first time,
      which requires separating the bug flags:
      
        X86_BUG_AMD_E400      - Selects the E400 aware idle routine and
                                enables the detection

        X86_BUG_AMD_APIC_C1E  - Set when the platform is affected by E400
      
      Replace the current X86_BUG_AMD_APIC_C1E usage by the new X86_BUG_AMD_E400
      bug bit to select the idle routine which currently does an unconditional
      detection poll. X86_BUG_AMD_APIC_C1E is going to be used in later patches
      to remove the MSR polling and simplify the handling of this misfeature.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-3-bp@alien8.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      3344ed30
    • x86/cpufeature: Provide helper to set bugs bits · a588b983
      Borislav Petkov committed
      Will be used in a later patch to set bug bits for bugs which need late
      detection.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-2-bp@alien8.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a588b983
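      Bug bits live in the same capability bitmap as feature bits, so the helper
      can be a thin alias; a guess at its shape based only on the description
      here, mirroring the existing clear_cpu_bug():

          /* Sketch: setting a bug bit is just set_cpu_cap() under a clearer name. */
          #define set_cpu_bug(c, bit)     set_cpu_cap(c, (bit))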
  9. 09 December 2016 (5 commits)