1. 06 Feb 2018 (3 commits)
    • membarrier/x86: Provide core serializing command · 10bcc80e
      Committed by Mathieu Desnoyers
      There are two places where core serialization is needed by membarrier:
      
      1) When returning from the membarrier IPI,
      2) After scheduler updates curr to a thread with a different mm, before
         going back to user-space, since the curr->mm is used by membarrier to
         check whether it needs to send an IPI to that CPU.
      
      x86-32 uses IRET as return from interrupt, and both IRET and SYSEXIT to go
      back to user-space. The IRET instruction is core serializing, but
      SYSEXIT is not.
      
      x86-64 uses IRET as return from interrupt, which takes care of the IPI.
      However, it can return to user-space through either SYSRETL (compat
      code), SYSRETQ, or IRET. Given that SYSRET{L,Q} are not core serializing,
      we rely instead on the write_cr3() performed by switch_mm() to provide
      core serialization after changing the current mm, and deal with the
      special case of kthread -> uthread (temporarily keeping the current mm
      in active_mm) by adding a sync_core() in that specific case.
      
      Use the new sync_core_before_usermode() to guarantee this; a hedged
      sketch of the kthread -> uthread case follows this entry.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-10-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      10bcc80e
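
      A minimal sketch of the kthread -> uthread case described above,
      assuming the check sits in finish_task_switch(); the gating and helper
      placement are illustrative, not the verbatim patch:

          /*
           * finish_task_switch() sketch: prev was a kernel thread that
           * lazily borrowed this mm as its active_mm.
           */
          if (mm) {
                  /*
                   * No switch_mm()/write_cr3() ran on the way back to a
                   * user thread of this mm, so issue core serialization
                   * explicitly before the return to user-space.
                   */
                  sync_core_before_usermode();
                  mmdrop(mm);
          }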
    • locking/x86: Implement sync_core_before_usermode() · ac1ab12a
      Committed by Mathieu Desnoyers
      Ensure that a core serializing instruction is issued before returning to
      user-mode. x86 implements return to user-space through sysexit, sysretl,
      and sysretq, which are not core serializing. A hedged sketch of such a
      helper follows this entry.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-8-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ac1ab12a
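
      A hedged sketch of what such a helper can look like on x86; the guards
      shown are plausible, but the merged version may differ:

          static inline void sync_core_before_usermode(void)
          {
                  /*
                   * Interrupts and NMIs return through IRET, which is
                   * already core serializing; only the sysexit/sysretl/
                   * sysretq return paths need explicit help.
                   */
                  if (in_irq() || in_nmi())
                          return;
                  sync_core();    /* e.g. CPUID, a serializing instruction */
          }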
    • membarrier: Document scheduler barrier requirements · 306e0604
      Committed by Mathieu Desnoyers
      Document the membarrier requirement on having a full memory barrier in
      __schedule() after coming from user-space, before storing to rq->curr.
      It is provided by smp_mb__after_spinlock() in __schedule(); the
      placement is sketched after this entry.
      
      Document that membarrier requires a full barrier on transition from
      kernel thread to userspace thread. We currently have an implicit barrier
      from atomic_dec_and_test() in mmdrop() that ensures this.
      
      The x86 switch_mm_irqs_off() full barrier is currently provided by many
      cpumask update operations as well as write_cr3(). Document that
      write_cr3() provides this barrier.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: David Sehr <sehr@google.com>
      Cc: Greg Hackmann <ghackmann@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180129202020.8515-4-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      306e0604
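
      The documented placement, pictured as a sketch (elided to the relevant
      lines of __schedule()):

          static void __sched notrace __schedule(bool preempt)
          {
                  /* ... */
                  rq_lock(rq, &rf);
                  /*
                   * Full barrier needed by membarrier: orders this task's
                   * prior user-space memory accesses before the rq->curr
                   * update below.
                   */
                  smp_mb__after_spinlock();
                  /* ... pick next ... */
                  rq->curr = next;
                  /* ... */
          }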
  2. 28 Jan 2018 (4 commits)
  3. 27 Jan 2018 (1 commit)
  4. 26 Jan 2018 (13 commits)
  5. 25 Jan 2018 (5 commits)
    • perf/x86: Fix perf,x86,cpuhp deadlock · efe951d3
      Committed by Peter Zijlstra
      More lockdep gifts, a 5-way lockup race:
      
      	perf_event_create_kernel_counter()
      	  perf_event_alloc()
      	    perf_try_init_event()
      	      x86_pmu_event_init()
      		__x86_pmu_event_init()
      		  x86_reserve_hardware()
       #0		    mutex_lock(&pmc_reserve_mutex);
      		    reserve_ds_buffer()
       #1		      get_online_cpus()
      
      	perf_event_release_kernel()
      	  _free_event()
      	    hw_perf_event_destroy()
      	      x86_release_hardware()
       #0		mutex_lock(&pmc_reserve_mutex)
      		release_ds_buffer()
       #1		  get_online_cpus()
      
       #1	do_cpu_up()
      	  perf_event_init_cpu()
       #2	    mutex_lock(&pmus_lock)
       #3	    mutex_lock(&ctx->mutex)
      
      	sys_perf_event_open()
      	  mutex_lock_double()
       #3	    mutex_lock(ctx->mutex)
       #4	    mutex_lock_nested(ctx->mutex, 1);
      
      	perf_try_init_event()
       #4	  mutex_lock_nested(ctx->mutex, 1)
      	  x86_pmu_event_init()
      	    intel_pmu_hw_config()
      	      x86_add_exclusive()
       #0		mutex_lock(&pmc_reserve_mutex)
      
      Fix it by using ordering constructs instead of locking; a sketch of the
      idea follows this entry.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      efe951d3
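
      A sketch of the idea; alloc_ds_buffer() is a hypothetical helper
      standing in for the per-CPU allocation, and the real patch reworks the
      reserve/release paths in more detail:

          /*
           * Allocate for every possible CPU up front, so the reserve path
           * no longer needs get_online_cpus() while holding
           * pmc_reserve_mutex; that removes the #0 -> #1 lock ordering
           * link in the trace above.
           */
          static int reserve_ds_buffers(void)
          {
                  int cpu;

                  for_each_possible_cpu(cpu) {      /* possible, not online */
                          if (alloc_ds_buffer(cpu)) /* hypothetical helper */
                                  goto fail;
                  }
                  return 0;
          fail:
                  release_ds_buffers();
                  return -ENOMEM;
          }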
    • KVM: VMX: Make indirect call speculation safe · c940a3fb
      Committed by Peter Zijlstra
      Replace the indirect call with CALL_NOSPEC; a generic before/after
      sketch, covering this and the following emulator patch, follows this
      entry.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: rga@amazon.de
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: https://lkml.kernel.org/r/20180125095843.645776917@infradead.org
      c940a3fb
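
      For this and the following emulator patch, the general before/after
      shape of such a conversion, hedged and with constraints simplified
      (the real hunks also carry ASM_CALL_CONSTRAINT and extra operands):

          #include <asm/nospec-branch.h>  /* CALL_NOSPEC, THUNK_TARGET */

          static void call_fn_nospec(void (*fn)(void))
          {
                  /* before: a plain indirect call, a trainable target:
                   *   asm volatile("call *%[fn]" : : [fn] "r" (fn));
                   */

                  /* after: routed through the retpoline thunk when the
                   * mitigation is enabled */
                  asm volatile(CALL_NOSPEC : : THUNK_TARGET(fn) : "memory");
          }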
    • KVM: x86: Make indirect calls in emulator speculation safe · 1a29b5b7
      Committed by Peter Zijlstra
      Replace the indirect calls with CALL_NOSPEC.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: rga@amazon.de
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: https://lkml.kernel.org/r/20180125095843.595615683@infradead.org
      1a29b5b7
    • x86: Remove unused IOMMU_STRESS Kconfig · 782bf20c
      Committed by Corentin Labbe
      The last use of IOMMU_STRESS was removed in commit 29b68415 ("x86:
      amd_iommu: move to drivers/iommu/"). Six years later, the Kconfig entry
      is definitely due for removal.
      Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Joerg Roedel <jroedel@suse.de>
      Link: https://lkml.kernel.org/r/1516825754-28415-1-git-send-email-clabbe@baylibre.com
      782bf20c
    • x86/hyperv: Stop suppressing X86_FEATURE_PCID · 617ab45c
      Committed by Vitaly Kuznetsov
      When hypercall-based TLB flush was enabled for Hyper-V guests, the PCID
      feature was deliberately suppressed as a precaution: back then PCID was
      never exposed to Hyper-V guests and it wasn't clear what would happen if
      it someday became available. That day came, and the PCID/INVPCID
      features are now exposed on certain Hyper-V hosts.
      
      From the TLFS (as of 5.0b) it is unclear how the TLB flush hypercalls
      combine with PCID. In particular, PCID usage is per-CPU: the same mm
      gets different CR3 values on different CPUs. If the hypercall did exact
      matching, this would fail. However, this is not the case. David Zhang
      explains:
      
       "In practice, the AddressSpace argument is ignored on any VM that supports
        PCIDs.
      
        Architecturally, the AddressSpace argument must match the CR3 with PCID
        bits stripped out (i.e., the low 12 bits of AddressSpace should be 0 in
        long mode). The flush hypercalls flush all PCIDs for the specified
        AddressSpace."
      
      With this, PCID can be enabled; the CR3 masking rule quoted above is
      sketched after this entry.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Zhang <dazhan@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Aditya Bhandari <adityabh@microsoft.com>
      Link: https://lkml.kernel.org/r/20180124103629.29980-1-vkuznets@redhat.com
      617ab45c
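
      What the quoted rule implies when building the hypercall argument,
      sketched; the mask name is illustrative and "flush" stands for the
      hypercall input page used by the Hyper-V TLB flush path:

          #define HV_CR3_PCID_MASK  0xFFFull  /* illustrative name */

          /*
           * AddressSpace must be CR3 with the PCID bits (the low 12 bits
           * in long mode) stripped out; the hypercall then flushes all
           * PCIDs for that address space.
           */
          flush->address_space = __read_cr3() & ~HV_CR3_PCID_MASK;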
  6. 24 Jan 2018 (7 commits)
    • x86/centaur: Mark TSC invariant · fe6daab1
      Committed by davidwang
      Centaur CPUs have a constant frequency TSC, and that TSC does not stop
      in C-states. But because the corresponding TSC feature flags are not
      set for those CPUs, the TSC is treated as not having constant frequency
      and is assumed to stop in C-states, which makes it an unreliable and
      unusable clock source.
      
      Setting those flags tells the kernel that the TSC is usable, so it will
      select it over the HPET. The effect is that reading time stamps (from
      kernel or user space) becomes faster and more efficient; the flag setup
      is sketched after this entry.
      Signed-off-by: davidwang <davidwang@zhaoxin.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: qiyuanwang@zhaoxin.com
      Cc: linux-pm@vger.kernel.org
      Cc: brucechang@via-alliance.com
      Cc: cooperyan@zhaoxin.com
      Cc: benjaminpan@viatech.com
      Link: https://lkml.kernel.org/r/1516616057-5158-1-git-send-email-davidwang@zhaoxin.com
      fe6daab1
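
      The gist of the change, sketched against the Centaur init path; the
      family/model gate shown is an assumption:

          static void early_init_centaur(struct cpuinfo_x86 *c)
          {
                  /* ... existing init ... */

                  if (c->x86 == 6 && c->x86_model >= 0xf) { /* assumed gate */
                          set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
                          set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
                  }
          }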
    • x86/microcode: Fix again accessing initrd after having been freed · 1d080f09
      Committed by Borislav Petkov
      Commit 24c25032 ("x86/microcode: Do not access the initrd after it has
      been freed") fixed attempts to access initrd from the microcode loader
      after it has been freed. However, a similar KASAN warning was reported
      (stack trace edited):
      
        smpboot: Booting Node 0 Processor 1 APIC 0x11
        ==================================================================
        BUG: KASAN: use-after-free in find_cpio_data+0x9b5/0xa50
        Read of size 1 at addr ffff880035ffd000 by task swapper/1/0
      
        CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.8-slack #7
        Hardware name: System manufacturer System Product Name/A88X-PLUS, BIOS 3003 03/10/2016
        Call Trace:
         dump_stack
         print_address_description
         kasan_report
         ? find_cpio_data
         __asan_report_load1_noabort
         find_cpio_data
         find_microcode_in_initrd
         __load_ucode_amd
         load_ucode_amd_ap
            load_ucode_ap
      
      After some investigation, it turned out that a merge had been resolved
      using the wrong side, picking up the previous state from before the
      24c25032 fix. Therefore the Fixes tag below contains a merge
      commit.
      
      Revert the mismerge by catching the save_microcode_in_initrd_amd()
      retval and thus letting the function exit through the last return
      statement, so that initrd_gone can be set to true (see the sketch after
      this entry).
      
      Fixes: f26483ea ("Merge branch 'x86/urgent' into x86/microcode, to resolve conflicts")
      Reported-by: <higuita@gmx.net>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=198295
      Link: https://lkml.kernel.org/r/20180123104133.918-2-bp@alien8.de
      1d080f09
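
      The shape of the restored code path, sketched with the other vendor
      cases elided:

          static int __init save_microcode_in_initrd(void)
          {
                  struct cpuinfo_x86 *c = &boot_cpu_data;
                  int ret = -EINVAL;

                  switch (c->x86_vendor) {
                  case X86_VENDOR_AMD:
                          if (c->x86 >= 0x10)
                                  ret = save_microcode_in_initrd_amd(cpuid_eax(1));
                          break;
                  /* ... other vendors elided ... */
                  }

                  initrd_gone = true;  /* reached via the final return */

                  return ret;
          }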
    • x86/microcode/intel: Extend BDW late-loading further with LLC size check · 7e702d17
      Committed by Jia Zhang
      Commit b94b7373 ("x86/microcode/intel: Extend BDW late-loading with a
      revision check") reduced the impact of erratum BDF90 for Broadwell model
      79.
      
      The impact can be reduced further by checking the size of the last
      level cache portion per core; the check is sketched after this entry.
      
      Tony: "The erratum says the problem only occurs on the large-cache SKUs.
      So we only need to avoid the update if we are on a big cache SKU that is
      also running old microcode."
      
      For more details, see erratum BDF90 in document #334165 (Intel Xeon
      Processor E7-8800/4800 v4 Product Family Specification Update) from
      September 2017.
      
      Fixes: b94b7373 ("x86/microcode/intel: Extend BDW late-loading with a revision check")
      Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1516321542-31161-1-git-send-email-zhang.jia@linux.alibaba.com
      7e702d17
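
      Roughly, the added check; a hedged sketch where is_bdw_model_79() and
      has_old_microcode() are hypothetical stand-ins for the existing
      model/revision tests, and 2621440 is 2.5MB per the erratum:

          static inline u64 llc_size_per_core(struct cpuinfo_x86 *c)
          {
                  u64 llc_size = c->x86_cache_size * 1024ULL;

                  do_div(llc_size, c->x86_max_cores); /* LLC share per core */
                  return llc_size;
          }

          /* in the late-loading blacklist check: only big-cache SKUs that
           * also run old microcode are affected by erratum BDF90 */
          if (is_bdw_model_79(c) && has_old_microcode(c) &&
              llc_size_per_core(c) > 2621440)
                  return true;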
    • perf/x86/amd/power: Do not load AMD power module on !AMD platforms · 40d4071c
      Committed by Xiao Liang
      The AMD power module can be loaded on non-AMD platforms, but unloading
      it then fails with the following Oops:
      
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: __list_del_entry_valid+0x29/0x90
       Call Trace:
        perf_pmu_unregister+0x25/0xf0
        amd_power_pmu_exit+0x1c/0xd23 [power]
        SyS_delete_module+0x1a8/0x2b0
        ? exit_to_usermode_loop+0x8f/0xb0
        entry_SYSCALL_64_fastpath+0x20/0x83
      
      Return -ENODEV instead of 0 from the module init function if the CPU
      does not match (see the sketch after this entry).
      
      Fixes: c7ab62bf ("perf/x86/amd/power: Add AMD accumulated power reporting mechanism")
      Signed-off-by: Xiao Liang <xiliang@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180122061252.6394-1-xiliang@redhat.com
      40d4071c
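
      The fix amounts to failing module init when the CPU does not match,
      sketched with the probe details elided (cpu_match is assumed to be the
      module's x86_cpu_id table):

          static int __init amd_power_pmu_init(void)
          {
                  if (!x86_match_cpu(cpu_match))  /* not a matching AMD CPU */
                          return -ENODEV;         /* was: return 0 */

                  /* ... perf_pmu_register() etc. ... */
                  return 0;
          }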
    • x86/retpoline: Remove the esp/rsp thunk · 1df37383
      Committed by Waiman Long
      It doesn't make sense to have an indirect-call thunk for esp/rsp, as
      retpoline code won't work correctly with the stack pointer register.
      Removing it will help compiler writers catch errors in case such a
      thunk call is ever emitted incorrectly.
      
      Fixes: 76b04384 ("x86/retpoline: Add initial retpoline support")
      Suggested-by: Jeff Law <law@redhat.com>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1516658974-27852-1-git-send-email-longman@redhat.com
      1df37383
    • ftrace, orc, x86: Handle ftrace dynamically allocated trampolines · 6be7fa3c
      Committed by Steven Rostedt (VMware)
      The function tracer can create a dynamically allocated trampoline that
      is called by the mcount or fentry hook in order to invoke the
      registered function callback. The problem is that the ORC unwinder will
      bail if it encounters one of these trampolines, which breaks the stack
      trace of function callbacks, including the stack tracer and setting the
      stack trace for individual functions.
      
      Since these dynamic trampolines are basically copies of the static
      ftrace trampolines defined in ftrace_*.S, we do not need to create new
      ORC entries for them. Finding the return address on the stack works
      identically to the functions that were copied to create the dynamic
      trampolines. When encountering an ftrace dynamic trampoline, we can
      just use the ORC entry of the static ftrace function that was copied
      for that trampoline (sketched after this entry).
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      6be7fa3c
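
      One way to picture the lookup; a hedged sketch whose helper names
      approximate those added around this change:

          static struct orc_entry *orc_ftrace_find(unsigned long ip)
          {
                  struct ftrace_ops *ops = ftrace_ops_trampoline(ip);
                  unsigned long caller;

                  if (!ops)
                          return NULL;

                  /*
                   * The dynamic trampoline is a copy of a static one, so
                   * reuse the ORC data recorded for the static code.
                   */
                  if (ops->flags & FTRACE_OPS_FL_SAVE_REGS)
                          caller = (unsigned long)ftrace_regs_call;
                  else
                          caller = (unsigned long)ftrace_call;

                  return orc_find(caller);
          }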
    • x86/ftrace: Fix ORC unwinding from ftrace handlers · e2ac83d7
      Committed by Josh Poimboeuf
      Steven Rostedt discovered that the ftrace stack tracer is broken when
      it's used with the ORC unwinder.  The problem is that objtool is
      instructed by the Makefile to ignore the ftrace_64.S code, so it doesn't
      generate any ORC data for it.
      
      Fix it by making the asm code objtool-friendly:
      
      - Objtool doesn't like the fact that save_mcount_regs pushes RBP at the
        beginning, but it's never restored (directly, at least).  So just skip
        the original RBP push, which is only needed for frame pointers anyway.
      
      - Annotate some functions as normal callable functions with
        ENTRY/ENDPROC.
      
      - Add an empty unwind hint to return_to_handler().  The return address
        isn't on the stack, so there's nothing ORC can do there.  It will just
        punt in the unlikely case it tries to unwind from that code.
      
      With all that fixed, remove the OBJECT_FILES_NON_STANDARD Makefile
      annotation so objtool can read the file.
      
      Link: http://lkml.kernel.org/r/20180123040746.ih4ep3tk4pbjvg7c@treble
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      e2ac83d7
  7. 21 Jan 2018 (1 commit)
  8. 20 Jan 2018 (2 commits)
  9. 19 Jan 2018 (4 commits)