1. 08 Aug, 2018 1 commit
    • P
      x86/paravirt: Fix spectre-v2 mitigations for paravirt guests · 5800dc5c
      Peter Zijlstra committed
      Nadav reported that on guests we're failing to rewrite the indirect
      calls to CALLEE_SAVE paravirt functions. In particular the
      pv_queued_spin_unlock() call is left unpatched and that is all over the
      place. This obviously wrecks Spectre-v2 mitigation (for paravirt
      guests) which relies on not actually having indirect calls around.
      
      The reason is an incorrect clobber test in paravirt_patch_call(); this
      function rewrites an indirect call with a direct call to the _SAME_
      function, there is no possible way the clobbers can be different
      because of this.
      
      Therefore remove this clobber check. Also put WARNs on the other patch
      failure case (not enough room for the instruction) which I've not seen
      trigger in my (limited) testing.
      
      Three live kernel image disassemblies for lock_sock_nested (as a small
      function that illustrates the problem nicely). PRE is the current
      situation for guests, POST is with this patch applied and NATIVE is with
      or without the patch for !guests.
      
      PRE:
      
      (gdb) disassemble lock_sock_nested
      Dump of assembler code for function lock_sock_nested:
         0xffffffff817be970 <+0>:     push   %rbp
         0xffffffff817be971 <+1>:     mov    %rdi,%rbp
         0xffffffff817be974 <+4>:     push   %rbx
         0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
         0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
         0xffffffff817be981 <+17>:    mov    %rbx,%rdi
         0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
         0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
         0xffffffff817be98f <+31>:    test   %eax,%eax
         0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
         0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
         0xffffffff817be99d <+45>:    mov    %rbx,%rdi
         0xffffffff817be9a0 <+48>:    callq  *0xffffffff822299e8
         0xffffffff817be9a7 <+55>:    pop    %rbx
         0xffffffff817be9a8 <+56>:    pop    %rbp
         0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
         0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
         0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
         0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
         0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
         0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
      End of assembler dump.
      
      POST:
      
      (gdb) disassemble lock_sock_nested
      Dump of assembler code for function lock_sock_nested:
         0xffffffff817be970 <+0>:     push   %rbp
         0xffffffff817be971 <+1>:     mov    %rdi,%rbp
         0xffffffff817be974 <+4>:     push   %rbx
         0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
         0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
         0xffffffff817be981 <+17>:    mov    %rbx,%rdi
         0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
         0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
         0xffffffff817be98f <+31>:    test   %eax,%eax
         0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
         0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
         0xffffffff817be99d <+45>:    mov    %rbx,%rdi
         0xffffffff817be9a0 <+48>:    callq  0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
         0xffffffff817be9a5 <+53>:    xchg   %ax,%ax
         0xffffffff817be9a7 <+55>:    pop    %rbx
         0xffffffff817be9a8 <+56>:    pop    %rbp
         0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
         0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
         0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063aa0 <__local_bh_enable_ip>
         0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
         0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
         0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
      End of assembler dump.
      
      NATIVE:
      
      (gdb) disassemble lock_sock_nested
      Dump of assembler code for function lock_sock_nested:
         0xffffffff817be970 <+0>:     push   %rbp
         0xffffffff817be971 <+1>:     mov    %rdi,%rbp
         0xffffffff817be974 <+4>:     push   %rbx
         0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
         0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
         0xffffffff817be981 <+17>:    mov    %rbx,%rdi
         0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
         0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
         0xffffffff817be98f <+31>:    test   %eax,%eax
         0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
         0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
         0xffffffff817be99d <+45>:    mov    %rbx,%rdi
         0xffffffff817be9a0 <+48>:    movb   $0x0,(%rdi)
         0xffffffff817be9a3 <+51>:    nopl   0x0(%rax)
         0xffffffff817be9a7 <+55>:    pop    %rbx
         0xffffffff817be9a8 <+56>:    pop    %rbp
         0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
         0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
         0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
         0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
         0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
         0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
      End of assembler dump.
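
The rewrite visible at offset <+48> in the POST dump can be sketched in C. This is a minimal illustration, not the kernel's paravirt_patch_call(): the helper name, its return convention and the single-byte NOP padding are assumptions made here (the POST dump shows a two-byte NOP instead). It encodes an x86 `call rel32` over an existing call site and refuses to patch when the site is too small, which is the failure case the changelog now WARNs about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_REL32_OPCODE 0xE8u /* x86 "call rel32" opcode */
#define CALL_REL32_LEN    5u    /* opcode byte + 4-byte displacement */
#define NOP_OPCODE        0x90u /* single-byte NOP used as padding */

/*
 * Hypothetical helper for illustration only.
 * Overwrites the `len` bytes at `insn` (which live at virtual address
 * `site`) with a direct call to `target`, padding the rest with NOPs.
 * Returns `len` on success, or 0 (bytes untouched) when the site is too
 * small or the target is out of rel32 range.
 */
size_t sketch_patch_call(uint8_t *insn, size_t len,
                         uint64_t site, uint64_t target)
{
    assert(insn != NULL);

    if (len < CALL_REL32_LEN)
        return 0; /* not enough room for the instruction */

    /* The displacement is relative to the next instruction's address. */
    int64_t delta = (int64_t)(target - (site + CALL_REL32_LEN));
    if (delta < INT32_MIN || delta > INT32_MAX)
        return 0;

    uint32_t rel = (uint32_t)(int32_t)delta;
    insn[0] = CALL_REL32_OPCODE;
    insn[1] = (uint8_t)(rel & 0xff);         /* little-endian rel32 */
    insn[2] = (uint8_t)((rel >> 8) & 0xff);
    insn[3] = (uint8_t)((rel >> 16) & 0xff);
    insn[4] = (uint8_t)((rel >> 24) & 0xff);

    for (size_t i = CALL_REL32_LEN; i < len; i++)
        insn[i] = NOP_OPCODE;

    return len;
}
```

Feeding it the 7-byte site at 0xffffffff817be9a0 and the target 0xffffffff810a0c20 from the POST dump gives a 5-byte call followed by 2 bytes of padding, matching the instruction lengths shown there.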
      
      
      Fixes: 63f70270 ("[PATCH] i386: PARAVIRT: add common patching machinery")
      Fixes: 3010a066 ("x86/paravirt, objtool: Annotate indirect calls")
      Reported-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: stable@vger.kernel.org
  2. 03 Aug, 2018 3 commits
    • T
      x86/intel_rdt: Disable PMU access · 4a7a54a5
      Thomas Gleixner committed
      Peter is objecting to the direct PMU access in RDT. Right now the PMU usage
      is broken anyway as it is not coordinated with perf.
      
      Until this discussion is settled, disable the PMU mechanics by simply
      rejecting the type '2' measurement in the resctrl file.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Reinette Chatre <reinette.chatre@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: fenghua.yu@intel.com
      Cc: tony.luck@intel.com
      Cc: vikas.shivappa@linux.intel.com
      Cc: gavin.hindman@intel.com
      Cc: jithu.joseph@intel.com
      Cc: hpa@zytor.com
    • S
      x86/speculation: Support Enhanced IBRS on future CPUs · 706d5168
      Sai Praneeth committed
      Future Intel processors will support "Enhanced IBRS" which is an "always
      on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
      disabled.
      
      From the specification [1]:
      
       "With enhanced IBRS, the predicted targets of indirect branches
        executed cannot be controlled by software that was executed in a less
        privileged predictor mode or on another logical processor. As a
        result, software operating on a processor with enhanced IBRS need not
        use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
        privileged predictor mode. Software can isolate predictor modes
        effectively simply by setting the bit once. Software need not disable
        enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."
      
      If Enhanced IBRS is supported by the processor then use it as the
      preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
      Retpoline white paper [2] states:
      
       "Retpoline is known to be an effective branch target injection (Spectre
        variant 2) mitigation on Intel processors belonging to family 6
        (enumerated by the CPUID instruction) that do not have support for
        enhanced IBRS. On processors that support enhanced IBRS, it should be
        used for mitigation instead of retpoline."
      
      The reason why Enhanced IBRS is the recommended mitigation on processors
      which support it is that these processors also support CET which
      provides a defense against ROP attacks. Retpoline is very similar to ROP
      techniques and might trigger false positives in the CET defense.
      
      If Enhanced IBRS is selected as the mitigation technique for spectre v2,
      the IBRS bit in the SPEC_CTRL MSR is set once at boot time and never
      cleared. The kernel also has to make sure that the IBRS bit remains set
      after VMEXIT because the guest might have cleared the bit. This is already
      covered by the existing x86_spec_ctrl_set_guest() and
      x86_spec_ctrl_restore_host() speculation control functions.
      
      Enhanced IBRS still requires IBPB for full mitigation.
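
The selection and restore logic described above might look roughly like the following sketch. All names prefixed with sketch_ are hypothetical and do not appear in the kernel. The bit positions (IBRS_ALL as bit 1 of IA32_ARCH_CAPABILITIES, IBRS as bit 0 of IA32_SPEC_CTRL) come from Intel's published MSR layout, not from this changelog, and MSR access is modelled with plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, taken from Intel's MSR documentation. */
#define SKETCH_ARCH_CAP_IBRS_ALL (1ull << 1) /* CPU has enhanced IBRS */
#define SKETCH_SPEC_CTRL_IBRS    (1ull << 0) /* IBRS enable bit */

enum sketch_v2_mode { SKETCH_RETPOLINE, SKETCH_IBRS_ENHANCED };

struct sketch_spec_ctrl {
    enum sketch_v2_mode mode;
    uint64_t host_value; /* value the host wants in SPEC_CTRL */
};

/* Prefer enhanced IBRS when the CPU advertises it, else retpoline. */
struct sketch_spec_ctrl sketch_select_v2(uint64_t arch_caps,
                                         uint64_t boot_value)
{
    struct sketch_spec_ctrl s = { SKETCH_RETPOLINE, boot_value };

    if (arch_caps & SKETCH_ARCH_CAP_IBRS_ALL) {
        s.mode = SKETCH_IBRS_ENHANCED;
        s.host_value |= SKETCH_SPEC_CTRL_IBRS; /* set once, never cleared */
    }
    return s;
}

/*
 * After a VM exit the guest may have left a different value behind.
 * Rewrite the (simulated) MSR only when it differs from the host value.
 * Returns 1 when a write was needed, 0 otherwise.
 */
int sketch_restore_host(const struct sketch_spec_ctrl *s, uint64_t *msr)
{
    assert(s != NULL && msr != NULL);

    if (*msr == s->host_value)
        return 0;
    *msr = s->host_value;
    return 1;
}
```

Comparing before writing is a design choice of this sketch: it avoids a redundant write when the guest left the value unchanged.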
      
      [1] Speculative-Execution-Side-Channel-Mitigations.pdf
      [2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
      Both documents are available at:
      https://bugzilla.kernel.org/show_bug.cgi?id=199511

      Originally-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim C Chen <tim.c.chen@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com
    • P
      x86/cpufeatures: Add EPT_AD feature bit · 301d328a
      Peter Feiner committed
      Some Intel processors have an EPT feature whereby the accessed & dirty bits
      in EPT entries can be updated by HW. MSR IA32_VMX_EPT_VPID_CAP exposes the
      presence of this capability.
      
      There is no point in trying to use that new feature bit in the VMX code as
      VMX needs to read the MSR anyway to access other bits, but having the
      feature bit for EPT_AD in place helps virtualization management as it
      exposes "ept_ad" in /proc/cpuinfo/$proc/flags if the feature is present.
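
As a rough sketch of the detection step, the capability can be derived from the MSR value with a single bit test. The struct and function names below are hypothetical. The bit position (bit 21 of IA32_VMX_EPT_VPID_CAP) is an assumption based on Intel's documentation, not on this changelog, and the MSR value is passed in as a plain integer rather than read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the EPT accessed/dirty capability bit. */
#define SKETCH_VMX_EPT_AD_BIT (1ull << 21)

/* Hypothetical stand-in for a per-CPU feature record. */
struct sketch_cpu {
    bool ept_ad; /* would surface as "ept_ad" in the flags list */
};

/* Record the feature when the capability MSR advertises it. */
void sketch_init_ept_ad(struct sketch_cpu *c, uint64_t ept_vpid_cap)
{
    assert(c != NULL);
    c->ept_ad = (ept_vpid_cap & SKETCH_VMX_EPT_AD_BIT) != 0;
}
```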
      
      [ tglx: Amended changelog ]
      Signed-off-by: Peter Feiner <pfeiner@google.com>
      Signed-off-by: Peter Shier <pshier@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Link: https://lkml.kernel.org/r/20180801180657.138051-1-pshier@google.com
  3. 02 Aug, 2018 1 commit
  4. 31 Jul, 2018 5 commits
  5. 30 Jul, 2018 1 commit
    • J
      x86/kexec: Allocate 8k PGDs for PTI · ca38dc8f
      Joerg Roedel committed
      Fuzzing the PTI-x86-32 code with trinity showed unhandled
      kernel paging request oops-messages that looked a lot like
      silent data corruption.
      
      Lots of debugging and testing led to the kexec-32bit code,
      which is still allocating 4k PGDs when PTI is enabled. But
      since it uses native_set_pud() to build the page-table, it
      will inevitably call into __pti_set_user_pgtbl(), which
      writes beyond the allocated 4k page.
      
      Use PGD_ALLOCATION_ORDER to allocate PGDs in the kexec code
      to fix the issue.
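
The overflow can be illustrated with a little offset arithmetic. The sketch below assumes 4k pages and assumes that the user copy of a PGD entry sits exactly one page above the kernel copy, so that an order-1 allocation holds both. The sketch_ names are hypothetical and only model offsets within the allocation, not real page tables.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Bytes provided by a page allocation of the given order. */
size_t sketch_pgd_alloc_bytes(unsigned int order)
{
    assert(order < 8); /* keep the illustration to small orders */
    return SKETCH_PAGE_SIZE << order;
}

/* Offset of the user copy of the entry at kernel offset `off`. */
size_t sketch_user_pgd_offset(size_t off)
{
    return off | SKETCH_PAGE_SIZE; /* same slot, one page higher */
}

/* Does a write of `size` bytes at `off` stay inside the allocation? */
int sketch_write_in_bounds(size_t off, size_t size, unsigned int order)
{
    return off + size <= sketch_pgd_alloc_bytes(order);
}
```

With an order-0 (4k) allocation, any write to the user copy lands at offset 4096 or above and is out of bounds. With order 1 (8k) the same write is inside the allocation.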
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: David H. Gutteridge <dhgutteridge@sympatico.ca>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1532533683-5988-4-git-send-email-joro@8bytes.org
  6. 24 Jul, 2018 2 commits
  7. 20 Jul, 2018 26 commits
    • J
      x86/ldt: Enable LDT user-mapping for PAE · 6df934b9
      Joerg Roedel committed
      This adds the needed special case for PAE to get the LDT mapped into the
      user page-table when PTI is enabled. The big difference to the other paging
      modes is that on PAE there is no full top-level PGD entry available for the
      LDT, but only a PMD entry.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Pavel Machek <pavel@ucw.cz>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1531906876-13451-37-git-send-email-joro@8bytes.org
    • J
      x86/ldt: Split out sanity check in map_ldt_struct() · 9bae3197
      Joerg Roedel committed
      This splits out the mapping sanity check and the actual mapping of the LDT
      to user-space from the map_ldt_struct() function in a way so that it is
      re-usable for PAE paging.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Pavel Machek <pavel@ucw.cz>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1531906876-13451-36-git-send-email-joro@8bytes.org
    • J
      x86/ldt: Define LDT_END_ADDR · 8195d869
      Joerg Roedel committed
      It marks the end of the address-space range reserved for the LDT. The
      LDT-code will use it when unmapping the LDT for user-space.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Pavel Machek <pavel@ucw.cz>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1531906876-13451-35-git-send-email-joro@8bytes.org
    • J
      x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit · 39d668e0
      Joerg Roedel committed
      The pti_clone_kernel_text() function references __end_rodata_hpage_align,
      which is only present on x86-64.  This makes sense as the end of the rodata
      section is not huge-page aligned on 32 bit.
      
      Nevertheless a symbol is required for the function that points at the right
      address for both 32 and 64 bit. Introduce __end_rodata_aligned for that
      purpose and use it in pti_clone_kernel_text().
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Pavel Machek <pavel@ucw.cz>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1531906876-13451-28-git-send-email-joro@8bytes.org
    • J
      x86/pgtable/32: Allocate 8k page-tables when PTI is enabled · e3238faf
      Joerg Roedel committed
      Allocate a kernel and a user page-table root when PTI is enabled. Also
      allocate a full page per root for PAE because otherwise the bit to flip in
      CR3 to switch between them would be non-constant, which creates a lot of
      hassle.  Keep that for a later optimization.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Pavel Machek <pavel@ucw.cz>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1531906876-13451-18-git-send-email-joro@8bytes.org
    • J
      x86/entry: Rename update_sp0 to update_task_stack · 252e1a05
      Joerg Roedel committed
      The function does not update sp0 anymore but makes the task-stack
      visible for entry code. This is done by either writing it to sp1 or by
      doing a hypercall. Rename the function to get rid of the misleading name.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Pavel Machek <pavel@ucw.cz>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1531906876-13451-15-git-send-email-joro@8bytes.org
    • J
      x86/entry/32: Enter the kernel via trampoline stack · 45d7b255
      Joerg Roedel committed
      Use the entry-stack as a trampoline to enter the kernel. The entry-stack is
      already in the cpu_entry_area and will be mapped to userspace when PTI is
      enabled.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Pavel Machek <pavel@ucw.cz>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1531906876-13451-8-git-send-email-joro@8bytes.org
    • J
      x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler · a6b744f3
      Joerg Roedel committed
      x86_tss.sp0 will be used to point to the entry stack later to use it as a
      trampoline stack for other kernel entry points besides SYSENTER.
      
      So store the real task stack pointer in x86_tss.sp1, which is otherwise
      unused by the hardware, as Linux doesn't make use of Ring 1.
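
A minimal sketch of the idea, using a simplified model of the start of the 32-bit hardware TSS: the struct and the sketch_ helpers are hypothetical stand-ins, not the kernel's definitions. The field layout follows the architectural 32-bit TSS format, which is an assumption not spelled out in the changelog. The offsetof() constants show the kind of values an asm-offsets entry would export to assembly code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading fields of a 32-bit hardware TSS, simplified for illustration. */
struct sketch_tss32 {
    uint16_t back_link, reserved0;
    uint32_t sp0;            /* stack loaded by hardware on entry to ring 0 */
    uint16_t ss0, reserved1;
    uint32_t sp1;            /* ring 1 slot, free for software use here */
    uint16_t ss1, reserved2;
    uint32_t sp2;
    uint16_t ss2, reserved3;
};

/* Offsets that assembly entry code would use to address the slots. */
#define SKETCH_TSS_sp0 offsetof(struct sketch_tss32, sp0)
#define SKETCH_TSS_sp1 offsetof(struct sketch_tss32, sp1)

/* Park the real task stack pointer in the otherwise unused sp1 slot. */
void sketch_update_task_stack(struct sketch_tss32 *tss, uint32_t task_sp)
{
    assert(tss != NULL);
    tss->sp1 = task_sp;
}

/* What an entry handler would do: fetch the task stack from sp1. */
uint32_t sketch_load_task_stack(const struct sketch_tss32 *tss)
{
    assert(tss != NULL);
    return tss->sp1;
}
```

Keeping the task stack in sp1 leaves sp0 free to point at a different stack, so hardware-driven entries and the manual switch in the handler do not have to share one slot.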
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Pavel Machek <pavel@ucw.cz>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1531906876-13451-4-git-send-email-joro@8bytes.org
    • J
      x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry2task_stack · ae2e565b
      Joerg Roedel committed
      The stack address doesn't need to be stored in tss.sp0 if the stack is
      switched manually like on sysenter. Rename the offset so that it still
      makes sense when its location is changed in later patches.
      
      This stack will also be used for all kernel-entry points, not just
      sysenter. Reflect that and the fact that it is the offset to the task-stack
      location in the name as well.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Pavel Machek <pavel@ucw.cz>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1531906876-13451-3-git-send-email-joro@8bytes.org
    • J
      x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c · 9e97b73f
      Joerg Roedel committed
      These offsets will be used in 32 bit assembly code as well, so make them
      available for all of x86 code.
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Pavel Machek <pavel@ucw.cz>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1531906876-13451-2-git-send-email-joro@8bytes.org
    • P
      x86/tsc: Make use of tsc_calibrate_cpu_early() · 8dbe4385
      Pavel Tatashin committed
      During early boot enable tsc_calibrate_cpu_early() and switch to
      tsc_calibrate_cpu() only later. Do this unconditionally, because it is
      unknown what methods other cpus will use to calibrate once they are
      onlined.
      
      If the TSC frequency is still unknown by the time tsc_init() is called,
      use only pit_hpet_ptimer_calibrate_cpu() to calibrate, as this function
      contains the only methods which have not been called and tried earlier.
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-27-pasha.tatashin@oracle.com
    • P
      x86/tsc: Split native_calibrate_cpu() into early and late parts · 03821f45
      Pavel Tatashin committed
      During early boot TSC and CPU frequency can be calibrated using MSR, CPUID,
      and quick PIT calibration methods. The other methods PIT/HPET/PMTIMER are
      available only after ACPI is initialized.
      
      Split native_calibrate_cpu() into early and late parts so they can be
      called separately during early and late tsc calibration.
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-26-pasha.tatashin@oracle.com
      03821f45
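The early/late split described in the commit above can be illustrated with a small userspace sketch. It is not the kernel code: the probe functions, their return values, and every `*_sketch` name are hypothetical stand-ins, and 0 is assumed to mean "this method did not yield a frequency".

```c
#include <stddef.h>

typedef unsigned long (*calib_fn)(void);

/* Hypothetical probes: each returns a frequency in kHz, or 0 on failure. */
static unsigned long probe_msr(void)       { return 0; }
static unsigned long probe_cpuid(void)     { return 0; }
static unsigned long probe_quick_pit(void) { return 0; }
/* Stands in for the PIT/HPET/PMTIMER methods that need ACPI. */
static unsigned long probe_pit_hpet_pmtimer(void) { return 2400000UL; }

/* Try each method in order and return the first non-zero result. */
static unsigned long first_success(const calib_fn *fns, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned long khz = fns[i]();
        if (khz)
            return khz;
    }
    return 0;
}

/* Early part: only methods that work before ACPI is initialized. */
static unsigned long calibrate_early_sketch(void)
{
    static const calib_fn early[] = { probe_msr, probe_cpuid, probe_quick_pit };
    return first_success(early, sizeof(early) / sizeof(early[0]));
}

/* Late part: methods that are available only after ACPI is initialized. */
static unsigned long calibrate_late_sketch(void)
{
    static const calib_fn late[] = { probe_pit_hpet_pmtimer };
    return first_success(late, sizeof(late) / sizeof(late[0]));
}

/* Combined behaviour: early methods first, late methods as fallback. */
static unsigned long calibrate_full_sketch(void)
{
    unsigned long khz = calibrate_early_sketch();
    return khz ? khz : calibrate_late_sketch();
}
```

Keeping the two halves as separate functions is what lets a caller run only the early methods during early boot and only the remaining ones later.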
    • P
      x86/tsc: Use TSC as sched clock early · 4763f03d
      Pavel Tatashin committed
      All prerequisites for enabling TSC as sched clock early in the boot
      process are available now:
      
       - Early attempt of TSC calibration
      
       - Early availability of static branch patching
      
      If TSC frequency can be established in the early calibration, enable the
      static key which switches sched clock to use TSC.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-22-pasha.tatashin@oracle.com
      4763f03d
    • P
      x86/tsc: Initialize cyc2ns when tsc frequency is determined · e2a9ca29
      Pavel Tatashin committed
      cyc2ns converts tsc to nanoseconds, and it is handled in a per-cpu data
      structure.
      
      Currently, the setup code for c2ns data for every possible CPU goes through
      the same sequence of calculations as for the boot CPU, but is based on the
      same tsc frequency as the boot CPU, and thus this is not necessary.
      
      Initialize the boot cpu when tsc frequency is determined. Copy the
      calculated data from the boot CPU to the other CPUs in tsc_init().
      
      In addition do the following:
      
       - Remove unnecessary zeroing of c2ns data by removing cyc2ns_data_init()
      
       - Split set_cyc2ns_scale() into two functions: set_cyc2ns_scale(), which
         can be called when the system is up, wraps __set_cyc2ns_scale(), which
         can be called directly while the system is booting and avoids saving
         and restoring IRQs and going to and waking up from idle.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-21-pasha.tatashin@oracle.com
      e2a9ca29
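The idea in the commit above (compute the cycles-to-nanoseconds scale once for the boot CPU, then copy it to the other CPUs) can be sketched in userspace C. This is a simplified illustration and not the kernel implementation: the struct layout, the fixed shift of 10, and all function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical stand-in for the per-CPU conversion data. */
struct cyc2ns {
    uint32_t mul;
    uint32_t shift;
};

/* Pick mul so that ns = (cycles * mul) >> shift for a TSC running at khz. */
static struct cyc2ns cyc2ns_setup(uint32_t khz)
{
    struct cyc2ns c = { .mul = 0, .shift = 10 };

    assert(khz != 0);
    /* ns per cycle is 1e6 / khz; scale by 2^shift and round to nearest. */
    c.mul = (uint32_t)(((NSEC_PER_MSEC << c.shift) + khz / 2) / khz);
    return c;
}

/*
 * Convert cycles to nanoseconds. The plain 64-bit multiply overflows for
 * very large cycle counts; a real implementation needs a wider product.
 */
static uint64_t cycles_to_ns(const struct cyc2ns *c, uint64_t cycles)
{
    return (cycles * c->mul) >> c->shift;
}

/* Copy the boot CPU's data (index 0) to the other CPUs instead of recomputing. */
static void cyc2ns_copy_to_all(struct cyc2ns *percpu, int ncpus)
{
    for (int cpu = 1; cpu < ncpus; cpu++)
        percpu[cpu] = percpu[0];
}
```

Because every CPU is assumed to share the boot CPU's TSC frequency, the copy yields the same result as repeating the calculation per CPU.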
    • P
      x86/tsc: Calibrate tsc only once · cf7a63ef
      Pavel Tatashin committed
      During boot tsc is calibrated twice: once in tsc_early_delay_calibrate(),
      and the second time in tsc_init().
      
      Rename tsc_early_delay_calibrate() to tsc_early_init(), and rework it so
      the calibration is done only early, and make tsc_init() use the values
      already determined in tsc_early_init().
      
      Sometimes it is not possible to determine tsc early, as the subsystem that
      is required is not yet initialized. In such a case, try again later in
      tsc_init().
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-20-pasha.tatashin@oracle.com
      cf7a63ef
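A minimal sketch of the "calibrate once, retry later only if needed" pattern described in the commit above. The names, the boolean readiness flag, and the 2400000 kHz value are hypothetical; the real functions take no such argument.

```c
#include <stdbool.h>

static unsigned long tsc_khz_sketch;  /* 0 means "not yet determined" */
static int calibration_runs;          /* counts calibration attempts */

/* Hypothetical backend: fails (returns 0) while the needed subsystem is not ready. */
static unsigned long try_calibrate(bool subsystem_ready)
{
    calibration_runs++;
    return subsystem_ready ? 2400000UL : 0;
}

/* Calibrate only if no earlier attempt has produced a value. */
static void determine_freq_once(bool subsystem_ready)
{
    if (tsc_khz_sketch)
        return;
    tsc_khz_sketch = try_calibrate(subsystem_ready);
}

/* Early attempt during boot. */
static void tsc_early_init_sketch(bool subsystem_ready)
{
    determine_freq_once(subsystem_ready);
}

/* Later attempt: reuses the early result when one exists. */
static void tsc_init_sketch(bool subsystem_ready)
{
    determine_freq_once(subsystem_ready);
}
```

If the early attempt succeeds, the later call returns immediately and no second calibration takes place.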
    • P
      x86/tsc: Redefine notsc to behave as tsc=unstable · fe9af81e
      Pavel Tatashin committed
      Currently, the notsc kernel parameter disables the use of the TSC by
      sched_clock(). However, this parameter does not prevent the kernel from
      accessing tsc in other places.
      
      The only rationale to boot with notsc is to avoid timing discrepancies on
      multi-socket systems where TSC are not properly synchronized, and thus
      exclude TSC from being used for time keeping. But that prevents using TSC
      as sched_clock() as well, which is not necessary as the core sched_clock()
      implementation can handle non synchronized TSC based sched clocks just
      fine.
      
      However, there is another method to solve the above problem: booting with
      tsc=unstable parameter. This parameter allows sched_clock() to use TSC and
      just excludes it from timekeeping.
      
      So there is no real reason to keep notsc, but for compatibility reasons the
      parameter has to stay. Make it behave like 'tsc=unstable' instead.
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-12-pasha.tatashin@oracle.com
      fe9af81e
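The behavioural change in the commit above can be summarised with a toy parameter parser. This is only an illustration of the described semantics: the flag and function names are invented for the example, and real kernel parameter handling works differently.

```c
#include <stdbool.h>
#include <string.h>

static bool tsc_marked_unstable;  /* hypothetical flag for this sketch */

/* Treat "notsc" exactly like "tsc=unstable", as the commit describes. */
static void parse_boot_param_sketch(const char *arg)
{
    if (strcmp(arg, "tsc=unstable") == 0 || strcmp(arg, "notsc") == 0)
        tsc_marked_unstable = true;
}

/* sched_clock() may keep using the TSC in both cases. */
static bool tsc_usable_for_sched_clock(void)
{
    return true;
}

/* Only timekeeping excludes a TSC that was marked unstable. */
static bool tsc_usable_for_timekeeping(void)
{
    return !tsc_marked_unstable;
}
```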
    • B
      x86/CPU: Call detect_nopl() only on the BSP · 9b3661cd
      Borislav Petkov committed
      Make it use the setup_* variants and have it be called only on the BSP and
      drop the call in generic_identify() - X86_FEATURE_NOPL will be replicated
      to the APs through the forced caps. Helps to keep the mess at a manageable
      level.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-11-pasha.tatashin@oracle.com
      9b3661cd
    • P
      x86/jump_label: Initialize static branching early · 8990cac6
      Pavel Tatashin committed
      Static branching is useful for runtime patching of branches that are on a
      hot path but are infrequently changed.
      
      The x86 clock framework is one example that uses static branches to set up
      the best clock during boot and never changes it again.
      
      It is desirable to enable the TSC based sched clock early to allow fine
      grained boot time analysis early on. That requires static branching to
      work early as well.
      
      Static branching requires patching nop instructions, thus,
      arch_init_ideal_nops() must be called prior to jump_label_init().
      
      Do all the necessary steps to call arch_init_ideal_nops() right after
      early_cpu_init(), which also allows inserting a call to jump_label_init()
      right after that. jump_label_init() will be called again from the generic
      init code, but the code is already protected against reinitialization.
      
      [ tglx: Massaged changelog ]
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-10-pasha.tatashin@oracle.com
      8990cac6
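The ordering constraint and the reinitialization guard mentioned in the commit above can be sketched as follows. The `*_sketch` functions are hypothetical stand-ins that only track state; they do not patch any instructions.

```c
#include <assert.h>
#include <stdbool.h>

static bool ideal_nops_ready;
static bool jump_label_ready;
static int jump_label_init_runs;  /* how often real initialization ran */

/* Stand-in for selecting the NOP encoding used when patching branches. */
static void arch_init_ideal_nops_sketch(void)
{
    ideal_nops_ready = true;
}

/* Safe to call twice: the second call returns without doing any work. */
static void jump_label_init_sketch(void)
{
    if (jump_label_ready)
        return;
    /* Patching needs the NOP choice, so it must have been made already. */
    assert(ideal_nops_ready);
    jump_label_init_runs++;
    jump_label_ready = true;
}
```

Calling `arch_init_ideal_nops_sketch()` first and `jump_label_init_sketch()` twice afterwards models the early call plus the later call from generic init code.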
    • P
      x86/alternatives, jumplabel: Use text_poke_early() before mm_init() · 6fffacb3
      Pavel Tatashin committed
      It is supposed to be safe to modify static branches after jump_label_init().
      But, because static key modifying code eventually calls text_poke(), it can
      end up accessing a struct page which has not been initialized yet.
      
      Here is how to quickly reproduce the problem. Insert code like this
      into init/main.c:
      
      | +static DEFINE_STATIC_KEY_FALSE(__test);
      | asmlinkage __visible void __init start_kernel(void)
      | {
      |        char *command_line;
      |@@ -587,6 +609,10 @@ asmlinkage __visible void __init start_kernel(void)
      |        vfs_caches_init_early();
      |        sort_main_extable();
      |        trap_init();
      |+       {
      |+       static_branch_enable(&__test);
      |+       WARN_ON(!static_branch_likely(&__test));
      |+       }
      |        mm_init();
      
      The following warnings show up:
      WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:701 text_poke+0x20d/0x230
      RIP: 0010:text_poke+0x20d/0x230
      Call Trace:
       ? text_poke_bp+0x50/0xda
       ? arch_jump_label_transform+0x89/0xe0
       ? __jump_label_update+0x78/0xb0
       ? static_key_enable_cpuslocked+0x4d/0x80
       ? static_key_enable+0x11/0x20
       ? start_kernel+0x23e/0x4c8
       ? secondary_startup_64+0xa5/0xb0
      
      ---[ end trace abdc99c031b8a90a ]---
      
      If the code above is moved after mm_init(), no warning is shown, as struct
      pages are initialized during handover from memblock.
      
      Use text_poke_early() in static branching until early boot IRQs are enabled,
      and from there switch to text_poke(). Also, ensure text_poke() is never
      invoked when uninitialized memory access may happen by adding a
      !after_bootmem assertion.
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-9-pasha.tatashin@oracle.com
      6fffacb3
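The switch between the two patching routines described in the commit above can be modelled in userspace. This is only a sketch under stated assumptions: both pokers are reduced to `memcpy()`, the two boolean flags are hypothetical stand-ins for the kernel's boot-state tracking, and the assertion mirrors the "never before the boot allocator has handed over" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool early_boot_irqs_disabled_sketch = true;  /* boot starts with IRQs off */
static bool after_bootmem_sketch;                    /* set once struct pages are usable */
static int early_pokes, late_pokes;

/* Early variant: writes directly and needs no struct page. */
static void text_poke_early_sketch(void *addr, const void *opcode, size_t len)
{
    memcpy(addr, opcode, len);
    early_pokes++;
}

/* Late variant: must never run before struct pages are initialized. */
static void text_poke_sketch(void *addr, const void *opcode, size_t len)
{
    assert(after_bootmem_sketch);
    memcpy(addr, opcode, len);
    late_pokes++;
}

/* Static branch patching picks the poker based on the boot phase. */
static void patch_branch_sketch(void *addr, const void *opcode, size_t len)
{
    if (early_boot_irqs_disabled_sketch)
        text_poke_early_sketch(addr, opcode, len);
    else
        text_poke_sketch(addr, opcode, len);
}
```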
    • T
      x86/kvmclock: Switch kvmclock data to a PER_CPU variable · 95a3d445
      Thomas Gleixner committed
      The previous removal of the memblock dependency from kvmclock introduced a
      static data array sized 64 bytes * CONFIG_NR_CPUS. That's wasteful on large
      systems when kvmclock is not used.
      
      Replace it with:
      
       - A static page sized array of pvclock data. It's page sized because the
         pvclock data of the boot CPU is mapped into the vDSO, so otherwise random
         other data would be exposed to the vDSO.
      
       - A PER_CPU variable of pvclock data pointers. This is used to access the
         pvclock data storage on each CPU.
      
      The setup is done in two stages:
      
       - Early boot stores the pointer to the static page for the boot CPU in
         the per cpu data.
      
       - In the preparatory stage of CPU hotplug assign either an element of
         the static array (when the CPU number is in that range) or allocate
         memory and initialize the per cpu pointer.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-8-pasha.tatashin@oracle.com
      95a3d445
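The two-stage setup described in the commit above (a page sized static array plus per-CPU pointers, with allocation for CPUs beyond the array) might look roughly like this in userspace C. The 4096-byte page size, the CPU limit, the plain pointer array standing in for a PER_CPU variable, and all names are assumptions made for illustration; only the 64-byte entry size comes from the commit text.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_SKETCH 4096u
#define MAX_CPUS_SKETCH  256u

/* Hypothetical 64-byte stand-in for one CPU's pvclock data. */
struct pvclock_sketch {
    unsigned char data[64];
};

#define STATIC_SLOTS (PAGE_SIZE_SKETCH / sizeof(struct pvclock_sketch))

/* One page worth of entries, so nothing else shares the mapped page. */
static struct pvclock_sketch static_page[STATIC_SLOTS];

/* Stand-in for the PER_CPU pointer variable. */
static struct pvclock_sketch *percpu_ptr[MAX_CPUS_SKETCH];

/* Early boot: the boot CPU (CPU 0) points at the first static element. */
static void early_boot_setup_sketch(void)
{
    percpu_ptr[0] = &static_page[0];
}

/* Hotplug prepare stage: use a static slot if in range, else allocate. */
static int cpu_prepare_sketch(unsigned int cpu)
{
    if (cpu >= MAX_CPUS_SKETCH)
        return -1;
    if (percpu_ptr[cpu])
        return 0;  /* already set up */
    if (cpu < STATIC_SLOTS)
        percpu_ptr[cpu] = &static_page[cpu];
    else
        percpu_ptr[cpu] = calloc(1, sizeof(struct pvclock_sketch));
    return percpu_ptr[cpu] ? 0 : -1;
}
```

With 64-byte entries and a 4096-byte page, the static array covers the first 64 CPUs, so memory is allocated only on systems that actually bring up more CPUs than that.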
    • T
      x86/kvmclock: Move kvmclock vsyscall param and init to kvmclock · e499a9b6
      Thomas Gleixner committed
      There is no point in having this in the kvm code itself and calling it from
      there. This can be called from an initcall and the parameter is cleared
      when the hypervisor is not KVM.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-7-pasha.tatashin@oracle.com
      e499a9b6
    • T
      x86/kvmclock: Mark variables __initdata and __ro_after_init · 42f8df93
      Thomas Gleixner committed
      The kvmclock parameter is init data and the other variables are not
      modified after init.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-6-pasha.tatashin@oracle.com
      42f8df93
    • T
      x86/kvmclock: Cleanup the code · 146c394d
      Thomas Gleixner committed
      - Clean up the MSR write for wall clock. The type casts to (int) are sloppy
        because the wrmsr parameters are u32 and, aside from that, wrmsrl() already
        provides the high/low split for free.
      
      - Remove the pointless get_cpu()/put_cpu() dance from various
        functions. Either they are called during early init where CPU is
        guaranteed to be 0 or they are already called from non preemptible
        context where smp_processor_id() can be used safely
      
      - Simplify the convoluted check for kvmclock in the init function.
      
      - Mark the parameter parsing function __init. No point in keeping it
        around.
      
      - Convert to pr_info()
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-5-pasha.tatashin@oracle.com
      146c394d
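The first cleanup item in the commit above relies on the 64-bit MSR write helper splitting its value into two 32-bit halves. A userspace sketch of that split follows; the recording function replaces the privileged instruction, and every name and the MSR number used with it are hypothetical.

```c
#include <stdint.h>

/* Records the arguments of the last simulated MSR write. */
static uint32_t last_msr, last_lo, last_hi;

/* Hypothetical stand-in for the low-level write taking two u32 halves. */
static void wrmsr_sketch(uint32_t msr, uint32_t lo, uint32_t hi)
{
    last_msr = msr;
    last_lo = lo;
    last_hi = hi;
}

/* 64-bit variant: performs the high/low split so callers need no casts. */
static void wrmsrl_sketch(uint32_t msr, uint64_t val)
{
    wrmsr_sketch(msr, (uint32_t)(val & 0xffffffffu), (uint32_t)(val >> 32));
}
```

Passing the full 64-bit value to the wide helper keeps the halves unsigned and avoids hand-written `(int)` casts at the call site.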
    • T
      x86/kvmclock: Decrapify kvm_register_clock() · 7a5ddc8f
      Thomas Gleixner committed
      The return value is pointless because the wrmsr cannot fail if
      KVM_FEATURE_CLOCKSOURCE or KVM_FEATURE_CLOCKSOURCE2 are set.
      
      kvm_register_clock() is only called locally so wants to be static.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-4-pasha.tatashin@oracle.com
      7a5ddc8f
    • T
      x86/kvmclock: Remove page size requirement from wall_clock · 7ef363a3
      Thomas Gleixner committed
      There is no requirement for wall_clock data to be page aligned or page
      sized.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-3-pasha.tatashin@oracle.com
      7ef363a3
    • P
      x86/kvmclock: Remove memblock dependency · 368a540e
      Pavel Tatashin committed
      KVM clock is initialized later compared to other hypervisor clocks because
      it has a dependency on the memblock allocator.
      
      Bring it in line with other hypervisors by using memory from the BSS
      instead of allocating it.
      
      The benefits:
      
        - Remove ifdef from common code
        - Earlier availability of the clock
        - Remove dependency on memblock, and reduce code
      
      The downside:
      
        - Static allocation of the per cpu data structures sized NR_CPUS * 64 bytes.
          Will be addressed in follow-up patches.
      
      [ tglx: Split out from larger series ]
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-2-pasha.tatashin@oracle.com
      368a540e
  8. 19 Jul, 2018 1 commit