1. 21 Feb 2018, 1 commit
  • x86/mm: Redefine some of page table helpers as macros · 92e1c5b3
Committed by Kirill A. Shutemov
      This is preparation for the next patch, which would change
      pgtable_l5_enabled to be cpu_feature_enabled(X86_FEATURE_LA57).
      
The change makes a few helpers in paravirt.h dependent on the
cpu_feature_enabled() definition from cpufeature.h.
And cpufeature.h is dependent on paravirt.h.

Let's redefine some of the helpers as macros to break this dependency loop.
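
A hedged illustration of the trick (the helper name and body below are
invented for this sketch, not the exact kernel change): a static inline
needs cpu_feature_enabled() visible at its definition site, while a macro
body is only compiled where the macro is used, so paravirt.h itself no
longer has to include cpufeature.h.

    /* Illustrative sketch only, not the exact kernel definitions. */

    /* Before: compiled at definition time, so paravirt.h must pull in
     * cpufeature.h to see cpu_feature_enabled(). */
    static inline bool pgtable_l5_enabled_helper(void)
    {
            return cpu_feature_enabled(X86_FEATURE_LA57);
    }

    /* After: compiled at each usage site, which already includes
     * cpufeature.h -- the header dependency loop is broken. */
    #define pgtable_l5_enabled_helper() cpu_feature_enabled(X86_FEATURE_LA57)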
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180216114948.68868-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      92e1c5b3
2. 16 Feb 2018, 6 commits
3. 14 Feb 2018, 8 commits
4. 13 Feb 2018, 1 commit
5. 07 Feb 2018, 1 commit
6. 06 Feb 2018, 1 commit
7. 04 Feb 2018, 1 commit
8. 03 Feb 2018, 1 commit
9. 01 Feb 2018, 3 commits
10. 31 Jan 2018, 8 commits
  • x86/irq: Count Hyper-V reenlightenment interrupts · 51d4e5da
Committed by Vitaly Kuznetsov
Hyper-V reenlightenment interrupts arrive when the VM is migrated. While
they are not interesting in general, counting them is important when L2
nested guests are running.
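
A hedged sketch of the usual x86 pattern for such accounting (the counter
field and handler shape follow the standard inc_irq_stat() idiom; treat the
names as illustrative):

    /* Sketch: bump a per-cpu counter in the vector's handler so the
     * interrupts show up in /proc/interrupts. */
    __visible void __irq_entry hyperv_reenlightenment_intr(struct pt_regs *regs)
    {
            entering_ack_irq();
            inc_irq_stat(irq_hv_reenlightenment_count);
            /* ... invoke the registered reenlightenment callback ... */
            exiting_irq();
    }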
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-6-vkuznets@redhat.com
      51d4e5da
  • x86/hyperv: Reenlightenment notifications support · 93286261
Committed by Vitaly Kuznetsov
Hyper-V supports Live Migration notification. This is supposed to be used
in conjunction with TSC emulation: when a VM is migrated to a host with a
different TSC frequency, the host emulates accesses to the TSC for a short
period and sends an interrupt to notify the guest about the event. When the
guest has finished updating everything, it can disable TSC emulation and
everything will start working fast again.
      
These notifications weren't required until now, as Hyper-V guests are not
supposed to use the TSC as a clocksource: in Linux the TSC is even marked as
unstable on boot. Guests normally use the 'tsc page' clocksource, and the
host updates its values automatically on migration.

Things change with nested virtualization: even when the PV clocksources
(kvm-clock or the tsc page) are passed through to the nested guests, the
TSC frequency and frequency changes need to be known.
      
The Hyper-V Top Level Functional Specification (as of v5.0b) wrongly
specifies EAX:BIT(12) of CPUID:0x40000009 as the feature identification bit.
The right one to check is EAX:BIT(13) of CPUID:0x40000003. I was assured
that the fix is on the way.
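
A hedged sketch of the feature check described above, in raw CPUID form for
illustration (the kernel would normally test a cached feature word instead):

    #include <stdbool.h>
    #include <cpuid.h>      /* GCC/Clang wrapper for the CPUID instruction */

    /* Per the commit message: test EAX bit 13 of leaf 0x40000003,
     * not EAX bit 12 of leaf 0x40000009. */
    static bool hv_reenlightenment_available(void)
    {
            unsigned int eax, ebx, ecx, edx;

            __cpuid(0x40000003, eax, ebx, ecx, edx);
            return eax & (1u << 13);
    }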
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com
      93286261
  • x86/hyperv: Add a function to read both TSC and TSC page value simultaneously · e2768eaa
Committed by Vitaly Kuznetsov
This is going to be used from KVM code, where both the TSC and the TSC page
value are needed.
      
Nothing is supposed to use the function when the Hyper-V code is compiled
out; the stub just BUG()s.
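
A hedged sketch of why a dedicated helper is needed: the TSC page value is
only meaningful together with the exact rdtsc() sample it was computed
against, so both must be read inside the same sequence-retry loop (field
names follow the Hyper-V TSC page layout; details are illustrative):

    /* Sketch: return TSC page time and the matching raw TSC together. */
    static u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
                                    u64 *cur_tsc)
    {
            u64 scale, offset;
            u32 sequence;

            do {
                    sequence = READ_ONCE(tsc_pg->tsc_sequence);
                    if (!sequence)
                            return U64_MAX;         /* TSC page not active */

                    smp_rmb();
                    scale  = READ_ONCE(tsc_pg->tsc_scale);
                    offset = READ_ONCE(tsc_pg->tsc_offset);
                    *cur_tsc = rdtsc();             /* the sample we hand back */
                    smp_rmb();
            } while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);

            /* time = ((tsc * scale) >> 64) + offset */
            return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
    }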
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-3-vkuznets@redhat.com
      e2768eaa
  • x86/speculation: Use Indirect Branch Prediction Barrier in context switch · 18bf3c3e
Committed by Tim Chen
Flush indirect branches when switching into a process that has marked
itself non-dumpable. This protects high-value processes like gpg better,
without incurring too much performance overhead.
      
      If done naïvely, we could switch to a kernel idle thread and then back
      to the original process, such as:
      
          process A -> idle -> process A
      
In such a scenario, we do not have to do an IBPB here even though the
process is non-dumpable, as we are switching back to the same process
after a hiatus.
      
      To avoid the redundant IBPB, which is expensive, we track the last mm
      user context ID. The cost is to have an extra u64 mm context id to track
      the last mm we were using before switching to the init_mm used by idle.
      Avoiding the extra IBPB is probably worth the extra memory for this
      common scenario.
      
For those cases where tlb_defer_switch_to_init_mm() returns true (non-PCID),
lazy TLB will defer the switch to init_mm, so we will not be changing the mm
for the process A -> idle -> process A switch. IBPB will therefore be
skipped in this case.
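
A hedged sketch of the check described above (the per-cpu variable and
helper are illustrative, not the exact kernel identifiers):

    /* Sketch: skip the expensive IBPB when we return to the mm we just
     * left, using the never-recycled ctx_id instead of the mm pointer. */
    static DEFINE_PER_CPU(u64, last_user_mm_ctx_id);

    static void cond_ibpb(struct mm_struct *next_mm, struct task_struct *tsk)
    {
            u64 next_ctx_id = next_mm->context.ctx_id;

            if (next_ctx_id != this_cpu_read(last_user_mm_ctx_id)) {
                    if (tsk && get_dumpable(next_mm) != SUID_DUMP_USER)
                            indirect_branch_prediction_barrier();
                    this_cpu_write(last_user_mm_ctx_id, next_ctx_id);
            }
    }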
      
Thanks to the reviewers and Andy Lutomirski for the suggestion of using
ctx_id, which got rid of the mm pointer recycling problem.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: ak@linux.intel.com
      Cc: karahmed@amazon.de
      Cc: arjan@linux.intel.com
      Cc: torvalds@linux-foundation.org
      Cc: linux@dominikbrodowski.net
      Cc: peterz@infradead.org
      Cc: bp@alien8.de
      Cc: luto@kernel.org
      Cc: pbonzini@redhat.com
      Cc: gregkh@linux-foundation.org
      Link: https://lkml.kernel.org/r/1517263487-3708-1-git-send-email-dwmw@amazon.co.uk
      18bf3c3e
  • x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec · 304ec1b0
Committed by Dan Williams
      Quoting Linus:
      
          I do think that it would be a good idea to very expressly document
          the fact that it's not that the user access itself is unsafe. I do
          agree that things like "get_user()" want to be protected, but not
          because of any direct bugs or problems with get_user() and friends,
          but simply because get_user() is an excellent source of a pointer
          that is obviously controlled from a potentially attacking user
          space. So it's a prime candidate for then finding _subsequent_
          accesses that can then be used to perturb the cache.
      
__uaccess_begin_nospec() covers __get_user() and copy_from_iter(), where the
limit check is far away from the user pointer dereference. In those cases a
barrier_nospec() prevents speculation with a potential pointer to privileged
memory. uaccess_try_nospec covers get_user_try.
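
A hedged sketch of the resulting call-site pattern (simplified; the real
accesses are done in asm with exception tables, and the VERIFY_READ-style
access_ok() signature is the one from this era):

    /* Sketch: the barrier sits between the distant bounds check and
     * the user pointer dereference. */
    int sketch_read_user_word(unsigned long *dst,
                              const unsigned long __user *uptr)
    {
            if (!access_ok(VERIFY_READ, uptr, sizeof(*uptr)))
                    return -EFAULT;

            __uaccess_begin_nospec();   /* stac + barrier: the access_ok()
                                         * result is now resolved */
            *dst = *(const unsigned long __force *)uptr;
            __uaccess_end();            /* clac */
            return 0;
    }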
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Kees Cook <keescook@chromium.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: gregkh@linuxfoundation.org
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: alan@linux.intel.com
      Link: https://lkml.kernel.org/r/151727416953.33451.10508284228526170604.stgit@dwillia2-desk3.amr.corp.intel.com
      304ec1b0
  • x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec · b3bbfb3f
Committed by Dan Williams
For __get_user() paths, do not allow the kernel to speculate on the value
of a user-controlled pointer. In addition to the 'stac' instruction for
      Supervisor Mode Access Protection (SMAP), a barrier_nospec() causes the
      access_ok() result to resolve in the pipeline before the CPU might take any
      speculative action on the pointer value. Given the cost of 'stac' the
      speculation barrier is placed after 'stac' to hopefully overlap the cost of
      disabling SMAP with the cost of flushing the instruction pipeline.
      
Since __get_user() is a major kernel interface that deals with
user-controlled pointers, the __uaccess_begin_nospec() mechanism will prevent
      speculative execution past an access_ok() permission check. While
      speculative execution past access_ok() is not enough to lead to a kernel
      memory leak, it is a necessary precondition.
      
      To be clear, __uaccess_begin_nospec() is addressing a class of potential
      problems near __get_user() usages.
      
Note that while the barrier_nospec() in __uaccess_begin_nospec() is used
to protect __get_user(), pointer masking similar to array_index_nospec()
will be used for get_user(), since it incorporates a bounds check near the
usage.
      
      uaccess_try_nospec provides the same mechanism for get_user_try.
      
      No functional changes.
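
A hedged sketch of the new primitive (modeled on the x86 uaccess macros of
this era; treat it as illustrative rather than the exact patch):

    #define __uaccess_begin_nospec()        \
    ({                                      \
            stac();                         \
            barrier_nospec();               \
    })

    /* uaccess_try_nospec is the same substitution inside the uaccess_try
     * exception-handling wrapper used by get_user_try/get_user_catch. */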
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: gregkh@linuxfoundation.org
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: alan@linux.intel.com
      Link: https://lkml.kernel.org/r/151727415922.33451.5796614273104346583.stgit@dwillia2-desk3.amr.corp.intel.com
      b3bbfb3f
  • x86: Introduce barrier_nospec · b3d7ad85
Committed by Dan Williams
Rename the open-coded form of this instruction sequence, as found in
rdtsc_ordered(), into a generic barrier primitive, barrier_nospec().
      
One of the mitigations for Spectre variant 1 vulnerabilities is to fence
speculative execution after successfully validating a bounds check, i.e. to
force the result of a bounds check to resolve in the instruction pipeline,
ensuring that speculative execution honors that result before potentially
operating on out-of-bounds data.
      
      No functional changes.
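
A hedged sketch of the primitive: on x86 it reuses the MFENCE/LFENCE
selection that rdtsc_ordered() already carried, patched in at boot via the
alternatives mechanism (treat the exact definition as illustrative):

    /* Sketch: stop speculative execution until prior instructions,
     * including the bounds check, have resolved. */
    #define barrier_nospec() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
                                               "lfence", X86_FEATURE_LFENCE_RDTSC)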
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: gregkh@linuxfoundation.org
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: alan@linux.intel.com
      Link: https://lkml.kernel.org/r/151727415361.33451.9049453007262764675.stgit@dwillia2-desk3.amr.corp.intel.com
      b3d7ad85
  • x86: Implement array_index_mask_nospec · babdde26
Committed by Dan Williams
array_index_nospec() uses a mask to sanitize user-controllable array
indexes, i.e. it generates a 0 mask if 'index' >= 'size', and a ~0 mask
otherwise. The default array_index_mask_nospec() handles the carry bit
from the (index - size) result in software.

The x86 array_index_mask_nospec() does the same, but the carry bit is
handled in the processor's CF flag, without conditional instructions in
the control flow.
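
A hedged sketch of the x86 version the text describes: CMP sets CF when
'index' < 'size', and SBB of a register with itself turns CF into an
all-ones or all-zero mask, with no branch for the predictor to skip
(modeled on the kernel's implementation):

    /* Sketch: mask is ~0UL if index < size, 0 otherwise. */
    static inline unsigned long array_index_mask_nospec(unsigned long index,
                                                        unsigned long size)
    {
            unsigned long mask;

            asm ("cmp %1,%2; sbb %0,%0;"
                 : "=r" (mask)
                 : "g" (size), "r" (index)
                 : "cc");
            return mask;
    }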
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: kernel-hardening@lists.openwall.com
      Cc: gregkh@linuxfoundation.org
      Cc: alan@linux.intel.com
      Link: https://lkml.kernel.org/r/151727414808.33451.1873237130672785331.stgit@dwillia2-desk3.amr.corp.intel.com
      babdde26
11. 30 Jan 2018, 2 commits
12. 28 Jan 2018, 3 commits
13. 26 Jan 2018, 4 commits