1. 22 February 2018 (1 commit)
  2. 17 February 2018 (1 commit)
  3. 15 February 2018 (7 commits)
  4. 13 February 2018 (2 commits)
    • x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages · fd0e786d
      Tony Luck committed
      In the following commit:
      
        ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      
      ... we added code to memory_failure() to unmap the page from the
      kernel 1:1 virtual address space, so that speculative accesses to the
      page cannot log additional errors.
      
      But memory_failure() may not always succeed in taking the page offline,
      especially if the page belongs to the kernel.  This can happen if
      there are too many corrected errors on a page and either mcelog(8)
      or drivers/ras/cec.c asks to take a page offline.
      
      Since we remove the 1:1 mapping early in memory_failure(), we can
      end up with the page unmapped, but still in use. On the next access
      the kernel crashes :-(
      
      There are also various debug paths that call memory_failure() to simulate
      the occurrence of an error. Since there is no actual error in memory, we
      don't need to map out the page for those cases.
      
      Revert most of the previous attempt and keep the solution local to
      arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
      
      	1) there is a real error
      	2) memory_failure() succeeds.
      
      All of this only applies to 64-bit systems. A 32-bit kernel doesn't map
      all of memory into kernel space, and it isn't worth adding code to unmap
      the piece that is mapped, because nobody would run a 32-bit kernel on a
      machine that has recoverable machine checks.
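
      A minimal sketch of the resulting logic (simplified from the patch to
      arch/x86/kernel/cpu/mcheck/mce.c; treat the exact flags and call site
      as illustrative rather than the verbatim diff):

          /* Unmap the kernel 1:1 mapping only after memory_failure()
           * has actually taken the page offline. */
          static void do_memory_failure(struct mce *m)
          {
                  int flags = MF_ACTION_REQUIRED;
                  int ret;

                  if (!(m->mcgstatus & MCG_STATUS_RIPV))
                          flags |= MF_MUST_KILL;

                  ret = memory_failure(m->addr >> PAGE_SHIFT, flags);
                  if (ret)
                          pr_err("Memory error not recovered");
                  else    /* 1) real error, 2) page offlined: unmap now */
                          mce_unmap_kpfn(m->addr >> PAGE_SHIFT);
          }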
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert (Persistent Memory) <elliott@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org #v4.14
      Fixes: ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()" · f208820a
      David Woodhouse committed
      This reverts commit 64e16720.
      
      We cannot call C functions like that, without marking all the
      call-clobbered registers as, well, clobbered. We might have got away
      with it for now because the __ibp_barrier() function was *fairly*
      unlikely to actually use any other registers. But no. Just no.
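
      To illustrate the rule being reinstated, a hedged example (hypothetical
      helper name) of what CALLing a C function from inline asm requires on
      x86-64, where the SysV ABI lets the callee freely overwrite rax, rcx,
      rdx, rsi, rdi, r8-r11 and the flags:

          extern void my_helper(void);    /* hypothetical C function */

          static inline void call_helper_from_asm(void)
          {
                  /* Every call-clobbered register must be listed, or the
                   * compiler may keep live values in them across the call. */
                  asm volatile("call my_helper"
                               : /* no outputs */
                               : /* no inputs */
                               : "rax", "rcx", "rdx", "rsi", "rdi",
                                 "r8", "r9", "r10", "r11", "memory", "cc");
          }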
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: jmattson@google.com
      Cc: karahmed@amazon.de
      Cc: kvm@vger.kernel.org
      Cc: pbonzini@redhat.com
      Cc: rkrcmar@redhat.com
      Cc: sironi@amazon.de
      Link: http://lkml.kernel.org/r/1518305967-31356-3-git-send-email-dwmw@amazon.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 07 February 2018 (1 commit)
  6. 06 February 2018 (3 commits)
  7. 04 February 2018 (1 commit)
  8. 03 February 2018 (1 commit)
  9. 01 February 2018 (3 commits)
  10. 31 January 2018 (8 commits)
    • x86/irq: Count Hyper-V reenlightenment interrupts · 51d4e5da
      Vitaly Kuznetsov committed
      Hyper-V reenlightenment interrupts arrive when the VM is migrated. While
      they are not interesting in general, they matter when L2 nested guests
      are running.
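
      In sketch form, the accounting added here bumps a per-CPU counter from
      the vector handler so the events become visible in /proc/interrupts
      (simplified; the actual reenlightenment handling is elided):

          __visible void __irq_entry hyperv_reenlightenment_intr(struct pt_regs *regs)
          {
                  entering_ack_irq();
                  inc_irq_stat(irq_hv_reenlightenment_count);  /* the new counter */
                  /* ... actual reenlightenment handling ... */
                  exiting_irq();
          }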
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-6-vkuznets@redhat.com
    • x86/hyperv: Reenlightenment notifications support · 93286261
      Vitaly Kuznetsov committed
      Hyper-V supports Live Migration notification. This is supposed to be used
      in conjunction with TSC emulation: when a VM is migrated to a host with a
      different TSC frequency, the host emulates accesses to the TSC for a short
      period and sends an interrupt to notify the guest about the event. When
      the guest is done updating everything, it can disable TSC emulation and
      everything will start working fast again.
      
      These notifications weren't required until now, as Hyper-V guests are not
      supposed to use the TSC as a clocksource: in Linux the TSC is even marked
      as unstable on boot. Guests normally use the 'tsc page' clocksource, and
      the host updates its values automatically on migration.
      
      Things change with nested virtualization: even when the PV clocksources
      (kvm-clock or tsc page) are passed through to the nested guests, the TSC
      frequency and frequency changes need to be known.
      
      The Hyper-V Top Level Functional Specification (as of v5.0b) wrongly
      specifies EAX:BIT(12) of CPUID:0x40000009 as the feature identification
      bit. The right one to check is EAX:BIT(13) of CPUID:0x40000003. I was
      assured that the fix is on the way.
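
      A hedged sketch of the detection this implies (the helper name is
      hypothetical; in-tree code would use the cpuid helpers and a proper
      HV_X64_* feature define):

          /* Check EAX bit 13 of CPUID leaf 0x40000003, per the commit
           * message (not EAX bit 12 of leaf 0x40000009 as TLFS v5.0b says). */
          static bool hv_reenlightenment_available(void)
          {
                  unsigned int eax, ebx, ecx, edx;

                  cpuid(0x40000003, &eax, &ebx, &ecx, &edx);
                  return eax & BIT(13);
          }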
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com
    • x86/hyperv: Add a function to read both TSC and TSC page value simultaneously · e2768eaa
      Vitaly Kuznetsov committed
      This is going to be used from KVM code where both TSC and TSC page value
      are needed.
      
      Nothing is supposed to use the function when the Hyper-V code is compiled
      out; it just BUG()s.
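
      A sketch of the helper's shape (simplified; per the Hyper-V TSC page
      scheme, current time = ((raw TSC * tsc_scale) >> 64) + tsc_offset):

          static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
                                                 u64 *cur_tsc)
          {
                  u64 scale, offset;
                  u32 sequence;

                  /* Re-read if the host updated the page mid-read. */
                  do {
                          sequence = READ_ONCE(tsc_pg->tsc_sequence);
                          scale    = READ_ONCE(tsc_pg->tsc_scale);
                          offset   = READ_ONCE(tsc_pg->tsc_offset);
                          *cur_tsc = rdtsc();
                  } while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);

                  return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
          }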
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
      Cc: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: devel@linuxdriverproject.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Mohammed Gamal <mmorsy@redhat.com>
      Link: https://lkml.kernel.org/r/20180124132337.30138-3-vkuznets@redhat.com
    • x86/speculation: Use Indirect Branch Prediction Barrier in context switch · 18bf3c3e
      Tim Chen committed
      Flush indirect branches when switching into a process that marked itself
      non-dumpable. This better protects high-value processes like gpg,
      without too high a performance overhead.
      
      If done naïvely, we could switch to a kernel idle thread and then back
      to the original process, such as:
      
          process A -> idle -> process A
      
      In such scenario, we do not have to do IBPB here even though the process
      is non-dumpable, as we are switching back to the same process after a
      hiatus.
      
      To avoid the redundant IBPB, which is expensive, we track the last mm
      user context ID. The cost is to have an extra u64 mm context id to track
      the last mm we were using before switching to the init_mm used by idle.
      Avoiding the extra IBPB is probably worth the extra memory for this
      common scenario.
      
      For those cases where tlb_defer_switch_to_init_mm() returns true (non-
      PCID), lazy TLB will defer the switch to init_mm, so we will not be
      changing the mm for the process A -> idle -> process A switch, and IBPB
      will be skipped for this case.
      
      Thanks to the reviewers and Andy Lutomirski for suggesting the use of
      ctx_id, which got rid of the problem of mm pointer recycling.
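
      The check described above boils down to something like this (a sketch
      along the lines of the patch to switch_mm_irqs_off() in
      arch/x86/mm/tlb.c):

          /* Avoid IBPB when "switching" back to the same mm (tracked by
           * ctx_id, which is never recycled), and only pay for it when the
           * incoming task is non-dumpable. */
          if (tsk && tsk->mm &&
              tsk->mm->context.ctx_id != last_ctx_id &&
              get_dumpable(tsk->mm) != SUID_DUMP_USER)
                  indirect_branch_prediction_barrier();

          /* Remember which mm we just switched to. */
          this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);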
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: ak@linux.intel.com
      Cc: karahmed@amazon.de
      Cc: arjan@linux.intel.com
      Cc: torvalds@linux-foundation.org
      Cc: linux@dominikbrodowski.net
      Cc: peterz@infradead.org
      Cc: bp@alien8.de
      Cc: luto@kernel.org
      Cc: pbonzini@redhat.com
      Cc: gregkh@linux-foundation.org
      Link: https://lkml.kernel.org/r/1517263487-3708-1-git-send-email-dwmw@amazon.co.uk
    • x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec · 304ec1b0
      Dan Williams committed
      Quoting Linus:
      
          I do think that it would be a good idea to very expressly document
          the fact that it's not that the user access itself is unsafe. I do
          agree that things like "get_user()" want to be protected, but not
          because of any direct bugs or problems with get_user() and friends,
          but simply because get_user() is an excellent source of a pointer
          that is obviously controlled from a potentially attacking user
          space. So it's a prime candidate for then finding _subsequent_
          accesses that can then be used to perturb the cache.
      
      __uaccess_begin_nospec() covers __get_user() and copy_from_iter() where the
      limit check is far away from the user pointer de-reference. In those cases
      a barrier_nospec() prevents speculation with a potential pointer to
      privileged memory. uaccess_try_nospec covers get_user_try.
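
      Illustratively (a sketch, not the verbatim diff), the affected paths
      take this shape:

          /* The access_ok() check may sit far from the dereference; the
           * nospec variant adds a speculation barrier after stac() so the
           * check resolves before the user pointer can be dereferenced. */
          if (!access_ok(VERIFY_READ, uptr, sizeof(long)))
                  return -EFAULT;
          __uaccess_begin_nospec();       /* stac(); barrier_nospec(); */
          /* ... user memory access happens here ... */
          __uaccess_end();                /* clac(); */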
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Suggested-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Kees Cook <keescook@chromium.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: gregkh@linuxfoundation.org
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: alan@linux.intel.com
      Link: https://lkml.kernel.org/r/151727416953.33451.10508284228526170604.stgit@dwillia2-desk3.amr.corp.intel.com
    • x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec · b3bbfb3f
      Dan Williams committed
      For __get_user() paths, do not allow the kernel to speculate on the value
      of a user controlled pointer. In addition to the 'stac' instruction for
      Supervisor Mode Access Protection (SMAP), a barrier_nospec() causes the
      access_ok() result to resolve in the pipeline before the CPU might take any
      speculative action on the pointer value. Given the cost of 'stac' the
      speculation barrier is placed after 'stac' to hopefully overlap the cost of
      disabling SMAP with the cost of flushing the instruction pipeline.
      
      Since __get_user is a major kernel interface that deals with user
      controlled pointers, the __uaccess_begin_nospec() mechanism will prevent
      speculative execution past an access_ok() permission check. While
      speculative execution past access_ok() is not enough to lead to a kernel
      memory leak, it is a necessary precondition.
      
      To be clear, __uaccess_begin_nospec() is addressing a class of potential
      problems near __get_user() usages.
      
      Note, that while the barrier_nospec() in __uaccess_begin_nospec() is used
      to protect __get_user(), pointer masking similar to array_index_nospec()
      will be used for get_user() since it incorporates a bounds check near the
      usage.
      
      uaccess_try_nospec provides the same mechanism for get_user_try.
      
      No functional changes.
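
      In sketch form (matching the shape of the x86 definition, with SMAP's
      'stac' first so its cost overlaps the pipeline flush):

          #define __uaccess_begin_nospec()        \
          ({                                      \
                  stac();                         \
                  barrier_nospec();               \
          })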
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Suggested-by: Andi Kleen <ak@linux.intel.com>
      Suggested-by: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: gregkh@linuxfoundation.org
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: alan@linux.intel.com
      Link: https://lkml.kernel.org/r/151727415922.33451.5796614273104346583.stgit@dwillia2-desk3.amr.corp.intel.com
    • x86: Introduce barrier_nospec · b3d7ad85
      Dan Williams committed
      Rename the open coded form of this instruction sequence from
      rdtsc_ordered() into a generic barrier primitive, barrier_nospec().
      
      One of the mitigations for Spectre variant1 vulnerabilities is to fence
      speculative execution after successfully validating a bounds check. I.e.
      force the result of a bounds check to resolve in the instruction pipeline
      to ensure speculative execution honors that result before potentially
      operating on out-of-bounds data.
      
      No functional changes.
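
      In sketch form, the primitive reuses the fence selection that
      rdtsc_ordered() open coded (MFENCE or LFENCE, whichever serializes
      RDTSC on the running CPU):

          /* Prevent speculative execution past this point. */
          #define barrier_nospec() alternative_2("", "mfence", X86_FEATURE_MFENCE_RDTSC, \
                                                     "lfence", X86_FEATURE_LFENCE_RDTSC)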
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Suggested-by: Andi Kleen <ak@linux.intel.com>
      Suggested-by: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: gregkh@linuxfoundation.org
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: alan@linux.intel.com
      Link: https://lkml.kernel.org/r/151727415361.33451.9049453007262764675.stgit@dwillia2-desk3.amr.corp.intel.com
    • x86: Implement array_index_mask_nospec · babdde26
      Dan Williams committed
      array_index_nospec() uses a mask to sanitize user controllable array
      indexes, i.e. generate a 0 mask if 'index' >= 'size', and a ~0 mask
      otherwise. The generic array_index_mask_nospec() handles the carry bit
      from the (index - size) result in software.
      
      The x86 array_index_mask_nospec() does the same, but the carry bit is
      handled in the processor's CF flag, without conditional instructions in
      the control flow.
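
      In sketch form, the x86 version uses CMP to set the carry flag and SBB
      to broadcast it into a full-width mask (~0 when index < size, 0
      otherwise):

          static inline unsigned long array_index_mask_nospec(unsigned long index,
                                                              unsigned long size)
          {
                  unsigned long mask;

                  /* cmp sets CF iff index < size; sbb turns CF into 0 or ~0 */
                  asm ("cmp %1,%2; sbb %0,%0;"
                                  : "=r" (mask)
                                  : "g" (size), "r" (index)
                                  : "cc");
                  return mask;
          }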
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: kernel-hardening@lists.openwall.com
      Cc: gregkh@linuxfoundation.org
      Cc: alan@linux.intel.com
      Link: https://lkml.kernel.org/r/151727414808.33451.1873237130672785331.stgit@dwillia2-desk3.amr.corp.intel.com
  11. 30 January 2018 (2 commits)
  12. 28 January 2018 (3 commits)
  13. 26 January 2018 (6 commits)
  14. 24 January 2018 (1 commit)