1. 28 9月, 2018 1 次提交
    • K
      x86/boot: Fix kexec booting failure in the SEV bit detection code · bdec8d7f
      Kairui Song 提交于
      Commit
      
        1958b5fc ("x86/boot: Add early boot support when running with SEV active")
      
      can occasionally cause system resets when kexec-ing a second kernel even
      if SEV is not active.
      
      That's because get_sev_encryption_bit() uses 32-bit rIP-relative
      addressing to read the value of enc_bit - a variable which caches a
      previously detected encryption bit position - but kexec may allocate
      the early boot code to a higher location, beyond the 32-bit addressing
      limit.
      
      In this case, garbage will be read and get_sev_encryption_bit() will
      return the wrong value, leading to accessing memory with the wrong
      encryption setting.
      
      Therefore, remove enc_bit, and thus get rid of the need to do 32-bit
      rIP-relative addressing in the first place.
      
       [ bp: massage commit message heavily. ]
      
      Fixes: 1958b5fc ("x86/boot: Add early boot support when running with SEV active")
      Suggested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NKairui Song <kasong@redhat.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NTom Lendacky <thomas.lendacky@amd.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: tglx@linutronix.de
      Cc: mingo@redhat.com
      Cc: hpa@zytor.com
      Cc: brijesh.singh@amd.com
      Cc: kexec@lists.infradead.org
      Cc: dyoung@redhat.com
      Cc: bhe@redhat.com
      Cc: ghook@redhat.com
      Link: https://lkml.kernel.org/r/20180927123845.32052-1-kasong@redhat.com
      bdec8d7f
  2. 21 9月, 2018 1 次提交
  3. 19 9月, 2018 10 次提交
  4. 16 9月, 2018 2 次提交
    • B
      x86/kvm: Use __bss_decrypted attribute in shared variables · 6a1cac56
      Brijesh Singh 提交于
      The recent removal of the memblock dependency from kvmclock caused a SEV
      guest regression because the wall_clock and hv_clock_boot variables are
      no longer mapped decrypted when SEV is active.
      
      Use the __bss_decrypted attribute to put the static wall_clock and
      hv_clock_boot in the .bss..decrypted section so that they are mapped
      decrypted during boot.
      
      In the preparatory stage of CPU hotplug, the per-cpu pvclock data pointer
      assigns either an element of the static array or dynamically allocated
      memory for the pvclock data pointer. The static array are now mapped
      decrypted but the dynamically allocated memory is not mapped decrypted.
      However, when SEV is active this memory range must be mapped decrypted.
      
      Add a function which is called after the page allocator is up, and
      allocate memory for the pvclock data pointers for the all possible cpus.
      Map this memory range as decrypted when SEV is active.
      
      Fixes: 368a540e ("x86/kvmclock: Remove memblock dependency")
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Link: https://lkml.kernel.org/r/1536932759-12905-3-git-send-email-brijesh.singh@amd.com
      6a1cac56
    • B
      x86/mm: Add .bss..decrypted section to hold shared variables · b3f0907c
      Brijesh Singh 提交于
      kvmclock defines few static variables which are shared with the
      hypervisor during the kvmclock initialization.
      
      When SEV is active, memory is encrypted with a guest-specific key, and
      if the guest OS wants to share the memory region with the hypervisor
      then it must clear the C-bit before sharing it.
      
      Currently, we use kernel_physical_mapping_init() to split large pages
      before clearing the C-bit on shared pages. But it fails when called from
      the kvmclock initialization (mainly because the memblock allocator is
      not ready that early during boot).
      
      Add a __bss_decrypted section attribute which can be used when defining
      such shared variable. The so-defined variables will be placed in the
      .bss..decrypted section. This section will be mapped with C=0 early
      during boot.
      
      The .bss..decrypted section has a big chunk of memory that may be unused
      when memory encryption is not active, free it when memory encryption is
      not active.
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Radim Krčmář<rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Link: https://lkml.kernel.org/r/1536932759-12905-2-git-send-email-brijesh.singh@amd.com
      b3f0907c
  5. 15 9月, 2018 1 次提交
  6. 14 9月, 2018 1 次提交
    • J
      Revert "x86/mm/legacy: Populate the user page-table with user pgd's" · 61a6bd83
      Joerg Roedel 提交于
      This reverts commit 1f40a46c.
      
      It turned out that this patch is not sufficient to enable PTI on 32 bit
      systems with legacy 2-level page-tables. In this paging mode the huge-page
      PTEs are in the top-level page-table directory, where also the mirroring to
      the user-space page-table happens. So every huge PTE exits twice, in the
      kernel and in the user page-table.
      
      That means that accessed/dirty bits need to be fetched from two PTEs in
      this mode to be safe, but this is not trivial to implement because it needs
      changes to generic code just for the sake of enabling PTI with 32-bit
      legacy paging. As all systems that need PTI should support PAE anyway,
      remove support for PTI when 32-bit legacy paging is used.
      
      Fixes: 7757d607 ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
      Reported-by: NMeelis Roos <mroos@linux.ee>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Link: https://lkml.kernel.org/r/1536922754-31379-1-git-send-email-joro@8bytes.org
      61a6bd83
  7. 13 9月, 2018 2 次提交
  8. 12 9月, 2018 4 次提交
  9. 11 9月, 2018 4 次提交
    • J
      arm64: kernel: arch_crash_save_vmcoreinfo() should depend on CONFIG_CRASH_CORE · 84c57dbd
      James Morse 提交于
      Since commit 23c85094 ("proc/kcore: add vmcoreinfo note to /proc/kcore")
      the kernel has exported the vmcoreinfo PT_NOTE on /proc/kcore as well
      as /proc/vmcore.
      
      arm64 only exposes it's additional arch information via
      arch_crash_save_vmcoreinfo() if built with CONFIG_KEXEC, as kdump was
      previously the only user of vmcoreinfo.
      
      Move this weak function to a separate file that is built at the same
      time as its caller in kernel/crash_core.c. This ensures values like
      'kimage_voffset' are always present in the vmcoreinfo PT_NOTE.
      
      CC: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Reviewed-by: NBhupesh Sharma <bhsharma@redhat.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      84c57dbd
    • M
      arm64: jump_label.h: use asm_volatile_goto macro instead of "asm goto" · 13aceef0
      Miguel Ojeda 提交于
      All other uses of "asm goto" go through asm_volatile_goto, which avoids
      a miscompile when using GCC < 4.8.2. Replace our open-coded "asm goto"
      statements with the asm_volatile_goto macro to avoid issues with older
      toolchains.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      13aceef0
    • R
      hexagon: modify ffs() and fls() to return int · 5c41aaad
      Randy Dunlap 提交于
      Building drivers/mtd/nand/raw/nandsim.c on arch/hexagon/ produces a
      printk format build warning.  This is due to hexagon's ffs() being
      coded as returning long instead of int.
      
      Fix the printk format warning by changing all of hexagon's ffs() and
      fls() functions to return int instead of long.  The variables that
      they return are already int instead of long.  This return type
      matches the return type in <asm-generic/bitops/>.
      
      ../drivers/mtd/nand/raw/nandsim.c: In function 'init_nandsim':
      ../drivers/mtd/nand/raw/nandsim.c:760:2: warning: format '%u' expects argument of type 'unsigned int', but argument 2 has type 'long int' [-Wformat]
      
      There are no ffs() or fls() allmodconfig build errors after making this
      change.
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: linux-hexagon@vger.kernel.org
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Patch-mainline: linux-kernel @ 07/22/2018, 16:03
      Signed-off-by: NRichard Kuo <rkuo@codeaurora.org>
      5c41aaad
    • R
      arch/hexagon: fix kernel/dma.c build warning · 200f351e
      Randy Dunlap 提交于
      Fix build warning in arch/hexagon/kernel/dma.c by casting a void *
      to unsigned long to match the function parameter type.
      
      ../arch/hexagon/kernel/dma.c: In function 'arch_dma_alloc':
      ../arch/hexagon/kernel/dma.c:51:5: warning: passing argument 2 of 'gen_pool_add' makes integer from pointer without a cast [enabled by default]
      ../include/linux/genalloc.h:112:19: note: expected 'long unsigned int' but argument is of type 'void *'
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: linux-sh@vger.kernel.org
      Patch-mainline: linux-kernel @ 07/20/2018, 20:17
      [rkuo@codeaurora.org: fixed architecture name]
      Signed-off-by: NRichard Kuo <rkuo@codeaurora.org>
      200f351e
  10. 10 9月, 2018 1 次提交
    • J
      perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs · 16160c19
      Jacek Tomaka 提交于
      Problem: perf did not show branch predicted/mispredicted bit in brstack.
      
      Output of perf -F brstack for profile collected
      
      Before:
      
       0x4fdbcd/0x4fdc03/-/-/-/0
       0x45f4c1/0x4fdba0/-/-/-/0
       0x45f544/0x45f4bb/-/-/-/0
       0x45f555/0x45f53c/-/-/-/0
       0x7f66901cc24b/0x45f555/-/-/-/0
       0x7f66901cc22e/0x7f66901cc23d/-/-/-/0
       0x7f66901cc1ff/0x7f66901cc20f/-/-/-/0
       0x7f66901cc1e8/0x7f66901cc1fc/-/-/-/0
      
      After:
      
       0x4fdbcd/0x4fdc03/P/-/-/0
       0x45f4c1/0x4fdba0/P/-/-/0
       0x45f544/0x45f4bb/P/-/-/0
       0x45f555/0x45f53c/P/-/-/0
       0x7f66901cc24b/0x45f555/P/-/-/0
       0x7f66901cc22e/0x7f66901cc23d/P/-/-/0
       0x7f66901cc1ff/0x7f66901cc20f/P/-/-/0
       0x7f66901cc1e8/0x7f66901cc1fc/P/-/-/0
      
      Cause:
      
      As mentioned in Software Development Manual vol 3, 17.4.8.1,
      IA32_PERF_CAPABILITIES[5:0] indicates the format of the address that is
      stored in the LBR stack. Knights Landing reports 1 (LBR_FORMAT_LIP) as
      its format. Despite that, registers containing FROM address of the branch,
      do have MISPREDICT bit but because of the format indicated in
      IA32_PERF_CAPABILITIES[5:0], LBR did not read MISPREDICT bit.
      
      Solution:
      
      Teach LBR about above Knights Landing quirk and make it read MISPREDICT bit.
      Signed-off-by: NJacek Tomaka <jacek.tomaka@poczta.fm>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180802013830.10600-1-jacekt@dugeo.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      16160c19
  11. 08 9月, 2018 4 次提交
    • N
      x86/mm: Use WRITE_ONCE() when setting PTEs · 9bc4f28a
      Nadav Amit 提交于
      When page-table entries are set, the compiler might optimize their
      assignment by using multiple instructions to set the PTE. This might
      turn into a security hazard if the user somehow manages to use the
      interim PTE. L1TF does not make our lives easier, making even an interim
      non-present PTE a security hazard.
      
      Using WRITE_ONCE() to set PTEs and friends should prevent this potential
      security hazard.
      
      I skimmed the differences in the binary with and without this patch. The
      differences are (obviously) greater when CONFIG_PARAVIRT=n as more
      code optimizations are possible. For better and worse, the impact on the
      binary with this patch is pretty small. Skimming the code did not cause
      anything to jump out as a security hazard, but it seems that at least
      move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
      Signed-off-by: NNadav Amit <namit@vmware.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com
      9bc4f28a
    • T
      x86/apic/vector: Make error return value negative · 47b7360c
      Thomas Gleixner 提交于
      activate_managed() returns EINVAL instead of -EINVAL in case of
      error. While this is unlikely to happen, the positive return value would
      cause further malfunction at the call site.
      
      Fixes: 2db1f959 ("x86/vector: Handle managed interrupts proper")
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      47b7360c
    • W
      KVM: LAPIC: Fix pv ipis out-of-bounds access · bdf7ffc8
      Wanpeng Li 提交于
      Dan Carpenter reported that the untrusted data returns from kvm_register_read()
      results in the following static checker warning:
        arch/x86/kvm/lapic.c:576 kvm_pv_send_ipi()
        error: buffer underflow 'map->phys_map' 's32min-s32max'
      
      KVM guest can easily trigger this by executing the following assembly sequence
      in Ring0:
      
      mov $10, %rax
      mov $0xFFFFFFFF, %rbx
      mov $0xFFFFFFFF, %rdx
      mov $0, %rsi
      vmcall
      
      As this will cause KVM to execute the following code-path:
      vmx_handle_exit() -> handle_vmcall() -> kvm_emulate_hypercall() -> kvm_pv_send_ipi()
      which will reach out-of-bounds access.
      
      This patch fixes it by adding a check to kvm_pv_send_ipi() against map->max_apic_id,
      ignoring destinations that are not present and delivering the rest. We also check
      whether or not map->phys_map[min + i] is NULL since the max_apic_id is set to the
      max apic id, some phys_map maybe NULL when apic id is sparse, especially kvm
      unconditionally set max_apic_id to 255 to reserve enough space for any xAPIC ID.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      [Add second "if (min > map->max_apic_id)" to complete the fix. -Radim]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      bdf7ffc8
    • L
      KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2 · b5861e5c
      Liran Alon 提交于
      Consider the case L1 had a IRQ/NMI event until it executed
      VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed
      (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME,
      L0 needs to evaluate if this pending event should cause an exit from
      L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept
      EXTERNAL_INTERRUPT).
      
      Usually this would be handled by L0 requesting a IRQ/NMI window
      by setting VMCS accordingly. However, this setting was done on
      VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
      VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
      requesting a KVM_REQ_EVENT.
      
      Note that above scenario exists when L1 KVM is about to enter L2 but
      requests an "immediate-exit". As in this case, L1 will
      disable-interrupts and then send a self-IPI before entering L2.
      Reviewed-by: NNikita Leshchenko <nikita.leshchenko@oracle.com>
      Co-developed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      b5861e5c
  12. 07 9月, 2018 5 次提交
  13. 06 9月, 2018 2 次提交
  14. 05 9月, 2018 2 次提交