1. 21 Sep 2018 (1 commit)
  2. 20 Sep 2018 (16 commits)
    • KVM: x86: Control guest reads of MSR_PLATFORM_INFO · 6fbbde9a
      Authored by Drew Schmitt
      Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest reads
      of MSR_PLATFORM_INFO.
      
      Disabling guest reads of this MSR gives userspace explicit control over
      whether this platform-dependent information is exposed to guests. As it
      stands today, guests that read this MSR get unpopulated information if
      userspace hasn't already set it (and prior to this patch series, only the
      CPUID faulting information could have been populated). This existing
      interface can be confusing if guests don't handle the potential for
      incorrect/incomplete information gracefully (e.g. a base frequency
      reported as zero).
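      As a hedged illustration (not from the original message): userspace could
      disable guest reads with KVM_ENABLE_CAP on the VM fd; the fd name below
      is assumed.

          struct kvm_enable_cap cap = {
              .cap  = KVM_CAP_MSR_PLATFORM_INFO,
              .args = { 0 },  /* 0 = guest reads of MSR_PLATFORM_INFO raise #GP */
          };
          ioctl(vm_fd, KVM_ENABLE_CAP, &cap);  /* vm_fd: assumed VM file descriptor */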
      Signed-off-by: Drew Schmitt <dasch@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Turbo bits in MSR_PLATFORM_INFO · d84f1cff
      Authored by Drew Schmitt
      Allow userspace to set turbo bits in MSR_PLATFORM_INFO. Previously, only
      the CPUID faulting bit was settable; now any bit in MSR_PLATFORM_INFO is
      settable. This can be used, for example, to convey frequency information
      about the platform on which the guest is running.
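      A minimal sketch of setting the MSR from userspace (vcpu_fd and plat_info
      are assumed names; MSR_PLATFORM_INFO is index 0xce):

          struct {
              struct kvm_msrs hdr;
              struct kvm_msr_entry entry;
          } msrs = {
              .hdr   = { .nmsrs = 1 },
              .entry = { .index = 0xce /* MSR_PLATFORM_INFO */, .data = plat_info },
          };
          ioctl(vcpu_fd, KVM_SET_MSRS, &msrs.hdr);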
      Signed-off-by: Drew Schmitt <dasch@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • nVMX x86: Check VPID value on vmentry of L2 guests · ba8e23db
      Authored by Krish Sadhukhan
      According to section "Checks on VMX Controls" in Intel SDM vol 3C, the
      following check needs to be enforced on vmentry of L2 guests:
      
          If the 'enable VPID' VM-execution control is 1, the value of the
          VPID VM-execution control field must not be 0000H.
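      A hedged sketch of how such a check can be expressed in the nested
      vmentry prereq path (helper names as used elsewhere in the VMX code):

          if (nested_cpu_has_vpid(vmcs12) && vmcs12->virtual_processor_id == 0)
              return -EINVAL;  /* fails the VMX-controls check */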
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • nVMX x86: check posted-interrupt descriptor address on vmentry of L2 · 6de84e58
      Authored by Krish Sadhukhan
      According to section "Checks on VMX Controls" in Intel SDM vol 3C,
      the following check needs to be enforced on vmentry of L2 guests:
      
         - Bits 5:0 of the posted-interrupt descriptor address are all 0.
         - The posted-interrupt descriptor address does not set any bits
           beyond the processor's physical-address width.
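      A hedged sketch of the corresponding check (cpuid_maxphyaddr() is KVM's
      helper for the guest's physical-address width):

          if (nested_cpu_has_posted_intr(vmcs12) &&
              ((vmcs12->posted_intr_desc_addr & 0x3f) ||  /* bits 5:0 must be 0 */
               (vmcs12->posted_intr_desc_addr >> cpuid_maxphyaddr(vcpu))))
              return -EINVAL;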
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv · e6c67d8c
      Authored by Liran Alon
      If L1 does not intercept L2 HLT or enters L2 in the HLT activity-state,
      it is possible for a vCPU to be blocked while it is in guest-mode.
      
      According to Intel SDM 26.6.5 Interrupt-Window Exiting and
      Virtual-Interrupt Delivery: "These events wake the logical processor
      if it just entered the HLT state because of a VM entry".
      Therefore, if L1 enters L2 in the HLT activity-state and L2 has a pending
      deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU
      should be woken from the HLT state and injected with the interrupt.
      
      In addition, if the vCPU receives a nested posted-interrupt while it is
      blocked (while it is in guest-mode), it should also be woken and injected
      with the posted interrupt.
      
      To handle these cases, this patch enhances kvm_vcpu_has_events() to also
      check if there is a pending interrupt in L2 virtual APICv provided by
      L1. That is, it evaluates if there is a pending virtual interrupt for L2
      by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1
      Evaluation of Pending Interrupts.
      
      Note that this also handles the case of nested posted-interrupt by the
      fact RVI is updated in vmx_complete_nested_posted_interrupt() which is
      called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() ->
      kvm_vcpu_running() -> vmx_check_nested_events() ->
      vmx_complete_nested_posted_interrupt().
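      A hedged sketch of the RVI/VPPR evaluation (vapic_page is the mapped
      virtual-APIC page; names follow the VMX code):

          u8 rvi   = vmcs_read16(GUEST_INTR_STATUS) & 0xff;  /* Requesting Virtual Interrupt */
          u32 vppr = *(u32 *)(vapic_page + APIC_PROCPRI);    /* virtual PPR */
          return (rvi & 0xf0) > (vppr & 0xf0);  /* SDM 29.2.1 priority compare */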
      Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: check nested state and CR4.VMXE against SMM · 5bea5123
      Authored by Paolo Bonzini
      VMX cannot be enabled in SMM; check this both when CR4 is set and when
      nested virtualization state is restored.
      
      This should fix some WARNs reported by syzkaller, mostly around
      alloc_shadow_vmcs.
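      A hedged sketch of the CR4 side of the check:

          if ((cr4 & X86_CR4_VMXE) && is_smm(vcpu))
              return 1;  /* reject CR4.VMXE while in SMM */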
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: make kvm_{load|put}_guest_fpu() static · 822f312d
      Authored by Sebastian Andrzej Siewior
      The functions
      	kvm_load_guest_fpu()
      	kvm_put_guest_fpu()
      
      are only used locally, so make them static. This also requires moving
      both functions, because they are used before their definition.
      These functions were exported (via EXPORT_SYMBOL) before commit
      e5bb4025 ("KVM: Drop kvm_{load,put}_guest_fpu() exports").
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/hyper-v: rename ipi_arg_{ex,non_ex} structures · a1efa9b7
      Authored by Vitaly Kuznetsov
      These structures are going to be used from KVM code, so let's make
      their names reflect their Hyper-V origin.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Acked-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: use preemption timer to force immediate VMExit · d264ee0c
      Authored by Sean Christopherson
      A VMX preemption timer value of '0' is guaranteed to cause a VMExit
      prior to the CPU executing any instructions in the guest.  Use the
      preemption timer (if it's supported) to trigger immediate VMExit
      in place of the current method of sending a self-IPI.  This ensures
      that pending VMExit injection to L1 occurs prior to executing any
      instructions in the guest (regardless of nesting level).
      
      When deferring VMExit injection, KVM generates an immediate VMExit
      from the (possibly nested) guest by sending itself an IPI.  Because
      hardware interrupts are blocked prior to VMEnter and are unblocked
      (in hardware) after VMEnter, this results in taking a VMExit(INTR)
      before any guest instruction is executed.  But, as this approach
      relies on the IPI being received before VMEnter executes, it only
      works as intended when KVM is running as L0.  Because there are no
      architectural guarantees regarding when IPIs are delivered, when running
      nested the INTR may "arrive" long after L2 is running, e.g. if L0 KVM
      doesn't force an immediate switch to L1 to deliver the INTR.
      
      For the most part, this unintended delay is not an issue since the
      events being injected to L1 also do not have architectural guarantees
      regarding their timing.  The notable exception is the VMX preemption
      timer[1], which is architecturally guaranteed to cause a VMExit prior
      to executing any instructions in the guest if the timer value is '0'
      at VMEnter.  Specifically, the delay in injecting the VMExit causes
      the preemption timer KVM unit test to fail when run in a nested guest.
      
      Note: this approach is viable even on CPUs with a broken preemption
      timer, as broken in this context only means the timer counts at the
      wrong rate.  There are no known errata affecting timer value of '0'.
      
      [1] I/O SMIs also have guarantees on when they arrive, but I have
          no idea if/how those are emulated in KVM.
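      A hedged, simplified sketch of the resulting hook and its use just
      before VMEnter:

          static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
          {
              to_vmx(vcpu)->req_immediate_exit = true;
          }

          /* ...in the VMEnter path: a timer value of 0 guarantees a VMExit
           * before any guest instruction executes. */
          if (vmx->req_immediate_exit)
              vmx_arm_hv_timer(vmx, 0);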
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      [Use a hook for SVM instead of leaving the default in x86.c - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: modify preemption timer bit only when arming timer · f459a707
      Authored by Sean Christopherson
      Provide a singular location where the VMX preemption timer bit is
      set/cleared so that future usages of the preemption timer can ensure
      the VMCS bit is up-to-date without having to modify unrelated code
      paths.  For example, the preemption timer can be used to force an
      immediate VMExit.  Cache the status of the timer to avoid redundant
      VMREAD and VMWRITE, e.g. if the timer stays armed across multiple
      VMEnters/VMExits.
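      A hedged sketch of the singular helper with the cached armed state:

          static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
          {
              vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
              if (!vmx->loaded_vmcs->hv_timer_armed)  /* skip redundant VMWRITE */
                  vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
                                PIN_BASED_VMX_PREEMPTION_TIMER);
              vmx->loaded_vmcs->hv_timer_armed = true;
          }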
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: immediately mark preemption timer expired only for zero value · 4c008127
      Authored by Sean Christopherson
      A VMX preemption timer value of '0' at the time of VMEnter is
      architecturally guaranteed to cause a VMExit prior to the CPU
      executing any instructions in the guest.  This architectural
      definition is in place to ensure that a previously expired timer
      is correctly recognized by the CPU as it is possible for the timer
      to reach zero and not trigger a VMexit due to a higher priority
      VMExit being signalled instead, e.g. a pending #DB that morphs into
      a VMExit.
      
      Whether by design or coincidence, commit f4124500 ("KVM: nVMX:
      Fully emulate preemption timer") special-cased timer values of '0'
      and '1' to ensure prompt delivery of the VMExit.  Unlike '0', a
      timer value of '1' has no architectural guarantees regarding when
      it is delivered.
      
      Modify the timer emulation to trigger immediate VMExit if and only
      if the timer value is '0', and document precisely why '0' is special.
      Do this even if calibration of the virtual TSC failed, i.e. VMExit
      will occur immediately regardless of the frequency of the timer.
      Making only '0' a special case gives KVM leeway to be more aggressive
      in ensuring the VMExit is injected prior to executing instructions in
      the nested guest, and also eliminates any ambiguity as to why '1' is
      a special case, e.g. why wasn't the threshold for a "short timeout"
      set to 10, 100, 1000, etc...
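      A hedged sketch of the special case in the timer-emulation path:

          if (preemption_timeout == 0) {
              vmx->nested.preemption_timer_expired = true;
              kvm_make_request(KVM_REQ_EVENT, &vmx->vcpu);  /* exit before any L2 insn */
              return;
          }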
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Switch to bitmap_zalloc() · a101c9d6
      Authored by Andy Shevchenko
      Switch to bitmap_zalloc() to show clearly what we are allocating.
      Besides that, it returns a pointer of bitmap type (unsigned long *)
      instead of an opaque void *.
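      A rough before/after sketch (using the SVM SEV ASID bitmap as the
      example):

          /* before */
          sev_asid_bitmap = kcalloc(BITS_TO_LONGS(max_sev_asid), sizeof(long),
                                    GFP_KERNEL);
          /* after */
          sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);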
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM/MMU: Fix comment in walk_shadow_page_lockless_end() · 9a984586
      Authored by Tianyu Lan
      kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page().
      Update the comment to match.
      Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: don't reset root in kvm_mmu_setup() · 83b20b28
      Authored by Wei Yang
      Here is the code path showing that kvm_mmu_setup() is invoked after
      kvm_mmu_create(). Since kvm_mmu_setup() is only invoked on this path,
      root_hpa and prev_roots are guaranteed to be invalid, and it is not
      necessary to reset them again.
      
          kvm_vm_ioctl_create_vcpu()
              kvm_arch_vcpu_create()
                  vmx_create_vcpu()
                      kvm_vcpu_init()
                          kvm_arch_vcpu_init()
                              kvm_mmu_create()
              kvm_arch_vcpu_setup()
                  kvm_mmu_setup()
                      kvm_init_mmu()
      
      This patch sets reset_roots to false in kvm_mmu_setup().
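      A hedged sketch of the resulting call:

          void kvm_mmu_setup(struct kvm_vcpu *vcpu)
          {
              /* roots are guaranteed invalid here, so skip resetting them */
              kvm_init_mmu(vcpu, false);
          }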
      
      Fixes: 50c28f21
      Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: mmu: Don't read PDPTEs when paging is not enabled · d35b34a9
      Authored by Junaid Shahid
      KVM should not attempt to read guest PDPTEs when CR0.PG = 0 and
      CR4.PAE = 1.
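      A hedged sketch of the guard (PDPTEs only exist under PAE paging, i.e.
      CR0.PG = 1, CR4.PAE = 1 and not long mode):

          if (is_pae(vcpu) && is_paging(vcpu) && !is_long_mode(vcpu))
              load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu));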
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvm/lapic: always disable MMIO interface in x2APIC mode · d1766202
      Authored by Vitaly Kuznetsov
      When VMX is used with flexpriority disabled (because of missing support
      or because it was disabled with a module parameter), the MMIO interface
      to the lAPIC is still available in x2APIC mode while it shouldn't be
      (kvm-unit-tests):
      
      PASS: apic_disable: Local apic enabled in x2APIC mode
      PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
      FAIL: apic_disable: *0xfee00030: 50014
      
      The issue appears because we basically do nothing while switching to
      x2APIC mode when the APIC access page is not used: apic_mmio_{read,write}
      only check whether the lAPIC is disabled before proceeding to the actual
      access.
      
      When APIC access is virtualized we correctly manipulate the VMX controls
      in vmx_set_virtual_apic_mode() and we don't get vmexits from memory
      writes in x2APIC mode, so there's no issue.
      
      Disabling the MMIO interface seems to be easy. The question is: what do
      we do with these reads and writes? If we add an apic_x2apic_mode() check
      to apic_mmio_in_range() and return -EOPNOTSUPP, these reads and writes
      will go to userspace. When the lAPIC is in kernel, Qemu uses this
      interface to inject MSIs only (see kvm_apic_mem_write() in
      hw/i386/kvm/apic.c). This somehow works with a disabled lAPIC, but when
      we're in xAPIC mode we would get a real injected MSI from every write to
      the lAPIC. Not good.
      
      The simplest solution seems to be to just ignore writes to the region
      and return ~0 for all reads when we're in x2APIC mode. This is what this
      patch does. However, this approach is inconsistent with what currently
      happens when flexpriority is enabled: we allocate the APIC access page
      and create a KVM memory region, so in x2APIC mode all reads and writes
      go to this pre-allocated page which is, btw, the same for all vCPUs.
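      A hedged, simplified sketch of the chosen behavior in the MMIO handlers
      (the write flag is illustrative):

          if (apic_x2apic_mode(apic)) {
              if (!write)
                  memset(data, 0xff, len);  /* reads return all-ones */
              return 0;                     /* writes are silently ignored */
          }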
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 15 Sep 2018 (1 commit)
  4. 14 Sep 2018 (1 commit)
    • Revert "x86/mm/legacy: Populate the user page-table with user pgd's" · 61a6bd83
      Authored by Joerg Roedel
      This reverts commit 1f40a46c.
      
      It turned out that this patch is not sufficient to enable PTI on 32-bit
      systems with legacy 2-level page-tables. In this paging mode the
      huge-page PTEs are in the top-level page-table directory, which is also
      where the mirroring to the user-space page-table happens. So every huge
      PTE exists twice, in the kernel and in the user page-table.
      
      That means that accessed/dirty bits need to be fetched from two PTEs in
      this mode to be safe, but this is not trivial to implement because it
      needs changes to generic code just for the sake of enabling PTI with
      32-bit legacy paging. As all systems that need PTI should support PAE
      anyway, remove support for PTI when 32-bit legacy paging is used.
      
      Fixes: 7757d607 ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Link: https://lkml.kernel.org/r/1536922754-31379-1-git-send-email-joro@8bytes.org
  5. 13 Sep 2018 (2 commits)
  6. 12 Sep 2018 (8 commits)
  7. 11 Sep 2018 (4 commits)
    • arm64: kernel: arch_crash_save_vmcoreinfo() should depend on CONFIG_CRASH_CORE · 84c57dbd
      Authored by James Morse
      Since commit 23c85094 ("proc/kcore: add vmcoreinfo note to /proc/kcore")
      the kernel has exported the vmcoreinfo PT_NOTE on /proc/kcore as well
      as /proc/vmcore.
      
      arm64 only exposes its additional arch information via
      arch_crash_save_vmcoreinfo() if built with CONFIG_KEXEC, as kdump was
      previously the only user of vmcoreinfo.
      
      Move this weak function to a separate file that is built at the same
      time as its caller in kernel/crash_core.c. This ensures values like
      'kimage_voffset' are always present in the vmcoreinfo PT_NOTE.
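      A hedged sketch of the moved hook, now in its own file that is built
      whenever CRASH_CORE is enabled (rather than only with KEXEC):

          /* arch/arm64/kernel/crash_core.c */
          void arch_crash_save_vmcoreinfo(void)
          {
              VMCOREINFO_NUMBER(VA_BITS);
              vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
                                    kimage_voffset);
              vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
                                    PHYS_OFFSET);
          }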
      
      CC: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
      Signed-off-by: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: jump_label.h: use asm_volatile_goto macro instead of "asm goto" · 13aceef0
      Authored by Miguel Ojeda
      All other uses of "asm goto" go through asm_volatile_goto, which avoids
      a miscompile when using GCC < 4.8.2. Replace our open-coded "asm goto"
      statements with the asm_volatile_goto macro to avoid issues with older
      toolchains.
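      For reference, the macro is roughly the following; the trailing empty
      asm statement is the workaround:

          /* include/linux/compiler-gcc.h: avoid GCC < 4.8.2 "asm goto" miscompiles */
          #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)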
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • hexagon: modify ffs() and fls() to return int · 5c41aaad
      Authored by Randy Dunlap
      Building drivers/mtd/nand/raw/nandsim.c on arch/hexagon/ produces a
      printk format build warning.  This is due to hexagon's ffs() being
      coded as returning long instead of int.
      
      Fix the printk format warning by changing all of hexagon's ffs() and
      fls() functions to return int instead of long.  The variables that
      they return are already int instead of long.  This return type
      matches the return type in <asm-generic/bitops/>.
      
      ../drivers/mtd/nand/raw/nandsim.c: In function 'init_nandsim':
      ../drivers/mtd/nand/raw/nandsim.c:760:2: warning: format '%u' expects argument of type 'unsigned int', but argument 2 has type 'long int' [-Wformat]
      
      There are no ffs() or fls() allmodconfig build errors after making this
      change.
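      A sketch of the signature change (function bodies unchanged):

          /* before */
          static inline long ffs(int x)
          /* after: matches <asm-generic/bitops/> */
          static inline int ffs(int x)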
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: linux-hexagon@vger.kernel.org
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Patch-mainline: linux-kernel @ 07/22/2018, 16:03
      Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
    • arch/hexagon: fix kernel/dma.c build warning · 200f351e
      Authored by Randy Dunlap
      Fix build warning in arch/hexagon/kernel/dma.c by casting a void *
      to unsigned long to match the function parameter type.
      
      ../arch/hexagon/kernel/dma.c: In function 'arch_dma_alloc':
      ../arch/hexagon/kernel/dma.c:51:5: warning: passing argument 2 of 'gen_pool_add' makes integer from pointer without a cast [enabled by default]
      ../include/linux/genalloc.h:112:19: note: expected 'long unsigned int' but argument is of type 'void *'
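      A hedged, illustrative sketch of the fix (variable names assumed; the
      point is the cast):

          /* gen_pool_add() takes an unsigned long address, not a void * */
          gen_pool_add(coherent_pool, (unsigned long)start_addr, size, -1);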
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: linux-sh@vger.kernel.org
      Patch-mainline: linux-kernel @ 07/20/2018, 20:17
      [rkuo@codeaurora.org: fixed architecture name]
      Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
  8. 10 Sep 2018 (1 commit)
    • perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs · 16160c19
      Authored by Jacek Tomaka
      Problem: perf did not show the branch predicted/mispredicted bit in brstack.
      
      Output of perf -F brstack for a collected profile
      
      Before:
      
       0x4fdbcd/0x4fdc03/-/-/-/0
       0x45f4c1/0x4fdba0/-/-/-/0
       0x45f544/0x45f4bb/-/-/-/0
       0x45f555/0x45f53c/-/-/-/0
       0x7f66901cc24b/0x45f555/-/-/-/0
       0x7f66901cc22e/0x7f66901cc23d/-/-/-/0
       0x7f66901cc1ff/0x7f66901cc20f/-/-/-/0
       0x7f66901cc1e8/0x7f66901cc1fc/-/-/-/0
      
      After:
      
       0x4fdbcd/0x4fdc03/P/-/-/0
       0x45f4c1/0x4fdba0/P/-/-/0
       0x45f544/0x45f4bb/P/-/-/0
       0x45f555/0x45f53c/P/-/-/0
       0x7f66901cc24b/0x45f555/P/-/-/0
       0x7f66901cc22e/0x7f66901cc23d/P/-/-/0
       0x7f66901cc1ff/0x7f66901cc20f/P/-/-/0
       0x7f66901cc1e8/0x7f66901cc1fc/P/-/-/0
      
      Cause:
      
      As mentioned in the Software Developer's Manual vol 3, 17.4.8.1,
      IA32_PERF_CAPABILITIES[5:0] indicates the format of the address that is
      stored in the LBR stack. Knights Landing reports 1 (LBR_FORMAT_LIP) as
      its format. Despite that, the registers containing the FROM address of
      the branch do have a MISPREDICT bit, but because of the format indicated
      in IA32_PERF_CAPABILITIES[5:0], the LBR code did not read it.
      
      Solution:
      
      Teach LBR about above Knights Landing quirk and make it read MISPREDICT bit.
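      A hedged, illustrative sketch (the quirk flag name is hypothetical;
      LBR_FROM_FLAG_MISPRED is the existing bit-63 mask in the LBR code):

          if (x86_pmu.lbr_mispred_quirk)  /* hypothetical KNL quirk flag */
              mis = !!(from & LBR_FROM_FLAG_MISPRED);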
      Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180802013830.10600-1-jacekt@dugeo.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 08 Sep 2018 (4 commits)
    • x86/mm: Use WRITE_ONCE() when setting PTEs · 9bc4f28a
      Authored by Nadav Amit
      When page-table entries are set, the compiler might optimize their
      assignment by using multiple instructions to set the PTE. This might
      turn into a security hazard if the user somehow manages to use the
      interim PTE. L1TF does not make our lives easier, making even an interim
      non-present PTE a security hazard.
      
      Using WRITE_ONCE() to set PTEs and friends should prevent this potential
      security hazard.
      
      I skimmed the differences in the binary with and without this patch. The
      differences are (obviously) greater when CONFIG_PARAVIRT=n, as more code
      optimizations are possible. For better or worse, the impact on the
      binary with this patch is pretty small. Skimming the code did not cause
      anything to jump out as a security hazard, but it seems that at least
      move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
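      A hedged sketch of the pattern, applied to one of the PTE setters:

          static inline void native_set_pte(pte_t *ptep, pte_t pte)
          {
              WRITE_ONCE(*ptep, pte);  /* was: *ptep = pte; forces a single store */
          }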
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com
    • x86/apic/vector: Make error return value negative · 47b7360c
      Authored by Thomas Gleixner
      activate_managed() returns EINVAL instead of -EINVAL in case of
      error. While this is unlikely to happen, the positive return value would
      cause further malfunction at the call site.
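      The fix is the small change to the kernel's negative-errno convention:

          /* before */
          return EINVAL;
          /* after */
          return -EINVAL;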
      
      Fixes: 2db1f959 ("x86/vector: Handle managed interrupts proper")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
    • KVM: LAPIC: Fix pv ipis out-of-bounds access · bdf7ffc8
      Authored by Wanpeng Li
      Dan Carpenter reported that the untrusted data returned from
      kvm_register_read() results in the following static checker warning:
        arch/x86/kvm/lapic.c:576 kvm_pv_send_ipi()
        error: buffer underflow 'map->phys_map' 's32min-s32max'
      
      A KVM guest can easily trigger this by executing the following assembly
      sequence in Ring0:
      
      mov $10, %rax
      mov $0xFFFFFFFF, %rbx
      mov $0xFFFFFFFF, %rdx
      mov $0, %rsi
      vmcall
      
      As this will cause KVM to execute the following code-path:
      vmx_handle_exit() -> handle_vmcall() -> kvm_emulate_hypercall() -> kvm_pv_send_ipi()
      which will reach out-of-bounds access.
      
      This patch fixes it by adding a check against map->max_apic_id in
      kvm_pv_send_ipi(), ignoring destinations that are not present and
      delivering to the rest. We also check whether map->phys_map[min + i] is
      NULL, since max_apic_id is set to the highest APIC ID and some phys_map
      entries may be NULL when APIC IDs are sparse; in particular, kvm
      unconditionally sets max_apic_id to 255 to reserve enough space for any
      xAPIC ID.
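      A hedged, simplified sketch of the bounds handling in kvm_pv_send_ipi():

          if (min > map->max_apic_id)
              goto out;

          for_each_set_bit(i, &ipi_bitmap_low, BITS_PER_LONG) {
              if (min + i > map->max_apic_id)
                  break;                       /* ignore absent destinations */
              if (map->phys_map[min + i])      /* sparse APIC IDs leave NULL slots */
                  count += kvm_apic_set_irq(map->phys_map[min + i]->vcpu,
                                            &irq, NULL);
          }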
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      [Add second "if (min > map->max_apic_id)" to complete the fix. -Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2 · b5861e5c
      Authored by Liran Alon
      Consider the case where L1 had a pending IRQ/NMI event when it executed
      VMLAUNCH/VMRESUME, and the event wasn't delivered because it was
      disallowed (e.g. interrupts disabled). When L1 executes
      VMLAUNCH/VMRESUME, L0 needs to evaluate whether this pending event
      should cause an exit from L2 to L1 or be delivered directly to L2
      (e.g. in case L1 doesn't intercept EXTERNAL_INTERRUPT).
      
      Usually this would be handled by L0 requesting an IRQ/NMI window
      by setting the VMCS accordingly. However, this setting was done on
      VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
      VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
      requesting a KVM_REQ_EVENT.
      
      Note that the above scenario arises when L1 KVM is about to enter L2 but
      requests an "immediate-exit", since in this case L1 will disable
      interrupts and then send a self-IPI before entering L2.
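      A hedged, simplified sketch of the fix in the nested VMLAUNCH/VMRESUME
      path:

          /* Re-evaluate pending IRQ/NMI windows now that VMCS02 is active,
           * since the window-request bits were programmed into VMCS01. */
          if (unlikely(evaluate_pending_interrupts))
              kvm_make_request(KVM_REQ_EVENT, vcpu);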
      Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
      Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  10. 07 Sep 2018 (2 commits)