1. 30 Nov, 2020 1 commit
  2. 26 Oct, 2020 1 commit
  3. 24 Oct, 2020 5 commits
    • KVM: ioapic: break infinite recursion on lazy EOI · 77377064
      Committed by Vitaly Kuznetsov
      During shutdown the IOAPIC trigger mode is reset to edge triggered
      while the vfio-pci INTx is still registered with a resampler.
      This allows us to get into an infinite loop:
      
      ioapic_set_irq
        -> ioapic_lazy_update_eoi
        -> kvm_ioapic_update_eoi_one
        -> kvm_notify_acked_irq
        -> kvm_notify_acked_gsi
        -> (via irq_acked fn ptr) irqfd_resampler_ack
        -> kvm_set_irq
        -> (via set fn ptr) kvm_set_ioapic_irq
        -> kvm_ioapic_set_irq
        -> ioapic_set_irq
      
      Commit 8be8f932 ("kvm: ioapic: Restrict lazy EOI update to
      edge-triggered interrupts", 2020-05-04) acknowledges that this recursion
      loop exists and tries to avoid it at the call to ioapic_lazy_update_eoi,
      but at this point the scenario is already set, we have an edge interrupt
      with resampler on the same gsi.
      
      Fortunately, the only user of irq ack notifiers (in addition to resamplefd)
      is i8254 timer interrupt reinjection.  These are edge-triggered, so in
      principle they would need the call to kvm_ioapic_update_eoi_one from
      ioapic_lazy_update_eoi, but they already disable AVIC(*), so they don't
      need the lazy EOI behavior.  Therefore, remove the call to
      kvm_ioapic_update_eoi_one from ioapic_lazy_update_eoi.
      
      This fixes CVE-2020-27152.  Note that this issue cannot happen with
      SR-IOV assigned devices because virtual functions do not have INTx,
      only MSI.
      
      Fixes: f458d039 ("kvm: ioapic: Lazy update IOAPIC EOI")
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Tested-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
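The call cycle in the ioapic changelog above can be illustrated with a toy model. This is only a sketch: the `toy_` names, the depth limit, and the function bodies are invented for illustration and are not the KVM code. It shows that once the lazy EOI path stops invoking the ack notifier, one interrupt injection no longer re-enters `ioapic_set_irq`.

```c
#include <stdbool.h>

/* Toy model of the call cycle; names echo the changelog, bodies are invented. */
struct toy_ioapic {
    bool lazy_eoi_acks;   /* true: pre-fix behaviour, lazy EOI runs ack notifiers */
    int depth;            /* current nesting of toy_ioapic_set_irq() */
    int max_depth;        /* deepest nesting observed */
};

#define TOY_DEPTH_LIMIT 64 /* artificial stand-in for the stack running out */

static void toy_ioapic_set_irq(struct toy_ioapic *io);

/* Stands in for irqfd_resampler_ack -> kvm_set_irq -> ... -> ioapic_set_irq. */
static void toy_resampler_ack(struct toy_ioapic *io)
{
    toy_ioapic_set_irq(io);
}

/* Stands in for ioapic_lazy_update_eoi. */
static void toy_lazy_update_eoi(struct toy_ioapic *io)
{
    if (io->lazy_eoi_acks)
        toy_resampler_ack(io);   /* the edge that closes the loop */
}

static void toy_ioapic_set_irq(struct toy_ioapic *io)
{
    if (io->depth >= TOY_DEPTH_LIMIT)
        return;                  /* cut-off so the demonstration terminates */
    io->depth++;
    if (io->depth > io->max_depth)
        io->max_depth = io->depth;
    toy_lazy_update_eoi(io);
    io->depth--;
}

/* Returns the deepest nesting reached for one interrupt injection. */
int toy_inject(bool lazy_eoi_acks)
{
    struct toy_ioapic io = { .lazy_eoi_acks = lazy_eoi_acks };

    toy_ioapic_set_irq(&io);
    return io.max_depth;
}
```

With `toy_inject(true)` the nesting runs up to the artificial limit; with `toy_inject(false)` it stays at a single level.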
    • KVM: vmx: rename pi_init to avoid conflict with paride · a3ff25fc
      Committed by Paolo Bonzini
      allyesconfig results in:
      
      ld: drivers/block/paride/paride.o: in function `pi_init':
      (.text+0x1340): multiple definition of `pi_init'; arch/x86/kvm/vmx/posted_intr.o:posted_intr.c:(.init.text+0x0): first defined here
      make: *** [Makefile:1164: vmlinux] Error 1
      
      because commit:
      
      commit 8888cdd0
      Author: Xiaoyao Li <xiaoyao.li@intel.com>
      Date:   Wed Sep 23 11:31:11 2020 -0700
      
          KVM: VMX: Extract posted interrupt support to separate files
      
      added another pi_init(), though one already existed in the paride code.
      
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Avoid modulo operator on 64-bit value to fix i386 build · 764388ce
      Committed by Sean Christopherson
      Replace a modulo operator with the more common pattern for computing the
      gfn "offset" of a huge page to fix an i386 build error.
      
        arch/x86/kvm/mmu/tdp_mmu.c:212: undefined reference to `__umoddi3'
      
      In fact, almost all of tdp_mmu.c can be elided on 32-bit builds, but
      that is a much larger patch.
      
      Fixes: 2f2fad08 ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs")
      Reported-by: Daniel Díaz <daniel.diaz@linaro.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201024031150.9318-1-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
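The modulo replacement described in the changelog above relies on huge page sizes being powers of two, so the remainder can be taken with a mask instead of a 64-bit division helper. The sketch below is an illustration only: `pages_per_hpage()` and both offset functions are hypothetical names, and the assumption of 9 address bits per paging level is mine, not taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of 4 KiB pages covered by a mapping at
 * `level` (1 = 4 KiB, 2 = 2 MiB, 3 = 1 GiB), assuming 9 bits per level.
 */
static inline uint64_t pages_per_hpage(int level)
{
    return UINT64_C(1) << ((level - 1) * 9);
}

/* Modulo form: on a 32-bit target this may need a libgcc division helper. */
uint64_t gfn_offset_modulo(uint64_t gfn, int level)
{
    return gfn % pages_per_hpage(level);
}

/* Mask form: same result for power-of-two sizes, no division involved. */
uint64_t gfn_offset_mask(uint64_t gfn, int level)
{
    return gfn & (pages_per_hpage(level) - 1);
}
```

Both functions agree for every level because the divisor is always a power of two; only the mask form avoids the call that the i386 link step could not resolve.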
    • x86/uaccess: fix code generation in put_user() · 9c5743df
      Committed by Rasmus Villemoes
      Quoting https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html:
      
        You can define a local register variable and associate it with a
        specified register...
      
        The only supported use for this feature is to specify registers for
        input and output operands when calling Extended asm (see Extended
        Asm). This may be necessary if the constraints for a particular
        machine don't provide sufficient control to select the desired
        register.
      
      On 32-bit x86, this is used to ensure that gcc will put an 8-byte value
      into the %edx:%eax pair, while all other cases will just use the single
      register %eax (%rax on x86-64).  While the _ASM_AX actually just expands
      to "%eax", note this comment next to get_user() which does something
      very similar:
      
       * The use of _ASM_DX as the register specifier is a bit of a
       * simplification, as gcc only cares about it as the starting point
       * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits
       * (%ecx being the next register in gcc's x86 register sequence), and
       * %rdx on 64 bits.
      
      However, getting this to work requires that there is no code between the
      assignment to the local register variable and its use as an input to the
      asm() which can possibly clobber any of the registers involved -
      including evaluation of the expressions making up other inputs.
      
      In the current code, the ptr expression used directly as an input may
      cause such code to be emitted.  For example, Sean Christopherson
      observed that with KASAN enabled and ptr being current->set_child_tid
      (from schedule_tail()), the load of current->set_child_tid causes a call
      to __asan_load8() to be emitted immediately prior to the __put_user_4
      call, and Naresh Kamboju reports that various mmstress tests fail on
      KASAN-enabled builds.
      
      It's also possible to synthesize a broken case without KASAN if one uses
      "foo()" as the ptr argument, with foo being some "extern u64 __user
      *foo(void);" (though I don't know if that appears in real code).
      
      Fix it by making sure ptr gets evaluated before the assignment to
      __val_pu, and add a comment that __val_pu must be the last thing
      computed before the asm() is entered.
      
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Fixes: d55564cf ("x86: Make __put_user() generate an out-of-line call")
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
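The ordering problem in the put_user() changelog above can be modelled without inline assembly. In this sketch a global variable plays the role of the register, and a pointer expression with a side effect plays the role of the instrumented load; every name here (`fake_ax`, `noisy_ptr`, and so on) is hypothetical and the code is not the kernel macro.

```c
#include <stdint.h>

static uint32_t fake_ax;    /* stands in for the register holding the value */
static uint32_t user_slot;  /* stands in for the user-space destination */

/*
 * A pointer expression whose evaluation calls out and clobbers the
 * "register", as an instrumented load might.
 */
static uint32_t *noisy_ptr(void)
{
    fake_ax = 0xdeadbeef;
    return &user_slot;
}

/* Stands in for the asm(): stores whatever the "register" holds. */
static void fake_put_user_asm(uint32_t *dst)
{
    *dst = fake_ax;
}

/* Broken order: value assigned first, pointer evaluated afterwards. */
uint32_t put_user_broken(uint32_t val)
{
    fake_ax = val;
    fake_put_user_asm(noisy_ptr());   /* evaluation clobbers fake_ax */
    return user_slot;
}

/* Fixed order: pointer evaluated first, value assigned last. */
uint32_t put_user_fixed(uint32_t val)
{
    uint32_t *p = noisy_ptr();

    fake_ax = val;                    /* last thing before the "asm" */
    fake_put_user_asm(p);
    return user_slot;
}
```

The broken variant stores the clobbered value, while the fixed variant stores the value that was passed in.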
    • parisc: Add wrapper syscalls to fix O_NONBLOCK flag usage · 44a4c9e4
      Committed by Helge Deller
      The commit 75ae0420 ("parisc: Define O_NONBLOCK to become
      000200000") changed the O_NONBLOCK constant to have only one bit set
      (like all other architectures). This change broke some existing
      userspace code (e.g.  udevadm, systemd-udevd, elogind) which called
      specific syscalls which do strict value checking on their flag
      parameter.
      
      This patch adds wrapper functions for the relevant syscalls. The
      wrappers mask out any old invalid O_NONBLOCK flags, report in the
      syslog if the old O_NONBLOCK value was used, and then call the target
      syscall with the new O_NONBLOCK value.
      
      Fixes: 75ae0420 ("parisc: Define O_NONBLOCK to become 000200000")
      Signed-off-by: Helge Deller <deller@gmx.de>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Tested-by: Jeroen Roovers <jer@xs4all.nl>
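The flag fix-up performed by the parisc wrappers described above might look roughly like the following. This is a sketch under assumptions: the old constant 000200004 is assumed here (the changelog only states the new value), the `toy_` names are hypothetical, and the syslog report is reduced to a predicate the caller could act on.

```c
#include <stdbool.h>

#define TOY_O_NONBLOCK       000200000  /* new single-bit value (from the changelog) */
#define TOY_O_NONBLOCK_OLD   000200004  /* ASSUMED old two-bit value */
#define TOY_O_NONBLOCK_STALE (TOY_O_NONBLOCK_OLD & ~TOY_O_NONBLOCK)

/* True when the caller passed the stale extra bit of the old constant. */
bool toy_uses_old_o_nonblock(int flags)
{
    return (flags & TOY_O_NONBLOCK_STALE) != 0;
}

/* Drop the stale bit and make sure the new O_NONBLOCK bit is set instead. */
int toy_fix_o_nonblock(int flags)
{
    if (toy_uses_old_o_nonblock(flags)) {
        flags &= ~TOY_O_NONBLOCK_STALE;
        flags |= TOY_O_NONBLOCK;
    }
    return flags;
}
```

A wrapper would call `toy_fix_o_nonblock()` on the flags argument before handing it to the real syscall, so strict flag checking in the callee only ever sees the new value.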
  4. 23 Oct, 2020 14 commits
  5. 22 Oct, 2020 19 commits
    • powerpc/pseries: Avoid using addr_to_pfn in real mode · 4ff753fe
      Committed by Ganesh Goudar
      When a UE or memory error exception is encountered, the MCE handler
      tries to find the pfn using addr_to_pfn(), which takes an effective
      address as an argument. The pfn is later used to poison the page where
      the memory error occurred. Recent rework in this area made addr_to_pfn()
      run in real mode, which can be fatal as it may try to access
      memory outside the RMO region.
      
      Add two helper functions to separate the work done in real mode from
      the work done in virtual mode, without changing any functionality. This
      also fixes the following error, as the use of addr_to_pfn() is now moved
      to virtual mode.
      
      Without this change, the following kernel crash is seen on hitting a UE.
      
      [  485.128036] Oops: Kernel access of bad area, sig: 11 [#1]
      [  485.128040] LE SMP NR_CPUS=2048 NUMA pSeries
      [  485.128047] Modules linked in:
      [  485.128067] CPU: 15 PID: 6536 Comm: insmod Kdump: loaded Tainted: G OE 5.7.0 #22
      [  485.128074] NIP:  c00000000009b24c LR: c0000000000398d8 CTR: c000000000cd57c0
      [  485.128078] REGS: c000000003f1f970 TRAP: 0300   Tainted: G OE (5.7.0)
      [  485.128082] MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 28008284  XER: 00000001
      [  485.128088] CFAR: c00000000009b190 DAR: c0000001fab00000 DSISR: 40000000 IRQMASK: 1
      [  485.128088] GPR00: 0000000000000001 c000000003f1fbf0 c000000001634300 0000b0fa01000000
      [  485.128088] GPR04: d000000002220000 0000000000000000 00000000fab00000 0000000000000022
      [  485.128088] GPR08: c0000001fab00000 0000000000000000 c0000001fab00000 c000000003f1fc14
      [  485.128088] GPR12: 0000000000000008 c000000003ff5880 d000000002100008 0000000000000000
      [  485.128088] GPR16: 000000000000ff20 000000000000fff1 000000000000fff2 d0000000021a1100
      [  485.128088] GPR20: d000000002200000 c00000015c893c50 c000000000d49b28 c00000015c893c50
      [  485.128088] GPR24: d0000000021a0d08 c0000000014e5da8 d0000000021a0818 000000000000000a
      [  485.128088] GPR28: 0000000000000008 000000000000000a c0000000017e2970 000000000000000a
      [  485.128125] NIP [c00000000009b24c] __find_linux_pte+0x11c/0x310
      [  485.128130] LR [c0000000000398d8] addr_to_pfn+0x138/0x170
      [  485.128133] Call Trace:
      [  485.128135] Instruction dump:
      [  485.128138] 3929ffff 7d4a3378 7c883c36 7d2907b4 794a1564 7d294038 794af082 3900ffff
      [  485.128144] 79291f24 790af00e 78e70020 7d095214 <7c69502a> 2fa30000 419e011c 70690040
      [  485.128152] ---[ end trace d34b27e29ae0e340 ]---
      
      Fixes: 9ca766f9 ("powerpc/64s/pseries: machine check convert to use common event code")
      Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200724063946.21378-1-ganeshgr@linux.ibm.com
    • powerpc/uaccess: Don't use "m<>" constraint with GCC 4.9 · 592bbe9c
      Committed by Christophe Leroy
      GCC 4.9 sometimes fails to build with "m<>" constraint in
      inline assembly.
      
        CC      lib/iov_iter.o
      In file included from ./arch/powerpc/include/asm/cmpxchg.h:6:0,
                       from ./arch/powerpc/include/asm/atomic.h:11,
                       from ./include/linux/atomic.h:7,
                       from ./include/linux/crypto.h:15,
                       from ./include/crypto/hash.h:11,
                       from lib/iov_iter.c:2:
      lib/iov_iter.c: In function 'iovec_from_user.part.30':
      ./arch/powerpc/include/asm/uaccess.h:287:2: error: 'asm' operand has impossible constraints
        __asm__ __volatile__(    \
        ^
      ./include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
       # define unlikely(x) __builtin_expect(!!(x), 0)
                                                ^
      ./arch/powerpc/include/asm/uaccess.h:583:34: note: in expansion of macro 'unsafe_op_wrap'
       #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
                                        ^
      ./arch/powerpc/include/asm/uaccess.h:329:10: note: in expansion of macro '__get_user_asm'
        case 4: __get_user_asm(x, (u32 __user *)ptr, retval, "lwz"); break; \
                ^
      ./arch/powerpc/include/asm/uaccess.h:363:3: note: in expansion of macro '__get_user_size_allowed'
         __get_user_size_allowed(__gu_val, __gu_addr, __gu_size, __gu_err); \
         ^
      ./arch/powerpc/include/asm/uaccess.h:100:2: note: in expansion of macro '__get_user_nocheck'
        __get_user_nocheck((x), (ptr), sizeof(*(ptr)), false)
        ^
      ./arch/powerpc/include/asm/uaccess.h:583:49: note: in expansion of macro '__get_user_allowed'
       #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
                                                       ^
      lib/iov_iter.c:1663:3: note: in expansion of macro 'unsafe_get_user'
         unsafe_get_user(len, &uiov[i].iov_len, uaccess_end);
         ^
      make[1]: *** [scripts/Makefile.build:283: lib/iov_iter.o] Error 1
      
      Define a UPD_CONSTR macro that is "<>" by default and
      only "" with GCC prior to GCC 5.
      
      Fixes: fcf1f268 ("powerpc/uaccess: Add pre-update addressing to __put_user_asm_goto()")
      Fixes: 2f279eeb ("powerpc/uaccess: Add pre-update addressing to __get_user_asm() and __put_user_asm()")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/212d3bc4a52ca71523759517bb9c61f7e477c46a.1603179582.git.christophe.leroy@csgroup.eu
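The UPD_CONSTR idea from the changelog above can be sketched with plain preprocessor checks. The version test below uses the compiler's predefined macros, which is an assumption for illustration; the kernel patch may spell the condition differently. The helper function names are hypothetical.

```c
/*
 * "<>" lets the compiler pick pre-update addressing forms for a memory
 * operand; fall back to an empty modifier on GCC older than 5.
 * clang also defines __GNUC__, so it is excluded explicitly.
 */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ < 5)
#define UPD_CONSTR ""
#else
#define UPD_CONSTR "<>"
#endif

/* Adjacent string literals concatenate, so this is "m" or "m<>". */
static const char toy_constraint[] = "m" UPD_CONSTR;

const char *toy_mem_constraint(void)
{
    return toy_constraint;
}

/* Nonzero when the update-form modifiers are part of the constraint. */
int toy_allows_update_forms(void)
{
    return toy_constraint[1] == '<';
}
```

Because the macro expands to a string literal, it can be pasted next to `"m"` inside an asm constraint list without changing the surrounding macro structure.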
    • powerpc/eeh: Fix eeh_dev_check_failure() for PE#0 · 99f6e979
      Committed by Oliver O'Halloran
      In commit 269e5833 ("powerpc/eeh: Delete eeh_pe->config_addr") the
      following simplification was made:
      
      -       if (!pe->addr && !pe->config_addr) {
      +       if (!pe->addr) {
                      eeh_stats.no_cfg_addr++;
                      return 0;
              }
      
      This introduced a bug which causes EEH checking to be skipped for
      devices in PE#0.
      
      Before the change above the check would always pass since at least one
      of the two PE addresses would be non-zero in all circumstances. On
      PowerNV pe->config_addr would be the BDFN of the first device added to
      the PE. The zero BDFN is reserved for the PHB's root port, but this is
      fine since for obscure platform reasons the root port is never
      assigned to PE#0.
      
      Similarly, on pseries pe->addr has always been non-zero for the
      reasons outlined in commit 42de19d5 ("powerpc/pseries/eeh: Allow
      zero to be a valid PE configuration address").
      
      We can fix the problem by deleting the block entirely. The original
      purpose of this test was to avoid performing EEH checks on devices
      that were not on an EEH capable bus. In modern Linux the edev->pe
      pointer will be NULL for devices that are not on an EEH capable bus.
      The code block immediately above this one already checks for the
      edev->pe == NULL case so this test (new and old) is entirely
      redundant.
      
      Ideally we'd delete eeh_stats.no_cfg_addr too since nothing increments
      it any more. Unfortunately, that information is exposed via
      /proc/powerpc/eeh which means it's technically ABI. We could make it
      hard-coded, but that's a change for another patch.
      
      Fixes: 269e5833 ("powerpc/eeh: Delete eeh_pe->config_addr")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20201021232554.1434687-1-oohall@gmail.com
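The PE#0 problem in the EEH changelog above comes from treating a zero PE address as "no PE". A minimal sketch with hypothetical `toy_` types (not the kernel's `struct eeh_dev` or `struct eeh_pe`) contrasts the buggy test with relying on the NULL pointer check alone.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_pe  { int addr; };           /* PE number; 0 is a valid PE */
struct toy_dev { struct toy_pe *pe; };  /* NULL when the bus is not EEH capable */

/* Buggy variant: a zero address is mistaken for "no PE address". */
bool toy_should_check_buggy(const struct toy_dev *edev)
{
    if (!edev->pe)
        return false;
    if (!edev->pe->addr)
        return false;                   /* wrongly skips devices in PE#0 */
    return true;
}

/* Fixed variant: the NULL check is the only test that is needed. */
bool toy_should_check_fixed(const struct toy_dev *edev)
{
    return edev->pe != NULL;
}

/* Convenience for trying both variants on a device placed in PE#pe_addr. */
bool toy_check_device_in_pe(int pe_addr, bool use_fixed)
{
    struct toy_pe pe = { .addr = pe_addr };
    struct toy_dev edev = { .pe = &pe };

    return use_fixed ? toy_should_check_fixed(&edev)
                     : toy_should_check_buggy(&edev);
}
```

A device in PE#0 is skipped by the buggy variant and checked by the fixed one; devices in any other PE behave the same under both.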
    • kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg · 7d945312
      Committed by Ben Gardon
      In order to avoid creating executable hugepages in the TDP MMU PF
      handler, remove the dependency between disallowed_hugepage_adjust and
      the shadow_walk_iterator. This will open the function up to being used
      by the TDP MMU PF handler in a future patch.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-10-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Support zapping SPTEs in the TDP MMU · faaf05b0
      Committed by Ben Gardon
      Add functions to zap SPTEs to the TDP MMU. These are needed to tear down
      TDP MMU roots properly and implement other MMU functions which require
      tearing down mappings. Future patches will add functions to populate the
      page tables, but as of this patch there will not be any work for these
      functions to do.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-8-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Add functions to handle changed TDP SPTEs · 2f2fad08
      Committed by Ben Gardon
      The existing bookkeeping done by KVM when a PTE is changed is spread
      around several functions. This makes it difficult to remember all the
      stats, bitmaps, and other subsystems that need to be updated whenever a
      PTE is modified. When a non-leaf PTE is marked non-present or becomes a
      leaf PTE, page table memory must also be freed. To simplify the MMU and
      facilitate the use of atomic operations on SPTEs in future patches, create
      functions to handle some of the bookkeeping required as a result of
      a change.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Allocate and free TDP MMU roots · 02c00b3a
      Committed by Ben Gardon
      The TDP MMU must be able to allocate paging structure root pages and track
      the usage of those pages. Implement a similar, but separate system for root
      page allocation to that of the x86 shadow paging implementation. When
      future patches add synchronization model changes to allow for parallel
      page faults, these pages will need to be handled differently from the
      x86 shadow paging based MMU's root pages.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Init / Uninit the TDP MMU · fe5db27d
      Committed by Ben Gardon
      The TDP MMU offers an alternative mode of operation to the x86 shadow
      paging based MMU, optimized for running an L1 guest with TDP. The TDP MMU
      will require new fields that need to be initialized and torn down. Add
      hooks into the existing KVM MMU initialization process to do that
      initialization / cleanup. Currently the initialization and cleanup
      functions do not do very much; however, more operations will be added in
      future patches.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-4-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Introduce tdp_iter · c9180b72
      Committed by Ben Gardon
      The TDP iterator implements a pre-order traversal of a TDP paging
      structure. This iterator will be used in future patches to create
      an efficient implementation of the KVM MMU for the TDP case.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
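A pre-order walk of a paging structure, as the tdp_iter changelog above describes, visits each entry before any of its children. The following is a generic sketch of an iterator-style pre-order traversal over a small tree; the `toy_` types, the fanout of 2, and the demo tree are all invented for illustration and do not reflect the real tdp_iter layout.

```c
#include <stddef.h>

#define TOY_FANOUT    2
#define TOY_MAX_DEPTH 4

struct toy_node {
    int id;
    struct toy_node *child[TOY_FANOUT];           /* NULL = not present */
};

struct toy_iter {
    const struct toy_node *path[TOY_MAX_DEPTH];   /* root .. current node */
    int idx[TOY_MAX_DEPTH];                       /* next child to try per depth */
    int depth;                                    /* -1 once the walk is done */
};

void toy_iter_start(struct toy_iter *it, const struct toy_node *root)
{
    it->depth = root ? 0 : -1;
    it->path[0] = root;
    it->idx[0] = 0;
}

const struct toy_node *toy_iter_cur(const struct toy_iter *it)
{
    return it->depth >= 0 ? it->path[it->depth] : NULL;
}

/* Step down to the next untried present child; otherwise step up and retry. */
void toy_iter_next(struct toy_iter *it)
{
    while (it->depth >= 0) {
        const struct toy_node *n = it->path[it->depth];

        while (it->idx[it->depth] < TOY_FANOUT) {
            const struct toy_node *c = n->child[it->idx[it->depth]++];

            if (c && it->depth + 1 < TOY_MAX_DEPTH) {
                it->depth++;
                it->path[it->depth] = c;
                it->idx[it->depth] = 0;
                return;
            }
        }
        it->depth--;                              /* subtree exhausted */
    }
}

/* Demo tree: 1 -> (2 -> (3, 4), 5). Returns the n-th visited id, or -1. */
int toy_preorder_nth(int n)
{
    struct toy_node n3 = { 3, { NULL, NULL } };
    struct toy_node n4 = { 4, { NULL, NULL } };
    struct toy_node n5 = { 5, { NULL, NULL } };
    struct toy_node n2 = { 2, { &n3, &n4 } };
    struct toy_node n1 = { 1, { &n2, &n5 } };
    struct toy_iter it;
    int i = 0;

    for (toy_iter_start(&it, &n1); toy_iter_cur(&it); toy_iter_next(&it)) {
        if (i++ == n)
            return toy_iter_cur(&it)->id;
    }
    return -1;
}
```

Keeping the walk state in an explicit iterator, rather than in recursive calls, lets a caller pause between steps, which is one plausible reason to prefer this shape for page-table walks.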
    • KVM: mmu: extract spte.h and spte.c · 5a9624af
      Committed by Paolo Bonzini
      The SPTE format will be common to both the shadow and the TDP MMU.
      
      Extract code that implements the format to a separate module, as a
      first step towards adding the TDP MMU and putting mmu.c on a diet.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp · cb3eedab
      Committed by Paolo Bonzini
      The TDP MMU's own function for the changed-PTE notifier will need to
      update a PTE in the exact same way as the shadow MMU.  Rather than
      re-implementing this logic, factor the SPTE creation out of kvm_set_pte_rmapp.
      
      Extracted out of a patch by Ben Gardon. <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86/mmu: Separate making SPTEs from set_spte · 799a4190
      Committed by Ben Gardon
      Separate the functions for generating leaf page table entries from the
      function that inserts them into the paging structure. This refactoring
      will facilitate changes to the MMU synchronization model to use atomic
      compare / exchanges (which are not guaranteed to succeed) instead of a
      monolithic MMU lock.
      
      No functional change expected.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This commit introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Reviewed-by: Peter Shier <pshier@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
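Splitting "compute the entry" from "install the entry", as the set_spte changelog above describes, is what makes a compare-and-exchange install possible: the computed value can simply be discarded if the slot changed in the meantime. The sketch below uses C11 atomics; the bit layout and all `toy_` names are assumptions made for illustration and are not KVM's SPTE format.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SPTE_PRESENT  (UINT64_C(1) << 0)
#define TOY_SPTE_WRITABLE (UINT64_C(1) << 1)
#define TOY_PFN_SHIFT     12

/* Step 1: pure computation of the new entry; touches no shared state. */
uint64_t toy_make_spte(uint64_t pfn, bool writable)
{
    uint64_t spte = (pfn << TOY_PFN_SHIFT) | TOY_SPTE_PRESENT;

    if (writable)
        spte |= TOY_SPTE_WRITABLE;
    return spte;
}

/* Step 2: install only if the slot still holds the value seen earlier. */
bool toy_install_spte(_Atomic uint64_t *slot, uint64_t old_spte,
                      uint64_t new_spte)
{
    return atomic_compare_exchange_strong(slot, &old_spte, new_spte);
}

/* Two installers that both saw an empty slot: only the first may win. */
bool toy_second_install_fails(void)
{
    _Atomic uint64_t slot = 0;
    uint64_t seen = atomic_load(&slot);
    bool first = toy_install_spte(&slot, seen, toy_make_spte(0x1234, true));
    bool second = toy_install_spte(&slot, seen, toy_make_spte(0x5678, false));

    return first && !second &&
           atomic_load(&slot) == toy_make_spte(0x1234, true);
}
```

Because `toy_make_spte()` has no side effects, a failed install costs nothing beyond recomputing the entry against the slot's new contents.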
    • kvm: mmu: Separate making non-leaf sptes from link_shadow_page · cc4674d0
      Committed by Ben Gardon
      The TDP MMU page fault handler will need to be able to create non-leaf
      SPTEs to build up the paging structures. Rather than re-implementing the
      function, factor the SPTE creation out of link_shadow_page.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20200925212302.3979661-9-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: PPC: Book3S HV: Make struct kernel_param_ops definition const · a4f1d94e
      Committed by Joe Perches
      This should be const, so make it so.
      
      Signed-off-by: Joe Perches <joe@perches.com>
      Message-Id: <d130e88dd4c82a12d979da747cc0365c72c3ba15.1601770305.git.joe@perches.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Let the guest own CR4.FSGSBASE · 30031c2b
      Committed by Lai Jiangshan
      Add FSGSBASE to the set of possible guest-owned CR4 bits, i.e. let the
      guest own it on VMX.  KVM never queries the guest's CR4.FSGSBASE value,
      thus there is no reason to force VM-Exit on FSGSBASE being toggled.
      
      Note, because FSGSBASE is conditionally available, this is dependent on
      recent changes to intercept reserved CR4 bits and to update the CR4
      guest/host mask in response to guest CPUID changes.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      [sean: added justification in changelog]
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200930041659.28181-6-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Intercept guest reserved CR4 bits to inject #GP fault · 2ed41aa6
      Committed by Sean Christopherson
      Intercept CR4 bits that are guest reserved so that KVM correctly injects
      a #GP fault if the guest attempts to set a reserved bit.  If a feature
      is supported by the CPU but is not exposed to the guest, and its
      associated CR4 bit is not intercepted by KVM by default, then KVM will
      fail to inject a #GP if the guest sets the CR4 bit without triggering
      an exit, e.g. by toggling only the bit in question.
      
      Note, KVM doesn't give the guest direct access to any CR4 bits that are
      also dependent on guest CPUID.  Yet.
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200930041659.28181-5-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
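The interception rule in the reserved-CR4-bits changelog above can be expressed as simple mask arithmetic. This is a sketch only: the `toy_` functions are hypothetical and model the idea (a bit reserved for the guest must never be guest-owned, so that writes to it exit and can fault), not KVM's actual mask handling. The bit positions for LA57 (bit 12) and FSGSBASE (bit 16) follow the x86 architecture definition.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CR4_LA57     (UINT64_C(1) << 12)
#define TOY_CR4_FSGSBASE (UINT64_C(1) << 16)

/* Bits the guest may own: the candidate set minus anything it must not set. */
uint64_t toy_guest_owned_bits(uint64_t possible_owned, uint64_t guest_reserved)
{
    return possible_owned & ~guest_reserved;
}

/* A CR4 write exits to the hypervisor if it changes any non-owned bit. */
bool toy_write_exits(uint64_t old_cr4, uint64_t new_cr4, uint64_t owned)
{
    return ((old_cr4 ^ new_cr4) & ~owned) != 0;
}

/* Once intercepted, the write faults if it sets a guest-reserved bit. */
bool toy_write_faults(uint64_t new_cr4, uint64_t guest_reserved)
{
    return (new_cr4 & guest_reserved) != 0;
}
```

If FSGSBASE is hidden from the guest, it is both reserved and removed from the owned set, so toggling only that bit causes an exit and the fault can be injected. If it is exposed, the guest owns the bit and the same write causes no exit.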
    • KVM: x86: Move call to update_exception_bitmap() into VMX code · a6337a35
      Committed by Sean Christopherson
      Now that vcpu_after_set_cpuid() and update_exception_bitmap() are called
      back-to-back, subsume the exception bitmap update into the common CPUID
      update.  Drop the SVM invocation entirely as SVM's exception bitmap
      doesn't vary with respect to guest CPUID.
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200930041659.28181-4-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Invoke vendor's vcpu_after_set_cpuid() after all common updates · c44d9b34
      Committed by Sean Christopherson
      Move the call to kvm_x86_ops.vcpu_after_set_cpuid() to the very end of
      kvm_vcpu_after_set_cpuid() to allow the vendor implementation to react
      to changes made by the common code.  In the near future, this will be
      used by VMX to update its CR4 guest/host masks to account for reserved
      bits.  In the long term, SGX support will update the allowed XCR0 mask
      for enclaves based on the vCPU's allowed XCR0.
      
      vcpu_after_set_cpuid() (nee kvm_update_cpuid()) was originally added by
      commit 2acf923e ("KVM: VMX: Enable XSAVE/XRSTOR for guest"), and was
      called separately after kvm_x86_ops.vcpu_after_set_cpuid() (nee
      kvm_x86_ops->cpuid_update()).  There is no indication that the placement
      of the common code updates after the vendor updates was anything more
      than a "new function at the end" decision.
      
      Inspection of the current code reveals no dependency on kvm_x86_ops'
      vcpu_after_set_cpuid() in kvm_vcpu_after_set_cpuid() or any of its
      helpers.  The bulk of the common code depends only on the guest's CPUID
      configuration, kvm_mmu_reset_context() does not consume dynamic vendor
      state, and there are no collisions between kvm_pmu_refresh() and VMX's
      update of PT state.
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200930041659.28181-3-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Intercept LA57 to inject #GP fault when it's reserved · 6e1d849f
      Committed by Lai Jiangshan
      Unconditionally intercept changes to CR4.LA57 so that KVM correctly
      injects a #GP fault if the guest attempts to set CR4.LA57 when it's
      supported in hardware but not exposed to the guest.
      
      Long term, KVM needs to properly handle CR4 bits that can be under guest
      control but also may be reserved from the guest's perspective.  But, KVM
      currently sets the CR4 guest/host mask only during vCPU creation, and
      reworking flows to change that will take a bit of elbow grease.
      
      Even if/when generic support for intercepting reserved bits exists, it's
      probably not worth letting the guest set CR4.LA57 directly.  LA57 can't
      be toggled while long mode is enabled, thus it's all but guaranteed to
      be set once (maybe twice, e.g. by BIOS and kernel) during boot and never
      touched again.  On the flip side, letting the guest own CR4.LA57 may
      incur extra VMREADs.  In other words, this temporary "hack" is probably
      also the right long term fix.
      
      Fixes: fd8cb433 ("KVM: MMU: Expose the LA57 feature to VM.")
      Cc: stable@vger.kernel.org
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      [sean: rewrote changelog]
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200930041659.28181-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>