1. 12 6月, 2021 2 次提交
  2. 11 6月, 2021 3 次提交
    • E
      coredump: Limit what can interrupt coredumps · 06af8679
      Eric W. Biederman 提交于
      Olivier Langlois has been struggling with coredumps being incompletely written in
      processes using io_uring.
      
      Olivier Langlois <olivier@trillion01.com> writes:
      > io_uring is a big user of task_work and any event that io_uring made a
      > task waiting for that occurs during the core dump generation will
      > generate a TIF_NOTIFY_SIGNAL.
      >
      > Here are the detailed steps of the problem:
      > 1. io_uring calls vfs_poll() to install a task to a file wait queue
      >    with io_async_wake() as the wakeup function cb from io_arm_poll_handler()
      > 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL
      > 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling
      >    set_notify_signal()
      
      The coredump code deliberately supports being interrupted by SIGKILL,
      and depends upon prepare_signal to filter out all other signals.   Now
      that signal_pending includes wake ups for TIF_NOTIFY_SIGNAL this hack
      in dump_emitted by the coredump code no longer works.
      
      Make the coredump code more robust by explicitly testing for all of
      the wakeup conditions the coredump code supports.  This prevents
      new wakeup conditions from breaking the coredump code, as well
      as fixing the current issue.
      
      The filesystem code that the coredump code uses already limits
      itself to only aborting on fatal_signal_pending.  So it should
      not develop surprising wake-up reasons either.
      
      v2: Don't remove the now unnecessary code in prepare_signal.
      
      Cc: stable@vger.kernel.org
      Fixes: 12db8b69 ("entry: Add support for TIF_NOTIFY_SIGNAL")
      Reported-by: NOlivier Langlois <olivier@trillion01.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      06af8679
    • L
      Merge branch 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · f09eacca
      Linus Torvalds 提交于
      Pull cgroup fix from Tejun Heo:
       "This is a high priority but low risk fix for a cgroup1 bug where
        rename(2) can change a cgroup's name to something which can break
        parsing of /proc/PID/cgroup"
      
      * 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup1: don't allow '\n' in renaming
      f09eacca
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 29a877d5
      Linus Torvalds 提交于
      Pull rdma fixes from Jason Gunthorpe:
       "A mixture of small bug fixes and a small security issue:
      
         - WARN_ON when IPoIB is automatically moved between namespaces
      
         - Long standing bug where mlx5 would use the wrong page for the
           doorbell recovery memory if fork is used
      
         - Security fix for mlx4 that disables the timestamp feature
      
         - Several crashers for mlx5
      
         - Plug a recent mlx5 memory leak for the sig_mr"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        IB/mlx5: Fix initializing CQ fragments buffer
        RDMA/mlx5: Delete right entry from MR signature database
        RDMA: Verify port when creating flow rule
        RDMA/mlx5: Block FDB rules when not in switchdev mode
        RDMA/mlx4: Do not map the core_clock page to user space unless enabled
        RDMA/mlx5: Use different doorbell memory for different processes
        RDMA/ipoib: Fix warning caused by destroying non-initial netns
      29a877d5
  3. 10 6月, 2021 15 次提交
  4. 09 6月, 2021 16 次提交
    • P
      kvm: fix previous commit for 32-bit builds · 4422829e
      Paolo Bonzini 提交于
      array_index_nospec does not work for uint64_t on 32-bit builds.
      However, the size of a memory slot must be less than 20 bits wide
      on those system, since the memory slot must fit in the user
      address space.  So just store it in an unsigned long.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4422829e
    • P
      kvm: avoid speculation-based attacks from out-of-range memslot accesses · da27a83f
      Paolo Bonzini 提交于
      KVM's mechanism for accessing guest memory translates a guest physical
      address (gpa) to a host virtual address using the right-shifted gpa
      (also known as gfn) and a struct kvm_memory_slot.  The translation is
      performed in __gfn_to_hva_memslot using the following formula:
      
            hva = slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE
      
      It is expected that gfn falls within the boundaries of the guest's
      physical memory.  However, a guest can access invalid physical addresses
      in such a way that the gfn is invalid.
      
      __gfn_to_hva_memslot is called from kvm_vcpu_gfn_to_hva_prot, which first
      retrieves a memslot through __gfn_to_memslot.  While __gfn_to_memslot
      does check that the gfn falls within the boundaries of the guest's
      physical memory or not, a CPU can speculate the result of the check and
      continue execution speculatively using an illegal gfn. The speculation
      can result in calculating an out-of-bounds hva.  If the resulting host
      virtual address is used to load another guest physical address, this
      is effectively a Spectre gadget consisting of two consecutive reads,
      the second of which is data dependent on the first.
      
      Right now it's not clear if there are any cases in which this is
      exploitable.  One interesting case was reported by the original author
      of this patch, and involves visiting guest page tables on x86.  Right
      now these are not vulnerable because the hva read goes through get_user(),
      which contains an LFENCE speculation barrier.  However, there are
      patches in progress for x86 uaccess.h to mask kernel addresses instead of
      using LFENCE; once these land, a guest could use speculation to read
      from the VMM's ring 3 address space.  Other architectures such as ARM
      already use the address masking method, and would be susceptible to
      this same kind of data-dependent access gadgets.  Therefore, this patch
      proactively protects from these attacks by masking out-of-bounds gfns
      in __gfn_to_hva_memslot, which blocks speculation of invalid hvas.
      
      Sean Christopherson noted that this patch does not cover
      kvm_read_guest_offset_cached.  This however is limited to a few bytes
      past the end of the cache, and therefore it is unlikely to be useful in
      the context of building a chain of data dependent accesses.
      Reported-by: NArtemiy Margaritov <artemiy.margaritov@gmail.com>
      Co-developed-by: NArtemiy Margaritov <artemiy.margaritov@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      da27a83f
    • L
      KVM: x86: Unload MMU on guest TLB flush if TDP disabled to force MMU sync · b53e84ee
      Lai Jiangshan 提交于
      When using shadow paging, unload the guest MMU when emulating a guest TLB
      flush to ensure all roots are synchronized.  From the guest's perspective,
      flushing the TLB ensures any and all modifications to its PTEs will be
      recognized by the CPU.
      
      Note, unloading the MMU is overkill, but is done to mirror KVM's existing
      handling of INVPCID(all) and ensure the bug is squashed.  Future cleanup
      can be done to more precisely synchronize roots when servicing a guest
      TLB flush.
      
      If TDP is enabled, synchronizing the MMU is unnecessary even if nested
      TDP is in play, as a "legacy" TLB flush from L1 does not invalidate L1's
      TDP mappings.  For EPT, an explicit INVEPT is required to invalidate
      guest-physical mappings; for NPT, guest mappings are always tagged with
      an ASID and thus can only be invalidated via the VMCB's ASID control.
      
      This bug has existed since the introduction of KVM_VCPU_FLUSH_TLB.
      It was only recently exposed after Linux guests stopped flushing the
      local CPU's TLB prior to flushing remote TLBs (see commit 4ce94eab,
      "x86/mm/tlb: Flush remote and local TLBs concurrently"), but is also
      visible in Windows 10 guests.
      Tested-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
      Fixes: f38a7b75 ("KVM: X86: support paravirtualized help for TLB shootdowns")
      Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
      [sean: massaged comment and changelog]
      Message-Id: <20210531172256.2908-1-jiangshanlai@gmail.com>
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b53e84ee
    • M
      RDMA/mlx5: Block FDB rules when not in switchdev mode · edc0b0bc
      Mark Bloch 提交于
      Allow creating FDB steering rules only when in switchdev mode.
      
      The only software model where a userspace application can manipulate
      FDB entries is when it manages the eswitch. This is only possible in
      switchdev mode where we expose a single RDMA device with representors
      for all the vports that are connected to the eswitch.
      
      Fixes: 52438be4 ("RDMA/mlx5: Allow inserting a steering rule to the FDB")
      Link: https://lore.kernel.org/r/e928ae7c58d07f104716a2a8d730963d1bd01204.1623052923.git.leonro@nvidia.comReviewed-by: NMaor Gottlieb <maorg@nvidia.com>
      Signed-off-by: NMark Bloch <mbloch@nvidia.com>
      Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
      edc0b0bc
    • S
      KVM: x86: Ensure liveliness of nested VM-Enter fail tracepoint message · f31500b0
      Sean Christopherson 提交于
      Use the __string() machinery provided by the tracing subystem to make a
      copy of the string literals consumed by the "nested VM-Enter failed"
      tracepoint.  A complete copy is necessary to ensure that the tracepoint
      can't outlive the data/memory it consumes and deference stale memory.
      
      Because the tracepoint itself is defined by kvm, if kvm-intel and/or
      kvm-amd are built as modules, the memory holding the string literals
      defined by the vendor modules will be freed when the module is unloaded,
      whereas the tracepoint and its data in the ring buffer will live until
      kvm is unloaded (or "indefinitely" if kvm is built-in).
      
      This bug has existed since the tracepoint was added, but was recently
      exposed by a new check in tracing to detect exactly this type of bug.
      
        fmt: '%s%s
        ' current_buffer: ' vmx_dirty_log_t-140127  [003] ....  kvm_nested_vmenter_failed: '
        WARNING: CPU: 3 PID: 140134 at kernel/trace/trace.c:3759 trace_check_vprintf+0x3be/0x3e0
        CPU: 3 PID: 140134 Comm: less Not tainted 5.13.0-rc1-ce2e73ce600a-req #184
        Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
        RIP: 0010:trace_check_vprintf+0x3be/0x3e0
        Code: <0f> 0b 44 8b 4c 24 1c e9 a9 fe ff ff c6 44 02 ff 00 49 8b 97 b0 20
        RSP: 0018:ffffa895cc37bcb0 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: ffffa895cc37bd08 RCX: 0000000000000027
        RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff9766cfad74f8
        RBP: ffffffffc0a041d4 R08: ffff9766cfad74f0 R09: ffffa895cc37bad8
        R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffc0a041d4
        R13: ffffffffc0f4dba8 R14: 0000000000000000 R15: ffff976409f2c000
        FS:  00007f92fa200740(0000) GS:ffff9766cfac0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000559bd11b0000 CR3: 000000019fbaa002 CR4: 00000000001726e0
        Call Trace:
         trace_event_printf+0x5e/0x80
         trace_raw_output_kvm_nested_vmenter_failed+0x3a/0x60 [kvm]
         print_trace_line+0x1dd/0x4e0
         s_show+0x45/0x150
         seq_read_iter+0x2d5/0x4c0
         seq_read+0x106/0x150
         vfs_read+0x98/0x180
         ksys_read+0x5f/0xe0
         do_syscall_64+0x40/0xb0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Fixes: 380e0055 ("KVM: nVMX: trace nested VM-Enter failures detected by H/W")
      Signed-off-by: NSean Christopherson <seanjc@google.com>
      Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Message-Id: <20210607175748.674002-1-seanjc@google.com>
      f31500b0
    • L
      Merge tag 'for-linus-5.13b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 368094df
      Linus Torvalds 提交于
      Pull xen fix from Juergen Gross:
       "A single patch fixing a Xen related security bug: a malicious guest
        might be able to trigger a 'use after free' issue in the xen-netback
        driver"
      
      * tag 'for-linus-5.13b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen-netback: take a reference to the RX task thread
      368094df
    • Z
      selftests: kvm: Add support for customized slot0 memory size · f53b16ad
      Zhenzhong Duan 提交于
      Until commit 39fe2fc9 ("selftests: kvm: make allocation of extra
      memory take effect", 2021-05-27), parameter extra_mem_pages was used
      only to calculate the page table size for all the memory chunks,
      because real memory allocation happened with calls of
      vm_userspace_mem_region_add() after vm_create_default().
      
      Commit 39fe2fc9 however changed the meaning of extra_mem_pages to
      the size of memory slot 0.  This makes the memory allocation more
      flexible, but makes it harder to account for the number of
      pages needed for the page tables.  For example, memslot_perf_test
      has a small amount of memory in slot 0 but a lot in other slots,
      and adding that memory twice (both in slot 0 and with later
      calls to vm_userspace_mem_region_add()) causes an error that
      was fixed in commit 000ac429 ("selftests: kvm: fix overlapping
      addresses in memslot_perf_test", 2021-05-29)
      
      Since both uses are sensible, add a new parameter slot0_mem_pages
      to vm_create_with_vcpus() and some comments to clarify the meaning of
      slot0_mem_pages and extra_mem_pages.  With this change,
      memslot_perf_test can go back to passing the number of memory
      pages as extra_mem_pages.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@intel.com>
      Message-Id: <20210608233816.423958-4-zhenzhong.duan@intel.com>
      [Squashed in a single patch and rewrote the commit message. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f53b16ad
    • L
      Merge tag 'orphans-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 374aeb91
      Linus Torvalds 提交于
      Pull orphan section fixes from Kees Cook:
       "These two corner case fixes have been in -next for about a week:
      
         - Avoid orphan section in ARM cpuidle (Arnd Bergmann)
      
         - Avoid orphan section with !SMP (Nathan Chancellor)"
      
      * tag 'orphans-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        vmlinux.lds.h: Avoid orphan section with !SMP
        ARM: cpuidle: Avoid orphan section warning
      374aeb91
    • K
      proc: Track /proc/$pid/attr/ opener mm_struct · 591a22c1
      Kees Cook 提交于
      Commit bfb819ea ("proc: Check /proc/$pid/attr/ writes against file opener")
      tried to make sure that there could not be a confusion between the opener of
      a /proc/$pid/attr/ file and the writer. It used struct cred to make sure
      the privileges didn't change. However, there were existing cases where a more
      privileged thread was passing the opened fd to a differently privileged thread
      (during container setup). Instead, use mm_struct to track whether the opener
      and writer are still the same process. (This is what several other proc files
      already do, though for different reasons.)
      Reported-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Reported-by: NAndrea Righi <andrea.righi@canonical.com>
      Tested-by: NAndrea Righi <andrea.righi@canonical.com>
      Fixes: bfb819ea ("proc: Check /proc/$pid/attr/ writes against file opener")
      Cc: stable@vger.kernel.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      591a22c1
    • C
      KVM: selftests: introduce P47V64 for s390x · 1bc603af
      Christian Borntraeger 提交于
      s390x can have up to 47bits of physical guest and 64bits of virtual
      address  bits. Add a new address mode to avoid errors of testcases
      going beyond 47bits.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Message-Id: <20210608123954.10991-1-borntraeger@de.ibm.com>
      Fixes: ef4c9f4f ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")
      Cc: stable@vger.kernel.org
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1bc603af
    • L
      KVM: x86: Ensure PV TLB flush tracepoint reflects KVM behavior · af3511ff
      Lai Jiangshan 提交于
      In record_steal_time(), st->preempted is read twice, and
      trace_kvm_pv_tlb_flush() might output result inconsistent if
      kvm_vcpu_flush_tlb_guest() see a different st->preempted later.
      
      It is a very trivial problem and hardly has actual harm and can be
      avoided by reseting and reading st->preempted in atomic way via xchg().
      Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
      
      Message-Id: <20210531174628.10265-1-jiangshanlai@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      af3511ff
    • L
      Merge tag 'spi-fix-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 4c8684fe
      Linus Torvalds 提交于
      Pull spi fixes from Mark Brown:
       "A small set of SPI fixes that have come up since the merge window, all
        fairly small fixes for rare cases"
      
      * tag 'spi-fix-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: stm32-qspi: Always wait BUSY bit to be cleared in stm32_qspi_wait_cmd()
        spi: spi-zynq-qspi: Fix some wrong goto jumps & missing error code
        spi: Cleanup on failure of initial setup
        spi: bcm2835: Fix out-of-bounds access with more than 4 slaves
      4c8684fe
    • L
      Merge tag 'regulator-fix-v5.13-rc4' of... · 9b1111fa
      Linus Torvalds 提交于
      Merge tag 'regulator-fix-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fixes from Mark Brown:
       "A collection of fixes for the regulator API that have come up since
        the merge window, including a big batch of fixes from Axel Lin's usual
        careful and detailed review.
      
        The one stand out fix here is Dmitry Baryshkov's fix for an issue
        where we fail to power on the parents of always on regulators during
        system startup if they weren't already powered on"
      
      * tag 'regulator-fix-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (21 commits)
        regulator: rt4801: Fix NULL pointer dereference if priv->enable_gpios is NULL
        regulator: hi6421v600: Fix .vsel_mask setting
        regulator: bd718x7: Fix the BUCK7 voltage setting on BD71837
        regulator: atc260x: Fix n_voltages and min_sel for pickable linear ranges
        regulator: rtmv20: Fix to make regcache value first reading back from HW
        regulator: mt6315: Fix function prototype for mt6315_map_mode
        regulator: rtmv20: Add Richtek to Kconfig text
        regulator: rtmv20: Fix .set_current_limit/.get_current_limit callbacks
        regulator: hisilicon: use the correct HiSilicon copyright
        regulator: bd71828: Fix .n_voltages settings
        regulator: bd70528: Fix off-by-one for buck123 .n_voltages setting
        regulator: max77620: Silence deferred probe error
        regulator: max77620: Use device_set_of_node_from_dev()
        regulator: scmi: Fix off-by-one for linear regulators .n_voltages setting
        regulator: core: resolve supply for boot-on/always-on regulators
        regulator: fixed: Ensure enable_counter is correct if reg_domain_disable fails
        regulator: Check ramp_delay_table for regulator_set_ramp_delay_regmap
        regulator: fan53880: Fix missing n_voltages setting
        regulator: da9121: Return REGULATOR_MODE_INVALID for invalid mode
        regulator: fan53555: fix TCS4525 voltage calulation
        ...
      9b1111fa
    • L
      KVM: X86: MMU: Use the correct inherited permissions to get shadow page · b1bd5cba
      Lai Jiangshan 提交于
      When computing the access permissions of a shadow page, use the effective
      permissions of the walk up to that point, i.e. the logic AND of its parents'
      permissions.  Two guest PxE entries that point at the same table gfn need to
      be shadowed with different shadow pages if their parents' permissions are
      different.  KVM currently uses the effective permissions of the last
      non-leaf entry for all non-leaf entries.  Because all non-leaf SPTEs have
      full ("uwx") permissions, and the effective permissions are recorded only
      in role.access and merged into the leaves, this can lead to incorrect
      reuse of a shadow page and eventually to a missing guest protection page
      fault.
      
      For example, here is a shared pagetable:
      
         pgd[]   pud[]        pmd[]            virtual address pointers
                           /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
              /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
         pgd-|           (shared pmd[] as above)
              \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                           \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
      
        pud1 and pud2 point to the same pmd table, so:
        - ptr1 and ptr3 points to the same page.
        - ptr2 and ptr4 points to the same page.
      
      (pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
      
      - First, the guest reads from ptr1 first and KVM prepares a shadow
        page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
        "u--" comes from the effective permissions of pgd, pud1 and
        pmd1, which are stored in pt->access.  "u--" is used also to get
        the pagetable for pud1, instead of "uw-".
      
      - Then the guest writes to ptr2 and KVM reuses pud1 which is present.
        The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
        even though the pud1 pmd (because of the incorrect argument to
        kvm_mmu_get_page in the previous step) has role.access="u--".
      
      - Then the guest reads from ptr3.  The hypervisor reuses pud1's
        shadow pmd for pud2, because both use "u--" for their permissions.
        Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
      
      - At last, the guest writes to ptr4.  This causes no vmexit or pagefault,
        because pud1's shadow page structures included an "uw-" page even though
        its role.access was "u--".
      
      Any kind of shared pagetable might have the similar problem when in
      virtual machine without TDP enabled if the permissions are different
      from different ancestors.
      
      In order to fix the problem, we change pt->access to be an array, and
      any access in it will not include permissions ANDed from child ptes.
      
      The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
      Remember to test it with TDP disabled.
      
      The problem had existed long before the commit 41074d07 ("KVM: MMU:
      Fix inherited permissions for emulated guest pte updates"), and it
      is hard to find which is the culprit.  So there is no fixes tag here.
      Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com>
      Cc: stable@vger.kernel.org
      Fixes: cea0f0e7 ("[PATCH] KVM: MMU: Shadow page table caching")
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b1bd5cba
    • W
      KVM: LAPIC: Write 0 to TMICT should also cancel vmx-preemption timer · e898da78
      Wanpeng Li 提交于
      According to the SDM 10.5.4.1:
      
        A write of 0 to the initial-count register effectively stops the local
        APIC timer, in both one-shot and periodic mode.
      
      However, the lapic timer oneshot/periodic mode which is emulated by vmx-preemption
      timer doesn't stop by writing 0 to TMICT since vmx->hv_deadline_tsc is still
      programmed and the guest will receive the spurious timer interrupt later. This
      patch fixes it by also cancelling the vmx-preemption timer when writing 0 to
      the initial-count register.
      Reviewed-by: NSean Christopherson <seanjc@google.com>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Message-Id: <1623050385-100988-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e898da78
    • A
      KVM: SVM: Fix SEV SEND_START session length & SEND_UPDATE_DATA query length... · 4f13d471
      Ashish Kalra 提交于
      KVM: SVM: Fix SEV SEND_START session length & SEND_UPDATE_DATA query length after commit 238eca82
      
      Commit 238eca82 ("KVM: SVM: Allocate SEV command structures on local stack")
      uses the local stack to allocate the structures used to communicate with the PSP,
      which were earlier being kzalloced. This breaks SEV live migration for
      computing the SEND_START session length and SEND_UPDATE_DATA query length as
      session_len and trans_len and hdr_len fields are not zeroed respectively for
      the above commands before issuing the SEV Firmware API call, hence the
      firmware returns incorrect session length and update data header or trans length.
      
      Also the SEV Firmware API returns SEV_RET_INVALID_LEN firmware error
      for these length query API calls, and the return value and the
      firmware error needs to be passed to the userspace as it is, so
      need to remove the return check in the KVM code.
      Signed-off-by: NAshish Kalra <ashish.kalra@amd.com>
      Message-Id: <20210607061532.27459-1-Ashish.Kalra@amd.com>
      Fixes: 238eca82 ("KVM: SVM: Allocate SEV command structures on local stack")
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4f13d471
  5. 08 6月, 2021 3 次提交
  6. 07 6月, 2021 1 次提交