1. 30 November 2022 (1 commit)
    • KVM: x86/xen: Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured · d8ba8ba4
      Authored by David Woodhouse
      Closer inspection of the Xen code shows that we aren't supposed to be
      using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
      explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
      If we randomly set the top bit of ->state_entry_time for a guest that
      hasn't asked for it and doesn't expect it, that could make the runtimes
      fail to add up and confuse the guest. Without the flag it's perfectly
      safe for a vCPU to read its own vcpu_runstate_info; just not for one
      vCPU to read *another's*.
      
      I briefly pondered adding a word for the whole set of VMASST_TYPE_*
      flags but the only one we care about for HVM guests is this, so it
      seemed a bit pointless.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 29 November 2022 (1 commit)
  3. 23 November 2022 (2 commits)
    • KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE · 8c516b25
      Authored by Claudio Imbrenda
      Add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE to signal that the
      KVM_PV_ASYNC_DISABLE and KVM_PV_ASYNC_DISABLE_PREPARE commands for the
      KVM_S390_PV_COMMAND ioctl are available.
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
      Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
      Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221111170632.77622-4-imbrenda@linux.ibm.com
      Message-Id: <20221111170632.77622-4-imbrenda@linux.ibm.com>
      Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    • KVM: s390: pv: asynchronous destroy for reboot · fb491d55
      Authored by Claudio Imbrenda
      Until now, destroying a protected guest was an entirely synchronous
      operation that could potentially take a very long time, depending on
      the size of the guest, due to the time needed to clean up the address
      space from protected pages.
      
      This patch implements an asynchronous destroy mechanism that allows a
      protected guest to reboot significantly faster than before.
      
      This is achieved by clearing the pages of the old guest in the
      background. In case of reboot, the new guest will be able to run in
      the same address space almost immediately.
      
      The old protected guest is then destroyed only when all of its memory
      has been destroyed or otherwise made non-protected.
      
      Two new PV commands are added for the KVM_S390_PV_COMMAND ioctl:
      
      KVM_PV_ASYNC_CLEANUP_PREPARE: set aside the current protected VM for
      later asynchronous teardown. The current KVM VM will then continue
      immediately as non-protected. If a protected VM had already been
      set aside for asynchronous teardown, but without starting the teardown
      process, this call will fail. There can be at most one VM set aside at
      any time. Once it is set aside, the protected VM exists only in the
      context of the Ultravisor and is no longer associated with the KVM
      VM. Its protected CPUs have already been destroyed, but not its
      memory. This command can be issued again immediately after starting
      KVM_PV_ASYNC_CLEANUP_PERFORM, without having to wait for completion.
      
      KVM_PV_ASYNC_CLEANUP_PERFORM: tears down the protected VM previously
      set aside using KVM_PV_ASYNC_CLEANUP_PREPARE. Ideally the
      KVM_PV_ASYNC_CLEANUP_PERFORM PV command should be issued by userspace
      from a separate thread. If a fatal signal is received (or if the
      process terminates naturally), the command will terminate immediately
      without completing. All protected VMs whose teardown was interrupted
      will be put in the need_cleanup list. The rest of the normal KVM
      teardown process will take care of properly cleaning up all remaining
      protected VMs, including the ones on the need_cleanup list.
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
      Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
      Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
      Link: https://lore.kernel.org/r/20221111170632.77622-2-imbrenda@linux.ibm.com
      Message-Id: <20221111170632.77622-2-imbrenda@linux.ibm.com>
      Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
  4. 19 November 2022 (2 commits)
  5. 10 November 2022 (6 commits)
    • KVM: replace direct irq.h inclusion · d663b8a2
      Authored by Paolo Bonzini
      virt/kvm/irqchip.c includes "irq.h" from the arch-specific KVM source
      directory (i.e. not from arch/*/include) for the sole purpose of
      retrieving irqchip_in_kernel.
      
      Making the function inline in a header that is already included,
      such as asm/kvm_host.h, is not possible because it needs to look at
      struct kvm which is defined after asm/kvm_host.h is included.  So add a
      kvm_arch_irqchip_in_kernel non-inline function; irqchip_in_kernel() is
      only performance critical on arm64 and x86, and the non-inline function
      is enough on all other architectures.
      
      irq.h can then be deleted from all architectures except x86.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Add a VALID_MASK for the MSR exit reason flags · db205f7e
      Authored by Aaron Lewis
      Add the mask KVM_MSR_EXIT_REASON_VALID_MASK for the MSR exit reason
      flags.  This simplifies checks that validate these flags, and makes it
      easier to introduce new flags in the future.
      
      No functional change intended.
      Signed-off-by: Aaron Lewis <aaronlewis@google.com>
      Message-Id: <20220921151525.904162-3-aaronlewis@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: Add interruptible flag to __gfn_to_pfn_memslot() · c8b88b33
      Authored by Peter Xu
      Add a new "interruptible" flag showing that the caller is willing to be
      interrupted by signals during the __gfn_to_pfn_memslot() request.  Wire it
      up with a FOLL_INTERRUPTIBLE flag that we've just introduced.
      
      This prepares KVM to be able to respond to SIGUSR1 (for QEMU, that's
      the SIGIPI) even while handling e.g. a userfaultfd page fault.
      
      No functional change intended.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221011195809.557016-4-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: Add KVM_PFN_ERR_SIGPENDING · fe5ed56c
      Authored by Peter Xu
      Add a new pfn error to indicate that a signal is pending during the
      hva_to_pfn_slow() procedure (i.e. a -EINTR return value).
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221011195809.557016-3-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • mm/gup: Add FOLL_INTERRUPTIBLE · 93c5c61d
      Authored by Peter Xu
      We have had FAULT_FLAG_INTERRUPTIBLE, but it was never applied to
      GUPs. One issue is that not all GUP paths are able to handle signal
      delivery other than SIGKILL.
      
      That's not ideal for the GUP users who are actually able to handle these
      cases, like KVM.
      
      KVM uses GUP extensively to fault in guest pages, and it already has
      the infrastructure to retry a page fault at a later time. Allowing
      GUP to be interrupted by generic signals can make KVM-related
      threads more responsive. For example:
      
        (1) SIGUSR1: which QEMU/KVM uses to deliver an inter-process IPI,
            e.g. when the admin issues a vm_stop QMP command, SIGUSR1 can be
            generated to kick the vcpus out of kernel context immediately,
      
        (2) SIGINT: which can be used with interactive hypervisor users to stop a
            virtual machine with Ctrl-C without any delays/hangs,
      
        (3) SIGTRAP: which grants GDB capability even during page faults that are
            stuck for a long time.
      
      Normally the hypervisor will be able to receive these signals
      properly, but not if it is stuck in a GUP for a long time for
      whatever reason. This happens easily with a stuck postcopy migration
      when e.g. a temporary network failure occurs; some vCPU threads can
      then hang indefinitely waiting for the pages. With the new
      FOLL_INTERRUPTIBLE, GUP users like KVM can selectively enable the
      ability to trap these signals.
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20221011195809.557016-2-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • bug: introduce ASSERT_STRUCT_OFFSET · 07a368b3
      Authored by Maxim Levitsky
      ASSERT_STRUCT_OFFSET allows asserting at build time that a field in
      a struct has an expected offset.
      
      KVM used to have such a macro, but there is almost nothing
      KVM-specific in it, so move it to build_bug.h so that it can be used
      in other places in KVM.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20221025124741.228045-10-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 02 November 2022 (1 commit)
  7. 01 November 2022 (1 commit)
  8. 29 October 2022 (5 commits)
  9. 28 October 2022 (1 commit)
    • net/mlx5: Fix possible use-after-free in async command interface · bacd22df
      Authored by Tariq Toukan
      mlx5_cmd_cleanup_async_ctx should return only after all its callback
      handlers have completed. Before this patch, the race below between
      mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible
      and led to a use-after-free:
      
      1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e.
         elevated by 1, a single inflight callback).
      2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1.
      3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and
         is about to call wake_up().
      4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns
         immediately as the condition (num_inflight == 0) holds.
      5. mlx5_cmd_cleanup_async_ctx returns.
      6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx
         object.
      7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed
         object.
      
      Fix it by synchronizing with a completion object. Mark it completed
      when num_inflight reaches 0.
      
      Trace:
      
      BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270
      Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0
      
      CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <IRQ>
       dump_stack_lvl+0x57/0x7d
       print_report.cold+0x2d5/0x684
       ? do_raw_spin_lock+0x23d/0x270
       kasan_report+0xb1/0x1a0
       ? do_raw_spin_lock+0x23d/0x270
       do_raw_spin_lock+0x23d/0x270
       ? rwlock_bug.part.0+0x90/0x90
       ? __delete_object+0xb8/0x100
       ? lock_downgrade+0x6e0/0x6e0
       _raw_spin_lock_irqsave+0x43/0x60
       ? __wake_up_common_lock+0xb9/0x140
       __wake_up_common_lock+0xb9/0x140
       ? __wake_up_common+0x650/0x650
       ? destroy_tis_callback+0x53/0x70 [mlx5_core]
       ? kasan_set_track+0x21/0x30
       ? destroy_tis_callback+0x53/0x70 [mlx5_core]
       ? kfree+0x1ba/0x520
       ? do_raw_spin_unlock+0x54/0x220
       mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core]
       ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
       ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
       mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core]
       ? dump_command+0xcc0/0xcc0 [mlx5_core]
       ? lockdep_hardirqs_on_prepare+0x400/0x400
       ? cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
       cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
       atomic_notifier_call_chain+0xd7/0x1d0
       mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core]
       atomic_notifier_call_chain+0xd7/0x1d0
       ? irq_release+0x140/0x140 [mlx5_core]
       irq_int_handler+0x19/0x30 [mlx5_core]
       __handle_irq_event_percpu+0x1f2/0x620
       handle_irq_event+0xb2/0x1d0
       handle_edge_irq+0x21e/0xb00
       __common_interrupt+0x79/0x1a0
       common_interrupt+0x78/0xa0
       </IRQ>
       <TASK>
       asm_common_interrupt+0x22/0x40
      RIP: 0010:default_idle+0x42/0x60
      Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 <c3> 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00
      RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242
      RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110
      RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc
      RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3
      R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005
      R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000
       ? default_idle_call+0xcc/0x450
       default_idle_call+0xec/0x450
       do_idle+0x394/0x450
       ? arch_cpu_idle_exit+0x40/0x40
       ? do_idle+0x17/0x450
       cpu_startup_entry+0x19/0x20
       start_secondary+0x221/0x2b0
       ? set_cpu_sibling_map+0x2070/0x2070
       secondary_startup_64_no_verify+0xcd/0xdb
       </TASK>
      
      Allocated by task 49502:
       kasan_save_stack+0x1e/0x40
       __kasan_kmalloc+0x81/0xa0
       kvmalloc_node+0x48/0xe0
       mlx5e_bulk_async_init+0x35/0x110 [mlx5_core]
       mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core]
       mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
       mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
       mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
       mlx5e_suspend+0xdb/0x140 [mlx5_core]
       mlx5e_remove+0x89/0x190 [mlx5_core]
       auxiliary_bus_remove+0x52/0x70
       device_release_driver_internal+0x40f/0x650
       driver_detach+0xc1/0x180
       bus_remove_driver+0x125/0x2f0
       auxiliary_driver_unregister+0x16/0x50
       mlx5e_cleanup+0x26/0x30 [mlx5_core]
       cleanup+0xc/0x4e [mlx5_core]
       __x64_sys_delete_module+0x2b5/0x450
       do_syscall_64+0x3d/0x90
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Freed by task 49502:
       kasan_save_stack+0x1e/0x40
       kasan_set_track+0x21/0x30
       kasan_set_free_info+0x20/0x30
       ____kasan_slab_free+0x11d/0x1b0
       kfree+0x1ba/0x520
       mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core]
       mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
       mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
       mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
       mlx5e_suspend+0xdb/0x140 [mlx5_core]
       mlx5e_remove+0x89/0x190 [mlx5_core]
       auxiliary_bus_remove+0x52/0x70
       device_release_driver_internal+0x40f/0x650
       driver_detach+0xc1/0x180
       bus_remove_driver+0x125/0x2f0
       auxiliary_driver_unregister+0x16/0x50
       mlx5e_cleanup+0x26/0x30 [mlx5_core]
       cleanup+0xc/0x4e [mlx5_core]
       __x64_sys_delete_module+0x2b5/0x450
       do_syscall_64+0x3d/0x90
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Fixes: e355477e ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  10. 27 October 2022 (3 commits)
  11. 26 October 2022 (1 commit)
  12. 25 October 2022 (4 commits)
  13. 24 October 2022 (2 commits)
  14. 22 October 2022 (2 commits)
  15. 21 October 2022 (4 commits)
  16. 20 October 2022 (4 commits)