1. 18 6月, 2019 10 次提交
  2. 14 6月, 2019 1 次提交
    • P
      KVM: x86: clean up conditions for asynchronous page fault handling · 1dfdb45e
      Paolo Bonzini 提交于
      Even when asynchronous page fault is disabled, KVM does not want to pause
      the host if a guest triggers a page fault; instead it will put it into
      an artificial HLT state that allows running other host processes while
      allowing interrupt delivery into the guest.
      
      However, the way this feature is triggered is a bit confusing.
      First, it is not used for page faults while a nested guest is
      running: but this is not an issue since the artificial halt
      is completely invisible to the guest, either L1 or L2.  Second,
      it is used even if kvm_halt_in_guest() returns true; in this case,
      the guest probably should not pay the additional latency cost of the
      artificial halt, and thus we should handle the page fault in a
      completely synchronous way.
      
      By introducing a new function kvm_can_deliver_async_pf, this patch
      commonizes the code that chooses whether to deliver an async page fault
      (kvm_arch_async_page_not_present) and the code that chooses whether a
      page fault should be handled synchronously (kvm_can_do_async_pf).
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1dfdb45e
  3. 05 6月, 2019 20 次提交
  4. 01 6月, 2019 2 次提交
  5. 30 5月, 2019 6 次提交
    • S
      KVM: PPC: Book3S HV: Restore SPRG3 in kvmhv_p9_guest_entry() · d724c9e5
      Suraj Jitindar Singh 提交于
      The sprgs are a set of 4 general purpose sprs provided for software use.
      SPRG3 is special in that it can also be read from userspace. Thus it is
      used on linux to store the cpu and numa id of the process to speed up
      syscall access to this information.
      
      This register is overwritten with the guest value on kvm guest entry,
      and so needs to be restored on exit again. Thus restore the value on
      the guest exit path in kvmhv_p9_guest_entry().
      
      Cc: stable@vger.kernel.org # v4.20+
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      d724c9e5
    • P
      KVM: PPC: Book3S HV: Fix lockdep warning when entering guest on POWER9 · 1b28d553
      Paul Mackerras 提交于
      Commit 3309bec8 ("KVM: PPC: Book3S HV: Fix lockdep warning when
      entering the guest") moved calls to trace_hardirqs_{on,off} in the
      entry path used for HPT guests.  Similar code exists in the new
      streamlined entry path used for radix guests on POWER9.  This makes
      the same change there, so as to avoid lockdep warnings such as this:
      
      [  228.686461] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      [  228.686480] WARNING: CPU: 116 PID: 3803 at ../kernel/locking/lockdep.c:4219 check_flags.part.23+0x21c/0x270
      [  228.686544] Modules linked in: vhost_net vhost xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat
      +xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter
      +ebtables ip6table_filter ip6_tables iptable_filter fuse kvm_hv kvm at24 ipmi_powernv regmap_i2c ipmi_devintf
      +uio_pdrv_genirq ofpart ipmi_msghandler uio powernv_flash mtd ibmpowernv opal_prd ip_tables ext4 mbcache jbd2 btrfs
      +zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor
      +raid6_pq raid1 raid0 ses sd_mod enclosure scsi_transport_sas ast i2c_opal i2c_algo_bit drm_kms_helper syscopyarea
      +sysfillrect sysimgblt fb_sys_fops ttm drm i40e e1000e cxl aacraid tg3 drm_panel_orientation_quirks i2c_core
      [  228.686859] CPU: 116 PID: 3803 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc1-xive+ #42
      [  228.686911] NIP:  c0000000001b394c LR: c0000000001b3948 CTR: c000000000bfad20
      [  228.686963] REGS: c000200cdb50f570 TRAP: 0700   Not tainted  (5.2.0-rc1-xive+)
      [  228.687001] MSR:  9000000002823033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 48222222  XER: 20040000
      [  228.687060] CFAR: c000000000116db0 IRQMASK: 1
      [  228.687060] GPR00: c0000000001b3948 c000200cdb50f800 c0000000015e7600 000000000000002e
      [  228.687060] GPR04: 0000000000000001 c0000000001c71a0 000000006e655f73 72727563284e4f5f
      [  228.687060] GPR08: 0000200e60680000 0000000000000000 c000200cdb486180 0000000000000000
      [  228.687060] GPR12: 0000000000002000 c000200fff61a680 0000000000000000 00007fffb75c0000
      [  228.687060] GPR16: 0000000000000000 0000000000000000 c0000000017d6900 c000000001124900
      [  228.687060] GPR20: 0000000000000074 c008000006916f68 0000000000000074 0000000000000074
      [  228.687060] GPR24: ffffffffffffffff ffffffffffffffff 0000000000000003 c000200d4b600000
      [  228.687060] GPR28: c000000001627e58 c000000001489908 c000000001627e58 c000000002304de0
      [  228.687377] NIP [c0000000001b394c] check_flags.part.23+0x21c/0x270
      [  228.687415] LR [c0000000001b3948] check_flags.part.23+0x218/0x270
      [  228.687466] Call Trace:
      [  228.687488] [c000200cdb50f800] [c0000000001b3948] check_flags.part.23+0x218/0x270 (unreliable)
      [  228.687542] [c000200cdb50f870] [c0000000001b6548] lock_is_held_type+0x188/0x1c0
      [  228.687595] [c000200cdb50f8d0] [c0000000001d939c] rcu_read_lock_sched_held+0xdc/0x100
      [  228.687646] [c000200cdb50f900] [c0000000001dd704] rcu_note_context_switch+0x304/0x340
      [  228.687701] [c000200cdb50f940] [c0080000068fcc58] kvmhv_run_single_vcpu+0xdb0/0x1120 [kvm_hv]
      [  228.687756] [c000200cdb50fa20] [c0080000068fd5b0] kvmppc_vcpu_run_hv+0x5e8/0xe40 [kvm_hv]
      [  228.687816] [c000200cdb50faf0] [c0080000071797dc] kvmppc_vcpu_run+0x34/0x48 [kvm]
      [  228.687863] [c000200cdb50fb10] [c0080000071755dc] kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
      [  228.687916] [c000200cdb50fba0] [c008000007165ccc] kvm_vcpu_ioctl+0x424/0x838 [kvm]
      [  228.687957] [c000200cdb50fd10] [c000000000433a24] do_vfs_ioctl+0xd4/0xcd0
      [  228.687995] [c000200cdb50fdb0] [c000000000434724] ksys_ioctl+0x104/0x120
      [  228.688033] [c000200cdb50fe00] [c000000000434768] sys_ioctl+0x28/0x80
      [  228.688072] [c000200cdb50fe20] [c00000000000b888] system_call+0x5c/0x70
      [  228.688109] Instruction dump:
      [  228.688142] 4bf6342d 60000000 0fe00000 e8010080 7c0803a6 4bfffe60 3c82ff87 3c62ff87
      [  228.688196] 388472d0 3863d738 4bf63405 60000000 <0fe00000> 4bffff4c 3c82ff87 3c62ff87
      [  228.688251] irq event stamp: 205
      [  228.688287] hardirqs last  enabled at (205): [<c0080000068fc1b4>] kvmhv_run_single_vcpu+0x30c/0x1120 [kvm_hv]
      [  228.688344] hardirqs last disabled at (204): [<c0080000068fbff0>] kvmhv_run_single_vcpu+0x148/0x1120 [kvm_hv]
      [  228.688412] softirqs last  enabled at (180): [<c000000000c0b2ac>] __do_softirq+0x4ac/0x5d4
      [  228.688464] softirqs last disabled at (169): [<c000000000122aa8>] irq_exit+0x1f8/0x210
      [  228.688513] ---[ end trace eb16f6260022a812 ]---
      [  228.688548] possible reason: unannotated irqs-off.
      [  228.688571] irq event stamp: 205
      [  228.688607] hardirqs last  enabled at (205): [<c0080000068fc1b4>] kvmhv_run_single_vcpu+0x30c/0x1120 [kvm_hv]
      [  228.688664] hardirqs last disabled at (204): [<c0080000068fbff0>] kvmhv_run_single_vcpu+0x148/0x1120 [kvm_hv]
      [  228.688719] softirqs last  enabled at (180): [<c000000000c0b2ac>] __do_softirq+0x4ac/0x5d4
      [  228.688758] softirqs last disabled at (169): [<c000000000122aa8>] irq_exit+0x1f8/0x210
      
      Cc: stable@vger.kernel.org # v4.20+
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NCédric Le Goater <clg@kaod.org>
      Tested-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1b28d553
    • C
      KVM: PPC: Book3S HV: XIVE: Fix page offset when clearing ESB pages · bcaa3110
      Cédric Le Goater 提交于
      Under XIVE, the ESB pages of an interrupt are used for interrupt
      management (EOI) and triggering. They are made available to guests
      through a mapping of the XIVE KVM device.
      
      When a device is passed-through, the passthru_irq helpers,
      kvmppc_xive_set_mapped() and kvmppc_xive_clr_mapped(), clear the ESB
      pages of the guest IRQ number being mapped and let the VM fault
      handler repopulate with the correct page.
      
      The ESB pages are mapped at offset 4 (KVM_XIVE_ESB_PAGE_OFFSET) in the
      KVM device mapping. Unfortunately, this offset was not taken into
      account when clearing the pages. This lead to issues with the
      passthrough devices for which the interrupts were not functional under
      some guest configuration (tg3 and single CPU) or in any configuration
      (e1000e adapter).
      Reviewed-by: NGreg Kurz <groug@kaod.org>
      Tested-by: NGreg Kurz <groug@kaod.org>
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      bcaa3110
    • C
      KVM: PPC: Book3S HV: XIVE: Take the srcu read lock when accessing memslots · aedb5b19
      Cédric Le Goater 提交于
      According to Documentation/virtual/kvm/locking.txt, the srcu read lock
      should be taken when accessing the memslots of the VM. The XIVE KVM
      device needs to do so when configuring the page of the OS event queue
      of vCPU for a given priority and when marking the same page dirty
      before migration.
      
      This avoids warnings such as :
      
      [  208.224882] =============================
      [  208.224884] WARNING: suspicious RCU usage
      [  208.224889] 5.2.0-rc2-xive+ #47 Not tainted
      [  208.224890] -----------------------------
      [  208.224894] ../include/linux/kvm_host.h:633 suspicious rcu_dereference_check() usage!
      [  208.224896]
                     other info that might help us debug this:
      
      [  208.224898]
                     rcu_scheduler_active = 2, debug_locks = 1
      [  208.224901] no locks held by qemu-system-ppc/3923.
      [  208.224902]
                     stack backtrace:
      [  208.224907] CPU: 64 PID: 3923 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc2-xive+ #47
      [  208.224909] Call Trace:
      [  208.224918] [c000200cdd98fa30] [c000000000be1934] dump_stack+0xe8/0x164 (unreliable)
      [  208.224924] [c000200cdd98fa80] [c0000000001aec80] lockdep_rcu_suspicious+0x110/0x180
      [  208.224935] [c000200cdd98fb00] [c0080000075933a0] gfn_to_memslot+0x1c8/0x200 [kvm]
      [  208.224943] [c000200cdd98fb40] [c008000007599600] gfn_to_pfn+0x28/0x60 [kvm]
      [  208.224951] [c000200cdd98fb70] [c008000007599658] gfn_to_page+0x20/0x40 [kvm]
      [  208.224959] [c000200cdd98fb90] [c0080000075b495c] kvmppc_xive_native_set_attr+0x8b4/0x1480 [kvm]
      [  208.224967] [c000200cdd98fca0] [c00800000759261c] kvm_device_ioctl_attr+0x64/0xb0 [kvm]
      [  208.224974] [c000200cdd98fcf0] [c008000007592730] kvm_device_ioctl+0xc8/0x110 [kvm]
      [  208.224979] [c000200cdd98fd10] [c000000000433a24] do_vfs_ioctl+0xd4/0xcd0
      [  208.224981] [c000200cdd98fdb0] [c000000000434724] ksys_ioctl+0x104/0x120
      [  208.224984] [c000200cdd98fe00] [c000000000434768] sys_ioctl+0x28/0x80
      [  208.224988] [c000200cdd98fe20] [c00000000000b888] system_call+0x5c/0x70
      legoater@boss01:~$
      
      Fixes: 13ce3297 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
      Fixes: e6714bd1 ("KVM: PPC: Book3S HV: XIVE: Add a control to dirty the XIVE EQ pages")
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      aedb5b19
    • C
      KVM: PPC: Book3S HV: XIVE: Do not clear IRQ data of passthrough interrupts · ef974020
      Cédric Le Goater 提交于
      The passthrough interrupts are defined at the host level and their IRQ
      data should not be cleared unless specifically deconfigured (shutdown)
      by the host. They differ from the IPI interrupts which are allocated
      by the XIVE KVM device and reserved to the guest usage only.
      
      This fixes a host crash when destroying a VM in which a PCI adapter
      was passed-through. In this case, the interrupt is cleared and freed
      by the KVM device and then shutdown by vfio at the host level.
      
      [ 1007.360265] BUG: Kernel NULL pointer dereference at 0x00000d00
      [ 1007.360285] Faulting instruction address: 0xc00000000009da34
      [ 1007.360296] Oops: Kernel access of bad area, sig: 7 [#1]
      [ 1007.360303] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
      [ 1007.360314] Modules linked in: vhost_net vhost iptable_mangle ipt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc kvm_hv kvm xt_tcpudp iptable_filter squashfs fuse binfmt_misc vmx_crypto ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi nfsd ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath mlx5_ib ib_uverbs ib_core crc32c_vpmsum mlx5_core
      [ 1007.360425] CPU: 9 PID: 15576 Comm: CPU 18/KVM Kdump: loaded Not tainted 5.1.0-gad7e7d0ef #4
      [ 1007.360454] NIP:  c00000000009da34 LR: c00000000009e50c CTR: c00000000009e5d0
      [ 1007.360482] REGS: c000007f24ccf330 TRAP: 0300   Not tainted  (5.1.0-gad7e7d0ef)
      [ 1007.360500] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002484  XER: 00000000
      [ 1007.360532] CFAR: c00000000009da10 DAR: 0000000000000d00 DSISR: 00080000 IRQMASK: 1
      [ 1007.360532] GPR00: c00000000009e62c c000007f24ccf5c0 c000000001510600 c000007fe7f947c0
      [ 1007.360532] GPR04: 0000000000000d00 0000000000000000 0000000000000000 c000005eff02d200
      [ 1007.360532] GPR08: 0000000000400000 0000000000000000 0000000000000000 fffffffffffffffd
      [ 1007.360532] GPR12: c00000000009e5d0 c000007fffff7b00 0000000000000031 000000012c345718
      [ 1007.360532] GPR16: 0000000000000000 0000000000000008 0000000000418004 0000000000040100
      [ 1007.360532] GPR20: 0000000000000000 0000000008430000 00000000003c0000 0000000000000027
      [ 1007.360532] GPR24: 00000000000000ff 0000000000000000 00000000000000ff c000007faa90d98c
      [ 1007.360532] GPR28: c000007faa90da40 00000000000fe040 ffffffffffffffff c000007fe7f947c0
      [ 1007.360689] NIP [c00000000009da34] xive_esb_read+0x34/0x120
      [ 1007.360706] LR [c00000000009e50c] xive_do_source_set_mask.part.0+0x2c/0x50
      [ 1007.360732] Call Trace:
      [ 1007.360738] [c000007f24ccf5c0] [c000000000a6383c] snooze_loop+0x15c/0x270 (unreliable)
      [ 1007.360775] [c000007f24ccf5f0] [c00000000009e62c] xive_irq_shutdown+0x5c/0xe0
      [ 1007.360795] [c000007f24ccf630] [c00000000019e4a0] irq_shutdown+0x60/0xe0
      [ 1007.360813] [c000007f24ccf660] [c000000000198c44] __free_irq+0x3a4/0x420
      [ 1007.360831] [c000007f24ccf700] [c000000000198dc8] free_irq+0x78/0xe0
      [ 1007.360849] [c000007f24ccf730] [c00000000096c5a8] vfio_msi_set_vector_signal+0xa8/0x350
      [ 1007.360878] [c000007f24ccf7f0] [c00000000096c938] vfio_msi_set_block+0xe8/0x1e0
      [ 1007.360899] [c000007f24ccf850] [c00000000096cae0] vfio_msi_disable+0xb0/0x110
      [ 1007.360912] [c000007f24ccf8a0] [c00000000096cd04] vfio_pci_set_msi_trigger+0x1c4/0x3d0
      [ 1007.360922] [c000007f24ccf910] [c00000000096d910] vfio_pci_set_irqs_ioctl+0xa0/0x170
      [ 1007.360941] [c000007f24ccf930] [c00000000096b400] vfio_pci_disable+0x80/0x5e0
      [ 1007.360963] [c000007f24ccfa10] [c00000000096b9bc] vfio_pci_release+0x5c/0x90
      [ 1007.360991] [c000007f24ccfa40] [c000000000963a9c] vfio_device_fops_release+0x3c/0x70
      [ 1007.361012] [c000007f24ccfa70] [c0000000003b5668] __fput+0xc8/0x2b0
      [ 1007.361040] [c000007f24ccfac0] [c0000000001409b0] task_work_run+0x140/0x1b0
      [ 1007.361059] [c000007f24ccfb20] [c000000000118f8c] do_exit+0x3ac/0xd00
      [ 1007.361076] [c000007f24ccfc00] [c0000000001199b0] do_group_exit+0x60/0x100
      [ 1007.361094] [c000007f24ccfc40] [c00000000012b514] get_signal+0x1a4/0x8f0
      [ 1007.361112] [c000007f24ccfd30] [c000000000021cc8] do_notify_resume+0x1a8/0x430
      [ 1007.361141] [c000007f24ccfe20] [c00000000000e444] ret_from_except_lite+0x70/0x74
      [ 1007.361159] Instruction dump:
      [ 1007.361175] 38422c00 e9230000 712a0004 41820010 548a2036 7d442378 78840020 71290020
      [ 1007.361194] 4082004c e9230010 7c892214 7c0004ac <e9240000> 0c090000 4c00012c 792a0022
      
      Cc: stable@vger.kernel.org # v4.12+
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ef974020
    • C
      KVM: PPC: Book3S HV: XIVE: Introduce a new mutex for the XIVE device · 7e10b9a6
      Cédric Le Goater 提交于
      The XICS-on-XIVE KVM device needs to allocate XIVE event queues when a
      priority is used by the OS. This is referred as EQ provisioning and it
      is done under the hood when :
      
        1. a CPU is hot-plugged in the VM
        2. the "set-xive" is called at VM startup
        3. sources are restored at VM restore
      
      The kvm->lock mutex is used to protect the different XIVE structures
      being modified but in some contexts, kvm->lock is taken under the
      vcpu->mutex which is not permitted by the KVM locking rules.
      
      Introduce a new mutex 'lock' for the KVM devices for them to
      synchronize accesses to the XIVE device structures.
      Reviewed-by: NGreg Kurz <groug@kaod.org>
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7e10b9a6
  6. 29 5月, 2019 1 次提交