1. 05 6月, 2019 5 次提交
    • W
      KVM: LAPIC: Delay trace_kvm_wait_lapic_expire tracepoint to after vmexit · ec0671d5
      Wanpeng Li 提交于
      wait_lapic_expire() call was moved above guest_enter_irqoff() because of
      its tracepoint, which violated the RCU extended quiescent state invoked
      by guest_enter_irqoff()[1][2]. This patch simply moves the tracepoint
      below guest_exit_irqoff() in vcpu_enter_guest(). Snapshot the delta before
      VM-Enter, but trace it after VM-Exit. This can help us to move
      wait_lapic_expire() just before vmentry in the later patch.
      
      [1] Commit 8b89fe1f ("kvm: x86: move tracepoints outside extended quiescent state")
      [2] https://patchwork.kernel.org/patch/7821111/
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      [Track whether wait_lapic_expire was called, and do not invoke the tracepoint
       if not. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ec0671d5
    • W
      KVM: LAPIC: Extract adaptive tune timer advancement logic · 84ea3aca
      Wanpeng Li 提交于
      Extract adaptive tune timer advancement logic to a single function.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      [Rename new function. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      84ea3aca
    • V
      KVM/nSVM: properly map nested VMCB · 8f38302c
      Vitaly Kuznetsov 提交于
      Commit 8c5fbf1a ("KVM/nSVM: Use the new mapping API for mapping guest
      memory") broke nested SVM completely: kvm_vcpu_map()'s second parameter is
      GFN so vmcb_gpa needs to be converted with gpa_to_gfn(), not the other way
      around.
      
      Fixes: 8c5fbf1a ("KVM/nSVM: Use the new mapping API for mapping guest memory")
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8f38302c
    • K
      kvm: x86: Fix reserved bits related calculation errors caused by MKTME · f3ecb59d
      Kai Huang 提交于
      Intel MKTME repurposes several high bits of physical address as 'keyID'
      for memory encryption thus effectively reduces platform's maximum
      physical address bits. Exactly how many bits are reduced is configured
      by BIOS. To honor such HW behavior, the repurposed bits are reduced from
      cpuinfo_x86->x86_phys_bits when MKTME is detected in CPU detection.
      Similarly, AMD SME/SEV also reduces physical address bits for memory
      encryption, and cpuinfo->x86_phys_bits is reduced too when SME/SEV is
      detected, so for both MKTME and SME/SEV, boot_cpu_data.x86_phys_bits
      doesn't hold physical address bits reported by CPUID anymore.
      
      Currently KVM treats bits from boot_cpu_data.x86_phys_bits to 51 as
      reserved bits, but it's not true anymore for MKTME, since MKTME treats
      those reduced bits as 'keyID', but not reserved bits. Therefore
      boot_cpu_data.x86_phys_bits cannot be used to calculate reserved bits
      anymore, although we can still use it for AMD SME/SEV since SME/SEV
      treats the reduced bits differently -- they are treated as reserved
      bits, the same as other reserved bits in page table entity [1].
      
      Fix by introducing a new 'shadow_phys_bits' variable in KVM x86 MMU code
      to store the effective physical bits w/o reserved bits -- for MKTME,
      it equals to physical address reported by CPUID, and for SME/SEV, it is
      boot_cpu_data.x86_phys_bits.
      
      Note that for the physical address bits reported to guest should remain
      unchanged -- KVM should report physical address reported by CPUID to
      guest, but not boot_cpu_data.x86_phys_bits. Because for Intel MKTME,
      there's no harm if guest sets up 'keyID' bits in guest page table (since
      MKTME only works at physical address level), and KVM doesn't even expose
      MKTME to guest. Arguably, for AMD SME/SEV, guest is aware of SEV thus it
      should adjust boot_cpu_data.x86_phys_bits when it detects SEV, therefore
      KVM should still reports physcial address reported by CPUID to guest.
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NKai Huang <kai.huang@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f3ecb59d
    • K
      kvm: x86: Move kvm_set_mmio_spte_mask() from x86.c to mmu.c · 7b6f8a06
      Kai Huang 提交于
      As a prerequisite to fix several SPTE reserved bits related calculation
      errors caused by MKTME, which requires kvm_set_mmio_spte_mask() to use
      local static variable defined in mmu.c.
      
      Also move call site of kvm_set_mmio_spte_mask() from kvm_arch_init() to
      kvm_mmu_module_init() so that kvm_set_mmio_spte_mask() can be static.
      Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NKai Huang <kai.huang@linux.intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7b6f8a06
  2. 30 5月, 2019 6 次提交
    • S
      KVM: PPC: Book3S HV: Restore SPRG3 in kvmhv_p9_guest_entry() · d724c9e5
      Suraj Jitindar Singh 提交于
      The sprgs are a set of 4 general purpose sprs provided for software use.
      SPRG3 is special in that it can also be read from userspace. Thus it is
      used on linux to store the cpu and numa id of the process to speed up
      syscall access to this information.
      
      This register is overwritten with the guest value on kvm guest entry,
      and so needs to be restored on exit again. Thus restore the value on
      the guest exit path in kvmhv_p9_guest_entry().
      
      Cc: stable@vger.kernel.org # v4.20+
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      d724c9e5
    • P
      KVM: PPC: Book3S HV: Fix lockdep warning when entering guest on POWER9 · 1b28d553
      Paul Mackerras 提交于
      Commit 3309bec8 ("KVM: PPC: Book3S HV: Fix lockdep warning when
      entering the guest") moved calls to trace_hardirqs_{on,off} in the
      entry path used for HPT guests.  Similar code exists in the new
      streamlined entry path used for radix guests on POWER9.  This makes
      the same change there, so as to avoid lockdep warnings such as this:
      
      [  228.686461] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      [  228.686480] WARNING: CPU: 116 PID: 3803 at ../kernel/locking/lockdep.c:4219 check_flags.part.23+0x21c/0x270
      [  228.686544] Modules linked in: vhost_net vhost xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat
      +xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter
      +ebtables ip6table_filter ip6_tables iptable_filter fuse kvm_hv kvm at24 ipmi_powernv regmap_i2c ipmi_devintf
      +uio_pdrv_genirq ofpart ipmi_msghandler uio powernv_flash mtd ibmpowernv opal_prd ip_tables ext4 mbcache jbd2 btrfs
      +zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor
      +raid6_pq raid1 raid0 ses sd_mod enclosure scsi_transport_sas ast i2c_opal i2c_algo_bit drm_kms_helper syscopyarea
      +sysfillrect sysimgblt fb_sys_fops ttm drm i40e e1000e cxl aacraid tg3 drm_panel_orientation_quirks i2c_core
      [  228.686859] CPU: 116 PID: 3803 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc1-xive+ #42
      [  228.686911] NIP:  c0000000001b394c LR: c0000000001b3948 CTR: c000000000bfad20
      [  228.686963] REGS: c000200cdb50f570 TRAP: 0700   Not tainted  (5.2.0-rc1-xive+)
      [  228.687001] MSR:  9000000002823033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 48222222  XER: 20040000
      [  228.687060] CFAR: c000000000116db0 IRQMASK: 1
      [  228.687060] GPR00: c0000000001b3948 c000200cdb50f800 c0000000015e7600 000000000000002e
      [  228.687060] GPR04: 0000000000000001 c0000000001c71a0 000000006e655f73 72727563284e4f5f
      [  228.687060] GPR08: 0000200e60680000 0000000000000000 c000200cdb486180 0000000000000000
      [  228.687060] GPR12: 0000000000002000 c000200fff61a680 0000000000000000 00007fffb75c0000
      [  228.687060] GPR16: 0000000000000000 0000000000000000 c0000000017d6900 c000000001124900
      [  228.687060] GPR20: 0000000000000074 c008000006916f68 0000000000000074 0000000000000074
      [  228.687060] GPR24: ffffffffffffffff ffffffffffffffff 0000000000000003 c000200d4b600000
      [  228.687060] GPR28: c000000001627e58 c000000001489908 c000000001627e58 c000000002304de0
      [  228.687377] NIP [c0000000001b394c] check_flags.part.23+0x21c/0x270
      [  228.687415] LR [c0000000001b3948] check_flags.part.23+0x218/0x270
      [  228.687466] Call Trace:
      [  228.687488] [c000200cdb50f800] [c0000000001b3948] check_flags.part.23+0x218/0x270 (unreliable)
      [  228.687542] [c000200cdb50f870] [c0000000001b6548] lock_is_held_type+0x188/0x1c0
      [  228.687595] [c000200cdb50f8d0] [c0000000001d939c] rcu_read_lock_sched_held+0xdc/0x100
      [  228.687646] [c000200cdb50f900] [c0000000001dd704] rcu_note_context_switch+0x304/0x340
      [  228.687701] [c000200cdb50f940] [c0080000068fcc58] kvmhv_run_single_vcpu+0xdb0/0x1120 [kvm_hv]
      [  228.687756] [c000200cdb50fa20] [c0080000068fd5b0] kvmppc_vcpu_run_hv+0x5e8/0xe40 [kvm_hv]
      [  228.687816] [c000200cdb50faf0] [c0080000071797dc] kvmppc_vcpu_run+0x34/0x48 [kvm]
      [  228.687863] [c000200cdb50fb10] [c0080000071755dc] kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
      [  228.687916] [c000200cdb50fba0] [c008000007165ccc] kvm_vcpu_ioctl+0x424/0x838 [kvm]
      [  228.687957] [c000200cdb50fd10] [c000000000433a24] do_vfs_ioctl+0xd4/0xcd0
      [  228.687995] [c000200cdb50fdb0] [c000000000434724] ksys_ioctl+0x104/0x120
      [  228.688033] [c000200cdb50fe00] [c000000000434768] sys_ioctl+0x28/0x80
      [  228.688072] [c000200cdb50fe20] [c00000000000b888] system_call+0x5c/0x70
      [  228.688109] Instruction dump:
      [  228.688142] 4bf6342d 60000000 0fe00000 e8010080 7c0803a6 4bfffe60 3c82ff87 3c62ff87
      [  228.688196] 388472d0 3863d738 4bf63405 60000000 <0fe00000> 4bffff4c 3c82ff87 3c62ff87
      [  228.688251] irq event stamp: 205
      [  228.688287] hardirqs last  enabled at (205): [<c0080000068fc1b4>] kvmhv_run_single_vcpu+0x30c/0x1120 [kvm_hv]
      [  228.688344] hardirqs last disabled at (204): [<c0080000068fbff0>] kvmhv_run_single_vcpu+0x148/0x1120 [kvm_hv]
      [  228.688412] softirqs last  enabled at (180): [<c000000000c0b2ac>] __do_softirq+0x4ac/0x5d4
      [  228.688464] softirqs last disabled at (169): [<c000000000122aa8>] irq_exit+0x1f8/0x210
      [  228.688513] ---[ end trace eb16f6260022a812 ]---
      [  228.688548] possible reason: unannotated irqs-off.
      [  228.688571] irq event stamp: 205
      [  228.688607] hardirqs last  enabled at (205): [<c0080000068fc1b4>] kvmhv_run_single_vcpu+0x30c/0x1120 [kvm_hv]
      [  228.688664] hardirqs last disabled at (204): [<c0080000068fbff0>] kvmhv_run_single_vcpu+0x148/0x1120 [kvm_hv]
      [  228.688719] softirqs last  enabled at (180): [<c000000000c0b2ac>] __do_softirq+0x4ac/0x5d4
      [  228.688758] softirqs last disabled at (169): [<c000000000122aa8>] irq_exit+0x1f8/0x210
      
      Cc: stable@vger.kernel.org # v4.20+
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NCédric Le Goater <clg@kaod.org>
      Tested-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1b28d553
    • C
      KVM: PPC: Book3S HV: XIVE: Fix page offset when clearing ESB pages · bcaa3110
      Cédric Le Goater 提交于
      Under XIVE, the ESB pages of an interrupt are used for interrupt
      management (EOI) and triggering. They are made available to guests
      through a mapping of the XIVE KVM device.
      
      When a device is passed-through, the passthru_irq helpers,
      kvmppc_xive_set_mapped() and kvmppc_xive_clr_mapped(), clear the ESB
      pages of the guest IRQ number being mapped and let the VM fault
      handler repopulate with the correct page.
      
      The ESB pages are mapped at offset 4 (KVM_XIVE_ESB_PAGE_OFFSET) in the
      KVM device mapping. Unfortunately, this offset was not taken into
      account when clearing the pages. This lead to issues with the
      passthrough devices for which the interrupts were not functional under
      some guest configuration (tg3 and single CPU) or in any configuration
      (e1000e adapter).
      Reviewed-by: NGreg Kurz <groug@kaod.org>
      Tested-by: NGreg Kurz <groug@kaod.org>
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      bcaa3110
    • C
      KVM: PPC: Book3S HV: XIVE: Take the srcu read lock when accessing memslots · aedb5b19
      Cédric Le Goater 提交于
      According to Documentation/virtual/kvm/locking.txt, the srcu read lock
      should be taken when accessing the memslots of the VM. The XIVE KVM
      device needs to do so when configuring the page of the OS event queue
      of vCPU for a given priority and when marking the same page dirty
      before migration.
      
      This avoids warnings such as :
      
      [  208.224882] =============================
      [  208.224884] WARNING: suspicious RCU usage
      [  208.224889] 5.2.0-rc2-xive+ #47 Not tainted
      [  208.224890] -----------------------------
      [  208.224894] ../include/linux/kvm_host.h:633 suspicious rcu_dereference_check() usage!
      [  208.224896]
                     other info that might help us debug this:
      
      [  208.224898]
                     rcu_scheduler_active = 2, debug_locks = 1
      [  208.224901] no locks held by qemu-system-ppc/3923.
      [  208.224902]
                     stack backtrace:
      [  208.224907] CPU: 64 PID: 3923 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc2-xive+ #47
      [  208.224909] Call Trace:
      [  208.224918] [c000200cdd98fa30] [c000000000be1934] dump_stack+0xe8/0x164 (unreliable)
      [  208.224924] [c000200cdd98fa80] [c0000000001aec80] lockdep_rcu_suspicious+0x110/0x180
      [  208.224935] [c000200cdd98fb00] [c0080000075933a0] gfn_to_memslot+0x1c8/0x200 [kvm]
      [  208.224943] [c000200cdd98fb40] [c008000007599600] gfn_to_pfn+0x28/0x60 [kvm]
      [  208.224951] [c000200cdd98fb70] [c008000007599658] gfn_to_page+0x20/0x40 [kvm]
      [  208.224959] [c000200cdd98fb90] [c0080000075b495c] kvmppc_xive_native_set_attr+0x8b4/0x1480 [kvm]
      [  208.224967] [c000200cdd98fca0] [c00800000759261c] kvm_device_ioctl_attr+0x64/0xb0 [kvm]
      [  208.224974] [c000200cdd98fcf0] [c008000007592730] kvm_device_ioctl+0xc8/0x110 [kvm]
      [  208.224979] [c000200cdd98fd10] [c000000000433a24] do_vfs_ioctl+0xd4/0xcd0
      [  208.224981] [c000200cdd98fdb0] [c000000000434724] ksys_ioctl+0x104/0x120
      [  208.224984] [c000200cdd98fe00] [c000000000434768] sys_ioctl+0x28/0x80
      [  208.224988] [c000200cdd98fe20] [c00000000000b888] system_call+0x5c/0x70
      legoater@boss01:~$
      
      Fixes: 13ce3297 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
      Fixes: e6714bd1 ("KVM: PPC: Book3S HV: XIVE: Add a control to dirty the XIVE EQ pages")
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      aedb5b19
    • C
      KVM: PPC: Book3S HV: XIVE: Do not clear IRQ data of passthrough interrupts · ef974020
      Cédric Le Goater 提交于
      The passthrough interrupts are defined at the host level and their IRQ
      data should not be cleared unless specifically deconfigured (shutdown)
      by the host. They differ from the IPI interrupts which are allocated
      by the XIVE KVM device and reserved to the guest usage only.
      
      This fixes a host crash when destroying a VM in which a PCI adapter
      was passed-through. In this case, the interrupt is cleared and freed
      by the KVM device and then shutdown by vfio at the host level.
      
      [ 1007.360265] BUG: Kernel NULL pointer dereference at 0x00000d00
      [ 1007.360285] Faulting instruction address: 0xc00000000009da34
      [ 1007.360296] Oops: Kernel access of bad area, sig: 7 [#1]
      [ 1007.360303] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
      [ 1007.360314] Modules linked in: vhost_net vhost iptable_mangle ipt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc kvm_hv kvm xt_tcpudp iptable_filter squashfs fuse binfmt_misc vmx_crypto ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi nfsd ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath mlx5_ib ib_uverbs ib_core crc32c_vpmsum mlx5_core
      [ 1007.360425] CPU: 9 PID: 15576 Comm: CPU 18/KVM Kdump: loaded Not tainted 5.1.0-gad7e7d0ef #4
      [ 1007.360454] NIP:  c00000000009da34 LR: c00000000009e50c CTR: c00000000009e5d0
      [ 1007.360482] REGS: c000007f24ccf330 TRAP: 0300   Not tainted  (5.1.0-gad7e7d0ef)
      [ 1007.360500] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002484  XER: 00000000
      [ 1007.360532] CFAR: c00000000009da10 DAR: 0000000000000d00 DSISR: 00080000 IRQMASK: 1
      [ 1007.360532] GPR00: c00000000009e62c c000007f24ccf5c0 c000000001510600 c000007fe7f947c0
      [ 1007.360532] GPR04: 0000000000000d00 0000000000000000 0000000000000000 c000005eff02d200
      [ 1007.360532] GPR08: 0000000000400000 0000000000000000 0000000000000000 fffffffffffffffd
      [ 1007.360532] GPR12: c00000000009e5d0 c000007fffff7b00 0000000000000031 000000012c345718
      [ 1007.360532] GPR16: 0000000000000000 0000000000000008 0000000000418004 0000000000040100
      [ 1007.360532] GPR20: 0000000000000000 0000000008430000 00000000003c0000 0000000000000027
      [ 1007.360532] GPR24: 00000000000000ff 0000000000000000 00000000000000ff c000007faa90d98c
      [ 1007.360532] GPR28: c000007faa90da40 00000000000fe040 ffffffffffffffff c000007fe7f947c0
      [ 1007.360689] NIP [c00000000009da34] xive_esb_read+0x34/0x120
      [ 1007.360706] LR [c00000000009e50c] xive_do_source_set_mask.part.0+0x2c/0x50
      [ 1007.360732] Call Trace:
      [ 1007.360738] [c000007f24ccf5c0] [c000000000a6383c] snooze_loop+0x15c/0x270 (unreliable)
      [ 1007.360775] [c000007f24ccf5f0] [c00000000009e62c] xive_irq_shutdown+0x5c/0xe0
      [ 1007.360795] [c000007f24ccf630] [c00000000019e4a0] irq_shutdown+0x60/0xe0
      [ 1007.360813] [c000007f24ccf660] [c000000000198c44] __free_irq+0x3a4/0x420
      [ 1007.360831] [c000007f24ccf700] [c000000000198dc8] free_irq+0x78/0xe0
      [ 1007.360849] [c000007f24ccf730] [c00000000096c5a8] vfio_msi_set_vector_signal+0xa8/0x350
      [ 1007.360878] [c000007f24ccf7f0] [c00000000096c938] vfio_msi_set_block+0xe8/0x1e0
      [ 1007.360899] [c000007f24ccf850] [c00000000096cae0] vfio_msi_disable+0xb0/0x110
      [ 1007.360912] [c000007f24ccf8a0] [c00000000096cd04] vfio_pci_set_msi_trigger+0x1c4/0x3d0
      [ 1007.360922] [c000007f24ccf910] [c00000000096d910] vfio_pci_set_irqs_ioctl+0xa0/0x170
      [ 1007.360941] [c000007f24ccf930] [c00000000096b400] vfio_pci_disable+0x80/0x5e0
      [ 1007.360963] [c000007f24ccfa10] [c00000000096b9bc] vfio_pci_release+0x5c/0x90
      [ 1007.360991] [c000007f24ccfa40] [c000000000963a9c] vfio_device_fops_release+0x3c/0x70
      [ 1007.361012] [c000007f24ccfa70] [c0000000003b5668] __fput+0xc8/0x2b0
      [ 1007.361040] [c000007f24ccfac0] [c0000000001409b0] task_work_run+0x140/0x1b0
      [ 1007.361059] [c000007f24ccfb20] [c000000000118f8c] do_exit+0x3ac/0xd00
      [ 1007.361076] [c000007f24ccfc00] [c0000000001199b0] do_group_exit+0x60/0x100
      [ 1007.361094] [c000007f24ccfc40] [c00000000012b514] get_signal+0x1a4/0x8f0
      [ 1007.361112] [c000007f24ccfd30] [c000000000021cc8] do_notify_resume+0x1a8/0x430
      [ 1007.361141] [c000007f24ccfe20] [c00000000000e444] ret_from_except_lite+0x70/0x74
      [ 1007.361159] Instruction dump:
      [ 1007.361175] 38422c00 e9230000 712a0004 41820010 548a2036 7d442378 78840020 71290020
      [ 1007.361194] 4082004c e9230010 7c892214 7c0004ac <e9240000> 0c090000 4c00012c 792a0022
      
      Cc: stable@vger.kernel.org # v4.12+
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ef974020
    • C
      KVM: PPC: Book3S HV: XIVE: Introduce a new mutex for the XIVE device · 7e10b9a6
      Cédric Le Goater 提交于
      The XICS-on-XIVE KVM device needs to allocate XIVE event queues when a
      priority is used by the OS. This is referred as EQ provisioning and it
      is done under the hood when :
      
        1. a CPU is hot-plugged in the VM
        2. the "set-xive" is called at VM startup
        3. sources are restored at VM restore
      
      The kvm->lock mutex is used to protect the different XIVE structures
      being modified but in some contexts, kvm->lock is taken under the
      vcpu->mutex which is not permitted by the KVM locking rules.
      
      Introduce a new mutex 'lock' for the KVM devices for them to
      synchronize accesses to the XIVE device structures.
      Reviewed-by: NGreg Kurz <groug@kaod.org>
      Signed-off-by: NCédric Le Goater <clg@kaod.org>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7e10b9a6
  3. 29 5月, 2019 7 次提交
  4. 28 5月, 2019 1 次提交
    • T
      KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID · a86cb413
      Thomas Huth 提交于
      KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all
      architectures. However, on s390x, the amount of usable CPUs is determined
      during runtime - it is depending on the features of the machine the code
      is running on. Since we are using the vcpu_id as an index into the SCA
      structures that are defined by the hardware (see e.g. the sca_add_vcpu()
      function), it is not only the amount of CPUs that is limited by the hard-
      ware, but also the range of IDs that we can use.
      Thus KVM_CAP_MAX_VCPU_ID must be determined during runtime on s390x, too.
      So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
      code into the architecture specific code, and on s390x we have to return
      the same value here as for KVM_CAP_MAX_VCPUS.
      This problem has been discovered with the kvm_create_max_vcpus selftest.
      With this change applied, the selftest now passes on s390x, too.
      Reviewed-by: NAndrew Jones <drjones@redhat.com>
      Reviewed-by: NCornelia Huck <cohuck@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NThomas Huth <thuth@redhat.com>
      Message-Id: <20190523164309.13345-9-thuth@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      a86cb413
  5. 25 5月, 2019 16 次提交
  6. 24 5月, 2019 5 次提交