1. 31 Mar 2020: 1 commit
  2. 27 Mar 2020: 1 commit
  3. 26 Mar 2020: 4 commits
  4. 24 Mar 2020: 17 commits
  5. 19 Mar 2020: 8 commits
    • KVM: PPC: Kill kvmppc_ops::mmu_destroy() and kvmppc_mmu_destroy() · 6fef0c6b
      Authored by Greg Kurz
      These are only used by HV KVM and BookE, and in both cases they are
      nops.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S PR: Move kvmppc_mmu_init() into PR KVM · 3f1268dd
      Authored by Greg Kurz
      This is only relevant to PR KVM. Make it obvious by moving the
      function declaration to the Book3s header and rename it with
      a _pr suffix.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S PR: Fix kernel crash with PR KVM · b2fa4f90
      Authored by Greg Kurz
      With PR KVM, shutting down a VM causes the host kernel to crash:
      
      [  314.219284] BUG: Unable to handle kernel data access on read at 0xc00800000176c638
      [  314.219299] Faulting instruction address: 0xc008000000d4ddb0
      cpu 0x0: Vector: 300 (Data Access) at [c00000036da077a0]
          pc: c008000000d4ddb0: kvmppc_mmu_pte_flush_all+0x68/0xd0 [kvm_pr]
          lr: c008000000d4dd94: kvmppc_mmu_pte_flush_all+0x4c/0xd0 [kvm_pr]
          sp: c00000036da07a30
         msr: 900000010280b033
         dar: c00800000176c638
       dsisr: 40000000
        current = 0xc00000036d4c0000
        paca    = 0xc000000001a00000   irqmask: 0x03   irq_happened: 0x01
          pid   = 1992, comm = qemu-system-ppc
      Linux version 5.6.0-master-gku+ (greg@palmb) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #17 SMP Wed Mar 18 13:49:29 CET 2020
      enter ? for help
      [c00000036da07ab0] c008000000d4fbe0 kvmppc_mmu_destroy_pr+0x28/0x60 [kvm_pr]
      [c00000036da07ae0] c0080000009eab8c kvmppc_mmu_destroy+0x34/0x50 [kvm]
      [c00000036da07b00] c0080000009e50c0 kvm_arch_vcpu_destroy+0x108/0x140 [kvm]
      [c00000036da07b30] c0080000009d1b50 kvm_vcpu_destroy+0x28/0x80 [kvm]
      [c00000036da07b60] c0080000009e4434 kvm_arch_destroy_vm+0xbc/0x190 [kvm]
      [c00000036da07ba0] c0080000009d9c2c kvm_put_kvm+0x1d4/0x3f0 [kvm]
      [c00000036da07c00] c0080000009da760 kvm_vm_release+0x38/0x60 [kvm]
      [c00000036da07c30] c000000000420be0 __fput+0xe0/0x310
      [c00000036da07c90] c0000000001747a0 task_work_run+0x150/0x1c0
      [c00000036da07cf0] c00000000014896c do_exit+0x44c/0xd00
      [c00000036da07dc0] c0000000001492f4 do_group_exit+0x64/0xd0
      [c00000036da07e00] c000000000149384 sys_exit_group+0x24/0x30
      [c00000036da07e20] c00000000000b9d0 system_call+0x5c/0x68
      
      This is caused by a use-after-free in kvmppc_mmu_pte_flush_all()
      which dereferences vcpu->arch.book3s which was previously freed by
      kvmppc_core_vcpu_free_pr(). This happens because kvmppc_mmu_destroy()
      is called after kvmppc_core_vcpu_free() since commit ff030fdf
      ("KVM: PPC: Move kvm_vcpu_init() invocation to common code").
      
      The kvmppc_mmu_destroy() helper calls one of the following depending
      on the KVM backend:
      
      - kvmppc_mmu_destroy_hv() which does nothing (Book3s HV)
      
      - kvmppc_mmu_destroy_pr() which undoes the effects of
        kvmppc_mmu_init() (Book3s PR 32-bit)
      
      - kvmppc_mmu_destroy_pr() which undoes the effects of
        kvmppc_mmu_init() (Book3s PR 64-bit)
      
      - kvmppc_mmu_destroy_e500() which does nothing (BookE e500/e500mc)
      
      It turns out that this is only relevant to PR KVM. Both the 32-bit
      and 64-bit backends need vcpu->arch.book3s to be valid when calling
      kvmppc_mmu_destroy_pr(). So instead of calling kvmppc_mmu_destroy()
      from kvm_arch_vcpu_destroy(), call kvmppc_mmu_destroy_pr() at the
      beginning of kvmppc_core_vcpu_free_pr(). This is consistent with
      kvmppc_mmu_init() being the last call in kvmppc_core_vcpu_create_pr().
      
      For the same reason, if kvmppc_core_vcpu_create_pr() returns an
      error then this means that kvmppc_mmu_init() was either not called
      or failed, in which case kvmppc_mmu_destroy() should not be called.
      Drop the line in the error path of kvm_arch_vcpu_create().
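      The ordering fix can be illustrated with a small, self-contained sketch.
      The struct and helper names below are hypothetical stand-ins for the real
      KVM structures, not the kernel code: freeing vcpu->arch.book3s before
      the MMU teardown is the use-after-free, so the teardown must run first.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the PR KVM vcpu state. */
struct vcpu {
    void *book3s;        /* freed by the vcpu-free path */
    int   mmu_destroyed; /* set by the MMU teardown     */
};

/* Models kvmppc_mmu_destroy_pr(): it must see a valid book3s. */
static int mmu_destroy_pr(struct vcpu *v)
{
    if (!v->book3s)
        return -1;  /* this is where the use-after-free crashed */
    v->mmu_destroyed = 1;
    return 0;
}

/* Models the fixed kvmppc_core_vcpu_free_pr(): MMU teardown moved
 * to the beginning, before the backing state is freed. */
static int vcpu_free_pr(struct vcpu *v)
{
    int ret = mmu_destroy_pr(v);
    free(v->book3s);
    v->book3s = NULL;
    return ret;
}
```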
      
      Fixes: ff030fdf ("KVM: PPC: Move kvm_vcpu_init() invocation to common code")
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Use fallthrough; · 8fc6ba0a
      Authored by Joe Perches
      Convert the various uses of fallthrough comments to fallthrough;
      
      Done via script
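      The conversion is mechanical: a "fall through" comment becomes the
      fallthrough pseudo-keyword. A minimal sketch, with the kernel's macro
      approximated locally so the snippet compiles outside the kernel tree:

```c
/* Approximation of the kernel's fallthrough macro for use outside
 * the kernel tree. */
#if defined(__has_attribute)
# if __has_attribute(fallthrough)
#  define fallthrough __attribute__((fallthrough))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

static int classify(int op)
{
    int flags = 0;

    switch (op) {
    case 0:
        flags |= 1;
        fallthrough;    /* previously a "fall through" comment */
    case 1:
        flags |= 2;
        break;
    default:
        flags = -1;
    }
    return flags;
}
```

      Unlike a comment, the attribute lets the compiler warn when a case
      falls through without being annotated.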
      Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests · 1f50cc17
      Authored by Michael Roth
      The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest
      via the guest/nested hypervisor.
      
        ./run-tests.sh -v
        ...
        TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 2,threads=2 -machine cap-htm=on -append "h_cede_tm"
        FAIL h_cede_tm (2 tests, 1 unexpected failures)
      
      While the test relates to transactional memory instructions, the actual
      failure is due to the return code of the H_CEDE hypercall, which is
      reported as 224 instead of 0. This happens even when no TM instructions
      are issued.
      
      224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3
      is where the caller expects the return code to be placed upon return.
      
      In the case of a guest running under a nested hypervisor, issuing
      H_CEDE causes a return from H_ENTER_NESTED. In this case H_CEDE is
      handled specially and immediately, rather than later in
      kvmppc_pseries_do_hcall() as with most other hcalls, but we forget
      to set the return code for the caller, which is why kvm-unit-test
      sees the 224 return code and reports an error.
      
      Guest kernels generally don't check the return value of H_CEDE, so
      that likely explains why this hasn't caused issues outside of
      kvm-unit-tests so far.
      
      Fix this by setting r3 to 0 after we finish processing the H_CEDE.
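      The fix boils down to a single store into the guest's r3 after the
      special-cased H_CEDE processing. A hedged sketch with mocked register
      accessors (names modeled on, but not identical to, the real
      kvmppc_get_gpr/kvmppc_set_gpr helpers):

```c
#define H_CEDE    0xE0  /* 224: the hcall number, passed in r3 */
#define H_SUCCESS 0

struct vcpu { unsigned long gpr[32]; };

static void set_gpr(struct vcpu *v, int n, unsigned long val) { v->gpr[n] = val; }
static unsigned long get_gpr(struct vcpu *v, int n) { return v->gpr[n]; }

/* Models the nested-guest H_CEDE path. Before the fix, r3 still held
 * the hcall number (224) on return to the caller; the fix overwrites
 * it with the return code. */
static void handle_nested_cede(struct vcpu *v)
{
    /* ...cede processing would happen here... */
    set_gpr(v, 3, H_SUCCESS);  /* the one-line fix */
}
```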
      
      RHBZ: 1778556
      
      Fixes: 4bad7779 ("KVM: PPC: Book3S HV: Handle hypercalls correctly when nested")
      Cc: linuxppc-dev@ozlabs.org
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones · 1dff3064
      Authored by Gustavo Romero
      On P9 DD2.2 due to a CPU defect some TM instructions need to be emulated by
      KVM. This is handled at first by the hardware raising a softpatch interrupt
      when certain TM instructions that need KVM assistance are executed in the
      guest. Although some TM instructions are invalid forms per the Power
      ISA, they can raise a softpatch interrupt too. For instance, the
      'tresume.' instruction
      as defined in the ISA must have bit 31 set (1), but an instruction that
      matches 'tresume.' PO and XO opcode fields but has bit 31 not set (0), like
      0x7cfe9ddc, also raises a softpatch interrupt. Similarly for 'treclaim.'
      and 'trechkpt.' instructions with bit 31 = 0, i.e. 0x7c00075c and
      0x7c0007dc, respectively. Hence, if a code like the following is executed
      in the guest it will raise a softpatch interrupt just like a 'tresume.'
      when the TM facility is enabled ('tabort. 0' in the example is used only
      to enable the TM facility):
      
      int main() { asm("tabort. 0; .long 0x7cfe9ddc;"); }
      
      Currently in such a case KVM throws a complete trace like:
      
      [345523.705984] WARNING: CPU: 24 PID: 64413 at arch/powerpc/kvm/book3s_hv_tm.c:211 kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
      [345523.705985] Modules linked in: kvm_hv(E) xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat
      iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables ip6table_filter
      ip6_tables iptable_filter bridge stp llc sch_fq_codel ipmi_powernv at24 vmx_crypto ipmi_devintf ipmi_msghandler
      ibmpowernv uio_pdrv_genirq kvm opal_prd uio leds_powernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp
      libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456
      async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear tg3
      crct10dif_vpmsum crc32c_vpmsum ipr [last unloaded: kvm_hv]
      [345523.706030] CPU: 24 PID: 64413 Comm: CPU 0/KVM Tainted: G        W   E     5.5.0+ #1
      [345523.706031] NIP:  c0080000072cb9c0 LR: c0080000072b5e80 CTR: c0080000085c7850
      [345523.706034] REGS: c000000399467680 TRAP: 0700   Tainted: G        W   E      (5.5.0+)
      [345523.706034] MSR:  900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24022428  XER: 00000000
      [345523.706042] CFAR: c0080000072b5e7c IRQMASK: 0
                      GPR00: c0080000072b5e80 c000000399467910 c0080000072db500 c000000375ccc720
                      GPR04: c000000375ccc720 00000003fbec0000 0000a10395dda5a6 0000000000000000
                      GPR08: 000000007cfe9ddc 7cfe9ddc000005dc 7cfe9ddc7c0005dc c0080000072cd530
                      GPR12: c0080000085c7850 c0000003fffeb800 0000000000000001 00007dfb737f0000
                      GPR16: c0002001edcca558 0000000000000000 0000000000000000 0000000000000001
                      GPR20: c000000001b21258 c0002001edcca558 0000000000000018 0000000000000000
                      GPR24: 0000000001000000 ffffffffffffffff 0000000000000001 0000000000001500
                      GPR28: c0002001edcc4278 c00000037dd80000 800000050280f033 c000000375ccc720
      [345523.706062] NIP [c0080000072cb9c0] kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
      [345523.706065] LR [c0080000072b5e80] kvmppc_handle_exit_hv.isra.53+0x3e8/0x798 [kvm_hv]
      [345523.706066] Call Trace:
      [345523.706069] [c000000399467910] [c000000399467940] 0xc000000399467940 (unreliable)
      [345523.706071] [c000000399467950] [c000000399467980] 0xc000000399467980
      [345523.706075] [c0000003994679f0] [c0080000072bd1c4] kvmhv_run_single_vcpu+0xa1c/0xb80 [kvm_hv]
      [345523.706079] [c000000399467ac0] [c0080000072bd8e0] kvmppc_vcpu_run_hv+0x5b8/0xb00 [kvm_hv]
      [345523.706087] [c000000399467b90] [c0080000085c93cc] kvmppc_vcpu_run+0x34/0x48 [kvm]
      [345523.706095] [c000000399467bb0] [c0080000085c582c] kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
      [345523.706101] [c000000399467c40] [c0080000085b7498] kvm_vcpu_ioctl+0x3d0/0x7b0 [kvm]
      [345523.706105] [c000000399467db0] [c0000000004adf9c] ksys_ioctl+0x13c/0x170
      [345523.706107] [c000000399467e00] [c0000000004adff8] sys_ioctl+0x28/0x80
      [345523.706111] [c000000399467e20] [c00000000000b278] system_call+0x5c/0x68
      [345523.706112] Instruction dump:
      [345523.706114] 419e0390 7f8a4840 409d0048 6d497c00 2f89075d 419e021c 6d497c00 2f8907dd
      [345523.706119] 419e01c0 6d497c00 2f8905dd 419e00a4 <0fe00000> 38210040 38600000 ebc1fff0
      
      and then treats the executed instruction as a 'nop'.
      
      However the POWER9 User's Manual, in section "4.6.10 Book II Invalid
      Forms", states that for TM instructions bit 31 is in fact ignored, so
      for the TM-related invalid forms ignoring bit 31 and handling them
      like the valid forms is an acceptable way to handle them. POWER8
      behaves the same way too.
      
      This commit changes the handling of the cases here described by treating
      the TM-related invalid forms that can generate a softpatch interrupt
      just like their valid forms (w/ bit 31 = 1) instead of as a 'nop' and by
      gently reporting any other unrecognized case to the host and treating it as
      illegal instruction instead of throwing a trace and treating it as a 'nop'.
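      Since bit 31 (the least significant bit of the 32-bit word, in IBM
      numbering) is ignored by the hardware, the invalid forms can simply be
      canonicalized to their valid counterparts before decoding. A hedged
      sketch using the opcode values quoted above (the helper name is
      illustrative, not KVM's):

```c
#include <stdint.h>

/* Valid TM forms have bit 31 (IBM numbering) set, i.e. the lowest
 * bit of the 32-bit instruction word. */
#define PPC_INST_BIT31 1u

/* Treat an invalid form (bit 31 clear) exactly like its valid
 * counterpart by forcing bit 31 on before the emulation switch. */
static uint32_t canonicalize_tm_inst(uint32_t inst)
{
    return inst | PPC_INST_BIT31;
}
```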
      Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
      Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
      Acked-by: Michael Neuling <mikey@neuling.org>
      Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Use RADIX_PTE_INDEX_SIZE in Radix MMU code · afd31356
      Authored by Michael Ellerman
      In kvmppc_unmap_free_pte() in book3s_64_mmu_radix.c, we use the
      non-constant value PTE_INDEX_SIZE to clear a PTE page.
      
      We can instead use the constant RADIX_PTE_INDEX_SIZE, because we know
      this code will only be running when the Radix MMU is active.
      
      Note that we already use RADIX_PTE_INDEX_SIZE for the allocation of
      kvm_pte_cache.
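      The change swaps the variable PTE_INDEX_SIZE for the Radix-specific
      constant when clearing a PTE page. A minimal sketch of the idea; the
      value 9 is an assumption corresponding to the 4K-page radix
      configuration, and the real kernel picks it from the page-size config:

```c
#include <string.h>
#include <stdint.h>
#include <stddef.h>

/* Assumed value for the 4K-page radix configuration. */
#define RADIX_PTE_INDEX_SIZE 9

typedef uint64_t pte_t;

/* Models clearing a PTE page with the constant, Radix-only index
 * size, as kvmppc_unmap_free_pte() now does. Returns the number of
 * entries cleared. */
static size_t clear_pte_page(pte_t *page)
{
    size_t n = (size_t)1 << RADIX_PTE_INDEX_SIZE;  /* 512 entries */
    memset(page, 0, n * sizeof(pte_t));            /* 4096 bytes  */
    return n;
}
```

      Using a compile-time constant here also lets the compiler emit a
      fixed-size clear instead of a variable-length one.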
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler · cd758a9b
      Authored by Paul Mackerras
      This makes the same changes in the page fault handler for HPT guests
      that commits 31c8b0d0 ("KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot()
      in page fault handler", 2018-03-01), 71d29f43 ("KVM: PPC: Book3S HV:
      Don't use compound_order to determine host mapping size", 2018-09-11)
      and 6579804c ("KVM: PPC: Book3S HV: Avoid crash from THP collapse
      during radix page fault", 2018-10-04) made for the page fault handler
      for radix guests.
      
      In summary, where we used to call get_user_pages_fast() and then do
      special handling for VM_PFNMAP vmas, we now call __get_user_pages_fast()
      and then __gfn_to_pfn_memslot() if that fails, followed by reading the
      Linux PTE to get the host PFN, host page size and mapping attributes.
      
      This also brings in the change from SetPageDirty() to set_page_dirty_lock()
      which was done for the radix page fault handler in commit c3856aeb
      ("KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault
      handler", 2018-02-23).
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  6. 18 Mar 2020: 6 commits
  7. 17 Mar 2020: 3 commits
    • KVM: VMX: access regs array in vmenter.S in its natural order · bb03911f
      Authored by Uros Bizjak
      Registers in "regs" array are indexed as rax/rcx/rdx/.../rsi/rdi/r8/...
      Reorder access to "regs" array in vmenter.S to follow its natural order.
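      For context, the "regs" array is indexed by the x86 hardware register
      encoding, which is the natural order referred to above. A sketch of
      that index order (mirroring, but not copied from, KVM's VCPU_REGS_*
      enum):

```c
/* Index order of the "regs" array, following the x86 register
 * encoding: rax/rcx/rdx/rbx/rsp/rbp/rsi/rdi, then r8..r15. */
enum vcpu_regs {
    REG_RAX = 0, REG_RCX, REG_RDX, REG_RBX,
    REG_RSP,     REG_RBP, REG_RSI, REG_RDI,
    REG_R8,      REG_R9,  REG_R10, REG_R11,
    REG_R12,     REG_R13, REG_R14, REG_R15,
};
```

      Accessing the array in this order in vmenter.S makes the save/restore
      sequence march linearly through memory.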
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: properly handle errors in nested_vmx_handle_enlightened_vmptrld() · b6a0653a
      Authored by Vitaly Kuznetsov
      nested_vmx_handle_enlightened_vmptrld() fails in two cases:
      - when we fail to kvm_vcpu_map() the supplied GPA
      - when revision_id is incorrect.
      Genuine Hyper-V raises #UD in the former case (at least with *some*
      incorrect GPAs) and does VMfailInvalid() in the latter. KVM doesn't
      do anything, so L1 just gets stuck retrying the same faulty VMLAUNCH.
      
      nested_vmx_handle_enlightened_vmptrld() has two call sites:
      nested_vmx_run() and nested_get_vmcs12_pages(). The former can
      reflect the failure to L1 the way genuine Hyper-V does; the latter
      can't do much: the failure there happens after migration when L2 was
      running (and L1 did something weird like wrote to the VP assist page
      from a different vCPU), so just kill L1 with KVM_EXIT_INTERNAL_ERROR.
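      The asymmetry between the two call sites can be sketched as follows.
      The enum values and helper names here are illustrative, not KVM's:

```c
enum evmptrld_status {
    EVMPTRLD_SUCCEEDED,
    EVMPTRLD_FAILED,   /* bad GPA or wrong revision_id */
};

enum action { ACT_NONE, ACT_REFLECT_TO_L1, ACT_KILL_L1 };

/* Failure at nested_vmx_run(): L1 just attempted a bad VMLAUNCH,
 * so the error can be reflected to it (#UD / VMfailInvalid). */
static enum action on_fail_at_run(enum evmptrld_status s)
{
    return s == EVMPTRLD_SUCCEEDED ? ACT_NONE : ACT_REFLECT_TO_L1;
}

/* Failure at nested_get_vmcs12_pages(): happens after migration
 * while L2 was running; nothing sensible is left to do, report
 * an internal error to userspace. */
static enum action on_fail_at_get_pages(enum evmptrld_status s)
{
    return s == EVMPTRLD_SUCCEEDED ? ACT_NONE : ACT_KILL_L1;
}
```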
      Reported-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      [Squash kbuild autopatch. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: stop abusing need_vmcs12_to_shadow_sync for eVMCS mapping · e942dbf8
      Authored by Vitaly Kuznetsov
      When vmx_set_nested_state() happens, we may not have all the required
      data to map enlightened VMCS: e.g. HV_X64_MSR_VP_ASSIST_PAGE MSR may not
      yet be restored so we need a postponed action. Currently, we (ab)use
      need_vmcs12_to_shadow_sync/nested_sync_vmcs12_to_shadow() for that but
      this is not ideal:
      - We may not need to sync anything if L2 is running
       - It is hard to propagate errors from nested_sync_vmcs12_to_shadow(),
         as we call it from vmx_prepare_switch_to_guest(), which happens just
         before we do VMLAUNCH; the code is not ready to handle errors there.
      
      Move eVMCS mapping to nested_get_vmcs12_pages() and request
      KVM_REQ_GET_VMCS12_PAGES; this seems to be less abusive in nature.
      It would probably be possible to introduce a specialized
      KVM_REQ_EVMCS_MAP, but it is undesirable to propagate eVMCS
      specifics all the way up to x86.c.
      
      Note, we don't need to request KVM_REQ_GET_VMCS12_PAGES from
      vmx_set_nested_state() directly as nested_vmx_enter_non_root_mode() already
      does that. Requesting KVM_REQ_GET_VMCS12_PAGES is done to document the
      (non-obvious) side-effect and to be future proof.
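      The postponed mapping follows KVM's standard request-bit idiom: record
      that work is pending, then perform it on the next guest entry once all
      state has been restored. A minimal sketch (mocked, not the real
      kvm_make_request/kvm_check_request API surface):

```c
#define REQ_GET_VMCS12_PAGES (1u << 0)

struct vcpu {
    unsigned requests;
    int evmcs_mapped;
};

/* Models kvm_make_request(): mark the work as pending. */
static void make_request(struct vcpu *v, unsigned req)
{
    v->requests |= req;
}

/* Models kvm_check_request(): consume the pending bit if set. */
static int check_request(struct vcpu *v, unsigned req)
{
    if (v->requests & req) {
        v->requests &= ~req;
        return 1;
    }
    return 0;
}

/* Runs just before guest entry: map the eVMCS now that all needed
 * state (e.g. the VP assist page MSR) has been restored. */
static void vcpu_enter(struct vcpu *v)
{
    if (check_request(v, REQ_GET_VMCS12_PAGES))
        v->evmcs_mapped = 1;
}
```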
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>