1. 18 May, 2018 1 commit
  2. 17 May, 2018 2 commits
  3. 15 May, 2018 1 commit
  4. 28 Mar, 2018 1 commit
    • KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot() in page fault handler · 31c8b0d0
      Committed by Paul Mackerras
      This changes the hypervisor page fault handler for radix guests to use
      the generic KVM __gfn_to_pfn_memslot() function instead of using
      get_user_pages_fast() and then handling the case of VM_PFNMAP vmas
      specially.  The old code missed the case of VM_IO vmas; with this
      change, VM_IO vmas will now be handled correctly by code within
      __gfn_to_pfn_memslot.
      
      Currently, __gfn_to_pfn_memslot calls hva_to_pfn, which only uses
      __get_user_pages_fast for the initial lookup in the cases where
      either atomic or async is set.  Since we are not setting either
      atomic or async, we do our own __get_user_pages_fast first, for now.
      
      This also adds code to check for the KVM_MEM_READONLY flag on the
      memslot.  If it is set and this is a write access, we synthesize a
      data storage interrupt for the guest.
      
      In the case where the page is not normal RAM (i.e. page == NULL in
      kvmppc_book3s_radix_page_fault()), we read the PTE from the Linux page
      tables because we need the mapping attribute bits as well as the PFN.
      (The mapping attribute bits indicate whether accesses have to be
      non-cacheable and/or guarded.)
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
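The read-only memslot check described in this commit can be sketched as follows. This is a simplified Python model, not the kernel code: the `Memslot` class, the `classify_fault` function and the returned strings are hypothetical names used for illustration; only the `KVM_MEM_READONLY` flag comes from the KVM API.

```python
from dataclasses import dataclass

# Memslot flag values as defined in the KVM userspace API header.
KVM_MEM_LOG_DIRTY_PAGES = 1 << 0
KVM_MEM_READONLY = 1 << 1


@dataclass
class Memslot:
    """Hypothetical stand-in for a KVM memory slot; only the flags matter here."""
    flags: int = 0


def classify_fault(slot, is_write):
    """Decide how a hypervisor page fault on `slot` would be handled.

    Returns an illustrative string rather than performing any mapping.
    """
    if slot.flags & KVM_MEM_READONLY:
        if is_write:
            # A store to a read-only slot is reflected back to the guest.
            return "data_storage_interrupt"
        # Reads are still allowed, but the mapping must not grant write access.
        return "map_read_only"
    return "map_read_write"
```

For example, `classify_fault(Memslot(KVM_MEM_READONLY), is_write=True)` yields `"data_storage_interrupt"`, while the same access to a slot without the flag is mapped normally.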
  5. 23 Mar, 2018 1 commit
  6. 19 Mar, 2018 3 commits
    • KVM: PPC: Book3S HV: Handle 1GB pages in radix page fault handler · 58c5c276
      Committed by Paul Mackerras
      This adds code to the radix hypervisor page fault handler to handle the
      case where the guest memory is backed by 1GB hugepages, and put them
      into the partition-scoped radix tree at the PUD level.  The code is
      essentially analogous to the code for 2MB pages.  This also rearranges
      kvmppc_create_pte() to make it easier to follow.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Streamline setting of reference and change bits · f7caf712
      Committed by Paul Mackerras
      When using the radix MMU, we can get hypervisor page fault interrupts
      with the DSISR_SET_RC bit set in DSISR/HSRR1, indicating that an
      attempt to set the R (reference) or C (change) bit in a PTE atomically
      failed.  Previously we would find the corresponding Linux PTE and
      check the permission and dirty bits there, but this is not really
      necessary since we only need to do what the hardware was trying to
      do, namely set R or C atomically.  This removes the code that reads
      the Linux PTE and just updates the partition-scoped PTE, having first
      checked that it is still present, and if the access is a write, that
      the PTE still has write permission.
      
      Furthermore, we now check whether any other relevant bits are set
      in DSISR, and if there are, then we proceed with the rest of the
      function in order to handle whatever condition they represent,
      instead of returning to the guest as we did previously.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
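The streamlined R/C update described in this commit amounts to a small check-then-set step. The Python sketch below models it under stated assumptions: the function name `set_rc_bits` is hypothetical, the bit positions are illustrative constants chosen for this example, and the real kernel performs the update atomically on the partition-scoped PTE.

```python
# Illustrative PTE bit positions (assumptions for this sketch).
PAGE_PRESENT = 1 << 63
PAGE_WRITE = 0x002
PAGE_DIRTY = 0x080     # C (change) bit
PAGE_ACCESSED = 0x100  # R (reference) bit


def set_rc_bits(pte, is_write):
    """Return (new_pte, handled) for an R/C fault on a partition-scoped PTE.

    handled is False when the PTE is no longer present, or when a write
    hits a PTE that has lost write permission; the caller then has to
    fall through to the full page fault path.
    """
    if not (pte & PAGE_PRESENT):
        return pte, False
    if is_write and not (pte & PAGE_WRITE):
        return pte, False
    bits = PAGE_ACCESSED
    if is_write:
        bits |= PAGE_DIRTY
    return pte | bits, True
```

A read fault only sets the reference bit; a write fault sets both reference and change bits, provided the PTE still permits writes.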
    • KVM: PPC: Book3S HV: Radix page fault handler optimizations · c4c8a764
      Committed by Paul Mackerras
      This improves the handling of transparent huge pages in the radix
      hypervisor page fault handler.  Previously, if a small page is faulted
      in to a 2MB region of guest physical space, that means that there is
      a page table pointer at the PMD level, which could never be replaced
      by a leaf (2MB) PMD entry.  This adds the code to clear the PMD,
      invalidate the page walk cache and free the page table page in this
      situation, so that the leaf PMD entry can be created.
      
      This also adds code to check whether a PMD or PTE being inserted is
      the same as is already there (because of a race with another CPU that
      faulted on the same page) and if so, we don't replace the existing
      entry, meaning that we neither invalidate the PTE or PMD nor do a TLB
      invalidation.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
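The "same entry already present" check from this commit can be illustrated with a small model. This is a sketch under assumptions, not the kernel implementation: a Python dict stands in for one page table level, and `install_entry` is a hypothetical helper that reports whether a TLB invalidation would have been required.

```python
def install_entry(table, index, new_entry):
    """Install new_entry at table[index].

    Returns True if an existing, different entry had to be replaced
    (which in the real handler implies an invalidation), else False.
    """
    old = table.get(index)
    if old == new_entry:
        # Another CPU raced with us and installed the identical entry:
        # leave it alone and skip the invalidation entirely.
        return False
    table[index] = new_entry
    # Replacing a valid entry needs an invalidation; filling an empty
    # slot does not.
    return old is not None
```

Calling it twice with the same entry shows the optimization: the second call changes nothing and reports that no invalidation is needed.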
  7. 02 Mar, 2018 1 commit
    • KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler · c3856aeb
      Committed by Paul Mackerras
      This fixes several bugs in the radix page fault handler relating to
      the way large pages in the memory backing the guest were handled.
      First, the check for large pages only checked for explicit huge pages
      and missed transparent huge pages.  Then the check that the addresses
      (host virtual vs. guest physical) had appropriate alignment was
      wrong, meaning that the code never put a large page in the
      partition-scoped radix tree; it was always demoted to a small page.
      
      Fixing this exposed bugs in kvmppc_create_pte().  We were never
      invalidating a 2MB PTE, which meant that if a page was initially
      faulted in without write permission and the guest then attempted
      to store to it, we would never update the PTE to have write permission.
      If we find a valid 2MB PTE in the PMD, we need to clear it and
      do a TLB invalidation before installing either the new 2MB PTE or
      a pointer to a page table page.
      
      This also corrects an assumption that get_user_pages_fast would set
      the _PAGE_DIRTY bit if we are writing, which is not true.  Instead we
      mark the page dirty explicitly with set_page_dirty_lock().  This
      also means we don't need the dirty bit set on the host PTE when
      providing write access on a read fault.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
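The alignment requirement mentioned in this commit (host virtual vs. guest physical) can be stated compactly: a large page can only be mapped as a single entry if both addresses have the same offset within the large page. The Python sketch below shows one way to express that check; `can_map_large` and its parameters are hypothetical names, and the commit message does not show the exact form of the kernel's check.

```python
PMD_SHIFT = 21  # 2MB large page


def can_map_large(hva, gpa, large_shift=PMD_SHIFT):
    """True if host virtual address hva and guest physical address gpa
    share the same offset within a large page of size 1 << large_shift."""
    mask = (1 << large_shift) - 1
    return (hva & mask) == (gpa & mask)
```

If the offsets differ, the 2MB region of guest physical space does not line up with a single 2MB host page, so the mapping has to fall back to small pages.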
  8. 23 Nov, 2017 1 commit
  9. 01 Nov, 2017 2 commits
    • KVM: PPC: Book3S HV: Add infrastructure for running HPT guests on radix host · 18c3640c
      Committed by Paul Mackerras
      This sets up the machinery for switching a guest between HPT (hashed
      page table) and radix MMU modes, so that in future we can run a HPT
      guest on a radix host on POWER9 machines.
      
      * The KVM_PPC_CONFIGURE_V3_MMU ioctl can now specify either HPT or
        radix mode, on a radix host.
      
      * The KVM_CAP_PPC_MMU_HASH_V3 capability now returns 1 on POWER9
        with HV KVM on a radix host.
      
      * The KVM_PPC_GET_SMMU_INFO returns information about the HPT MMU on a
        radix host.
      
      * The KVM_PPC_ALLOCATE_HTAB ioctl on a radix host will switch the
        guest to HPT mode and allocate a HPT.
      
      * For simplicity, we now allocate the rmap array for each memslot,
        even on a radix host, since it will be needed if the guest switches
        to HPT mode.
      
      * Since we cannot yet run a HPT guest on a radix host, the KVM_RUN
        ioctl will return an EINVAL error in that case.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Unify dirty page map between HPT and radix · e641a317
      Committed by Paul Mackerras
      Currently, the HPT code in HV KVM maintains a dirty bit per guest page
      in the rmap array, whether or not dirty page tracking has been enabled
      for the memory slot.  In contrast, the radix code maintains a dirty
      bit per guest page in memslot->dirty_bitmap, and only does so when
      dirty page tracking has been enabled.
      
      This changes the HPT code to maintain the dirty bits in the memslot
      dirty_bitmap like radix does.  This results in slightly less code
      overall, and will mean that we do not lose the dirty bits when
      transitioning between HPT and radix mode in future.
      
      There is one minor change to behaviour as a result.  With HPT, when
      dirty tracking was enabled for a memslot, we would previously clear
      all the dirty bits at that point (both in the HPT entries and in the
      rmap arrays), meaning that a KVM_GET_DIRTY_LOG ioctl immediately
      following would show no pages as dirty (assuming no vcpus have run
      in the meantime).  With this change, the dirty bits on HPT entries
      are not cleared at the point where dirty tracking is enabled, so
      KVM_GET_DIRTY_LOG would show as dirty any guest pages that are
      resident in the HPT and dirty.  This is consistent with what happens
      on radix.
      
      This also fixes a bug in the mark_pages_dirty() function for radix
      (in the sense that the function no longer exists).  In the case where
      a large page of 64 normal pages or more is marked dirty, the
      addressing of the dirty bitmap was incorrect and could write past
      the end of the bitmap.  Fortunately this case was never hit in
      practice because a 2MB large page is only 32 x 64kB pages, and we
      don't support backing the guest with 1GB huge pages at this point.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
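The arithmetic behind the "32 x 64kB pages" remark, and why a run of 64 or more pages needs care, can be worked through in a short sketch. This Python model is an illustration only: the bitmap is a list of 64-bit words, and `set_dirty_range` is a hypothetical bounds-checked helper, not the removed mark_pages_dirty() function.

```python
BITS_PER_LONG = 64


def base_pages_per_large_page(large_size, base_size):
    """Number of base pages covered by one large page."""
    return large_size // base_size


def set_dirty_range(bitmap, first_page, npages):
    """Set npages consecutive dirty bits starting at first_page.

    bitmap is a list of 64-bit words. Each page is mapped to its own
    (word, bit) pair, so runs that cross word boundaries are handled,
    and writes past the end of the bitmap raise instead of corrupting memory.
    """
    for page in range(first_page, first_page + npages):
        word, bit = divmod(page, BITS_PER_LONG)
        if word >= len(bitmap):
            raise IndexError("page %d is beyond the end of the bitmap" % page)
        bitmap[word] |= 1 << bit
```

With 64kB base pages, a 2MB page covers 32 bits of the bitmap, which fits within a single word when aligned, whereas a 1GB page covers 16384 bits spread over 256 words.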
  10. 17 Aug, 2017 1 commit
    • powerpc/mm: Rename find_linux_pte_or_hugepte() · 94171b19
      Committed by Aneesh Kumar K.V
      Add newer helpers to make the function usage simpler. It is always
      recommended to use find_current_mm_pte() for walking the page table.
      If we cannot use find_current_mm_pte(), it should be documented why
      the said usage of __find_linux_pte() is safe against a parallel THP
      split.
      
      For now we have KVM code using __find_linux_pte(). This is because kvm
      code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but
      with PACA soft_enabled = 1. We may want to fix that later and make
      sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that
      we can switch kvm to use find_linux_pte().
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  11. 03 Aug, 2017 1 commit
  12. 01 Mar, 2017 1 commit
    • KVM: PPC: Book3S HV: Fix software walk of guest process page tables · 70cd4c10
      Committed by Paul Mackerras
      This fixes some bugs in the code that walks the guest's page tables.
      These bugs cause MMIO emulation to fail whenever the guest is in
      virtual mode (MMU on), leading to the guest hanging if it tried to
      access a virtio device.
      
      The first bug was that when reading the guest's process table, we were
      using the whole of arch->process_table, not just the field that contains
      the process table base address.  The second bug was that the mask used
      when reading the process table entry to get the radix tree base address,
      RPDB_MASK, had the wrong value.
      
      Fixes: 9e04ba69 ("KVM: PPC: Book3S HV: Add basic infrastructure for radix guests")
      Fixes: e9983344 ("powerpc/mm/radix: Add partition table format & callback")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  13. 31 Jan, 2017 5 commits
    • KVM: PPC: Book3S HV: Enable radix guest support · 8cf4ecc0
      Committed by Paul Mackerras
      This adds a few last pieces of the support for radix guests:
      
      * Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and
        KVM_PPC_GET_RMMU_INFO ioctls for radix guests
      
      * On POWER9, allow secondary threads to be on/off-lined while guests
        are running.
      
      * Set up LPCR and the partition table entry for radix guests.
      
      * Don't allocate the rmap array in the kvm_memory_slot structure
        on radix.
      
      * Don't try to initialize the HPT for radix guests, since they don't
        have an HPT.
      
      * Take out the code that prevents the HV KVM module from
        initializing on radix hosts.
      
      At this stage, we only support radix guests if the host is running
      in radix mode, and only support HPT guests if the host is running in
      HPT mode.  Thus a guest cannot switch from one mode to the other,
      which enables some simplifications.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Implement dirty page logging for radix guests · 8f7b79b8
      Committed by Paul Mackerras
      This adds code to keep track of dirty pages when requested (that is,
      when memslot->dirty_bitmap is non-NULL) for radix guests.  We use the
      dirty bits in the PTEs in the second-level (partition-scoped) page
      tables, together with a bitmap of pages that were dirty when their
      PTE was invalidated (e.g., when the page was paged out).  This bitmap
      is stored in the first half of the memslot->dirty_bitmap area, and
      kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the
      bitmap that gets returned to userspace.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
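The two-halves layout of the dirty bitmap area described in this commit can be modelled roughly as below. This is a loose Python sketch under assumptions: `RadixDirtyLog` and its methods are hypothetical, booleans stand in for bits, and the exact points at which the kernel clears each bit are not taken from the commit message.

```python
class RadixDirtyLog:
    """Toy model of dirty logging for one memslot of npages pages."""

    def __init__(self, npages):
        self.npages = npages
        # First half: pages that were dirty when their PTE was invalidated.
        # Second half: buffer for the bitmap handed back to userspace.
        self.area = [False] * (2 * npages)
        # Stands in for the dirty bit held in each partition-scoped PTE.
        self.pte_dirty = [False] * npages

    def guest_write(self, page):
        self.pte_dirty[page] = True

    def invalidate(self, page):
        # The PTE goes away, so its dirty state must be saved first.
        if self.pte_dirty[page]:
            self.area[page] = True
            self.pte_dirty[page] = False

    def get_dirty_log(self):
        n = self.npages
        for page in range(n):
            # Combine the saved bit with the live PTE bit, then reset both.
            self.area[n + page] = self.area[page] or self.pte_dirty[page]
            self.area[page] = False
            self.pte_dirty[page] = False
        return self.area[n:]
```

A page that was written and then paged out is still reported as dirty, because its state was saved in the first half before the PTE was invalidated.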
    • KVM: PPC: Book3S HV: MMU notifier callbacks for radix guests · 01756099
      Committed by Paul Mackerras
      This adapts our implementations of the MMU notifier callbacks
      (unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva)
      to call radix functions when the guest is using radix.  These
      implementations are much simpler than for HPT guests because we
      have only one PTE to deal with, so we don't need to traverse
      rmap chains.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S HV: Page table construction and page faults for radix guests · 5a319350
      Committed by Paul Mackerras
      This adds the code to construct the second-level ("partition-scoped" in
      architecturese) page tables for guests using the radix MMU.  Apart from
      the PGD level, which is allocated when the guest is created, the rest
      of the tree is all constructed in response to hypervisor page faults.
      
      As well as hypervisor page faults for missing pages, we also get faults
      for reference/change (RC) bits needing to be set, as well as various
      other error conditions.  For now, we only set the R or C bit in the
      guest page table if the same bit is set in the host PTE for the
      backing page.
      
      This code can take advantage of the guest being backed with either
      transparent or ordinary 2MB huge pages, and insert 2MB page entries
      into the guest page tables.  There is no support for 1GB huge pages
      yet.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
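The on-demand construction described in this commit, where only the top level exists at guest creation and lower levels appear as faults arrive, can be sketched with nested dicts. This Python model is illustrative only: `PartitionScopedTree`, `level_indices` and the index widths are assumptions for the example (a four-level layout with 64kB base pages), not values quoted from the commit.

```python
PAGE_SHIFT = 16              # assumes 64kB base pages
LEVEL_BITS = (13, 9, 9, 5)   # assumed index widths: PGD, PUD, PMD, PTE


def level_indices(gpa):
    """Split a guest physical address into one index per tree level."""
    shift = PAGE_SHIFT + sum(LEVEL_BITS)
    indices = []
    for bits in LEVEL_BITS:
        shift -= bits
        indices.append((gpa >> shift) & ((1 << bits) - 1))
    return indices


class PartitionScopedTree:
    def __init__(self):
        self.pgd = {}              # top level exists from guest creation
        self.tables_allocated = 0  # lower-level tables created by faults

    def handle_fault(self, gpa, pfn):
        """Fill in the path for gpa, allocating missing tables on the way."""
        *upper, last = level_indices(gpa)
        table = self.pgd
        for index in upper:
            if index not in table:
                table[index] = {}
                self.tables_allocated += 1
            table = table[index]
        table[last] = pfn

    def lookup(self, gpa):
        """Return the mapped pfn, or None if any level is missing."""
        node = self.pgd
        for index in level_indices(gpa):
            if not isinstance(node, dict) or index not in node:
                return None
            node = node[index]
        return node
```

The first fault in a region allocates three lower-level tables; a later fault on a neighbouring page reuses them and only adds a leaf entry.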
    • KVM: PPC: Book3S HV: Add basic infrastructure for radix guests · 9e04ba69
      Committed by Paul Mackerras
      This adds a field in struct kvm_arch and an inline helper to
      indicate whether a guest is a radix guest or not, plus a new file
      to contain the radix MMU code, which currently contains just a
      translate function which knows how to traverse the guest page
      tables to translate an address.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>