1. 21 December 2018, 1 commit
  2. 22 May 2018, 1 commit
  3. 19 March 2018, 1 commit
  4. 14 October 2017, 1 commit
  5. 17 August 2017, 1 commit
    • powerpc/mm: Rename find_linux_pte_or_hugepte() · 94171b19
      Committed by Aneesh Kumar K.V
      Add newer helpers to make the function usage simpler. It is always
      recommended to use find_current_mm_pte() for walking the page table.
      If we cannot use find_current_mm_pte(), it should be documented why
      that use of __find_linux_pte() is safe against a parallel THP
      split.
      
      For now we have KVM code using __find_linux_pte(). This is because kvm
      code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but
      with PACA soft_enabled = 1. We may want to fix that later and make
      sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that
      we can switch kvm to use find_linux_pte().
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      94171b19
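      A minimal sketch of the recommended helper described above; the exact
      signature and the calling-context requirement (walking with interrupts
      disabled) are assumptions, not quoted from the commit:

          /* Hypothetical caller: translate an EA in the current mm to a pfn. */
          static unsigned long ea_to_pfn(unsigned long ea)
          {
                  bool is_thp;
                  unsigned int shift;
                  pte_t *ptep;

                  /* find_current_mm_pte() is the helper the commit recommends */
                  ptep = find_current_mm_pte(current->mm->pgd, ea, &is_thp, &shift);
                  if (!ptep || !pte_present(*ptep))
                          return 0;
                  return pte_pfn(*ptep);
          }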
  6. 20 April 2017, 1 commit
  7. 02 March 2017, 1 commit
  8. 16 January 2016, 1 commit
    • kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Committed by Dan Williams
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given that the memmap array for a
      large pool of persistent memory may exhaust available DRAM, introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ba049e93
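      The KVM side of this series is just the type rename; a sketch of its
      effect (the underlying typedef and the prototype shown are assumptions,
      not quoted from the patch):

          typedef u64 kvm_pfn_t;   /* KVM-private pfn type, freeing "pfn_t" for the core */

          kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);   /* callers switch pfn_t -> kvm_pfn_t */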
  9. 15 October 2015, 1 commit
  10. 12 October 2015, 1 commit
    • powerpc/mm: Differentiate between hugetlb and THP during page walk · 891121e6
      Committed by Aneesh Kumar K.V
      We need to properly identify whether a hugepage is an explicit or
      a transparent hugepage in follow_huge_addr(). We used to depend
      on the hugepage shift argument to do that. But in some cases that can
      give the wrong result. For example:

      On finding a transparent hugepage we set the hugepage shift to PMD_SHIFT.
      But we can end up clearing the thp pte, via pmdp_huge_get_and_clear.
      We do prevent reusing the pfn page via the usage of
      kick_all_cpus_sync(). But that happens after we have updated the pte to 0.
      Hence in follow_huge_addr() we can find the hugepage shift set, but the
      transparent huge page check fails for a thp pte.
      
      NOTE: We fixed a variant of this race against thp split in commit 691e95fd
      ("powerpc/mm/thp: Make page table walk safe against thp split/collapse").
      
      Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
      follow_page_mask occasionally.
      
      In the long term, we may want to switch the ppc64 64k page size config to
      enable CONFIG_ARCH_WANT_GENERAL_HUGETLB.
      Reported-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      891121e6
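      A sketch of the distinction the walker has to make, using the later
      __find_linux_pte() name from the 2017 rename above; the extra is_thp
      output is the point of this commit, but the exact call shown is an
      illustration, not code from the patch:

          bool is_thp;
          unsigned int shift;
          pte_t *ptep;

          ptep = __find_linux_pte(mm->pgd, ea, &is_thp, &shift);
          if (ptep && shift && !is_thp) {
                  /* explicit (hugetlbfs) huge page: safe for follow_huge_addr() */
          }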
  11. 17 April 2015, 3 commits
  12. 20 November 2014, 1 commit
  13. 24 September 2014, 1 commit
    • kvm: Fix page ageing bugs · 57128468
      Committed by Andres Lagar-Cavilla
      1. We were calling clear_flush_young_notify in unmap_one, but we are
      within an mmu notifier invalidate range scope. The spte exists no more
      (due to range_start) and the accessed bit info has already been
      propagated (due to kvm_set_pfn_accessed). Simply call
      clear_flush_young.

      2. We clear_flush_young on a primary MMU PMD, but this may be mapped
      as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
      This required expanding the interface of the clear_flush_young mmu
      notifier, so a lot of code has been trivially touched.

      3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
      the access bit by blowing away the spte. This requires proper synchronization
      with MMU notifier consumers, like every other removal of spte's does.
      Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      57128468
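      Point 2 is the interface change; a sketch of the widened hook (the exact
      prototype is an assumption based on the description above):

          struct mmu_notifier_ops {
                  /* ... */
                  /* now takes a [start, end) range instead of a single address,
                   * so a primary-MMU PMD can age every secondary-MMU PTE under it */
                  int (*clear_flush_young)(struct mmu_notifier *mn,
                                           struct mm_struct *mm,
                                           unsigned long start,
                                           unsigned long end);
                  /* ... */
          };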
  14. 22 September 2014, 1 commit
    • KVM: PPC: e500mc: Add support for single threaded vcpus on e6500 core · 188e267c
      Committed by Mihai Caraman
      ePAPR represents hardware threads as cpu node properties in device tree.
      So with existing QEMU, hardware threads are simply exposed as vcpus with
      one hardware thread.
      
      The e6500 core shares TLBs between hardware threads. Without a TLB write
      conditional instruction, the Linux kernel uses per-core mechanisms to
      protect against duplicate TLB entries.

      The guest is unable to detect real sibling threads, so it can't use the
      TLB protection mechanism. An alternative solution is to use the hypervisor
      to allocate different lpids to the guest's vcpus that run simultaneously on
      real sibling threads. On systems with two threads per core this patch halves
      the size of the lpid pool that the allocator sees and uses two lpids per VM.
      Use even numbers to speed up vcpu lpid computation with consecutive lpids
      per VM: vm1 will use lpids 2 and 3, vm2 lpids 4 and 5, and so on.
      Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
      [agraf: fix spelling]
      Signed-off-by: Alexander Graf <agraf@suse.de>
      188e267c
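      The even-lpid scheme boils down to one line of arithmetic; a sketch with
      hypothetical names, not taken from the patch:

          /* Each VM owns an even lpid and the one after it: 2/3, 4/5, ... */
          static int vcpu_lpid(int vm_lpid, int hw_thread_id)
          {
                  return vm_lpid | (hw_thread_id & 1);
          }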
  15. 28 July 2014, 3 commits
    • KVM: PPC: Bookehv: Get vcpu's last instruction for emulation · f5250471
      Committed by Mihai Caraman
      On book3e, KVM uses the dedicated load-external-PID (lwepx) instruction to read
      the guest's last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI
      and LRAT), generated by loading a guest address, need to be handled by KVM.
      These exceptions are generated in a substituted guest translation context
      (EPLC[EGS] = 1) from host context (MSR[GS] = 0).
      
      Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1),
      doing minimal checks on the fast path to avoid host performance degradation.
      lwepx exceptions originate from host state (MSR[GS] = 0), which implies
      additional checks in the DO_KVM macro (besides the current MSR[GS] = 1 check) by
      looking at the Exception Syndrome Register (ESR[EPID]) and the External PID Load
      Context Register (EPLC[EGS]). Doing this on each Data TLB miss exception is
      obviously too intrusive for the host.
      
      Read the guest's last instruction in kvmppc_load_last_inst() by looking up the
      physical address and kmapping it. This addresses the TODO for TLB eviction and
      execute-but-not-read entries, and allows us to get rid of lwepx until we are
      able to handle failures.
      
      A simple stress benchmark shows a 1% sys performance degradation compared with
      the previous approach (lwepx without failure handling):
      
      time for i in `seq 1 10000`; do /bin/echo > /dev/null; done
      
      real    0m 8.85s
      user    0m 4.34s
      sys     0m 4.48s
      
      vs
      
      real    0m 8.84s
      user    0m 4.36s
      sys     0m 4.44s
      
      A solution that keeps lwepx and handles its exceptions in KVM would be to
      temporarily hijack the interrupt vector from the host. This imposes additional
      synchronization for cores like the FSL e6500 that share host IVOR registers
      between hardware threads. This optimized solution can be developed later on
      top of this patch.
      Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      f5250471
    • KVM: PPC: Allow kvmppc_get_last_inst() to fail · 51f04726
      Committed by Mihai Caraman
      On book3e, the guest's last instruction is read on the exit path using the
      dedicated load-external-PID (lwepx) instruction. This load operation may fail
      due to TLB eviction and execute-but-not-read entries.

      This patch lays down the path for an alternative solution to read the guest's
      last instruction, by allowing kvmppc_get_last_inst() to fail.
      Architecture-specific implementations of kvmppc_load_last_inst() may read the
      last guest instruction and instruct the emulation layer to re-execute the
      guest in case of failure.
      
      Make kvmppc_get_last_inst() definition common between architectures.
      Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      51f04726
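      The resulting caller-side pattern, sketched with assumed enum and constant
      names (the actual identifiers may differ from what is shown):

          u32 inst;

          if (kvmppc_get_last_inst(vcpu, INST_GENERIC, &inst) != EMULATE_DONE)
                  return RESUME_GUEST;    /* re-enter the guest and let it retry */

          /* ... emulate 'inst' as before ... */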
    • KVM: PPC: e500: Fix default tlb for victim hint · d57cef91
      Committed by Mihai Caraman
      The TLB search operation used for the victim hint relies on the default TLB
      set by the host. When hardware tablewalk support is enabled in the host, the
      default TLB is TLB1, which leads KVM to evict the bolted entry. Set and
      restore the default TLB when searching for the victim hint.
      Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
      Reviewed-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      d57cef91
  16. 24 June 2014, 1 commit
  17. 09 January 2014, 2 commits
    • kvm: powerpc: use caching attributes as per linux pte · 08c9a188
      Committed by Bharat Bhushan
      KVM uses the same WIM TLB attributes as the corresponding QEMU pte.
      For this we now search the Linux pte for the requested page and
      take the caching/coherency attributes from it.
      Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
      Reviewed-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      08c9a188
    • kvm: booke: clear host tlb reference flag on guest tlb invalidation · 30a91fe2
      Committed by Bharat Bhushan
      On booke, "struct tlbe_ref" contains host tlb mapping information
      (pfn: for guest-pfn to pfn, flags: attribute associated with this mapping)
      for a guest tlb entry. So when a guest creates a TLB entry then
      "struct tlbe_ref" is set to point to valid "pfn" and set attributes in
      "flags" field of the above said structure. When a guest TLB entry is
      invalidated then flags field of corresponding "struct tlbe_ref" is
      updated to point that this is no more valid, also we selectively clear
      some other attribute bits, example: if E500_TLB_BITMAP was set then we clear
      E500_TLB_BITMAP, if E500_TLB_TLB0 is set then we clear this.
      
      Ideally we should clear complete "flags" as this entry is invalid and does not
      have anything to re-used. The other part of the problem is that when we use
      the same entry again then also we do not clear (started doing or-ing etc).
      
      So far it was working because the selectively clearing mentioned above
      actually clears "flags" what was set during TLB mapping. But the problem
      starts coming when we add more attributes to this then we need to selectively
      clear them and which is not needed.
      Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
      Reviewed-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      30a91fe2
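      A sketch of the intended invalidation behaviour; the helper name and exact
      struct layout are assumptions made for illustration:

          /* Hypothetical release path: drop the whole host reference at once. */
          static void tlbe_ref_release(struct tlbe_ref *ref)
          {
                  ref->pfn   = 0;
                  ref->flags = 0;   /* clear everything, not selected attribute bits */
          }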
  18. 17 October 2013, 2 commits
  19. 10 October 2013, 1 commit
    • kvm: ppc: booke: check range page invalidation progress on page setup · 40fde70d
      Committed by Bharat Bhushan
      When the MM code is invalidating a range of pages, it calls the KVM
      kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
      kvm_unmap_hva_range(), which arranges to flush all the TLBs for guest pages.
      However, the Linux PTEs for the range being flushed are still valid at
      that point.  We are not supposed to establish any new references to pages
      in the range until the ...range_end() notifier gets called.
      The PPC-specific KVM code doesn't get any explicit notification of that;
      instead, we are supposed to use mmu_notifier_retry() to test whether we
      are or have been inside a range flush notifier pair while we have been
      referencing a page.
      
      This patch calls mmu_notifier_retry() while mapping the guest
      page to ensure we are not referencing a page while a range invalidation
      is in progress.

      This call is made inside a region locked with kvm->mmu_lock, which is the
      same lock that is taken by the KVM MMU notifier functions, thus
      ensuring that no new notification can proceed while we are in the
      locked region.
      Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
      Acked-by: Alexander Graf <agraf@suse.de>
      [Backported to 3.12 - Paolo]
      Reviewed-by: Bharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      40fde70d
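      The underlying pattern is the standard KVM retry sequence; a sketch, with
      the surrounding slot/gfn plumbing assumed:

          unsigned long mmu_seq = kvm->mmu_notifier_seq;
          smp_rmb();                               /* order against the notifier count */

          pfn = gfn_to_pfn_memslot(slot, gfn);     /* may sleep and fault the page in */

          spin_lock(&kvm->mmu_lock);
          if (mmu_notifier_retry(kvm, mmu_seq)) {
                  /* a range invalidation started or finished meanwhile: start over */
                  spin_unlock(&kvm->mmu_lock);
                  goto retry;
          }
          /* ... install the shadow TLB entry ... */
          spin_unlock(&kvm->mmu_lock);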
  20. 11 April 2013, 3 commits
    • kvm/ppc/e500: eliminate tlb_refs · 4d2be6f7
      Committed by Scott Wood
      Commit 523f0e54 ("KVM: PPC: E500:
      Explicitly mark shadow maps invalid") began using E500_TLB_VALID
      for guest TLB1 entries, and skipping invalidations if it's not set.
      
      However, when E500_TLB_VALID was set for such entries, it was on a
      fake local ref, and so the invalidations never happen.  gtlb_privs
      is documented as being only for guest TLB0, though we already violate
      that with E500_TLB_BITMAP.
      
      Now that we have MMU notifiers, and thus don't need to actually
      retain a reference to the mapped pages, get rid of tlb_refs, and
      use gtlb_privs for E500_TLB_VALID in TLB1.
      
      Since we can have more than one host TLB entry for a given tlbe_ref,
      be careful not to clear existing flags that are relevant to other
      host TLB entries when preparing a new host TLB entry.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      4d2be6f7
    • kvm/ppc/e500: g2h_tlb1_map: clear old bit before setting new bit · 66a5fecd
      Committed by Scott Wood
      It's possible that we're using the same host TLB1 slot to map (a
      presumably different portion of) the same guest TLB1 entry.  Clear
      the bit in the map before setting it, so that if the esels are the same
      the bit will remain set.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      66a5fecd
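      Why the ordering matters, in two lines (a sketch; the bitmap layout and
      names are assumptions based on the description above):

          /* clear first, then set: if old_host_esel == new_host_esel the bit stays set */
          g2h_tlb1_map[gtlb_entry] &= ~(1ULL << old_host_esel);
          g2h_tlb1_map[gtlb_entry] |=  (1ULL << new_host_esel);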
    • kvm/ppc/e500: h2g_tlb1_rmap: esel 0 is valid · 6b2ba1a9
      Committed by Scott Wood
      Add one to esel values in h2g_tlb1_rmap, so that "no mapping" can be
      distinguished from "esel 0".  Note that we're not saved by the fact
      that host esel 0 is reserved for non-KVM use, because KVM host esel
      numbering is not the raw host numbering (see to_htlb1_esel).
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      6b2ba1a9
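      A sketch of the off-by-one encoding (names assumed for illustration):

          h2g_tlb1_rmap[host_esel] = guest_esel + 1;   /* 0 now means "no mapping" */

          if (h2g_tlb1_rmap[host_esel])
                  guest_esel = h2g_tlb1_rmap[host_esel] - 1;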
  21. 22 March 2013, 3 commits
    • kvm/ppc/e500: eliminate tlb_refs · 47bf3797
      Committed by Scott Wood
      Commit 523f0e54 ("KVM: PPC: E500:
      Explicitly mark shadow maps invalid") began using E500_TLB_VALID
      for guest TLB1 entries, and skipping invalidations if it's not set.
      
      However, when E500_TLB_VALID was set for such entries, it was on a
      fake local ref, and so the invalidations never happen.  gtlb_privs
      is documented as being only for guest TLB0, though we already violate
      that with E500_TLB_BITMAP.
      
      Now that we have MMU notifiers, and thus don't need to actually
      retain a reference to the mapped pages, get rid of tlb_refs, and
      use gtlb_privs for E500_TLB_VALID in TLB1.
      
      Since we can have more than one host TLB entry for a given tlbe_ref,
      be careful not to clear existing flags that are relevant to other
      host TLB entries when preparing a new host TLB entry.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      47bf3797
    • kvm/ppc/e500: g2h_tlb1_map: clear old bit before setting new bit · 36ada4f4
      Committed by Scott Wood
      It's possible that we're using the same host TLB1 slot to map (a
      presumably different portion of) the same guest TLB1 entry.  Clear
      the bit in the map before setting it, so that if the esels are the same
      the bit will remain set.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      36ada4f4
    • kvm/ppc/e500: h2g_tlb1_rmap: esel 0 is valid · d6940b64
      Committed by Scott Wood
      Add one to esel values in h2g_tlb1_rmap, so that "no mapping" can be
      distinguished from "esel 0".  Note that we're not saved by the fact
      that host esel 0 is reserved for non-KVM use, because KVM host esel
      numbering is not the raw host numbering (see to_htlb1_esel).
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      d6940b64
  22. 25 January 2013, 3 commits
    • KVM: PPC: E500: Make clear_tlb_refs and clear_tlb1_bitmap static · 483ba97c
      Committed by Alexander Graf
      Host shadow TLB flushing is logic that the guest TLB code should have
      no insight about. Declare the internal clear_tlb_refs and clear_tlb1_bitmap
      functions static to the host TLB handling file.
      
      Instead of these, we can use the already exported kvmppc_core_flush_tlb().
      This gives us a common API across the board to say "please flush any
      pending host shadow translation".
      Signed-off-by: Alexander Graf <agraf@suse.de>
      483ba97c
    • KVM: PPC: e500: Implement TLB1-in-TLB0 mapping · c015c62b
      Committed by Alexander Graf
      When a host mapping fault happens in a guest TLB1 entry today, we
      map the translated guest entry into the host's TLB1.
      
      This isn't particularly clever when the guest is mapped by normal 4k
      pages, since these would be a lot better to put into TLB0 instead.
      
      This patch adds the required logic to map 4k TLB1 shadow maps into
      the host's TLB0.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      c015c62b
    • KVM: PPC: E500: Split host and guest MMU parts · b71c9e2f
      Committed by Alexander Graf
      This patch splits the file e500_tlb.c into e500_mmu.c (guest TLB handling)
      and e500_mmu_host.c (host TLB handling).
      
      The main benefit of this split is readability and maintainability. It's
      just a lot harder to write dirty code :).
      Signed-off-by: Alexander Graf <agraf@suse.de>
      b71c9e2f