1. 01 Apr, 2017 1 commit
    • powerpc/pseries: Skip using reserved virtual address range · 82228e36
      Aneesh Kumar K.V committed
      Now that we use all the available virtual address range, we need to make
      sure we don't generate a VSID that overlaps with the reserved VSID
      range. The reserved VSID range includes the virtual address range used by
      the adjunct partition and also the VRMA virtual segment. We find the
      context value that can result in generating such a VSID and reserve it
      early in boot.
      
      We don't look at the adjunct range, because for now we disable adjunct
      usage in a Linux LPAR via the CAS interface.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Rewrite hash__reserve_context_id(), move the rest into pseries]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 31 Jan, 2017 3 commits
    • KVM: PPC: Book3S HV: Split HPT allocation from activation · aae0777f
      David Gibson committed
      Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
      and sets it up as the active page table for a VM.  For the upcoming HPT
      resize implementation we're going to want to allocate HPTs separately from
      activating them.
      
      So, split the allocation itself out into kvmppc_allocate_hpt() and perform
      the activation with a new kvmppc_set_hpt() function.  Likewise we split
      kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt()
      which unsets it as an active HPT, then frees it.
      
      We also move the logic to fall back to smaller HPT sizes if the first try
      fails into the single caller which used that behaviour,
      kvmppc_hv_setup_htab_rma().  This introduces a slight semantic change, in
      that previously if the initial attempt at CMA allocation failed, we would
      fall back to attempting smaller sizes with the page allocator.  Now, we
      try first CMA, then the page allocator at each size.  As far as I can tell
      this change should be harmless.
      
      To match, we make kvmppc_free_hpt() just free the actual HPT itself.  The
      call to kvmppc_free_lpid() that was there, we move to the single caller.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
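The revised fallback order in this commit (CMA first, then the page allocator, at each size before shrinking) can be modelled roughly as follows. This is an illustrative Python sketch, not the kernel code: `try_cma` and `try_page_alloc` are hypothetical callables standing in for the two allocators, and the minimum order of 18 (256kB) is borrowed from the HPT sizing commit later in this log.

```python
MIN_HPT_ORDER = 18  # 256kB, the minimum size mentioned later in this log


def allocate_hpt_with_fallback(order, try_cma, try_page_alloc,
                               min_order=MIN_HPT_ORDER):
    """Try CMA, then the page allocator, at each order before shrinking.

    Both allocator arguments are hypothetical callables that take an
    order and return an object on success or None on failure.
    Returns (order, hpt) for the first success, or None if every
    size down to min_order fails.
    """
    while order >= min_order:
        for alloc in (try_cma, try_page_alloc):
            hpt = alloc(order)
            if hpt is not None:
                return order, hpt
        order -= 1  # halve the requested size and retry both allocators
    return None
```

Under the old behaviour the CMA attempt happened only once at the initial size; in this sketch it is retried at every size, which is the semantic change the message calls out.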
    • KVM: PPC: Book3S HV: Don't store values derivable from HPT order · 3d089f84
      David Gibson committed
      Currently the kvm_hpt_info structure stores the hashed page table's order,
      and also the number of HPTEs it contains and a mask for its size.  The
      last two can be easily derived from the order, so remove them and just
      calculate them as necessary with a couple of helper inlines.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
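As a rough illustration of why the two removed fields are redundant, both can be computed from the order alone. The sketch below is Python rather than the kernel's C, and the function names are made up for this example. It assumes the order is log2 of the HPT size in bytes, that each HPTE is 16 bytes (as stated elsewhere in this log), and that the mask is taken over groups of 8 HPTEs (128 bytes), which is an assumption about what "a mask for its size" means here.

```python
HPTE_SIZE_SHIFT = 4   # each HPTE is 16 bytes
HPTEG_SIZE_SHIFT = 7  # assumed: a group of 8 HPTEs is 128 bytes


def hpt_npte(order):
    """Number of HPTEs in an HPT of 2**order bytes."""
    return 1 << (order - HPTE_SIZE_SHIFT)


def hpt_mask(order):
    """Mask selecting a group index within an HPT of 2**order bytes."""
    return (1 << (order - HPTEG_SIZE_SHIFT)) - 1


# A 16MB HPT (order 24) holds 2**20 entries in 2**17 groups.
print(hpt_npte(24), hex(hpt_mask(24)))
```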
    • KVM: PPC: Book3S HV: Add basic infrastructure for radix guests · 9e04ba69
      Paul Mackerras committed
      This adds a field in struct kvm_arch and an inline helper to
      indicate whether a guest is a radix guest or not, plus a new file
      to contain the radix MMU code, which currently contains just a
      translate function which knows how to traverse the guest page
      tables to translate an address.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 27 Sep, 2016 1 commit
  4. 09 Sep, 2016 1 commit
    • powerpc/mm: Speed up computation of base and actual page size for a HPTE · 0eeede0c
      Paul Mackerras committed
      This replaces a 2-D search through an array with a simple 8-bit table
      lookup for determining the actual and/or base page size for a HPT entry.
      
      The encoding in the second doubleword of the HPTE is designed to encode
      the actual and base page sizes without using any more bits than would be
      needed for a 4k page number, by using between 1 and 8 low-order bits of
      the RPN (real page number) field to encode the page sizes.  A single
      "large page" bit in the first doubleword indicates that these low-order
      bits are to be interpreted like this.
      
      We can determine the page sizes by using the low-order 8 bits of the RPN
      to look up a 256-entry table.  For actual page sizes less than 1MB, some
      of the upper bits of these 8 bits are going to be real address bits, but
      we can cope with that by replicating the entries for those smaller page
      sizes.
      
      While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
      functions from a KVM-specific header to a header for 64-bit HPT systems,
      since this computation doesn't have anything specifically to do with KVM.
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
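The idea of replicating table entries for the smaller page sizes can be shown with a toy model. The sketch below is Python, and the two encodings in `ENCODINGS` are illustrative placeholders, not the architected table: each one says "these low-order bits, of this width, mean this page shift". Any bits of the 8-bit index above the encoding width are treated as real address bits, so the entry is copied to every possible value of those bits.

```python
# (low-order bit pattern, number of bits used, page shift)
# Hypothetical encodings chosen only to demonstrate the replication.
ENCODINGS = [
    (0b0001, 4, 16),      # a 64kB page identified by 4 low-order bits
    (0b00000000, 8, 24),  # a 16MB page identified by all 8 bits
]


def build_table(encodings):
    """Build a 256-entry table indexed by the low 8 bits of the RPN."""
    table = [0] * 256  # 0 means "no page size matches this index"
    for pattern, nbits, shift in encodings:
        # Replicate across every value of the upper (address) bits.
        for upper in range(1 << (8 - nbits)):
            table[(upper << nbits) | pattern] = shift
    return table


TABLE = build_table(ENCODINGS)


def page_shift(rpn):
    """Single table lookup replacing a search over all encodings."""
    return TABLE[rpn & 0xFF]
```

With these placeholder encodings the 4-bit pattern occupies 16 slots of the table, one for each value of the four upper bits, while the 8-bit pattern occupies exactly one.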
  5. 01 Aug, 2016 1 commit
  6. 01 May, 2016 3 commits
    • powerpc/mm/book3s: Rename hash specific PTE bits to carry H_ prefix · 945537df
      Aneesh Kumar K.V committed
      This makes it easier to follow the hash-only PTE bits.
      
      We have kept _PAGE_CHG_MASK, _HPAGE_CHG_MASK and _PAGE_PROT_BITS as they
      are in this patch even though they use hash-specific bits. Using them in
      radix as they are should be OK, because with radix we expect those bit
      positions to be zero.
      
      Only renames in this patch, no change in functionality.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Drop WIMG in favour of new constants · 30bda41a
      Aneesh Kumar K.V committed
      PowerISA 3.0 introduces two pte bits with the below meaning for radix:
        00 -> Normal Memory
        01 -> Strong Access Order (SAO)
        10 -> Non idempotent I/O (Cache inhibited and guarded)
        11 -> Tolerant I/O (Cache inhibited)
      
      We drop the existing WIMG bits in the Linux page table in favour of the
      above constants. We lose _PAGE_WRITETHRU with this conversion. We only
      use writethru via pgprot_cached_wthru(), which is used by
      fbdev/controlfb.c (the Apple control display driver) and also by PPC32.
      
      With respect to _PAGE_COHERENCE, we have been marking HPTEs as always
      coherent for some time now. htab_convert_pte_flags() always added
      HPTE_R_M.
      
      NOTE: KVM changes need closer review.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Add pte_xchg() helper · 3910a7f4
      Michael Ellerman committed
      We have five locations in 64-bit hash MMU code that do a cmpxchg() of a
      PTE. Currently doing it inline is OK, but in a future patch we will be
      converting the PTEs to __be64 in some configs. In that case we will need
      casts at every cmpxchg() site in order to keep sparse happy.
      
      So move the logic into a helper; this is a reasonably nice cleanup on
      its own.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. 16 Feb, 2016 1 commit
  8. 05 Jun, 2015 1 commit
  9. 21 Apr, 2015 3 commits
  10. 17 Apr, 2015 2 commits
  11. 10 Apr, 2015 1 commit
  12. 31 Mar, 2015 1 commit
  13. 17 Dec, 2014 1 commit
    • KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf
      Paul Mackerras committed
      This removes the code that was added to enable HV KVM to work
      on PPC970 processors.  The PPC970 is an old CPU that doesn't
      support virtualizing guest memory.  Removing PPC970 support also
      lets us remove the code for allocating and managing contiguous
      real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
      case, the code for pinning pages of guest memory when first
      accessed and keeping track of which pages have been pinned, and
      the code for handling H_ENTER hypercalls in virtual mode.
      
      Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
      The KVM_CAP_PPC_RMA capability now always returns 0.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  14. 15 Dec, 2014 1 commit
    • KVM: PPC: Book3S HV: Fix computation of tlbie operand · d506735b
      Paul Mackerras committed
      The B (segment size) field in the RB operand for the tlbie
      instruction is two bits, which we get from the top two bits of
      the first doubleword of the HPT entry to be invalidated.  These
      bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM
      bit numbering).
      
      The compute_tlbie_rb() function gets these bits as v >> (62 - 8),
      which is not correct as it will bring in the top 10 bits, not
      just the top two.  These extra bits could corrupt the AP, AVAL
      and L fields in the RB value.  To fix this we shift right 62 bits
      and then shift left 8 bits, so we only get the two bits of the
      B field.
      
      The first doubleword of the HPT entry is under the control of the
      guest kernel.  In fact, Linux guests will always put zeroes in bits
      54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing
      this.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
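The difference between the two shift expressions is easy to see with a small model. The following Python sketch (the function names are invented for this example, and Python integers stand in for the 64-bit doubleword `v`) compares the old single shift with the fixed two-step shift, using a value where a guest has set bits 54 to 61.

```python
def b_field_buggy(v):
    """Old computation: one right shift drags in the top 10 bits."""
    return v >> (62 - 8)


def b_field_fixed(v):
    """Fixed computation: isolate the top 2 bits, then place them at bit 8."""
    return (v >> 62) << 8


# B = 0b01, with bits 54..61 all set by a hypothetical guest.
v = (1 << 62) | (0xFF << 54)
print(hex(b_field_buggy(v)))  # low 8 bits are polluted
print(hex(b_field_fixed(v)))  # only bits 8 and 9 can be set
```

For a guest that leaves bits 54 to 61 zero, as the message says Linux guests do, both expressions give the same result, which is why the bug went unnoticed.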
  15. 28 Jul, 2014 3 commits
  16. 25 Jun, 2014 1 commit
  17. 30 May, 2014 1 commit
    • KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest · 1f365bb0
      Aneesh Kumar K.V committed
      On recent IBM Power CPUs, while the hashed page table is looked up using
      the page size from the segmentation hardware (i.e. the SLB), it is
      possible to have the HPT entry indicate a larger page size.  Thus for
      example it is possible to put a 16MB page in a 64kB segment, but since
      the hash lookup is done using a 64kB page size, it may be necessary to
      put multiple entries in the HPT for a single 16MB page.  This
      capability is called mixed page-size segment (MPSS).  With MPSS,
      there are two relevant page sizes: the base page size, which is the
      size used in searching the HPT, and the actual page size, which is the
      size indicated in the HPT entry. [ Note that the actual page size is
      always >= base page size ].
      
      We use the "ibm,segment-page-sizes" device tree node to advertise
      MPSS support to PAPR guests. The penc encoding indicates whether
      we support a specific combination of base page size and actual
      page size in the same segment. We also use the penc value in the
      LP encoding of the HPTE.
      
      This patch exposes MPSS support to KVM guests by advertising the
      feature via "ibm,segment-page-sizes". It also adds the necessary changes
      to decode the base page size and the actual page size correctly from the
      HPTE.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  18. 29 Mar, 2014 1 commit
    • KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode · 797f9c07
      Paul Mackerras committed
      With HV KVM, some high-frequency hypercalls such as H_ENTER are handled
      in real mode, and need to access the memslots array for the guest.
      Accessing the memslots array is safe, because we hold the SRCU read
      lock for the whole time that a guest vcpu is running.  However, the
      checks that kvm_memslots() does when lockdep is enabled are potentially
      unsafe in real mode, when only the linear mapping is available.
      Furthermore, kvm_memslots() can be called from a secondary CPU thread,
      which is an offline CPU from the point of view of the host kernel,
      and is not running the task which holds the SRCU read lock.
      
      To avoid false positives in the checks in kvm_memslots(), and to avoid
      possible side effects from doing the checks in real mode, this replaces
      kvm_memslots() with kvm_memslots_raw() in all the places that execute
      in real mode.  kvm_memslots_raw() is a new function that is like
      kvm_memslots() but uses rcu_dereference_raw_notrace() instead of
      kvm_dereference_check().
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Scott Wood <scottwood@freescale.com>
  19. 17 Oct, 2013 2 commits
  20. 10 Jul, 2013 1 commit
    • KVM: PPC: Book3S HV: Correct tlbie usage · 54480501
      Paul Mackerras committed
      This corrects the usage of the tlbie (TLB invalidate entry) instruction
      in HV KVM.  The tlbie instruction changed between PPC970 and POWER7.
      On the PPC970, the bit to select large vs. small page is in the instruction,
      not in the RB register value.  This changes the code to use the correct
      form on PPC970.
      
      On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
      field of the RB value incorrectly for 64k pages.  This fixes it.
      
      Since we now have several cases to handle for the tlbie instruction, this
      factors out the code to do a sequence of tlbies into a new function,
      do_tlbies(), and calls that from the various places where the code was
      doing tlbie instructions inline.  It also makes kvmppc_h_bulk_remove()
      use the same global_invalidates() function for determining whether to do
      local or global TLB invalidations as is used in other places, for
      consistency, and also to make sure that kvm->arch.need_tlb_flush gets
      updated properly.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  21. 08 Jul, 2013 2 commits
  22. 21 Jun, 2013 1 commit
  23. 27 Apr, 2013 1 commit
    • KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changes · a1b4a0f6
      Paul Mackerras committed
      At present, the code that determines whether a HPT entry has changed,
      and thus needs to be sent to userspace when it is copying the HPT,
      doesn't consider a hardware update to the reference and change bits
      (R and C) in the HPT entries to constitute a change that needs to
      be sent to userspace.  This adds code to check for changes in R and C
      when we are scanning the HPT to find changed entries, and adds code
      to set the changed flag for the HPTE when we update the R and C bits
      in the guest view of the HPTE.
      
      Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c
      as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification()
      function into kvm_book3s_64.h.
      
      Current Linux guest kernels don't use the hardware updates of R and C
      in the HPT, so this change won't affect them.  Linux (or other) kernels
      might in future want to use the R and C bits and have them correctly
      transferred across when a guest is migrated, so it is better to correct
      this deficiency.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  24. 06 Dec, 2012 2 commits
    • KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT · a2932923
      Paul Mackerras committed
      A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
      this fd return the contents of the HPT (hashed page table), writes
      create and/or remove entries in the HPT.  There is a new capability,
      KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
      takes an argument structure with the index of the first HPT entry to
      read out and a set of flags.  The flags indicate whether the user is
      intending to read or write the HPT, and whether to return all entries
      or only the "bolted" entries (those with the bolted bit, 0x10, set in
      the first doubleword).
      
      This is intended for use in implementing qemu's savevm/loadvm and for
      live migration.  Therefore, on reads, the first pass returns information
      about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
      end of the HPT, it returns from the read.  Subsequent reads only return
      information about HPTEs that have changed since they were last read.
      A read that finds no changed HPTEs in the HPT following where the last
      read finished will return 0 bytes.
      
      The format of the data provides a simple run-length compression of the
      invalid entries.  Each block of data starts with a header that indicates
      the index (position in the HPT, which is just an array), the number of
      valid entries starting at that index (may be zero), and the number of
      invalid entries following those valid entries.  The valid entries, 16
      bytes each, follow the header.  The invalid entries are not explicitly
      represented.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix documentation]
      Signed-off-by: Alexander Graf <agraf@suse.de>
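The run-length format described in this commit can be decoded with a short loop. The Python sketch below is illustrative only: the commit message says each block has a header carrying an index, a count of valid entries and a count of invalid entries, followed by the valid entries at 16 bytes each. The exact field widths are not given here, so the layout used below (a 32-bit index followed by two 16-bit counts, in native byte order, with each entry treated as two 64-bit doublewords) is an assumption.

```python
import struct

# Assumed header layout: u32 index, u16 n_valid, u16 n_invalid.
HEADER = struct.Struct("=IHH")
# Each valid HPTE is 16 bytes, modelled here as two 64-bit doublewords.
HPTE = struct.Struct("=QQ")


def parse_htab_stream(buf):
    """Split a buffer read from the HPT fd into (index, entries, n_invalid).

    'entries' holds the valid HPTEs that start at 'index'; the
    n_invalid entries that follow them are implied, not stored.
    """
    blocks = []
    off = 0
    while off < len(buf):
        index, n_valid, n_invalid = HEADER.unpack_from(buf, off)
        off += HEADER.size
        entries = []
        for _ in range(n_valid):
            entries.append(HPTE.unpack_from(buf, off))
            off += HPTE.size
        blocks.append((index, entries, n_invalid))
    return blocks
```

A block with zero valid entries consists of the header alone, which is how long runs of invalid entries are compressed.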
    • KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs · 44e5f6be
      Paul Mackerras committed
      This uses a bit in our record of the guest view of the HPTE to record
      when the HPTE gets modified.  We use a reserved bit for this, and ensure
      that this bit is always cleared in HPTE values returned to the guest.
      
      The recording of modified HPTEs is only done if other code indicates
      its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.
      The reason for this is that when later commits add facilities for
      userspace to read the HPT, the first pass of reading the HPT will be
      quicker if there are no (or very few) HPTEs marked as modified,
      rather than having most HPTEs marked as modified.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
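The mechanism in this commit amounts to a conditional flag set plus a mask on the way out. The Python sketch below illustrates it under stated assumptions: the bit position chosen for `MODIFIED_BIT` is hypothetical (the message only says a reserved bit is used), and the function names are invented for this example rather than taken from the kernel.

```python
MODIFIED_BIT = 1 << 62  # hypothetical reserved bit in the guest-view record


def record_modification(guest_view, hpte_mod_interest):
    """Mark the recorded guest view as modified, but only if anyone cares."""
    if hpte_mod_interest:
        guest_view |= MODIFIED_BIT
    return guest_view


def value_for_guest(guest_view):
    """Strip the bookkeeping bit before a value is returned to the guest."""
    return guest_view & ~MODIFIED_BIT
```

Skipping the flag while `hpte_mod_interest` is zero is what keeps the first read pass cheap: entries modified before anyone started reading are not marked, so they are not reported a second time as changes.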
  25. 30 Oct, 2012 1 commit
  26. 30 May, 2012 1 commit
    • KVM: PPC: Book3S HV: Make the guest hash table size configurable · 32fad281
      Paul Mackerras committed
      This adds a new ioctl to enable userspace to control the size of the guest
      hashed page table (HPT) and to clear it out when resetting the guest.
      The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
      a pointer to a u32 containing the desired order of the HPT (log base 2
      of the size in bytes), which is updated on successful return to the
      actual order of the HPT which was allocated.
      
      There must be no vcpus running at the time of this ioctl.  To enforce
      this, we now keep a count of the number of vcpus running in
      kvm->arch.vcpus_running.
      
      If the ioctl is called when a HPT has already been allocated, we don't
      reallocate the HPT but just clear it out.  We first clear the
      kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
      the kvm->lock mutex, it will prevent any vcpus from starting to run until
      we're done, and (b) it means that the first vcpu to run after we're done
      will re-establish the VRMA if necessary.
      
      If userspace doesn't call this ioctl before running the first vcpu, the
      kernel will allocate a default-sized HPT at that point.  We do it then
      rather than when creating the VM, as the code did previously, so that
      userspace has a chance to do the ioctl if it wants.
      
      When allocating the HPT, we can allocate either from the kernel page
      allocator, or from the preallocated pool.  If userspace is asking for
      a different size from the preallocated HPTs, we first try to allocate
      using the kernel page allocator.  Then we try to allocate from the
      preallocated pool, and then if that fails, we try allocating decreasing
      sizes from the kernel page allocator, down to the minimum size allowed
      (256kB).  Note that the kernel page allocator limits allocations to
      1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
      16MB (on 64-bit powerpc, at least).
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix module compilation]
      Signed-off-by: Alexander Graf <agraf@suse.de>
  27. 05 Mar, 2012 2 commits
    • KVM: PPC: Allow for read-only pages backing a Book3S HV guest · 4cf302bc
      Paul Mackerras committed
      With this, if a guest does an H_ENTER with a read/write HPTE on a page
      which is currently read-only, we make the actual HPTE inserted be a
      read-only version of the HPTE.  We now intercept protection faults as
      well as HPTE not found faults, and for a protection fault we work out
      whether it should be reflected to the guest (e.g. because the guest HPTE
      didn't allow write access to usermode) or handled by switching to
      kernel context and calling kvmppc_book3s_hv_page_fault, which will then
      request write access to the page and update the actual HPTE.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: PPC: Implement MMU notifiers for Book3S HV guests · 342d3db7
      Paul Mackerras committed
      This adds the infrastructure to enable us to page out pages underneath
      a Book3S HV guest, on processors that support virtualized partition
      memory, that is, POWER7.  Instead of pinning all the guest's pages,
      we now look in the host userspace Linux page tables to find the
      mapping for a given guest page.  Then, if the userspace Linux PTE
      gets invalidated, kvm_unmap_hva() gets called for that address, and
      we replace all the guest HPTEs that refer to that page with absent
      HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
      set, which will cause an HDSI when the guest tries to access them.
      Finally, the page fault handler is extended to reinstantiate the
      guest HPTE when the guest tries to access a page which has been paged
      out.
      
      Since we can't intercept the guest DSI and ISI interrupts on PPC970,
      we still have to pin all the guest pages on PPC970.  We have a new flag,
      kvm->arch.using_mmu_notifiers, that indicates whether we can page
      guest pages out.  If it is not set, the MMU notifier callbacks do
      nothing and everything operates as before.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Avi Kivity <avi@redhat.com>