1. 23 Nov 2017, 1 commit
    • KVM: PPC: Book3S HV: Avoid shifts by negative amounts · cda2eaa3
      Committed by Paul Mackerras
      The kvmppc_hpte_page_shifts function decodes the actual and base page
      sizes for a HPTE, returning -1 if it doesn't recognize the page size
      encoding.  This then gets used as a shift amount in various places,
      which is undefined behaviour.  This was reported by Coverity.
      
      In fact this should never occur, since we should only get HPTEs in the
      HPT which have a recognized page size encoding.  The only place where
      this might not be true is in the call to kvmppc_actual_pgsz() near the
      beginning of kvmppc_do_h_enter(), where we are validating the HPTE
      value passed in from the guest.
      
      So to fix this and eliminate the undefined behaviour, we make
      kvmppc_hpte_page_shifts return 0 for unrecognized page size encodings,
      and make kvmppc_actual_pgsz() detect that case and return 0 for the
      page size, which will then cause kvmppc_do_h_enter() to return an
      error and refuse to insert any HPTE with an unrecognized page size
      encoding.
      
      To ensure that we don't get undefined behaviour in compute_tlbie_rb(),
      we take the 4k page size path for any unrecognized page size encoding.
      This should never be hit in practice because it is only used on HPTE
      values which have previously been checked for having a recognized
      page size encoding.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
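
      A minimal C sketch of the pattern this fix describes (encodings and
      names simplified, not the kernel's exact code): the decoder returns 0
      instead of -1 for an unrecognized encoding, and the caller treats a
      page size of 0 as grounds to refuse the HPTE.

      #include <stdio.h>

      /* Simplified decoder: returns the page shift for a recognized LP
       * encoding, or 0 for an unrecognized one.  Previously it returned
       * -1, which callers then used as a shift amount -- undefined
       * behaviour in C. */
      static int hpte_page_shift(unsigned int lp)
      {
              switch (lp) {
              case 0x0: return 12;    /* 4kB  */
              case 0x1: return 16;    /* 64kB */
              case 0x2: return 24;    /* 16MB */
              default:  return 0;     /* unrecognized: never shift by this */
              }
      }

      static unsigned long actual_pgsz(unsigned int lp)
      {
              int shift = hpte_page_shift(lp);

              /* 0 propagates out, so the H_ENTER path can return an error
               * instead of shifting by a negative amount. */
              return shift ? 1UL << shift : 0;
      }

      int main(void)
      {
              printf("%lu\n", actual_pgsz(0x1));      /* 65536 */
              printf("%lu\n", actual_pgsz(0xf));      /* 0 => refuse the HPTE */
              return 0;
      }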
  2. 01 Nov 2017, 2 commits
    • KVM: PPC: Book3S HV: Unify dirty page map between HPT and radix · e641a317
      Committed by Paul Mackerras
      Currently, the HPT code in HV KVM maintains a dirty bit per guest page
      in the rmap array, whether or not dirty page tracking has been enabled
      for the memory slot.  In contrast, the radix code maintains a dirty
      bit per guest page in memslot->dirty_bitmap, and only does so when
      dirty page tracking has been enabled.
      
      This changes the HPT code to maintain the dirty bits in the memslot
      dirty_bitmap like radix does.  This results in slightly less code
      overall, and will mean that we do not lose the dirty bits when
      transitioning between HPT and radix mode in future.
      
      There is one minor change to behaviour as a result.  With HPT, when
      dirty tracking was enabled for a memslot, we would previously clear
      all the dirty bits at that point (both in the HPT entries and in the
      rmap arrays), meaning that a KVM_GET_DIRTY_LOG ioctl immediately
      following would show no pages as dirty (assuming no vcpus have run
      in the meantime).  With this change, the dirty bits on HPT entries
      are not cleared at the point where dirty tracking is enabled, so
      KVM_GET_DIRTY_LOG would show as dirty any guest pages that are
      resident in the HPT and dirty.  This is consistent with what happens
      on radix.
      
      This also fixes a bug in the mark_pages_dirty() function for radix
      (in the sense that the function no longer exists).  In the case where
      a large page of 64 normal pages or more is marked dirty, the
      addressing of the dirty bitmap was incorrect and could write past
      the end of the bitmap.  Fortunately this case was never hit in
      practice because a 2MB large page is only 32 x 64kB pages, and we
      don't support backing the guest with 1GB huge pages at this point.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
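
      A hedged stand-in for the bitmap marking involved (not the kernel's
      code; the helper name is borrowed from the message above): indexing
      the bitmap word-by-word stays in bounds for any run length, which is
      the property the buggy large-page path lacked.

      #include <limits.h>

      #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

      /* Mark npages consecutive guest pages, starting at gfn_offset
       * within the memslot, dirty in the memslot's dirty bitmap. */
      static void mark_pages_dirty(unsigned long *dirty_bitmap,
                                   unsigned long gfn_offset,
                                   unsigned long npages)
      {
              while (npages--) {
                      dirty_bitmap[gfn_offset / BITS_PER_LONG] |=
                              1UL << (gfn_offset % BITS_PER_LONG);
                      gfn_offset++;
              }
      }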
    • KVM: PPC: Book3S HV: Don't rely on host's page size information · 8dc6cca5
      Committed by Paul Mackerras
      This removes the dependence of KVM on the mmu_psize_defs array (which
      stores information about hardware support for various page sizes) and
      the things derived from it, chiefly hpte_page_sizes[], hpte_page_size(),
      hpte_actual_page_size() and get_sllp_encoding().  We also no longer
      rely on the mmu_slb_size variable or the MMU_FTR_1T_SEGMENTS feature
      bit.
      
      The reason for doing this is so we can support a HPT guest on a radix
      host.  In a radix host, the mmu_psize_defs array contains information
      about page sizes supported by the MMU in radix mode rather than the
      page sizes supported by the MMU in HPT mode.  Similarly, mmu_slb_size
      and the MMU_FTR_1T_SEGMENTS bit are not set.
      
      Instead we hard-code knowledge of the behaviour of the HPT MMU in the
      POWER7, POWER8 and POWER9 processors (which are the only processors
      supported by HV KVM) - specifically the encoding of the LP fields in
      the HPT and SLB entries, and the fact that they have 32 SLB entries
      and support 1TB segments.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
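
      A sketch of what the hard-coded knowledge can look like (the bit
      positions and the handful of encodings shown are simplified
      assumptions; the real decoder covers every POWER7/8/9 LP
      combination):

      /* Facts stated above, compiled in rather than read from
       * mmu_psize_defs / mmu_slb_size: */
      #define KVM_SLB_SIZE    32      /* POWER7/8/9 have 32 SLB entries */
      #define KVM_1T_SEGMENTS 1       /* and all support 1TB segments */

      /* Simplified LP-field decode for the second HPT doubleword;
       * returns the actual page shift, or 0 if unrecognized. */
      static int hpt_lp_shift(unsigned long l)
      {
              unsigned int lp = (l >> 12) & 0xf;      /* LP position assumed */

              switch (lp) {
              case 0x0: return 24;    /* 16MB */
              case 0x1: return 16;    /* 64kB */
              default:  return 0;
              }
      }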
  3. 01 Apr 2017, 1 commit
    • powerpc/pseries: Skip using reserved virtual address range · 82228e36
      Committed by Aneesh Kumar K.V
      Now that we use all the available virtual address range, we need to
      make sure we don't generate a VSID that overlaps with the reserved
      VSID range. The reserved VSID range includes the virtual address
      range used by the adjunct partition and the VRMA virtual segment. We
      find the context value that can result in generating such a VSID and
      reserve it early in boot.
      
      We don't look at the adjunct range, because for now we disable
      adjunct usage in a Linux LPAR via the CAS interface.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Rewrite hash__reserve_context_id(), move the rest into pseries]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
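
      A simplified stand-in for the reservation step (the real
      hash__reserve_context_id() uses the kernel's IDA allocator; a plain
      bitmap is shown here and the bound is illustrative):

      #include <stdbool.h>

      #define MAX_CONTEXT_IDS 1024    /* illustrative bound */
      #define BITS_PER_WORD   (8 * sizeof(unsigned long))

      static unsigned long context_map[MAX_CONTEXT_IDS / BITS_PER_WORD];

      /* Reserve, early in boot, the context id whose VSIDs would overlap
       * the reserved range (e.g. the VRMA virtual segment), so the normal
       * allocator can never hand it out. */
      static bool reserve_context_id(unsigned int id)
      {
              unsigned long bit = 1UL << (id % BITS_PER_WORD);
              unsigned long *word = &context_map[id / BITS_PER_WORD];

              if (*word & bit)
                      return false;   /* already taken */
              *word |= bit;
              return true;
      }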
  4. 31 Jan 2017, 3 commits
    • KVM: PPC: Book3S HV: Split HPT allocation from activation · aae0777f
      Committed by David Gibson
      Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
      and sets it up as the active page table for a VM.  For the upcoming HPT
      resize implementation we're going to want to allocate HPTs separately from
      activating them.
      
      So, split the allocation itself out into kvmppc_allocate_hpt() and perform
      the activation with a new kvmppc_set_hpt() function.  Likewise we split
      kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt()
      which unsets it as an active HPT, then frees it.
      
      We also move the logic to fall back to smaller HPT sizes if the first try
      fails into the single caller which used that behaviour,
      kvmppc_hv_setup_htab_rma().  This introduces a slight semantic change, in
      that previously if the initial attempt at CMA allocation failed, we would
      fall back to attempting smaller sizes with the page allocator.  Now, we
      try first CMA, then the page allocator at each size.  As far as I can tell
      this change should be harmless.
      
      To match, we make kvmppc_free_hpt() just free the actual HPT itself;
      the call to kvmppc_free_lpid() that was there moves to the single
      caller.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
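
      A reduced sketch of the allocate/activate split (function names
      follow the commit message; the types and bodies are stand-ins, not
      the kernel's):

      #include <stdlib.h>

      struct hpt_info {
              void *virt;             /* the table itself */
              unsigned int order;     /* log2 of its size in bytes */
      };

      struct vm {
              struct hpt_info hpt;    /* the currently active HPT */
      };

      /* Allocation no longer implies activation. */
      static int allocate_hpt(struct hpt_info *info, unsigned int order)
      {
              info->virt = calloc(1, 1UL << order);   /* stand-in for CMA/page alloc */
              if (!info->virt)
                      return -1;
              info->order = order;
              return 0;
      }

      /* Activation is a separate, cheap step -- exactly what HPT resizing
       * needs: allocate a spare table, then switch to it. */
      static void set_hpt(struct vm *vm, struct hpt_info *info)
      {
              vm->hpt = *info;
      }

      /* Freeing no longer unsets an active HPT; that is a separate
       * release step in the real code. */
      static void free_hpt(struct hpt_info *info)
      {
              free(info->virt);
              info->virt = NULL;
      }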
    • KVM: PPC: Book3S HV: Don't store values derivable from HPT order · 3d089f84
      Committed by David Gibson
      Currently the kvm_hpt_info structure stores the hashed page table's order,
      and also the number of HPTEs it contains and a mask for its size.  The
      last two can be easily derived from the order, so remove them and just
      calculate them as necessary with a couple of helper inlines.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
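
      The derivations are simple enough to inline; a sketch of the helpers
      (signatures assumed, arithmetic following the 16-byte HPTE and
      128-byte HPTE-group layout):

      struct kvm_hpt_info {
              unsigned int order;     /* log2 of the HPT size in bytes */
              /* virt, cma, ... elided */
      };

      /* Number of HPTEs: each entry is 16 bytes. */
      static inline unsigned long kvmppc_hpt_npte(const struct kvm_hpt_info *hpt)
      {
              return 1UL << (hpt->order - 4);
      }

      /* Hash mask: each HPTE group is 8 entries = 128 bytes. */
      static inline unsigned long kvmppc_hpt_mask(const struct kvm_hpt_info *hpt)
      {
              return (1UL << (hpt->order - 7)) - 1;
      }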
    • KVM: PPC: Book3S HV: Add basic infrastructure for radix guests · 9e04ba69
      Committed by Paul Mackerras
      This adds a field in struct kvm_arch and an inline helper to
      indicate whether a guest is a radix guest or not, plus a new file
      to contain the radix MMU code, which currently contains just a
      translate function which knows how to traverse the guest page
      tables to translate an address.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
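
      The flag and helper are tiny; a sketch with the surrounding structs
      reduced to stand-ins:

      #include <stdbool.h>

      struct kvm_arch {
              bool radix;     /* true if this guest uses the radix MMU */
      };

      struct kvm {
              struct kvm_arch arch;
      };

      static inline bool kvm_is_radix(struct kvm *kvm)
      {
              return kvm->arch.radix;
      }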
  5. 27 Sep 2016, 1 commit
  6. 09 Sep 2016, 1 commit
    • powerpc/mm: Speed up computation of base and actual page size for a HPTE · 0eeede0c
      Committed by Paul Mackerras
      This replaces a 2-D search through an array with a simple 8-bit table
      lookup for determining the actual and/or base page size for a HPT entry.
      
      The encoding in the second doubleword of the HPTE is designed to encode
      the actual and base page sizes without using any more bits than would be
      needed for a 4k page number, by using between 1 and 8 low-order bits of
      the RPN (real page number) field to encode the page sizes.  A single
      "large page" bit in the first doubleword indicates that these low-order
      bits are to be interpreted like this.
      
      We can determine the page sizes by using the low-order 8 bits of the RPN
      to look up a 256-entry table.  For actual page sizes less than 1MB, some
      of the upper bits of these 8 bits are going to be real address bits, but
      we can cope with that by replicating the entries for those smaller page
      sizes.
      
      While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
      functions from a KVM-specific header to a header for 64-bit HPT systems,
      since this computation doesn't have anything specifically to do with KVM.
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
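
      A sketch of the lookup (the entry layout -- base-size index in the
      low nibble, actual-size index in the high nibble, 0 meaning invalid
      -- is assumed here; the replication for sub-1MB actual sizes happens
      when the table is built at boot):

      static unsigned char hpte_page_sizes[256];      /* built once at boot */

      /* One table lookup replaces the old 2-D search: the low 8 bits of
       * the HPTE's RPN index the table.  For actual sizes below 1MB some
       * of those bits are real-address bits, so the builder replicates
       * each entry across every index that aliases to the same encoding. */
      static unsigned int hpte_size_index(unsigned long rpn, int want_base)
      {
              unsigned char e = hpte_page_sizes[rpn & 0xff];

              return want_base ? (e & 0xf) : (e >> 4);
      }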
  7. 01 Aug 2016, 1 commit
  8. 01 May 2016, 3 commits
    • powerpc/mm/book3s: Rename hash specific PTE bits to carry H_ prefix · 945537df
      Committed by Aneesh Kumar K.V
      This makes the hash-only PTE bits easier to follow.
      
      We have kept _PAGE_CHG_MASK, _HPAGE_CHG_MASK and _PAGE_PROT_BITS as
      they are in this patch even though they use hash-specific bits. Using
      them as-is in radix should be OK, because with radix we expect those
      bit positions to be zero.
      
      Only renames in this patch, no change in functionality.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Drop WIMG in favour of new constants · 30bda41a
      Committed by Aneesh Kumar K.V
      PowerISA 3.0 introduces two PTE bits with the following meanings for radix:
        00 -> Normal Memory
        01 -> Strong Access Order (SAO)
        10 -> Non idempotent I/O (Cache inhibited and guarded)
        11 -> Tolerant I/O (Cache inhibited)
      
      We drop the existing WIMG bits in the Linux page table in favour of
      the above constants. We lose _PAGE_WRITETHRU with this conversion; we
      only use write-through via pgprot_cached_wthru(), which is used by
      fbdev/controlfb.c (the Apple control display driver) and by PPC32.
      
      With respect to _PAGE_COHERENCE, we have been marking HPTEs as always
      coherent for some time now; htab_convert_pte_flags() always added
      HPTE_R_M.
      
      NOTE: KVM changes need closer review.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
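
      A sketch of the replacement constants (the two-bit encoding is the
      ISA's, quoted above; the exact bit positions inside the Linux PTE are
      assumed here for illustration):

      #define _PAGE_SAO               0x00010 /* 01: strong access order */
      #define _PAGE_NON_IDEMPOTENT    0x00020 /* 10: non-idempotent I/O */
      #define _PAGE_TOLERANT          0x00030 /* 11: tolerant I/O */
      #define _PAGE_CACHE_CTL         0x00030 /* mask covering the two bits */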
    • powerpc/mm: Add pte_xchg() helper · 3910a7f4
      Committed by Michael Ellerman
      We have five locations in the 64-bit hash MMU code that do a cmpxchg()
      of a PTE. Currently doing it inline is OK, but in a future patch we
      will be converting the PTEs to __be64 in some configs. In that case we
      will need casts at every cmpxchg() site in order to keep sparse happy.
      
      So move the logic into a helper; this is a reasonably nice cleanup on
      its own.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
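
      The helper itself is small; a sketch close to the shape the commit
      describes (pte_t, pte_val() and __cmpxchg_u64() are the kernel's own
      types and primitives, used here as-is rather than redefined):

      static inline bool pte_xchg(pte_t *ptep, pte_t old, pte_t new)
      {
              unsigned long *p = (unsigned long *)ptep;

              /* One place to add the __be64 casts later, instead of five. */
              return pte_val(old) == __cmpxchg_u64(p, pte_val(old), pte_val(new));
      }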
  9. 16 Feb 2016, 1 commit
  10. 05 Jun 2015, 1 commit
  11. 21 Apr 2015, 3 commits
  12. 17 Apr 2015, 2 commits
  13. 10 Apr 2015, 1 commit
  14. 31 Mar 2015, 1 commit
  15. 17 Dec 2014, 1 commit
    • KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf
      Committed by Paul Mackerras
      This removes the code that was added to enable HV KVM to work
      on PPC970 processors.  The PPC970 is an old CPU that doesn't
      support virtualizing guest memory.  Removing PPC970 support also
      lets us remove the code for allocating and managing contiguous
      real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
      case, the code for pinning pages of guest memory when first
      accessed and keeping track of which pages have been pinned, and
      the code for handling H_ENTER hypercalls in virtual mode.
      
      Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
      The KVM_CAP_PPC_RMA capability now always returns 0.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  16. 15 Dec 2014, 1 commit
    • KVM: PPC: Book3S HV: Fix computation of tlbie operand · d506735b
      Committed by Paul Mackerras
      The B (segment size) field in the RB operand for the tlbie
      instruction is two bits, which we get from the top two bits of
      the first doubleword of the HPT entry to be invalidated.  These
      bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM
      bit numbering).
      
      The compute_tlbie_rb() function gets these bits as v >> (62 - 8),
      which is not correct as it will bring in the top 10 bits, not
      just the top two.  These extra bits could corrupt the AP, AVAL
      and L fields in the RB value.  To fix this we shift right 62 bits
      and then shift left 8 bits, so we only get the two bits of the
      B field.
      
      The first doubleword of the HPT entry is under the control of the
      guest kernel.  In fact, Linux guests will always put zeroes in bits
      54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing
      this.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
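
      The fix is a two-step shift; a sketch (field positions as described
      in the text):

      #include <stdint.h>

      /* B (segment size) is the top two bits of the HPTE's first
       * doubleword and belongs at bits 8-9 of the tlbie RB operand.  The
       * old "v >> (62 - 8)" dragged the top ten bits of v into RB,
       * corrupting the AP, AVAL and L fields below bit 8. */
      static uint64_t tlbie_rb_b_field(uint64_t v)
      {
              return (v >> 62) << 8;  /* isolate the two bits, then place them */
      }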
  17. 28 Jul 2014, 3 commits
  18. 25 Jun 2014, 1 commit
  19. 30 May 2014, 1 commit
    • KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest · 1f365bb0
      Committed by Aneesh Kumar K.V
      On recent IBM Power CPUs, while the hashed page table is looked up using
      the page size from the segmentation hardware (i.e. the SLB), it is
      possible to have the HPT entry indicate a larger page size.  Thus for
      example it is possible to put a 16MB page in a 64kB segment, but since
      the hash lookup is done using a 64kB page size, it may be necessary to
      put multiple entries in the HPT for a single 16MB page.  This
      capability is called mixed page-size segment (MPSS).  With MPSS,
      there are two relevant page sizes: the base page size, which is the
      size used in searching the HPT, and the actual page size, which is the
      size indicated in the HPT entry. [ Note that the actual page size is
      always >= base page size ].
      
      We use "ibm,segment-page-sizes" device tree node to advertise
      the MPSS support to PAPR guest. The penc encoding indicates whether
      we support a specific combination of base page size and actual
      page size in the same segment. We also use the penc value in the
      LP encoding of HPTE entry.
      
      This patch exposes MPSS support to KVM guests by advertising the
      feature via "ibm,segment-page-sizes". It also adds the necessary changes
      to decode the base page size and the actual page size correctly from the
      HPTE entry.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
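
      A worked example of the base/actual distinction (numbers from the
      text above):

      #include <stdio.h>

      int main(void)
      {
              unsigned long actual_shift = 24;        /* 16MB actual page */
              unsigned long base_shift   = 16;        /* 64kB base (SLB) size */

              /* The hash lookup runs at the base size, so a single 16MB
               * page can need an HPT entry per 64kB slot it covers: */
              printf("HPT entries per 16MB page: %lu\n",
                     1UL << (actual_shift - base_shift));     /* 256 */
              return 0;
      }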
  20. 29 Mar 2014, 1 commit
    • KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode · 797f9c07
      Committed by Paul Mackerras
      With HV KVM, some high-frequency hypercalls such as H_ENTER are handled
      in real mode, and need to access the memslots array for the guest.
      Accessing the memslots array is safe, because we hold the SRCU read
      lock for the whole time that a guest vcpu is running.  However, the
      checks that kvm_memslots() does when lockdep is enabled are potentially
      unsafe in real mode, when only the linear mapping is available.
      Furthermore, kvm_memslots() can be called from a secondary CPU thread,
      which is an offline CPU from the point of view of the host kernel,
      and is not running the task which holds the SRCU read lock.
      
      To avoid false positives in the checks in kvm_memslots(), and to avoid
      possible side effects from doing the checks in real mode, this replaces
      kvm_memslots() with kvm_memslots_raw() in all the places that execute
      in real mode.  kvm_memslots_raw() is a new function that is like
      kvm_memslots() but uses rcu_dereference_raw_notrace() instead of
      kvm_dereference_check().
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Scott Wood <scottwood@freescale.com>
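
      The new accessor as described in the text (struct kvm and
      rcu_dereference_raw_notrace() are the kernel's, not redefined here):

      /* Like kvm_memslots(), but safe in real mode and on an offline
       * secondary thread: no lockdep/SRCU checks, no tracing. */
      static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
      {
              return rcu_dereference_raw_notrace(kvm->memslots);
      }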
  21. 17 Oct 2013, 2 commits
  22. 10 Jul 2013, 1 commit
    • KVM: PPC: Book3S HV: Correct tlbie usage · 54480501
      Committed by Paul Mackerras
      This corrects the usage of the tlbie (TLB invalidate entry) instruction
      in HV KVM.  The tlbie instruction changed between PPC970 and POWER7.
      On the PPC970, the bit to select large vs. small page is in the instruction,
      not in the RB register value.  This changes the code to use the correct
      form on PPC970.
      
      On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
      field of the RB value incorrectly for 64k pages.  This fixes it.
      
      Since we now have several cases to handle for the tlbie instruction, this
      factors out the code to do a sequence of tlbies into a new function,
      do_tlbies(), and calls that from the various places where the code was
      doing tlbie instructions inline.  It also makes kvmppc_h_bulk_remove()
      use the same global_invalidates() function for determining whether to do
      local or global TLB invalidations as is used in other places, for
      consistency, and also to make sure that kvm->arch.need_tlb_flush gets
      updated properly.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  23. 08 Jul 2013, 2 commits
  24. 21 Jun 2013, 1 commit
  25. 27 Apr 2013, 1 commit
    • KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changes · a1b4a0f6
      Committed by Paul Mackerras
      At present, the code that determines whether a HPT entry has changed,
      and thus needs to be sent to userspace when it is copying the HPT,
      doesn't consider a hardware update to the reference and change bits
      (R and C) in the HPT entries to constitute a change that needs to
      be sent to userspace.  This adds code to check for changes in R and C
      when we are scanning the HPT to find changed entries, and adds code
      to set the changed flag for the HPTE when we update the R and C bits
      in the guest view of the HPTE.
      
      Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c
      as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification()
      function into kvm_book3s_64.h.
      
      Current Linux guest kernels don't use the hardware updates of R and C
      in the HPT, so this change won't affect them.  Linux (or other) kernels
      might in future want to use the R and C bits and have them correctly
      transferred across when a guest is migrated, so it is better to correct
      this deficiency.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
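
      A sketch of the new comparison (HPTE_R_R and HPTE_R_C are the
      architected reference/change bits in the second doubleword; the
      values below follow the usual Linux definitions):

      #include <stdbool.h>

      #define HPTE_R_R        0x0000000000000100UL    /* reference bit */
      #define HPTE_R_C        0x0000000000000080UL    /* change bit */

      /* True if hardware has set R or C in the real HPTE beyond what the
       * guest view records -- which now counts as a change that must be
       * sent to userspace. */
      static bool rc_bits_changed(unsigned long hpte_r, unsigned long guest_rpte)
      {
              return (hpte_r & (HPTE_R_R | HPTE_R_C) & ~guest_rpte) != 0;
      }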
  26. 06 Dec 2012, 2 commits
    • KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT · a2932923
      Committed by Paul Mackerras
      A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
      this fd return the contents of the HPT (hashed page table), writes
      create and/or remove entries in the HPT.  There is a new capability,
      KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
      takes an argument structure with the index of the first HPT entry to
      read out and a set of flags.  The flags indicate whether the user is
      intending to read or write the HPT, and whether to return all entries
      or only the "bolted" entries (those with the bolted bit, 0x10, set in
      the first doubleword).
      
      This is intended for use in implementing qemu's savevm/loadvm and for
      live migration.  Therefore, on reads, the first pass returns information
      about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
      end of the HPT, it returns from the read.  Subsequent reads only return
      information about HPTEs that have changed since they were last read.
      A read that finds no changed HPTEs in the HPT following where the last
      read finished will return 0 bytes.
      
      The format of the data provides a simple run-length compression of the
      invalid entries.  Each block of data starts with a header that indicates
      the index (position in the HPT, which is just an array), the number of
      valid entries starting at that index (may be zero), and the number of
      invalid entries following those valid entries.  The valid entries, 16
      bytes each, follow the header.  The invalid entries are not explicitly
      represented.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix documentation]
      Signed-off-by: Alexander Graf <agraf@suse.de>
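
      A sketch of the per-block stream header (field names assumed from
      the description; each header is followed by n_valid 16-byte HPTEs,
      and the n_invalid entries are implied rather than stored, giving the
      run-length compression of invalid entries):

      #include <stdint.h>

      struct kvm_get_htab_header {
              uint32_t index;         /* position of the first entry in the HPT */
              uint16_t n_valid;       /* valid entries following this header */
              uint16_t n_invalid;     /* invalid entries after the valid ones */
      };

      A fully invalid HPT thus compresses to a handful of headers with
      n_valid == 0 and large n_invalid counts.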
    • KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs · 44e5f6be
      Committed by Paul Mackerras
      This uses a bit in our record of the guest view of the HPTE to record
      when the HPTE gets modified.  We use a reserved bit for this, and ensure
      that this bit is always cleared in HPTE values returned to the guest.
      
      The recording of modified HPTEs is only done if other code indicates
      its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.
      The reason for this is that when later commits add facilities for
      userspace to read the HPT, the first pass of reading the HPT will be
      quicker if there are no (or very few) HPTEs marked as modified,
      rather than having most HPTEs marked as modified.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
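
      A sketch of the recording step (the reserved-bit value and the
      struct holding the guest view are assumed; the real code keeps it in
      a reverse-map entry):

      #define HPTE_GR_MODIFIED        (1UL << 62)     /* reserved bit, value assumed */

      struct guest_hpte { unsigned long rpte; };

      static int hpte_mod_interest;   /* set by code that wants change tracking */

      static void note_hpte_modification(struct guest_hpte *g)
      {
              /* Record only when someone cares, so the first HPT-read
               * pass stays fast when few entries are marked. */
              if (hpte_mod_interest)
                      g->rpte |= HPTE_GR_MODIFIED;
      }

      /* The reserved bit is always masked out of values the guest sees: */
      static unsigned long hpte_for_guest(const struct guest_hpte *g)
      {
              return g->rpte & ~HPTE_GR_MODIFIED;
      }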
  27. 30 Oct 2012, 1 commit