1. 31 1月, 2017 1 次提交
  2. 27 9月, 2016 1 次提交
  3. 09 9月, 2016 1 次提交
    • P
      powerpc/mm: Speed up computation of base and actual page size for a HPTE · 0eeede0c
      Paul Mackerras 提交于
      This replaces a 2-D search through an array with a simple 8-bit table
      lookup for determining the actual and/or base page size for a HPT entry.
      
      The encoding in the second doubleword of the HPTE is designed to encode
      the actual and base page sizes without using any more bits than would be
      needed for a 4k page number, by using between 1 and 8 low-order bits of
      the RPN (real page number) field to encode the page sizes.  A single
      "large page" bit in the first doubleword indicates that these low-order
      bits are to be interpreted like this.
      
      We can determine the page sizes by using the low-order 8 bits of the RPN
      to look up a 256-entry table.  For actual page sizes less than 1MB, some
      of the upper bits of these 8 bits are going to be real address bits, but
      we can cope with that by replicating the entries for those smaller page
      sizes.
      
      While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
      functions from a KVM-specific header to a header for 64-bit HPT systems,
      since this computation doesn't have anything specifically to do with KVM.
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      0eeede0c
  4. 01 8月, 2016 1 次提交
  5. 01 5月, 2016 3 次提交
    • A
      powerpc/mm/book3s: Rename hash specific PTE bits to carry H_ prefix · 945537df
      Aneesh Kumar K.V 提交于
      This helps to make following hash only pte bits easier.
      
      We have kept _PAGE_CHG_MASK, _HPAGE_CHG_MASK and _PAGE_PROT_BITS as it
      is in this patch eventhough they use hash specific bits. Using them in
      radix as it is should be ok, because with radix we expect those bit
      positions to be zero.
      
      Only renames in this patch, no change in functionality.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      945537df
    • A
      powerpc/mm: Drop WIMG in favour of new constants · 30bda41a
      Aneesh Kumar K.V 提交于
      PowerISA 3.0 introduces two pte bits with the below meaning for radix:
        00 -> Normal Memory
        01 -> Strong Access Order (SAO)
        10 -> Non idempotent I/O (Cache inhibited and guarded)
        11 -> Tolerant I/O (Cache inhibited)
      
      We drop the existing WIMG bits in the Linux page table in favour of the
      above constants. We loose _PAGE_WRITETHRU with this conversion. We only
      use writethru via pgprot_cached_wthru() which is used by
      fbdev/controlfb.c which is Apple control display and also PPC32.
      
      With respect to _PAGE_COHERENCE, we have been marking hpte always
      coherent for some time now. htab_convert_pte_flags() always added
      HPTE_R_M.
      
      NOTE: KVM changes need closer review.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      30bda41a
    • M
      powerpc/mm: Add pte_xchg() helper · 3910a7f4
      Michael Ellerman 提交于
      We have five locations in 64-bit hash MMU code that do a cmpxchg() of a
      PTE. Currently doing it inline OK, but in a future patch we will be
      converting the PTEs to __be64 in some configs. In that case we will need
      casts at every cmpxchg() site in order to keep sparse happy.
      
      So move the logic into a helper, this is a reasonably nice cleanup on
      its own.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3910a7f4
  6. 16 2月, 2016 1 次提交
  7. 05 6月, 2015 1 次提交
  8. 21 4月, 2015 3 次提交
  9. 17 4月, 2015 2 次提交
  10. 10 4月, 2015 1 次提交
  11. 31 3月, 2015 1 次提交
  12. 17 12月, 2014 1 次提交
    • P
      KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf
      Paul Mackerras 提交于
      This removes the code that was added to enable HV KVM to work
      on PPC970 processors.  The PPC970 is an old CPU that doesn't
      support virtualizing guest memory.  Removing PPC970 support also
      lets us remove the code for allocating and managing contiguous
      real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
      case, the code for pinning pages of guest memory when first
      accessed and keeping track of which pages have been pinned, and
      the code for handling H_ENTER hypercalls in virtual mode.
      
      Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
      The KVM_CAP_PPC_RMA capability now always returns 0.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c17b98cf
  13. 15 12月, 2014 1 次提交
    • P
      KVM: PPC: Book3S HV: Fix computation of tlbie operand · d506735b
      Paul Mackerras 提交于
      The B (segment size) field in the RB operand for the tlbie
      instruction is two bits, which we get from the top two bits of
      the first doubleword of the HPT entry to be invalidated.  These
      bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM
      bit numbering).
      
      The compute_tlbie_rb() function gets these bits as v >> (62 - 8),
      which is not correct as it will bring in the top 10 bits, not
      just the top two.  These extra bits could corrupt the AP, AVAL
      and L fields in the RB value.  To fix this we shift right 62 bits
      and then shift left 8 bits, so we only get the two bits of the
      B field.
      
      The first doubleword of the HPT entry is under the control of the
      guest kernel.  In fact, Linux guests will always put zeroes in bits
      54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing
      this.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      d506735b
  14. 28 7月, 2014 3 次提交
  15. 25 6月, 2014 1 次提交
  16. 30 5月, 2014 1 次提交
    • A
      KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest · 1f365bb0
      Aneesh Kumar K.V 提交于
      On recent IBM Power CPUs, while the hashed page table is looked up using
      the page size from the segmentation hardware (i.e. the SLB), it is
      possible to have the HPT entry indicate a larger page size.  Thus for
      example it is possible to put a 16MB page in a 64kB segment, but since
      the hash lookup is done using a 64kB page size, it may be necessary to
      put multiple entries in the HPT for a single 16MB page.  This
      capability is called mixed page-size segment (MPSS).  With MPSS,
      there are two relevant page sizes: the base page size, which is the
      size used in searching the HPT, and the actual page size, which is the
      size indicated in the HPT entry. [ Note that the actual page size is
      always >= base page size ].
      
      We use "ibm,segment-page-sizes" device tree node to advertise
      the MPSS support to PAPR guest. The penc encoding indicates whether
      we support a specific combination of base page size and actual
      page size in the same segment. We also use the penc value in the
      LP encoding of HPTE entry.
      
      This patch exposes MPSS support to KVM guest by advertising the
      feature via "ibm,segment-page-sizes". It also adds the necessary changes
      to decode the base page size and the actual page size correctly from the
      HPTE entry.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1f365bb0
  17. 29 3月, 2014 1 次提交
    • P
      KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode · 797f9c07
      Paul Mackerras 提交于
      With HV KVM, some high-frequency hypercalls such as H_ENTER are handled
      in real mode, and need to access the memslots array for the guest.
      Accessing the memslots array is safe, because we hold the SRCU read
      lock for the whole time that a guest vcpu is running.  However, the
      checks that kvm_memslots() does when lockdep is enabled are potentially
      unsafe in real mode, when only the linear mapping is available.
      Furthermore, kvm_memslots() can be called from a secondary CPU thread,
      which is an offline CPU from the point of view of the host kernel,
      and is not running the task which holds the SRCU read lock.
      
      To avoid false positives in the checks in kvm_memslots(), and to avoid
      possible side effects from doing the checks in real mode, this replaces
      kvm_memslots() with kvm_memslots_raw() in all the places that execute
      in real mode.  kvm_memslots_raw() is a new function that is like
      kvm_memslots() but uses rcu_dereference_raw_notrace() instead of
      kvm_dereference_check().
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NScott Wood <scottwood@freescale.com>
      797f9c07
  18. 17 10月, 2013 2 次提交
  19. 10 7月, 2013 1 次提交
    • P
      KVM: PPC: Book3S HV: Correct tlbie usage · 54480501
      Paul Mackerras 提交于
      This corrects the usage of the tlbie (TLB invalidate entry) instruction
      in HV KVM.  The tlbie instruction changed between PPC970 and POWER7.
      On the PPC970, the bit to select large vs. small page is in the instruction,
      not in the RB register value.  This changes the code to use the correct
      form on PPC970.
      
      On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
      field of the RB value incorrectly for 64k pages.  This fixes it.
      
      Since we now have several cases to handle for the tlbie instruction, this
      factors out the code to do a sequence of tlbies into a new function,
      do_tlbies(), and calls that from the various places where the code was
      doing tlbie instructions inline.  It also makes kvmppc_h_bulk_remove()
      use the same global_invalidates() function for determining whether to do
      local or global TLB invalidations as is used in other places, for
      consistency, and also to make sure that kvm->arch.need_tlb_flush gets
      updated properly.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      54480501
  20. 08 7月, 2013 2 次提交
  21. 21 6月, 2013 1 次提交
  22. 27 4月, 2013 1 次提交
    • P
      KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changes · a1b4a0f6
      Paul Mackerras 提交于
      At present, the code that determines whether a HPT entry has changed,
      and thus needs to be sent to userspace when it is copying the HPT,
      doesn't consider a hardware update to the reference and change bits
      (R and C) in the HPT entries to constitute a change that needs to
      be sent to userspace.  This adds code to check for changes in R and C
      when we are scanning the HPT to find changed entries, and adds code
      to set the changed flag for the HPTE when we update the R and C bits
      in the guest view of the HPTE.
      
      Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c
      as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification()
      function into kvm_book3s_64.h.
      
      Current Linux guest kernels don't use the hardware updates of R and C
      in the HPT, so this change won't affect them.  Linux (or other) kernels
      might in future want to use the R and C bits and have them correctly
      transferred across when a guest is migrated, so it is better to correct
      this deficiency.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a1b4a0f6
  23. 06 12月, 2012 2 次提交
    • P
      KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT · a2932923
      Paul Mackerras 提交于
      A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
      this fd return the contents of the HPT (hashed page table), writes
      create and/or remove entries in the HPT.  There is a new capability,
      KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
      takes an argument structure with the index of the first HPT entry to
      read out and a set of flags.  The flags indicate whether the user is
      intending to read or write the HPT, and whether to return all entries
      or only the "bolted" entries (those with the bolted bit, 0x10, set in
      the first doubleword).
      
      This is intended for use in implementing qemu's savevm/loadvm and for
      live migration.  Therefore, on reads, the first pass returns information
      about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
      end of the HPT, it returns from the read.  Subsequent reads only return
      information about HPTEs that have changed since they were last read.
      A read that finds no changed HPTEs in the HPT following where the last
      read finished will return 0 bytes.
      
      The format of the data provides a simple run-length compression of the
      invalid entries.  Each block of data starts with a header that indicates
      the index (position in the HPT, which is just an array), the number of
      valid entries starting at that index (may be zero), and the number of
      invalid entries following those valid entries.  The valid entries, 16
      bytes each, follow the header.  The invalid entries are not explicitly
      represented.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix documentation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a2932923
    • P
      KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs · 44e5f6be
      Paul Mackerras 提交于
      This uses a bit in our record of the guest view of the HPTE to record
      when the HPTE gets modified.  We use a reserved bit for this, and ensure
      that this bit is always cleared in HPTE values returned to the guest.
      
      The recording of modified HPTEs is only done if other code indicates
      its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.
      The reason for this is that when later commits add facilities for
      userspace to read the HPT, the first pass of reading the HPT will be
      quicker if there are no (or very few) HPTEs marked as modified,
      rather than having most HPTEs marked as modified.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      44e5f6be
  24. 30 10月, 2012 1 次提交
  25. 30 5月, 2012 1 次提交
    • P
      KVM: PPC: Book3S HV: Make the guest hash table size configurable · 32fad281
      Paul Mackerras 提交于
      This adds a new ioctl to enable userspace to control the size of the guest
      hashed page table (HPT) and to clear it out when resetting the guest.
      The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
      a pointer to a u32 containing the desired order of the HPT (log base 2
      of the size in bytes), which is updated on successful return to the
      actual order of the HPT which was allocated.
      
      There must be no vcpus running at the time of this ioctl.  To enforce
      this, we now keep a count of the number of vcpus running in
      kvm->arch.vcpus_running.
      
      If the ioctl is called when a HPT has already been allocated, we don't
      reallocate the HPT but just clear it out.  We first clear the
      kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
      the kvm->lock mutex, it will prevent any vcpus from starting to run until
      we're done, and (b) it means that the first vcpu to run after we're done
      will re-establish the VRMA if necessary.
      
      If userspace doesn't call this ioctl before running the first vcpu, the
      kernel will allocate a default-sized HPT at that point.  We do it then
      rather than when creating the VM, as the code did previously, so that
      userspace has a chance to do the ioctl if it wants.
      
      When allocating the HPT, we can allocate either from the kernel page
      allocator, or from the preallocated pool.  If userspace is asking for
      a different size from the preallocated HPTs, we first try to allocate
      using the kernel page allocator.  Then we try to allocate from the
      preallocated pool, and then if that fails, we try allocating decreasing
      sizes from the kernel page allocator, down to the minimum size allowed
      (256kB).  Note that the kernel page allocator limits allocations to
      1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
      16MB (on 64-bit powerpc, at least).
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix module compilation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      32fad281
  26. 05 3月, 2012 5 次提交
    • P
      KVM: PPC: Allow for read-only pages backing a Book3S HV guest · 4cf302bc
      Paul Mackerras 提交于
      With this, if a guest does an H_ENTER with a read/write HPTE on a page
      which is currently read-only, we make the actual HPTE inserted be a
      read-only version of the HPTE.  We now intercept protection faults as
      well as HPTE not found faults, and for a protection fault we work out
      whether it should be reflected to the guest (e.g. because the guest HPTE
      didn't allow write access to usermode) or handled by switching to
      kernel context and calling kvmppc_book3s_hv_page_fault, which will then
      request write access to the page and update the actual HPTE.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      4cf302bc
    • P
      KVM: PPC: Implement MMU notifiers for Book3S HV guests · 342d3db7
      Paul Mackerras 提交于
      This adds the infrastructure to enable us to page out pages underneath
      a Book3S HV guest, on processors that support virtualized partition
      memory, that is, POWER7.  Instead of pinning all the guest's pages,
      we now look in the host userspace Linux page tables to find the
      mapping for a given guest page.  Then, if the userspace Linux PTE
      gets invalidated, kvm_unmap_hva() gets called for that address, and
      we replace all the guest HPTEs that refer to that page with absent
      HPTEs, i.e. ones with the valid bit clear and the HPTE_V_ABSENT bit
      set, which will cause an HDSI when the guest tries to access them.
      Finally, the page fault handler is extended to reinstantiate the
      guest HPTE when the guest tries to access a page which has been paged
      out.
      
      Since we can't intercept the guest DSI and ISI interrupts on PPC970,
      we still have to pin all the guest pages on PPC970.  We have a new flag,
      kvm->arch.using_mmu_notifiers, that indicates whether we can page
      guest pages out.  If it is not set, the MMU notifier callbacks do
      nothing and everything operates as before.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      342d3db7
    • P
      KVM: PPC: Implement MMIO emulation support for Book3S HV guests · 697d3899
      Paul Mackerras 提交于
      This provides the low-level support for MMIO emulation in Book3S HV
      guests.  When the guest tries to map a page which is not covered by
      any memslot, that page is taken to be an MMIO emulation page.  Instead
      of inserting a valid HPTE, we insert an HPTE that has the valid bit
      clear but another hypervisor software-use bit set, which we call
      HPTE_V_ABSENT, to indicate that this is an absent page.  An
      absent page is treated much like a valid page as far as guest hcalls
      (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
      an absent HPTE doesn't need to be invalidated with tlbie since it
      was never valid as far as the hardware is concerned.
      
      When the guest accesses a page for which there is an absent HPTE, it
      will take a hypervisor data storage interrupt (HDSI) since we now set
      the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
      looks up the hash table and if it finds an absent HPTE mapping the
      requested virtual address, will switch to kernel mode and handle the
      fault in kvmppc_book3s_hv_page_fault(), which at present just calls
      kvmppc_hv_emulate_mmio() to set up the MMIO emulation.
      
      This is based on an earlier patch by Benjamin Herrenschmidt, but since
      heavily reworked.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      697d3899
    • P
      KVM: PPC: Maintain a doubly-linked list of guest HPTEs for each gfn · 06ce2c63
      Paul Mackerras 提交于
      This expands the reverse mapping array to contain two links for each
      HPTE which are used to link together HPTEs that correspond to the
      same guest logical page.  Each circular list of HPTEs is pointed to
      by the rmap array entry for the guest logical page, pointed to by
      the relevant memslot.  Links are 32-bit HPT entry indexes rather than
      full 64-bit pointers, to save space.  We use 3 of the remaining 32
      bits in the rmap array entries as a lock bit, a referenced bit and
      a present bit (the present bit is needed since HPTE index 0 is valid).
      The bit lock for the rmap chain nests inside the HPTE lock bit.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      06ce2c63
    • P
      KVM: PPC: Allow I/O mappings in memory slots · 9d0ef5ea
      Paul Mackerras 提交于
      This provides for the case where userspace maps an I/O device into the
      address range of a memory slot using a VM_PFNMAP mapping.  In that
      case, we work out the pfn from vma->vm_pgoff, and record the cache
      enable bits from vma->vm_page_prot in two low-order bits in the
      slot_phys array entries.  Then, in kvmppc_h_enter() we check that the
      cache bits in the HPTE that the guest wants to insert match the cache
      bits in the slot_phys array entry.  However, we do allow the guest to
      create what it thinks is a non-cacheable or write-through mapping to
      memory that is actually cacheable, so that we can use normal system
      memory as part of an emulated device later on.  In that case the actual
      HPTE we insert is a cacheable HPTE.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      9d0ef5ea