1. 22 8月, 2015 2 次提交
    • P
      KVM: PPC: Book3S HV: Fix bug in dirty page tracking · 08fe1e7b
      Paul Mackerras 提交于
      This fixes a bug in the tracking of pages that get modified by the
      guest.  If the guest creates a large-page HPTE, writes to memory
      somewhere within the large page, and then removes the HPTE, we only
      record the modified state for the first normal page within the large
      page, when in fact the guest might have modified some other normal
      page within the large page.
      
      To fix this we use some unused bits in the rmap entry to record the
      order (log base 2) of the size of the page that was modified, when
      removing an HPTE.  Then in kvm_test_clear_dirty_npages() we use that
      order to return the correct number of modified pages.
      
      The same thing could in principle happen when removing a HPTE at the
      host's request, i.e. when paging out a page, except that we never
      page out large pages, and the guest can only create large-page HPTEs
      if the guest RAM is backed by large pages.  However, we also fix
      this case for the sake of future-proofing.
      
      The reference bit is also subject to the same loss of information.  We
      don't make the same fix here for the reference bit because there isn't
      an interface for userspace to find out which pages the guest has
      referenced, whereas there is one for userspace to find out which pages
      the guest has modified.  Because of this loss of information, the
      kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly
      say that a page has not been referenced when it has, but that doesn't
      matter greatly because we never page or swap out large pages.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      08fe1e7b
    • P
      KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE · 1e5bf454
      Paul Mackerras 提交于
      The reference (R) and change (C) bits in a HPT entry can be set by
      hardware at any time up until the HPTE is invalidated and the TLB
      invalidation sequence has completed.  This means that when removing
      a HPTE, we need to read the HPTE after the invalidation sequence has
      completed in order to obtain reliable values of R and C.  The code
      in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd32
      ("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the
      read after invalidation as a side effect of other changes.  This
      restores the read of the HPTE after invalidation.
      
      The user-visible effect of this bug would be that when migrating a
      guest, there is a small probability that a page modified by the guest
      and then unmapped by the guest might not get re-transmitted and thus
      the destination might end up with a stale copy of the page.
      
      Fixes: 6f22bd32Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1e5bf454
  2. 21 4月, 2015 1 次提交
  3. 17 4月, 2015 3 次提交
  4. 13 2月, 2015 1 次提交
  5. 17 12月, 2014 1 次提交
    • P
      KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf
      Paul Mackerras 提交于
      This removes the code that was added to enable HV KVM to work
      on PPC970 processors.  The PPC970 is an old CPU that doesn't
      support virtualizing guest memory.  Removing PPC970 support also
      lets us remove the code for allocating and managing contiguous
      real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
      case, the code for pinning pages of guest memory when first
      accessed and keeping track of which pages have been pinned, and
      the code for handling H_ENTER hypercalls in virtual mode.
      
      Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
      The KVM_CAP_PPC_RMA capability now always returns 0.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c17b98cf
  6. 15 12月, 2014 1 次提交
    • P
      KVM: PPC: Book3S HV: Fix KSM memory corruption · b4a83900
      Paul Mackerras 提交于
      Testing with KSM active in the host showed occasional corruption of
      guest memory.  Typically a page that should have contained zeroes
      would contain values that look like the contents of a user process
      stack (values such as 0x0000_3fff_xxxx_xxx).
      
      Code inspection in kvmppc_h_protect revealed that there was a race
      condition with the possibility of granting write access to a page
      which is read-only in the host page tables.  The code attempts to keep
      the host mapping read-only if the host userspace PTE is read-only, but
      if that PTE had been temporarily made invalid for any reason, the
      read-only check would not trigger and the host HPTE could end up
      read-write.  Examination of the guest HPT in the failure situation
      revealed that there were indeed shared pages which should have been
      read-only that were mapped read-write.
      
      To close this race, we don't let a page go from being read-only to
      being read-write, as far as the real HPTE mapping the page is
      concerned (the guest view can go to read-write, but the actual mapping
      stays read-only).  When the guest tries to write to the page, we take
      an HDSI and let kvmppc_book3s_hv_page_fault take care of providing a
      writable HPTE for the page.
      
      This eliminates the occasional corruption of shared pages
      that was previously seen with KSM active.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b4a83900
  7. 28 7月, 2014 1 次提交
  8. 25 6月, 2014 1 次提交
  9. 30 5月, 2014 1 次提交
    • P
      KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates() · 55765483
      Paul Mackerras 提交于
      The global_invalidates() function contains a check that is intended
      to tell whether we are currently executing in the context of a hypercall
      issued by the guest.  The reason is that the optimization of using a
      local TLB invalidate instruction is only valid in that context.  The
      check was testing local_paca->kvm_hstate.kvm_vcore, which gets set
      when entering the guest but no longer gets cleared when exiting the
      guest.  To fix this, we use the kvm_vcpu field instead, which does
      get cleared when exiting the guest, by the kvmppc_release_hwthread()
      calls inside kvmppc_run_core().
      
      The effect of having the check wrong was that when kvmppc_do_h_remove()
      got called from htab_write() on the destination machine during a
      migration, it cleared the current cpu's bit in kvm->arch.need_tlb_flush.
      This meant that when the guest started running in the destination VM,
      it may miss out on doing a complete TLB flush, and therefore may end
      up using stale TLB entries from a previous guest that used the same
      LPID value.
      
      This should make migration more reliable.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      55765483
  10. 28 4月, 2014 1 次提交
    • P
      KVM: PPC: Book3S: HV: make _PAGE_NUMA take effect · 1ad9f238
      pingfank@linux.vnet.ibm.com 提交于
      Numa fault is a method which help to achieve auto numa balancing.
      When such a page fault takes place, the page fault handler will check
      whether the page is placed correctly. If not, migration should be
      involved to cut down the distance between the cpu and pages.
      
      A pte with _PAGE_NUMA help to implement numa fault. It means not to
      allow the MMU to access the page directly. So a page fault is triggered
      and numa fault handler gets the opportunity to run checker.
      
      As for the access of MMU, we need special handling for the powernv's guest.
      When we mark a pte with _PAGE_NUMA, we already call mmu_notifier to
      invalidate it in guest's htab, but when we tried to re-insert them,
      we firstly try to map it in real-mode. Only after this fails, we fallback
      to virt mode, and most of important, we run numa fault handler in virt
      mode.  This patch guards the way of real-mode to ensure that if a pte is
      marked with _PAGE_NUMA, it will NOT be mapped in real mode, instead, it will
      be mapped in virt mode and have the opportunity to be checked with placement.
      Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1ad9f238
  11. 29 3月, 2014 1 次提交
    • P
      KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode · 797f9c07
      Paul Mackerras 提交于
      With HV KVM, some high-frequency hypercalls such as H_ENTER are handled
      in real mode, and need to access the memslots array for the guest.
      Accessing the memslots array is safe, because we hold the SRCU read
      lock for the whole time that a guest vcpu is running.  However, the
      checks that kvm_memslots() does when lockdep is enabled are potentially
      unsafe in real mode, when only the linear mapping is available.
      Furthermore, kvm_memslots() can be called from a secondary CPU thread,
      which is an offline CPU from the point of view of the host kernel,
      and is not running the task which holds the SRCU read lock.
      
      To avoid false positives in the checks in kvm_memslots(), and to avoid
      possible side effects from doing the checks in real mode, this replaces
      kvm_memslots() with kvm_memslots_raw() in all the places that execute
      in real mode.  kvm_memslots_raw() is a new function that is like
      kvm_memslots() but uses rcu_dereference_raw_notrace() instead of
      kvm_dereference_check().
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NScott Wood <scottwood@freescale.com>
      797f9c07
  12. 09 1月, 2014 1 次提交
  13. 18 12月, 2013 1 次提交
  14. 19 11月, 2013 2 次提交
    • P
      powerpc: kvm: fix rare but potential deadlock scene · 91648ec0
      pingfan liu 提交于
      Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
      realmode, so it can trigger the deadlock.
      
      Suppose the following scene:
      
      Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of
      vcpus.
      
      If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched
      out, and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be
      caught in realmode for a long time.
      
      What makes things even worse if the following happens,
        On cpuM, bitlockX is hold, on cpuN, Y is hold.
        vcpu_B_2 try to lock Y on cpuM in realmode
        vcpu_A_2 try to lock X on cpuN in realmode
      
      Oops! deadlock happens
      Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com>
      Reviewed-by: NPaul Mackerras <paulus@samba.org>
      CC: stable@vger.kernel.org
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      91648ec0
    • P
      KVM: PPC: Book3S HV: Fix physical address calculations · caaa4c80
      Paul Mackerras 提交于
      This fixes a bug in kvmppc_do_h_enter() where the physical address
      for a page can be calculated incorrectly if transparent huge pages
      (THP) are active.  Until THP came along, it was true that if we
      encountered a large (16M) page in kvmppc_do_h_enter(), then the
      associated memslot must be 16M aligned for both its guest physical
      address and the userspace address, and the physical address
      calculations in kvmppc_do_h_enter() assumed that.  With THP, that
      is no longer true.
      
      In the case where we are using MMU notifiers and the page size that
      we get from the Linux page tables is larger than the page being mapped
      by the guest, we need to fill in some low-order bits of the physical
      address.  Without THP, these bits would be the same in the guest
      physical address (gpa) and the host virtual address (hva).  With THP,
      they can be different, and we need to use the bits from hva rather
      than gpa.
      
      In the case where we are not using MMU notifiers, the host physical
      address we get from the memslot->arch.slot_phys[] array already
      includes the low-order bits down to the PAGE_SIZE level, even if
      we are using large pages.  Thus we can simplify the calculation in
      this case to just add in the remaining bits in the case where
      PAGE_SIZE is 64k and the guest is mapping a 4k page.
      
      The same bug exists in kvmppc_book3s_hv_page_fault().  The basic fix
      is to use psize (the page size from the HPTE) rather than pte_size
      (the page size from the Linux PTE) when updating the HPTE low word
      in r.  That means that pfn needs to be computed to PAGE_SIZE
      granularity even if the Linux PTE is a huge page PTE.  That can be
      arranged simply by doing the page_to_pfn() before setting page to
      the head of the compound page.  If psize is less than PAGE_SIZE,
      then we need to make sure we only update the bits from PAGE_SIZE
      upwards, in order not to lose any sub-page offset bits in r.
      On the other hand, if psize is greater than PAGE_SIZE, we need to
      make sure we don't bring in non-zero low order bits in pfn, hence
      we mask (pfn << PAGE_SHIFT) with ~(psize - 1).
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      caaa4c80
  15. 14 8月, 2013 1 次提交
  16. 10 7月, 2013 1 次提交
    • P
      KVM: PPC: Book3S HV: Correct tlbie usage · 54480501
      Paul Mackerras 提交于
      This corrects the usage of the tlbie (TLB invalidate entry) instruction
      in HV KVM.  The tlbie instruction changed between PPC970 and POWER7.
      On the PPC970, the bit to select large vs. small page is in the instruction,
      not in the RB register value.  This changes the code to use the correct
      form on PPC970.
      
      On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
      field of the RB value incorrectly for 64k pages.  This fixes it.
      
      Since we now have several cases to handle for the tlbie instruction, this
      factors out the code to do a sequence of tlbies into a new function,
      do_tlbies(), and calls that from the various places where the code was
      doing tlbie instructions inline.  It also makes kvmppc_h_bulk_remove()
      use the same global_invalidates() function for determining whether to do
      local or global TLB invalidations as is used in other places, for
      consistency, and also to make sure that kvm->arch.need_tlb_flush gets
      updated properly.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      54480501
  17. 21 6月, 2013 2 次提交
  18. 27 4月, 2013 1 次提交
    • P
      KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changes · a1b4a0f6
      Paul Mackerras 提交于
      At present, the code that determines whether a HPT entry has changed,
      and thus needs to be sent to userspace when it is copying the HPT,
      doesn't consider a hardware update to the reference and change bits
      (R and C) in the HPT entries to constitute a change that needs to
      be sent to userspace.  This adds code to check for changes in R and C
      when we are scanning the HPT to find changed entries, and adds code
      to set the changed flag for the HPTE when we update the R and C bits
      in the guest view of the HPTE.
      
      Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c
      as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification()
      function into kvm_book3s_64.h.
      
      Current Linux guest kernels don't use the hardware updates of R and C
      in the HPT, so this change won't affect them.  Linux (or other) kernels
      might in future want to use the R and C bits and have them correctly
      transferred across when a guest is migrated, so it is better to correct
      this deficiency.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a1b4a0f6
  19. 06 12月, 2012 6 次提交
    • P
      KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations · 1b400ba0
      Paul Mackerras 提交于
      When we change or remove a HPT (hashed page table) entry, we can do
      either a global TLB invalidation (tlbie) that works across the whole
      machine, or a local invalidation (tlbiel) that only affects this core.
      Currently we do local invalidations if the VM has only one vcpu or if
      the guest requests it with the H_LOCAL flag, though the guest Linux
      kernel currently doesn't ever use H_LOCAL.  Then, to cope with the
      possibility that vcpus moving around to different physical cores might
      expose stale TLB entries, there is some code in kvmppc_hv_entry to
      flush the whole TLB of entries for this VM if either this vcpu is now
      running on a different physical core from where it last ran, or if this
      physical core last ran a different vcpu.
      
      There are a number of problems on POWER7 with this as it stands:
      
      - The TLB invalidation is done per thread, whereas it only needs to be
        done per core, since the TLB is shared between the threads.
      - With the possibility of the host paging out guest pages, the use of
        H_LOCAL by an SMP guest is dangerous since the guest could possibly
        retain and use a stale TLB entry pointing to a page that had been
        removed from the guest.
      - The TLB invalidations that we do when a vcpu moves from one physical
        core to another are unnecessary in the case of an SMP guest that isn't
        using H_LOCAL.
      - The optimization of using local invalidations rather than global should
        apply to guests with one virtual core, not just one vcpu.
      
      (None of this applies on PPC970, since there we always have to
      invalidate the whole TLB when entering and leaving the guest, and we
      can't support paging out guest memory.)
      
      To fix these problems and simplify the code, we now maintain a simple
      cpumask of which cpus need to flush the TLB on entry to the guest.
      (This is indexed by cpu, though we only ever use the bits for thread
      0 of each core.)  Whenever we do a local TLB invalidation, we set the
      bits for every cpu except the bit for thread 0 of the core that we're
      currently running on.  Whenever we enter a guest, we test and clear the
      bit for our core, and flush the TLB if it was set.
      
      On initial startup of the VM, and when resetting the HPT, we set all the
      bits in the need_tlb_flush cpumask, since any core could potentially have
      stale TLB entries from the previous VM to use the same LPID, or the
      previous contents of the HPT.
      
      Then, we maintain a count of the number of online virtual cores, and use
      that when deciding whether to use a local invalidation rather than the
      number of online vcpus.  The code to make that decision is extracted out
      into a new function, global_invalidates().  For multi-core guests on
      POWER7 (i.e. when we are using mmu notifiers), we now never do local
      invalidations regardless of the H_LOCAL flag.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1b400ba0
    • P
      KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages · 1cc8ed0b
      Paul Mackerras 提交于
      Currently, if the guest does an H_PROTECT hcall requesting that the
      permissions on a HPT entry be changed to allow writing, we make the
      requested change even if the page is marked read-only in the host
      Linux page tables.  This is a problem since it would for instance
      allow a guest to modify a page that KSM has decided can be shared
      between multiple guests.
      
      To fix this, if the new permissions for the page allow writing, we need
      to look up the memslot for the page, work out the host virtual address,
      and look up the Linux page tables to get the PTE for the page.  If that
      PTE is read-only, we reduce the HPTE permissions to read-only.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1cc8ed0b
    • P
      KVM: PPC: Book3S HV: Make a HPTE removal function available · 6b445ad4
      Paul Mackerras 提交于
      This makes a HPTE removal function, kvmppc_do_h_remove(), available
      outside book3s_hv_rm_mmu.c.  This will be used by the HPT writing
      code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      6b445ad4
    • P
      KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs · 44e5f6be
      Paul Mackerras 提交于
      This uses a bit in our record of the guest view of the HPTE to record
      when the HPTE gets modified.  We use a reserved bit for this, and ensure
      that this bit is always cleared in HPTE values returned to the guest.
      
      The recording of modified HPTEs is only done if other code indicates
      its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.
      The reason for this is that when later commits add facilities for
      userspace to read the HPT, the first pass of reading the HPT will be
      quicker if there are no (or very few) HPTEs marked as modified,
      rather than having most HPTEs marked as modified.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      44e5f6be
    • P
      KVM: PPC: Book3S HV: Fix bug causing loss of page dirty state · 4879f241
      Paul Mackerras 提交于
      This fixes a bug where adding a new guest HPT entry via the H_ENTER
      hcall would lose the "changed" bit in the reverse map information
      for the guest physical page being mapped.  The result was that the
      KVM_GET_DIRTY_LOG could return a zero bit for the page even though
      the page had been modified by the guest.
      
      This fixes it by only modifying the index and present bits in the
      reverse map entry, thus preserving the reference and change bits.
      We were also unnecessarily setting the reference bit, and this
      fixes that too.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4879f241
    • P
      KVM: PPC: Book3S HV: Restructure HPT entry creation code · 7ed661bf
      Paul Mackerras 提交于
      This restructures the code that creates HPT (hashed page table)
      entries so that it can be called in situations where we don't have a
      struct vcpu pointer, only a struct kvm pointer.  It also fixes a bug
      where kvmppc_map_vrma() would corrupt the guest R4 value.
      
      Most of the work of kvmppc_virtmode_h_enter is now done by a new
      function, kvmppc_virtmode_do_h_enter, which itself calls another new
      function, kvmppc_do_h_enter, which contains most of the old
      kvmppc_h_enter.  The new kvmppc_do_h_enter takes explicit arguments
      for the place to return the HPTE index, the Linux page tables to use,
      and whether it is being called in real mode, thus removing the need
      for it to have the vcpu as an argument.
      
      Currently kvmppc_map_vrma creates the VRMA (virtual real mode area)
      HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily
      to handle H_ENTER hcalls from the guest that need to pin a page of
      memory.  Since H_ENTER returns the index of the created HPTE in R4,
      kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4
      in the case when it gets called from kvmppc_map_vrma on the first
      VCPU_RUN ioctl.  With this, kvmppc_map_vrma instead calls
      kvmppc_virtmode_do_h_enter with the address of a dummy word as the
      place to store the HPTE index, thus avoiding corrupting the guest R4.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      7ed661bf
  20. 23 10月, 2012 1 次提交
  21. 06 10月, 2012 2 次提交
    • P
      KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly · dfe49dbd
      Paul Mackerras 提交于
      This adds an implementation of kvm_arch_flush_shadow_memslot for
      Book3S HV, and arranges for kvmppc_core_commit_memory_region to
      flush the dirty log when modifying an existing slot.  With this,
      we can handle deletion and modification of memory slots.
      
      kvm_arch_flush_shadow_memslot calls kvmppc_core_flush_memslot, which
      on Book3S HV now traverses the reverse map chains to remove any HPT
      (hashed page table) entries referring to pages in the memslot.  This
      gets called by generic code whenever deleting a memslot or changing
      the guest physical address for a memslot.
      
      We flush the dirty log in kvmppc_core_commit_memory_region for
      consistency with what x86 does.  We only need to flush when an
      existing memslot is being modified, because for a new memslot the
      rmap array (which stores the dirty bits) is all zero, meaning that
      every page is considered clean already, and when deleting a memslot
      we obviously don't care about the dirty bits any more.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      dfe49dbd
    • P
      KVM: PPC: Move kvm->arch.slot_phys into memslot.arch · a66b48c3
      Paul Mackerras 提交于
      Now that we have an architecture-specific field in the kvm_memory_slot
      structure, we can use it to store the array of page physical addresses
      that we need for Book3S HV KVM on PPC970 processors.  This reduces the
      size of struct kvm_arch for Book3S HV, and also reduces the size of
      struct kvm_arch_memory_slot for other PPC KVM variants since the fields
      in it are now only compiled in for Book3S HV.
      
      This necessitates making the kvm_arch_create_memslot and
      kvm_arch_free_memslot operations specific to each PPC KVM variant.
      That in turn means that we now don't allocate the rmap arrays on
      Book3S PR and Book E.
      
      Since we now unpin pages and free the slot_phys array in
      kvmppc_core_free_memslot, we no longer need to do it in
      kvmppc_core_destroy_vm, since the generic code takes care to free
      all the memslots when destroying a VM.
      
      We now need the new memslot to be passed in to
      kvmppc_core_prepare_memory_region, since we need to initialize its
      arch.slot_phys member on Book3S HV.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a66b48c3
  22. 28 8月, 2012 1 次提交
  23. 06 8月, 2012 1 次提交
  24. 30 5月, 2012 1 次提交
    • P
      KVM: PPC: Book3S HV: Make the guest hash table size configurable · 32fad281
      Paul Mackerras 提交于
      This adds a new ioctl to enable userspace to control the size of the guest
      hashed page table (HPT) and to clear it out when resetting the guest.
      The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter
      a pointer to a u32 containing the desired order of the HPT (log base 2
      of the size in bytes), which is updated on successful return to the
      actual order of the HPT which was allocated.
      
      There must be no vcpus running at the time of this ioctl.  To enforce
      this, we now keep a count of the number of vcpus running in
      kvm->arch.vcpus_running.
      
      If the ioctl is called when a HPT has already been allocated, we don't
      reallocate the HPT but just clear it out.  We first clear the
      kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold
      the kvm->lock mutex, it will prevent any vcpus from starting to run until
      we're done, and (b) it means that the first vcpu to run after we're done
      will re-establish the VRMA if necessary.
      
      If userspace doesn't call this ioctl before running the first vcpu, the
      kernel will allocate a default-sized HPT at that point.  We do it then
      rather than when creating the VM, as the code did previously, so that
      userspace has a chance to do the ioctl if it wants.
      
      When allocating the HPT, we can allocate either from the kernel page
      allocator, or from the preallocated pool.  If userspace is asking for
      a different size from the preallocated HPTs, we first try to allocate
      using the kernel page allocator.  Then we try to allocate from the
      preallocated pool, and then if that fails, we try allocating decreasing
      sizes from the kernel page allocator, down to the minimum size allowed
      (256kB).  Note that the kernel page allocator limits allocations to
      1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to
      16MB (on 64-bit powerpc, at least).
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix module compilation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      32fad281
  25. 16 5月, 2012 1 次提交
  26. 05 3月, 2012 4 次提交
    • P
      KVM: Move gfn_to_memslot() to kvm_host.h · 9d4cba7f
      Paul Mackerras 提交于
      This moves __gfn_to_memslot() and search_memslots() from kvm_main.c to
      kvm_host.h to reduce the code duplication caused by the need for
      non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c to call
      gfn_to_memslot() in real mode.
      
      Rather than putting gfn_to_memslot() itself in a header, which would
      lead to increased code size, this puts __gfn_to_memslot() in a header.
      Then, the non-modular uses of gfn_to_memslot() are changed to call
      __gfn_to_memslot() instead.  This way there is only one place in the
      source code that needs to be changed should the gfn_to_memslot()
      implementation need to be modified.
      
      On powerpc, the Book3S HV style of KVM has code that is called from
      real mode which needs to call gfn_to_memslot() and thus needs this.
      (Module code is allocated in the vmalloc region, which can't be
      accessed in real mode.)
      
      With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      9d4cba7f
    • P
      KVM: PPC: Book3S HV: Use the hardware referenced bit for kvm_age_hva · 55514893
      Paul Mackerras 提交于
      This uses the host view of the hardware R (referenced) bit to speed
      up kvm_age_hva() and kvm_test_age_hva().  Instead of removing all
      the relevant HPTEs in kvm_age_hva(), we now just reset their R bits
      if set.  Also, kvm_test_age_hva() now scans the relevant HPTEs to
      see if any of them have R set.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      55514893
    • P
      KVM: PPC: Book3s HV: Maintain separate guest and host views of R and C bits · bad3b507
      Paul Mackerras 提交于
      This allows both the guest and the host to use the referenced (R) and
      changed (C) bits in the guest hashed page table.  The guest has a view
      of R and C that is maintained in the guest_rpte field of the revmap
      entry for the HPTE, and the host has a view that is maintained in the
      rmap entry for the associated gfn.
      
      Both view are updated from the guest HPT.  If a bit (R or C) is zero
      in either view, it will be initially set to zero in the HPTE (or HPTEs),
      until set to 1 by hardware.  When an HPTE is removed for any reason,
      the R and C bits from the HPTE are ORed into both views.  We have to
      be careful to read the R and C bits from the HPTE after invalidating
      it, but before unlocking it, in case of any late updates by the hardware.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      bad3b507
    • P
      KVM: PPC: Book3S HV: Keep HPTE locked when invalidating · a92bce95
      Paul Mackerras 提交于
      This reworks the implementations of the H_REMOVE and H_BULK_REMOVE
      hcalls to make sure that we keep the HPTE locked and in the reverse-
      mapping chain until we have finished invalidating it.  Previously
      we would remove it from the chain and unlock it before invalidating
      it, leaving a tiny window when the guest could access the page even
      though we believe we have removed it from the guest (e.g.,
      kvm_unmap_hva() has been called for the page and has found no HPTEs
      in the chain).  In addition, we'll need this for future patches where
      we will need to read the R and C bits in the HPTE after invalidating
      it.
      
      Doing this required restructuring kvmppc_h_bulk_remove() substantially.
      Since we want to batch up the tlbies, we now need to keep several
      HPTEs locked simultaneously.  In order to avoid possible deadlocks,
      we don't spin on the HPTE bitlock for any except the first HPTE in
      a batch.  If we can't acquire the HPTE bitlock for the second or
      subsequent HPTE, we terminate the batch at that point, do the tlbies
      that we have accumulated so far, unlock those HPTEs, and then start
      a new batch to do the remaining invalidations.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      a92bce95