1. 31 1月, 2017 3 次提交
    • D
      KVM: PPC: Book3S HV: Gather HPT related variables into sub-structure · 3f9d4f5a
      David Gibson 提交于
      Currently, the powerpc kvm_arch structure contains a number of variables
      tracking the state of the guest's hashed page table (HPT) in KVM HV.  This
      patch gathers them all together into a single kvm_hpt_info substructure.
      This makes life more convenient for the upcoming HPT resizing
      implementation.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      3f9d4f5a
    • P
      KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movement · a29ebeaf
      Paul Mackerras 提交于
      With radix, the guest can do TLB invalidations itself using the tlbie
      (global) and tlbiel (local) TLB invalidation instructions.  Linux guests
      use local TLB invalidations for translations that have only ever been
      accessed on one vcpu.  However, that doesn't mean that the translations
      have only been accessed on one physical cpu (pcpu) since vcpus can move
      around from one pcpu to another.  Thus a tlbiel might leave behind stale
      TLB entries on a pcpu where the vcpu previously ran, and if that task
      then moves back to that previous pcpu, it could see those stale TLB
      entries and thus access memory incorrectly.  The usual symptom of this
      is random segfaults in userspace programs in the guest.
      
      To cope with this, we detect when a vcpu is about to start executing on
      a thread in a core that is a different core from the last time it
      executed.  If that is the case, then we mark the core as needing a
      TLB flush and then send an interrupt to any thread in the core that is
      currently running a vcpu from the same guest.  This will get those vcpus
      out of the guest, and the first one to re-enter the guest will do the
      TLB flush.  The reason for interrupting the vcpus executing on the old
      core is to cope with the following scenario:
      
      	CPU 0			CPU 1			CPU 4
      	(core 0)			(core 0)			(core 1)
      
      	VCPU 0 runs task X      VCPU 1 runs
      	core 0 TLB gets
      	entries from task X
      	VCPU 0 moves to CPU 4
      							VCPU 0 runs task X
      							Unmap pages of task X
      							tlbiel
      
      				(still VCPU 1)			task X moves to VCPU 1
      				task X runs
      				task X sees stale TLB
      				entries
      
      That is, as soon as the VCPU starts executing on the new core, it
      could unmap and tlbiel some page table entries, and then the task
      could migrate to one of the VCPUs running on the old core and
      potentially see stale TLB entries.
      
      Since the TLB is shared between all the threads in a core, we only
      use the bit of kvm->arch.need_tlb_flush corresponding to the first
      thread in the core.  To ensure that we don't have a window where we
      can miss a flush, this moves the clearing of the bit from before the
      actual flush to after it.  This way, two threads might both do the
      flush, but we prevent the situation where one thread can enter the
      guest before the flush is finished.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a29ebeaf
    • P
      KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix mode · 65dae540
      Paul Mackerras 提交于
      If the guest is in radix mode, then it doesn't have a hashed page
      table (HPT), so all of the hypercalls that manipulate the HPT can't
      work and should return an error.  This adds checks to make them
      return H_FUNCTION ("function not supported").
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      65dae540
  2. 01 12月, 2016 1 次提交
  3. 24 11月, 2016 2 次提交
    • P
      KVM: PPC: Book3S HV: Adapt TLB invalidations to work on POWER9 · 7c5b06ca
      Paul Mackerras 提交于
      POWER9 adds new capabilities to the tlbie (TLB invalidate entry)
      and tlbiel (local tlbie) instructions.  Both instructions get a
      set of new parameters (RIC, PRS and R) which appear as bits in the
      instruction word.  The tlbiel instruction now has a second register
      operand, which contains a PID and/or LPID value if needed, and
      should otherwise contain 0.
      
      This adapts KVM-HV's usage of tlbie and tlbiel to work on POWER9
      as well as older processors.  Since we only handle HPT guests so
      far, we need RIC=0 PRS=0 R=0, which ends up with the same instruction
      word as on previous processors, so we don't need to conditionally
      execute different instructions depending on the processor.
      
      The local flush on first entry to a guest in book3s_hv_rmhandlers.S
      is a loop which depends on the number of TLB sets.  Rather than
      using feature sections to set the number of iterations based on
      which CPU we're on, we now work out this number at VM creation time
      and store it in the kvm_arch struct.  That will make it possible to
      get the number from the device tree in future, which will help with
      compatibility with future processors.
      
      Since mmu_partition_table_set_entry() does a global flush of the
      whole LPID, we don't need to do the TLB flush on first entry to the
      guest on each processor.  Therefore we don't set all bits in the
      tlb_need_flush bitmap on VM startup on POWER9.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7c5b06ca
    • P
      KVM: PPC: Book3S HV: Adapt to new HPTE format on POWER9 · abb7c7dd
      Paul Mackerras 提交于
      This adapts the KVM-HV hashed page table (HPT) code to read and write
      HPT entries in the new format defined in Power ISA v3.00 on POWER9
      machines.  The new format moves the B (segment size) field from the
      first doubleword to the second, and trims some bits from the AVA
      (abbreviated virtual address) and ARPN (abbreviated real page number)
      fields.  As far as possible, the conversion is done when reading or
      writing the HPT entries, and the rest of the code continues to use
      the old format.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      abb7c7dd
  4. 21 11月, 2016 4 次提交
    • P
      KVM: PPC: Book3S HV: Don't lose hardware R/C bit updates in H_PROTECT · f064a0de
      Paul Mackerras 提交于
      The hashed page table MMU in POWER processors can update the R
      (reference) and C (change) bits in a HPTE at any time until the
      HPTE has been invalidated and the TLB invalidation sequence has
      completed.  In kvmppc_h_protect, which implements the H_PROTECT
      hypercall, we read the HPTE, modify the second doubleword,
      invalidate the HPTE in memory, do the TLB invalidation sequence,
      and then write the modified value of the second doubleword back
      to memory.  In doing so we could overwrite an R/C bit update done
      by hardware between when we read the HPTE and when the TLB
      invalidation completed.  To fix this we re-read the second
      doubleword after the TLB invalidation and OR in the (possibly)
      new values of R and C.  We can use an OR since hardware only ever
      sets R and C, never clears them.
      
      This race was found by code inspection.  In principle this bug could
      cause occasional guest memory corruption under host memory pressure.
      
      Fixes: a8606e20 ("KVM: PPC: Handle some PAPR hcalls in the kernel", 2011-06-29)
      Cc: stable@vger.kernel.org # v3.19+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      f064a0de
    • Y
      KVM: PPC: Book3S HV: Add a per vcpu cache for recently page faulted MMIO entries · a56ee9f8
      Yongji Xie 提交于
      This keeps a per vcpu cache for recently page faulted MMIO entries.
      On a page fault, if the entry exists in the cache, we can avoid some
      time-consuming paths, for example, looking up HPT, locking HPTE twice
      and searching mmio gfn from memslots, then directly call
      kvmppc_hv_emulate_mmio().
      
      In current implenment, we limit the size of cache to four. We think
      it's enough to cover the high-frequency MMIO HPTEs in most case.
      For example, considering the case of using virtio device, for virtio
      legacy devices, one HPTE could handle notifications from up to
      1024 (64K page / 64 byte Port IO register) devices, so one cache entry
      is enough; for virtio modern devices, we always need one HPTE to handle
      notification for each device because modern device would use a 8M MMIO
      register to notify host instead of Port IO register, typically the
      system's configuration should not exceed four virtio devices per
      vcpu, four cache entry is also enough in this case. Of course, if needed,
      we could also modify the macro to a module parameter in the future.
      Signed-off-by: NYongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      a56ee9f8
    • Y
      KVM: PPC: Book3S HV: Clear the key field of HPTE when the page is paged out · f0585982
      Yongji Xie 提交于
      Currently we mark a HPTE for emulated MMIO with HPTE_V_ABSENT bit
      set as well as key 0x1f. However, those HPTEs may be conflicted with
      the HPTE for real guest RAM page HPTE with key 0x1f when the page
      get paged out.
      
      This patch clears the key field of HPTE when the page is paged out,
      then recover it when HPTE is re-established.
      Signed-off-by: NYongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      f0585982
    • D
      KVM: PPC: Book3S HV: sparse: prototypes for functions called from assembler · ebe4535f
      Daniel Axtens 提交于
      A bunch of KVM functions are only called from assembler.
      Give them prototypes in asm-prototypes.h
      This reduces sparse warnings.
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ebe4535f
  5. 01 5月, 2016 1 次提交
    • A
      powerpc/mm: Drop WIMG in favour of new constants · 30bda41a
      Aneesh Kumar K.V 提交于
      PowerISA 3.0 introduces two pte bits with the below meaning for radix:
        00 -> Normal Memory
        01 -> Strong Access Order (SAO)
        10 -> Non idempotent I/O (Cache inhibited and guarded)
        11 -> Tolerant I/O (Cache inhibited)
      
      We drop the existing WIMG bits in the Linux page table in favour of the
      above constants. We loose _PAGE_WRITETHRU with this conversion. We only
      use writethru via pgprot_cached_wthru() which is used by
      fbdev/controlfb.c which is Apple control display and also PPC32.
      
      With respect to _PAGE_COHERENCE, we have been marking hpte always
      coherent for some time now. htab_convert_pte_flags() always added
      HPTE_R_M.
      
      NOTE: KVM changes need closer review.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      30bda41a
  6. 03 3月, 2016 1 次提交
  7. 21 10月, 2015 1 次提交
  8. 12 10月, 2015 1 次提交
    • A
      powerpc/mm: Differentiate between hugetlb and THP during page walk · 891121e6
      Aneesh Kumar K.V 提交于
      We need to properly identify whether a hugepage is an explicit or
      a transparent hugepage in follow_huge_addr(). We used to depend
      on hugepage shift argument to do that. But in some case that can
      result in wrong results. For ex:
      
      On finding a transparent hugepage we set hugepage shift to PMD_SHIFT.
      But we can end up clearing the thp pte, via pmdp_huge_get_and_clear.
      We do prevent reusing the pfn page via the usage of
      kick_all_cpus_sync(). But that happens after we updated the pte to 0.
      Hence in follow_huge_addr() we can find hugepage shift set, but transparent
      huge page check fail for a thp pte.
      
      NOTE: We fixed a variant of this race against thp split in commit
      691e95fd
      ("powerpc/mm/thp: Make page table walk safe against thp split/collapse")
      
      Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
      follow_page_mask occasionally.
      
      In the long term, we may want to switch ppc64 64k page size config to
      enable CONFIG_ARCH_WANT_GENERAL_HUGETLB
      Reported-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      891121e6
  9. 22 8月, 2015 3 次提交
    • P
      KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD · cdeee518
      Paul Mackerras 提交于
      This adds implementations for the H_CLEAR_REF (test and clear reference
      bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.
      
      When clearing the reference or change bit in the guest view of the HPTE,
      we also have to clear it in the real HPTE so that we can detect future
      references or changes.  When we do so, we transfer the R or C bit value
      to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
      kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
      has been referenced and/or changed.
      
      These hypercalls are not used by Linux guests.  These implementations
      have been tested using a FreeBSD guest.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      cdeee518
    • P
      KVM: PPC: Book3S HV: Fix bug in dirty page tracking · 08fe1e7b
      Paul Mackerras 提交于
      This fixes a bug in the tracking of pages that get modified by the
      guest.  If the guest creates a large-page HPTE, writes to memory
      somewhere within the large page, and then removes the HPTE, we only
      record the modified state for the first normal page within the large
      page, when in fact the guest might have modified some other normal
      page within the large page.
      
      To fix this we use some unused bits in the rmap entry to record the
      order (log base 2) of the size of the page that was modified, when
      removing an HPTE.  Then in kvm_test_clear_dirty_npages() we use that
      order to return the correct number of modified pages.
      
      The same thing could in principle happen when removing a HPTE at the
      host's request, i.e. when paging out a page, except that we never
      page out large pages, and the guest can only create large-page HPTEs
      if the guest RAM is backed by large pages.  However, we also fix
      this case for the sake of future-proofing.
      
      The reference bit is also subject to the same loss of information.  We
      don't make the same fix here for the reference bit because there isn't
      an interface for userspace to find out which pages the guest has
      referenced, whereas there is one for userspace to find out which pages
      the guest has modified.  Because of this loss of information, the
      kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly
      say that a page has not been referenced when it has, but that doesn't
      matter greatly because we never page or swap out large pages.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      08fe1e7b
    • P
      KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE · 1e5bf454
      Paul Mackerras 提交于
      The reference (R) and change (C) bits in a HPT entry can be set by
      hardware at any time up until the HPTE is invalidated and the TLB
      invalidation sequence has completed.  This means that when removing
      a HPTE, we need to read the HPTE after the invalidation sequence has
      completed in order to obtain reliable values of R and C.  The code
      in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd32
      ("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the
      read after invalidation as a side effect of other changes.  This
      restores the read of the HPTE after invalidation.
      
      The user-visible effect of this bug would be that when migrating a
      guest, there is a small probability that a page modified by the guest
      and then unmapped by the guest might not get re-transmitted and thus
      the destination might end up with a stale copy of the page.
      
      Fixes: 6f22bd32Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1e5bf454
  10. 21 4月, 2015 1 次提交
  11. 17 4月, 2015 3 次提交
  12. 13 2月, 2015 1 次提交
  13. 17 12月, 2014 1 次提交
    • P
      KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf
      Paul Mackerras 提交于
      This removes the code that was added to enable HV KVM to work
      on PPC970 processors.  The PPC970 is an old CPU that doesn't
      support virtualizing guest memory.  Removing PPC970 support also
      lets us remove the code for allocating and managing contiguous
      real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
      case, the code for pinning pages of guest memory when first
      accessed and keeping track of which pages have been pinned, and
      the code for handling H_ENTER hypercalls in virtual mode.
      
      Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
      The KVM_CAP_PPC_RMA capability now always returns 0.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c17b98cf
  14. 15 12月, 2014 1 次提交
    • P
      KVM: PPC: Book3S HV: Fix KSM memory corruption · b4a83900
      Paul Mackerras 提交于
      Testing with KSM active in the host showed occasional corruption of
      guest memory.  Typically a page that should have contained zeroes
      would contain values that look like the contents of a user process
      stack (values such as 0x0000_3fff_xxxx_xxx).
      
      Code inspection in kvmppc_h_protect revealed that there was a race
      condition with the possibility of granting write access to a page
      which is read-only in the host page tables.  The code attempts to keep
      the host mapping read-only if the host userspace PTE is read-only, but
      if that PTE had been temporarily made invalid for any reason, the
      read-only check would not trigger and the host HPTE could end up
      read-write.  Examination of the guest HPT in the failure situation
      revealed that there were indeed shared pages which should have been
      read-only that were mapped read-write.
      
      To close this race, we don't let a page go from being read-only to
      being read-write, as far as the real HPTE mapping the page is
      concerned (the guest view can go to read-write, but the actual mapping
      stays read-only).  When the guest tries to write to the page, we take
      an HDSI and let kvmppc_book3s_hv_page_fault take care of providing a
      writable HPTE for the page.
      
      This eliminates the occasional corruption of shared pages
      that was previously seen with KSM active.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b4a83900
  15. 28 7月, 2014 1 次提交
  16. 25 6月, 2014 1 次提交
  17. 30 5月, 2014 1 次提交
    • P
      KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates() · 55765483
      Paul Mackerras 提交于
      The global_invalidates() function contains a check that is intended
      to tell whether we are currently executing in the context of a hypercall
      issued by the guest.  The reason is that the optimization of using a
      local TLB invalidate instruction is only valid in that context.  The
      check was testing local_paca->kvm_hstate.kvm_vcore, which gets set
      when entering the guest but no longer gets cleared when exiting the
      guest.  To fix this, we use the kvm_vcpu field instead, which does
      get cleared when exiting the guest, by the kvmppc_release_hwthread()
      calls inside kvmppc_run_core().
      
      The effect of having the check wrong was that when kvmppc_do_h_remove()
      got called from htab_write() on the destination machine during a
      migration, it cleared the current cpu's bit in kvm->arch.need_tlb_flush.
      This meant that when the guest started running in the destination VM,
      it may miss out on doing a complete TLB flush, and therefore may end
      up using stale TLB entries from a previous guest that used the same
      LPID value.
      
      This should make migration more reliable.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      55765483
  18. 28 4月, 2014 1 次提交
    • P
      KVM: PPC: Book3S: HV: make _PAGE_NUMA take effect · 1ad9f238
      pingfank@linux.vnet.ibm.com 提交于
      Numa fault is a method which help to achieve auto numa balancing.
      When such a page fault takes place, the page fault handler will check
      whether the page is placed correctly. If not, migration should be
      involved to cut down the distance between the cpu and pages.
      
      A pte with _PAGE_NUMA help to implement numa fault. It means not to
      allow the MMU to access the page directly. So a page fault is triggered
      and numa fault handler gets the opportunity to run checker.
      
      As for the access of MMU, we need special handling for the powernv's guest.
      When we mark a pte with _PAGE_NUMA, we already call mmu_notifier to
      invalidate it in guest's htab, but when we tried to re-insert them,
      we firstly try to map it in real-mode. Only after this fails, we fallback
      to virt mode, and most of important, we run numa fault handler in virt
      mode.  This patch guards the way of real-mode to ensure that if a pte is
      marked with _PAGE_NUMA, it will NOT be mapped in real mode, instead, it will
      be mapped in virt mode and have the opportunity to be checked with placement.
      Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1ad9f238
  19. 29 3月, 2014 1 次提交
    • P
      KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode · 797f9c07
      Paul Mackerras 提交于
      With HV KVM, some high-frequency hypercalls such as H_ENTER are handled
      in real mode, and need to access the memslots array for the guest.
      Accessing the memslots array is safe, because we hold the SRCU read
      lock for the whole time that a guest vcpu is running.  However, the
      checks that kvm_memslots() does when lockdep is enabled are potentially
      unsafe in real mode, when only the linear mapping is available.
      Furthermore, kvm_memslots() can be called from a secondary CPU thread,
      which is an offline CPU from the point of view of the host kernel,
      and is not running the task which holds the SRCU read lock.
      
      To avoid false positives in the checks in kvm_memslots(), and to avoid
      possible side effects from doing the checks in real mode, this replaces
      kvm_memslots() with kvm_memslots_raw() in all the places that execute
      in real mode.  kvm_memslots_raw() is a new function that is like
      kvm_memslots() but uses rcu_dereference_raw_notrace() instead of
      kvm_dereference_check().
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NScott Wood <scottwood@freescale.com>
      797f9c07
  20. 09 1月, 2014 1 次提交
  21. 18 12月, 2013 1 次提交
  22. 19 11月, 2013 2 次提交
    • P
      powerpc: kvm: fix rare but potential deadlock scene · 91648ec0
      pingfan liu 提交于
      Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
      realmode, so it can trigger the deadlock.
      
      Suppose the following scene:
      
      Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of
      vcpus.
      
      If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched
      out, and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be
      caught in realmode for a long time.
      
      What makes things even worse if the following happens,
        On cpuM, bitlockX is hold, on cpuN, Y is hold.
        vcpu_B_2 try to lock Y on cpuM in realmode
        vcpu_A_2 try to lock X on cpuN in realmode
      
      Oops! deadlock happens
      Signed-off-by: NLiu Ping Fan <pingfank@linux.vnet.ibm.com>
      Reviewed-by: NPaul Mackerras <paulus@samba.org>
      CC: stable@vger.kernel.org
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      91648ec0
    • P
      KVM: PPC: Book3S HV: Fix physical address calculations · caaa4c80
      Paul Mackerras 提交于
      This fixes a bug in kvmppc_do_h_enter() where the physical address
      for a page can be calculated incorrectly if transparent huge pages
      (THP) are active.  Until THP came along, it was true that if we
      encountered a large (16M) page in kvmppc_do_h_enter(), then the
      associated memslot must be 16M aligned for both its guest physical
      address and the userspace address, and the physical address
      calculations in kvmppc_do_h_enter() assumed that.  With THP, that
      is no longer true.
      
      In the case where we are using MMU notifiers and the page size that
      we get from the Linux page tables is larger than the page being mapped
      by the guest, we need to fill in some low-order bits of the physical
      address.  Without THP, these bits would be the same in the guest
      physical address (gpa) and the host virtual address (hva).  With THP,
      they can be different, and we need to use the bits from hva rather
      than gpa.
      
      In the case where we are not using MMU notifiers, the host physical
      address we get from the memslot->arch.slot_phys[] array already
      includes the low-order bits down to the PAGE_SIZE level, even if
      we are using large pages.  Thus we can simplify the calculation in
      this case to just add in the remaining bits in the case where
      PAGE_SIZE is 64k and the guest is mapping a 4k page.
      
      The same bug exists in kvmppc_book3s_hv_page_fault().  The basic fix
      is to use psize (the page size from the HPTE) rather than pte_size
      (the page size from the Linux PTE) when updating the HPTE low word
      in r.  That means that pfn needs to be computed to PAGE_SIZE
      granularity even if the Linux PTE is a huge page PTE.  That can be
      arranged simply by doing the page_to_pfn() before setting page to
      the head of the compound page.  If psize is less than PAGE_SIZE,
      then we need to make sure we only update the bits from PAGE_SIZE
      upwards, in order not to lose any sub-page offset bits in r.
      On the other hand, if psize is greater than PAGE_SIZE, we need to
      make sure we don't bring in non-zero low order bits in pfn, hence
      we mask (pfn << PAGE_SHIFT) with ~(psize - 1).
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      caaa4c80
  23. 14 8月, 2013 1 次提交
  24. 10 7月, 2013 1 次提交
    • P
      KVM: PPC: Book3S HV: Correct tlbie usage · 54480501
      Paul Mackerras 提交于
      This corrects the usage of the tlbie (TLB invalidate entry) instruction
      in HV KVM.  The tlbie instruction changed between PPC970 and POWER7.
      On the PPC970, the bit to select large vs. small page is in the instruction,
      not in the RB register value.  This changes the code to use the correct
      form on PPC970.
      
      On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
      field of the RB value incorrectly for 64k pages.  This fixes it.
      
      Since we now have several cases to handle for the tlbie instruction, this
      factors out the code to do a sequence of tlbies into a new function,
      do_tlbies(), and calls that from the various places where the code was
      doing tlbie instructions inline.  It also makes kvmppc_h_bulk_remove()
      use the same global_invalidates() function for determining whether to do
      local or global TLB invalidations as is used in other places, for
      consistency, and also to make sure that kvm->arch.need_tlb_flush gets
      updated properly.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      54480501
  25. 21 6月, 2013 2 次提交
  26. 27 4月, 2013 1 次提交
    • P
      KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changes · a1b4a0f6
      Paul Mackerras 提交于
      At present, the code that determines whether a HPT entry has changed,
      and thus needs to be sent to userspace when it is copying the HPT,
      doesn't consider a hardware update to the reference and change bits
      (R and C) in the HPT entries to constitute a change that needs to
      be sent to userspace.  This adds code to check for changes in R and C
      when we are scanning the HPT to find changed entries, and adds code
      to set the changed flag for the HPTE when we update the R and C bits
      in the guest view of the HPTE.
      
      Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c
      as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification()
      function into kvm_book3s_64.h.
      
      Current Linux guest kernels don't use the hardware updates of R and C
      in the HPT, so this change won't affect them.  Linux (or other) kernels
      might in future want to use the R and C bits and have them correctly
      transferred across when a guest is migrated, so it is better to correct
      this deficiency.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a1b4a0f6
  27. 06 12月, 2012 2 次提交
    • P
      KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations · 1b400ba0
      Paul Mackerras 提交于
      When we change or remove a HPT (hashed page table) entry, we can do
      either a global TLB invalidation (tlbie) that works across the whole
      machine, or a local invalidation (tlbiel) that only affects this core.
      Currently we do local invalidations if the VM has only one vcpu or if
      the guest requests it with the H_LOCAL flag, though the guest Linux
      kernel currently doesn't ever use H_LOCAL.  Then, to cope with the
      possibility that vcpus moving around to different physical cores might
      expose stale TLB entries, there is some code in kvmppc_hv_entry to
      flush the whole TLB of entries for this VM if either this vcpu is now
      running on a different physical core from where it last ran, or if this
      physical core last ran a different vcpu.
      
      There are a number of problems on POWER7 with this as it stands:
      
      - The TLB invalidation is done per thread, whereas it only needs to be
        done per core, since the TLB is shared between the threads.
      - With the possibility of the host paging out guest pages, the use of
        H_LOCAL by an SMP guest is dangerous since the guest could possibly
        retain and use a stale TLB entry pointing to a page that had been
        removed from the guest.
      - The TLB invalidations that we do when a vcpu moves from one physical
        core to another are unnecessary in the case of an SMP guest that isn't
        using H_LOCAL.
      - The optimization of using local invalidations rather than global should
        apply to guests with one virtual core, not just one vcpu.
      
      (None of this applies on PPC970, since there we always have to
      invalidate the whole TLB when entering and leaving the guest, and we
      can't support paging out guest memory.)
      
      To fix these problems and simplify the code, we now maintain a simple
      cpumask of which cpus need to flush the TLB on entry to the guest.
      (This is indexed by cpu, though we only ever use the bits for thread
      0 of each core.)  Whenever we do a local TLB invalidation, we set the
      bits for every cpu except the bit for thread 0 of the core that we're
      currently running on.  Whenever we enter a guest, we test and clear the
      bit for our core, and flush the TLB if it was set.
      
      On initial startup of the VM, and when resetting the HPT, we set all the
      bits in the need_tlb_flush cpumask, since any core could potentially have
      stale TLB entries from the previous VM to use the same LPID, or the
      previous contents of the HPT.
      
      Then, we maintain a count of the number of online virtual cores, and use
      that when deciding whether to use a local invalidation rather than the
      number of online vcpus.  The code to make that decision is extracted out
      into a new function, global_invalidates().  For multi-core guests on
      POWER7 (i.e. when we are using mmu notifiers), we now never do local
      invalidations regardless of the H_LOCAL flag.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1b400ba0
    • P
      KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages · 1cc8ed0b
      Paul Mackerras 提交于
      Currently, if the guest does an H_PROTECT hcall requesting that the
      permissions on a HPT entry be changed to allow writing, we make the
      requested change even if the page is marked read-only in the host
      Linux page tables.  This is a problem since it would for instance
      allow a guest to modify a page that KSM has decided can be shared
      between multiple guests.
      
      To fix this, if the new permissions for the page allow writing, we need
      to look up the memslot for the page, work out the host virtual address,
      and look up the Linux page tables to get the PTE for the page.  If that
      PTE is read-only, we reduce the HPTE permissions to read-only.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1cc8ed0b