1. 09 5月, 2017 1 次提交
  2. 07 4月, 2017 1 次提交
    • P
      kvm: nVMX: support EPT accessed/dirty bits · ae1e2d10
      Paolo Bonzini 提交于
      Now use bit 6 of EPTP to optionally enable A/D bits for EPTP.  Another
      thing to change is that, when EPT accessed and dirty bits are not in use,
      VMX treats accesses to guest paging structures as data reads.  When they
      are in use (bit 6 of EPTP is set), they are treated as writes and the
      corresponding EPT dirty bit is set.  The MMU didn't know this detail,
      so this patch adds it.
      
      We also have to fix up the exit qualification.  It may be wrong because
      KVM sets bit 6 but the guest might not.
      
      L1 emulates EPT A/D bits using write permissions, so in principle it may
      be possible for EPT A/D bits to be used by L1 even though not available
      in hardware.  The problem is that guest page-table walks will be treated
      as reads rather than writes, so they would not cause an EPT violation.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      [Fixed typo in walk_addr_generic() comment and changed bit clear +
       conditional-set pattern in handle_ept_violation() to conditional-clear]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      ae1e2d10
  3. 02 3月, 2017 1 次提交
  4. 28 2月, 2017 1 次提交
  5. 27 1月, 2017 4 次提交
  6. 09 1月, 2017 7 次提交
  7. 25 11月, 2016 1 次提交
    • T
      kvm: svm: Add support for additional SVM NPF error codes · 14727754
      Tom Lendacky 提交于
      AMD hardware adds two additional bits to aid in nested page fault handling.
      
      Bit 32 - NPF occurred while translating the guest's final physical address
      Bit 33 - NPF occurred while translating the guest page tables
      
      The guest page tables fault indicator can be used as an aid for nested
      virtualization. Using V0 for the host, V1 for the first level guest and
      V2 for the second level guest, when both V1 and V2 are using nested paging
      there are currently a number of unnecessary instruction emulations. When
      V2 is launched shadow paging is used in V1 for the nested tables of V2. As
      a result, KVM marks these pages as RO in the host nested page tables. When
      V2 exits and we resume V1, these pages are still marked RO.
      
      Every nested walk for a guest page table is treated as a user-level write
      access and this causes a lot of NPFs because the V1 page tables are marked
      RO in the V0 nested tables. While executing V1, when these NPFs occur KVM
      sees a write to a read-only page, emulates the V1 instruction and unprotects
      the page (marking it RW). This patch looks for cases where we get a NPF due
      to a guest page table walk where the page was marked RO. It immediately
      unprotects the page and resumes the guest, leading to far fewer instruction
      emulations when nested virtualization is used.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      14727754
  8. 23 11月, 2016 1 次提交
  9. 04 11月, 2016 2 次提交
  10. 03 11月, 2016 2 次提交
  11. 19 8月, 2016 1 次提交
  12. 14 7月, 2016 5 次提交
    • P
      x86/kvm: Audit and remove any unnecessary uses of module.h · 1767e931
      Paul Gortmaker 提交于
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  In the case of
      kvm where it is modular, we can extend that to also include files
      that are building basic support functionality but not related
      to loading or registering the final module; such files also have
      no need whatsoever for module.h
      
      The advantage in removing such instances is that module.h itself
      sources about 15 other headers; adding significantly to what we feed
      cpp, and it can obscure what headers we are effectively using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each instance for the
      presence of either and replace as needed.
      
      Several instances got replaced with moduleparam.h since that was
      really all that was required for those particular files.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160714001901.31603-8-paul.gortmaker@windriver.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1767e931
    • B
      kvm: mmu: track read permission explicitly for shadow EPT page tables · d95c5568
      Bandan Das 提交于
      To support execute only mappings on behalf of L1 hypervisors,
      reuse ACC_USER_MASK to signify if the L1 hypervisor has the R bit
      set.
      
      For the nested EPT case, we assumed that the U bit was always set
      since there was no equivalent in EPT page tables.  Strictly
      speaking, this was not necessary because handle_ept_violation
      never set PFERR_USER_MASK in the error code (uf=0 in the
      parlance of update_permission_bitmask).  We now have to set
      both U and UF correctly, respectively in FNAME(gpte_access)
      and in handle_ept_violation.
      
      Also in handle_ept_violation bit 3 of the exit qualification is
      not enough to detect a present PTE; all three bits 3-5 have to
      be checked.
      Signed-off-by: NBandan Das <bsd@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d95c5568
    • B
      kvm: mmu: don't set the present bit unconditionally · ffb128c8
      Bandan Das 提交于
      To support execute only mappings on behalf of L1
      hypervisors, we need to teach set_spte() to honor all three of
      L1's XWR bits.  As a start, add a new variable "shadow_present_mask"
      that will be set for non-EPT shadow paging and clear for EPT.
      Signed-off-by: NBandan Das <bsd@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ffb128c8
    • B
      kvm: mmu: remove is_present_gpte() · 812f30b2
      Bandan Das 提交于
      We have two versions of the above function.
      To prevent confusion and bugs in the future, remove
      the non-FNAME version entirely and replace all calls
      with the actual check.
      Signed-off-by: NBandan Das <bsd@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      812f30b2
    • B
      kvm: mmu: extend the is_present check to 32 bits · 8d5cf161
      Bandan Das 提交于
      This is safe because this function is called
      on host controlled page table and non-present/non-MMIO
      sptes never use bits 1..31. For the EPT case, this
      ensures that cases where only the execute bit is set
      is marked valid.
      Signed-off-by: NBandan Das <bsd@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8d5cf161
  13. 14 6月, 2016 1 次提交
  14. 02 6月, 2016 1 次提交
    • N
      KVM: x86: avoid write-tearing of TDP · b19ee2ff
      Nadav Amit 提交于
      In theory, nothing prevents the compiler from write-tearing PTEs, or
      split PTE writes. These partially-modified PTEs can be fetched by other
      cores and cause mayhem. I have not really encountered such case in
      real-life, but it does seem possible.
      
      For example, the compiler may try to do something creative for
      kvm_set_pte_rmapp() and perform multiple writes to the PTE.
      Signed-off-by: NNadav Amit <nadav.amit@gmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      b19ee2ff
  15. 06 5月, 2016 1 次提交
    • A
      mm: thp: kvm: fix memory corruption in KVM with THP enabled · 127393fb
      Andrea Arcangeli 提交于
      After the THP refcounting change, obtaining a compound pages from
      get_user_pages() no longer allows us to assume the entire compound page
      is immediately mappable from a secondary MMU.
      
      A secondary MMU doesn't want to call get_user_pages() more than once for
      each compound page, in order to know if it can map the whole compound
      page.  So a secondary MMU needs to know from a single get_user_pages()
      invocation when it can map immediately the entire compound page to avoid
      a flood of unnecessary secondary MMU faults and spurious
      atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
      users).
      
      Ideally instead of the page->_mapcount < 1 check, get_user_pages()
      should return the granularity of the "page" mapping in the "mm" passed
      to get_user_pages().  However it's non trivial change to pass the "pmd"
      status belonging to the "mm" walked by get_user_pages up the stack (up
      to the caller of get_user_pages).  So the fix just checks if there is
      not a single pte mapping on the page returned by get_user_pages, and in
      turn if the caller can assume that the whole compound page is mapped in
      the current "mm" (in a pmd_trans_huge()).  In such case the entire
      compound page is safe to map into the secondary MMU without additional
      get_user_pages() calls on the surrounding tail/head pages.  In addition
      of being faster, not having to run other get_user_pages() calls also
      reduces the memory footprint of the secondary MMU fault in case the pmd
      split happened as result of memory pressure.
      
      Without this fix after a MADV_DONTNEED (like invoked by QEMU during
      postcopy live migration or balloning) or after generic swapping (with a
      failure in split_huge_page() that would only result in pmd splitting and
      not a physical page split), KVM would map the whole compound page into
      the shadow pagetables, despite regular faults or userfaults (like
      UFFDIO_COPY) may map regular pages into the primary MMU as result of the
      pte faults, leading to the guest mode and userland mode going out of
      sync and not working on the same memory at all times.
      
      Any other secondary MMU notifier manager (KVM is just one of the many
      MMU notifier users) will need the same information if it doesn't want to
      run a flood of get_user_pages_fast and it can support multiple
      granularity in the secondary MMU mappings, so I think it is justified to
      be exposed not just to KVM.
      
      The other option would be to move transparent_hugepage_adjust to
      mm/huge_memory.c but that currently has all kind of KVM data structures
      in it, so it's definitely not a cut-and-paste work, so I couldn't do a
      fix as cleaner as this one for 4.6.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: "Li, Liang Z" <liang.z.li@intel.com>
      Cc: Amit Shah <amit.shah@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      127393fb
  16. 20 4月, 2016 1 次提交
  17. 01 4月, 2016 1 次提交
  18. 31 3月, 2016 1 次提交
  19. 22 3月, 2016 3 次提交
  20. 10 3月, 2016 1 次提交
    • P
      KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 · 5f0b8199
      Paolo Bonzini 提交于
      KVM has special logic to handle pages with pte.u=1 and pte.w=0 when
      CR0.WP=1.  These pages' SPTEs flip continuously between two states:
      U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed)
      and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed).
      
      When SMEP is in effect, however, U=0 will enable kernel execution of
      this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
      with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1.
      When guest EFER has the NX bit cleared, the reserved bit check thinks
      that the latter state is invalid; teach it that the smep_andnot_wp case
      will also use the NX bit of SPTEs.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.inel.com>
      Fixes: c258b62bSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5f0b8199
  21. 08 3月, 2016 3 次提交