1. 14 7月, 2016 3 次提交
  2. 14 6月, 2016 1 次提交
  3. 02 6月, 2016 1 次提交
    • N
      KVM: x86: avoid write-tearing of TDP · b19ee2ff
      Nadav Amit 提交于
      In theory, nothing prevents the compiler from write-tearing PTEs, or
      split PTE writes. These partially-modified PTEs can be fetched by other
      cores and cause mayhem. I have not really encountered such case in
      real-life, but it does seem possible.
      
      For example, the compiler may try to do something creative for
      kvm_set_pte_rmapp() and perform multiple writes to the PTE.
      Signed-off-by: NNadav Amit <nadav.amit@gmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      b19ee2ff
  4. 06 5月, 2016 1 次提交
    • A
      mm: thp: kvm: fix memory corruption in KVM with THP enabled · 127393fb
      Andrea Arcangeli 提交于
      After the THP refcounting change, obtaining a compound pages from
      get_user_pages() no longer allows us to assume the entire compound page
      is immediately mappable from a secondary MMU.
      
      A secondary MMU doesn't want to call get_user_pages() more than once for
      each compound page, in order to know if it can map the whole compound
      page.  So a secondary MMU needs to know from a single get_user_pages()
      invocation when it can map immediately the entire compound page to avoid
      a flood of unnecessary secondary MMU faults and spurious
      atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
      users).
      
      Ideally instead of the page->_mapcount < 1 check, get_user_pages()
      should return the granularity of the "page" mapping in the "mm" passed
      to get_user_pages().  However it's non trivial change to pass the "pmd"
      status belonging to the "mm" walked by get_user_pages up the stack (up
      to the caller of get_user_pages).  So the fix just checks if there is
      not a single pte mapping on the page returned by get_user_pages, and in
      turn if the caller can assume that the whole compound page is mapped in
      the current "mm" (in a pmd_trans_huge()).  In such case the entire
      compound page is safe to map into the secondary MMU without additional
      get_user_pages() calls on the surrounding tail/head pages.  In addition
      of being faster, not having to run other get_user_pages() calls also
      reduces the memory footprint of the secondary MMU fault in case the pmd
      split happened as result of memory pressure.
      
      Without this fix after a MADV_DONTNEED (like invoked by QEMU during
      postcopy live migration or balloning) or after generic swapping (with a
      failure in split_huge_page() that would only result in pmd splitting and
      not a physical page split), KVM would map the whole compound page into
      the shadow pagetables, despite regular faults or userfaults (like
      UFFDIO_COPY) may map regular pages into the primary MMU as result of the
      pte faults, leading to the guest mode and userland mode going out of
      sync and not working on the same memory at all times.
      
      Any other secondary MMU notifier manager (KVM is just one of the many
      MMU notifier users) will need the same information if it doesn't want to
      run a flood of get_user_pages_fast and it can support multiple
      granularity in the secondary MMU mappings, so I think it is justified to
      be exposed not just to KVM.
      
      The other option would be to move transparent_hugepage_adjust to
      mm/huge_memory.c but that currently has all kind of KVM data structures
      in it, so it's definitely not a cut-and-paste work, so I couldn't do a
      fix as cleaner as this one for 4.6.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: "Li, Liang Z" <liang.z.li@intel.com>
      Cc: Amit Shah <amit.shah@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      127393fb
  5. 20 4月, 2016 1 次提交
  6. 01 4月, 2016 1 次提交
  7. 31 3月, 2016 1 次提交
  8. 22 3月, 2016 3 次提交
  9. 10 3月, 2016 1 次提交
    • P
      KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 · 5f0b8199
      Paolo Bonzini 提交于
      KVM has special logic to handle pages with pte.u=1 and pte.w=0 when
      CR0.WP=1.  These pages' SPTEs flip continuously between two states:
      U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed)
      and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed).
      
      When SMEP is in effect, however, U=0 will enable kernel execution of
      this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
      with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1.
      When guest EFER has the NX bit cleared, the reserved bit check thinks
      that the latter state is invalid; teach it that the smep_andnot_wp case
      will also use the NX bit of SPTEs.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.inel.com>
      Fixes: c258b62bSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5f0b8199
  10. 08 3月, 2016 8 次提交
  11. 04 3月, 2016 3 次提交
    • X
      KVM: MMU: check kvm_mmu_pages and mmu_page_path indices · e23d3fef
      Xiao Guangrong 提交于
      Give a special invalid index to the root of the walk, so that we
      can check the consistency of kvm_mmu_pages and mmu_page_path.
      Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      [Extracted from a bigger patch proposed by Guangrong. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e23d3fef
    • P
      KVM: MMU: Fix ubsan warnings · 0a47cd85
      Paolo Bonzini 提交于
      kvm_mmu_pages_init is doing some really yucky stuff.  It is setting
      up a sentinel for mmu_page_clear_parents; however, because of a) the
      way levels are numbered starting from 1 and b) the way mmu_page_path
      sizes its arrays with PT64_ROOT_LEVEL-1 elements, the access can be
      out of bounds.  This is harmless because the code overwrites up to the
      first two elements of parents->idx and these are initialized, and
      because the sentinel is not needed in this case---mmu_page_clear_parents
      exits anyway when it gets to the end of the array.  However ubsan
      complains, and everyone else should too.
      
      This fix does three things.  First it makes the mmu_page_path arrays
      PT64_ROOT_LEVEL elements in size, so that we can write to them without
      checking the level in advance.  Second it disintegrates kvm_mmu_pages_init
      between mmu_unsync_walk (to reset the struct kvm_mmu_pages) and
      for_each_sp (to place the NULL sentinel at the end of the current path).
      This is okay because the mmu_page_path is only used in
      mmu_pages_clear_parents; mmu_pages_clear_parents itself is called within
      a for_each_sp iterator, and hence always after a call to mmu_pages_next.
      Third it changes mmu_pages_clear_parents to just use the sentinel to
      stop iteration, without checking the bounds on level.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Reported-by: NMike Krinkin <krinkin.m.u@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0a47cd85
    • P
      KVM: MMU: cleanup handle_abnormal_pfn · 798e88b3
      Paolo Bonzini 提交于
      The goto and temporary variable are unnecessary, just use return
      statements.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      798e88b3
  12. 03 3月, 2016 8 次提交
  13. 24 2月, 2016 1 次提交
  14. 23 2月, 2016 3 次提交
  15. 16 1月, 2016 1 次提交
    • D
      kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Dan Williams 提交于
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba049e93
  16. 07 1月, 2016 1 次提交
  17. 19 12月, 2015 1 次提交
  18. 26 11月, 2015 1 次提交