1. 19 Sep, 2017 1 commit
  2. 29 Aug, 2017 1 commit
    • C
      s390/mm: avoid empty zero pages for KVM guests to avoid postcopy hangs · fa41ba0d
      Christian Borntraeger committed
      Right now there is a potential hang situation for postcopy migrations,
      if the guest is enabling storage keys on the target system during the
      postcopy process.
      
      For storage key virtualization, we have to forbid the empty zero page as
      the storage key is a property of the physical page frame.  As we enable
      storage key handling lazily we then drop all mappings for empty zero
      pages for lazy refaulting later on.
      
      This does not work with the postcopy migration, which relies on the
      empty zero page never triggering a fault again in the future. The reason
      is that postcopy migration will simply read a page on the target system
      if that page is a known zero page to fault in an empty zero page.  At
      the same time postcopy remembers that this page was already transferred
      - so any future userfault on that page will NOT be retransmitted again
      to avoid races.
      
      If now the guest enters the storage key mode while in postcopy, we will
      break this assumption of postcopy.
      
      The solution is to disable the empty zero page for KVM guests early on
      and not during storage key enablement. With this change, the postcopy
      migration process is guaranteed to start after no zero pages are left.
      
      As guest pages are very likely not empty zero pages anyway the memory
      overhead is also pretty small.
      
      While at it this also adds proper page table locking to the zero page
      removal.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  3. 26 Jul, 2017 4 commits
  4. 25 Jul, 2017 3 commits
  5. 12 Jun, 2017 3 commits
  6. 20 Apr, 2017 1 commit
  7. 12 Apr, 2017 1 commit
    • C
      s390/mm: fix CMMA vs KSM vs others · a8f60d1f
      Christian Borntraeger committed
      On heavy paging with KSM I see guest data corruption. Turns out that
      KSM will add pages to its tree, where the mapping returns true for
      pte_unused (or might become so later).  KSM will unmap such pages
      and reinstantiate with different attributes (e.g. write protected or
      special, e.g. in replace_page or write_protect_page). This uncovered
      a bug in our pagetable handling: We must remove the unused flag as
      soon as an entry becomes present again.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
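The rule stated in this commit message (drop the unused flag as soon as an entry becomes present again) can be sketched in plain C. This is a minimal illustration, not the kernel code: the `demo_` names and the bit values are hypothetical and were chosen only to make the example self-contained.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
#define DEMO_PAGE_PRESENT 0x001UL
#define DEMO_PAGE_UNUSED  0x080UL
#define DEMO_PAGE_PROTECT 0x200UL

typedef struct { unsigned long val; } demo_pte_t;

static int demo_pte_present(demo_pte_t pte)
{
    return (pte.val & DEMO_PAGE_PRESENT) != 0;
}

static int demo_pte_unused(demo_pte_t pte)
{
    return (pte.val & DEMO_PAGE_UNUSED) != 0;
}

/*
 * Install a new entry. A present entry must never carry a stale
 * "unused" hint, otherwise a later reader could treat live data
 * as discardable.
 */
static void demo_set_pte(demo_pte_t *ptep, demo_pte_t entry)
{
    if (demo_pte_present(entry))
        entry.val &= ~DEMO_PAGE_UNUSED;
    *ptep = entry;
    /* Invariant: present and unused are mutually exclusive. */
    assert(!(demo_pte_present(*ptep) && demo_pte_unused(*ptep)));
}
```

Other attribute bits (here the hypothetical `DEMO_PAGE_PROTECT`) pass through untouched, so a write-protected remap by a KSM-like caller keeps its protection while losing only the stale hint.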
  8. 10 Mar, 2017 1 commit
  9. 23 Feb, 2017 1 commit
  10. 17 Feb, 2017 1 commit
  11. 08 Feb, 2017 1 commit
    • M
      s390: add no-execute support · 57d7f939
      Martin Schwidefsky committed
      Bit 0x100 of a page table, segment table or region table entry
      can be used to disallow code execution for the virtual addresses
      associated with the entry.
      
      There is one tricky bit, the system call to return from a signal
      is part of the signal frame written to the user stack. With a
      non-executable stack this would stop working. To avoid breaking
      things the protection fault handler checks the opcode that caused
      the fault for 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn)
      and injects a system call. This is preferable to the alternative
      solution with a stub function in the vdso because it works for
      vdso=off and statically linked binaries as well.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
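The opcode check described in this commit message can be sketched as a small classifier. The two instruction values 0x0a77 and 0x0aad come from the message; the function name, the constants' names and the return convention are assumptions made for this illustration, and the real fault handler does considerably more (fetching the instruction from user memory, injecting the system call).

```c
#include <assert.h>
#include <stdint.h>

/* SVC is a two-byte instruction: opcode byte 0x0a, then the call number. */
#define DEMO_SVC_OPCODE       0x0a
#define DEMO_NR_SIGRETURN     0x77  /* 119 */
#define DEMO_NR_RT_SIGRETURN  0xad  /* 173 */

/*
 * Return the system call number if the faulting instruction is
 * "svc sigreturn" or "svc rt_sigreturn", otherwise -1.
 */
static int demo_sigreturn_svc(uint16_t insn)
{
    uint8_t opcode = (uint8_t)(insn >> 8);
    uint8_t nr = (uint8_t)(insn & 0xff);

    if (opcode != DEMO_SVC_OPCODE)
        return -1;
    if (nr == DEMO_NR_SIGRETURN || nr == DEMO_NR_RT_SIGRETURN)
        return nr;
    return -1;
}
```

A handler built on such a check would treat a non-negative result as "emulate this system call" and fall through to normal protection-fault handling for -1.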
  12. 24 Aug, 2016 2 commits
  13. 31 Jul, 2016 1 commit
    • G
      s390/mm: clean up pte/pmd encoding · bc29b7ac
      Gerald Schaefer committed
      The hugetlbfs pte<->pmd conversion functions currently assume that the pmd
      bit layout is consistent with the pte layout, which is not really true.
      
      The SW read and write bits are encoded as the sequence "wr" in a pte, but
      in a pmd it is "rw". The hugetlbfs conversion assumes that the sequence
      is identical in both cases, which results in swapped read and write bits
      in the pmd. In practice this is not a problem, because those pmd bits are
      only relevant for THP pmds and not for hugetlbfs pmds. The hugetlbfs code
      works on (fake) ptes, and the converted pte bits are correct.
      
      There is another variation in pte/pmd encoding which affects dirty
      prot-none ptes/pmds. In this case, a pmd has both its HW read-only and
      invalid bit set, while it is only the invalid bit for a pte. This also has
      no effect in practice, but it should better be consistent.
      
      This patch fixes both inconsistencies by changing the SW read/write bit
      layout for pmds as well as the PAGE_NONE encoding for ptes. It also makes
      the hugetlbfs conversion functions more robust by introducing a
      move_set_bit() macro that uses the pte/pmd bit #defines instead of
      constant shifts.
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
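The idea behind move_set_bit() can be illustrated with a sketch that moves a bit between two layouts using named masks and shifts instead of constant shift distances. Only the macro name is taken from the commit message; its body, the `DEMO_` bit definitions and the helper function are assumptions for illustration, with the pte using the order "wr" and the pmd using "rw" as in the problem the message describes.

```c
#include <assert.h>

/* Hypothetical layouts: each mask has a matching *_SHIFT constant. */
#define DEMO_PTE_READ         0x010UL
#define DEMO_PTE_READ_SHIFT   4
#define DEMO_PTE_WRITE        0x020UL
#define DEMO_PTE_WRITE_SHIFT  5
#define DEMO_PMD_READ         0x002UL
#define DEMO_PMD_READ_SHIFT   1
#define DEMO_PMD_WRITE        0x001UL
#define DEMO_PMD_WRITE_SHIFT  0

/*
 * Isolate bit "a" in x, normalise it to bit 0, then move it to the
 * position of bit "b". The shifts are derived from the names, so a
 * layout change only needs the #defines updated.
 */
#define move_set_bit(x, a, b) ((((x) & (a)) >> a##_SHIFT) << b##_SHIFT)

/* Convert the software read/write bits of a pte value to pmd layout. */
static unsigned long demo_pte_to_pmd_rw(unsigned long pte)
{
    unsigned long pmd = 0;

    pmd |= move_set_bit(pte, DEMO_PTE_READ, DEMO_PMD_READ);
    pmd |= move_set_bit(pte, DEMO_PTE_WRITE, DEMO_PMD_WRITE);
    return pmd;
}
```

Because each bit is moved by name, the swapped order of the two bits in the target layout is handled correctly, which a single constant shift of the whole field could not do.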
  14. 06 Jul, 2016 1 commit
  15. 20 Jun, 2016 3 commits
    • D
      s390/mm: shadow pages with real guest requested protection · a9d23e71
      David Hildenbrand committed
      We really want to avoid manually handling protection for nested
      virtualization. By shadowing pages with the protection the guest asked us
      for, the SIE can handle most protection-related actions for us (e.g.
      special handling for MVPG) and we can directly forward protection
      exceptions to the guest.
      
      PTEs will now always be shadowed with the correct _PAGE_PROTECT flag.
      Unshadowing will take care of any guest changes to the parent PTE and
      any host changes to the host PTE. If the host PTE doesn't have the
      fitting access rights or is not available, we have to fix it up.
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • M
      s390/mm: add shadow gmap support · 4be130a0
      Martin Schwidefsky committed
      For a nested KVM guest the outer KVM host needs to create shadow
      page tables for the nested guest. This patch adds the basic support
      to the guest address space (gmap) code.
      
      For each guest address space the inner KVM host creates, the first
      outer KVM host needs to create shadow page tables. The address space
      is identified by the ASCE loaded into the control register 1 at the
      time the inner SIE instruction for the second nested KVM guest is
      executed. The outer KVM host creates the shadow tables starting with
      the table identified by the ASCE on an on-demand basis. The outer KVM
      host will get repeated faults for all the shadow tables needed to
      run the second KVM guest.
      
      While a shadow page table for the second KVM guest is active the access
      to the origin region, segment and page tables needs to be restricted
      for the first KVM guest. For region, segment and page tables the first
      KVM guest may read the memory, but a write attempt has to lead to an
      unshadow.  This is done using the page invalid and read-only bits in the
      page table of the first KVM guest. If the first guest re-accesses one of
      the origin pages of a shadow, it gets a fault and the affected parts of
      the shadow page table hierarchy need to be removed again.
      
      PGSTE tables don't have to be shadowed, as all interpretation assist can't
      deal with the invalid bits in the shadow pte being set differently than
      the original ones provided by the first KVM guest.
      
      Many bug fixes and improvements by David Hildenbrand.
      Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • M
      s390/mm: extended gmap pte notifier · b2d73b2a
      Martin Schwidefsky committed
      The current gmap pte notifier forces a pte into a read-write state.
      If the pte is invalidated the gmap notifier is called to inform KVM
      that the mapping will go away.
      
      Extend this approach to allow read-write, read-only and no-access
      as possible target states and call the pte notifier for any change
      to the pte.
      
      This mechanism is used to temporarily set specific access rights for
      a pte without doing the heavy work of a true mprotect call.
      Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
  16. 13 Jun, 2016 7 commits
  17. 10 Jun, 2016 4 commits
  18. 20 May, 2016 1 commit
    • H
      arch: fix has_transparent_hugepage() · fd8cfd30
      Hugh Dickins committed
      I've just discovered that the useful-sounding has_transparent_hugepage()
      is actually an architecture-dependent minefield: on some arches it only
      builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
      not, but on some of those (arm and arm64) it then gives the wrong
      answer; and on mips alone it's marked __init, which would crash if
      called later (but so far it has not been called later).
      
      Straighten this out: make it available to all configs, with a sensible
      default in asm-generic/pgtable.h, removing its definitions from those
      arches (arc, arm, arm64, sparc, tile) which are served by the default,
      adding #define has_transparent_hugepage has_transparent_hugepage to
      those (mips, powerpc, s390, x86) which need to override the default at
      runtime, and removing the __init from mips (but maybe that kind of code
      should be avoided after init: set a static variable the first time it's
      called).
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arch/arc]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[arch/s390]
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
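The override scheme this commit message describes (an architecture defines the function and a same-named macro, and the generic header supplies a default only when that macro is absent) can be sketched in one file. The `#define has_transparent_hugepage has_transparent_hugepage` line is quoted from the message; the `demo_machine_has_large_pages` variable and the exact body of the default are assumptions made for this illustration.

```c
#include <assert.h>

/* --- stand-in for an architecture header (hypothetical) --- */

/* Would be probed from the hardware at boot; a plain variable here. */
static int demo_machine_has_large_pages = 1;

static inline int has_transparent_hugepage(void)
{
    return demo_machine_has_large_pages;
}
/* Tell the generic header that this arch provides its own version. */
#define has_transparent_hugepage has_transparent_hugepage

/* --- stand-in for the generic header --- */

#ifndef has_transparent_hugepage
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define has_transparent_hugepage() 1
#else
#define has_transparent_hugepage() 0
#endif
#endif
```

The self-referential macro expands to the function name and stops there, so callers still reach the architecture's function while `#ifndef` in the generic part sees the name as defined. Removing the architecture section makes the same call site compile against the compile-time default instead.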
  19. 08 Mar, 2016 3 commits