1. 22 4月, 2014 3 次提交
  2. 03 4月, 2014 1 次提交
    • M
      s390/mm,tlb: optimize TLB flushing for zEC12 · 1b948d6c
      Martin Schwidefsky 提交于
      The zEC12 machines introduced the local-clearing control for the IDTE
      and IPTE instruction. If the control is set only the TLB of the local
      CPU is cleared of entries, either all entries of a single address space
      for IDTE, or the entry for a single page-table entry for IPTE.
      Without the local-clearing control the TLB flush is broadcasted to all
      CPUs in the configuration, which is expensive.
      
      The reset of the bit mask of the CPUs that need flushing after a
      non-local IDTE is tricky. As TLB entries for an address space remain
      in the TLB even if the address space is detached a new bit field is
      required to keep track of attached CPUs vs. CPUs in the need of a
      flush. After a non-local flush with IDTE the bit-field of attached CPUs
      is copied to the bit-field of CPUs in need of a flush. The ordering
      of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is
      such that an underindication in mm_cpumask(mm) is prevented but an
      overindication in mm_cpumask(mm) is possible.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      1b948d6c
  3. 21 3月, 2014 2 次提交
  4. 21 2月, 2014 2 次提交
    • K
      s390/kvm: support collaborative memory management · b31288fa
      Konstantin Weitz 提交于
      This patch enables Collaborative Memory Management (CMM) for kvm
      on s390. CMM allows the guest to inform the host about page usage
      (see arch/s390/mm/cmm.c). The host uses this information to avoid
      swapping in unused pages in the page fault handler. Further, a CPU
      provided list of unused invalid pages is processed to reclaim swap
      space of not yet accessed unused pages.
      
      [ Martin Schwidefsky: patch reordering and cleanup ]
      Signed-off-by: NKonstantin Weitz <konstantin.weitz@gmail.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      b31288fa
    • M
      s390/mm,tlb: race of lazy TLB flush vs. recreation of TLB entries · 53e857f3
      Martin Schwidefsky 提交于
      Git commit 050eef36 "[S390] fix tlb flushing vs. concurrent
      /proc accesses" introduced the attach counter to avoid using the
      mm_users value to decide between IPTE for every PTE and lazy TLB
      flushing with IDTE. That fixed the problem with mm_users but it
      introduced another subtle race, fortunately one that is very hard
      to hit.
      The background is the requirement of the architecture that a valid
      PTE may not be changed while it can be used concurrently by another
      cpu. The decision between IPTE and lazy TLB flushing needs to be
      done while the PTE is still valid. Now if the virtual cpu is
      temporarily stopped after the decision to use lazy TLB flushing but
      before the invalid bit of the PTE has been set, another cpu can attach
      the mm, find that flush_mm is set, do the IDTE, return to userspace,
      and recreate a TLB that uses the PTE in question. When the first,
      stopped cpu continues it will change the PTE while it is attached on
      another cpu. The first cpu will do another IDTE shortly after the
      modification of the PTE which makes the race window quite short.
      
      To fix this race the CPU that wants to attach the address space of a
      user space thread needs to wait for the end of the PTE modification.
      The number of concurrent TLB flushers for an mm is tracked in the
      upper 16 bits of the attach_count and finish_arch_post_lock_switch
      is used to wait for the end of the flush operation if required.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      53e857f3
  5. 30 1月, 2014 1 次提交
  6. 15 10月, 2013 1 次提交
  7. 29 8月, 2013 1 次提交
    • M
      s390/mm: implement software referenced bits · 0944fe3f
      Martin Schwidefsky 提交于
      The last remaining use for the storage key of the s390 architecture
      is reference counting. The alternative is to make page table entries
      invalid while they are old. On access the fault handler marks the
      pte/pmd as young which makes the pte/pmd valid if the access rights
      allow read access. The pte/pmd invalidations required for software
      managed reference bits cost a bit of performance, on the other hand
      the RRBE/RRBM instructions to read and reset the referenced bits are
      quite expensive as well.
      Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      0944fe3f
  8. 28 8月, 2013 1 次提交
  9. 22 8月, 2013 4 次提交
  10. 29 7月, 2013 1 次提交
  11. 29 6月, 2013 1 次提交
  12. 20 6月, 2013 1 次提交
  13. 05 6月, 2013 3 次提交
  14. 28 5月, 2013 1 次提交
  15. 21 5月, 2013 3 次提交
  16. 17 5月, 2013 1 次提交
  17. 03 5月, 2013 1 次提交
  18. 30 4月, 2013 1 次提交
    • G
      mm/hugetlb: add more arch-defined huge_pte functions · 106c992a
      Gerald Schaefer 提交于
      Commit abf09bed ("s390/mm: implement software dirty bits")
      introduced another difference in the pte layout vs.  the pmd layout on
      s390, thoroughly breaking the s390 support for hugetlbfs.  This requires
      replacing some more pte_xxx functions in mm/hugetlbfs.c with a
      huge_pte_xxx version.
      
      This patch introduces those huge_pte_xxx functions and their generic
      implementation in asm-generic/hugetlb.h, which will now be included on
      all architectures supporting hugetlbfs apart from s390.  This change
      will be a no-op for those architectures.
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>	[for !s390 parts]
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      106c992a
  19. 23 4月, 2013 1 次提交
  20. 17 4月, 2013 1 次提交
    • L
      s390: move dummy io_remap_pfn_range() to asm/pgtable.h · 4f2e2903
      Linus Torvalds 提交于
      Commit b4cbb197 ("vm: add vm_iomap_memory() helper function") added
      a helper function wrapper around io_remap_pfn_range(), and every other
      architecture defined it in <asm/pgtable.h>.
      
      The s390 choice of <asm/io.h> may make sense, but is not very convenient
      for this case, and gratuitous differences like that cause unexpected errors like this:
      
         mm/memory.c: In function 'vm_iomap_memory':
         mm/memory.c:2439:2: error: implicit declaration of function 'io_remap_pfn_range' [-Werror=implicit-function-declaration]
      
      Glory be the kbuild test robot who noticed this, bisected it, and
      reported it to the guilty parties (ie me).
      
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4f2e2903
  21. 02 4月, 2013 2 次提交
    • H
      s390/mm: provide emtpy check_pgt_cache() function · 765a0cac
      Heiko Carstens 提交于
      All architectures need to provide a check_pgt_cache() function. The s390 one
      got lost somewhere.
      So reintroduce it to prevent future compile errors e.g. if Thomas Gleixner's
      idle loop rework patches get merged.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      765a0cac
    • H
      s390/uaccess: fix page table walk · ea81531d
      Heiko Carstens 提交于
      When translating user space addresses to kernel addresses the follow_table()
      function had two bugs:
      
      - PROT_NONE mappings could be read accessed via the kernel mapping. That is
        e.g. putting a filename into a user page, then protecting the page with
        PROT_NONE and afterwards issuing the "open" syscall with a pointer to
        the filename would incorrectly succeed.
      
      - when walking the page tables it used the pgd/pud/pmd/pte primitives which
        with dynamic page tables give no indication which real level of page tables
        is being walked (region2, region3, segment or page table). So in case of an
        exception the translation exception code passed to __handle_fault() is not
        necessarily correct.
        This is not really an issue since __handle_fault() doesn't evaluate the code.
        Only in case of e.g. a SIGBUS this code gets passed to user space. If user
        space can do something sane with the value is a different question though.
      
      To fix these issues don't use any Linux primitives. Only walk the page tables
      like the hardware would do it, however we leave quite some checks away since
      we know that we only have full size page tables and each index is within bounds.
      
      In theory this should fix all issues...
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      ea81531d
  22. 08 3月, 2013 1 次提交
  23. 28 2月, 2013 1 次提交
  24. 14 2月, 2013 2 次提交
    • M
      s390/mm: implement software dirty bits · abf09bed
      Martin Schwidefsky 提交于
      The s390 architecture is unique in respect to dirty page detection,
      it uses the change bit in the per-page storage key to track page
      modifications. All other architectures track dirty bits by means
      of page table entries. This property of s390 has caused numerous
      problems in the past, e.g. see git commit ef5d437f
      "mm: fix XFS oops due to dirty pages without buffers on s390".
      
      To avoid future issues in regard to per-page dirty bits convert
      s390 to a fault based software dirty bit detection mechanism. All
      user page table entries which are marked as clean will be hardware
      read-only, even if the pte is supposed to be writable. A write by
      the user process will trigger a protection fault which will cause
      the user pte to be marked as dirty and the hardware read-only bit
      is removed.
      
      With this change the dirty bit in the storage key is irrelevant
      for Linux as a host, but the storage key is still required for
      KVM guests. The effect is that page_test_and_clear_dirty and the
      related code can be removed. The referenced bit in the storage
      key is still used by the page_test_and_clear_young primitive to
      provide page age information.
      
      For page cache pages of mappings with mapping_cap_account_dirty
      there will not be any change in behavior as the dirty bit tracking
      already uses read-only ptes to control the amount of dirty pages.
      Only for swap cache pages and pages of mappings without
      mapping_cap_account_dirty there can be additional protection faults.
      To avoid an excessive number of additional faults the mk_pte
      primitive checks for PageDirty if the pgprot value allows for writes
      and pre-dirties the pte. That avoids all additional faults for
      tmpfs and shmem pages until these pages are added to the swap cache.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      abf09bed
    • H
      s390/mm: provide PAGE_SHARED define · bddb7ae2
      Heiko Carstens 提交于
      Only needed to make some drivers compile...
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      bddb7ae2
  25. 22 1月, 2013 1 次提交
  26. 13 1月, 2013 1 次提交
  27. 13 12月, 2012 1 次提交