1. 28 Feb 2013, 1 commit
  2. 14 Feb 2013, 2 commits
    • s390/mm: implement software dirty bits · abf09bed
      Martin Schwidefsky committed
      The s390 architecture is unique with respect to dirty page detection:
      it uses the change bit in the per-page storage key to track page
      modifications. All other architectures track dirty bits by means
      of page table entries. This property of s390 has caused numerous
      problems in the past, e.g. see git commit ef5d437f
      "mm: fix XFS oops due to dirty pages without buffers on s390".
      
      To avoid future issues in regard to per-page dirty bits, convert
      s390 to a fault based software dirty bit detection mechanism. All
      user page table entries which are marked as clean will be hardware
      read-only, even if the pte is supposed to be writable. A write by
      the user process will trigger a protection fault which will cause
      the user pte to be marked as dirty and the hardware read-only bit
      to be removed.
      
      With this change the dirty bit in the storage key is irrelevant
      for Linux as a host, but the storage key is still required for
      KVM guests. The effect is that page_test_and_clear_dirty and the
      related code can be removed. The referenced bit in the storage
      key is still used by the page_test_and_clear_young primitive to
      provide page age information.
      
      For page cache pages of mappings with mapping_cap_account_dirty
      there will not be any change in behavior, as the dirty bit tracking
      already uses read-only ptes to control the number of dirty pages.
      Only for swap cache pages and pages of mappings without
      mapping_cap_account_dirty can there be additional protection faults.
      To avoid an excessive number of additional faults, the mk_pte
      primitive checks for PageDirty if the pgprot value allows for writes
      and pre-dirties the pte. That avoids all additional faults for
      tmpfs and shmem pages until these pages are added to the swap cache.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      abf09bed
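      The mechanism can be pictured with a small stand-alone model; the bit
      names, the pte_t typedef and both helpers below are invented for this
      sketch and are not the actual s390 definitions:

      /* Illustrative model of fault based software dirty bits; not the
       * real s390 pte layout. */
      #include <stdio.h>

      typedef unsigned long pte_t;

      #define HW_RO    0x1UL  /* hardware read-only (protection) bit */
      #define SW_DIRTY 0x2UL  /* software dirty bit */
      #define SW_WRITE 0x4UL  /* pte is supposed to be writable */

      /* A clean pte is always made hardware read-only, even if writable. */
      static pte_t pte_mkclean_model(pte_t pte)
      {
          return (pte & ~SW_DIRTY) | HW_RO;
      }

      /* The protection fault handler marks the pte dirty and removes the
       * hardware read-only bit, provided the mapping allows writes. */
      static pte_t write_fault_model(pte_t pte)
      {
          if (pte & SW_WRITE)
              return (pte | SW_DIRTY) & ~HW_RO;
          return pte;                        /* real write to a read-only mapping */
      }

      int main(void)
      {
          pte_t pte = pte_mkclean_model(SW_WRITE);
          printf("clean pte: %#lx\n", pte);  /* read-only, not dirty */
          pte = write_fault_model(pte);
          printf("dirty pte: %#lx\n", pte);  /* writable, dirty */
          return 0;
      }

      In this model the mk_pte pre-dirtying described above amounts to
      starting a writable, already PageDirty page directly in the dirty
      state so that no extra protection fault is taken.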
    • s390/mm: provide PAGE_SHARED define · bddb7ae2
      Heiko Carstens committed
      Only needed to make some drivers compile...
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      bddb7ae2
  3. 22 Jan 2013, 1 commit
  4. 13 Jan 2013, 1 commit
  5. 13 Dec 2012, 1 commit
  6. 23 Nov 2012, 2 commits
  7. 26 Oct 2012, 1 commit
  8. 09 Oct 2012, 6 commits
  9. 20 Jul 2012, 1 commit
    • s390/comments: unify copyright messages and remove file names · a53c8fab
      Heiko Carstens committed
      Remove the file name from the comment at the top of many files. In most
      cases the file name was wrong anyway, so it's rather pointless.
      
      Also unify the IBM copyright statement. We did have a lot of slightly
      different statements and wanted to change them one after another
      whenever a file got touched. However, that never happened. Instead
      people started to take the old/"wrong" statements as a template
      for new files.
      So unify all of them in one go.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      a53c8fab
  10. 24 May 2012, 1 commit
  11. 27 Dec 2011, 1 commit
    • [S390] add support for physical memory > 4TB · 14045ebf
      Martin Schwidefsky committed
      The kernel address space of a 64 bit kernel currently uses a three level
      page table and the vmemmap array has a fixed address and a fixed maximum
      size. A three level page table is good enough for systems with less than
      3.8TB of memory; for bigger systems four page table levels need to be
      used. Each page table level costs a bit of performance, so use 3 levels for
      normal systems and 4 levels only for the really big systems.
      To avoid bloating sparse.o too much set MAX_PHYSMEM_BITS to 46 for a
      maximum of 64TB of memory.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      14045ebf
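      The 3.8TB boundary and the 46 bit limit mentioned above translate into
      a simple rule of thumb; the program below only restates that arithmetic
      and is not kernel code:

      /* Toy check of when a fourth page table level becomes necessary;
       * the 3.8TB and 2^46 (64TB) figures come from the commit text. */
      #include <stdio.h>

      int main(void)
      {
          unsigned long long tb = 1ULL << 40;
          unsigned long long three_level_limit = 38 * tb / 10;  /* ~3.8TB */
          unsigned long long max_physmem = 1ULL << 46;          /* MAX_PHYSMEM_BITS = 46 */
          unsigned long long sizes[] = { 1 * tb, 4 * tb, 64 * tb };

          for (int i = 0; i < 3; i++) {
              int levels = sizes[i] <= three_level_limit ? 3 : 4;
              printf("%4llu TB -> %d page table levels%s\n", sizes[i] / tb,
                     levels, sizes[i] > max_physmem ? " (beyond MAX_PHYSMEM_BITS)" : "");
          }
          return 0;
      }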
  12. 01 Dec 2011, 1 commit
  13. 14 Nov 2011, 1 commit
  14. 30 Oct 2011, 2 commits
  15. 20 Sep 2011, 1 commit
  16. 24 Jul 2011, 1 commit
    • [S390] kvm guest address space mapping · e5992f2e
      Martin Schwidefsky committed
      Add code that allows KVM to control the virtual memory layout that
      is seen by a guest. The guest address space uses a second page table
      that shares the last level pte-tables with the process page table.
      If a page is unmapped from the process page table it is automatically
      unmapped from the guest page table as well.
      
      The guest address space mapping starts out empty; KVM can map any
      individual 1MB segment from the process virtual memory to any 1MB
      aligned location in the guest virtual memory. If a target segment in
      the process virtual memory does not exist or is unmapped while a
      guest mapping exists, the desired target address is stored as an
      invalid segment table entry in the guest page table.
      The population of the guest page table is fault driven.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      e5992f2e
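      A toy model of the segment handling described above; all names, the
      flat table and the sizes are made up for illustration, and unlike the
      real code the model does not share any pte tables with the process:

      /* Map 1MB process segments into a guest segment table; segments
       * that are not (or no longer) mapped in the process keep the
       * desired target address in an entry marked invalid, so a later
       * fault can resolve it. */
      #include <stdio.h>

      #define SEG_SIZE    (1UL << 20)     /* 1MB segments */
      #define SEG_INVALID 0x1UL

      static unsigned long guest_segtable[16];

      static void map_segment_model(unsigned long guest_addr,
                                    unsigned long proc_addr, int proc_mapped)
      {
          unsigned long idx = guest_addr / SEG_SIZE;

          if (proc_mapped)
              guest_segtable[idx] = proc_addr;                /* valid entry */
          else
              guest_segtable[idx] = proc_addr | SEG_INVALID;  /* fault driven */
      }

      int main(void)
      {
          map_segment_model(0 * SEG_SIZE, 0x40000000UL, 1);
          map_segment_model(1 * SEG_SIZE, 0x40100000UL, 0);
          for (int i = 0; i < 2; i++)
              printf("guest segment %d -> %#lx%s\n", i,
                     guest_segtable[i] & ~SEG_INVALID,
                     (guest_segtable[i] & SEG_INVALID) ? " (invalid)" : "");
          return 0;
      }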
  17. 06 Jun 2011, 1 commit
    • [S390] fix kvm defines for 31 bit compile · 6c61cfe9
      Martin Schwidefsky committed
      KVM is not available for 31 bit but the KVM defines cause warnings:
      
      arch/s390/include/asm/pgtable.h: In function 'ptep_test_and_clear_user_dirty':
      arch/s390/include/asm/pgtable.h:817: warning: integer constant is too large for 'unsigned long' type
      arch/s390/include/asm/pgtable.h:818: warning: integer constant is too large for 'unsigned long' type
      arch/s390/include/asm/pgtable.h: In function 'ptep_test_and_clear_user_young':
      arch/s390/include/asm/pgtable.h:837: warning: integer constant is too large for 'unsigned long' type
      arch/s390/include/asm/pgtable.h:838: warning: integer constant is too large for 'unsigned long' type
      
      Add 31 bit versions of the KVM defines to remove the warnings.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      6c61cfe9
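      The fix follows the usual pattern for such warnings; the macro name
      below is a placeholder, not one of the actual KVM defines:

      /* Provide a 31 bit variant of a constant that does not fit into a
       * 32 bit unsigned long; illustrative name and values only. */
      #ifndef __s390x__
      #define RCP_EXAMPLE_BIT 0x00010000UL            /* 31 bit build */
      #else
      #define RCP_EXAMPLE_BIT 0x0001000000000000UL    /* 64 bit build */
      #endif

      static const unsigned long rcp_example = RCP_EXAMPLE_BIT;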
  18. 29 May 2011, 1 commit
    • [S390] mm: fix storage key handling · a43a9d93
      Heiko Carstens committed
      page_get_storage_key() and page_set_storage_key() expect a page address
      and not its page frame number. This became inconsistent with 2d42552d
      "[S390] merge page_test_dirty and page_clear_dirty".
      
      The result is that we read/write storage keys from random pages and do
      not have working dirty bit tracking at all.
      E.g. SetPageUptodate() doesn't clear the dirty bit of requested pages,
      which for example ext4 doesn't like very much and panics after a while.
      
      Unable to handle kernel paging request at virtual user address (null)
      Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in:
      CPU: 1 Not tainted 2.6.39-07551-g139f37f5-dirty #152
      Process flush-94:0 (pid: 1576, task: 000000003eb34538, ksp: 000000003c287b70)
      Krnl PSW : 0704c00180000000 0000000000316b12 (jbd2_journal_file_inode+0x10e/0x138)
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
      Krnl GPRS: 0000000000000000 0000000000000000 0000000000000000 0700000000000000
                 0000000000316a62 000000003eb34cd0 0000000000000025 000000003c287b88
                 0000000000000001 000000003c287a70 000000003f1ec678 000000003f1ec000
                 0000000000000000 000000003e66ec00 0000000000316a62 000000003c287988
      Krnl Code: 0000000000316b04: f0a0000407f4       srp     4(11,%r0),2036,0
                 0000000000316b0a: b9020022           ltgr    %r2,%r2
                 0000000000316b0e: a7740015           brc     7,316b38
                >0000000000316b12: e3d0c0000024       stg     %r13,0(%r12)
                 0000000000316b18: 4120c010           la      %r2,16(%r12)
                 0000000000316b1c: 4130d060           la      %r3,96(%r13)
                 0000000000316b20: e340d0600004       lg      %r4,96(%r13)
                 0000000000316b26: c0e50002b567       brasl   %r14,36d5f4
      Call Trace:
      ([<0000000000316a62>] jbd2_journal_file_inode+0x5e/0x138)
       [<00000000002da13c>] mpage_da_map_and_submit+0x2e8/0x42c
       [<00000000002daac2>] ext4_da_writepages+0x2da/0x504
       [<00000000002597e8>] writeback_single_inode+0xf8/0x268
       [<0000000000259f06>] writeback_sb_inodes+0xd2/0x18c
       [<000000000025a700>] writeback_inodes_wb+0x80/0x168
       [<000000000025aa92>] wb_writeback+0x2aa/0x324
       [<000000000025abde>] wb_do_writeback+0xd2/0x274
       [<000000000025ae3a>] bdi_writeback_thread+0xba/0x1c4
       [<00000000001737be>] kthread+0xa6/0xb0
       [<000000000056c1da>] kernel_thread_starter+0x6/0xc
       [<000000000056c1d4>] kernel_thread_starter+0x0/0xc
      INFO: lockdep is turned off.
      Last Breaking-Event-Address:
       [<0000000000316a8a>] jbd2_journal_file_inode+0x86/0x138
      Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      a43a9d93
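      The class of bug is easy to show in isolation; the function below is a
      stand-in with the calling convention described above (it takes a page
      address), not the real implementation:

      /* Stand-in that, like the real helpers, expects a page *address*
       * (pfn << PAGE_SHIFT), not a page frame number. */
      #include <stdio.h>

      #define PAGE_SHIFT 12

      static void page_set_storage_key_model(unsigned long addr, unsigned char key)
      {
          printf("setting storage key %u for page address %#lx\n",
                 (unsigned int)key, addr);
      }

      int main(void)
      {
          unsigned long pfn = 0x1234;

          /* broken caller: passes the pfn, so a "random" page is touched */
          page_set_storage_key_model(pfn, 0);

          /* fixed caller: converts the pfn to a page address first */
          page_set_storage_key_model(pfn << PAGE_SHIFT, 0);
          return 0;
      }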
  19. 23 May 2011, 3 commits
    • [S390] refactor page table functions for better pgste support · b2fa47e6
      Martin Schwidefsky committed
      Rework the architecture page table functions to access the bits in the
      page table extension array (pgste). There are a number of changes:
      1) Fix missing pgste update if the attach_count for the mm is <= 1.
      2) For every operation that affects the invalid bit in the pte or the
         rcp byte in the pgste the pcl lock needs to be acquired. The function
         pgste_get_lock gets the pcl lock and returns the current pgste value
         for a pte pointer. The function pgste_set_unlock stores the pgste
         and releases the lock. Between these two calls the bits in the pgste
         can be shuffled.
      3) Define two software bits in the pte _PAGE_SWR and _PAGE_SWC to avoid
         calling SetPageDirty and SetPageReferenced from pgtable.h. If the
         host reference backup bit or the host change backup bit has been
         set, the dirty/referenced state is transferred to the pte. The common
         code will pick up the state from the pte.
      4) Add ptep_modify_prot_start and ptep_modify_prot_commit for mprotect.
      5) Remove pgd_populate_kernel, pud_populate_kernel, pmd_populate_kernel
         pgd_clear_kernel, pud_clear_kernel, pmd_clear_kernel and ptep_invalidate.
      6) Rename kvm_s390_test_and_clear_page_dirty to
         ptep_test_and_clear_user_dirty and add ptep_test_and_clear_user_young.
      7) Define mm_exclusive() and mm_has_pgste() helper to improve readability.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      b2fa47e6
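      The lock/modify/unlock protocol from point 2 can be sketched as
      follows; a pthread mutex stands in for the per-pte pcl lock and all
      types are simplified, so this is a usage pattern, not kernel code:

      /* Minimal sketch of the pgste_get_lock()/pgste_set_unlock() pattern. */
      #include <pthread.h>
      #include <stdio.h>

      typedef unsigned long pgste_t;

      static pthread_mutex_t pcl_lock = PTHREAD_MUTEX_INITIALIZER;
      static pgste_t pgste_storage;

      /* take the lock and return the current pgste value */
      static pgste_t pgste_get_lock_model(void)
      {
          pthread_mutex_lock(&pcl_lock);
          return pgste_storage;
      }

      /* store the (possibly modified) pgste and release the lock */
      static void pgste_set_unlock_model(pgste_t pgste)
      {
          pgste_storage = pgste;
          pthread_mutex_unlock(&pcl_lock);
      }

      int main(void)
      {
          pgste_t pgste = pgste_get_lock_model();
          pgste |= 0x1UL;                 /* shuffle bits while holding the lock */
          pgste_set_unlock_model(pgste);
          printf("pgste = %#lx\n", pgste_storage);
          return 0;
      }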
    • [S390] merge page_test_dirty and page_clear_dirty · 2d42552d
      Martin Schwidefsky committed
      The page_clear_dirty primitive always sets the default storage key
      which resets the access control bits and the fetch protection bit.
      That will surprise a KVM guest that sets non-zero access control
      bits or the fetch protection bit. Merge page_test_dirty and
      page_clear_dirty back to a single function and only clear the
      dirty bit from the storage key.
      
      In addition move the function page_test_and_clear_dirty and
      page_test_and_clear_young to page.h where they belong. This
      requires changing the parameter from a struct page * to a page
      frame number.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      2d42552d
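      The difference can be illustrated with plain bit operations; the bit
      values below are simplified and only stand for the fields named in the
      text:

      /* The old page_clear_dirty reset the whole key, losing the access
       * control and fetch protection bits a KVM guest may have set; the
       * merged primitive clears only the change (dirty) bit. */
      #include <stdio.h>

      #define KEY_ACC    0xf0   /* access control bits */
      #define KEY_FP     0x08   /* fetch protection bit */
      #define KEY_CHANGE 0x02   /* change (dirty) bit */

      int main(void)
      {
          unsigned char key = KEY_ACC | KEY_FP | KEY_CHANGE;

          unsigned char old_way = 0;                  /* default storage key */
          unsigned char new_way = key & ~KEY_CHANGE;  /* clear dirty bit only */

          printf("old behaviour:    %#x (guest ACC/FP lost)\n", old_way);
          printf("merged primitive: %#x (guest ACC/FP kept)\n", new_way);
          return 0;
      }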
    • [S390] Remove data execution protection · 043d0708
      Martin Schwidefsky committed
      The noexec support on s390 does not rely on a bit in the page table
      entry but utilizes the secondary space mode to distinguish between
      memory accesses for instructions vs. data. The noexec code relies
      on the assumption that the cpu will always use the secondary space
      page table for data accesses while it is running in the secondary
      space mode. Up to the z9-109 class machines this has been the case.
      Unfortunately this is not true anymore with z10 and later machines.
      The load-relative-long instructions lrl, lgrl and lgfrl access the
      memory operand using the same addressing-space mode that has been
      used to fetch the instruction.
      This breaks the noexec mode for all user space binaries compiled
      with march=z10 or later. The only option is to remove the current
      noexec support.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      043d0708
  20. 27 Oct 2010, 1 commit
  21. 25 Oct 2010, 5 commits
  22. 24 Aug 2010, 1 commit
    • [S390] fix tlb flushing vs. concurrent /proc accesses · 050eef36
      Martin Schwidefsky committed
      The tlb flushing code uses the mm_users field of the mm_struct to
      decide if each page table entry needs to be flushed individually with
      IPTE or if a global flush for the mm_struct is sufficient after all page
      table updates have been done. The comment for mm_users says "How many
      users with user space?" but the /proc code increases mm_users after it
      has found the process structure by pid, without creating a new user
      process. That makes mm_users useless for the decision between the two
      tlb flushing methods. The current code can be confused into not flushing
      tlb entries by a concurrent access to /proc files if e.g. a fork is in
      progress. The solution for this problem is to make the tlb flushing
      logic independent of the mm_users field.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      050eef36
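      The shape of the fix can be sketched with a counter that only counts
      real attachments to the address space; the attach_count name and the
      struct below are assumptions made for this illustration:

      /* The decision between per-entry IPTE flushing and one global flush
       * is keyed off a counter the architecture maintains itself, so a
       * /proc reader taking a reference on the mm (which bumps mm_users)
       * no longer influences it. */
      #include <stdio.h>

      struct mm_model {
          int mm_users;      /* also bumped by /proc accesses */
          int attach_count;  /* real users of the address space */
      };

      static const char *flush_method(const struct mm_model *mm)
      {
          /* previously the check looked at mm->mm_users instead */
          return mm->attach_count > 1 ? "flush each entry with IPTE"
                                      : "one global flush after the updates";
      }

      int main(void)
      {
          struct mm_model mm = { .mm_users = 2 /* /proc reader */, .attach_count = 1 };
          printf("%s\n", flush_method(&mm));
          return 0;
      }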
  23. 09 Apr 2010, 1 commit
  24. 21 Feb 2010, 1 commit
    • MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself · 4b3073e1
      Russell King committed
      On VIVT ARM, when we have multiple shared mappings of the same file
      in the same MM, we need to ensure that we have coherency across all
      copies.  We do this via make_coherent() by making the pages
      uncacheable.
      
      This used to work fine, until we allowed highmem with highpte - we
      now have a page table which is mapped as required, and is not available
      for modification via update_mmu_cache().
      
      Ralf Baechle suggested getting rid of the PTE value passed to
      update_mmu_cache():
      
        On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
        to construct a pointer to the pte again.  Passing a pte_t * is much
        more elegant.  Maybe we might even replace the pte argument with the
        pte_t?
      
      Ben Herrenschmidt would also like the pte pointer for PowerPC:
      
        Passing the ptep in there is exactly what I want.  I want that
        -instead- of the PTE value, because I have issue on some ppc cases,
        for I$/D$ coherency, where set_pte_at() may decide to mask out the
        _PAGE_EXEC.
      
      So, pass in the mapped page table pointer into update_mmu_cache(), and
      remove the PTE value, updating all implementations and call sites to
      suit.
      
      Includes a fix from Stephen Rothwell:
      
        sparc: fix fallout from update_mmu_cache API change
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      4b3073e1
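      The resulting interface change, with the implementation body stubbed
      out and pte_t reduced to a plain integer for the sketch:

      /* Before and after: only the third parameter changes, from the pte
       * value to a pointer to the mapped page table entry. */
      typedef unsigned long pte_t;
      struct vm_area_struct;

      /* old interface: the pte value was passed in */
      void update_mmu_cache_old(struct vm_area_struct *vma,
                                unsigned long address, pte_t pte);

      /* new interface: the mapped page table pointer is passed instead,
       * so implementations can read the entry through the pointer */
      void update_mmu_cache(struct vm_area_struct *vma,
                            unsigned long address, pte_t *ptep)
      {
          (void)vma; (void)address; (void)ptep;   /* architecture specific */
      }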
  25. 07 Dec 2009, 1 commit
  26. 12 Jun 2009, 1 commit