1. 09 10月, 2012 1 次提交
    • G
      thp, s390: thp splitting backend for s390 · 75077afb
      Gerald Schaefer 提交于
      This patch is part of the architecture backend for thp on s390.  It
      provides the functions related to thp splitting, including serialization
      against gup.  Unlike other archs, pmdp_splitting_flush() cannot use a tlb
      flushing operation to serialize against gup on s390, because that wouldn't
      be stopped by the disabled IRQs.  So instead, smp_call_function() is
      called with an empty function, which will have the expected effect.
      Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      75077afb
  2. 26 9月, 2012 2 次提交
  3. 30 7月, 2012 1 次提交
  4. 26 7月, 2012 1 次提交
  5. 20 7月, 2012 1 次提交
    • H
      s390/comments: unify copyright messages and remove file names · a53c8fab
      Heiko Carstens 提交于
      Remove the file name from the comment at top of many files. In most
      cases the file name was wrong anyway, so it's rather pointless.
      
      Also unify the IBM copyright statement. We did have a lot of sightly
      different statements and wanted to change them one after another
      whenever a file gets touched. However that never happened. Instead
      people start to take the old/"wrong" statements to use as a template
      for new files.
      So unify all of them in one go.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      a53c8fab
  6. 16 5月, 2012 1 次提交
  7. 11 4月, 2012 1 次提交
    • M
      [S390] fix tlb flushing for page table pages · cd94154c
      Martin Schwidefsky 提交于
      Git commit 36409f63 "use generic RCU
      page-table freeing code" introduced a tlb flushing bug. Partially revert
      the above git commit and go back to s390 specific page table flush code.
      
      For s390 the TLB can contain three types of entries, "normal" TLB
      page-table entries, TLB combined region-and-segment-table (CRST) entries
      and real-space entries. Linux does not use real-space entries which
      leaves normal TLB entries and CRST entries. The CRST entries are
      intermediate steps in the page-table translation called translation paths.
      For example a 4K page access in a three-level page table setup will
      create two CRST TLB entries and one page-table TLB entry. The advantage
      of that approach is that a page access next to the previous one can reuse
      the CRST entries and needs just a single read from memory to create the
      page-table TLB entry. The disadvantage is that the TLB flushing rules are
      more complicated, before any page-table may be freed the TLB needs to be
      flushed.
      
      In short: the generic RCU page-table freeing code is incorrect for the
      CRST entries, in particular the check for mm_users < 2 is troublesome.
      
      This is applicable to 3.0+ kernels.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      cd94154c
  8. 29 3月, 2012 1 次提交
  9. 17 2月, 2012 1 次提交
  10. 27 12月, 2011 2 次提交
    • M
      [S390] add support for physical memory > 4TB · 14045ebf
      Martin Schwidefsky 提交于
      The kernel address space of a 64 bit kernel currently uses a three level
      page table and the vmemmap array has a fixed address and a fixed maximum
      size. A three level page table is good enough for systems with less than
      3.8TB of memory, for bigger systems four page table levels need to be
      used. Each page table level costs a bit of performance, use 3 levels for
      normal systems and 4 levels only for the really big systems.
      To avoid bloating sparse.o too much set MAX_PHYSMEM_BITS to 46 for a
      maximum of 64TB of memory.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      14045ebf
    • C
      [S390] kvm: fix sleeping function ... at mm/page_alloc.c:2260 · c86cce2a
      Christian Borntraeger 提交于
      commit cc772456
          [S390] fix list corruption in gmap reverse mapping
      
      added a potential dead lock:
      
      BUG: sleeping function called from invalid context at mm/page_alloc.c:2260
      in_atomic(): 1, irqs_disabled(): 0, pid: 1108, name: qemu-system-s39
      3 locks held by qemu-system-s39/1108:
       #0:  (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm]
       #1:  (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298
       #2:  (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298
      CPU: 0 Not tainted 3.1.3 #45
      Process qemu-system-s39 (pid: 1108, task: 00000004f8b3cb30, ksp: 00000004fd5978d0)
      00000004fd5979a0 00000004fd597920 0000000000000002 0000000000000000
             00000004fd5979c0 00000004fd597938 00000004fd597938 0000000000617e96
             0000000000000000 00000004f8b3cf58 0000000000000000 0000000000000000
             000000000000000d 000000000000000c 00000004fd597988 0000000000000000
             0000000000000000 0000000000100a18 00000004fd597920 00000004fd597960
      Call Trace:
      ([<0000000000100926>] show_trace+0xee/0x144)
       [<0000000000131f3a>] __might_sleep+0x12a/0x158
       [<0000000000217fb4>] __alloc_pages_nodemask+0x224/0xadc
       [<0000000000123086>] gmap_alloc_table+0x46/0x114
       [<000000000012395c>] gmap_map_segment+0x268/0x298
       [<000003e00486b014>] kvm_arch_commit_memory_region+0x44/0x6c [kvm]
       [<000003e004866414>] __kvm_set_memory_region+0x3b0/0x4a4 [kvm]
       [<000003e004866554>] kvm_set_memory_region+0x4c/0x6c [kvm]
       [<000003e004867c7a>] kvm_vm_ioctl+0x14a/0x314 [kvm]
       [<0000000000292100>] do_vfs_ioctl+0x94/0x588
       [<0000000000292688>] SyS_ioctl+0x94/0xac
       [<000000000061e124>] sysc_noemu+0x22/0x28
       [<000003fffcd5e7ca>] 0x3fffcd5e7ca
      3 locks held by qemu-system-s39/1108:
       #0:  (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm]
       #1:  (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298
       #2:  (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298
      
      Fix this by freeing the lock on the alloc path. This is ok, since the
      gmap table is never freed until we call gmap_free, so the table we are
      walking cannot go.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      c86cce2a
  11. 30 10月, 2011 5 次提交
  12. 26 9月, 2011 1 次提交
  13. 20 9月, 2011 1 次提交
  14. 03 8月, 2011 1 次提交
  15. 24 7月, 2011 1 次提交
    • M
      [S390] kvm guest address space mapping · e5992f2e
      Martin Schwidefsky 提交于
      Add code that allows KVM to control the virtual memory layout that
      is seen by a guest. The guest address space uses a second page table
      that shares the last level pte-tables with the process page table.
      If a page is unmapped from the process page table it is automatically
      unmapped from the guest page table as well.
      
      The guest address space mapping starts out empty, KVM can map any
      individual 1MB segments from the process virtual memory to any 1MB
      aligned location in the guest virtual memory. If a target segment in
      the process virtual memory does not exist or is unmapped while a
      guest mapping exists the desired target address is stored as an
      invalid segment table entry in the guest page table.
      The population of the guest page table is fault driven.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      e5992f2e
  16. 06 6月, 2011 1 次提交
    • M
      [S390] use generic RCU page-table freeing code · 36409f63
      Martin Schwidefsky 提交于
      Replace the s390 specific rcu page-table freeing code with the
      generic variant. This requires to duplicate the definition for the
      struct mmu_table_batch as s390 does not use the generic tlb flush
      code.
      
      While we are at it remove the restriction that page table fragments
      can not be reused after a single fragment has been freed with rcu
      and split out allocation and freeing of page tables with pgstes.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      36409f63
  17. 29 5月, 2011 1 次提交
    • H
      [S390] mm: fix mmu_gather rework · 3c5cffb6
      Heiko Carstens 提交于
      Quite a few functions that get called from the tlb gather code require that
      preemption must be disabled. So disable preemption inside of the called
      functions instead.
      The only drawback is that rcu_table_freelist_finish() doesn't get necessarily
      called on the cpu(s) that filled the free lists. So we may see a delay, until
      we finally see an rcu callback. However over time this shouldn't matter.
      
      So we get rid of lots of "BUG: using smp_processor_id() in preemptible"
      messages.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      3c5cffb6
  18. 25 5月, 2011 1 次提交
  19. 23 5月, 2011 1 次提交
    • M
      [S390] Remove data execution protection · 043d0708
      Martin Schwidefsky 提交于
      The noexec support on s390 does not rely on a bit in the page table
      entry but utilizes the secondary space mode to distinguish between
      memory accesses for instructions vs. data. The noexec code relies
      on the assumption that the cpu will always use the secondary space
      page table for data accesses while it is running in the secondary
      space mode. Up to the z9-109 class machines this has been the case.
      Unfortunately this is not true anymore with z10 and later machines.
      The load-relative-long instructions lrl, lgrl and lgfrl access the
      memory operand using the same addressing-space mode that has been
      used to fetch the instruction.
      This breaks the noexec mode for all user space binaries compiled
      with march=z10 or later. The only option is to remove the current
      noexec support.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      043d0708
  20. 31 1月, 2011 1 次提交
  21. 25 10月, 2010 2 次提交
  22. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  23. 07 12月, 2009 1 次提交
    • M
      [S390] Improve address space mode selection. · b11b5334
      Martin Schwidefsky 提交于
      Introduce user_mode to replace the two variables switch_amode and
      s390_noexec. There are three valid combinations of the old values:
        1) switch_amode == 0 && s390_noexec == 0
        2) switch_amode == 1 && s390_noexec == 0
        3) switch_amode == 1 && s390_noexec == 1
      They get replaced by
        1) user_mode == HOME_SPACE_MODE
        2) user_mode == PRIMARY_SPACE_MODE
        3) user_mode == SECONDARY_SPACE_MODE
      The new kernel parameter user_mode=[primary,secondary,home] lets
      you choose the address space mode the user space processes should
      use. In addition the CONFIG_S390_SWITCH_AMODE config option
      is removed.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      b11b5334
  24. 06 10月, 2009 1 次提交
  25. 23 9月, 2009 1 次提交
  26. 11 9月, 2009 1 次提交
    • M
      [S390] fix recursive locking on page_table_lock · 50aa98ba
      Martin Schwidefsky 提交于
      Suzuki Poulose reported the following recursive locking bug on s390:
      
      Here is the stack trace : (see Appendix I for more info)
      
        [<0000000000406ed6>] _spin_lock+0x52/0x94
        [<0000000000103bde>] crst_table_free+0x14e/0x1a4
        [<00000000001ba684>] __pmd_alloc+0x114/0x1ec
        [<00000000001be8d0>] handle_mm_fault+0x2cc/0xb80
        [<0000000000407d62>] do_dat_exception+0x2b6/0x3a0
        [<0000000000114f8c>] sysc_return+0x0/0x8
        [<00000200001642b2>] 0x200001642b2
      
      The page_table_lock is already acquired in __pmd_alloc (mm/memory.c) and
      it tries to populate the pud/pgd with a new pmd allocated. If another
      thread populates it before we get a chance, we free the pmd using
      pmd_free().
      
      On s390x, pmd_free(even pud_free ) is #defined to crst_table_free(),
      which acquires the page_table_lock to protect the crst_table index updates.
      
      Hence this ends up in a recursive locking of the page_table_lock.
      
      The solution suggested by Dave Hansen is to use a new spin lock in the mmu
      context to protect the access to the crst_list and the pgtable_list.
      Reported-by: NSuzuki Poulose <suzuki@in.ibm.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      50aa98ba
  27. 16 6月, 2009 1 次提交
  28. 12 6月, 2009 1 次提交
  29. 26 3月, 2009 2 次提交
  30. 18 3月, 2009 1 次提交
    • M
      [S390] make page table walking more robust · f481bfaf
      Martin Schwidefsky 提交于
      Make page table walking on s390 more robust. The current code requires
      that the pgd/pud/pmd/pte loop is only done for address ranges that are
      below the end address of the last vma of the address space. But this
      is not always true, e.g. the generic page table walker does not guarantee
      this. Change TASK_SIZE/TASK_SIZE_OF to reflect the current size of the
      address space. This makes the generic page table walker happy but it
      breaks the upgrade of a 3 level page table to a 4 level page table.
      To make the upgrade work again another fix is required.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      f481bfaf
  31. 29 12月, 2008 1 次提交
    • J
      aio: make the lookup_ioctx() lockless · abf137dd
      Jens Axboe 提交于
      The mm->ioctx_list is currently protected by a reader-writer lock,
      so we always grab that lock on the read side for doing ioctx
      lookups. As the workload is extremely reader biased, turn this into
      an rcu hlist so we can make lookup_ioctx() lockless. Get rid of
      the rwlock and use a spinlock for providing update side exclusion.
      
      There's usually only 1 entry on this list, so it doesn't make sense
      to look into fancier data structures.
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      abf137dd
  32. 28 10月, 2008 1 次提交
    • C
      [S390] pgtables: Fix race in enable_sie vs. page table ops · 250cf776
      Christian Borntraeger 提交于
      The current enable_sie code sets the mm->context.pgstes bit to tell
      dup_mm that the new mm should have extended page tables. This bit is also
      used by the s390 specific page table primitives to decide about the page
      table layout - which means context.pgstes has two meanings. This can cause
      any kind of bugs. For example  - e.g. shrink_zone can call
      ptep_clear_flush_young while enable_sie is running. ptep_clear_flush_young
      will test for context.pgstes. Since enable_sie changed that value of the old
      struct mm without changing the page table layout ptep_clear_flush_young will
      do the wrong thing.
      The solution is to split pgstes into two bits
      - one for the allocation
      - one for the current state
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      250cf776