1. 14 1月, 2011 1 次提交
    • M
      mlock: do not hold mmap_sem for extended periods of time · 53a7706d
      Michel Lespinasse 提交于
      __get_user_pages gets a new 'nonblocking' parameter to signal that the
      caller is prepared to re-acquire mmap_sem and retry the operation if
      needed.  This is used to split off long operations if they are going to
      block on a disk transfer, or when we detect contention on the mmap_sem.
      
      [akpm@linux-foundation.org: remove ref to rwsem_is_contended()]
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53a7706d
  2. 24 12月, 2010 2 次提交
  3. 25 11月, 2010 1 次提交
  4. 30 10月, 2010 1 次提交
    • A
      audit mmap · 120a795d
      Al Viro 提交于
      Normal syscall audit doesn't catch 5th argument of syscall.  It also
      doesn't catch the contents of userland structures pointed to be
      syscall argument, so for both old and new mmap(2) ABI it doesn't
      record the descriptor we are mapping.  For old one it also misses
      flags.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      120a795d
  5. 27 10月, 2010 1 次提交
  6. 21 8月, 2010 1 次提交
  7. 14 8月, 2010 1 次提交
  8. 26 5月, 2010 1 次提交
  9. 26 3月, 2010 2 次提交
  10. 25 3月, 2010 1 次提交
  11. 13 3月, 2010 1 次提交
  12. 07 3月, 2010 2 次提交
    • S
      nommu: get_user_pages(): pin last page on non-page-aligned start · c08c6e1f
      Steven J. Magnani 提交于
      The noMMU version of get_user_pages() fails to pin the last page when the
      start address isn't page-aligned.  The patch fixes this in a way that
      makes find_extend_vma() congruent to its MMU cousin.
      Signed-off-by: NSteven J. Magnani <steve@digidescorp.com>
      Acked-by: NPaul Mundt <lethal@linux-sh.org>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c08c6e1f
    • R
      mm: change anon_vma linking to fix multi-process server scalability issue · 5beb4930
      Rik van Riel 提交于
      The old anon_vma code can lead to scalability issues with heavily forking
      workloads.  Specifically, each anon_vma will be shared between the parent
      process and all its child processes.
      
      In a workload with 1000 child processes and a VMA with 1000 anonymous
      pages per process that get COWed, this leads to a system with a million
      anonymous pages in the same anon_vma, each of which is mapped in just one
      of the 1000 processes.  However, the current rmap code needs to walk them
      all, leading to O(N) scanning complexity for each page.
      
      This can result in systems where one CPU is walking the page tables of
      1000 processes in page_referenced_one, while all other CPUs are stuck on
      the anon_vma lock.  This leads to catastrophic failure for a benchmark
      like AIM7, where the total number of processes can reach in the tens of
      thousands.  Real workloads are still a factor 10 less process intensive
      than AIM7, but they are catching up.
      
      This patch changes the way anon_vmas and VMAs are linked, which allows us
      to associate multiple anon_vmas with a VMA.  At fork time, each child
      process gets its own anon_vmas, in which its COWed pages will be
      instantiated.  The parents' anon_vma is also linked to the VMA, because
      non-COWed pages could be present in any of the children.
      
      This reduces rmap scanning complexity to O(1) for the pages of the 1000
      child processes, with O(N) complexity for at most 1/N pages in the system.
       This reduces the average scanning cost in heavily forking workloads from
      O(N) to 2.
      
      The only real complexity in this patch stems from the fact that linking a
      VMA to anon_vmas now involves memory allocations.  This means vma_adjust
      can fail, if it needs to attach a VMA to anon_vma structures.  This in
      turn means error handling needs to be added to the calling functions.
      
      A second source of complexity is that, because there can be multiple
      anon_vmas, the anon_vma linking in vma_adjust can no longer be done under
      "the" anon_vma lock.  To prevent the rmap code from walking up an
      incomplete VMA, this patch introduces the VM_LOCK_RMAP VMA flag.  This bit
      flag uses the same slot as the NOMMU VM_MAPPED_COPY, with an ifdef in mm.h
      to make sure it is impossible to compile a kernel that needs both symbolic
      values for the same bitflag.
      
      Some test results:
      
      Without the anon_vma changes, when AIM7 hits around 9.7k users (on a test
      box with 16GB RAM and not quite enough IO), the system ends up running
      >99% in system time, with every CPU on the same anon_vma lock in the
      pageout code.
      
      With these changes, AIM7 hits the cross-over point around 29.7k users.
      This happens with ~99% IO wait time, there never seems to be any spike in
      system time.  The anon_vma lock contention appears to be resolved.
      
      [akpm@linux-foundation.org: cleanups]
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5beb4930
  13. 17 1月, 2010 4 次提交
  14. 07 1月, 2010 2 次提交
    • J
      NOMMU: Use copy_*_user_page() in access_process_vm() · 7959722b
      Jie Zhang 提交于
      The MMU code uses the copy_*_user_page() variants in access_process_vm()
      rather than copy_*_user() as the former includes an icache flush.  This
      is important when doing things like setting software breakpoints with
      gdb.  So switch the NOMMU code over to do the same.
      
      This patch makes the reasonable assumption that copy_from_user_page()
      won't fail - which is probably fine, as we've checked the VMA from which
      we're copying is usable, and the copy is not allowed to cross VMAs.  The
      one case where it might go wrong is if the VMA is a device rather than
      RAM, and that device returns an error which - in which case rubbish will
      be returned rather than EIO.
      Signed-off-by: NJie Zhang <jie.zhang@analog.com>
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NDavid McCullough <david_mccullough@mcafee.com>
      Acked-by: NPaul Mundt <lethal@linux-sh.org>
      Acked-by: NGreg Ungerer <gerg@uclinux.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7959722b
    • M
      NOMMU: Avoiding duplicate icache flushes of shared maps · cfe79c00
      Mike Frysinger 提交于
      When working with FDPIC, there are many shared mappings of read-only
      code regions between applications (the C library, applet packages like
      busybox, etc.), but the current do_mmap_pgoff() function will issue an
      icache flush whenever a VMA is added to an MM instead of only doing it
      when the map is initially created.
      
      The flush can instead be done when a region is first mmapped PROT_EXEC.
      Note that we may not rely on the first mapping of a region being
      executable - it's possible for it to be PROT_READ only, so we have to
      remember whether we've flushed the region or not, and then flush the
      entire region when a bit of it is made executable.
      
      However, this also affects the brk area.  That will no longer be
      executable.  We can mprotect() it to PROT_EXEC on MPU-mode kernels, but
      for NOMMU mode kernels, when it increases the brk allocation, making
      sys_brk() flush the extra from the icache should suffice.  The brk area
      probably isn't used by NOMMU programs since the brk area can only use up
      the leavings from the stack allocation, where the stack allocation is
      larger than requested.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cfe79c00
  15. 31 12月, 2009 1 次提交
  16. 16 12月, 2009 1 次提交
  17. 01 11月, 2009 1 次提交
  18. 28 9月, 2009 1 次提交
  19. 25 9月, 2009 2 次提交
    • D
      NOMMU: Ignore mmap() address param as it is a hint · 06aab5a3
      David Howells 提交于
      Ignore the address parameter given to NOMMU mmap() as it is a hint, rather
      than giving an error if it's non-zero.  MAP_FIXED still gets an error.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      06aab5a3
    • D
      NOMMU: Fix MAP_PRIVATE mmap() of objects where the data can be mapped directly · 645d83c5
      David Howells 提交于
      Fix MAP_PRIVATE mmap() of files and devices where the data in the backing store
      might be mapped directly.  Use the BDI_CAP_MAP_DIRECT capability flag to govern
      whether or not we should be trying to map a file directly.  This can be used to
      determine whether or not a region has been filled in at the point where we call
      do_mmap_shared() or do_mmap_private().
      
      The BDI_CAP_MAP_DIRECT capability flag is cleared by validate_mmap_request() if
      there's any reason we can't use it.  It's also cleared in do_mmap_pgoff() if
      f_op->get_unmapped_area() fails.
      
      Without this fix, attempting to run a program from a RomFS image on a
      non-mappable MTD partition results in a BUG as the kernel attempts XIP, and
      this can be caught in gdb:
      
      Program received signal SIGABRT, Aborted.
      0xc005dce8 in add_nommu_region (region=<value optimized out>) at mm/nommu.c:547
      (gdb) bt
      #0  0xc005dce8 in add_nommu_region (region=<value optimized out>) at mm/nommu.c:547
      #1  0xc005f168 in do_mmap_pgoff (file=0xc31a6620, addr=<value optimized out>, len=3808, prot=3, flags=6146, pgoff=0) at mm/nommu.c:1373
      #2  0xc00a96b8 in elf_fdpic_map_file (params=0xc33fbbec, file=0xc31a6620, mm=0xc31bef60, what=0xc0213144 "executable") at mm.h:1145
      #3  0xc00aa8b4 in load_elf_fdpic_binary (bprm=0xc316cb00, regs=<value optimized out>) at fs/binfmt_elf_fdpic.c:343
      #4  0xc006b588 in search_binary_handler (bprm=0x6, regs=0xc33fbce0) at fs/exec.c:1234
      #5  0xc006c648 in do_execve (filename=<value optimized out>, argv=0xc3ad14cc, envp=0xc3ad1460, regs=0xc33fbce0) at fs/exec.c:1356
      #6  0xc0008cf0 in sys_execve (name=<value optimized out>, argv=0xc3ad14cc, envp=0xc3ad1460) at arch/frv/kernel/process.c:263
      #7  0xc00075dc in __syscall_call () at arch/frv/kernel/entry.S:897
      
      Note that this fix does the following commit differently:
      
      	commit a190887b
      	Author: David Howells <dhowells@redhat.com>
      	Date:   Sat Sep 5 11:17:07 2009 -0700
      	nommu: fix error handling in do_mmap_pgoff()
      Reported-by: NGraff Yang <graff.yang@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      645d83c5
  20. 24 9月, 2009 2 次提交
  21. 22 9月, 2009 4 次提交
  22. 06 9月, 2009 1 次提交
  23. 19 8月, 2009 1 次提交
  24. 17 8月, 2009 1 次提交
    • E
      Security/SELinux: seperate lsm specific mmap_min_addr · 788084ab
      Eric Paris 提交于
      Currently SELinux enforcement of controls on the ability to map low memory
      is determined by the mmap_min_addr tunable.  This patch causes SELinux to
      ignore the tunable and instead use a seperate Kconfig option specific to how
      much space the LSM should protect.
      
      The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
      permissions will always protect the amount of low memory designated by
      CONFIG_LSM_MMAP_MIN_ADDR.
      
      This allows users who need to disable the mmap_min_addr controls (usual reason
      being they run WINE as a non-root user) to do so and still have SELinux
      controls preventing confined domains (like a web server) from being able to
      map some area of low memory.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      788084ab
  25. 06 8月, 2009 1 次提交
    • E
      Security/SELinux: seperate lsm specific mmap_min_addr · a2551df7
      Eric Paris 提交于
      Currently SELinux enforcement of controls on the ability to map low memory
      is determined by the mmap_min_addr tunable.  This patch causes SELinux to
      ignore the tunable and instead use a seperate Kconfig option specific to how
      much space the LSM should protect.
      
      The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
      permissions will always protect the amount of low memory designated by
      CONFIG_LSM_MMAP_MIN_ADDR.
      
      This allows users who need to disable the mmap_min_addr controls (usual reason
      being they run WINE as a non-root user) to do so and still have SELinux
      controls preventing confined domains (like a web server) from being able to
      map some area of low memory.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      a2551df7
  26. 26 6月, 2009 2 次提交
  27. 10 6月, 2009 1 次提交