1. 23 Aug, 2007 1 commit
    • fix NULL pointer dereference in __vm_enough_memory() · 34b4e4aa
      Committed by Alan Cox
      The new exec code inserts an accounted vma into an mm struct which is not
      current->mm.  The existing memory check code has a hard coded assumption
      that this does not happen as does the security code.
      
      As the correct mm is known we pass the mm to the security method and the
      helper function.  A new security test is added for the case where we need
      to pass the mm and the existing one is modified to pass current->mm to
      avoid the need to change large amounts of code.
      
      (Thanks to Tobias for fixing rejects and testing)
      Signed-off-by: Alan Cox <alan@redhat.com>
      Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
      Cc: James Morris <jmorris@redhat.com>
      Cc: Tobias Diedrich <ranma+kernel@tdiedrich.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 22 Jul, 2007 1 commit
  3. 20 Jul, 2007 2 commits
    • mm: fault feedback #1 · d0217ac0
      Committed by Nick Piggin
      Change ->fault prototype.  We now return an int, which contains
      VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
      FAULT_RET_ code tells the VM whether a page was found, whether it has been
      locked, and potentially other things.  This is not quite the way he wanted
      it yet, but that's changed in the next patch (which requires changes to
      arch code).
      
      This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
      that a page is locked which requires filemap_nopage to go away (because we
      can no longer remain backward compatible without that flag), but we were
      going to do that anyway.
      
      struct fault_data is renamed to struct vm_fault as Linus asked. address
      is now a void __user * that we should firmly encourage drivers not to use
      without really good reason.
      
      The page is now returned via a page pointer in the vm_fault struct.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
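The return-value layout described in this commit (a VM_FAULT_xxx code in the low byte, a FAULT_RET_xxx code in the next byte) can be sketched roughly as follows. This is only an illustration of the byte packing: the `DEMO_` constants and helper names are hypothetical and their values are not taken from the kernel.

```c
#include <assert.h>

/* Hypothetical codes for illustration only; not the kernel's values. */
enum { DEMO_VM_FAULT_MINOR = 0x01, DEMO_VM_FAULT_OOM = 0x02 };
enum { DEMO_FAULT_RET_NOPAGE = 0x01, DEMO_FAULT_RET_LOCKED = 0x02 };

/* Pack a fault code into the low byte and the return flags into the next byte. */
static int demo_pack_fault(int vm_fault, int fault_ret)
{
    assert((vm_fault & ~0xff) == 0);
    assert((fault_ret & ~0xff) == 0);
    return vm_fault | (fault_ret << 8);
}

/* Extract the low byte (the VM_FAULT_xxx part). */
static int demo_fault_code(int ret)
{
    return ret & 0xff;
}

/* Extract the second byte (the FAULT_RET_xxx part). */
static int demo_fault_flags(int ret)
{
    return (ret >> 8) & 0xff;
}
```

A caller would test `demo_fault_flags(ret) & DEMO_FAULT_RET_LOCKED` to learn whether the handler left the page locked, and use `demo_fault_code(ret)` for the fault result itself.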
    • mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821
      Committed by Nick Piggin
      Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
      the virtual address -> file offset differently from linear mappings.
      
      ->populate is a layering violation because the filesystem/pagecache code
      should not need to know anything about the virtual memory mapping.  The hitch
      here is that the ->nopage handler didn't pass down enough information (i.e.
      pgoff).  But it is more logical to pass pgoff rather than have the ->nopage
      function calculate it itself anyway (because that's a similar layering
      violation).
      
      Having the populate handler install the pte itself is likewise a nasty thing
      to be doing.
      
      This patch introduces a new fault handler that replaces ->nopage and
      ->populate and (later) ->nopfn.  Most of the old mechanism is still in place,
      so there is a lot of duplication that can be removed, and nice cleanups that
      can be made, if everyone switches over.
      
      The rationale for doing this in the first place is that nonlinear mappings are
      subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
      to duplicate the synchronisation logic rather than just consolidate the two.
      
      After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
      pagecache.  Seems like a fringe functionality anyway.
      
      NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
      users have hit mainline yet.
      
      [akpm@linux-foundation.org: cleanup]
      [randy.dunlap@oracle.com: doc. fixes for readahead]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 17 Jul, 2007 1 commit
  5. 12 Jul, 2007 1 commit
    • security: Protection for exploiting null dereference using mmap · ed032189
      Committed by Eric Paris
      Add a new security check on mmap operations to see if the user is attempting
      to mmap to a low area of the address space.  The amount of space protected is
      indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults to
      0, preserving existing behavior.
      
      This patch uses a new SELinux security class "memprotect."  Policy already
      contains a number of allow rules like a_t self:process * (unconfined_t being
      one of them) which mean that putting this check in the process class (its
      best current fit) would make it useless as all user processes, which we also
      want to protect against, would be allowed.  By taking the memprotect name of
      the new class it will also make it possible for us to move some of the other
      memory protect permissions out of 'process' and into the new class next time
      we bump the policy version number (which I also think is a good future idea).
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
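The low-address check described in this commit can be sketched as below. This is a simplified model, not the kernel's code: `demo_check_mmap_addr`, the `may_map_low` flag (standing in for whatever the security policy decides) and the choice of `-EACCES` as the error are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Reject a mapping that starts below min_addr unless the (hypothetical)
 * policy decision may_map_low permits it.  With min_addr == 0, the default
 * named in the commit message, no address is below the limit, so nothing
 * is rejected and existing behavior is preserved.
 */
static int demo_check_mmap_addr(unsigned long addr, unsigned long min_addr,
                                bool may_map_low)
{
    if (addr < min_addr && !may_map_low)
        return -EACCES;
    return 0;
}
```

For instance, with `min_addr` set to 65536, a request to map at address 4096 is refused for an unprivileged caller but allowed when `may_map_low` is true.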
  6. 09 May, 2007 1 commit
    • move die notifier handling to common code · 1eeb66a1
      Committed by Christoph Hellwig
      This patch moves the die notifier handling to common code.  Previously,
      various architectures had exactly the same code for it.  Note that the new
      code is compiled unconditionally; this should be understood as an appeal to
      the other architecture maintainers to implement support for it as well (aka
      sprinkling a notify_die or two in the proper place).
      
      arm had a notify_die that did something totally different; I renamed it to
      arm_notify_die as part of the patch and made it static to the file it's
      declared and used in.  avr32 used to pass slightly less information through
      this interface and I brought it into line with the other architectures.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
      [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Bryan Wu <bryan.wu@analog.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 13 Apr, 2007 1 commit
  8. 23 Mar, 2007 2 commits
    • [PATCH] NOMMU: make SYSV SHM nattch work correctly · 165b2392
      Committed by David Howells
      Make the SYSV SHM nattch counter work correctly by forcing multiple VMAs to
      be produced to represent MAP_SHARED segments, even if they overlap exactly.
      
      Using this test program:
      
      	http://people.redhat.com/~dhowells/doshm.c
      
      Run as:
      
      	doshm sysv
      
      I can see nattch going from one before the patch:
      
      	# /doshm sysv
      	Command: sysv
      	shmid: 65536
      	memory: 0xc3700000
      	c0b00000-c0b04000 rw-p 00000000 00:00 0
      	c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
      	c3180000-c31dede4 r-xs 00000000 00:0b 14582179  /lib/libuClibc-0.9.28.so
      	c3520000-c352278c rw-p 00000000 00:0b 13763417  /doshm
      	c3584000-c35865e8 r-xs 00000000 00:0b 13763417  /doshm
      	c3588000-c358aa00 rw-p 00008000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
      	c3590000-c359b6c0 rw-p 00000000 00:00 0
      	c3620000-c3640000 rwxp 00000000 00:00 0
      	c3700000-c37fa000 rw-S 00000000 00:06 1411      /SYSV00000000 (deleted)
      	c3700000-c37fa000 rw-S 00000000 00:06 1411      /SYSV00000000 (deleted)
      	nattch 1
      
      To two after the patch:
      
      	# /doshm sysv
      	Command: sysv
      	shmid: 0
      	memory: 0xc3700000
      	c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
      	c3180000-c31dede4 r-xs 00000000 00:0b 14582179  /lib/libuClibc-0.9.28.so
      	c3320000-c3340000 rwxp 00000000 00:00 0
      	c3530000-c35325e8 r-xs 00000000 00:0b 13763417  /doshm
      	c3534000-c353678c rw-p 00000000 00:0b 13763417  /doshm
      	c3538000-c353aa00 rw-p 00008000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
      	c3590000-c359b6c0 rw-p 00000000 00:00 0
      	c35a4000-c35a8000 rw-p 00000000 00:00 0
      	c3700000-c37fa000 rw-S 00000000 00:06 1369      /SYSV00000000 (deleted)
      	c3700000-c37fa000 rw-S 00000000 00:06 1369      /SYSV00000000 (deleted)
      	nattch 2
      
      That's +1 to nattch for each shmat() made.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] NOMMU: supply get_unmapped_area() to fix NOMMU SYSV SHM · d56e03cd
      Committed by David Howells
      Supply a get_unmapped_area() to fix NOMMU SYSV SHM support.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 09 Dec, 2006 1 commit
  10. 08 Dec, 2006 1 commit
  11. 06 Dec, 2006 1 commit
    • [PATCH] uclinux: fix mmap() of directory for nommu case · f81cff0d
      Committed by Mike Frysinger
      I was playing with blackfin when I hit a neat bug: doing an open() on a
      directory and then passing that fd to mmap() would cause the kernel to hang.
      
      After poking into the code a bit more, I found that
      mm/nommu.c:validate_mmap_request() checks the length and, if it is 0, just
      returns the address.  This is in stark contrast to the MMU case's
      mm/mmap.c:do_mmap_pgoff(), which returns -EINVAL for 0-length requests.
      I then noticed that some other parts of the logic are out of date between the
      two functions, so perhaps that's the easy fix?
      Signed-off-by: Greg Ungerer <gerg@uclinux.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  12. 04 Oct, 2006 1 commit
  13. 01 Oct, 2006 1 commit
  14. 27 Sep, 2006 8 commits
  15. 26 Sep, 2006 1 commit
  16. 15 Jul, 2006 1 commit
  17. 01 Jul, 2006 1 commit
  18. 11 Apr, 2006 1 commit
  19. 22 Mar, 2006 1 commit
  20. 01 Mar, 2006 1 commit
  21. 21 Feb, 2006 1 commit
  22. 07 Jan, 2006 1 commit
    • [PATCH] NOMMU: Make SYSV IPC SHM use ramfs facilities on NOMMU · b0e15190
      Committed by David Howells
      The attached patch makes the SYSV IPC shared memory facilities use the new
      ramfs facilities on a no-MMU kernel.
      
      The following changes are made:
      
       (1) There are now shmem_mmap() and shmem_get_unmapped_area() functions to
           allow the IPC SHM facilities to commune with the tiny-shmem and shmem
           code.
      
       (2) ramfs files now need resizing using do_truncate() rather than by modifying
           the inode size directly (see shmem_file_setup()). This causes ramfs to
           attempt to bind a block of pages of sufficient size to the inode.
      
       (3) CONFIG_SYSVIPC is no longer contingent on CONFIG_MMU.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  23. 29 Nov, 2005 1 commit
    • mm: re-architect the VM_UNPAGED logic · 6aab341e
      Committed by Linus Torvalds
      This replaces the (in my opinion horrible) VM_UNPAGED logic with very
      explicit support for a "remapped page range" aka VM_PFNMAP.  It allows a
      VM area to contain an arbitrary range of page table entries that the VM
      never touches, and never considers to be normal pages.
      
      Any user of "remap_pfn_range()" automatically gets this new
      functionality, and doesn't even have to mark the pages reserved or
      indeed mark them any other way.  It just works.  As a side effect, doing
      mmap() on /dev/mem works for arbitrary ranges.
      
      Sparc update from David in the next commit.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  24. 07 Nov, 2005 1 commit
  25. 30 Oct, 2005 3 commits
    • [PATCH] mm: follow_page with inner ptlock · deceb6cd
      Committed by Hugh Dickins
      Final step in pushing down common core's page_table_lock.  follow_page no
      longer wants caller to hold page_table_lock, uses pte_offset_map_lock itself;
      and so no page_table_lock is taken in get_user_pages itself.
      
      But get_user_pages (and get_futex_key) do then need follow_page to pin the
      page for them: take Daniel's suggestion of bitflags to follow_page.
      
      Need one for WRITE, another for TOUCH (it was the accessed flag before:
      vanished along with check_user_page_readable, but surely get_numa_maps is
      wrong to mark every page it finds as accessed), another for GET.
      
      And another, ANON to dispose of untouched_anonymous_page: it seems silly for
      that to descend a second time, let follow_page observe if there was no page
      table and return ZERO_PAGE if so.  Fix minor bug in that: check VM_LOCKED -
      make_pages_present ought to make readonly anonymous present.
      
      Give get_numa_maps a cond_resched while we're there.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
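The bitflag interface this commit describes for follow_page (WRITE, TOUCH, GET and ANON) might be modelled along these lines. This is a toy user-space sketch under stated assumptions: the `DEMO_FOLL_` values, `struct demo_page`, and `demo_zero_page` are all hypothetical, and the page-table walk and locking are omitted entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; only the idea of combinable bitflags matters. */
enum {
    DEMO_FOLL_WRITE = 0x01, /* caller needs the page to be writable */
    DEMO_FOLL_TOUCH = 0x02, /* mark the page as accessed */
    DEMO_FOLL_GET   = 0x04, /* take a reference (pin) on the page */
    DEMO_FOLL_ANON  = 0x08, /* hand back the zero page if nothing is mapped */
};

struct demo_page {
    int refcount;
    bool accessed;
};

static struct demo_page demo_zero_page;

/*
 * page is the result of an (omitted) lookup: NULL means nothing was mapped.
 * writable says whether the mapping permits writes.
 */
static struct demo_page *demo_follow_page(struct demo_page *page, bool writable,
                                          unsigned int flags)
{
    if (page == NULL)
        return (flags & DEMO_FOLL_ANON) ? &demo_zero_page : NULL;
    if ((flags & DEMO_FOLL_WRITE) && !writable)
        return NULL;
    if (flags & DEMO_FOLL_GET)
        page->refcount++;
    if (flags & DEMO_FOLL_TOUCH)
        page->accessed = true;
    return page;
}
```

Because the pin is taken inside the lookup, a caller such as get_user_pages no longer has to hold a lock across the call just to keep the page from going away.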
    • [PATCH] mm: update_hiwaters just in time · 365e9c87
      Committed by Hugh Dickins
      update_mem_hiwater has attracted various criticisms, in particular from those
      concerned with mm scalability.  Originally it was called whenever rss or
      total_vm got raised.  Then many of those callsites were replaced by a timer
      tick call from account_system_time.  Now Frank van Maarseveen reports that to
      be found inadequate.  How about this?  Works for Frank.
      
      Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
      update_hiwater_rss and update_hiwater_vm.  Don't attempt to keep
      mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
      by 1): those are hot paths.  Do the opposite, update only when about to lower
      rss (usually by many), or just before final accounting in do_exit.  Handle
      mm->hiwater_vm in the same way, though it's much less of an issue.  Demand
      that whoever collects these hiwater statistics do the work of taking the
      maximum with rss or total_vm.
      
      And there has been no collector of these hiwater statistics in the tree.  The
      new convention needs an example, so match Frank's usage by adding a VmPeak
      line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
      (High-Water-Mark or High-Water-Memory).
      
      There was a particular anomaly during mremap move, that hiwater_vm might be
      captured too high.  A fleeting such anomaly remains, but it's quickly
      corrected now, whereas before it would stick.
      
      What locking?  None: if the app is racy then these statistics will be racy,
      it's not worth any overhead to make them exact.  But whenever it suits,
      hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
      page_table_lock (for now) or with preemption disabled (later on): without
      going to any trouble, minimize the time between reading current values and
      updating, to minimize those occasions when a racing thread bumps a count up
      and back down in between.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
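The "update just in time" convention from this commit can be sketched as follows: the high-water mark is refreshed only when rss is about to drop, and whoever reads the statistic takes the maximum with the current value. The struct and function names here are hypothetical stand-ins, not the kernel's macros, and locking is left out.

```c
/* Toy stand-in for the two mm fields involved. */
struct demo_mm_hw {
    unsigned long rss;
    unsigned long hiwater_rss;
};

/* Record the current rss as the high-water mark if it is a new peak. */
static void demo_update_hiwater_rss(struct demo_mm_hw *mm)
{
    if (mm->hiwater_rss < mm->rss)
        mm->hiwater_rss = mm->rss;
}

/* Lowering rss is the cold path, so the peak is captured here. */
static void demo_lower_rss(struct demo_mm_hw *mm, unsigned long pages)
{
    demo_update_hiwater_rss(mm);
    mm->rss -= pages;
}

/* Raising rss is the hot path: no high-water bookkeeping at all. */
static void demo_raise_rss(struct demo_mm_hw *mm, unsigned long pages)
{
    mm->rss += pages;
}

/* The collector does the work of taking the maximum with the live value. */
static unsigned long demo_peak_rss(const struct demo_mm_hw *mm)
{
    return mm->hiwater_rss > mm->rss ? mm->hiwater_rss : mm->rss;
}
```

The design choice is that `hiwater_rss` may lag behind the true peak between updates, which is harmless as long as every reader goes through something like `demo_peak_rss()`.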
    • [PATCH] mm: rss = file_rss + anon_rss · 4294621f
      Committed by Hugh Dickins
      I was lazy when we added anon_rss, and chose to change as few places as
      possible.  So currently each anonymous page has to be counted twice, in rss
      and in anon_rss.  Which won't be so good if those are atomic counts in some
      configurations.
      
      Change that around: keep file_rss and anon_rss separately, and add them
      together (with get_mm_rss macro) when the total is needed - reading two
      atomics is much cheaper than updating two atomics.  And update anon_rss
      upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
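The trade-off in this commit (update one counter per page, add two counters on read) can be illustrated with C11 atomics. This is a user-space sketch only: `struct demo_mm_rss` and the `demo_` helpers are hypothetical, and the kernel's own counter types and get_mm_rss macro are not reproduced here.

```c
#include <stdatomic.h>

/* Two separate counters; the total is derived rather than stored. */
struct demo_mm_rss {
    atomic_long file_rss;
    atomic_long anon_rss;
};

/* Mapping an anonymous page touches exactly one atomic counter. */
static void demo_add_anon_page(struct demo_mm_rss *mm)
{
    atomic_fetch_add(&mm->anon_rss, 1);
}

/* Mapping a file-backed page likewise touches exactly one counter. */
static void demo_add_file_page(struct demo_mm_rss *mm)
{
    atomic_fetch_add(&mm->file_rss, 1);
}

/* The total costs two atomic reads, paid only when someone asks for it. */
static long demo_get_mm_rss(struct demo_mm_rss *mm)
{
    return atomic_load(&mm->file_rss) + atomic_load(&mm->anon_rss);
}
```

Before the change each anonymous page had to be counted in both rss and anon_rss; in this layout the write path does a single atomic update and the less frequent read path does the addition.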
  26. 09 Oct, 2005 1 commit
  27. 12 Sep, 2005 1 commit
    • [PATCH] uclinux: add NULL check, 0 end valid check and some more exports to nommu.c · 66aa2b4b
      Committed by Greg Ungerer
      Move call to get_mm_counter() in update_mem_hiwater() to be
      inside the check for tsk->mm being null. Otherwise you can be
      following a null pointer here. This patch submitted by
      Javier Herrero <jherrero@hvsistemas.es>.
      
      Modify the end check for munmap regions to allow for the
      legacy behavior of 0 being valid. Pretty much all current
      uClinux system libc mallocs pass in 0 as the end point.
      A hard check will fail on these, so change the check so
      that if it is non-zero it must be valid, otherwise it fails.
      A passed-in value of 0 will always succeed (as it used to).
      
      Also export a few more mm system functions - to be consistent
      with the VM code exports.
      Signed-off-by: Greg Ungerer <gerg@uclinux.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
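The relaxed munmap end check described in this commit can be sketched as below. The function name, its parameters and the `-EINVAL` result are assumptions made for illustration; the sketch only models the rule "a length of 0 is accepted, any other length must end exactly at the end of the region".

```c
#include <errno.h>

/*
 * vma_start/vma_end describe an existing mapping; addr/len are what the
 * caller passed to munmap().  A len of 0 is accepted for legacy callers;
 * a non-zero len must reach exactly the end of the mapping.
 */
static int demo_check_munmap(unsigned long vma_start, unsigned long vma_end,
                             unsigned long addr, unsigned long len)
{
    if (addr != vma_start)
        return -EINVAL;
    if (len != 0 && addr + len != vma_end)
        return -EINVAL;
    return 0;
}
```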
  28. 05 Aug, 2005 1 commit
    • [PATCH] __vm_enough_memory() signedness fix · 2f60f8d3
      Committed by Simon Derr
      We have found what seems to be a small bug in __vm_enough_memory() when
      sysctl_overcommit_memory is set to OVERCOMMIT_NEVER.
      
      When this bug occurs the system fails to boot, with /sbin/init whining
      about fork() returning ENOMEM.
      
      We hunted down the problem to this:
      
      The deferred update mechanism used in vm_acct_memory(), on an SMP system,
      allows the vm_committed_space counter to have a negative value.
      
      This should not be a problem since this counter is known to be inaccurate.
      
      But in __vm_enough_memory() this counter is compared to the `allowed'
      variable, which is an unsigned long.  This comparison is broken since it
      will consider the negative values of vm_committed_space to be huge positive
      values, resulting in a memory allocation failure.
      
      Signed-off-by: <Jean-Marc.Saffroy@ext.bull.net>
      Signed-off-by: <Simon.Derr@bull.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
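The signedness problem described in this commit is a general C pitfall: when a signed long is compared with an unsigned long, the signed operand is converted to unsigned, so a small negative value turns into a huge positive one. The sketch below reproduces that effect with hypothetical helper names; the "fixed" variant casts the limit to a signed type, which assumes the limit fits in a long, and is not claimed to be the exact change the patch made.

```c
#include <stdbool.h>

/*
 * Models the broken comparison.  The explicit cast spells out the
 * conversion that C applies implicitly when comparing long with
 * unsigned long: -1 becomes the largest unsigned long value.
 */
static bool demo_exceeds_buggy(long committed, unsigned long allowed)
{
    return (unsigned long)committed > allowed;
}

/*
 * Compare in the signed domain instead, so a slightly negative counter
 * is simply "below the limit".  Assumes allowed <= LONG_MAX.
 */
static bool demo_exceeds_fixed(long committed, unsigned long allowed)
{
    return committed > (long)allowed;
}
```

With a counter of -1 and a limit of 1000, the buggy comparison reports that the limit is exceeded (which is how fork() ended up returning ENOMEM), while the signed comparison does not.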
  29. 22 Jun, 2005 1 commit
    • [PATCH] Avoiding mmap fragmentation · 1363c3cd
      Committed by Wolfgang Wander
      Ingo recently introduced a great speedup for allocating new mmaps using the
      free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
      causes huge performance increases in thread creation.
      
      The downside of this patch is that it does lead to fragmentation in the
      mmap-ed areas (visible via /proc/self/maps), such that some applications
      that work fine under 2.4 kernels quickly run out of memory on any 2.6
      kernel.
      
      The problem is twofold:
      
        1) the free_area_cache is used to continue a search for memory where
           the last search ended.  Before the change new areas were always
           searched from the base address on.
      
           So now new small areas are cluttering holes of all sizes
           throughout the whole mmap-able region whereas before small areas
           tended to close holes near the base leaving holes far from the base
           large and available for larger requests.
      
        2) the free_area_cache also is set to the location of the last
           munmap-ed area so in scenarios where we allocate e.g.  five regions of
           1K each, then free regions 4 2 3 in this order the next request for 1K
           will be placed in the position of the old region 3, whereas before we
           appended it to the still active region 1, placing it at the location
           of the old region 2.  Before we had 1 free region of 2K, now we only
           get two free regions of 1K -> fragmentation.
      
      The patch addresses these issues by introducing yet another cache descriptor,
      cached_hole_size, that contains the largest known hole size below the
      current free_area_cache.  If a new request comes in, the size is compared
      against the cached_hole_size, and if the request can be filled with a hole
      below free_area_cache the search is started from the base instead.
      
      The results look promising: 2.6.12-rc4 fragments quickly, and my
      (earlier posted) leakme.c test program terminates after 50000+ iterations
      with 96 distinct and fragmented maps in /proc/self/maps, whereas it performs
      nicely (as expected) with thread creation: Ingo's test_str02 with 20000
      threads requires 0.7s system time.
      
      Taking out Ingo's patch (un-patch available on request) by basically
      deleting all mentions of free_area_cache from the kernel and starting the
      search for new memory always at the respective bases, we observe: leakme
      terminates successfully with 11 distinct, hardly fragmented areas in
      /proc/self/maps, but thread creation is grindingly slow: 30+s(!) system
      time for Ingo's test_str02 with 20000 threads.
      
      Now - drumroll ;-) the appended patch works fine with leakme: it ends with
      only 7 distinct areas in /proc/self/maps and also thread creation seems
      sufficiently fast with 0.71s for 20000 threads.
      Signed-off-by: Wolfgang Wander <wwc@rentec.com>
      Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
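The decision this commit adds (start from the cached position unless a known hole below it is big enough) can be sketched as follows. This models only the choice of starting address: `struct demo_mm_cache`, its `base` field and `demo_search_start` are hypothetical, and the actual walk over mappings that recomputes the largest hole is omitted.

```c
/* Toy stand-in for the cache fields discussed in the commit message. */
struct demo_mm_cache {
    unsigned long base;             /* where a full search would begin */
    unsigned long free_area_cache;  /* where the previous search ended */
    unsigned long cached_hole_size; /* largest known hole below free_area_cache */
};

/*
 * Pick the address at which to start searching for a free area of len bytes.
 * If a hole below free_area_cache is known to be large enough, restart from
 * the base and reset the hole size so the new search can recompute it;
 * otherwise continue from the cached position.
 */
static unsigned long demo_search_start(struct demo_mm_cache *mm, unsigned long len)
{
    if (len > mm->cached_hole_size)
        return mm->free_area_cache;

    mm->cached_hole_size = 0;
    return mm->base;
}
```

Small requests therefore tend to fill holes near the base again, which is the pre-cache behavior the message says kept fragmentation low, while large requests still benefit from skipping the already-searched region.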