1. 15 9月, 2005 1 次提交
    • H
      [PATCH] error path in setup_arg_pages() misses vm_unacct_memory() · 2fd4ef85
      Hugh Dickins 提交于
      Pavel Emelianov and Kirill Korotaev observe that fs and arch users of
      security_vm_enough_memory tend to forget to vm_unacct_memory when a
      failure occurs further down (typically in setup_arg_pages variants).
      
      These are all users of insert_vm_struct, and that reservation will only
      be unaccounted on exit if the vma is marked VM_ACCOUNT: which in some
      cases it is (hidden inside VM_STACK_FLAGS) and in some cases it isn't.
      
      So x86_64 32-bit and ppc64 vDSO ELFs have been leaking memory into
      Committed_AS each time they're run.  But don't add VM_ACCOUNT to them,
      it's inappropriate to reserve against the very unlikely case that gdb
      be used to COW a vDSO page - we ought to do something about that in
      do_wp_page, but there are yet other inconsistencies to be resolved.
      
      The safe and economical way to fix this is to let insert_vm_struct do
      the security_vm_enough_memory check when it finds VM_ACCOUNT is set.
      
      And the MIPS irix_brk has been calling security_vm_enough_memory before
      calling do_brk which repeats it, doubly accounting and so also leaking.
      Remove that, and all the fs and arch calls to security_vm_enough_memory:
      give it a less misleading name later on.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-Off-By: NKirill Korotaev <dev@sw.ru>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2fd4ef85
  2. 08 9月, 2005 2 次提交
  3. 05 8月, 2005 1 次提交
    • S
      [PATCH] __vm_enough_memory() signedness fix · 2f60f8d3
      Simon Derr 提交于
      We have found what seems to be a small bug in __vm_enough_memory() when
      sysctl_overcommit_memory is set to OVERCOMMIT_NEVER.
      
      When this bug occurs the systems fails to boot, with /sbin/init whining
      about fork() returning ENOMEM.
      
      We hunted down the problem to this:
      
      The deferred update mecanism used in vm_acct_memory(), on a SMP system,
      allows the vm_committed_space counter to have a negative value.
      
      This should not be a problem since this counter is known to be inaccurate.
      
      But in __vm_enough_memory() this counter is compared to the `allowed'
      variable, which is an unsigned long.  This comparison is broken since it
      will consider the negative values of vm_committed_space to be huge positive
      values, resulting in a memory allocation failure.
      
      Signed-off-by: <Jean-Marc.Saffroy@ext.bull.net>
      Signed-off-by: <Simon.Derr@bull.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2f60f8d3
  4. 22 6月, 2005 2 次提交
    • C
      [PATCH] mmap topdown fix for large stack limit, large allocation · 73219d17
      Chris Wright 提交于
      The topdown changes in 2.6.12-rc1 can cause large allocations with large
      stack limit to fail, despite there being space available.  The
      mmap_base-len is only valid when len >= mmap_base.  However, nothing in
      topdown allocator checks this.  It's only (now) caught at higher level,
      which will cause allocation to simply fail.  The following change restores
      the fallback to bottom-up path, which will allow large allocations with
      large stack limit to potentially still succeed.
      Signed-off-by: NChris Wright <chrisw@osdl.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      73219d17
    • W
      [PATCH] Avoiding mmap fragmentation · 1363c3cd
      Wolfgang Wander 提交于
      Ingo recently introduced a great speedup for allocating new mmaps using the
      free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
      causes huge performance increases in thread creation.
      
      The downside of this patch is that it does lead to fragmentation in the
      mmap-ed areas (visible via /proc/self/maps), such that some applications
      that work fine under 2.4 kernels quickly run out of memory on any 2.6
      kernel.
      
      The problem is twofold:
      
        1) the free_area_cache is used to continue a search for memory where
           the last search ended.  Before the change new areas were always
           searched from the base address on.
      
           So now new small areas are cluttering holes of all sizes
           throughout the whole mmap-able region whereas before small holes
           tended to close holes near the base leaving holes far from the base
           large and available for larger requests.
      
        2) the free_area_cache also is set to the location of the last
           munmap-ed area so in scenarios where we allocate e.g.  five regions of
           1K each, then free regions 4 2 3 in this order the next request for 1K
           will be placed in the position of the old region 3, whereas before we
           appended it to the still active region 1, placing it at the location
           of the old region 2.  Before we had 1 free region of 2K, now we only
           get two free regions of 1K -> fragmentation.
      
      The patch addresses thes issues by introducing yet another cache descriptor
      cached_hole_size that contains the largest known hole size below the
      current free_area_cache.  If a new request comes in the size is compared
      against the cached_hole_size and if the request can be filled with a hole
      below free_area_cache the search is started from the base instead.
      
      The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
      (earlier posted) leakme.c test program terminates after 50000+ iterations
      with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
      (as expected) with thread creation, Ingo's test_str02 with 20000 threads
      requires 0.7s system time.
      
      Taking out Ingo's patch (un-patch available per request) by basically
      deleting all mentions of free_area_cache from the kernel and starting the
      search for new memory always at the respective bases we observe: leakme
      terminates successfully with 11 distinctive hardly fragmented areas in
      /proc/self/maps but thread creating is gringdingly slow: 30+s(!) system
      time for Ingo's test_str02 with 20000 threads.
      
      Now - drumroll ;-) the appended patch works fine with leakme: it ends with
      only 7 distinct areas in /proc/self/maps and also thread creation seems
      sufficiently fast with 0.71s for 20000 threads.
      Signed-off-by: NWolfgang Wander <wwc@rentec.com>
      Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
      Signed-off-by: NKen Chen <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1363c3cd
  5. 20 5月, 2005 1 次提交
    • L
      Fix get_unmapped_area sanity tests · 07ab67c8
      Linus Torvalds 提交于
      As noted by Chris Wright, we need to do the full range of tests regardless
      of whether MAP_FIXED is set or not, so re-organize get_unmapped_area()
      slightly to do the sanity checks unconditionally.
      07ab67c8
  6. 19 5月, 2005 1 次提交
    • L
      [PATCH] prevent NULL mmap in topdown model · 49a43876
      Linus Torvalds 提交于
      Prevent the topdown allocator from allocating mmap areas all the way
      down to address zero.
      
      We still allow a MAP_FIXED mapping of page 0 (needed for various things,
      ranging from Wine and DOSEMU to people who want to allow speculative
      loads off a NULL pointer).
      
      Tested by Chris Wright.
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      49a43876
  7. 01 5月, 2005 2 次提交
  8. 20 4月, 2005 5 次提交
    • H
      [PATCH] freepgt: remove FIRST_USER_ADDRESS hack · 561bbe32
      Hugh Dickins 提交于
      Once all the MMU architectures define FIRST_USER_ADDRESS, remove hack from
      mmap.c which derived it from FIRST_USER_PGD_NR.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      561bbe32
    • H
      [PATCH] freepgt: free_pgtables from FIRST_USER_ADDRESS · e2cdef8c
      Hugh Dickins 提交于
      The patches to free_pgtables by vma left problems on any architectures which
      leave some user address page table entries unencapsulated by vma.  Andi has
      fixed the 32-bit vDSO on x86_64 to use a vma.  Now fix arm (and arm26), whose
      first PAGE_SIZE is reserved (perhaps) for machine vectors.
      
      Our calls to free_pgtables must not touch that area, and exit_mmap's
      BUG_ON(nr_ptes) must allow that arm's get_pgd_slow may (or may not) have
      allocated an extra page table, which its free_pgd_slow would free later.
      
      FIRST_USER_PGD_NR has misled me and others: until all the arches define
      FIRST_USER_ADDRESS instead, a hack in mmap.c to derive one from t'other.  This
      patch fixes the bugs, the remaining patches just clean it up.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e2cdef8c
    • H
      [PATCH] freepgt: mpnt to vma cleanup · 146425a3
      Hugh Dickins 提交于
      While dabbling here in mmap.c, clean up mysterious "mpnt"s to "vma"s.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      146425a3
    • H
      [PATCH] freepgt: remove MM_VM_SIZE(mm) · ee39b37b
      Hugh Dickins 提交于
      There's only one usage of MM_VM_SIZE(mm) left, and it's a troublesome macro
      because mm doesn't contain the (32-bit emulation?) info needed.  But it too is
      only needed because we ignore the end from the vma list.
      
      We could make flush_pgtables return that end, or unmap_vmas.  Choose the
      latter, since it's a natural fit with unmap_mapping_range_vma needing to know
      its restart addr.  This does make more than minimal change, but if unmap_vmas
      had returned the end before, this is how we'd have done it, rather than
      storing the break_addr in zap_details.
      
      unmap_vmas used to return count of vmas scanned, but that's just debug which
      hasn't been useful in a while; and if we want the map_count 0 on exit check
      back, it can easily come from the final remove_vm_struct loop.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ee39b37b
    • H
      [PATCH] freepgt: free_pgtables use vma list · e0da382c
      Hugh Dickins 提交于
      Recent woes with some arches needing their own pgd_addr_end macro; and 4-level
      clear_page_range regression since 2.6.10's clear_page_tables; and its
      long-standing well-known inefficiency in searching throughout the higher-level
      page tables for those few entries to clear and free: all can be blamed on
      ignoring the list of vmas when we free page tables.
      
      Replace exit_mmap's clear_page_range of the total user address space by
      free_pgtables operating on the mm's vma list; unmap_region use it in the same
      way, giving floor and ceiling beyond which it may not free tables.  This
      brings lmbench fork/exec/sh numbers back to 2.6.10 (unless preempt is enabled,
      in which case latency fixes spoil unmap_vmas throughput).
      
      Beware: the do_mmap_pgoff driver failure case must now use unmap_region
      instead of zap_page_range, since a page table might have been allocated, and
      can only be freed while it is touched by some vma.
      
      Move free_pgtables from mmap.c to memory.c, where its lower levels are adapted
      from the clear_page_range levels.  (Most of free_pgtables' old code was
      actually for a non-existent case, prev not properly set up, dating from before
      hch gave us split_vma.) Pass mmu_gather** in the public interfaces, since we
      might want to add latency lockdrops later; but no attempt to do so yet, going
      by vma should itself reduce latency.
      
      But what if is_hugepage_only_range?  Those ia64 and ppc64 cases need careful
      examination: put that off until a later patch of the series.
      
      What of x86_64's 32bit vdso page __map_syscall32 maps outside any vma?
      
      And the range to sparc64's flush_tlb_pgtables?  It's less clear to me now that
      we need to do more than is done here - every PMD_SIZE ever occupied will be
      flushed, do we really have to flush every PGDIR_SIZE ever partially occupied? 
      A shame to complicate it unnecessarily.
      
      Special thanks to David Miller for time spent repairing my ceilings.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e0da382c
  9. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4