1. 30 Oct, 2005 5 commits
    • [PATCH] Convert mempolicies to nodemask_t · dfcd3c0d
      Andi Kleen committed
      The NUMA policy code predated nodemask_t so it used open coded bitmaps.
      Convert everything to nodemask_t.  Big patch, but shouldn't have any actual
      behaviour changes (except I removed one unnecessary check against
      node_online_map and one unnecessary BUG_ON)
      Signed-off-by: "Andi Kleen" <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
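      The difference between an open-coded bitmap and a dedicated mask type can be sketched in user space as follows. This is only an illustration: `fake_nodemask_t`, `FAKE_MAX_NODES`, and the `fake_*` helpers are hypothetical names invented here, not the kernel's `nodemask_t` API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical limit for this sketch; not the kernel's MAX_NUMNODES. */
#define FAKE_MAX_NODES 64
#define FAKE_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define FAKE_NODE_LONGS \
    ((FAKE_MAX_NODES + FAKE_BITS_PER_LONG - 1) / FAKE_BITS_PER_LONG)

/* Wrapping the bitmap in a struct gives it a distinct type, so a mask
 * cannot be confused with an arbitrary unsigned long array. */
typedef struct {
    unsigned long bits[FAKE_NODE_LONGS];
} fake_nodemask_t;

/* Clear every node in the mask. */
static void fake_nodes_clear(fake_nodemask_t *m)
{
    memset(m->bits, 0, sizeof(m->bits));
}

/* Mark one node as present in the mask. */
static void fake_node_set(int node, fake_nodemask_t *m)
{
    assert(node >= 0 && node < FAKE_MAX_NODES);
    m->bits[node / FAKE_BITS_PER_LONG] |= 1UL << (node % FAKE_BITS_PER_LONG);
}

/* Test whether one node is present in the mask. */
static bool fake_node_isset(int node, const fake_nodemask_t *m)
{
    assert(node >= 0 && node < FAKE_MAX_NODES);
    return (m->bits[node / FAKE_BITS_PER_LONG] >>
            (node % FAKE_BITS_PER_LONG)) & 1UL;
}

/* Count how many nodes are present in the mask. */
static int fake_nodes_weight(const fake_nodemask_t *m)
{
    int weight = 0;
    for (int node = 0; node < FAKE_MAX_NODES; node++)
        weight += fake_node_isset(node, m) ? 1 : 0;
    return weight;
}
```

      Callers then work only through the helpers, which is roughly what replacing open-coded bitmap arithmetic with a mask type buys.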
    • [PATCH] mm: set per-cpu-pages lower threshold to zero · e46a5e28
      Seth, Rohit committed
      Set the low water mark for hot pages in pcp to zero.
      
      (akpm: for the life of me I cannot remember why we created pcp->low.  Neither
      can Martin and the changelog is silent.  Maybe it was just a brainfart, but I
      have this feeling that there was a reason.  If not, we should remove the
      fields completely.  We'll see.)
      Signed-off-by: Rohit Seth <rohit.seth@intel.com>
      Cc: <linux-mm@kvack.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: page_alloc: increase size of per-cpu-pages · ba56e91c
      Seth, Rohit committed
      Increase the page allocator's per-cpu magazines from 1/4MB to 1/2MB.
      
      Over 100+ runs for a workload, the difference in mean is about 2%.  The best
      results for both are almost same.  Though the max variation in results with
      1/2MB is only 2.2%, whereas with 1/4MB it is 12%.
      Signed-off-by: Rohit Seth <rohit.seth@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swaptoken tuning · fcdae29a
      Rik Van Riel committed
      It turns out that the original swap token implementation, by Song Jiang, only
      enforced the swap token while the task holding the token is handling a page
      fault.  This patch approximates that, without adding an additional flag to the
      mm_struct, by checking whether the mm->mmap_sem is held for reading, like the
      page fault code does.
      
      This patch has the effect of automatically, and gradually, disabling the
      enforcement of the swap token when there is little or no paging going on, and
      "turning up" the intensity of the swap token code the more the task holding
      the token is thrashing.
      
      Thanks to Song Jiang for pointing out this aspect of the token based thrashing
      control concept.
      
      The new code shows a slight degradation over the old swap token code, but
      still a big win over running without the swap token.
      
      2.6.12+ swap token disabled
      
      $ for i in `seq 10` ; do /usr/bin/time ./qsbench -n 30000000 -p 3 ; done
      101.74user 23.13system 8:26.91elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (38597major+430315minor)pagefaults 0swaps
      101.98user 24.91system 8:03.06elapsed 26%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (33939major+430457minor)pagefaults 0swaps
      101.93user 22.12system 7:34.90elapsed 27%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (33166major+421267minor)pagefaults 0swaps
      101.82user 22.38system 8:31.40elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (39338major+433262minor)pagefaults 0swaps
      
      2.6.12+ swap token enabled, timeout 300 seconds
      
      $ for i in `seq 4` ; do /usr/bin/time ./qsbench -n 30000000 -p 3 ; done
      102.58user 16.08system 3:41.44elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (19707major+285786minor)pagefaults 0swaps
      102.07user 19.56system 4:00.64elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (19012major+299259minor)pagefaults 0swaps
      102.64user 18.25system 4:07.31elapsed 48%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (21990major+304831minor)pagefaults 0swaps
      101.39user 19.41system 5:15.81elapsed 38%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (24850major+323321minor)pagefaults 0swaps
      
      2.6.12+ with new swap token code, timeout 300 seconds
      
      $ for i in `seq 4` ; do /usr/bin/time ./qsbench -n 30000000 -p 3 ; done
      101.87user 24.66system 5:53.20elapsed 35%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (26848major+363497minor)pagefaults 0swaps
      102.83user 19.95system 4:17.25elapsed 47%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (19946major+305722minor)pagefaults 0swaps
      102.09user 19.46system 5:12.57elapsed 38%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (25461major+334994minor)pagefaults 0swaps
      101.67user 20.61system 4:52.97elapsed 41%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (22190major+329508minor)pagefaults 0swaps
      Signed-off-by: Rik Van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vmalloc_node · 930fc45a
      Christoph Lameter committed
      This patch adds
      
      vmalloc_node(size, node)	-> Allocate necessary memory on the specified node
      
      and
      
      get_vm_area_node(size, flags, node)
      
      and the other functions that it depends on.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 28 Oct, 2005 3 commits
  3. 27 Oct, 2005 1 commit
  4. 21 Oct, 2005 1 commit
  5. 20 Oct, 2005 3 commits
    • [PATCH] swiotlb: make sure initial DMA allocations really are in DMA memory · 281dd25c
      Yasunori Goto committed
      This introduces a limit parameter to the core bootmem allocator.  The new
      parameter indicates that physical memory allocated by the bootmem
      allocator should be within the requested limit.
      
      We also introduce alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit,
      alloc_bootmem_low_pages_node_limit apis, but alloc_bootmem_low_pages_limit
      is the only api used for swiotlb.
      
      The existing alloc_bootmem_low_pages() api could instead have been
      changed and made to pass right limit to the core allocator.  But that
      would make the patch more intrusive for 2.6.14, as other arches use
      alloc_bootmem_low_pages().  That may be done post 2.6.14 as a
      cleanup.
      
      With this, swiotlb gets memory within 4G for both x86_64 and ia64
      arches.
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: hugetlb truncation fixes · 1c59827d
      Hugh Dickins committed
      hugetlbfs allows truncation of its files (should it?), but hugetlb.c often
      forgets that: crashes and misaccounting ensue.
      
      copy_hugetlb_page_range better grab the src page_table_lock since we don't
      want to guess what happens if concurrently truncated.  unmap_hugepage_range
      rss accounting must not assume the full range was mapped.  follow_hugetlb_page
      must guard with page_table_lock and be prepared to exit early.
      
      Restyle copy_hugetlb_page_range with a for loop like the others there.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Handle spurious page fault for hugetlb region · 3359b54c
      Seth, Rohit committed
      The hugetlb pages are currently pre-faulted.  At the time of mmap of
      hugepages, we populate the new PTEs.  It is possible that HW has already
      cached some of the unused PTEs internally.  These stale entries never
      get a chance to be purged in existing control flow.
      
      This patch extends the check in page fault code for hugepages.  Check if
      a faulted address falls within the size of the hugetlb file backing it.
      We return VM_FAULT_MINOR for these cases (assuming that the arch
      specific page-faulting code purges the stale entry for the archs that
      need it).
      Signed-off-by: Rohit Seth <rohit.seth@intel.com>
      
      [ This is apparently arguably an ia64 port bug. But the code won't
        hurt, and for now it fixes a real problem on some ia64 machines ]
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
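      The range check described in this commit can be sketched in user space roughly as follows. Everything here is an assumption made for illustration: the 2 MiB huge page size, the `fake_*` names, and in particular the `FAKE_FAULT_SIGBUS` result for addresses beyond the file, which the commit message does not specify.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed huge page size of 2 MiB; real sizes are architecture specific. */
#define FAKE_HPAGE_SHIFT 21

/* Hypothetical result codes standing in for the kernel's fault codes. */
enum fake_fault { FAKE_FAULT_MINOR, FAKE_FAULT_SIGBUS };

/* Hypothetical, simplified view of a hugetlb mapping. */
struct fake_hugetlb_vma {
    uint64_t vm_start;    /* first mapped virtual address */
    uint64_t file_offset; /* byte offset of the mapping within the file */
    uint64_t file_size;   /* current size of the backing file in bytes */
};

/* A fault inside the backing file is treated as spurious (stale entry)
 * and reported as minor; anything past the end of the file is an error. */
static enum fake_fault fake_hugetlb_fault(const struct fake_hugetlb_vma *vma,
                                          uint64_t address)
{
    assert(address >= vma->vm_start);

    /* Index of the faulting huge page within the file. */
    uint64_t idx = ((address - vma->vm_start) + vma->file_offset)
                   >> FAKE_HPAGE_SHIFT;
    /* Number of whole huge pages the file currently holds. */
    uint64_t pages = vma->file_size >> FAKE_HPAGE_SHIFT;

    return idx < pages ? FAKE_FAULT_MINOR : FAKE_FAULT_SIGBUS;
}
```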
  6. 17 Oct, 2005 1 commit
    • Fix memory ordering bug in page reclaim · 3d80636a
      Linus Torvalds committed
      As noticed by Nick Piggin, we need to make sure that we check the page
      count before we check for PageDirty, since the dirty check is only valid
      if the count implies that we're the only possible ones holding the page.
      
      We always did do this, but the code needs a read-memory-barrier to make
      sure that the ordering is also honored by the CPU.
      
      (The writer side is ordered due to the atomic decrement and test on the
      page count, see the discussion on linux-kernel)
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
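      The ordering requirement in this commit (read the page count first, then the dirty state, with a read barrier in between) can be sketched with C11 atomics. This is a user-space approximation under stated assumptions: `struct fake_page`, `fake_can_free_clean`, and the `expected` parameter are hypothetical, and an acquire fence is used here as a stand-in for the kernel's read memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor; not the kernel's struct page. */
struct fake_page {
    atomic_int  count; /* number of references to the page */
    atomic_bool dirty; /* set when the page has unwritten changes */
};

/* Returns true only if the caller holds the sole expected references
 * AND the page is clean. The dirty check is meaningful only after the
 * count check, so the two loads must not be reordered. */
static bool fake_can_free_clean(struct fake_page *page, int expected)
{
    assert(page != NULL);

    if (atomic_load_explicit(&page->count, memory_order_relaxed) != expected)
        return false;

    /* Stand-in for a read barrier: the count load above is ordered
     * before the dirty load below. */
    atomic_thread_fence(memory_order_acquire);

    return !atomic_load_explicit(&page->dirty, memory_order_relaxed);
}
```

      Without the fence, a CPU or compiler would be free to load the dirty flag before the count, which is the reordering the commit guards against.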
  7. 12 Oct, 2005 2 commits
  8. 09 Oct, 2005 1 commit
  9. 01 Oct, 2005 1 commit
  10. 28 Sep, 2005 2 commits
  11. 24 Sep, 2005 1 commit
  12. 23 Sep, 2005 4 commits
    • [PATCH] Fix bd_claim() error code. · f7b3a435
      Rob Landley committed
      Problem: In some circumstances, bd_claim() is returning the wrong error
      code.
      
      If we try to swapon an unused block device that isn't swap formatted, we
      get -EINVAL.  But if that same block device is already mounted, we instead
      get -EBUSY, even though it still isn't a valid swap device.
      
      This issue came up on the busybox list trying to get the error message
      from "swapon -a" right.  If a swap device is already enabled, we get -EBUSY,
      and we shouldn't report this as an error.  But we can't distinguish the two
      -EBUSY conditions, which are very different errors.
      
      In the code, bd_claim() returns either 0 or -EBUSY, but in this case busy
      means "somebody other than sys_swapon has already claimed this", and
      _that_ means this block device can't be a valid swap device.  So return
      -EINVAL there.
      Signed-off-by: Rob Landley <rob@landley.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
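      The error mapping described in this commit can be sketched in user space as below. The `fake_bdev` structure and the `fake_bd_claim` and `fake_swapon` functions are hypothetical simplifications written for this illustration; they are not the kernel's implementations.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device. */
struct fake_bdev {
    const void *holder;         /* current claimant, NULL if unclaimed */
    bool is_active_swap;        /* already enabled as a swap device */
    bool has_swap_signature;    /* formatted as swap */
};

/* Returns 0 on success or -EBUSY if someone else holds the device. */
static int fake_bd_claim(struct fake_bdev *bdev, const void *holder)
{
    if (bdev->holder != NULL && bdev->holder != holder)
        return -EBUSY;
    bdev->holder = holder;
    return 0;
}

/* -EBUSY is reserved for "already enabled as swap"; a device claimed by
 * anyone else cannot be a valid swap device, so that case is -EINVAL. */
static int fake_swapon(struct fake_bdev *bdev, const void *swap_holder)
{
    assert(bdev != NULL);

    if (bdev->is_active_swap)
        return -EBUSY;
    if (fake_bd_claim(bdev, swap_holder) < 0)
        return -EINVAL;         /* claimed by e.g. a mount: not swap */
    if (!bdev->has_swap_signature) {
        bdev->holder = NULL;    /* release our claim on failure */
        return -EINVAL;
    }
    bdev->is_active_swap = true;
    return 0;
}
```

      With this mapping a caller such as "swapon -a" can treat -EBUSY as "already enabled" and ignore it, while -EINVAL always means the device is not usable as swap.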
    • [PATCH] __kmalloc: Generate BUG if size requested is too large. · eafb4270
      Christoph Lameter committed
      I had an issue on ia64 where I got a bug in kernel/workqueue because
      kzalloc returned a NULL pointer due to the task structure getting too big
      for the slab allocator.  Usually these cases are caught by the kmalloc
      macro in include/linux/slab.h.
      
      Compilation will fail if a too big value is passed to kmalloc.
      
      However, kzalloc uses __kmalloc which has no check for that.  This patch
      makes __kmalloc bug if a too large entity is requested.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: fix handling of pages from foreign NUMA nodes · ff69416e
      Christoph Lameter committed
      The numa slab allocator may allocate pages from foreign nodes onto the
      lists for a particular node if a node runs out of memory.  Inspecting the
      slab->nodeid field will not reflect that the page is now in use for the
      slabs of another node.
      
      This patch fixes that issue by adding a node field to free_block so that
      the caller can indicate which node currently uses a slab.
      
      Also removes the check for the current node from kmalloc_cache_node since
      the process may shift later to another node which may lead to an allocation
      on another node than intended.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: alpha inlining fix · 7243cc05
      Ivan Kokshaysky committed
      It is essential that index_of() be inlined.  But alpha undoes the gcc
      inlining hackery and index_of() ends up out-of-line.  So fiddle with things
      to make that function inline again.
      
      Cc: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  13. 22 Sep, 2005 2 commits
  14. 18 Sep, 2005 1 commit
  15. 15 Sep, 2005 2 commits
    • [PATCH] Fix slab BUG_ON() triggered by change in array cache size · c7e43c78
      Alok Kataria committed
      With the new changes that we made in the initialization of the slab
      allocator, we first setup the cache from which array caches are allocated,
      and then the cache, from which kmem_list3's are allocated.
      
      Now if the array cache comes from a cache in which objsize > 32, (in this
      instance size-64) then, first size-64 cache will be allocated and then the
      size-128 (if this is the cache from which kmem_list3's are going to be
      allocated).
      
      So with these new changes, we are not guaranteed that we will be
      initializing the malloc_sizes array in a serialized order. Thus there is
      a bug in __find_general_cachep, as we are checking whether the first
      cache_sizes ptr is NULL.
      
      This is replaced by checking whether the array-cache cache is initialized.
      Attached is a patch which does that.  Boots fine on a x86-64, with
      DEBUG_SPIN, DEBUG_SLAB, and preempt.

      Thanks & Regards, Alok
      Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
      Signed-off-by: Shobhit Dayal <shobhitdayal.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Christoph Lameter <christoph@lameter.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] error path in setup_arg_pages() misses vm_unacct_memory() · 2fd4ef85
      Hugh Dickins committed
      Pavel Emelianov and Kirill Korotaev observe that fs and arch users of
      security_vm_enough_memory tend to forget to vm_unacct_memory when a
      failure occurs further down (typically in setup_arg_pages variants).
      
      These are all users of insert_vm_struct, and that reservation will only
      be unaccounted on exit if the vma is marked VM_ACCOUNT: which in some
      cases it is (hidden inside VM_STACK_FLAGS) and in some cases it isn't.
      
      So x86_64 32-bit and ppc64 vDSO ELFs have been leaking memory into
      Committed_AS each time they're run.  But don't add VM_ACCOUNT to them,
      it's inappropriate to reserve against the very unlikely case that gdb
      be used to COW a vDSO page - we ought to do something about that in
      do_wp_page, but there are yet other inconsistencies to be resolved.
      
      The safe and economical way to fix this is to let insert_vm_struct do
      the security_vm_enough_memory check when it finds VM_ACCOUNT is set.
      
      And the MIPS irix_brk has been calling security_vm_enough_memory before
      calling do_brk which repeats it, doubly accounting and so also leaking.
      Remove that, and all the fs and arch calls to security_vm_enough_memory:
      give it a less misleading name later on.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-Off-By: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  16. 13 Sep, 2005 4 commits
  17. 12 Sep, 2005 1 commit
    • [PATCH] uclinux: add NULL check, 0 end valid check and some more exports to nommu.c · 66aa2b4b
      Greg Ungerer committed
      Move call to get_mm_counter() in update_mem_hiwater() to be
      inside the check for tsk->mm being null. Otherwise you can be
      following a null pointer here. This patch submitted by
      Javier Herrero <jherrero@hvsistemas.es>.
      
      Modify the end check for munmap regions to allow for the
      legacy behavior of 0 being valid. Pretty much all current
      uClinux system libc malloc's pass in 0 as the end point.
      A hard check will fail on these, so change the check so
      that if it is non-zero it must be valid otherwise it fails.
      A passed in value of 0 will always succeed (as it used to).
      
      Also export a few more mm system functions - to be consistent
      with the VM code exports.
      Signed-off-by: Greg Ungerer <gerg@uclinux.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  18. 11 Sep, 2005 5 commits