1. 05 Sep, 2005 21 commits
    • [PATCH] mm: cleanup rmap · 4d7670e0
      Committed by Nick Piggin
      Thanks to Bill Irwin for pointing this out.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: micro-optimise rmap · 2822c1aa
      Committed by Nick Piggin
      Microoptimise page_add_anon_rmap.  Although these expressions are used only in
      the taken branch of the if() statement, the compiler can't reorder them inside
      because atomic_inc_and_test is a barrier.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: comment rmap · c3dce2d8
      Committed by Nick Piggin
      Just be clear that VM_RESERVED pages here are a bug, and the test is not there
      because they are expected.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] /proc/<pid>/numa_maps to show on which nodes pages reside · 6e21c8f1
      Committed by Christoph Lameter
      This patch was recently discussed on linux-mm:
      http://marc.theaimsgroup.com/?t=112085728500002&r=1&w=2
      
      I inherited a large code base from Ray for page migration.  There was a
      small patch in there that I find to be very useful since it allows the
      display of the locality of the pages in use by a process.  I reworked that
      patch and came up with a /proc/<pid>/numa_maps that gives more information
      about the vmas of a process.  numa_maps is indexed by the start address
      found in /proc/<pid>/maps.  For example, with this patch you can see the
      page use of the "getty" process:
      
      margin:/proc/12008 # cat maps
      00000000-00004000 r--p 00000000 00:00 0
      2000000000000000-200000000002c000 r-xp 00000000 08:04 516                /lib/ld-2.3.3.so
      2000000000038000-2000000000040000 rw-p 00028000 08:04 516                /lib/ld-2.3.3.so
      2000000000040000-2000000000044000 rw-p 2000000000040000 00:00 0
      2000000000058000-2000000000260000 r-xp 00000000 08:04 54707842           /lib/tls/libc.so.6.1
      2000000000260000-2000000000268000 ---p 00208000 08:04 54707842           /lib/tls/libc.so.6.1
      2000000000268000-2000000000274000 rw-p 00200000 08:04 54707842           /lib/tls/libc.so.6.1
      2000000000274000-2000000000280000 rw-p 2000000000274000 00:00 0
      2000000000280000-20000000002b4000 r--p 00000000 08:04 9126923            /usr/lib/locale/en_US.utf8/LC_CTYPE
      2000000000300000-2000000000308000 r--s 00000000 08:04 60071467           /usr/lib/gconv/gconv-modules.cache
      2000000000318000-2000000000328000 rw-p 2000000000318000 00:00 0
      4000000000000000-4000000000008000 r-xp 00000000 08:04 29576399           /sbin/mingetty
      6000000000004000-6000000000008000 rw-p 00004000 08:04 29576399           /sbin/mingetty
      6000000000008000-600000000002c000 rw-p 6000000000008000 00:00 0          [heap]
      60000fff7fffc000-60000fff80000000 rw-p 60000fff7fffc000 00:00 0
      60000ffffff44000-60000ffffff98000 rw-p 60000ffffff44000 00:00 0          [stack]
      a000000000000000-a000000000020000 ---p 00000000 00:00 0                  [vdso]
      
      cat numa_maps
      2000000000000000 default MaxRef=43 Pages=11 Mapped=11 N0=4 N1=3 N2=2 N3=2
      2000000000038000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
      2000000000040000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
      2000000000058000 default MaxRef=43 Pages=61 Mapped=61 N0=14 N1=15 N2=16 N3=16
      2000000000268000 default MaxRef=1 Pages=2 Mapped=2 Anon=2 N0=2
      2000000000274000 default MaxRef=1 Pages=3 Mapped=3 Anon=3 N0=3
      2000000000280000 default MaxRef=8 Pages=3 Mapped=3 N0=3
      2000000000300000 default MaxRef=8 Pages=2 Mapped=2 N0=2
      2000000000318000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N2=1
      4000000000000000 default MaxRef=6 Pages=2 Mapped=2 N1=2
      6000000000004000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
      6000000000008000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
      60000fff7fffc000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
      60000ffffff44000 default MaxRef=1 Pages=1 Mapped=1 Anon=1 N0=1
      
      getty uses ld.so.  The first vma is the code segment which is used by 43
      other processes and the pages are evenly distributed over the 4 nodes.
      
      The second vma is the process specific data portion for ld.so.  This is
      only one page.
      
      The display format is:
      
      <startaddress>	 Links to information in /proc/<pid>/maps
      <memory policy>  This can be "default", "interleave={}", "prefer=<node>" or "bind={<zones>}"
      MaxRef=		<maximum reference to a page in this vma>
      Pages=		<Nr of pages in use>
      Mapped=		<Nr of pages with mapcount >
      Anon=		<nr of anonymous pages>
      Nx=		<Nr of pages on Node x>
      
      The content of the proc-file is self-evident.  If this would be tied into
      the sparsemem system then the contents of this file would not be too
      useful.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
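The display format described in the numa_maps commit above lends itself to simple token parsing. The following is a minimal userspace sketch, assuming the exact line layout quoted in that commit message; `struct numa_line`, `parse_numa_line` and `MAX_NODES` are hypothetical names introduced only for illustration and are not part of the patch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 64 /* hypothetical upper bound, chosen for this sketch */

struct numa_line {
    unsigned long long start;      /* vma start address (first field) */
    unsigned long pages;           /* value of Pages= */
    unsigned long node[MAX_NODES]; /* value of Nx=, indexed by x */
};

/*
 * Parse one line such as
 *   "2000000000000000 default MaxRef=43 Pages=11 Mapped=11 N0=4 N1=3"
 * Returns 0 on success, -1 if the line has no start address.
 */
static int parse_numa_line(const char *line, struct numa_line *out)
{
    char buf[512];
    char *tok;

    memset(out, 0, sizeof(*out));
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    tok = strtok(buf, " \t\n");
    if (tok == NULL)
        return -1;
    out->start = strtoull(tok, NULL, 16);

    /* Remaining tokens: the policy name and key=value pairs. */
    while ((tok = strtok(NULL, " \t\n")) != NULL) {
        unsigned int n;
        unsigned long v;

        if (sscanf(tok, "Pages=%lu", &v) == 1)
            out->pages = v;
        else if (sscanf(tok, "N%u=%lu", &n, &v) == 2 && n < MAX_NODES)
            out->node[n] = v;
        /* MaxRef=, Mapped=, Anon= and the policy are ignored here. */
    }
    return 0;
}
```

For the first numa_maps line in the example output, summing the per-node counts gives the Pages= value, which is a quick consistency check on the data.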
    • [PATCH] rmap: don't test rss · 839b9685
      Committed by Hugh Dickins
      Remove the three get_mm_counter(mm, rss) tests from rmap.c: there was a
      time when testing rss was important to avoid a particular race between
      dup_mmap and the anonmm rmap; but now it's just a rather silly
      pseudo-optimization, made even more obscure by the get_mm_counter macro.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] delete from_swap_cache BUG_ONs · 3279ffd9
      Committed by Hugh Dickins
      Three of the four BUG_ONs in delete_from_swap_cache are immediately
      repeated in __delete_from_swap_cache: delete those and add the one.  But
      perhaps mm/ is altogether overprovisioned with historic BUGs?
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: swap_lock replace list+device · 5d337b91
      Committed by Hugh Dickins
      The idea of a swap_device_lock per device, and a swap_list_lock over them all,
      is appealing; but in practice almost every holder of swap_device_lock must
      already hold swap_list_lock, which defeats the purpose of the split.
      
      The only exceptions have been swap_duplicate, valid_swaphandles and an
      untrodden path in try_to_unuse (plus a few places added in this series).
      valid_swaphandles doesn't show up high in profiles, but swap_duplicate does
      demand attention.  However, with the hold time in get_swap_pages so much
      reduced, I've not yet found a load and set of swap device priorities to show
      even swap_duplicate benefitting from the split.  Certainly the split is mere
      overhead in the common case of a single swap device.
      
      So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock
      (generally we seem to prefer an _ in the name, and not hide in a macro).
      
      If someone can show a regression in swap_duplicate, then probably we should
      add a hashlock for the swap_map entries alone (shorts being anatomic), so as
      to help the case of the single swap device too.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: scan_swap_map latency breaks · 048c27fd
      Committed by Hugh Dickins
      The get_swap_page/scan_swap_map latency can be so bad that even those without
      preemption configured deserve relief: periodically cond_resched.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: scan_swap_map drop swap_device_lock · 52b7efdb
      Committed by Hugh Dickins
      get_swap_page has often shown up on latency traces, doing lengthy scans while
      holding two spinlocks.  swap_list_lock is already dropped, now scan_swap_map
      drop swap_device_lock before scanning the swap_map.
      
      While scanning for an empty cluster, don't worry that racing tasks may
      allocate what was free and free what was allocated; but when allocating an
      entry, check it's still free after retaking the lock.  Avoid dropping the lock
      in the expected common path.  No barriers beyond the locks, just let the
      cookie crumble; highest_bit limit is volatile, but benign.
      
      Guard against swapoff: must check SWP_WRITEOK before allocating, must raise
      SWP_SCANNING reference count while in scan_swap_map, swapoff wait for that to
      fall - just use schedule_timeout, we don't want to burden scan_swap_map
      itself, and it's very unlikely that anyone can really still be in
      scan_swap_map once swapoff gets this far.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: scan_swap_map restyled · 7dfad418
      Committed by Hugh Dickins
      Rewrite scan_swap_map to allocate in just the same way as before (taking the
      next free entry SWAPFILE_CLUSTER-1 times, then restarting at the lowest wholly
      empty cluster, falling back to lowest entry if none), but with a view towards
      dropping the lock in the next patch.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: get_swap_page drop swap_list_lock · fb4f88dc
      Committed by Hugh Dickins
      Rewrite get_swap_page to allocate in just the same sequence as before, but
      without holding swap_list_lock across its scan_swap_map.  Decrement
      nr_swap_pages and update swap_list.next in advance, while still holding
      swap_list_lock.  Skip full devices by testing highest_bit.  Swapoff holds
      swap_device_lock as well as swap_list_lock to clear SWP_WRITEOK.  Reduces lock
      contention when there are parallel swap devices of the same priority.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: freeing update swap_list.next · 89d09a2c
      Committed by Hugh Dickins
      This makes negligible difference in practice: but swap_list.next should not be
      updated to a higher prio in the general helper swap_info_get, but rather in
      swap_entry_free; and then only in the case when entry is actually freed.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: swap unsigned int consistency · 6eb396dc
      Committed by Hugh Dickins
      The swap header's unsigned int last_page determines the range of swap pages,
      but swap_info has been using int or unsigned long in some cases: use unsigned
      int throughout (except, in several places a local unsigned long is useful to
      avoid overflows when adding).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: show span of swap extents · 53092a74
      Committed by Hugh Dickins
      The "Adding %dk swap" message shows the number of swap extents, as a guide to
      how fragmented the swapfile may be.  But a useful further guide is what total
      extent they span across (sometimes scarily large).

      And there's no need to keep nr_extents in swap_info: it's unused after the
      initial message, so save a little space by keeping it on stack.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: swap extent list is ordered · 11d31886
      Committed by Hugh Dickins
      There are several comments that swap's extent_list.prev points to the lowest
      extent: that's not so, it's extent_list.next which points to it, as you'd
      expect.  And a couple of loops in add_swap_extent which go all the way through
      the list, when they should just add to the other end.

      Fix those up, and let map_swap_page search the list forwards: profiles show
      it to be twice as quick that way - because prefetch works better on how the
      structs are typically kmalloc'ed?  or because usually more is written to than
      read from swap, and swap is allocated ascendingly?
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: move destroy_swap_extents calls · 4cd3bb10
      Committed by Hugh Dickins
      sys_swapon's call to destroy_swap_extents on failure is made after the final
      swap_list_unlock, which is faintly unsafe: another sys_swapon might already be
      setting up that swap_info_struct.  Calling it earlier, before taking
      swap_list_lock, is safe.  sys_swapoff's call to destroy_swap_extents was safe,
      but likewise move it earlier, before taking the locks (once try_to_unuse has
      completed, nothing can be needing the swap extents).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: correct swapfile nr_good_pages · e2244ec2
      Committed by Hugh Dickins
      If a regular swapfile lies on a filesystem whose blocksize is less than
      PAGE_SIZE, then setup_swap_extents may have to cut the number of usable swap
      pages; but sys_swapon's nr_good_pages was not expecting that.  Also,
      setup_swap_extents takes no account of badpages listed in the swap header: not
      worth doing so, but ensure nr_badpages is 0 for a regular swapfile.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: update swapfile i_sem comment · b0d9bcd4
      Committed by Hugh Dickins
      Update swap extents comment: nowadays we guard with S_SWAPFILE not i_sem.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sparsemem extreme: hotplug preparation · 28ae55c9
      Committed by Dave Hansen
      This splits up sparse_index_alloc() into two pieces.  This is needed
      because we'll allocate the memory for the second level in a different place
      from where we actually consume it, to keep the allocation from happening
      underneath a lock.
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sparsemem extreme implementation · 3e347261
      Committed by Bob Picco
      With cleanups from Dave Hansen <haveblue@us.ibm.com>

      SPARSEMEM_EXTREME makes mem_section a one dimensional array of pointers to
      mem_sections.  This two level layout scheme is able to achieve smaller
      memory requirements for SPARSEMEM with the tradeoff of an additional shift
      and load when fetching the memory section.  The current SPARSEMEM
      implementation is a one dimensional array of mem_sections which is the
      default SPARSEMEM configuration.  The patch attempts to isolate the
      implementation details of the physical layout of the sparsemem section
      array.

      SPARSEMEM_EXTREME requires bootmem to be functioning at the time of
      memory_present() calls.  This is not always feasible, so architectures
      which do not need it may allocate everything statically by using
      SPARSEMEM_STATIC.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
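The two level layout described in the sparsemem extreme commit above (an array of pointers to arrays of sections, at the cost of one extra shift and load per lookup) can be illustrated in userspace. This is only a sketch under stated assumptions: `struct section`, `roots`, `section_lookup`, `section_present` and the sizes below are hypothetical and do not reproduce the kernel's actual definitions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative sizes only; the kernel derives its own values. */
#define ROOT_SHIFT        2
#define SECTIONS_PER_ROOT (1UL << ROOT_SHIFT)
#define ROOT_MASK         (SECTIONS_PER_ROOT - 1)
#define NR_ROOTS          8UL

struct section {
    unsigned long map; /* stand-in for per-section data */
};

/* First level: one pointer per root; unpopulated roots stay NULL. */
static struct section *roots[NR_ROOTS];

/* Lookup costs one shift plus one extra load compared with a flat array. */
static struct section *section_lookup(unsigned long nr)
{
    unsigned long r = nr >> ROOT_SHIFT;

    if (r >= NR_ROOTS || roots[r] == NULL)
        return NULL;
    return &roots[r][nr & ROOT_MASK];
}

/* Mark a section present, allocating its second level array on demand. */
static struct section *section_present(unsigned long nr)
{
    unsigned long r = nr >> ROOT_SHIFT;

    if (r >= NR_ROOTS)
        return NULL;
    if (roots[r] == NULL)
        roots[r] = calloc(SECTIONS_PER_ROOT, sizeof(struct section));
    if (roots[r] == NULL)
        return NULL;
    return &roots[r][nr & ROOT_MASK];
}
```

Only roots that contain at least one present section consume memory for a second level array, which is where the savings for a very sparse physical address space come from.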
    • [PATCH] SPARSEMEM EXTREME · 802f192e
      Committed by Bob Picco
      A new option for SPARSEMEM is ARCH_SPARSEMEM_EXTREME.  Architecture
      platforms with a very sparse physical address space would likely want to
      select this option.  For those architecture platforms that don't select the
      option, the code generated is equivalent to SPARSEMEM currently in -mm.
      I'll be posting a patch on ia64 ml which uses this new SPARSEMEM feature.

      ARCH_SPARSEMEM_EXTREME makes mem_section a one dimensional array of
      pointers to mem_sections.  This two level layout scheme is able to achieve
      smaller memory requirements for SPARSEMEM with the tradeoff of an
      additional shift and load when fetching the memory section.  The current
      SPARSEMEM -mm implementation is a one dimensional array of mem_sections
      which is the default SPARSEMEM configuration.  The patch attempts to isolate
      the implementation details of the physical layout of the sparsemem section
      array.

      ARCH_SPARSEMEM_EXTREME depends on 64BIT and is by default boolean false.

      I've boot tested under aim load ia64 configured for ARCH_SPARSEMEM_EXTREME.
      I've also boot tested a 4 way Opteron machine with !ARCH_SPARSEMEM_EXTREME
      and tested with aim.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 30 Aug, 2005 1 commit
    • [PATCH] Lazy page table copies in fork() · d992895b
      Committed by Nick Piggin
      Defer copying of ptes until fault time when it is possible to reconstruct
      the pte from backing store. Idea from Andi Kleen and Nick Piggin.

      Thanks to input from Rik van Riel and Linus and to Hugh for correcting
      my blundering.

      Ray Fucillo <fucillo@intersystems.com> reports:

        "I applied this latest patch to a 2.6.12 kernel and found that it does
         resolve the problem.  Prior to the patch on this machine, I was
         seeing about 23ms spent in fork for every 100MB of shared memory
         segment.

         After applying the patch, fork is taking about 1ms regardless of the
         shared memory size."
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 20 Aug, 2005 1 commit
    • Fix nasty ncpfs symlink handling bug. · cc314eef
      Committed by Linus Torvalds
      This bug could cause oopses and page state corruption, because ncpfs
      used the generic page-cache symlink handling functions.  But those
      functions only work if the page cache is guaranteed to be "stable", ie a
      page that was installed when the symlink walk was started has to still
      be installed in the page cache at the end of the walk.

      We could have fixed ncpfs to not use the generic helper routines, but it
      is in many ways much cleaner to instead improve on the symlink walking
      helper routines so that they don't require that absolute stability.

      We do this by allowing "follow_link()" to return an error-pointer as a
      cookie, which is fed back to the cleanup "put_link()" routine.  This
      also simplifies NFS symlink handling.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  4. 06 Aug, 2005 1 commit
  5. 05 Aug, 2005 2 commits
    • [PATCH] __vm_enough_memory() signedness fix · 2f60f8d3
      Committed by Simon Derr
      We have found what seems to be a small bug in __vm_enough_memory() when
      sysctl_overcommit_memory is set to OVERCOMMIT_NEVER.

      When this bug occurs the system fails to boot, with /sbin/init whining
      about fork() returning ENOMEM.

      We hunted down the problem to this:

      The deferred update mechanism used in vm_acct_memory(), on an SMP system,
      allows the vm_committed_space counter to have a negative value.

      This should not be a problem since this counter is known to be inaccurate.

      But in __vm_enough_memory() this counter is compared to the `allowed'
      variable, which is an unsigned long.  This comparison is broken since it
      will consider the negative values of vm_committed_space to be huge positive
      values, resulting in a memory allocation failure.

      Signed-off-by: <Jean-Marc.Saffroy@ext.bull.net>
      Signed-off-by: <Simon.Derr@bull.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
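The signedness problem described in the __vm_enough_memory() commit above follows from C's usual arithmetic conversions: when a signed long meets an unsigned long in a comparison, the signed value is converted to unsigned. A minimal sketch, assuming nothing about the real kernel code beyond what the message says; both function names are hypothetical, and the second function shows one possible way to compare safely rather than the patch's actual change.

```c
#include <stdbool.h>

/*
 * Mirrors the broken comparison: the cast makes explicit what the
 * compiler does implicitly when a long is compared with an unsigned long.
 * A negative counter becomes a huge positive value.
 */
static bool enough_memory_buggy(long committed, unsigned long allowed)
{
    return (unsigned long)committed < allowed;
}

/*
 * One possible correction: do the comparison in the signed domain.
 * This assumes `allowed` fits in a long, which is labeled here as an
 * assumption of the sketch.
 */
static bool enough_memory_signed(long committed, unsigned long allowed)
{
    return committed < (long)allowed;
}
```

With a slightly negative counter such as -5 and a limit of 1000, the first function reports that memory is exhausted while the second does not, which matches the boot failure symptom in the report.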
    • [PATCH] fix VmSize and VmData after mremap · 1c5ad845
      Committed by Hugh Dickins
      mremap's move_vma is applying __vm_stat_account to the old vma which may
      have already been freed: move it to just before the do_munmap.

      mremapping to and fro with CONFIG_DEBUG_SLAB=y showed /proc/<pid>/status
      VmSize and VmData wrapping just like in kernel bugzilla #4842, and fixed by
      this patch - worth including in 2.6.13, though not yet confirmed that it
      fixes that specific report from Frank van Maarseveen.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  6. 04 Aug, 2005 2 commits
    • Fix up recent get_user_pages() handling · a68d2ebc
      Committed by Linus Torvalds
      The VM_FAULT_WRITE thing is an extra bit, not a valid return value, and
      has to be treated as such by get_user_pages().
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix get_user_pages bug · f33ea7f4
      Committed by Nick Piggin
      Checking pte_dirty instead of pte_write in __follow_page is problematic
      for s390, and for copy_one_pte which leaves dirty when clearing write.

      So revert __follow_page to check pte_write as before, and make
      do_wp_page pass back a special extra VM_FAULT_WRITE bit to say it has
      done its full job: once get_user_pages receives this value, it no longer
      requires pte_write in __follow_page.

      But most callers of handle_mm_fault, in the various architectures, have
      switch statements which do not expect this new case.  To avoid changing
      them all in a hurry, make an inline wrapper function (using the old
      name) that masks off the new bit, and use the extended interface with
      double underscores.

      Yes, we do have a call to do_wp_page from do_swap_page, but no need to
      change that: in the rare case it's needed, another do_wp_page will follow.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      [ Cleanups by Nick Piggin ]
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
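The wrapper pattern described in the get_user_pages fix above (an extended function that may set an extra flag bit, plus an inline wrapper under the old name that masks the bit off for existing callers) can be sketched as follows. All names and numeric values here are hypothetical stand-ins: the kernel's extended function uses a double underscore prefix, which is avoided in this userspace sketch because such identifiers are reserved in ordinary C programs.

```c
/* Hypothetical result codes and flag bit, chosen for illustration. */
#define FAULT_MINOR 1
#define FAULT_MAJOR 2
#define FAULT_WRITE 0x10 /* extra bit OR-ed into the result, not a code */

/* Extended interface: may report that a copy-on-write break completed. */
static int handle_fault_ext(int cow_broken)
{
    int ret = FAULT_MINOR;

    if (cow_broken)
        ret |= FAULT_WRITE;
    return ret;
}

/*
 * Wrapper with the old name: existing callers that switch on the result
 * never see the extra bit, so their switch statements keep working.
 */
static inline int handle_fault(int cow_broken)
{
    return handle_fault_ext(cow_broken) & ~FAULT_WRITE;
}
```

A caller that cares about the flag, as get_user_pages does in the commit message, would call the extended function and test the bit separately from the base result code.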
  7. 02 Aug, 2005 3 commits
    • [PATCH] sys_set_mempolicy() doesnt check if mode < 0 · ba17101b
      Committed by Eric Dumazet
      A kernel BUG() is triggered by a call to set_mempolicy() with a negative
      first argument.  This is because the mode is declared as an int, and the
      validity check doesn't check < 0 values.  Alternatively, mode could be
      declared as unsigned int or unsigned long.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
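The two remedies mentioned in the set_mempolicy commit above (adding a lower bound check, or declaring the mode as unsigned) can be compared side by side. This is a sketch only: `MODE_MAX` and the three function names are hypothetical and are not the kernel's identifiers.

```c
#define MODE_MAX 3 /* hypothetical highest valid mode for this sketch */

/* Upper bound only: a negative int slips through as "valid". */
static int mode_valid_buggy(int mode)
{
    return mode <= MODE_MAX;
}

/* Remedy 1: keep the int type and also reject negative values. */
static int mode_valid_signed(int mode)
{
    return mode >= 0 && mode <= MODE_MAX;
}

/*
 * Remedy 2: take the mode as unsigned, so a negative argument is
 * converted to a large value and fails the upper bound check.
 */
static int mode_valid_unsigned(unsigned int mode)
{
    return mode <= MODE_MAX;
}
```

Either remedy rejects -1; the unsigned variant does so without a second comparison because the conversion turns -1 into the largest unsigned int.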
    • [PATCH] x86_64: access of some bad address · 690dbe1c
      Committed by Hugh Dickins
      x86_64 has a large sparse gate area between VSYSCALL_START and
      VSYSCALL_END, not all of it presently backed by pmds.  Alexander Nyberg has
      found that in some circumstances gdb may try to ptrace here, and hit
      get_user_pages BUG_ON.  It seems odd that gdb should be accessing here, but
      it certainly shouldn't crash in this way: relax BUG_ON to -EFAULT.  Fixes
      kernel bugzilla #4801.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Fix get_user_pages() race for write access · 4ceb5db9
      Committed by Linus Torvalds
      There's no real guarantee that handle_mm_fault() will always be able to
      break a COW situation - if an update from another thread ends up
      modifying the page table some way, handle_mm_fault() may end up
      requiring us to re-try the operation.

      That's normally fine, but get_user_pages() ended up re-trying it as a
      read, and thus a write access could in theory end up losing the dirty
      bit or be done on a page that had not been properly COW'ed.

      This makes get_user_pages() always retry write accesses as write
      accesses by making "follow_page()" require that a writable follow has
      the dirty bit set.  That simplifies the code and solves the race: if the
      COW break fails for some reason, we'll just loop around and try again.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 31 Jul, 2005 1 commit
  9. 28 Jul, 2005 4 commits
    • [PATCH] Remove bogus warning in page_alloc.c · 12b1c5f3
      Committed by Andy Whitcroft
      Originally __free_pages_bulk used the relative page number within a zone to
      define its buddies.  This meant that to maintain the "maximally aligned"
      requirements (that an allocation of size N will be aligned at least to N
      physically) zones had to also be aligned to 1<<MAX_ORDER pages.  When
      __free_pages_bulk was updated to use the relative page frame numbers of the
      free'd pages to pair buddies this released the alignment constraint on the
      'left' edge of the zone.  This allows _either_ edge of the zone to contain
      partial MAX_ORDER sized buddies.  These simply never will have matching
      buddies and thus will never make it to the 'top' of the pyramid.

      The patch below removes a now redundant check ensuring that the mem_map was
      aligned to MAX_ORDER.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <christoph@lameter.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
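The buddy pairing discussed in the page_alloc.c commit above can be illustrated with the conventional buddy system rule, in which the buddy of a block at a given order is found by flipping one bit of its page frame number. This is a generic sketch of that rule, not the code of __free_pages_bulk; `buddy_pfn` and `buddy_in_zone` are hypothetical helpers, and the zone is modeled as a half-open pfn range.

```c
/* Buddy of the block starting at pfn with 2^order pages: flip bit `order`. */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
    return pfn ^ (1UL << order);
}

/*
 * Does the whole buddy block lie inside the zone [zone_start, zone_end)?
 * Blocks at an unaligned zone edge have no buddy inside the zone, so they
 * can never merge upward, as the commit message describes.
 */
static int buddy_in_zone(unsigned long pfn, unsigned int order,
                         unsigned long zone_start, unsigned long zone_end)
{
    unsigned long b = buddy_pfn(pfn, order);

    return b >= zone_start && b + (1UL << order) <= zone_end;
}
```

For a zone covering pfns 3 to 18, the page at pfn 3 would pair with pfn 2, which is outside the zone, so it stays a lone order 0 block, while pfn 4 pairs normally with pfn 5.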
    • [PATCH] madvise() does not always return -EBADF on non-file mapped area · 165cd402
      Committed by suzuki
      The madvise() system call returns -EBADF for areas which do not map to
      files, only for *behaviour* request MADV_WILLNEED.

      According to man pages, madvise returns :

      EBADF - the map exists, but the area maps something that isn't a file.

      Fixes bug 2995.
      Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] check_user_page_readable() deadlock fix · 1aaf18ff
      Committed by Andrew Morton
      Fix bug identified by Richard Purdie <rpurdie@rpsys.net>.

      oprofile calls check_user_page_readable() from interrupt context, so we
      deadlock over various VFS locks.

      But check_user_page_readable() doesn't imply either a read or a write of the
      page's contents.  Change __follow_page() so that check_user_page_readable()
      can tell __follow_page() that we're not accessing the page's contents, and use
      that info to avoid the troublesome lock-takings.

      Also, make follow_page() inline for the single callsite in memory.c to save a
      bit of stack space.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Undo mempolicy shared policy rbtree microoptimization · 90c5029e
      Committed by Andi Kleen
      All mempolicy changes must be inside the spinlock, and re-adding the rb_erase
      prevents a crash while doing:

      > echo "1" > /tmp/numatest
      > numactl --length=0x4000 --shm /tmp/numatest --localalloc
      > numactl --length=0x2000 --offset=0 --shm /tmp/numatest --membind=0
      > numactl --length=0x2000 --offset=0x2000 --shm /tmp/numatest --membind=1
      > ipcs
      > ipcrm -M "the_key_value_of_this_shm_area"

      Based on a patch by John Blackwood

      Cc: <john.blackwood@ccur.com>
      Cc: <andrea@suse.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  10. 16 Jul, 2005 1 commit
    • [PATCH] execute-in-place fixes · afa597ba
      Committed by Carsten Otte
      This patch includes feedback from Andrew and Christoph. Thanks for
      taking time to review.

      Use of empty_zero_page was eliminated to fix compilation for architectures
      that don't have it.

      This patch removes setting pages up-to-date in ext2_get_xip_page and all
      bug checks to verify that the page is indeed up to date.  Setting the page
      state on mapping to userland is bogus.  None of the code paths involved
      with these pages in mm cares about the page state.

      Still on my ToDo list: identify a place outside second extended where
      __inode_direct_access should reside.
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 13 Jul, 2005 1 commit
  12. 08 Jul, 2005 2 commits