1. 06 Feb 2008, 8 commits
    • swapin_readahead: move and rearrange args · 46017e95
      Committed by Hugh Dickins
      swapin_readahead has never sat well in mm/memory.c: move it to mm/swap_state.c
      beside its kindred read_swap_cache_async.  Why were its args in a different
      order?  Rearrange them.  And since it was always followed by a
      read_swap_cache_async of the target page, fold that in and return struct
      page*.  Then CONFIG_SWAP=n no longer needs valid_swaphandles and
      read_swap_cache_async stubs.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
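      A rough sketch of the reworked interface described above, assuming the
      2.6.24-era helpers valid_swaphandles() and read_swap_cache_async(); the
      readaround loop details are an approximation, not the literal patch:

      /* now in mm/swap_state.c, beside read_swap_cache_async */
      #include <linux/swap.h>
      #include <linux/swapops.h>
      #include <linux/pagemap.h>

      struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
                                    struct vm_area_struct *vma, unsigned long addr)
      {
              int nr_pages;
              struct page *page;
              unsigned long offset, end_offset;

              /* Read around the target: valid_swaphandles picks the window. */
              offset = swp_offset(entry);
              nr_pages = valid_swaphandles(entry, &offset);
              for (end_offset = offset + nr_pages; offset < end_offset; offset++) {
                      page = read_swap_cache_async(swp_entry(swp_type(entry), offset),
                                                   gfp_mask, vma, addr);
                      if (!page)
                              break;
                      page_cache_release(page);
              }
              lru_add_drain();        /* push any freshly read pages onto the LRU */

              /* The fold-in: read (or find) the target page itself and return it. */
              return read_swap_cache_async(entry, gfp_mask, vma, addr);
      }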
    • swapin_readahead: excise NUMA bogosity · c4cc6d07
      Committed by Hugh Dickins
      For three years swapin_readahead has been cluttered with fanciful CONFIG_NUMA
      code, advancing addr, and stepping on to the next vma at the boundary, to line
      up the mempolicy for each page allocation.
      
      It _might_ be a good idea to allocate swap more according to vma layout; but
      the fact is, that's not how we do it at all, 2.6 even less than 2.4: swap is
      allocated as needed for pages as they sink to the bottom of the inactive LRUs.
       Sometimes that may match vma layout, but not so often that it's worth going
      to these misleading vma->vm_next lengths: rip all that out.
      
      Originally I intended to retain the incrementation of addr, but correct its
      initial value: valid_swaphandles generally supplies an offset below the target
      addr (this is readaround rather than readahead), but addr has not been
      adjusted accordingly, so in the interleave case it has usually been allocating
      the target page from the "wrong" node (though that may not matter very much).
      
      But look at the equivalent shmem_swapin code: either by oversight or by
      design, though it has all the apparatus for choosing a new mempolicy per page,
      it uses the same idx throughout, choosing the same mempolicy and interleave
      node for each page of the cluster.
      
      Which is actually a much better strategy: each node has its own LRUs and its
      own kswapd, so if you're betting on any particular relationship between swap
      and node, the best bet is that nearby swap entries belong to pages from the
      same node - even when the mempolicy of the target page is to interleave.  And
      examining a map of nodes corresponding to swap entries on a numa=fake system
      bears this out.  (We could later tweak swap allocation to make it even more
      likely, but this patch is merely about removing cruft.)
      
      So, neither adjust nor increment addr in swapin_readahead, and then
      shmem_swapin can use it too; the pseudo-vma to pass the policy need only be set
      up once per cluster, and since so few fields of pvma are used, let's skip the
      memset - from shmem_alloc_page also.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
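      A sketch of the resulting shmem_swapin pattern (field choices and the
      mpol_shared_policy_lookup() call are assumptions based on shmem code of that
      era, and policy refcounting is omitted): the pseudo-vma is set up once for the
      whole readahead cluster, with the same idx for every page, and without a memset.

      /* mm/shmem.c (sketch) */
      static struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp,
                                       struct shmem_inode_info *info,
                                       unsigned long idx)
      {
              struct vm_area_struct pvma;

              /* Only the fields the allocator looks at; no memset of pvma. */
              pvma.vm_start = 0;
              pvma.vm_pgoff = idx;            /* same idx for the whole cluster */
              pvma.vm_ops = NULL;
              pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, idx);

              /* swapin_readahead no longer adjusts or advances addr, so 0 will do. */
              return swapin_readahead(entry, gfp, &pvma, 0);
      }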
    • vmalloc: clean up page array indexing · bf53d6f8
      Committed by Christoph Lameter
      The page array is repeatedly indexed both in vunmap and vmalloc_area_node().
      Add a temporary variable to make it easier to read (and easier to patch
      later).
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
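      The pattern, roughly (shown for the vunmap side; vmalloc_area_node() gets the
      same treatment, and area->nr_pages/area->pages are the usual struct vm_struct
      fields):

      /* Before: the page array is indexed on every access. */
      for (i = 0; i < area->nr_pages; i++) {
              BUG_ON(!area->pages[i]);
              __free_page(area->pages[i]);
      }

      /*
       * After: a temporary names the page once, which reads better and gives
       * later patches a single place to hook.
       */
      for (i = 0; i < area->nr_pages; i++) {
              struct page *page = area->pages[i];

              BUG_ON(!page);
              __free_page(page);
      }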
    • is_vmalloc_addr(): Check if an address is within the vmalloc boundaries · 9e2779fa
      Committed by Christoph Lameter
      Checking if an address is a vmalloc address is done in a couple of places.
      Define a common version in mm.h and replace the other checks.
      
      Again the include structures suck.  The definition of VMALLOC_START and
      VMALLOC_END is not available in vmalloc.h since highmem.h cannot be included
      there.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
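      Roughly what such a common helper looks like (a sketch of an mm.h inline; the
      CONFIG_MMU fallback is an assumption):

      static inline int is_vmalloc_addr(const void *x)
      {
      #ifdef CONFIG_MMU
              unsigned long addr = (unsigned long)x;

              /* anything in the vmalloc window counts */
              return addr >= VMALLOC_START && addr < VMALLOC_END;
      #else
              return 0;
      #endif
      }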
    • vmalloc: add const to void* parameters · b3bdda02
      Committed by Christoph Lameter
      Make vmalloc functions work the same way as kfree() and friends that
      take a const void * argument.
      
      [akpm@linux-foundation.org: fix consts, coding-style]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
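      In practice this means the freeing and translation prototypes take const
      pointers, in the same spirit as kfree(const void *); a sketch of the kind of
      declarations affected (the exact list is an assumption):

      extern void vfree(const void *addr);
      extern void vunmap(const void *addr);
      extern struct page *vmalloc_to_page(const void *addr);
      extern unsigned long vmalloc_to_pfn(const void *addr);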
    • Move vmalloc_to_page() to mm/vmalloc. · 48667e7a
      Committed by Christoph Lameter
      We already have page table manipulation for vmalloc in vmalloc.c. Move the
      vmalloc_to_page() function there as well.
      
      Move the definitions for vmalloc related functions in mm.h to a newly created
      section.  A better place would be vmalloc.h but mm.h is basic and may depend
      on these functions.  An alternative would be to include vmalloc.h in mm.h
      (like done for vmstat.h).
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user · eebd2aa3
      Committed by Christoph Lameter
      Simplify page cache zeroing of segments of pages through three functions:
      
      zero_user_segments(page, start1, end1, start2, end2)
      
              Zeros two segments of the page. It takes the positions where the
              zeroing starts and ends, which avoids length calculations and
              makes the code clearer.
      
      zero_user_segment(page, start, end)
      
              Same for a single segment.
      
      zero_user(page, start, length)
      
              Length-based variant for the case where we know the length rather
              than the end position.
      
      We remove the zero_user_page macro. Issues:
      
      1. It's a macro; inline functions are preferable.
      
      2. The KM_USER0 macro is only defined for HIGHMEM.
      
         Having to treat this special case everywhere makes the
         code needlessly complex. The parameter for zeroing is always
         KM_USER0 except in one single case that we open code.
      
      Avoiding KM_USER0 means a lot of code no longer has to deal with
      the special casing for HIGHMEM. Dealing with kmap is only necessary
      for HIGHMEM configurations, and in those configurations we use
      KM_USER0 just as we do for a series of other functions defined in
      highmem.h.
      
      Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
      could not be an inline function. The zero_user_* functions introduced
      here can be inline because that constant is not used when these
      functions are called.
      
      Also extract the flushing of the caches to outside of the kmap section.
      
      [akpm@linux-foundation.org: fix nfs and ntfs build]
      [akpm@linux-foundation.org: fix ntfs build some more]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
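      A sketch of the three helpers as described, with everything funnelled through
      one worker so the kmap/kunmap and the cache flush are handled in a single place
      (KM_USER0 handling shown as in the 2.6.24-era highmem.h; details are an
      approximation, not the literal patch):

      #include <linux/highmem.h>

      static inline void zero_user_segments(struct page *page,
                                            unsigned start1, unsigned end1,
                                            unsigned start2, unsigned end2)
      {
              void *kaddr = kmap_atomic(page, KM_USER0);

              BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE);

              if (end1 > start1)
                      memset(kaddr + start1, 0, end1 - start1);
              if (end2 > start2)
                      memset(kaddr + start2, 0, end2 - start2);

              kunmap_atomic(kaddr, KM_USER0);
              flush_dcache_page(page);        /* cache flush kept outside the kmap */
      }

      static inline void zero_user_segment(struct page *page,
                                           unsigned start, unsigned end)
      {
              zero_user_segments(page, start, end, 0, 0);
      }

      static inline void zero_user(struct page *page, unsigned start, unsigned size)
      {
              zero_user_segments(page, start, start + size, 0, 0);
      }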
    • sys_remap_file_pages: fix ->vm_file accounting · 8a459e44
      Committed by Oleg Nesterov
      Fix ->vm_file accounting: mmap_region() may do do_munmap().
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 05 Feb 2008, 7 commits
  3. 04 Feb 2008, 2 commits
  4. 03 Feb 2008, 1 commit
    • fix writev regression: pan hanging unkillable and un-straceable · 124d3b70
      Committed by Nick Piggin
      Frederik Himpe reported an unkillable and un-straceable pan process.
      
      Zero length iovecs can go into an infinite loop in writev, because the
      iovec iterator does not always advance over them.
      
      The sequence required to trigger this is not trivial. I think it
      requires that a zero-length iovec be followed by a non-zero-length iovec
      that causes a pagefault in the atomic usercopy. This causes the writev
      code to drop back into single-segment copy mode, which then tries to
      copy the 0 bytes of the zero-length iovec; a zero-length copy looks like
      a failure, though, so it loops.
      
      Put a test into iov_iter_advance to catch zero-length iovecs. We could
      just put the test in the fallback path, but I feel it is more robust to
      skip over zero-length iovecs throughout the code (iovec iterator may be
      used in filesystems too, so it should be robust).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
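      The shape of the fix, sketched against the 2.6.24 fs/filemap.c iov_iter (the
      guard condition is the point; the surrounding bookkeeping is approximate): the
      advance loop also steps over zero-length iovecs, so the single-segment fallback
      can never be handed one and spin.

      static void __iov_iter_advance_iov(struct iov_iter *i, size_t bytes)
      {
              const struct iovec *iov = i->iov;
              size_t base = i->iov_offset;

              /*
               * The "i->count && !iov->iov_len" part is the fix: keep stepping
               * over zero-length iovecs even when there is nothing left to
               * advance by, instead of leaving the iterator stuck on one.
               */
              while (bytes || unlikely(i->count && !iov->iov_len)) {
                      size_t copy = min(bytes, iov->iov_len - base);

                      i->count -= copy;
                      bytes -= copy;
                      base += copy;
                      if (iov->iov_len == base) {
                              iov++;
                              base = 0;
                      }
              }
              i->iov = iov;
              i->iov_offset = base;
      }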
  5. 30 Jan 2008, 3 commits
    • x86: print which shared library/executable faulted in segfault etc. messages v3 · 03252919
      Committed by Andi Kleen
      They now look like:
      
      hal-resmgr[13791]: segfault at 3c rip 2b9c8caec182 rsp 7fff1e825d30 error 4 in libacl.so.1.1.0[2b9c8caea000+6000]
      
      This makes it easier to pinpoint bugs to specific libraries.
      
      And printing the offset into a mapping also always makes it possible to find
      the correct fault point in a library, even with randomized mappings. Previously
      there was no way to actually find the correct code address inside
      the randomized mapping.
      
      Relies on an earlier patch to shorten the printk formats.
      
      The messages are now often longer than 80 characters, but I think that's worth it.
      
      [includes fix from Eric Dumazet to check d_path error value]
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
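      A sketch of how the library name and mapping bounds can be resolved for such a
      message (the helper name, locking, and buffer handling here are assumptions;
      d_path() is called with its 2.6.24-era signature, and the IS_ERR() check
      mirrors the Eric Dumazet fix mentioned above):

      void print_vma_addr(char *prefix, unsigned long ip)
      {
              struct mm_struct *mm = current->mm;
              struct vm_area_struct *vma;

              down_read(&mm->mmap_sem);
              vma = find_vma(mm, ip);
              if (vma && vma->vm_file) {
                      struct file *f = vma->vm_file;
                      char *buf = (char *)__get_free_page(GFP_KERNEL);

                      if (buf) {
                              char *p, *s;

                              p = d_path(f->f_path.dentry, f->f_path.mnt,
                                         buf, PAGE_SIZE);
                              if (IS_ERR(p))          /* d_path can fail */
                                      p = "?";
                              s = strrchr(p, '/');    /* keep just the basename */
                              if (s)
                                      p = s + 1;
                              /* "name[start+size]": the fault offset is ip - vm_start */
                              printk("%s%s[%lx+%lx]", prefix, p,
                                     vma->vm_start,
                                     vma->vm_end - vma->vm_start);
                              free_page((unsigned long)buf);
                      }
              }
              up_read(&mm->mmap_sem);
      }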
    • spinlock: lockbreak cleanup · 95c354fe
      Committed by Nick Piggin
      The break_lock data structure and code for spinlocks is quite nasty.
      Not only does it double the size of a spinlock but it changes locking to
      a potentially less optimal trylock.
      
      Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a
      __raw_spin_is_contended that uses the lock data itself to determine whether
      there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is
      not set.
      
      Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to
      decouple it from the spinlock implementation, and make it typesafe (rwlocks
      do not have any need_lockbreak sites -- why do they even get bloated up
      with that break_lock then?).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
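      The resulting split, roughly (a sketch; the macro placement and any names
      beyond those mentioned above are assumptions): contention is either read from
      the old break_lock field or asked of the raw lock, and spin_needbreak() wraps
      that in a typesafe, preemption-aware test.

      #ifdef CONFIG_GENERIC_LOCKBREAK
      #define spin_is_contended(lock)  ((lock)->break_lock)
      #else
      #define spin_is_contended(lock)  __raw_spin_is_contended(&(lock)->raw_lock)
      #endif

      /* Replaces need_lockbreak; only meaningful when preemption is enabled. */
      static inline int spin_needbreak(spinlock_t *lock)
      {
      #ifdef CONFIG_PREEMPT
              return spin_is_contended(lock);
      #else
              return 0;
      #endif
      }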
    • x86: randomize brk · c1d171a0
      Committed by Jiri Kosina
      Randomize the location of the heap (brk) for i386 and x86_64.  The brk is
      randomized within a range starting at the current brk location and extending
      up to a 0x02000000 offset, for both architectures.  This, together with
      pie-executable-randomization.patch and
      pie-executable-randomization-fix.patch, should make the address space
      randomization on i386 and x86_64 complete.
      
      Arjan says:
      
      This is known to break older versions of some emacs variants, whose dumper
      code assumed that the last variable declared in the program is equal to the
      start of the dynamically allocated memory region.
      
      (The dumper is the code where emacs effectively dumps core at the end of its
      compilation stage; this coredump is then loaded as the main program during
      normal use.)
      
      IIRC this was 5 years or so ago; we found this way back when I was at RH and we
      first did the security stuff there (including this brk randomization).  It
      wasn't all variants of emacs, and it got fixed as a result (I vaguely remember
      that emacs already had code to deal with it for other archs/OSes, just
      ifdeffed wrongly).
      
      It's a rare and wrong assumption as a general thing; on x86 it just mostly
      happened to be true (but to be honest, it'll also break if gcc does
      something fancy or if the linker uses a non-standard order).  Still, it's
      something we should at least document.
      
      Note 2: AFAIK it only broke the emacs *build*.  I'm not 100% sure about that
      (it IS 5 years ago), though.
      
      [ akpm@linux-foundation.org: deuglification ]
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
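      A sketch of the arch hook this implies (the function and helper names are
      assumptions, not confirmed by the text above): pick a page-aligned start in
      [brk, brk + 0x02000000), falling back to the unrandomized brk if the helper
      declines.

      unsigned long arch_randomize_brk(struct mm_struct *mm)
      {
              unsigned long range_end = mm->brk + 0x02000000;

              /* randomize_range() returns 0 on failure, a page-aligned value otherwise */
              return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
      }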
  6. 28 Jan 2008, 1 commit
  7. 26 Jan 2008, 3 commits
  8. 25 Jan 2008, 8 commits
  9. 24 Jan 2008, 1 commit
  10. 18 Jan 2008, 2 commits
  11. 15 Jan 2008, 3 commits
  12. 09 Jan 2008, 1 commit