1. 27 Jan 2007 (4 commits)
  2. 23 Jan 2007 (1 commit)
  3. 13 Jan 2007 (1 commit)
  4. 12 Jan 2007 (2 commits)
    • [PATCH] NFS: Fix race in nfs_release_page() · e3db7691
      Committed by Trond Myklebust
          NFS: Fix race in nfs_release_page()
      
          invalidate_inode_pages2() may find the dirty bit has been set on a page
          owing to the fact that the page may still be mapped after it was locked.
          Only after the call to unmap_mapping_range() are we sure that the page
          can no longer be dirtied.
          In order to fix this, NFS has hooked the releasepage() method and tries
          to write the page out between the call to unmap_mapping_range() and the
          call to remove_mapping().  This, however, leads to deadlocks in the page
          reclaim code, where the page may be locked without holding a reference
          to the inode or dentry.
      
          Fix is to add a new address_space_operation, launder_page(), which will
          attempt to write out a dirty page without releasing the page lock.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

          Also, the bare SetPageDirty() can skew all sorts of accounting, leading
          to other nasties.

      [akpm@osdl.org: cleanup]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e3db7691
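      A hedged sketch of the new hook's shape as described above (the member name
      launder_page comes from the commit; the helper around it and its exact
      checks are our illustration):

      /* New entry in struct address_space_operations (sketch): */
      int (*launder_page)(struct page *page);     /* called with the page locked */

      /* Hypothetical caller in the invalidate path: write the dirty page out
       * without dropping the page lock, so it cannot be re-dirtied before
       * remove_mapping() runs. */
      static int do_launder_page(struct address_space *mapping, struct page *page)
      {
              if (!PageDirty(page))
                      return 0;
              if (page->mapping != mapping || !mapping->a_ops->launder_page)
                      return 0;
              return mapping->a_ops->launder_page(page);
      }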
    • [PATCH] Fix sparsemem on Cell · a2f3aa02
      Committed by Dave Hansen
      Fix an oops experienced on the Cell architecture when init-time functions,
      early_*(), are called at runtime.  It alters the call paths to make sure
      that the callers explicitly say whether the call is being made on behalf of
      a hotplug event or is happening at boot time.
      
      It has been compile tested on ppc64, ia64, s390, i386 and x86_64.
      Acked-by: Arnd Bergmann <arndb@de.ibm.com>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Acked-by: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a2f3aa02
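      A hedged sketch of the explicit boot-vs-hotplug flag the commit describes
      (names along these lines; treat them as illustrative):

      enum memmap_context {
              MEMMAP_EARLY,           /* boot-time initialisation           */
              MEMMAP_HOTPLUG,         /* memory added while the system runs */
      };

      /* Callers now state the context instead of letting early_*() guess. */
      void memmap_init_zone(unsigned long size, int nid, unsigned long zone,
                            unsigned long start_pfn, enum memmap_context context);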
  5. 09 Jan 2007 (1 commit)
    • [ARM] pass vma for flush_anon_page() · a6f36be3
      Committed by Russell King
      Since get_user_pages() may be used with processes other than the
      current process and calls flush_anon_page(), flush_anon_page() has to
      cope in some way with non-current processes.
      
      It may not be appropriate, or even desirable, to flush a region of
      virtual memory cache in the current process when that process is
      different from the one we want the flush to occur for.
      
      Therefore, pass the vma into flush_anon_page() so that the architecture
      can work out whether the 'vmaddr' is for the current process or not.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      a6f36be3
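      A sketch of the prototype change, assuming the generic no-op form that
      cache-coherent architectures provide:

      /* Previously: void flush_anon_page(struct page *page, unsigned long vmaddr);
       * The vma is now passed too, so an architecture can tell whether vmaddr
       * refers to the current process before flushing its caches. */
      static inline void flush_anon_page(struct vm_area_struct *vma,
                                         struct page *page, unsigned long vmaddr)
      {
              /* generic no-op; cache-incoherent architectures override this */
      }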
  6. 06 Jan 2007 (6 commits)
  7. 31 Dec 2006 (4 commits)
  8. 30 Dec 2006 (1 commit)
    • VM: Fix nasty and subtle race in shared mmap'ed page writeback · 7658cc28
      Committed by Linus Torvalds
      The VM layer (on the face of it, fairly reasonably) expected that when
      it does a ->writepage() call to the filesystem, it would write out the
      full page at that point in time.  Especially since it had earlier marked
      the whole page dirty with "set_page_dirty()".
      
      But that isn't actually the case: ->writepage() does not actually write
      a page, it writes the parts of the page that have been explicitly marked
      dirty before, *and* that had not got written out for other reasons since
      the last time we told it they were dirty.
      
      That last caveat is the important one.
      
      Which _most_ of the time ends up being the whole page (since we had
      called "set_page_dirty()" on the page earlier), but if the filesystem
      had done any dirty flushing of its own (for example, to honor some
      internal write ordering guarantees), it might end up doing only a
      partial page IO (or none at all) when ->writepage() is actually called.
      
      That is the correct thing in general (since we actually often _want_
      only the known-dirty parts of the page to be written out), but the
      shared dirty page handling had implicitly forgotten about these details,
      and had a number of cases where it was doing just the "->writepage()"
      part, without telling the low-level filesystem that the whole page might
      have been re-dirtied as part of being mapped writably into user space.
      
      Since most of the time the FS did actually write out the full page, we
      didn't notice this for a loong time, and this needed some really odd
      patterns to trigger.  But it caused occasional corruption with rtorrent
      and with the Debian "apt" database, because both use shared mmaps to
      update the end result.
      
      This fixes it. Finally. After way too much hair-pulling.
      Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Acked-by: Martin J. Bligh <mbligh@google.com>
      Acked-by: Martin Michlmayr <tbm@cyrius.com>
      Acked-by: Martin Johansson <martin@fatbob.nu>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Andrei Popa <andrei.popa@i-neo.ro>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Gordon Farquharson <gordonfarquharson@gmail.com>
      Cc: Guillaume Chazarain <guichaz@yahoo.fr>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Kenneth Chen <kenneth.w.chen@intel.com>
      Cc: Tobias Diedrich <ranma@tdiedrich.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7658cc28
  9. 24 Dec 2006 (1 commit)
  10. 23 Dec 2006 (7 commits)
  11. 22 Dec 2006 (2 commits)
    • [PATCH] truncate: clear page dirtiness before running try_to_free_buffers() · 3e67c098
      Committed by Andrew Morton
      truncate presently invalidates the dirty page's buffer_heads then shoots down
      the page.  But try_to_free_buffers() will now bail out because the page is
      dirty.
      
      Net effect: the LRU gets filled with dirty pages which have invalidated
      buffer_heads attached.  They have no ->mapping and hence cannot be cleaned.
      The machine leaks memory at an enormous rate.
      
      Fix this by cleaning the page before running try_to_free_buffers(), so
      try_to_free_buffers() can do its work.
      
      Also, remember to do dirty-page accounting in cancel_dirty_page() so the
      machine won't wedge up trying to write non-existent dirty pages.
      
      Probably still wrong, but now less so.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3e67c098
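      A hedged sketch of the ordering this fix establishes in the truncate path
      (simplified; not the verbatim kernel function):

      static void truncate_complete_page(struct address_space *mapping,
                                         struct page *page)
      {
              if (page->mapping != mapping)
                      return;

              /* Clear the dirty bit and its accounting first; otherwise
               * try_to_free_buffers() refuses to touch the dirty page and the
               * invalidated buffer_heads are never freed. */
              cancel_dirty_page(page, PAGE_CACHE_SIZE);

              if (PagePrivate(page))
                      do_invalidatepage(page, 0);  /* may free the buffer_heads */

              ClearPageUptodate(page);
              remove_from_page_cache(page);
              page_cache_release(page);
      }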
    • VM: Remove "clear_page_dirty()" and "test_clear_page_dirty()" functions · fba2591b
      Committed by Linus Torvalds
      They were horribly easy to mis-use because of their tempting naming, and
      they also did way more than any users of them generally wanted them to
      do.
      
      A dirty page can become clean under two circumstances:
      
       (a) when we write it out.  We have "clear_page_dirty_for_io()" for
           this, and that function remains unchanged.
      
           In the "for IO" case it is not sufficient to just clear the dirty
           bit, you also have to mark the page as being under writeback etc.
      
       (b) when we actually remove a page due to it becoming inaccessible to
           users, notably because it was truncate()'d away or the file (or
           metadata) no longer exists, and we thus want to cancel any
           outstanding dirty state.
      
      For the (b) case, we now introduce "cancel_dirty_page()", which only
      touches the page state itself, and verifies that the page is not mapped
      (since cancelling writes on a mapped page would be actively wrong as it
      is still accessible to users).
      
      Some filesystems need to be fixed up for this: CIFS, FUSE, JFS,
      ReiserFS, XFS all use the old confusing functions, and will be fixed
      separately in subsequent commits (with some of them just removing the
      offending logic, and others using clear_page_dirty_for_io()).
      
      This was confirmed by Martin Michlmayr to fix the apt database
      corruption on ARM.
      
      Cc: Martin Michlmayr <tbm@cyrius.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Andrei Popa <andrei.popa@i-neo.ro>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Gordon Farquharson <gordonfarquharson@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      fba2591b
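      A hedged sketch of what the new helper does for case (b); the accounting
      call shown is a simplification of the real bookkeeping:

      void cancel_dirty_page(struct page *page, unsigned int account_size)
      {
              /* cancelling writes on a mapped page would be actively wrong:
               * user space could still see and modify it */
              WARN_ON(page_mapped(page));

              if (TestClearPageDirty(page) && account_size &&
                  mapping_cap_account_dirty(page->mapping))
                      dec_zone_page_state(page, NR_FILE_DIRTY);
      }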
  12. 18 Dec 2006 (1 commit)
  13. 17 Dec 2006 (2 commits)
    • Fix up mm/mincore.c error value cases · 4fb23e43
      Committed by Linus Torvalds
      Hugh Dickins correctly points out that mincore() is actually _supposed_
      to fail on an unmapped hole in the user address space, rather than
      return valid ("empty") information about the hole.  This just simplifies
      the problem further (I had been misled by our previous confusing and
      complicated way of doing mincore()).
      
      Also, in the unlikely situation that we can't allocate a temporary
      kernel buffer, we should actually return EAGAIN, not ENOMEM, to keep the
      "unmapped hole" and "allocation failure" error cases separate.
      
      Finally, add a comment about our stupid historical lack of support for
      anonymous mappings.  I'll fix that if somebody reminds me after 2.6.20
      is out.
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4fb23e43
    • Fix incorrect user space access locking in mincore() · 2f77d107
      Committed by Linus Torvalds
      Doug Chapman noticed that mincore() will do a "copy_to_user()" of the
      result while holding the mmap semaphore for reading, which is a big
      no-no.  While a recursive read-lock on a semaphore in the case of a page
      fault happens to work, we don't actually allow them, due to deadlock
      scenarios with writers arising from fairness issues.
      
      Doug and Marcel sent in a patch to fix it, but I decided to just rewrite
      the mess instead - not just fixing the locking problem, but making the
      code smaller and (imho) much easier to understand.
      
      Cc: Doug Chapman <dchapman@redhat.com>
      Cc: Marcel Holtmann <holtmann@redhat.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      2f77d107
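      A hedged sketch of the locking pattern the rewrite follows (do_mincore, tmp
      and the surrounding variables are illustrative, not the exact new code):

      /* Gather results into a kernel buffer while holding mmap_sem, then drop
       * the lock before touching user space, so a fault in copy_to_user() can
       * never recurse on the semaphore. */
      down_read(&current->mm->mmap_sem);
      retval = do_mincore(start, tmp, min(pages, (unsigned long)PAGE_SIZE));
      up_read(&current->mm->mmap_sem);

      if (retval > 0 && copy_to_user(vec, tmp, retval))
              retval = -EFAULT;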
  14. 14 Dec 2006 (6 commits)
    • [PATCH] Pass vma argument to copy_user_highpage(). · 9de455b2
      Committed by Atsushi Nemoto
      To allow a more effective copy_user_highpage() on certain architectures,
      a vma argument is added to that function and to cow_user_page(), allowing
      the implementations of these functions to check for the VM_EXEC bit.

      The main part of this patch was originally written by Ralf Baechle;
      Atsushi Nemoto did the debugging.
      Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9de455b2
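      A sketch of the generic fallback with the new argument; an architecture
      override could consult vma->vm_flags & VM_EXEC to decide whether extra
      cache maintenance is needed (illustrative, not the exact kernel code):

      static inline void copy_user_highpage(struct page *to, struct page *from,
                                            unsigned long vaddr,
                                            struct vm_area_struct *vma)
      {
              char *vfrom = kmap_atomic(from, KM_USER0);
              char *vto   = kmap_atomic(to, KM_USER1);

              copy_user_page(vto, vfrom, vaddr, to);

              kunmap_atomic(vfrom, KM_USER0);
              kunmap_atomic(vto, KM_USER1);
              /* an arch version might add:
               *   if (vma->vm_flags & VM_EXEC) flush/invalidate the icache */
      }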
    • [PATCH] SLAB: use a multiply instead of a divide in obj_to_index() · 6a2d7a95
      Committed by Eric Dumazet
      When some objects are allocated by one CPU but freed by another CPU we can
      consume a lot of cycles doing divides in obj_to_index().
      
      (Typical load on a dual processor machine where network interrupts are
      handled by one particular CPU (allocating skbufs), and the other CPU is
      running the application (consuming and freeing skbufs))
      
      Here on one production server (dual-core AMD Opteron 285), I noticed this
      divide took 1.20 % of CPU_CLK_UNHALTED events in the kernel.  But Opterons
      are quite modern CPUs, and the divide is much more expensive on older
      architectures:
      
      On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1
      cycle for a multiply.
      
      Doing some math, we can use a reciprocal multiplication instead of a divide.
      
      If we want to compute V = (A / B)  (A and B being u32 quantities)
      we can instead use :
      
      V = ((u64)A * RECIPROCAL(B)) >> 32 ;
      
      where RECIPROCAL(B) is precalculated to ((1LL << 32) + (B - 1)) / B
      
      Note :
      
      I wrote pure C code for clarity. gcc output for i386 is not optimal but
      acceptable:
      
      mull   0x14(%ebx)
      mov    %edx,%eax // part of the >> 32
      xor     %edx,%edx // useless
      mov    %eax,(%esp) // could be avoided
      mov    %edx,0x4(%esp) // useless
      mov    (%esp),%ebx
      
      [akpm@osdl.org: small cleanups]
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6a2d7a95
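      A self-contained illustration of the reciprocal trick described above
      (plain user-space C with our own helper names; the kernel's helpers may
      differ):

      #include <stdint.h>
      #include <stdio.h>

      /* RECIPROCAL(B): precompute ((1LL << 32) + B - 1) / B once, at setup. */
      static uint32_t reciprocal_value(uint32_t b)
      {
              return (uint32_t)((((uint64_t)1 << 32) + b - 1) / b);
      }

      /* V = A / B becomes a multiply and a shift on the hot path. */
      static uint32_t reciprocal_divide(uint32_t a, uint32_t recip_b)
      {
              return (uint32_t)(((uint64_t)a * recip_b) >> 32);
      }

      int main(void)
      {
              uint32_t size   = 192;                      /* object size in the cache */
              uint32_t recip  = reciprocal_value(size);   /* stored once per cache    */
              uint32_t offset = 5 * size;                 /* obj address - slab base  */

              /* obj_to_index()-style computation without a divide instruction */
              printf("index = %u\n", reciprocal_divide(offset, recip));
              return 0;
      }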
    • [PATCH] cpuset: rework cpuset_zone_allowed api · 02a0e53d
      Committed by Paul Jackson
      Elaborate the API for calling cpuset_zone_allowed(), so that users have to
      explicitly choose between the two variants:
      
        cpuset_zone_allowed_hardwall()
        cpuset_zone_allowed_softwall()
      
      Until now, whether or not you got the hardwall flavor depended solely on
      whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask
      argument.
      
      If you didn't specify __GFP_HARDWALL, you implicitly got the softwall
      version.
      
      Unfortunately, this meant that users would end up with the softwall version
      without thinking about it.  Since only the softwall version might sleep,
      this led to bugs with possible sleeping in interrupt context on more than
      one occasion.
      
      The hardwall version requires that the current task's mems_allowed allows
      the node of the specified zone (or that you're in interrupt, or that
      __GFP_THISNODE is set, or that you're on a one-cpuset system.)

      The softwall version, depending on the gfp_mask, might allow a node if it
      was allowed in the nearest enclosing cpuset marked mem_exclusive (which
      requires taking the cpuset lock 'callback_mutex' to evaluate.)
      
      This patch removes the cpuset_zone_allowed() call, and forces the caller to
      explicitly choose between the hardwall and the softwall case.
      
      If the caller wants the gfp_mask to determine this choice, they should (1)
      be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the
      cpuset_zone_allowed_softwall() routine.
      
      This adds another 100 or 200 bytes to the kernel text space, due to the few
      lines of nearly duplicate code at the top of both cpuset_zone_allowed_*
      routines.  It should save a few instructions executed for the calls that
      turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to
      set (before the call) then check (within the call) the __GFP_HARDWALL flag.
      
      For the most critical call, from get_page_from_freelist(), the same
      instructions are executed as before -- the old cpuset_zone_allowed()
      routine it used to call is the same code as the
      cpuset_zone_allowed_softwall() routine that it calls now.
      
      Not a perfect win, but it seems worth it, to reduce the chance of hitting a
      sleeping-with-irqs-off complaint again.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      02a0e53d
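      A hedged illustration of the explicit choice a caller now makes; the two
      function names come from the commit, the helper wrapped around them is ours:

      static int zone_ok_for_this_alloc(struct zone *zone, gfp_t gfp_mask)
      {
              if (in_interrupt() || !(gfp_mask & __GFP_WAIT))
                      /* never sleeps; checks only the current task's mems_allowed */
                      return cpuset_zone_allowed_hardwall(zone, gfp_mask);

              /* may take callback_mutex and walk up to the nearest
               * mem_exclusive cpuset, so process context only */
              return cpuset_zone_allowed_softwall(zone, gfp_mask);
      }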
    • [PATCH] More slab.h cleanups · 55935a34
      Committed by Christoph Lameter
      More cleanups for slab.h
      
      1. Remove tabs from weird locations as suggested by Pekka
      
      2. Drop the check for NUMA and SLAB_DEBUG from the fallback section
         as suggested by Pekka.
      
      3. Use static inline for the fallback defs, as also suggested by Pekka.
      
      4. Make kmem_ptr_valid take a const * argument.
      
      5. Separate the NUMA fallback definitions from the kmalloc_track fallback
         definitions.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      55935a34
    • [PATCH] Cleanup slab headers / API to allow easy addition of new slab allocators · 2e892f43
      Committed by Christoph Lameter
      This is a response to an earlier discussion on linux-mm about splitting
      slab.h components per allocator.  Patch is against 2.6.19-git11.  See
      http://marc.theaimsgroup.com/?l=linux-mm&m=116469577431008&w=2
      
      This patch cleans up the slab header definitions.  We define the common
      functions of slob and slab in slab.h and put the extra definitions needed
      for slab's kmalloc implementations in <linux/slab_def.h>.  In order to get
      a greater set of common functions we add several empty functions to slob.c
      and also rename slob's kmalloc to __kmalloc.
      
      Slob does not need any special definitions since we introduce a fallback
      case.  If there is no need for a slab implementation to provide its own
      kmalloc mess^H^H^Hacros then we simply fall back to __kmalloc functions.
      That is sufficient for SLOB.
      
      Sort the functions in slab.h according to their functionality.  First the
      functions operating on struct kmem_cache *, then the kmalloc-related
      functions, followed by special debug and fallback definitions.
      
      Also redo a lot of comments.
      
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      2e892f43
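      Roughly the shape of the fallback being described (a hedged sketch, not the
      header verbatim):

      #ifdef CONFIG_SLAB
      #include <linux/slab_def.h>   /* SLAB supplies its own optimised kmalloc() */
      #else
      /* An allocator with no special definitions (e.g. SLOB) just falls back. */
      static inline void *kmalloc(size_t size, gfp_t flags)
      {
              return __kmalloc(size, flags);
      }
      #endif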
    • [PATCH] slab: fix sleeping in atomic bug · dd47ea75
      Committed by Christoph Lameter
      fallback_alloc() does not check for __GFP_WAIT the way cache_grow() does.
      Thus interrupts are still disabled when we call kmem_getpages(), which
      results in the failure.

      Duplicate the __GFP_WAIT handling from cache_grow().
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Jay Cliburn <jacliburn@bellsouth.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      dd47ea75
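      A sketch of the duplicated cache_grow()-style handling the fix describes:
      re-enable interrupts around the page allocation when the caller is allowed
      to sleep (simplified, not the verbatim patch):

      /* inside fallback_alloc(), before asking the page allocator directly */
      if (local_flags & __GFP_WAIT)
              local_irq_enable();              /* kmem_getpages() may sleep */

      obj = kmem_getpages(cache, local_flags, numa_node_id());

      if (local_flags & __GFP_WAIT)
              local_irq_disable();             /* restore slab's irq expectations */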
  15. 11 Dec 2006 (1 commit)