1. 17 Oct 2007, 4 commits
  2. 20 Jul 2007, 5 commits
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt committed
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
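      The call-site change, in a minimal before/after sketch (the cache name, flags
      and constructor here are illustrative, not from the patch itself):

          /* before: a dtor pointer (almost always NULL) was the last argument */
          cache = kmem_cache_create("my_cache", sizeof(struct my_obj),
                                    0, SLAB_HWCACHE_ALIGN, my_ctor, NULL);
          /* after: the dtor argument is gone entirely */
          cache = kmem_cache_create("my_cache", sizeof(struct my_obj),
                                    0, SLAB_HWCACHE_ALIGN, my_ctor);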
    • mm: fault feedback #2 · 83c54070
      Nick Piggin committed
      This patch completes Linus's wish that the fault return codes be made into
      bit flags, which I agree makes everything nicer.  This requires all
      handle_mm_fault callers to be modified (possibly the modifications should go
      further and do things like fault accounting in handle_mm_fault --
      however that would be for another patch).
      
      [akpm@linux-foundation.org: fix alpha build]
      [akpm@linux-foundation.org: fix s390 build]
      [akpm@linux-foundation.org: fix sparc build]
      [akpm@linux-foundation.org: fix sparc64 build]
      [akpm@linux-foundation.org: fix ia64 build]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Bryan Wu <bryan.wu@analog.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Still apparently needs some ARM and PPC loving - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
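      A sketch of how an arch page-fault handler consumes the new bit-flag return
      value (fragment only; the goto labels and task accounting are illustrative,
      but the VM_FAULT_* bits are the ones this patch introduces):

          fault = handle_mm_fault(mm, vma, address, write);
          if (unlikely(fault & VM_FAULT_ERROR)) {
                  if (fault & VM_FAULT_OOM)
                          goto out_of_memory;     /* illustrative label */
                  else if (fault & VM_FAULT_SIGBUS)
                          goto do_sigbus;         /* illustrative label */
                  BUG();
          }
          if (fault & VM_FAULT_MAJOR)
                  tsk->maj_flt++;                 /* major fault accounting */
          else
                  tsk->min_flt++;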
    • mm: fault feedback #1 · d0217ac0
      Nick Piggin committed
      Change ->fault prototype.  We now return an int, which contains the
      VM_FAULT_xxx code in the low byte, and the FAULT_RET_xxx code in the next
      byte.  The FAULT_RET_ code tells the VM whether a page was found, whether it
      has been locked, and potentially other things.  This is not quite the way
      Linus wanted it yet, but that's changed in the next patch (which requires
      changes to arch code).
      
      This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
      that a page is locked, which requires filemap_nopage to go away (because we
      can no longer remain backward compatible without that flag), but we were
      going to do that anyway.
      
      struct fault_data is renamed to struct vm_fault as Linus asked. address
      is now a void __user * that we should firmly encourage drivers not to use
      without really good reason.
      
      The page is now returned via a page pointer in the vm_fault struct.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
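      A sketch of the renamed structure and the new hook as described above (field
      layout per the 2.6.23-era header; treat this as illustrative, not
      authoritative):

          struct vm_fault {
                  unsigned int flags;           /* FAULT_FLAG_xxx flags */
                  pgoff_t pgoff;                /* logical page offset based on vma */
                  void __user *virtual_address; /* faulting address; drivers should
                                                 * avoid this without good reason */
                  struct page *page;            /* set by the handler: the page is
                                                 * now returned via this pointer */
          };

          /* the hook in vm_operations_struct returns the packed
           * VM_FAULT_ / FAULT_RET_ bits described above: */
          int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);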
    • mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821
      Nick Piggin committed
      Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
      the virtual address -> file offset differently from linear mappings.
      
      ->populate is a layering violation because the filesystem/pagecache code
      should not need to know anything about the virtual memory mapping.  The
      hitch here is that the ->nopage handler didn't pass down enough information
      (ie.  pgoff).  But it is more logical to pass pgoff rather than have the
      ->nopage function calculate it itself anyway (because that's a similar
      layering violation).
      
      Having the populate handler install the pte itself is likewise a nasty thing
      to be doing.
      
      This patch introduces a new fault handler that replaces ->nopage and
      ->populate and (later) ->nopfn.  Most of the old mechanism is still in place
      so there is a lot of duplication and nice cleanups that can be removed if
      everyone switches over.
      
      The rationale for doing this in the first place is that nonlinear mappings are
      subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
      to duplicate the synchronisation logic rather than just consolidate the two.
      
      After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
      pagecache.  Seems like a fringe functionality anyway.
      
      NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
      users have hit mainline yet.
      
      [akpm@linux-foundation.org: cleanup]
      [randy.dunlap@oracle.com: doc. fixes for readahead]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
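      A hypothetical driver conversion, sketched: an old ->nopage handler that
      derived the file offset from the faulting address becomes a ->fault handler
      that simply reads vmf->pgoff (mydrv_lookup_page is an invented helper):

          static int mydrv_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
          {
                  /* pgoff arrives precomputed instead of being derived from
                   * the virtual address - the layering fix described above */
                  struct page *page = mydrv_lookup_page(vma->vm_file, vmf->pgoff);

                  if (!page)
                          return VM_FAULT_SIGBUS;
                  get_page(page);
                  vmf->page = page;       /* page handed back via the struct */
                  return 0;
          }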
    • mm: fix fault vs invalidate race for linear mappings · d00806b1
      Nick Piggin committed
      Fix the race between invalidate_inode_pages and do_no_page.
      
      Andrea Arcangeli identified a subtle race between invalidation of pages from
      pagecache with userspace mappings, and do_no_page.
      
      The issue is that invalidation has to shoot down all mappings to the page,
      before it can be discarded from the pagecache.  Between shooting down ptes to
      a particular page, and actually dropping the struct page from the pagecache,
      do_no_page from any process might fault on that page and establish a new
      mapping to the page just before it gets discarded from the pagecache.
      
      The most common case where such invalidation is used is in file truncation.
      This case was catered for by doing a sort of open-coded seqlock between the
      file's i_size, and its truncate_count.
      
      Truncation will decrease i_size, then increment truncate_count before
      unmapping userspace pages; do_no_page will read truncate_count, then find the
      page if it is within i_size, and then check truncate_count under the page
      table lock and back out and retry if it had subsequently been changed (ptl
      will serialise against unmapping, and ensure a potentially updated
      truncate_count is actually visible).
      
      Complexity and documentation issues aside, the locking protocol fails in the
      case where we would like to invalidate pagecache inside i_size.  do_no_page
      can come in anytime and filemap_nopage is not aware of the invalidation in
      progress (as it is when it is outside i_size).  The end result is that
      dangling (->mapping == NULL) pages that appear to be from a particular file
      may be mapped into userspace with nonsense data.  Valid mappings to the same
      place will see a different page.
      
      Andrea implemented two working fixes, one using a real seqlock, another using
      a page->flags bit.  He also proposed using the page lock in do_no_page, but
      that was initially considered too heavyweight.  However, it is not a global or
      per-file lock, and the page cacheline is modified in do_no_page to increment
      _count and _mapcount anyway, so a further modification should not be a large
      performance hit.  Scalability is not an issue.
      
      This patch implements this latter approach.  ->nopage implementations return
      with the page locked if it is possible for their underlying file to be
      invalidated (in that case, they must set a special vm_flags bit to indicate
      so).  do_no_page only unlocks the page after setting up the mapping
      completely.  Invalidation is excluded because it holds the page lock during
      invalidation of each page (and ensures that the page is not mapped while
      holding the lock).
      
      This also allows significant simplifications in do_no_page, because we have
      the page locked in the right place in the pagecache from the start.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
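      The resulting protocol, in a heavily simplified sketch (not the literal
      do_no_page code; variable names are illustrative):

          /* handler side: return the page locked if the file can be invalidated */
          page = vma->vm_ops->nopage(vma, address & PAGE_MASK, &ret);

          /* do_no_page side: install the pte first and unlock only afterwards,
           * so invalidation (which holds each page's lock) cannot slip in
           * between lookup and mapping */
          set_pte_at(mm, address, page_table, mk_pte(page, vma->vm_page_prot));
          unlock_page(page);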
  3. 18 Jul 2007, 2 commits
    • knfsd: exportfs: add exportfs.h header · a5694255
      Christoph Hellwig committed
      Currently the export_operations structure and helpers related to it are in
      fs.h.  fs.h is already far too large and there are very few places needing the
      export bits, so split them off into a separate header.
      
      [akpm@linux-foundation.org: fix cifs build]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: Steven French <sfrench@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
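      After the split, the few users pick up the header explicitly instead of
      getting the declarations via fs.h.  A sketch for a hypothetical filesystem
      (myfs_* names are invented):

          #include <linux/exportfs.h>

          static struct export_operations myfs_export_ops = {
                  .get_parent = myfs_get_parent,
          };

          /* wired up at mount time: sb->s_export_op = &myfs_export_ops; */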
    • Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated · 769848c0
      Mel Gorman committed
      It is often known at allocation time whether a page may be migrated or not.
      This patch adds a flag called __GFP_MOVABLE and a new mask called
      GFP_HIGH_MOVABLE.  Allocations using __GFP_MOVABLE can be either migrated
      using the page migration mechanism or reclaimed by syncing with backing
      storage and discarding.
      
      An API function very similar to alloc_zeroed_user_highpage() is added for
      __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable().  The
      flags used by alloc_zeroed_user_highpage() are not changed because it would
      change the semantics of an existing API.  After this patch is applied there
      are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
      be marked deprecated if this patch is merged.
      
      Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
      shmem.c to keep all flag modifications to inode->mapping in the
      shmem_dir_alloc() helper function.  This clean-up suggestion is courtesy of
      Hugh Dickins.
      
      Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
      concept.  Credit to Hugh Dickins for catching issues with shmem swap vector
      and ramfs allocations.
      
      [akpm@linux-foundation.org: build fix]
      [hugh@veritas.com: __GFP_ZERO cleanup]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
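      Illustrative call sites under the new flag (a sketch; GFP_HIGHUSER is the
      pre-existing user-page mask):

          /* flag a user page as movable at allocation time */
          page = alloc_page(GFP_HIGHUSER | __GFP_MOVABLE);

          /* or use the new helper for zeroed user highpages */
          page = alloc_zeroed_user_highpage_movable(vma, address);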
  4. 10 Jul 2007, 1 commit
  5. 09 Jun 2007, 1 commit
  6. 17 May 2007, 1 commit
    • Remove SLAB_CTOR_CONSTRUCTOR · a35afb83
      Christoph Lameter committed
      SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
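      The typical call-site simplification, sketched for a generic fs inode
      constructor (2.6.21-era ctor signature; myfs_inode_info is invented):

          static void init_once(void *p, struct kmem_cache *cachep,
                                unsigned long flags)
          {
                  struct myfs_inode_info *ei = p;

                  /* before this patch the body was guarded by
                   *   if (flags & SLAB_CTOR_CONSTRUCTOR)
                   * which, being always true, is simply dropped: */
                  inode_init_once(&ei->vfs_inode);
          }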
  7. 08 May 2007, 1 commit
    • slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Christoph Lameter committed
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      SLAB.
      
      I think its purpose was to have a callback after an object has been freed
      to verify that the state is the constructor state again?  The callback is
      performed before each freeing of an object.
      
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code that actually
      manipulates the object.
      
      Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there were code in a constructor
      handling SLAB_DEBUG_INITIAL then it would have to be conditional on
      SLAB_DEBUG, otherwise it would just be dead code.  But there is no such code
      in the kernel.  I think SLAB_DEBUG_INITIAL is too problematic to make real
      use of, difficult to understand, and there are easier ways to accomplish the
      same effect (i.e.  add debug code before kfree).
      
      There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would even be
      pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.
      
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
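      The "easier way" the text suggests, as a sketch: assert the invariants by
      hand just before freeing (the obj fields here are invented):

          /* check that the object is back in its constructor state... */
          BUG_ON(atomic_read(&obj->refcount) != 0);   /* hypothetical invariant */
          BUG_ON(!list_empty(&obj->list));            /* hypothetical invariant */
          /* ...then free it; no SLAB_DEBUG_INITIAL callback needed */
          kmem_cache_free(my_cache, obj);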
  8. 29 Mar 2007, 3 commits
    • [PATCH] holepunch: fix disconnected pages after second truncate · 16a10019
      Hugh Dickins committed
      shmem_truncate_range has its own truncate_inode_pages_range, to free any pages
      racily instantiated while it was in progress: a SHMEM_PAGEIN flag is set when
      this might have happened.  But holepunching gets no chance to clear that flag
      at the start of vmtruncate_range, so it's always set (unless a truncate came
      just before), so holepunch almost always does this second
      truncate_inode_pages_range.
      
      shmem holepunch has unlikely swap<->file races hereabouts whatever we do
      (without a fuller rework than is fit for this release): I was going to skip
      the second truncate in the punch_hole case, but Miklos points out that would
      make holepunch correctness more vulnerable to swapoff.  So keep the second
      truncate, but follow it by an unmap_mapping_range to eliminate the
      disconnected pages (freed from pagecache while still mapped in userspace) that
      it might have left behind.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
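      The fix in outline (a sketch of the sequence, not the literal shmem code;
      start/end are the byte bounds of the punched hole):

          /* second pass: catch pages racily instantiated during the punch... */
          truncate_inode_pages_range(mapping, start, end);
          /* ...then shoot down any userspace mappings over the hole, so no
           * disconnected (->mapping == NULL) pages stay mapped */
          unmap_mapping_range(mapping, start, end - start + 1, 0);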
    • [PATCH] holepunch: fix shmem_truncate_range punch locking · 1ae70006
      Hugh Dickins committed
      Miklos Szeredi observes that during truncation of shmem page directories,
      info->lock is released to improve latency (after lowering i_size and
      next_index to exclude races); but this is quite wrong for holepunching, which
      receives no such protection from i_size or next_index, and is left vulnerable
      to races with shmem_unuse, shmem_getpage and shmem_writepage.
      
      Hold info->lock throughout when holepunching?  No, any user could prevent
      rescheduling for far too long.  Instead take info->lock just when needed: in
      shmem_free_swp when removing the swap entries, and whenever removing a
      directory page from the level above.  But so long as we remove before
      scanning, we can safely skip taking the lock at the lower levels, except at
      misaligned start and end of the hole.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
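      The lock scoping described above, sketched (the call and fields are
      illustrative only; the real shmem_free_swp differs in detail):

          spin_lock(&info->lock);
          /* swap entries are removed, and a directory page is detached from
           * the level above, only while holding the lock... */
          freed += shmem_free_swp(ptr, ptr + count);
          spin_unlock(&info->lock);
          /* ...scanning below an already-removed level can run unlocked,
           * except at the misaligned start and end of the hole */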
    • [PATCH] holepunch: fix shmem_truncate_range punching too far · a2646d1e
      Hugh Dickins committed
      Miklos Szeredi observes BUG_ON(!entry) in shmem_writepage() triggered in rare
      circumstances, because shmem_truncate_range() erroneously removes partially
      truncated directory pages at the end of the range: later reclaim on pages
      pointing to these removed directories triggers the BUG.  Indeed, and it can
      also cause data loss beyond the hole.
      
      Fix this as in the patch proposed by Miklos, but distinguish between "limit"
      (how far we need to search: ignore truncation's next_index optimization in the
      holepunch case - if there are races it's more consistent to act on the whole
      range specified) and "upper_limit" (how far we can free directory pages:
      generally we must be careful to keep partially punched pages, but can relax at
      end of file - i_size being held stable by i_mutex).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 05 Mar 2007, 1 commit
  10. 02 Mar 2007, 1 commit
  11. 13 Feb 2007, 1 commit
  12. 12 Feb 2007, 1 commit
    • [PATCH] simplify shmem_aops.set_page_dirty() method · 76719325
      Ken Chen committed
      A shmem-backed file does not have page writeback, nor does it participate in
      the backing device's dirty or writeback accounting.  So using the generic
      __set_page_dirty_nobuffers() for its .set_page_dirty aops method is a bit of
      overkill.  It unnecessarily prolongs shm unmap latency.
      
      For example, on a densely populated large shm segment (several GBs), the
      unmapping operation becomes painfully long.  At unmap, the kernel transfers
      the dirty bit in the PTE into the page struct and to the radix tree tag.  The
      operation of tagging the radix tree is particularly expensive because it
      has to traverse the tree from the root to the leaf node on every dirty
      page.  What's bothersome is that the radix tree tag is used for page
      writeback; however, shmem is memory backed and there is no page writeback for
      such a filesystem.  In the end, we spend all that time tagging the radix tree
      and none of that fancy tagging will be used.  So let's simplify it by
      introducing a new aops method, __set_page_dirty_no_writeback; this will speed
      up shm unmap.
      Signed-off-by: Ken Chen <kenchen@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
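      A minimal sketch of the new method: mark the page dirty and nothing else --
      no radix tree tags, no bdi dirty accounting:

          static int __set_page_dirty_no_writeback(struct page *page)
          {
                  if (!PageDirty(page))
                          return !TestSetPageDirty(page);
                  return 0;
          }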
  13. 23 Dec 2006, 1 commit
  14. 09 Dec 2006, 1 commit
  15. 08 Dec 2006, 3 commits
  16. 21 Oct 2006, 1 commit
    • [PATCH] separate bdi congestion functions from queue congestion functions · 3fcfab16
      Andrew Morton committed
      Separate out the concept of "queue congestion" from "backing-dev congestion".
      Congestion is a backing-dev concept, not a queue concept.
      
      The blk_* congestion functions are retained, as wrappers around the core
      backing-dev congestion functions.
      
      This proper layering is needed so that NFS can cleanly use the congestion
      functions, and so that CONFIG_BLOCK=n actually links.
      
      Cc: "Thomas Maier" <balagi@justmail.de>
      Cc: "Jens Axboe" <jens.axboe@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
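      An illustrative non-block caller after the split (a sketch; the bdi_*
      predicates are the core functions the blk_* wrappers now sit on):

          /* NFS-style writeback throttling can ask the backing device
           * directly, with no request queue involved */
          if (bdi_write_congested(mapping->backing_dev_info))
                  return 0;   /* back off while the device is congested */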
  17. 17 Oct 2006, 1 commit
    • [PATCH] knfsd: add nfs-export support to tmpfs · 91828a40
      David M. Grimes committed
      We need to encode and decode the 'file' part of a handle.  We simply use the
      inode number and generation number to construct the filehandle.
      
      The generation number is the time when the file was created.  As inode numbers
      cycle through the full 32 bits before being reused, there is no real chance of
      the same inum being allocated to different files in the same second so this is
      suitably unique.  Using time-of-day rather than e.g.  jiffies makes it less
      likely that the same filehandle can be created after a reboot.
      
      In order to be able to decode a filehandle we need to be able to lookup by
      inum, which means that the inode needs to be added to the inode hash table
      (tmpfs doesn't currently hash inodes as there is never a need to lookup by
      inum).  To avoid overhead when not exporting, we only hash an inode when it is
      first exported.  This requires a lock to ensure it isn't hashed twice.
      
      This code is separate from the patch posted in June06 from Atal Shargorodsky
      which provided the same functionality, but does borrow slightly from it.
      
      Locking comment: Most filesystems that hash their inodes do so at the point
      where the 'struct inode' is initialised, and that has suitable locking
      (I_NEW).  Here in shmem, we are hashing the inode later, the first time we
      need an NFS file handle for it.  We no longer have I_NEW to ensure only one
      thread tries to add it to the hash table.
      
      Cc: Atal Shargorodsky <atal@codefidence.com>
      Cc: Gilad Ben-Yossef <gilad@codefidence.com>
      Signed-off-by: David M. Grimes <dgrimes@navisite.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
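      A sketch of the encoding described above (illustrative, not the exact tmpfs
      code): pack the generation (creation time) and the inode number into the
      handle words.

          static int shmem_encode_fh_sketch(struct dentry *dentry, __u32 *fh,
                                            int *len, int connectable)
          {
                  struct inode *inode = dentry->d_inode;

                  if (*len < 3)
                          return 255;                 /* handle buffer too small */
                  fh[0] = inode->i_generation;        /* creation time */
                  fh[1] = inode->i_ino;               /* low bits of the inum */
                  fh[2] = ((__u64)inode->i_ino) >> 32;
                  *len = 3;
                  return 1;                           /* illustrative fh_type */
          }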
  18. 01 Oct 2006, 2 commits
  19. 30 Sep 2006, 1 commit
  20. 27 Sep 2006, 2 commits
  21. 26 Sep 2006, 1 commit
  22. 01 Jul 2006, 2 commits
  23. 29 Jun 2006, 1 commit
  24. 27 Jun 2006, 2 commits