1. 10 7月, 2007 1 次提交
  2. 09 6月, 2007 1 次提交
  3. 17 5月, 2007 1 次提交
    • C
      Remove SLAB_CTOR_CONSTRUCTOR · a35afb83
      Christoph Lameter 提交于
      SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a35afb83
  4. 08 5月, 2007 1 次提交
    • C
      slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Christoph Lameter 提交于
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      SLAB.
      
      I think its purpose was to have a callback after an object has been freed
      to verify that the state is the constructor state again?  The callback is
      performed before each freeing of an object.
      
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code object
      manipulation of the object.
      
      Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there would be code in a constructor
      handling SLAB_DEBUG_INITIAL then it would have to be conditional on
      SLAB_DEBUG otherwise it would just be dead code.  But there is no such code
      in the kernel.  I think SLUB_DEBUG_INITIAL is too problematic to make real
      use of, difficult to understand and there are easier ways to accomplish the
      same effect (i.e.  add debug code before kfree).
      
      There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would even be
      pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
      
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      50953fe9
  5. 29 3月, 2007 3 次提交
    • H
      [PATCH] holepunch: fix disconnected pages after second truncate · 16a10019
      Hugh Dickins 提交于
      shmem_truncate_range has its own truncate_inode_pages_range, to free any pages
      racily instantiated while it was in progress: a SHMEM_PAGEIN flag is set when
      this might have happened.  But holepunching gets no chance to clear that flag
      at the start of vmtruncate_range, so it's always set (unless a truncate came
      just before), so holepunch almost always does this second
      truncate_inode_pages_range.
      
      shmem holepunch has unlikely swap<->file races hereabouts whatever we do
      (without a fuller rework than is fit for this release): I was going to skip
      the second truncate in the punch_hole case, but Miklos points out that would
      make holepunch correctness more vulnerable to swapoff.  So keep the second
      truncate, but follow it by an unmap_mapping_range to eliminate the
      disconnected pages (freed from pagecache while still mapped in userspace) that
      it might have left behind.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      16a10019
    • H
      [PATCH] holepunch: fix shmem_truncate_range punch locking · 1ae70006
      Hugh Dickins 提交于
      Miklos Szeredi observes that during truncation of shmem page directories,
      info->lock is released to improve latency (after lowering i_size and
      next_index to exclude races); but this is quite wrong for holepunching, which
      receives no such protection from i_size or next_index, and is left vulnerable
      to races with shmem_unuse, shmem_getpage and shmem_writepage.
      
      Hold info->lock throughout when holepunching?  No, any user could prevent
      rescheduling for far too long.  Instead take info->lock just when needed: in
      shmem_free_swp when removing the swap entries, and whenever removing a
      directory page from the level above.  But so long as we remove before
      scanning, we can safely skip taking the lock at the lower levels, except at
      misaligned start and end of the hole.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1ae70006
    • H
      [PATCH] holepunch: fix shmem_truncate_range punching too far · a2646d1e
      Hugh Dickins 提交于
      Miklos Szeredi observes BUG_ON(!entry) in shmem_writepage() triggered in rare
      circumstances, because shmem_truncate_range() erroneously removes partially
      truncated directory pages at the end of the range: later reclaim on pages
      pointing to these removed directories triggers the BUG.  Indeed, and it can
      also cause data loss beyond the hole.
      
      Fix this as in the patch proposed by Miklos, but distinguish between "limit"
      (how far we need to search: ignore truncation's next_index optimization in the
      holepunch case - if there are races it's more consistent to act on the whole
      range specified) and "upper_limit" (how far we can free directory pages:
      generally we must be careful to keep partially punched pages, but can relax at
      end of file - i_size being held stable by i_mutex).
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: Miklos Szeredi <mszeredi@suse.cs>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a2646d1e
  6. 05 3月, 2007 1 次提交
  7. 02 3月, 2007 1 次提交
  8. 13 2月, 2007 1 次提交
  9. 12 2月, 2007 1 次提交
    • K
      [PATCH] simplify shmem_aops.set_page_dirty() method · 76719325
      Ken Chen 提交于
      shmem backed file does not have page writeback, nor it participates in
      backing device's dirty or writeback accounting.  So using generic
      __set_page_dirty_nobuffers() for its .set_page_dirty aops method is a bit
      overkill.  It unnecessarily prolongs shm unmap latency.
      
      For example, on a densely populated large shm segment (sevearl GBs), the
      unmapping operation becomes painfully long.  Because at unmap, kernel
      transfers dirty bit in PTE into page struct and to the radix tree tag.  The
      operation of tagging the radix tree is particularly expensive because it
      has to traverse the tree from the root to the leaf node on every dirty
      page.  What's bothering is that radix tree tag is used for page write back.
       However, shmem is memory backed and there is no page write back for such
      file system.  And in the end, we spend all that time tagging radix tree and
      none of that fancy tagging will be used.  So let's simplify it by introduce
      a new aops __set_page_dirty_no_writeback and this will speed up shm unmap.
      Signed-off-by: NKen Chen <kenchen@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      76719325
  10. 23 12月, 2006 1 次提交
  11. 09 12月, 2006 1 次提交
  12. 08 12月, 2006 3 次提交
  13. 21 10月, 2006 1 次提交
    • A
      [PATCH] separate bdi congestion functions from queue congestion functions · 3fcfab16
      Andrew Morton 提交于
      Separate out the concept of "queue congestion" from "backing-dev congestion".
      Congestion is a backing-dev concept, not a queue concept.
      
      The blk_* congestion functions are retained, as wrappers around the core
      backing-dev congestion functions.
      
      This proper layering is needed so that NFS can cleanly use the congestion
      functions, and so that CONFIG_BLOCK=n actually links.
      
      Cc: "Thomas Maier" <balagi@justmail.de>
      Cc: "Jens Axboe" <jens.axboe@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3fcfab16
  14. 17 10月, 2006 1 次提交
    • D
      [PATCH] knfsd: add nfs-export support to tmpfs · 91828a40
      David M. Grimes 提交于
      We need to encode a decode the 'file' part of a handle.  We simply use the
      inode number and generation number to construct the filehandle.
      
      The generation number is the time when the file was created.  As inode numbers
      cycle through the full 32 bits before being reused, there is no real chance of
      the same inum being allocated to different files in the same second so this is
      suitably unique.  Using time-of-day rather than e.g.  jiffies makes it less
      likely that the same filehandle can be created after a reboot.
      
      In order to be able to decode a filehandle we need to be able to lookup by
      inum, which means that the inode needs to be added to the inode hash table
      (tmpfs doesn't currently hash inodes as there is never a need to lookup by
      inum).  To avoid overhead when not exporting, we only hash an inode when it is
      first exported.  This requires a lock to ensure it isn't hashed twice.
      
      This code is separate from the patch posted in June06 from Atal Shargorodsky
      which provided the same functionality, but does borrow slightly from it.
      
      Locking comment: Most filesystems that hash their inodes do so at the point
      where the 'struct inode' is initialised, and that has suitable locking
      (I_NEW).  Here in shmem, we are hashing the inode later, the first time we
      need an NFS file handle for it.  We no longer have I_NEW to ensure only one
      thread tries to add it to the hash table.
      
      Cc: Atal Shargorodsky <atal@codefidence.com>
      Cc: Gilad Ben-Yossef <gilad@codefidence.com>
      Signed-off-by: NDavid M. Grimes <dgrimes@navisite.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Acked-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      91828a40
  15. 01 10月, 2006 2 次提交
  16. 30 9月, 2006 1 次提交
  17. 27 9月, 2006 2 次提交
  18. 26 9月, 2006 1 次提交
  19. 01 7月, 2006 2 次提交
  20. 29 6月, 2006 1 次提交
  21. 27 6月, 2006 2 次提交
  22. 23 6月, 2006 3 次提交
    • C
      [PATCH] migration: remove unnecessary PageSwapCache checks · 3c5a87f4
      Christoph Lameter 提交于
      Remove two unnecessary PageSwapCache checks.  The page refcount is raised
      and therefore page migration cannot occur in both functions.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3c5a87f4
    • D
      [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry · 726c3342
      David Howells 提交于
      Give the statfs superblock operation a dentry pointer rather than a superblock
      pointer.
      
      This complements the get_sb() patch.  That reduced the significance of
      sb->s_root, allowing NFS to place a fake root there.  However, NFS does
      require a dentry to use as a target for the statfs operation.  This permits
      the root in the vfsmount to be used instead.
      
      linux/mount.h has been added where necessary to make allyesconfig build
      successfully.
      
      Interest has also been expressed for use with the FUSE and XFS filesystems.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Nathan Scott <nathans@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      726c3342
    • D
      [PATCH] VFS: Permit filesystem to override root dentry on mount · 454e2398
      David Howells 提交于
      Extend the get_sb() filesystem operation to take an extra argument that
      permits the VFS to pass in the target vfsmount that defines the mountpoint.
      
      The filesystem is then required to manually set the superblock and root dentry
      pointers.  For most filesystems, this should be done with simple_set_mnt()
      which will set the superblock pointer and then set the root dentry to the
      superblock's s_root (as per the old default behaviour).
      
      The get_sb() op now returns an integer as there's now no need to return the
      superblock pointer.
      
      This patch permits a superblock to be implicitly shared amongst several mount
      points, such as can be done with NFS to avoid potential inode aliasing.  In
      such a case, simple_set_mnt() would not be called, and instead the mnt_root
      and mnt_sb would be set directly.
      
      The patch also makes the following changes:
      
       (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
           pointer argument and return an integer, so most filesystems have to change
           very little.
      
       (*) If one of the convenience function is not used, then get_sb() should
           normally call simple_set_mnt() to instantiate the vfsmount. This will
           always return 0, and so can be tail-called from get_sb().
      
       (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
           dcache upon superblock destruction rather than shrink_dcache_anon().
      
           This is required because the superblock may now have multiple trees that
           aren't actually bound to s_root, but that still need to be cleaned up. The
           currently called functions assume that the whole tree is rooted at s_root,
           and that anonymous dentries are not the roots of trees which results in
           dentries being left unculled.
      
           However, with the way NFS superblock sharing are currently set to be
           implemented, these assumptions are violated: the root of the filesystem is
           simply a dummy dentry and inode (the real inode for '/' may well be
           inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
           with child trees.
      
           [*] Anonymous until discovered from another tree.
      
       (*) The documentation has been adjusted, including the additional bit of
           changing ext2_* into foo_* in the documentation.
      
      [akpm@osdl.org: convert ipath_fs, do other stuff]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      454e2398
  23. 13 6月, 2006 2 次提交
  24. 09 6月, 2006 1 次提交
  25. 23 4月, 2006 1 次提交
    • L
      [PATCH] add migratepage address space op to shmem · 304dbdb7
      Lee Schermerhorn 提交于
      Basic problem: pages of a shared memory segment can only be migrated once.
      
      In 2.6.16 through 2.6.17-rc1, shared memory mappings do not have a
      migratepage address space op.  Therefore, migrate_pages() falls back to
      default processing.  In this path, it will try to pageout() dirty pages.
      Once a shared memory page has been migrated it becomes dirty, so
      migrate_pages() will try to page it out.  However, because the page count
      is 3 [cache + current + pte], pageout() will return PAGE_KEEP because
      is_page_cache_freeable() returns false.  This will abort all subsequent
      migrations.
      
      This patch adds a migratepage address space op to shared memory segments to
      avoid taking the default path.  We use the "migrate_page()" function
      because it knows how to migrate dirty pages.  This allows shared memory
      segment pages to migrate, subject to other conditions such as # pte's
      referencing the page [page_mapcount(page)], when requested.
      
      I think this is safe.  If we're migrating a shared memory page, then we
      found the page via a page table, so it must be in memory.
      
      Can be verified with memtoy and the shmem-mbind-test script, both
      available at:  http://free.linux.hp.com/~lts/Tools/Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      304dbdb7
  26. 22 3月, 2006 2 次提交
  27. 22 2月, 2006 1 次提交
    • H
      [PATCH] tmpfs: fix mount mpol nodelist parsing · b00dc3ad
      Hugh Dickins 提交于
      I've been dissatisfied with the mpol_nodelist mount option which was
      added to tmpfs earlier in -rc.  Replace it by mpol=policy:nodelist.
      
      And it was broken: a nodelist is a comma-separated list of numbers and
      ranges; the mount options are a comma-separated list of token=values.
      Whoops, blindly strsep'ing on commas doesn't work so well: since we've
      no numeric tokens, and unlikely to add them, use that to distinguish.
      
      Move the mpol= parsing to shmem_parse_mpol under CONFIG_NUMA, reject
      all its options as invalid if not NUMA.  /proc shows MPOL_PREFERRED
      as "prefer", so use that name for the policy instead of "preferred".
      
      Enforce that mpol=default has no nodelist; that mpol=prefer has one
      node only; that mpol=bind has a nodelist; but let mpol=interleave use
      node_online_map if no nodelist given.  Describe this in tmpfs.txt.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Acked-by: NRobin Holt <holt@sgi.com>
      Acked-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b00dc3ad
  28. 02 2月, 2006 1 次提交