1. 14 10月, 2007 1 次提交
    • P
      lockdep: annotate dir vs file i_mutex · 14358e6d
      Peter Zijlstra 提交于
      On Mon, 2007-09-24 at 22:13 -0400, Steven Rostedt wrote:
      > The circular lock seems to be this:
      > 
      > #1:
      > 
      >   sys_mmap2:              down_write(&mm->mmap_sem);
      >   nfs_revalidate_mapping: mutex_lock(&inode->i_mutex);
      > 
      > 
      > #0:
      > 
      >   vfs_readdir:     mutex_lock(&inode->i_mutex);
      >    - during the readdir (filldir64), we take a user fault (missing page?)
      >     and call do_page_fault -
      >   do_page_fault:   down_read(&mm->mmap_sem);
      > 
      > 
      > So it does indeed look like a circular locking. Now the question is, "is
      > this a bug?".  Looking like the inode of #1 must be a file or something
      > else that you can mmap and the inode of #0 seems it must be a directory.
      > I would say "no".
      > 
      > Now if you can readdir on a file or mmap a directory, then this could be
      > an issue.
      > 
      > Otherwise, I'd love to see someone teach lockdep about this issue! ;-)
      
      Make a distinction between file and dir usage of i_mutex.
      The inode should be complete and unused at unlock_new_inode(), re-init
      i_mutex depending on its type.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      14358e6d
  2. 15 10月, 2007 1 次提交
  3. 20 7月, 2007 1 次提交
    • P
      mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt 提交于
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      20c2df83
  4. 18 7月, 2007 2 次提交
    • R
      mm: clean up and kernelify shrinker registration · 8e1f936b
      Rusty Russell 提交于
      I can never remember what the function to register to receive VM pressure
      is called.  I have to trace down from __alloc_pages() to find it.
      
      It's called "set_shrinker()", and it needs Your Help.
      
      1) Don't hide struct shrinker.  It contains no magic.
      2) Don't allocate "struct shrinker".  It's not helpful.
      3) Call them "register_shrinker" and "unregister_shrinker".
      4) Call the function "shrink" not "shrinker".
      5) Reduce the 17 lines of waffly comments to 13, but document it properly.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8e1f936b
    • M
      Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated · 769848c0
      Mel Gorman 提交于
      It is often known at allocation time whether a page may be migrated or not.
      This patch adds a flag called __GFP_MOVABLE and a new mask called
      GFP_HIGH_MOVABLE.  Allocations using the __GFP_MOVABLE can be either migrated
      using the page migration mechanism or reclaimed by syncing with backing
      storage and discarding.
      
      An API function very similar to alloc_zeroed_user_highpage() is added for
      __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable().  The
      flags used by alloc_zeroed_user_highpage() are not changed because it would
      change the semantics of an existing API.  After this patch is applied there
      are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
      be marked deprecated if this patch is merged.
      
      Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
      shmem.c to keep all flag modifications to inode->mapping in the
      shmem_dir_alloc() helper function.  This clean-up suggestion is courtesy of
      Hugh Dickens.
      
      Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
      concept.  Credit to Hugh Dickens for catching issues with shmem swap vector
      and ramfs allocations.
      
      [akpm@linux-foundation.org: build fix]
      [hugh@veritas.com: __GFP_ZERO cleanup]
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      769848c0
  5. 17 5月, 2007 1 次提交
    • C
      Remove SLAB_CTOR_CONSTRUCTOR · a35afb83
      Christoph Lameter 提交于
      SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a35afb83
  6. 09 5月, 2007 4 次提交
    • J
      inode numbering: make static counters in new_inode and iunique be 32 bits · 866b04fc
      Jeff Layton 提交于
      The problems are:
      
      - on filesystems w/o permanent inode numbers, i_ino values can be larger
        than 32 bits, which can cause problems for some 32 bit userspace programs on
        a 64 bit kernel.  We can't do anything for filesystems that have actual
        >32-bit inode numbers, but on filesystems that generate i_ino values on the
        fly, we should try to have them fit in 32 bits.  We could trivially fix this
        by making the static counters in new_inode and iunique 32 bits, but...
      
      - many filesystems call new_inode and assume that the i_ino values they are
        given are unique.  They are not guaranteed to be so, since the static
        counter can wrap.  This problem is exacerbated by the fix for #1.
      
      - after allocating a new inode, some filesystems call iunique to try to get
        a unique i_ino value, but they don't actually add their inodes to the
        hashtable, and so they're still not guaranteed to be unique if that counter
        wraps.
      
      This patch set takes the simpler approach of simply using iunique and hashing
      the inodes afterward.  Christoph H.  previously mentioned that he thought that
      this approach may slow down lookups for filesystems that currently hash their
      inodes.
      
      The questions are:
      
      1) how much would this slow down lookups for these filesystems?
      2) is it enough to justify adding more infrastructure to avoid it?
      
      What might be best is to start with this approach and then only move to using
      IDR or some other scheme if these extra inodes in the hashtable prove to be
      problematic.
      
      I've done some cursory testing with this patch and the overhead of hashing and
      unhashing the inodes with pipefs is pretty low -- just a few seconds of system
      time added on to the creation and destruction of 10 million pipes (very
      similar to the overhead that the IDR approach would add).
      
      The hard thing to measure is what effect this has on other filesystems. I'm
      open to ways to try and gauge this.
      
      Again, I've only converted pipefs as an example. If this approach is
      acceptable then I'll start work on patches to convert other filesystems.
      
      With a pretty-much-worst-case microbenchmark provided by Eric Dumazet
      <dada1@cosmosbay.com>:
      
      hashing patch (pipebench):
      sys     1m15.329s
      sys     1m16.249s
      sys     1m17.169s
      
      unpatched (pipebench):
      sys     1m9.836s
      sys     1m12.541s
      sys     1m14.153s
      
      Which works out to 1.05642174294555027017.  So ~5-6% slowdown.
      
      This patch:
      
      When a 32-bit program that was not compiled with large file offsets does a
      stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
      (correctly) generates an EOVERFLOW error.  We can't do anything about fs's
      with larger permanent inode numbers, but when we generate them on the fly, we
      ought to try and have them fit within a 32 bit field.
      
      This patch takes the first step toward this by making the static counters in
      these two functions be 32 bits.
      
      [jlayton@redhat.com: mention that it's only the case for 32bit, non-LFS stat]
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      866b04fc
    • P
      Introduce a handy list_first_entry macro · b5e61818
      Pavel Emelianov 提交于
      There are many places in the kernel where the construction like
      
         foo = list_entry(head->next, struct foo_struct, list);
      
      are used.
      The code might look more descriptive and neat if using the macro
      
         list_first_entry(head, type, member) \
                   list_entry((head)->next, type, member)
      
      Here is the macro itself and the examples of its usage in the generic code.
       If it will turn out to be useful, I can prepare the set of patches to
      inject in into arch-specific code, drivers, networking, etc.
      Signed-off-by: NPavel Emelianov <xemul@openvz.org>
      Signed-off-by: NKirill Korotaev <dev@openvz.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b5e61818
    • J
      make iunique use a do/while loop rather than its obscure goto loop · 3361c7be
      Jeffrey Layton 提交于
      A while back, Christoph mentioned that he thought that iunique ought to be
      cleaned up to use a more conventional loop construct. This patch does that,
      turning the strange goto loop into a do/while.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3361c7be
    • C
      vfs: remove superflous sb == NULL checks · acb0c854
      Christoph Hellwig 提交于
      inode->i_sb is always set, not need to check for it.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      acb0c854
  7. 08 5月, 2007 1 次提交
    • C
      slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Christoph Lameter 提交于
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      SLAB.
      
      I think its purpose was to have a callback after an object has been freed
      to verify that the state is the constructor state again?  The callback is
      performed before each freeing of an object.
      
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code object
      manipulation of the object.
      
      Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there would be code in a constructor
      handling SLAB_DEBUG_INITIAL then it would have to be conditional on
      SLAB_DEBUG otherwise it would just be dead code.  But there is no such code
      in the kernel.  I think SLUB_DEBUG_INITIAL is too problematic to make real
      use of, difficult to understand and there are easier ways to accomplish the
      same effect (i.e.  add debug code before kfree).
      
      There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would even be
      pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
      
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      50953fe9
  8. 13 2月, 2007 2 次提交
  9. 12 2月, 2007 3 次提交
  10. 14 12月, 2006 2 次提交
  11. 09 12月, 2006 1 次提交
  12. 08 12月, 2006 3 次提交
  13. 20 10月, 2006 1 次提交
    • M
      [PATCH] Take i_mutex in splice_from_pipe() · 62752ee1
      Mark Fasheh 提交于
      The splice_actor may be calling ->prepare_write() and ->commit_write(). We
      want i_mutex on the inode being written to before calling those so that we
      don't race i_size changes.
      
      The double locking behavior is done elsewhere in splice.c, and if we
      eventually want _nolock variants of generic_file_splice_write(), fs modules
      might have to replicate the nasty locking code. We introduce
      inode_double_lock() and inode_double_unlock() to consolidate the locking
      rules into one set of functions.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      62752ee1
  14. 11 10月, 2006 1 次提交
  15. 02 10月, 2006 1 次提交
  16. 01 10月, 2006 1 次提交
  17. 30 9月, 2006 1 次提交
    • A
      [PATCH] fs.h: ifdef security fields · 50462062
      Alexey Dobriyan 提交于
      [assuming BSD security levels are deleted]
      The only user of i_security, f_security, s_security fields is SELinux,
      however, quite a few security modules are trying to get into kernel.
      So, wrap them under CONFIG_SECURITY. Adding config option for each
      security field is likely an overkill.
      
      Following Stephen Smalley's suggestion, i_security initialization is
      moved to security_inode_alloc() to not clutter core code with ifdefs
      and make alloc_inode() codepath tiny little bit smaller and faster.
      
      The user of (highly greppable) struct fown_struct::security field is
      still to be found. I've checked every "fown_struct" and every "f_owner"
      occurence. Additionally it's removal doesn't break i386 allmodconfig
      build.
      
      struct inode, struct file, struct super_block, struct fown_struct
      become smaller.
      
      P.S. Combined with two reiserfs inode shrinking patches sent to
      linux-fsdevel, I can finally suck 12 reiserfs inodes into one page.
      
      		/proc/slabinfo
      
      	-ext2_inode_cache	388	10
      	+ext2_inode_cache	384	10
      	-inode_cache		280	14
      	+inode_cache		276	14
      	-proc_inode_cache	296	13
      	+proc_inode_cache	292	13
      	-reiser_inode_cache	336	11
      	+reiser_inode_cache	332	12 <=
      	-shmem_inode_cache	372	10
      	+shmem_inode_cache	368	10
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      50462062
  18. 27 9月, 2006 3 次提交
  19. 01 7月, 2006 2 次提交
  20. 29 6月, 2006 1 次提交
  21. 02 4月, 2006 1 次提交
  22. 29 3月, 2006 1 次提交
  23. 27 3月, 2006 1 次提交
  24. 26 3月, 2006 1 次提交
  25. 24 3月, 2006 1 次提交
    • P
      [PATCH] cpuset memory spread slab cache hooks · b0196009
      Paul Jackson 提交于
      Change the kmem_cache_create calls for certain slab caches to support cpuset
      memory spreading.
      
      See the previous patches, cpuset_mem_spread, for an explanation of cpuset
      memory spreading, and cpuset_mem_spread_slab_cache for the slab cache support
      for memory spreading.
      
      The slab caches marked for now are: dentry_cache, inode_cache, some xfs slab
      caches, and buffer_head.  This list may change over time.  In particular,
      other file system types that are used extensively on large NUMA systems may
      want to allow for spreading their directory and inode slab cache entries.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b0196009
  26. 23 3月, 2006 2 次提交