1. 07 Jan, 2011 3 commits
    • fs: avoid inode RCU freeing for pseudo fs · ff0c7d15
      Submitted by Nick Piggin
      Pseudo filesystems that don't put inodes on an RCU list, or make them
      reachable by rcu-walk dentries, do not need to RCU free their inodes.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      ff0c7d15
    • fs: icache RCU free inodes · fa0d7e3d
      Submitted by Nick Piggin
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downside of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (i.e. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      fa0d7e3d
    • fs: dcache remove dcache_lock · b5c84bf6
      Submitted by Nick Piggin
      dcache_lock no longer protects anything. Remove it.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      b5c84bf6
  2. 02 Dec, 2010 1 commit
    • Call the filesystem back whenever a page is removed from the page cache · 6072d13c
      Submitted by Linus Torvalds
      NFS needs to be able to release objects that are stored in the page
      cache once the page itself is no longer visible from the page cache.
      
      This patch adds a callback to the address space operations that allows
      filesystems to perform page cleanups once the page has been removed
      from the page cache.
      
      Original patch by: Linus Torvalds <torvalds@linux-foundation.org>
      [trondmy: cover the cases of invalidate_inode_pages2() and
                truncate_inode_pages()]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      6072d13c
  3. 25 Nov, 2010 1 commit
  4. 20 Nov, 2010 1 commit
  5. 31 Oct, 2010 2 commits
  6. 30 Oct, 2010 1 commit
  7. 29 Oct, 2010 7 commits
  8. 28 Oct, 2010 3 commits
    • fs: Add FITRIM ioctl · 367a51a3
      Submitted by Lukas Czerner
      Adds a filesystem-independent ioctl to allow implementation of file
      system batched discard support. It takes an fstrim_range structure as an
      argument. fstrim_range is defined in include/linux/fs.h and its
      definition is as follows.
      
      struct fstrim_range {
      	__u64 start;
      	__u64 len;
      	__u64 minlen;
      };
      
      start	- first Byte to trim
      len	- number of Bytes to trim from start
      minlen	- minimum extent length to trim, free extents shorter than this
      	  number of Bytes will be ignored. This will be rounded up to fs
      	  block size.
      
      It is also possible to specify NULL as an argument. In this case the
      fields default to the following values:
      
      start = 0;
      len = ULLONG_MAX;
      minlen = 0;
      
      With these values the whole file system is trimmed in one run.
      
      After the FITRIM is done, the number of actually discarded Bytes is stored
      in fstrim_range.len to give the user better insight on how much storage
      space has been really released for wear-leveling.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      367a51a3
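The interface described in this commit can be exercised from user space roughly as follows. This is a minimal sketch, not code from the patch: `whole_fs_range()` and `trim_filesystem()` are hypothetical helper names, and the caller is assumed to hold an open file descriptor on the mounted filesystem. Only `struct fstrim_range` and `FITRIM` come from `<linux/fs.h>`.

```c
#include <assert.h>
#include <limits.h>
#include <linux/fs.h>    /* struct fstrim_range, FITRIM */
#include <stddef.h>
#include <sys/ioctl.h>

/* Hypothetical helper: a range covering the whole filesystem, using the
 * same values the commit message lists as defaults for a NULL argument. */
static struct fstrim_range whole_fs_range(void)
{
	struct fstrim_range r;

	r.start = 0;            /* first byte to trim */
	r.len = ULLONG_MAX;     /* number of bytes to trim from start */
	r.minlen = 0;           /* do not skip short free extents */
	return r;
}

/* Hypothetical helper: issue FITRIM on an already-open fd.
 * Returns 0 and stores the number of discarded bytes in *trimmed,
 * or returns -1 with errno set if the ioctl fails. */
static int trim_filesystem(int fd, unsigned long long *trimmed)
{
	struct fstrim_range r = whole_fs_range();

	assert(trimmed != NULL);
	if (ioctl(fd, FITRIM, &r) < 0)
		return -1;
	*trimmed = r.len;       /* the kernel writes the discarded byte count back into len */
	return 0;
}
```

A caller would typically open the mount point read-only, pass the descriptor to `trim_filesystem()`, and report `*trimmed`. Whether the ioctl succeeds depends on the filesystem and device supporting discard.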
    • fasync: re-organize fasync entry insertion to allow it under a spinlock · f7347ce4
      Submitted by Linus Torvalds
      You currently cannot use "fasync_helper()" in an atomic environment to
      insert a new fasync entry, because it will need to allocate the new
      "struct fasync_struct".
      
      Yet fcntl_setlease() wants to call this under lock_flocks(), which is in
      the process of being converted from the BKL to a spinlock.
      
      In order to fix this, this abstracts out the actual fasync list
      insertion and the fasync allocations into functions of their own, and
      teaches fs/locks.c to pre-allocate the fasync_struct entry.  That way
      the actual list insertion can happen while holding the required
      spinlock.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bfields@redhat.com: rebase on top of my changes to Arnd's patch]
      Tested-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      f7347ce4
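The pattern this commit describes, allocating before taking the lock so that the insertion itself never has to sleep, can be illustrated in user space. The sketch below is an analogy only: `struct entry`, `entry_alloc()` and `entry_insert()` are hypothetical stand-ins rather than the kernel functions, and a C11 `atomic_flag` plays the role of the spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct entry {                 /* stand-in for struct fasync_struct */
	int fd;
	struct entry *next;
};

static atomic_flag list_lock = ATOMIC_FLAG_INIT;   /* toy spinlock */
static struct entry *list_head;

static void lock(void)
{
	while (atomic_flag_test_and_set_explicit(&list_lock, memory_order_acquire))
		;                      /* spin until the flag is released */
}

static void unlock(void)
{
	atomic_flag_clear_explicit(&list_lock, memory_order_release);
}

/* Step 1: allocate while no lock is held, where blocking is allowed. */
static struct entry *entry_alloc(void)
{
	return calloc(1, sizeof(struct entry));
}

/* Step 2: insert under the lock; nothing in here allocates.
 * Returns NULL if `fresh` was linked into the list, or `fresh` itself
 * if fd was already present, so the caller can free the unused entry. */
static struct entry *entry_insert(int fd, struct entry *fresh)
{
	struct entry *e;

	assert(fresh != NULL);
	lock();
	for (e = list_head; e; e = e->next) {
		if (e->fd == fd) {
			unlock();
			return fresh;
		}
	}
	fresh->fd = fd;
	fresh->next = list_head;
	list_head = fresh;
	unlock();
	return NULL;
}
```

The caller pays for one possibly wasted allocation when the entry already exists, in exchange for a critical section that only walks and links the list.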
    • locks/nfsd: allocate file lock outside of spinlock · c5b1f0d9
      Submitted by Arnd Bergmann
      As suggested by Christoph Hellwig, this moves allocation
      of new file locks out of generic_setlease into the
      callers, nfs4_open_delegation and fcntl_setlease in order
      to allow GFP_KERNEL allocations when lock_flocks has
      become a spinlock.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: J. Bruce Fields <bfields@redhat.com>
      c5b1f0d9
  9. 27 Oct, 2010 3 commits
    • fs: allow for more than 2^31 files · 518de9b3
      Submitted by Eric Dumazet
      Robin Holt tried to boot a 16TB system and found af_unix was overflowing
      a 32-bit value:
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
      This causes 2 * get_max_files() to integer overflow.
      
      </quote>
      
      Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
      integers, and change af_unix to use an atomic_long_t instead of atomic_t.
      
      get_max_files() is changed to return an unsigned long.  get_nr_files() is
      changed to return a long.
      
      unix_nr_socks is changed from atomic_t to atomic_long_t, although this is
      not strictly needed to address Robin's problem.
      
      Before patch (on a 64-bit kernel):
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      -18446744071562067968
      
      After patch:
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      2147483648
      # cat /proc/sys/fs/file-nr
      704     0       2147483648
      Reported-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Miller <davem@davemloft.net>
      Reviewed-by: Robin Holt <holt@sgi.com>
      Tested-by: Robin Holt <holt@sgi.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      518de9b3
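The arithmetic in Robin Holt's report can be checked with a small sketch. It assumes a 4096-byte page size (not stated in the report) and does the computation in 64 bits so that the sketch itself never overflows; `max_files_for()` and `doubled_limit_overflows_int32()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the files_init() formula quoted in the report:
 *   n = (mempages * (PAGE_SIZE / 1024)) / 10
 * page_size is in bytes; 64-bit math keeps the intermediate product exact. */
static int64_t max_files_for(int64_t mempages, int64_t page_size)
{
	return (mempages * (page_size / 1024)) / 10;
}

/* Returns 1 if 2 * max_files does not fit in a 32-bit signed integer,
 * which is the situation the unix_create1() comparison ran into. */
static int doubled_limit_overflows_int32(int64_t max_files)
{
	assert(max_files >= 0);
	return 2 * max_files > INT32_MAX;
}
```

With mempages = 3,758,096,384 and 4 KiB pages, `max_files_for()` yields 1,503,238,553, matching the figure in the report. Doubling that gives 3,006,477,106, which exceeds the 32-bit signed maximum of 2,147,483,647.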
    • IMA: explicit IMA i_flag to remove global lock on inode_delete · 196f5181
      Submitted by Eric Paris
      Currently for every removed inode IMA must take a global lock and search
      the IMA rbtree looking for an associated integrity structure.  Instead
      we explicitly mark an inode when we add an integrity structure so we
      only have to take the global lock and do the removal if it exists.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      196f5181
    • IMA: move read counter into struct inode · a178d202
      Submitted by Eric Paris
      IMA currently allocates an inode integrity structure for every inode in
      core.  This structure is about 120 bytes long.  Most files however
      (especially on a system which doesn't make use of IMA) will never need
      any of this space.  The problem is that if IMA is enabled we need to
      know information about the number of readers and the number of writers
      for every inode on the box.  At the moment we collect that information
      in the per inode iint structure and waste the rest of the space.  This
      patch moves those counters into the struct inode so we can eventually
      stop allocating an IMA integrity structure except when absolutely
      needed.
      
      This patch does the minimum needed to move the location of the data.
      Further cleanups, especially the location of counter updates, may still
      be possible.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a178d202
  10. 26 Oct, 2010 13 commits
  11. 05 Oct, 2010 1 commit
    • fs/locks.c: prepare for BKL removal · b89f4321
      Submitted by Arnd Bergmann
      This prepares the removal of the big kernel lock from the
      file locking code. We still use the BKL as long as fs/lockd
      uses it and ceph might sleep, but we can flip the definition
      to a private spinlock as soon as that's done.
      All users outside of fs/lockd get converted to use
      lock_flocks() instead of lock_kernel() where appropriate.
      
      Based on an earlier patch from Matthew Wilcox to use a spinlock.
      He has attempted this a few times before; the earliest patch,
      from over 10 years ago, turned the lock into a semaphore, which
      ended up being slower than the BKL and was subsequently reverted.
      
      Someone should do some serious performance testing when
      this becomes a spinlock, since this has caused problems
      before. Using a spinlock should be at least as good
      as the BKL in theory, but who knows...
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Matthew Wilcox <willy@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      b89f4321
  12. 22 Sep, 2010 1 commit
  13. 16 Sep, 2010 1 commit
  14. 10 Sep, 2010 2 commits