1. 13 Jan 2011, 1 commit
  2. 11 Jan 2011, 1 commit
  3. 07 Jan 2011, 6 commits
    • fs: dcache per-bucket dcache hash locking · ceb5bdc2
      Nick Piggin authored
      We can turn the dcache hash locking from a global dcache_hash_lock into
      per-bucket locking.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
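      A minimal sketch of the per-bucket idea follows. It is illustrative
      only: the actual patch uses bit-locked hlist_bl list heads rather than
      a spinlock per bucket, but the locking pattern is the same.

          #include <linux/spinlock.h>
          #include <linux/list.h>

          #define DEMO_HASH_BITS  10
          #define DEMO_HASH_SIZE  (1 << DEMO_HASH_BITS)

          struct demo_bucket {
                  spinlock_t        lock;   /* protects only this chain */
                  struct hlist_head chain;
          };

          static struct demo_bucket demo_hashtable[DEMO_HASH_SIZE];

          static void demo_hash_init(void)
          {
                  int i;

                  for (i = 0; i < DEMO_HASH_SIZE; i++) {
                          spin_lock_init(&demo_hashtable[i].lock);
                          INIT_HLIST_HEAD(&demo_hashtable[i].chain);
                  }
          }

          static void demo_hash_add(struct demo_bucket *b, struct hlist_node *n)
          {
                  spin_lock(&b->lock);      /* contention is per bucket, not global */
                  hlist_add_head(n, &b->chain);
                  spin_unlock(&b->lock);
          }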
    • fs: provide rcu-walk aware permission i_ops · b74c79e9
      Nick Piggin authored
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: cache optimise dentry and inode for rcu-walk · 44a7d7a8
      Nick Piggin authored
      Put the dentry and inode fields at the top of their data structures.
      This allows RCU path traversal to perform an RCU dentry lookup in a
      path walk by touching only the first 56 bytes of the dentry.
      
      We also fit 8 bytes of inline name into the first 64 bytes, so for
      short names, only 64 bytes need to be touched to perform the lookup.
      If we later get rid of the hash->prev pointer from the first 64 bytes
      and fit 16 bytes of name in there instead, that would cover 81% rather
      than 32% of the names in the kernel tree.
      
      The inode is also rearranged so that an RCU lookup will only touch a
      single cacheline in the inode, plus one in the i_ops structure.
      
      This is important for directory component lookups in RCU path walking.
      In the kernel source, directory name lengths average around 6
      characters, so this works.
      
      When we reach the last element of the lookup, we need to lock it and
      take its refcount, which requires another cacheline access.
      
      Align the dentry and inode operations structs, so members will be at
      predictable offsets and we can group common operations into the head
      of the structure.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
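      A hedged sketch of the layout idea follows. The struct and field names
      are illustrative, not the kernel's actual dentry definition; the point
      is grouping the fields an RCU lookup reads into the first cacheline
      and aligning the structure so member offsets are predictable.

          struct demo_dentry {
                  /* first cacheline: everything the RCU hash walk reads */
                  unsigned int            d_flags;
                  struct demo_dentry      *d_parent;
                  unsigned int            d_hash;
                  unsigned int            d_name_len;
                  const unsigned char     *d_name;
                  unsigned char           d_iname[8];  /* short names inline */
                  /* colder fields spill into later cachelines */
                  void                    *d_fsdata;
          } __attribute__((aligned(64)));  /* predictable offsets */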
    • fs: avoid inode RCU freeing for pseudo fs · ff0c7d15
      Nick Piggin authored
      Pseudo filesystems that don't put their inodes on the RCU list, and
      whose inodes are not reachable via rcu-walk dentries, do not need to
      RCU-free their inodes.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
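      A sketch of the resulting shape, assuming a pseudo filesystem with its
      own inode cache; the cache name and the PSEUDO_I() container helper
      are hypothetical:

          #include <linux/fs.h>
          #include <linux/slab.h>

          struct pseudo_inode {
                  struct inode vfs_inode;  /* first, for container_of */
          };

          #define PSEUDO_I(i) container_of(i, struct pseudo_inode, vfs_inode)

          static struct kmem_cache *pseudo_inode_cachep;

          static void pseudo_destroy_inode(struct inode *inode)
          {
                  /* no rcu-walk reader can still see this inode, so free
                   * it immediately instead of deferring via call_rcu() */
                  kmem_cache_free(pseudo_inode_cachep, PSEUDO_I(inode));
          }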
    • fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin authored
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downside of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to the
      inability to reuse cache-hot slab objects. As iterations increase and
      RCU freeing starts kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (i.e. many inodes may be
      allocated during the average life span of a single inode), a lot of
      this cache reuse is not applicable, so the regression caused by this
      patch is smaller.
      
      The cache-hot regression could largely be avoided by using
      SLAB_DESTROY_BY_RCU; however, this adds some complexity to list
      walking and store-free path walking, so I prefer to implement it at a
      later date, if it is shown to be a win in real situations. I haven't
      found a regression in any non-micro benchmark, so I doubt it will be
      a problem.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
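      A minimal sketch of RCU-deferred inode freeing follows; the cache name
      is hypothetical, and the i_rcu field placement approximates the patch
      rather than quoting it:

          #include <linux/fs.h>
          #include <linux/slab.h>
          #include <linux/rcupdate.h>

          static struct kmem_cache *demo_inode_cachep;

          static void demo_i_callback(struct rcu_head *head)
          {
                  struct inode *inode = container_of(head, struct inode, i_rcu);

                  kmem_cache_free(demo_inode_cachep, inode);
          }

          static void demo_destroy_inode(struct inode *inode)
          {
                  /* defer the free until all rcu-walk readers are done */
                  call_rcu(&inode->i_rcu, demo_i_callback);
          }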
    • fs: dcache remove dcache_lock · b5c84bf6
      Nick Piggin authored
      dcache_lock no longer protects anything; remove it.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
  4. 02 Dec 2010, 1 commit
    • Call the filesystem back whenever a page is removed from the page cache · 6072d13c
      Linus Torvalds authored
      NFS needs to be able to release objects that are stored in the page
      cache once the page itself is no longer visible from the page cache.
      
      This patch adds a callback to the address space operations that allows
      filesystems to perform page cleanups once the page has been removed
      from the page cache.
      
      Original patch by: Linus Torvalds <torvalds@linux-foundation.org>
      [trondmy: cover the cases of invalidate_inode_pages2() and
                truncate_inode_pages()]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
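      A hedged sketch of what such a hook looks like for a filesystem; the
      callback name is illustrative, and the operation shown assumes the
      new entry this patch adds to struct address_space_operations:

          #include <linux/fs.h>
          #include <linux/pagemap.h>

          static void demo_freepage(struct page *page)
          {
                  /* the page is already gone from the page cache; release
                   * any filesystem-private state that was attached to it */
          }

          static const struct address_space_operations demo_aops = {
                  .freepage = demo_freepage,
          };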
  5. 25 Nov 2010, 1 commit
  6. 20 Nov 2010, 1 commit
  7. 31 Oct 2010, 2 commits
  8. 30 Oct 2010, 1 commit
  9. 29 Oct 2010, 7 commits
  10. 28 Oct 2010, 3 commits
    • fs: Add FITRIM ioctl · 367a51a3
      Lukas Czerner authored
      Adds a filesystem-independent ioctl to allow implementation of file
      system batched discard support. It takes an fstrim_range structure as
      an argument. fstrim_range is defined in include/linux/fs.h and its
      definition is as follows.
      
      struct fstrim_range {
      	__u64 start;
      	__u64 len;
      	__u64 minlen;
      };
      
      start	- first byte to trim
      len	- number of bytes to trim from start
      minlen	- minimum extent length to trim; free extents shorter than this
      	  number of bytes will be ignored. This will be rounded up to the fs
      	  block size.
      
      It is also possible to specify NULL as an argument. In this case the
      arguments are set as follows:
      
      start = 0;
      len = ULLONG_MAX;
      minlen = 0;
      
      So it will trim the whole file system in one run.
      
      After the FITRIM is done, the number of bytes actually discarded is
      stored in fstrim_range.len to give the user better insight into how
      much storage space has really been released for wear-leveling.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
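      A small userspace usage sketch, assuming a kernel and headers that
      expose FITRIM and struct fstrim_range via <linux/fs.h>; the mount
      point path is arbitrary:

          #include <fcntl.h>
          #include <limits.h>
          #include <stdio.h>
          #include <sys/ioctl.h>
          #include <linux/fs.h>

          int main(void)
          {
                  struct fstrim_range range = {
                          .start  = 0,
                          .len    = ULLONG_MAX,   /* whole filesystem */
                          .minlen = 0,
                  };
                  int fd = open("/mnt", O_RDONLY);

                  if (fd < 0 || ioctl(fd, FITRIM, &range) < 0) {
                          perror("FITRIM");
                          return 1;
                  }
                  /* the kernel writes back how much was actually discarded */
                  printf("trimmed %llu bytes\n", (unsigned long long)range.len);
                  return 0;
          }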
    • fasync: re-organize fasync entry insertion to allow it under a spinlock · f7347ce4
      Linus Torvalds authored
      You currently cannot use "fasync_helper()" in an atomic environment to
      insert a new fasync entry, because it will need to allocate the new
      "struct fasync_struct".
      
      Yet fcntl_setlease() wants to call this under lock_flocks(), which is in
      the process of being converted from the BKL to a spinlock.
      
      To fix this, this patch abstracts the actual fasync list insertion
      and the fasync allocation into functions of their own, and teaches
      fs/locks.c to pre-allocate the fasync_struct entry.  That way the
      actual list insertion can happen while holding the required spinlock.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [bfields@redhat.com: rebase on top of my changes to Arnd's patch]
      Tested-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
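      A hedged sketch of the pattern this enables; the surrounding function
      and the fapp argument are approximations, not the patch's exact call
      sites:

          #include <linux/fs.h>

          static int demo_setlease_fasync(int fd, struct file *filp,
                                          struct fasync_struct **fapp)
          {
                  struct fasync_struct *new = fasync_alloc(); /* may sleep */

                  if (!new)
                          return -ENOMEM;

                  lock_flocks();          /* spinlock: no sleeping from here */
                  if (fasync_insert_entry(fd, filp, fapp, new)) {
                          /* an entry for this fd already existed */
                          unlock_flocks();
                          fasync_free(new);   /* ours went unused */
                          return 0;
                  }
                  unlock_flocks();
                  return 1;
          }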
    • locks/nfsd: allocate file lock outside of spinlock · c5b1f0d9
      Arnd Bergmann authored
      As suggested by Christoph Hellwig, this moves the allocation of new
      file locks out of generic_setlease and into the callers,
      nfs4_open_delegation and fcntl_setlease, in order to allow GFP_KERNEL
      allocations once lock_flocks has become a spinlock.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: J. Bruce Fields <bfields@redhat.com>
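      A hedged sketch of the changed contract from a caller's point of view
      (error handling trimmed; helper names approximate the patch rather
      than quote it):

          struct file_lock *fl = locks_alloc_lock();  /* GFP_KERNEL, may
                                                         sleep: done before
                                                         any spinlock */
          if (!fl)
                  return -ENOMEM;
          locks_init_lock(fl);
          /* ... fill in lease type and owner ... */
          error = generic_setlease(filp, arg, &fl);   /* no allocation while
                                                         lock_flocks is held;
                                                         may consume fl */
          if (fl)
                  locks_free_lock(fl);                /* not consumed */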
  11. 27 Oct 2010, 3 commits
    • fs: allow for more than 2^31 files · 518de9b3
      Eric Dumazet authored
      Robin Holt tried to boot a 16TB system and found af_unix was
      overflowing a 32-bit value:
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553,
      which causes 2 * get_max_files() to overflow a signed integer.
      
      </quote>
      
      The fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use
      long integers, and to change af_unix to use an atomic_long_t instead
      of atomic_t.
      
      get_max_files() is changed to return an unsigned long.  get_nr_files() is
      changed to return a long.
      
      unix_nr_socks is changed from atomic_t to atomic_long_t, though this
      is not strictly needed to address Robin's problem.
      
      Before the patch (on a 64-bit kernel):
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      -18446744071562067968
      
      After the patch:
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      2147483648
      # cat /proc/sys/fs/file-nr
      704     0       2147483648
      Reported-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Miller <davem@davemloft.net>
      Reviewed-by: Robin Holt <holt@sgi.com>
      Tested-by: Robin Holt <holt@sgi.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
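      A hedged sketch of the type change; the struct shape matches the
      description above, but treat it as illustrative rather than a quote
      of the patch:

          struct files_stat_struct {
                  unsigned long nr_files;       /* read-only for userspace */
                  unsigned long nr_free_files;  /* read-only for userspace */
                  unsigned long max_files;      /* was a signed int:
                                                   2 * max_files could wrap */
          };

          static struct files_stat_struct files_stat;

          unsigned long get_max_files(void)    /* was int */
          {
                  return files_stat.max_files;
          }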
    • IMA: explicit IMA i_flag to remove global lock on inode_delete · 196f5181
      Eric Paris authored
      Currently, for every removed inode, IMA must take a global lock and
      search the IMA rbtree looking for an associated integrity structure.
      Instead, we now explicitly mark an inode when we add an integrity
      structure, so we only have to take the global lock and do the removal
      if one exists.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
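      A hedged sketch of the fast path this gives inode deletion, assuming
      the S_IMA i_flag the commit introduces; the lock and rbtree-removal
      helper names are hypothetical stand-ins:

          #include <linux/fs.h>
          #include <linux/spinlock.h>

          static DEFINE_SPINLOCK(demo_iint_lock);
          static void demo_iint_rb_erase(struct inode *inode); /* hypothetical */

          static void demo_ima_inode_free(struct inode *inode)
          {
                  if (!(inode->i_flags & S_IMA))   /* never had an iint */
                          return;
                  spin_lock(&demo_iint_lock);      /* global lock taken only
                                                      when an iint exists */
                  demo_iint_rb_erase(inode);
                  spin_unlock(&demo_iint_lock);
          }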
    • IMA: move read counter into struct inode · a178d202
      Eric Paris authored
      IMA currently allocates an inode integrity structure for every inode
      in core.  This structure is about 120 bytes long.  Most files, however
      (especially on a system which doesn't make use of IMA) will never need
      any of this space.  The problem is that if IMA is enabled we need to
      know the number of readers and the number of writers for every inode
      on the box.  At the moment we collect that information in the
      per-inode iint structure and waste the rest of the space.  This
      patch moves those counters into the struct inode so we can eventually
      stop allocating an IMA integrity structure except when absolutely
      needed.
      
      This patch does the minimum needed to move the location of the data.
      Further cleanups, especially the location of counter updates, may still
      be possible.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
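      A hedged sketch of the accounting this enables; the helper name is
      hypothetical and the field placement approximates the patch:

          #include <linux/fs.h>

          static void demo_track_read_open(struct inode *inode)
          {
          #ifdef CONFIG_IMA
                  /* the counter lives in struct inode itself, so no iint
                   * allocation is needed just to count readers */
                  atomic_inc(&inode->i_readcount);
          #endif
          }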
  12. 26 Oct 2010, 13 commits