1. 07 Jan, 2011 3 commits
    • fs: provide rcu-walk aware permission i_ops · b74c79e9
      Nick Piggin committed
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin committed
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downside of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • CIFS: Simplify ipv*_connect functions into one (try #4) · a9f1b85e
      Pavel Shilovsky committed
      Make connect logic more ip-protocol independent and move RFC1001 stuff into
      a separate function. Also replace union addr in TCP_Server_Info structure
      with sockaddr_storage.
      Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
      Reviewed-and-Tested-by: Jeff Layton <jlayton@samba.org>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
  2. 07 Dec, 2010 1 commit
  3. 03 Dec, 2010 1 commit
    • cifs: add attribute cache timeout (actimeo) tunable · 6d20e840
      Suresh Jayaraman committed
      Currently, the attribute cache timeout for CIFS is hardcoded to 1 second. This
      means that the client might have to issue a QPATHINFO/QFILEINFO call every 1
      second to verify if something has changed, which seems too expensive. On the
      other hand, if the timeout is hardcoded to a higher value, workloads that
      expect strict cache coherency might see unexpected results.
      
      Making the attribute cache timeout a tunable will allow us to make a tradeoff
      between performance and cache metadata correctness depending on the
      application/workload needs.
      
      Add 'actimeo' tunable that can be used to tune the attribute cache timeout.
      The default timeout is set to 1 second. Also, display actimeo option value in
      /proc/mounts.
      
      It appears to me that 'actimeo' and the proposed (but not yet merged)
      'strictcache' option cannot coexist, so care must be taken that we reset the
      other option if one of them is set.
      
      Changes since last post:
         - fix option parsing and handle possible values correctly
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
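The timeout check described in the actimeo commit can be illustrated with a small userspace sketch. This is not the CIFS code: `struct attr_cache`, its fields and `attrs_need_revalidate()` are hypothetical names chosen for illustration, and times are assumed to be plain second counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-inode cache state; not the CIFS struct. */
struct attr_cache {
    unsigned long last_update; /* when attributes were last fetched, in seconds */
    unsigned long actimeo;     /* attribute cache timeout, in seconds */
};

/* Return true when the cached attributes are too old and must be re-queried. */
static bool attrs_need_revalidate(const struct attr_cache *c, unsigned long now)
{
    assert(c != NULL);
    /* Unsigned subtraction stays correct even if the counter wraps. */
    return (now - c->last_update) >= c->actimeo;
}
```

With a timeout of 1 the cache expires one second after the last fetch; a larger value trades attribute freshness for fewer round trips to the server, which is the tradeoff the commit message describes.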
  4. 30 Nov, 2010 1 commit
  5. 06 Nov, 2010 1 commit
  6. 03 Nov, 2010 1 commit
    • cifs: convert tlink_tree to a rbtree · b647c35f
      Jeff Layton committed
      Radix trees are ideal when you want to track a bunch of pointers and
      can't embed a tracking structure within the target of those pointers.
      The tradeoff is an increase in memory, particularly if the tree is
      sparse.
      
      In CIFS, we use the tlink_tree to track tcon_link structs. A tcon_link
      can never be in more than one tlink_tree, so there's no impediment to
      using a rb_tree here instead of a radix tree.
      
      Convert the new multiuser mount code to use a rb_tree instead. This
      should reduce the memory required to manage the tlink_tree.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
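The point about embedding the tracking structure in the tracked object can be sketched in userspace C. This is only an illustration: it uses a plain unbalanced binary search tree rather than the kernel rbtree API, and `struct node`, `struct tlink`, the `uid` key and the helper names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Link node embedded in the tracked object, in the spirit of struct rb_node. */
struct node { struct node *left, *right; };

/* Hypothetical stand-in for tcon_link; keying by uid is an assumption. */
struct tlink { unsigned int uid; struct node n; };

/* Recover the containing tlink from a pointer to its embedded node. */
#define tlink_of(p) ((struct tlink *)((char *)(p) - offsetof(struct tlink, n)))

static void tlink_insert(struct node **root, struct tlink *new)
{
    struct node **p = root;

    assert(new != NULL);
    new->n.left = new->n.right = NULL;
    /* Walk down to the empty slot where the new node belongs. */
    while (*p)
        p = (new->uid < tlink_of(*p)->uid) ? &(*p)->left : &(*p)->right;
    *p = &new->n;
}

static struct tlink *tlink_find(struct node *root, unsigned int uid)
{
    while (root) {
        struct tlink *t = tlink_of(root);

        if (uid == t->uid)
            return t;
        root = (uid < t->uid) ? root->left : root->right;
    }
    return NULL;
}
```

Because the link node lives inside each object, the tree itself allocates nothing; the cost is that an object can sit in only one such tree at a time, which the commit message notes is acceptable for tcon_link.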
  7. 31 Oct, 2010 2 commits
  8. 29 Oct, 2010 1 commit
  9. 25 Oct, 2010 1 commit
  10. 21 Oct, 2010 1 commit
    • cifs: convert cifs_tcp_ses_lock from a rwlock to a spinlock · 3f9bcca7
      Suresh Jayaraman committed
      cifs_tcp_ses_lock is a rwlock that protects the cifs_tcp_ses_list,
      server->smb_ses_list and the ses->tcon_list. It also protects a few
      ref counters in server, ses and tcon. In most cases the critical section
      doesn't seem to be large; in the few cases where it is slightly larger, there
      seems to be no real benefit from concurrent access. I briefly considered an RCU
      mechanism but it appears to me that there is no real need.
      
      Replace it with a spinlock and get rid of the last rwlock in the cifs code.
      Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
  11. 18 Oct, 2010 1 commit
    • cifs: convert GlobalSMBSeslock from a rwlock to regular spinlock · 4477288a
      Jeff Layton committed
      Convert this lock to a regular spinlock
      
      A rwlock_t offers little value here. It's more expensive than a regular
      spinlock unless you have a fairly large section of code that runs under
      the read lock and can benefit from the concurrency.
      
      Additionally, we need to ensure that the refcounting for files isn't
      racy and to do that we need to lock areas that can increment it for
      write. That means that the areas that can actually use a read_lock are
      very few and relatively infrequently used.
      
      While we're at it, change the name to something easier to type, and fix
      a bug in find_writable_file. cifsFileInfo_put can sleep and shouldn't be
      called while holding the lock.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
  12. 13 Oct, 2010 1 commit
    • cifs: don't use vfsmount to pin superblock for oplock breaks · d7c86ff8
      Jeff Layton committed
      Filesystems aren't really supposed to do anything with a vfsmount. It's
      considered a layering violation since vfsmounts are entirely managed at
      the VFS layer.
      
      CIFS currently keeps an active reference to a vfsmount in order to
      prevent the superblock vanishing before an oplock break has completed.
      What we really want to do instead is to keep sb->s_active high until the
      oplock break has completed. This patch borrows the scheme that NFS uses
      for handling sillyrenames.
      
      An atomic_t is added to the cifs_sb_info. When it transitions from 0 to
      1, an extra reference to the superblock is taken (by bumping the
      s_active value). When it transitions from 1 to 0, that reference is
      dropped and the superblock teardown may proceed if there are no more
      references to it.
      
      Also, the vfsmount pointer is removed from cifsFileInfo and from
      cifs_new_fileinfo, and some bogus forward declarations are removed from
      cifsfs.h.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Suresh Jayaraman <sjayaraman@suse.de>
      Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
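The 0-to-1 and 1-to-0 transitions described in the oplock-break commit can be sketched with C11 atomics. This is a userspace illustration under stated assumptions, not the kernel code: `struct fake_sb`, `struct fake_cifs_sb`, `sb_active()` and `sb_deactive()` are hypothetical names, and the real teardown logic is omitted.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for struct super_block and cifs_sb_info. */
struct fake_sb { atomic_int s_active; };
struct fake_cifs_sb { struct fake_sb *sb; atomic_int active; };

/* Called when an oplock break (or similar work) starts using the superblock. */
static void sb_active(struct fake_cifs_sb *c)
{
    /* fetch_add returns the old value: 0 means this is the 0 -> 1 transition. */
    if (atomic_fetch_add(&c->active, 1) == 0)
        atomic_fetch_add(&c->sb->s_active, 1);
}

/* Called when that work has completed. */
static void sb_deactive(struct fake_cifs_sb *c)
{
    int old = atomic_fetch_sub(&c->active, 1);

    assert(old > 0);
    /* Old value 1 means this is the 1 -> 0 transition: drop the extra pin. */
    if (old == 1)
        atomic_fetch_sub(&c->sb->s_active, 1);
}
```

However many users are outstanding, the superblock carries exactly one extra reference, so s_active stays elevated until the last user finishes.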
  13. 09 Oct, 2010 1 commit
  14. 07 Oct, 2010 2 commits
  15. 05 Oct, 2010 3 commits
    • fs/locks.c: prepare for BKL removal · b89f4321
      Arnd Bergmann committed
      This prepares the removal of the big kernel lock from the
      file locking code. We still use the BKL as long as fs/lockd
      uses it and ceph might sleep, but we can flip the definition
      to a private spinlock as soon as that's done.
      All users outside of fs/lockd get converted to use
      lock_flocks() instead of lock_kernel() where appropriate.
      
      Based on an earlier patch to use a spinlock from Matthew
      Wilcox, who has attempted this a few times before. The
      earliest patch, from over 10 years ago, turned it into
      a semaphore, which ended up being slower than the BKL
      and was subsequently reverted.
      
      Someone should do some serious performance testing when
      this becomes a spinlock, since this has caused problems
      before. Using a spinlock should be at least as good
      as the BKL in theory, but who knows...
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Matthew Wilcox <willy@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
    • BKL: Remove BKL from CifsFS · b0991aa3
      Jan Blunck committed
      The BKL is only used in put_super and fill_super, which are both protected by
      the superblock's s_umount rw_semaphore. Therefore it is safe to remove the
      BKL entirely.
      Signed-off-by: Jan Blunck <jblunck@infradead.org>
      Cc: Steve French <smfrench@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • BKL: Explicitly add BKL around get_sb/fill_super · db719222
      Jan Blunck committed
      This patch is a preparation necessary to remove the BKL from do_new_mount().
      It explicitly adds calls to lock_kernel()/unlock_kernel() around
      get_sb/fill_super operations for filesystems that still use the BKL.
      
      I've read through all the code formerly covered by the BKL inside
      do_kern_mount() and have satisfied myself that it doesn't need the BKL
      any more.
      
      do_kern_mount() is already called without the BKL when mounting the rootfs
      and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
      from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
      through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
      afs_mntpt_follow_link(). Both of the latter functions are actually the
      filesystem's follow_link inode operation. vfs_kern_mount() calls the specified
      get_sb function and lets the filesystem do its job by calling the given
      fill_super function.
      
      Therefore I think it is safe to push down the BKL from the VFS to the
      low-level filesystems' get_sb/fill_super operations.
      
      [arnd: do not add the BKL to those file systems that already
             don't use it elsewhere]
      Signed-off-by: Jan Blunck <jblunck@infradead.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Christoph Hellwig <hch@infradead.org>
  16. 30 Sep, 2010 6 commits
  17. 10 Aug, 2010 2 commits
  18. 06 Aug, 2010 1 commit
    • DNS: Separate out CIFS DNS Resolver code · 1a4240f4
      Wang Lei committed
      Separate out the DNS resolver key type from the CIFS filesystem into its own
      module so that it can be made available for general use, including the AFS
      filesystem module.
      
      This facility makes it possible for the kernel to upcall to userspace to have
      it issue DNS requests, package up the replies and present them to the kernel
      in a useful form.  The kernel is then able to cache the DNS replies as keys
      can be retained in keyrings.
      
      Resolver keys are of type "dns_resolver" and have a case-insensitive
      description that is of the form "[<type>:]<domain_name>".  The optional <type>
      indicates the particular DNS lookup and packaging that's required.  The
      <domain_name> is the query to be made.
      
      If <type> isn't given, a basic hostname to IP address lookup is made, and the
      result is stored in the key in the form of a printable string consisting of a
      comma-separated list of IPv4 and IPv6 addresses.
      
      This key type is supported by userspace helpers driven from /sbin/request-key
      and configured through /etc/request-key.conf.  The cifs.upcall utility is
      invoked for UNC path server name to IP address resolution.
      
      The CIFS functionality is encapsulated by the dns_resolve_unc_to_ip() function,
      which is used to resolve a UNC path to an IP address for CIFS filesystem.  This
      part remains in the CIFS module for now.
      
      See the added Documentation/networking/dns_resolver.txt for more information.
      Signed-off-by: Wang Lei <wang840925@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
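The key description format "[<type>:]<domain_name>" from the DNS resolver commit can be illustrated with a small parsing sketch. This is not the kernel's parser: `struct dns_desc` and `split_dns_desc()` are hypothetical names, and the sketch assumes the first colon, if any, ends the type prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: both pointers refer into the input string. */
struct dns_desc {
    const char *type;   /* NULL when no "<type>:" prefix is present */
    size_t type_len;    /* length of the type, excluding the colon */
    const char *name;   /* start of <domain_name> */
};

/* Split "[<type>:]<domain_name>" without copying or modifying the input. */
static struct dns_desc split_dns_desc(const char *desc)
{
    struct dns_desc d = { NULL, 0, desc };
    const char *colon;

    assert(desc != NULL);
    colon = strchr(desc, ':');
    if (colon) {
        d.type = desc;
        d.type_len = (size_t)(colon - desc);
        d.name = colon + 1;
    }
    return d;
}
```

A description such as "foo.bar.com" yields no type, which corresponds to the basic hostname-to-address lookup; a prefixed form such as "sometype:example.org" (the type string here is only an illustrative value) yields the type and the name separately. Since descriptions are case-insensitive, a real parser would also compare the type without regard to case.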
  19. 02 Aug, 2010 2 commits
  20. 23 Jul, 2010 2 commits
    • cifs: use workqueue instead of slow-work · 9b646972
      Tejun Heo committed
      Workqueue can now handle high concurrency.  Use system_nrt_wq
      instead of slow-work.
      
      * Updated is_valid_oplock_break() to not call cifs_oplock_break_put()
        as advised by Steve French.  It might cause deadlock.  Instead,
        reference is increased after queueing succeeded and
        cifs_oplock_break() briefly grabs GlobalSMBSeslock before putting
        the cfile to make sure it doesn't put before the matching get is
        finished.
      
      * Anton Blanchard reported that the cifs conversion was using the now-gone
        system_single_wq.  Use system_nrt_wq, which provides a non-reentrance
        guarantee that is sufficient and much better.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Steve French <sfrench@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
    • CIFS: Fix a malicious redirect problem in the DNS lookup code · 4c0c03ca
      David Howells committed
      Fix the security problem in the CIFS filesystem DNS lookup code in which a
      malicious redirect could be installed by a random user by simply adding a
      result record into one of their keyrings with add_key() and then invoking a
      CIFS CFS lookup [CVE-2010-2524].
      
      This is done by creating an internal keyring specifically for the caching of
      DNS lookups.  To enforce the use of this keyring, the module init routine
      creates a set of override credentials with the keyring installed as the thread
      keyring and instructs request_key() to only install lookup result keys in that
      keyring.
      
      The override is then applied around the call to request_key().
      
      This has some additional benefits when a kernel service uses this module to
      request a key:
      
       (1) The result keys are owned by root, not the user that caused the lookup.
      
       (2) The result keys don't pop up in the user's keyrings.
      
       (3) The result keys don't come out of the quota of the user that caused the
           lookup.
      
      The keyring can be viewed as root by doing cat /proc/keys:
      
      2a0ca6c3 I-----     1 perm 1f030000     0     0 keyring   .dns_resolver: 1/4
      
      It can then be listed with 'keyctl list' by root.
      
      	# keyctl list 0x2a0ca6c3
      	1 key in keyring:
      	726766307: --alswrv     0     0 dns_resolver: foo.bar.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
      Acked-by: Steve French <smfrench@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 12 Jun, 2010 1 commit
    • cifs: implement drop_inode superblock op · 12420ac3
      Jeff Layton committed
      The standard behavior for drop_inode is to delete the inode when the
      last reference to it is put and the nlink count goes to 0. This helps
      keep inodes that are still considered "not deleted" in cache as long as
      possible even when there aren't dentries attached to them.
      
      When server inode numbers are disabled, it's not possible for cifs_iget
      to ever match an existing inode (since inode numbers are generated via
      iunique). In this situation, cifs can keep a lot of inodes in cache that
      will never be used again.
      
      Implement a drop_inode routine that deletes the inode if server inode
      numbers are disabled on the mount. This helps keep the cifs inode
      caches down to a more manageable size when server inode numbers are
      disabled.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
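The drop_inode decision described above can be summarized in a short sketch. This is a userspace illustration, not the CIFS implementation: `struct fake_inode`, `should_drop_inode()` and the `server_inos` flag are hypothetical names, and the "standard" rule is reduced to the nlink check mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct inode that matter here. */
struct fake_inode { unsigned int nlink; };

/*
 * Decide whether an inode should be deleted when its last reference is put.
 * server_inos is true when server inode numbers are enabled on the mount.
 */
static bool should_drop_inode(const struct fake_inode *inode, bool server_inos)
{
    assert(inode != NULL);
    /* Without server inode numbers a cached inode can never be matched again. */
    if (!server_inos)
        return true;
    /* Otherwise keep the standard rule: drop only once nlink reaches 0. */
    return inode->nlink == 0;
}
```

With server inode numbers enabled the behavior is unchanged; with them disabled every inode is dropped as soon as it is unreferenced, which keeps the inode cache from filling with entries that can never be reused.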
  22. 12 May, 2010 1 commit
  23. 06 May, 2010 1 commit
  24. 27 Apr, 2010 2 commits
  25. 22 Apr, 2010 1 commit