1. 12 4月, 2011 4 次提交
    • J
      cifs: don't allow mmap'ed pages to be dirtied while under writeback (try #3) · ca83ce3d
      Jeff Layton 提交于
      This is more or less the same patch as before, but with some merge
      conflicts fixed up.
      
      If a process has a dirty page mapped into its page tables, then it has
      the ability to change it while the client is trying to write the data
      out to the server. If that happens after the signature has been
      calculated then that signature will then be wrong, and the server will
      likely reset the TCP connection.
      
      This patch adds a page_mkwrite handler for CIFS that simply takes the
      page lock. Because the page lock is held over the life of writepage and
      writepages, this prevents the page from becoming writeable until
      the write call has completed.
      
      With this, we can also remove the "sign_zero_copy" module option and
      always inline the pages when writing.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      ca83ce3d
    • J
      cifs: set ra_pages in backing_dev_info · 2b6c26a0
      Jeff Layton 提交于
      Commit 522440ed made cifs set backing_dev_info on the mapping attached
      to new inodes. This change caused a fairly significant read performance
      regression, as cifs started doing page-sized reads exclusively.
      
      By virtue of the fact that they're allocated as part of cifs_sb_info by
      kzalloc, the ra_pages on cifs BDIs get set to 0, which prevents any
      readahead. This forces the normal read codepaths to use readpage instead
      of readpages causing a four-fold increase in the number of read calls
      with the default rsize.
      
      Fix it by setting ra_pages in the BDI to the same value as that in the
      default_backing_dev_info.
      
      Fixes https://bugzilla.kernel.org/show_bug.cgi?id=31662
      
      Cc: stable@kernel.org
      Reported-and-Tested-by: NTill <till2.schaefer@uni-dortmund.de>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      2b6c26a0
    • S
      Allow user names longer than 32 bytes · 8727c8a8
      Steve French 提交于
      We artificially limited the user name to 32 bytes, but modern servers handle
      larger.  Set the maximum length to a reasonable 256, and make the user name
      string dynamically allocated rather than a fixed size in session structure.
      Also clean up old checkpatch warning.
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      8727c8a8
    • J
      cifs: replace /proc/fs/cifs/Experimental with a module parm · bdf1b03e
      Jeff Layton 提交于
      This flag currently only affects whether we allow "zero-copy" writes
      with signing enabled. Typically we map pages in the pagecache directly
      into the write request. If signing is enabled however and the contents
      of the page change after the signature is calculated but before the
      write is sent then the signature will be wrong. Servers typically
      respond to this by closing down the socket.
      
      Still, this can provide a performance benefit so the "Experimental" flag
      was overloaded to allow this. That's really not a good place for this
      option however since it's not clear what that flag does.
      
      Move that flag instead to a new module parameter that better describes
      its purpose. That's also better since it can be set at module insertion
      time by configuring modprobe.d.
      Reviewed-by: NSuresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      bdf1b03e
  2. 26 1月, 2011 1 次提交
  3. 21 1月, 2011 4 次提交
  4. 13 1月, 2011 1 次提交
  5. 10 1月, 2011 1 次提交
  6. 07 1月, 2011 3 次提交
    • N
      fs: provide rcu-walk aware permission i_ops · b74c79e9
      Nick Piggin 提交于
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b74c79e9
    • N
      fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin 提交于
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downsides of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fa0d7e3d
    • P
      CIFS: Simplify ipv*_connect functions into one (try #4) · a9f1b85e
      Pavel Shilovsky 提交于
      Make connect logic more ip-protocol independent and move RFC1001 stuff into
      a separate function. Also replace union addr in TCP_Server_Info structure
      with sockaddr_storage.
      Signed-off-by: NPavel Shilovsky <piastryyy@gmail.com>
      Reviewed-and-Tested-by: NJeff Layton <jlayton@samba.org>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      a9f1b85e
  7. 07 12月, 2010 1 次提交
  8. 03 12月, 2010 1 次提交
    • S
      cifs: add attribute cache timeout (actimeo) tunable · 6d20e840
      Suresh Jayaraman 提交于
      Currently, the attribute cache timeout for CIFS is hardcoded to 1 second. This
      means that the client might have to issue a QPATHINFO/QFILEINFO call every 1
      second to verify if something has changes, which seems too expensive. On the
      other hand, if the timeout is hardcoded to a higher value, workloads that
      expect strict cache coherency might see unexpected results.
      
      Making attribute cache timeout as a tunable will allow us to make a tradeoff
      between performance and cache metadata correctness depending on the
      application/workload needs.
      
      Add 'actimeo' tunable that can be used to tune the attribute cache timeout.
      The default timeout is set to 1 second. Also, display actimeo option value in
      /proc/mounts.
      
      It appears to me that 'actimeo' and the proposed (but not yet merged)
      'strictcache' option cannot coexist, so care must be taken that we reset the
      other option if one of them is set.
      
      Changes since last post:
         - fix option parsing and handle possible values correcly
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSuresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      6d20e840
  9. 30 11月, 2010 1 次提交
  10. 06 11月, 2010 1 次提交
  11. 03 11月, 2010 1 次提交
    • J
      cifs: convert tlink_tree to a rbtree · b647c35f
      Jeff Layton 提交于
      Radix trees are ideal when you want to track a bunch of pointers and
      can't embed a tracking structure within the target of those pointers.
      The tradeoff is an increase in memory, particularly if the tree is
      sparse.
      
      In CIFS, we use the tlink_tree to track tcon_link structs. A tcon_link
      can never be in more than one tlink_tree, so there's no impediment to
      using a rb_tree here instead of a radix tree.
      
      Convert the new multiuser mount code to use a rb_tree instead. This
      should reduce the memory required to manage the tlink_tree.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      b647c35f
  12. 31 10月, 2010 2 次提交
  13. 29 10月, 2010 1 次提交
  14. 25 10月, 2010 1 次提交
  15. 21 10月, 2010 1 次提交
    • S
      cifs: convert cifs_tcp_ses_lock from a rwlock to a spinlock · 3f9bcca7
      Suresh Jayaraman 提交于
      cifs_tcp_ses_lock is a rwlock with protects the cifs_tcp_ses_list,
      server->smb_ses_list and the ses->tcon_list. It also protects a few
      ref counters in server, ses and tcon. In most cases the critical section
      doesn't seem to be large, in a few cases where it is slightly large, there
      seem to be really no benefit from concurrent access. I briefly considered RCU
      mechanism but it appears to me that there is no real need.
      
      Replace it with a spinlock and get rid of the last rwlock in the cifs code.
      Signed-off-by: NSuresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      3f9bcca7
  16. 18 10月, 2010 1 次提交
    • J
      cifs: convert GlobalSMBSeslock from a rwlock to regular spinlock · 4477288a
      Jeff Layton 提交于
      Convert this lock to a regular spinlock
      
      A rwlock_t offers little value here. It's more expensive than a regular
      spinlock unless you have a fairly large section of code that runs under
      the read lock and can benefit from the concurrency.
      
      Additionally, we need to ensure that the refcounting for files isn't
      racy and to do that we need to lock areas that can increment it for
      write. That means that the areas that can actually use a read_lock are
      very few and relatively infrequently used.
      
      While we're at it, change the name to something easier to type, and fix
      a bug in find_writable_file. cifsFileInfo_put can sleep and shouldn't be
      called while holding the lock.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NSuresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      4477288a
  17. 13 10月, 2010 1 次提交
    • J
      cifs: don't use vfsmount to pin superblock for oplock breaks · d7c86ff8
      Jeff Layton 提交于
      Filesystems aren't really supposed to do anything with a vfsmount. It's
      considered a layering violation since vfsmounts are entirely managed at
      the VFS layer.
      
      CIFS currently keeps an active reference to a vfsmount in order to
      prevent the superblock vanishing before an oplock break has completed.
      What we really want to do instead is to keep sb->s_active high until the
      oplock break has completed. This patch borrows the scheme that NFS uses
      for handling sillyrenames.
      
      An atomic_t is added to the cifs_sb_info. When it transitions from 0 to
      1, an extra reference to the superblock is taken (by bumping the
      s_active value). When it transitions from 1 to 0, that reference is
      dropped and a the superblock teardown may proceed if there are no more
      references to it.
      
      Also, the vfsmount pointer is removed from cifsFileInfo and from
      cifs_new_fileinfo, and some bogus forward declarations are removed from
      cifsfs.h.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NSuresh Jayaraman <sjayaraman@suse.de>
      Acked-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      d7c86ff8
  18. 09 10月, 2010 1 次提交
  19. 07 10月, 2010 2 次提交
  20. 05 10月, 2010 3 次提交
    • A
      fs/locks.c: prepare for BKL removal · b89f4321
      Arnd Bergmann 提交于
      This prepares the removal of the big kernel lock from the
      file locking code. We still use the BKL as long as fs/lockd
      uses it and ceph might sleep, but we can flip the definition
      to a private spinlock as soon as that's done.
      All users outside of fs/lockd get converted to use
      lock_flocks() instead of lock_kernel() where appropriate.
      
      Based on an earlier patch to use a spinlock from Matthew
      Wilcox, who has attempted this a few times before, the
      earliest patch from over 10 years ago turned it into
      a semaphore, which ended up being slower than the BKL
      and was subsequently reverted.
      
      Someone should do some serious performance testing when
      this becomes a spinlock, since this has caused problems
      before. Using a spinlock should be at least as good
      as the BKL in theory, but who knows...
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NMatthew Wilcox <willy@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      b89f4321
    • J
      BKL: Remove BKL from CifsFS · b0991aa3
      Jan Blunck 提交于
      The BKL is only used in put_super and fill_super that are both protected by
      the superblocks s_umount rw_semaphore. Therefore it is safe to remove the
      BKL entirely.
      Signed-off-by: NJan Blunck <jblunck@infradead.org>
      Cc: Steve French <smfrench@gmail.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      b0991aa3
    • J
      BKL: Explicitly add BKL around get_sb/fill_super · db719222
      Jan Blunck 提交于
      This patch is a preparation necessary to remove the BKL from do_new_mount().
      It explicitly adds calls to lock_kernel()/unlock_kernel() around
      get_sb/fill_super operations for filesystems that still uses the BKL.
      
      I've read through all the code formerly covered by the BKL inside
      do_kern_mount() and have satisfied myself that it doesn't need the BKL
      any more.
      
      do_kern_mount() is already called without the BKL when mounting the rootfs
      and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
      from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
      through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
      afs_mntpt_follow_link(). Both later functions are actually the filesystems
      follow_link inode operation. vfs_kern_mount() is calling the specified
      get_sb function and lets the filesystem do its job by calling the given
      fill_super function.
      
      Therefore I think it is safe to push down the BKL from the VFS to the
      low-level filesystems get_sb/fill_super operation.
      
      [arnd: do not add the BKL to those file systems that already
             don't use it elsewhere]
      Signed-off-by: NJan Blunck <jblunck@infradead.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Christoph Hellwig <hch@infradead.org>
      db719222
  21. 30 9月, 2010 6 次提交
  22. 10 8月, 2010 2 次提交