1. 28 May, 2009 1 commit
  2. 21 April, 2009 2 commits
    • Fix i_mutex vs. readdir handling in nfsd · 2f9092e1
      David Woodhouse committed
      Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
      bug to generic code which had been extant for a long time in the XFS
      version -- it started to call through into lookup_one_len() and hence
      into the file systems' ->lookup() methods without i_mutex held on the
      directory.
      
      This patch fixes it by locking the directory's i_mutex again before
      calling the filldir functions. The original deadlocks which commit
      14f7dd63 was designed to avoid are still avoided, because they were due
      to fs-internal locking, not i_mutex.
      
      While we're at it, fix the return type of nfsd_buffered_readdir() which
      should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
      And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
      Sparse would have caught that, if it wasn't so busy bitching about
      __cold__.
      
      Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
      code") introduced a similar problem with calling lookup_one_len()
      without i_mutex, which this patch also addresses. To fix that, it was
      necessary to fix the called functions so that they expect i_mutex to be
      held; that part was done by J. Bruce Fields.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Umm-I-can-live-with-that-by: Al Viro <viro@zeniv.linux.org.uk>
      Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
      Tested-by: J. Bruce Fields <bfields@citi.umich.edu>
      LKML-Reference: <8036.1237474444@jrobl>
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
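      The return-type fix described above rests on the difference between a
      negative Linux errno and an NFS status sent on the wire in network byte
      order. A minimal userspace sketch of that conversion follows. The type
      name be32_status and the function errno_to_nfs_status() are hypothetical
      stand-ins for __be32 and nfserrno(), the status numbers come from RFC
      1813, and the choice of JUKEBOX for -ENOMEM is an assumption made for
      illustration, not a statement about the kernel's table.

```c
#include <arpa/inet.h>  /* htonl */
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __be32: 32 bits, network byte order. */
typedef uint32_t be32_status;

/* NFSv3 status numbers as defined in RFC 1813. */
enum {
    NFS3_OK         = 0,
    NFS3ERR_NOENT   = 2,
    NFS3ERR_IO      = 5,
    NFS3ERR_JUKEBOX = 10008
};

/* Illustrative mapping only; a real table covers many more errno values. */
be32_status errno_to_nfs_status(int err)
{
    switch (err) {
    case 0:
        return htonl(NFS3_OK);
    case -ENOENT:
        return htonl(NFS3ERR_NOENT);
    case -ENOMEM:
        /* Assumption: tell the client to retry later. */
        return htonl(NFS3ERR_JUKEBOX);
    default:
        return htonl(NFS3ERR_IO);
    }
}
```

      Returning the raw -ENOMEM from a function typed as an NFS status would
      hand the caller a value that is neither a valid status number nor in the
      right byte order, which is the kind of mix-up a distinct type lets a
      checker such as sparse flag.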
    • Safer nfsd_cross_mnt() · 1644ccc8
      Al Viro committed
      AFAICS, we have a subtle bug there: if we have crossed mountpoint
      *and* it got mount --move'd away, we'll be holding only one
      reference to fs containing dentry - exp->ex_path.mnt.  IOW, we
      ought to dput() before exp_put().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 26 March, 2009 1 commit
  4. 19 March, 2009 3 commits
    • Inconsistent setattr behaviour · 0953e620
      Sachin S. Prabhu committed
      There is an inconsistency in the behaviour of nfs compared to local
      filesystems on Linux when changing the owner or group of a directory. If the
      directory has the SUID/SGID flags set, changing its owner or group strips
      the flags on nfs. Other filesystems such as ext3 keep these flags.
      
      To reproduce on an nfs share or a local filesystem, run the following commands:
      mkdir test; chmod +s+g test; chown user1 test; ls -ld test

      On the nfs share, the flags are stripped and the output seen is:
      drwxr-xr-x 2 user1 root 4096 Feb 23  2009 test

      On other local filesystems (e.g. ext3), the flags are not stripped and the output
      seen is:
      drwsr-sr-x 2 user1 root 4096 Feb 23 13:57 test
      
      chown_common() called from sys_chown() will only strip the flags if the inode is
      not a directory.
      static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
      {
      ..
              if (!S_ISDIR(inode->i_mode))
                      newattrs.ia_valid |=
                              ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
      ..
      }
      
      See: http://www.opengroup.org/onlinepubs/7990989775/xsh/chown.html
      
      "If the path argument refers to a regular file, the set-user-ID (S_ISUID) and
      set-group-ID (S_ISGID) bits of the file mode are cleared upon successful return
      from chown(), unless the call is made by a process with appropriate privileges,
      in which case it is implementation-dependent whether these bits are altered. If
      chown() is successfully invoked on a file that is not a regular file, these
      bits may be cleared. These bits are defined in <sys/stat.h>."
      
      The behaviour as it stands does not appear to violate POSIX.  However the
      actions performed are inconsistent when comparing ext3 and nfs.
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • Short write in nfsd becomes a full write to the client · 31dec253
      David Shaw committed
      If a filesystem being written to via NFS returns a short write count
      (as opposed to an error) to nfsd, nfsd treats that as a success for
      the entire write, rather than the short count that actually succeeded.
      
      For example, given an 8192-byte write, if the underlying filesystem
      only writes 4096 bytes, nfsd will ack back to the nfs client that all
      8192 bytes were written.  The nfs client does have retry logic for
      short writes, but this is never called as the client is told the
      complete write succeeded.
      
      There are probably other ways it could happen, but in my case it
      happened with a fuse (filesystem in userspace) filesystem which can
      rather easily have a partial write.
      
      Here is a patch to properly return the short write count to the
      client.
      Signed-off-by: David Shaw <dshaw@jabberwocky.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
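      The effect of reporting the short count can be sketched with a toy model
      of the three parties involved. All names here (backend_write,
      server_write, client_write_all) are hypothetical, and no real I/O takes
      place: the backend just accepts at most `limit` bytes per call.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical backend that accepts at most `limit` bytes per call,
 * like a filesystem that returns a short write. */
ssize_t backend_write(size_t len, size_t limit)
{
    return (ssize_t)(len < limit ? len : limit);
}

/* Server side: reply with what the backend wrote, not what was asked. */
ssize_t server_write(size_t requested, size_t limit)
{
    return backend_write(requested, limit);
}

/* Client side: retry with the remainder until everything is written. */
size_t client_write_all(size_t total, size_t limit, int *round_trips)
{
    size_t done = 0;

    *round_trips = 0;
    while (done < total) {
        ssize_t n = server_write(total - done, limit);
        if (n <= 0)
            break;          /* no progress: give up */
        done += (size_t)n;
        (*round_trips)++;
    }
    return done;
}
```

      With an 8192-byte request and a backend limited to 4096 bytes, the
      client needs two round trips. If the server had claimed 8192 bytes on
      the first reply, the retry loop would never have run and half of the
      data would have been lost.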
    • nfsd(v2/v3): fix the failure of creation from HPUX client · 4ac35c2f
      wengang wang committed
      Sometimes the HPUX nfs client sends a create request to a Linux nfs server (v2/v3).
      The dump of the request looks like:
          obj_attributes
              mode: value follows
                  set_it: value follows (1)
                  mode: 00
              uid: no value
                  set_it: no value (0)
              gid: value follows
                  set_it: value follows (1)
                  gid: 8030
              size: value follows
                  set_it: value follows (1)
                  size: 0
              atime: don't change
                  set_it: don't change (0)
              mtime: don't change
                  set_it: don't change (0)
      
      Note that mode is 00 (no rwx permission even for the owner) and that the
      request asks to set size to 0.

      In the current nfsd (v2/v3) implementation, the server takes two main steps:
      1) It creates the file with the specified mode by calling vfs_create().
      2) It sets attributes on the file by calling nfsd_setattr().

      Step 2) ends up calling the filesystem-specific setattr() function, which may
      fail the permission check: changing the size needs WRITE permission, but
      the file has none since its mode is 000.

      In this case, where the file was newly created, we can simply ignore the
      request to set size to 0, so that WRITE permission is not needed and the
      open succeeds.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      --
       vfs.c |   19 +++++++++++++++++++
       1 file changed, 19 insertions(+)
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
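      The idea of ignoring a redundant size-0 request on a freshly created
      file can be sketched as a small mask adjustment. The flag names, the
      struct and the function drop_redundant_truncate() are all hypothetical
      and only mirror the set_it fields from the dump above. This is not the
      kernel's code.

```c
#include <stdbool.h>

/* Hypothetical "set_it" bits, one per attribute in the request. */
enum {
    REQ_SET_MODE = 1 << 0,
    REQ_SET_UID  = 1 << 1,
    REQ_SET_GID  = 1 << 2,
    REQ_SET_SIZE = 1 << 3
};

/* Hypothetical view of the attributes sent with a create request. */
struct create_attrs {
    unsigned valid;            /* which attributes the client wants set */
    unsigned mode;
    unsigned long long size;
};

/*
 * A file that this request just created is already empty, so a request
 * to set its size to 0 changes nothing. Dropping it avoids the write
 * permission check that a size change would trigger.
 */
void drop_redundant_truncate(struct create_attrs *attrs, bool newly_created)
{
    if (newly_created && (attrs->valid & REQ_SET_SIZE) && attrs->size == 0)
        attrs->valid &= ~(unsigned)REQ_SET_SIZE;
}
```

      A size request on a file that already existed, or a non-zero size, is
      left in place so the usual permission check still applies.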
  5. 16 March, 2009 1 commit
    • Use f_lock to protect f_flags · db1dd4d3
      Jonathan Corbet committed
      Traditionally, changes to struct file->f_flags have been done under BKL
      protection, or with no protection at all.  This patch causes all f_flags
      changes after file open/creation time to be done under protection of
      f_lock.  This allows the removal of some BKL usage and fixes a number of
      longstanding (if microscopic) races.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
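      The locking pattern this commit describes, where every change to a
      flags word after creation happens under a per-object lock, can be
      sketched in userspace. A pthread mutex stands in for the kernel lock,
      and struct file_like and its helpers are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a file object with its own lock. */
struct file_like {
    pthread_mutex_t lock;   /* protects flags after initialisation */
    unsigned flags;
};

/* Creation time: no other thread can see the object yet, so no locking. */
void file_like_init(struct file_like *f, unsigned flags)
{
    pthread_mutex_init(&f->lock, NULL);
    f->flags = flags;
}

/* Later changes are read-modify-write cycles, so they take the lock. */
void file_like_set_flag(struct file_like *f, unsigned flag, int on)
{
    pthread_mutex_lock(&f->lock);
    if (on)
        f->flags |= flag;
    else
        f->flags &= ~flag;
    pthread_mutex_unlock(&f->lock);
}
```

      Without the lock, two threads that set different bits at the same time
      can each read the old value and one update is lost, which is the kind
      of small race the message refers to.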
  6. 08 January, 2009 1 commit
    • nfsd: fix double-locks of directory mutex · 9a8d248e
      J. Bruce Fields committed
      A number of nfsd operations depend on the i_mutex to cover more code
      than just the fsync, so the approach of 4c728ef5 "add a vfs_fsync
      helper" doesn't work for nfsd.  Revert the parts of those patches that
      touch nfsd.
      
      Note: we can't, however, remove the logic from vfs_fsync that was needed
      only for the special case of nfsd, because a vfs_fsync(NULL,...) call
      can still result indirectly from a stackable filesystem that was called
      by nfsd.  (Thanks to Christoph Hellwig for pointing this out.)
      Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
  7. 06 January, 2009 2 commits
    • add a vfs_fsync helper · 4c728ef5
      Christoph Hellwig committed
      Fsync currently has a fdatawrite/fdatawait pair around the method call,
      and a mutex_lock/unlock of the inode mutex.  All callers of fsync have
      to duplicate this, but we have a few and most of them don't quite get
      it right.  This patch adds a new vfs_fsync that takes care of this.
      It's a little more complicated than usual as ->fsync might get a NULL file
      pointer and just a dentry from nfsd, but otherwise gets a file and we
      want to take the mapping and file operations from it when it is there.

      Notes on the fsync callers:

       - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
         lower file
       - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
         file, and was returning 0 when ->fsync was missing
       - shm was neither calling filemap_fdatawrite / filemap_fdatawait nor
         taking i_mutex.  Now given that shared memory doesn't have disk
         backing not doing anything in fsync seems fine and I left it out of
         the vfs_fsync conversion for now, but in that case we might just
         not pass it through to the lower file at all but just call the no-op
         simple_sync_file directly.
      
      [and now actually export vfs_fsync]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • inode->i_op is never NULL · acfa4380
      Al Viro committed
      We used to have rather schizophrenic set of checks for NULL ->i_op even
      though it had been eliminated years ago.  You'd need to go out of your
      way to set it to NULL explicitly _and_ a bunch of code would die on
      such inodes anyway.  After killing two remaining places that still
      did that bogosity, all that crap can go away.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  8. 14 November, 2008 2 commits
  9. 10 November, 2008 1 commit
    • Fix nfsd truncation of readdir results · b726e923
      Doug Nazar committed
      Commit 8d7c4203 "nfsd: fix failure to set eof in readdir in some
      situations" introduced a bug: on a directory in an exported ext3
      filesystem with dir_index unset, a READDIR will only return about 250
      entries, even if the directory was larger.
      
      Bisected it back to this commit; reverting it fixes the problem.
      
      It turns out that in this case ext3 reads a block at a time, then
      returns from readdir, which means we can end up with buf.full==0 but
      with more entries in the directory still to be read.  Before 8d7c4203
      (but after c002a6c7 "Optimise NFS readdir hack slightly"), this would
      cause us to return the READDIR result immediately, but with the eof bit
      unset.  That could cause a performance regression (because the client
      would need more roundtrips to the server to read the whole directory),
      but no loss in correctness, since the cleared eof bit caused the client
      to send another readdir.  After 8d7c4203, the setting of the eof bit
      made this a correctness problem.
      
      So, move nfserr_eof into the loop and remove the buf.full check so that
      we loop until buf.used==0.  The following seems to do the right thing
      and reduces the network traffic since we don't return a READDIR result
      until the buffer is full.
      
      Tested on an empty directory & large directory; eof is properly sent and
      there are no more short buffers.
      Signed-off-by: Doug Nazar <nazard@dragoninc.ca>
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
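      The loop shape described in this message (keep reading until a read
      returns no entries, and only then report eof) can be sketched with a
      toy directory that hands out entries one block at a time. The struct
      dir_sim and the functions read_block() and fill_reply() are
      hypothetical, entries are only counted rather than stored, and the
      rewind step is a stand-in for seeking back to the last consumed offset.

```c
#include <stdbool.h>

/* Hypothetical directory model: `total` entries, handed out at most
 * `block` at a time, the way a filesystem may return one block per
 * readdir call. `block` must be positive. */
struct dir_sim {
    int total;
    int pos;
    int block;
};

/* One underlying read: returns how many entries it produced. */
static int read_block(struct dir_sim *d)
{
    int n = d->total - d->pos;

    if (n > d->block)
        n = d->block;
    d->pos += n;
    return n;
}

/* Fill one reply of at most `capacity` entries; returns entries sent. */
int fill_reply(struct dir_sim *d, int capacity, bool *eof)
{
    int sent = 0;

    *eof = false;
    for (;;) {
        int got = read_block(d);
        int room = capacity - sent;

        if (got == 0) {         /* nothing left: this is the real eof */
            *eof = true;
            break;
        }
        if (got > room) {       /* reply is full: rewind what did not fit */
            d->pos -= got - room;
            sent = capacity;
            break;
        }
        sent += got;
    }
    return sent;
}
```

      With 600 entries, blocks of 250 and room for 1000 entries, a single
      reply carries all 600 entries with eof set. Stopping after the first
      block and setting eof anyway would reproduce the truncation at about
      250 entries that the message reports.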
  10. 31 October, 2008 1 commit
    • nfsd: fix failure to set eof in readdir in some situations · 8d7c4203
      J. Bruce Fields committed
      Before 14f7dd63 "[PATCH] Copy XFS readdir hack into nfsd code",
      readdir_cd->err was reset to eof before each call to vfs_readdir;
      afterwards, it is set only once.  Similarly, c002a6c7 "[PATCH] Optimise
      NFS readdir hack slightly", can cause us to exit without nfserr_eof
      set.  Fix this.
      
      This ensures the "eof" bit is set when needed in readdir replies.  (The
      particular case I saw was an nfsv4 readdir of an empty directory, which
      returned with no entries (the protocol requires "." and ".." to be
      filtered out), but with eof unset.)
      
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
  11. 23 October, 2008 5 commits
  12. 30 September, 2008 2 commits
    • knfsd: allocate readahead cache in individual chunks · 54a66e54
      Jeff Layton committed
      I had a report from someone building a large NFS server that they were
      unable to start more than 585 nfsd threads. It was reported against an
      older kernel using the slab allocator, and I tracked it down to the
      large allocation in nfsd_racache_init failing.
      
      It appears that the slub allocator handles large allocations better,
      but large contiguous allocations can often be problematic. There
      doesn't seem to be any reason that the racache has to be allocated as a
      single large chunk. This patch breaks this up so that the racache is
      built up from separate allocations.
      
      (Thanks also to Takashi Iwai for a bugfix.)
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Cc: Takashi Iwai <tiwai@suse.de>
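      The general technique of replacing one large contiguous allocation
      with several smaller ones can be sketched as a chunked table. This
      illustrates the idea only: the struct layout, the names and the
      chunk-array design are hypothetical and are not claimed to match how
      the racache is organised in the kernel.

```c
#include <stdlib.h>

/* Hypothetical cache entry. */
struct ra_entry {
    int in_use;
    long offset;
};

/* Table built from separately allocated chunks of `per_chunk` entries. */
struct ra_cache {
    struct ra_entry **chunks;
    size_t nchunks;
    size_t per_chunk;
    size_t entries;
};

void ra_cache_free(struct ra_cache *c)
{
    size_t i;

    for (i = 0; i < c->nchunks; i++)
        free(c->chunks[i]);     /* free(NULL) is a no-op */
    free(c->chunks);
    c->chunks = NULL;
    c->nchunks = 0;
    c->entries = 0;
}

/* Returns 0 on success, -1 on failure with nothing left allocated. */
int ra_cache_init(struct ra_cache *c, size_t entries, size_t per_chunk)
{
    size_t i;

    if (per_chunk == 0)
        return -1;
    c->per_chunk = per_chunk;
    c->entries = entries;
    c->nchunks = (entries + per_chunk - 1) / per_chunk;  /* round up */
    c->chunks = calloc(c->nchunks ? c->nchunks : 1, sizeof(*c->chunks));
    if (!c->chunks) {
        c->nchunks = 0;
        return -1;
    }
    for (i = 0; i < c->nchunks; i++) {
        c->chunks[i] = calloc(per_chunk, sizeof(struct ra_entry));
        if (!c->chunks[i]) {
            ra_cache_free(c);
            return -1;
        }
    }
    return 0;
}

/* Look up entry `idx`, or NULL when it is out of range. */
struct ra_entry *ra_cache_get(struct ra_cache *c, size_t idx)
{
    if (idx >= c->entries)
        return NULL;
    return &c->chunks[idx / c->per_chunk][idx % c->per_chunk];
}
```

      Each chunk is small enough that the allocator does not need one large
      contiguous region, at the cost of one extra pointer lookup per access.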
    • nfsd: permit unauthenticated stat of export root · 04716e66
      J. Bruce Fields committed
      RFC 2623 section 2.3.2 permits the server to bypass gss authentication
      checks for certain operations that a client may perform when mounting.
      In the case of a client that doesn't have some form of credentials
      available to it on boot, this allows it to perform the mount unattended.
      (Presumably real file access won't be needed until a user with
      credentials logs in.)
      
      Being slightly more lenient allows lots of old clients to access
      krb5-only exports, with the only loss being a small amount of
      information leaked about the root directory of the export.
      
      This affects only v2 and v3; v4 still requires authentication for all
      access.
      
      Thanks to Peter Staubach for testing against a Solaris client, which
      suggested adding v3 getattr to the list, and to Trond for noting
      that doing so exposes no additional information.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Cc: Peter Staubach <staubach@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
  13. 27 July, 2008 2 commits
  14. 02 July, 2008 1 commit
  15. 24 June, 2008 1 commit
  16. 24 April, 2008 4 commits
  17. 19 April, 2008 6 commits
  18. 15 February, 2008 1 commit
  19. 02 February, 2008 2 commits
    • nfsd: allow root to set uid and gid on create · 5c002b3b
      J. Bruce Fields committed
      The server silently ignores attempts to set the uid and gid on create.
      Based on the comment, this appears to have been done to prevent some
      overly-clever IRIX client from causing itself problems.
      
      Perhaps we should remove that hack completely.  For now, at least, it
      makes sense to allow root (when no_root_squash is set) to set uid and
      gid.
      
      While we're there, since nfsd_create and nfsd_create_v3 share the same
      logic, pull that out into a separate function.  And spell out the
      individual modifications of ia_valid instead of doing them both at once
      inside a conditional.
      
      Thanks to Roger Willcocks <roger@filmlight.ltd.uk> for the bug report
      and original patch on which this is based.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
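      The shared helper this message describes, with each change to the
      attribute mask spelled out on its own, might look roughly like the
      sketch below. The flag names and create_attr_mask() are hypothetical.
      Treating the mode as already applied at creation time is an
      assumption, and the caller_is_root argument stands for whatever
      identity the request has after any root squashing.

```c
#include <stdbool.h>

/* Hypothetical attribute-mask bits. */
enum {
    CR_MODE = 1 << 0,
    CR_UID  = 1 << 1,
    CR_GID  = 1 << 2
};

/* Returns the mask of attributes still to be set after creation. */
unsigned create_attr_mask(unsigned valid, bool caller_is_root)
{
    /* Assumption: the mode was already applied when the file was created. */
    valid &= ~(unsigned)CR_MODE;

    /* Only root may choose the owner and group of the new file. */
    if (!caller_is_root)
        valid &= ~(unsigned)(CR_UID | CR_GID);

    return valid;
}
```

      For a non-root caller a request carrying mode, uid and gid collapses to
      an empty mask, while for root the uid and gid survive.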
    • NFSD: Adjust filename length argument of nfsd_lookup · 5a022fc8
      Chuck Lever committed
      Clean up: adjust the sign of the length argument of nfsd_lookup and
      nfsd_lookup_dentry, for consistency with recent changes.  NFSD version
      4 callers already pass an unsigned file name length.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Acked-By: NeilBrown <neilb@suse.de>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
  20. 20 October, 2007 1 commit