1. 22 Sep, 2009 1 commit
  2. 02 Sep, 2009 1 commit
    • D
      CRED: Add some configurable debugging [try #6] · e0e81739
      Committed by David Howells
      Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking
      for credential management.  The additional code keeps track of the number of
      pointers from task_structs to any given cred struct, and checks to see that
      this number never exceeds the usage count of the cred struct (which includes
      all references, not just those from task_structs).
      
      Furthermore, if SELinux is enabled, the code also checks that the security
      pointer in the cred struct is never seen to be invalid.
      
      This attempts to catch the bug whereby inode_has_perm() faults in an nfsd
      kernel thread on seeing cred->security be a NULL pointer (it appears that the
      credential struct has been previously released):
      
      	http://www.kerneloops.org/oops.php?number=252883
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      e0e81739
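The invariant described in this commit message can be sketched in user-space C. This is only an illustration under assumptions: `struct cred_sketch` and all function names below are hypothetical stand-ins, not the kernel's actual definitions. It shows why a count of task pointers that exceeds the overall usage count signals a credential that was released while still in use.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a credential struct (not the kernel's struct cred). */
struct cred_sketch {
    int usage;        /* all references to this cred, from any holder */
    int subscribers;  /* references held by task structs only */
};

/* The debug invariant: task pointers are a subset of all references. */
static bool cred_sketch_valid(const struct cred_sketch *c)
{
    return c->subscribers >= 0 && c->subscribers <= c->usage;
}

/* A task starts pointing at the cred: both counters go up. */
static void cred_sketch_attach_task(struct cred_sketch *c)
{
    c->usage++;
    c->subscribers++;
}

/* Buggy release (for illustration): drops a reference although a task
 * still points at the cred, so subscribers can now exceed usage. */
static void cred_sketch_put_only(struct cred_sketch *c)
{
    c->usage--;
}
```

Calling `cred_sketch_valid()` at the points where a cred is used would flag the mismatch early, instead of faulting later on a stale pointer.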
  3. 03 Jul, 2009 1 commit
    • D
      NFSD: Don't hold unrefcounted creds over call to nfsd_setuser() · 033a666c
      Committed by David Howells
      nfsd_open() gets an unrefcounted pointer to the current process's effective
      credentials at the top of the function, then calls nfsd_setuser() via
      fh_verify() - which may replace and destroy the current process's effective
      credentials - and then passes the unrefcounted pointer to dentry_open() - but
      the credentials may have been destroyed by this point.
      
      Instead, the value from current_cred() should be passed directly to
      dentry_open() as one of its arguments, rather than being cached in a variable.
      
      Possibly fh_verify() should return the creds to use.
      
      This is a regression introduced by
      745ca247 "CRED: Pass credentials through
      dentry_open()".
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-and-Verified-By: Steve Dickson <steved@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      033a666c
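The pattern this fix recommends (fetch the credentials only after the call that may replace them, rather than caching an unrefcounted pointer beforehand) can be sketched as follows. All names here are hypothetical user-space stand-ins for current_cred(), nfsd_setuser() and dentry_open(); none of this is kernel code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a credential struct. */
struct cred_sketch { int fsuid; };

/* Stand-in for the current task's effective credentials. */
static struct cred_sketch *current_cred_ptr;

static const struct cred_sketch *current_cred_sketch(void)
{
    return current_cred_ptr;
}

/* Stand-in for nfsd_setuser(): installs new creds and frees the old ones,
 * so any pointer cached before this call becomes dangling. */
static int setuser_sketch(int fsuid)
{
    struct cred_sketch *fresh = malloc(sizeof(*fresh));
    if (!fresh)
        return -1;
    fresh->fsuid = fsuid;
    free(current_cred_ptr);
    current_cred_ptr = fresh;
    return 0;
}

/* Stand-in for dentry_open(): reports which identity it was handed. */
static int open_sketch(const struct cred_sketch *cred)
{
    return cred->fsuid;
}

/* Fixed pattern: read the creds at the point of use, after setuser. */
static int open_as_sketch(int fsuid)
{
    if (setuser_sketch(fsuid) != 0)
        return -1;
    return open_sketch(current_cred_sketch());
}
```

Caching `current_cred_sketch()` in a local variable before `setuser_sketch()` would reproduce the bug: the cached pointer would refer to freed memory by the time it reached `open_sketch()`.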
  4. 16 Jun, 2009 4 commits
  5. 12 Jun, 2009 4 commits
  6. 28 May, 2009 2 commits
    • M
      integrity: nfsd imbalance bug fix · 14dba533
      Committed by Mimi Zohar
      An nfsd exported file is opened/closed by the kernel causing the
      integrity imbalance message.
      
      Before a file is opened, there normally is permission checking, which
      is done in inode_permission().  However, as integrity checking requires
      a dentry and mount point, which is not available in inode_permission(),
      the integrity (permission) checking must be called separately.
      
      In order to detect any missing integrity checking calls, we keep track
      of file open/closes.  ima_path_check() increments these counts and
      does the integrity (permission) checking. As a result, the number of
      calls to ima_path_check()/ima_file_free() should be balanced.  An extra
      call to fput() indicates that the file could have been accessed without
      first calling ima_path_check().
      
      In nfsv3 permission checking is done once, followed by multiple reads,
      which do an open/close for each read.  The integrity (permission) checking
      call should be in nfsd_permission() after the inode_permission() call, but
      as there is no correlation between the number of permission checking and
      open calls, the integrity checking call should not increment the counters,
      but defer it to when the file is actually opened.
      
      This patch adds:
      - integrity (permission) checking for nfsd exported files in nfsd_permission().
      - a call to increment counts for files opened by nfsd.
      
      This patch has been updated to return the nfs error types.
      Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      14dba533
    • W
      nfsd: fix hung up of nfs client while sync write data to nfs server · a0d24b29
      Committed by Wei Yongjun
      Commit 'Short write in nfsd becomes a full write to the client'
      (31dec253) broke synchronous writes.
      To reproduce, run the following commands:
      
        $ mount -t nfs -o sync 192.168.0.21:/nfsroot /mnt
        $ cd /mnt
        $ echo aaaa > temp.txt
      
      The NFS client then hangs.
      
      In SYNC mode the server always returns a write count of 0 to the
      client. This is because the value of host_err in nfsd_vfs_write()
      is overwritten in SYNC mode by 'host_err=nfsd_sync(file);',
      and we then return host_err (which is now 0) as the write count.
      
      This patch fixes the problem.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      a0d24b29
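The overwrite described above can be illustrated with a small user-space sketch. The function names and the way results are passed in are assumptions made for the example; the actual patch is not shown on this page. The idea is simply to save the byte count before the variable is reused for the sync result.

```c
/* Buggy shape: host_err first holds the byte count, then is overwritten
 * by the sync result, so a successful sync reports 0 bytes written. */
static long write_then_sync_buggy_sketch(long write_result, int sync_result,
                                         int sync_mode)
{
    long host_err = write_result;   /* bytes written, or negative errno */
    if (host_err >= 0 && sync_mode)
        host_err = sync_result;     /* 0 on success: the count is lost */
    return host_err;
}

/* Fixed shape: remember the count before host_err is reused. */
static long write_then_sync_fixed_sketch(long write_result, int sync_result,
                                         int sync_mode)
{
    long host_err = write_result;
    if (host_err < 0)
        return host_err;            /* write failed */
    long cnt = host_err;            /* save the count */
    if (sync_mode) {
        host_err = sync_result;
        if (host_err < 0)
            return host_err;        /* sync failed */
    }
    return cnt;
}
```

With a 5-byte write and a successful sync, the buggy variant returns 0 while the fixed variant returns 5, which is the difference between a client that retries forever and one that completes.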
  7. 21 Apr, 2009 2 commits
    • D
      Fix i_mutex vs. readdir handling in nfsd · 2f9092e1
      Committed by David Woodhouse
      Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
      bug to generic code which had been extant for a long time in the XFS
      version -- it started to call through into lookup_one_len() and hence
      into the file systems' ->lookup() methods without i_mutex held on the
      directory.
      
      This patch fixes it by locking the directory's i_mutex again before
      calling the filldir functions. The original deadlocks which commit
      14f7dd63 was designed to avoid are still avoided, because they were due
      to fs-internal locking, not i_mutex.
      
      While we're at it, fix the return type of nfsd_buffered_readdir() which
      should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
      And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
      Sparse would have caught that, if it wasn't so busy bitching about
      __cold__.
      
      Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
      code") introduced a similar problem with calling lookup_one_len()
      without i_mutex, which this patch also addresses. To fix that, it was
      necessary to fix the called functions so that they expect i_mutex to be
      held; that part was done by J. Bruce Fields.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Umm-I-can-live-with-that-by: Al Viro <viro@zeniv.linux.org.uk>
      Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp>
      Tested-by: J. Bruce Fields <bfields@citi.umich.edu>
      LKML-Reference: <8036.1237474444@jrobl>
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      2f9092e1
    • A
      Safer nfsd_cross_mnt() · 1644ccc8
      Committed by Al Viro
      AFAICS, we have a subtle bug there: if we have crossed mountpoint
      *and* it got mount --move'd away, we'll be holding only one
      reference to fs containing dentry - exp->ex_path.mnt.  IOW, we
      ought to dput() before exp_put().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      1644ccc8
  8. 26 Mar, 2009 1 commit
  9. 19 Mar, 2009 3 commits
    • S
      Inconsistent setattr behaviour · 0953e620
      Committed by Sachin S. Prabhu
      There is an inconsistency seen in the behaviour of nfs compared to other local
      filesystems on linux when changing owner or group of a directory. If the
      directory has SUID/SGID flags set, on changing owner or group on the directory,
      the flags are stripped off on nfs. These flags are maintained on other
      filesystems such as ext3.
      
      To reproduce on an NFS share or a local filesystem, run the following commands:
      mkdir test; chmod +s+g test; chown user1 test; ls -ld test
      
      On the NFS share, the flags are stripped and the output is:
      drwxr-xr-x 2 user1 root 4096 Feb 23  2009 test
      
      On other local filesystems (e.g. ext3), the flags are not stripped and the
      output is:
      drwsr-sr-x 2 user1 root 4096 Feb 23 13:57 test
      
      chown_common() called from sys_chown() will only strip the flags if the inode is
      not a directory.
      static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
      {
      ..
              if (!S_ISDIR(inode->i_mode))
                      newattrs.ia_valid |=
                              ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
      ..
      }
      
      See: http://www.opengroup.org/onlinepubs/7990989775/xsh/chown.html
      
      "If the path argument refers to a regular file, the set-user-ID (S_ISUID) and
      set-group-ID (S_ISGID) bits of the file mode are cleared upon successful return
      from chown(), unless the call is made by a process with appropriate privileges,
      in which case it is implementation-dependent whether these bits are altered. If
      chown() is successfully invoked on a file that is not a regular file, these
      bits may be cleared. These bits are defined in <sys/stat.h>."
      
      The behaviour as it stands does not appear to violate POSIX.  However the
      actions performed are inconsistent when comparing ext3 and nfs.
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      0953e620
    • D
      Short write in nfsd becomes a full write to the client · 31dec253
      Committed by David Shaw
      If a filesystem being written to via NFS returns a short write count
      (as opposed to an error) to nfsd, nfsd treats that as a success for
      the entire write, rather than the short count that actually succeeded.
      
      For example, given an 8192-byte write, if the underlying filesystem
      only writes 4096 bytes, nfsd will ack back to the nfs client that all
      8192 bytes were written.  The nfs client does have retry logic for
      short writes, but this is never called as the client is told the
      complete write succeeded.
      
      There are probably other ways it could happen, but in my case it
      happened with a fuse (filesystem in userspace) filesystem which can
      rather easily have a partial write.
      
      Here is a patch to properly return the short write count to the
      client.
      Signed-off-by: David Shaw <dshaw@jabberwocky.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      31dec253
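The 8192/4096 example above can be sketched as a toy model: once the server reports the count that was actually written, the client's retry logic has something to act on. The backend limit, the function names and the retry loop are all hypothetical simplifications, not the NFS client or server code.

```c
#include <stddef.h>

/* Hypothetical backend that accepts at most `limit` bytes per call,
 * like a filesystem that returns a short write. */
static size_t backend_write_sketch(size_t len, size_t limit)
{
    return len < limit ? len : limit;
}

/* Server side after the fix: report what was actually written,
 * not the size that was requested. */
static size_t server_write_sketch(size_t len, size_t limit)
{
    return backend_write_sketch(len, limit);
}

/* Client side: keep sending the remainder until everything is written.
 * Returns the number of requests used, or -1 if no progress is made. */
static int client_write_all_sketch(size_t total, size_t limit)
{
    size_t done = 0;
    int calls = 0;

    while (done < total) {
        size_t n = server_write_sketch(total - done, limit);
        if (n == 0)
            return -1;   /* server made no progress */
        done += n;
        calls++;
    }
    return calls;
}
```

For an 8192-byte write against a backend that takes 4096 bytes at a time, the client needs two requests. If the server had acknowledged all 8192 bytes on the first reply, the second half would never have been sent.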
    • W
      nfsd(v2/v3): fix the failure of creation from HPUX client · 4ac35c2f
      Committed by wengang wang
      Sometimes an HPUX NFS client sends a create request to a Linux NFS server (v2/v3).
      A dump of the request looks like this:
          obj_attributes
              mode: value follows
                  set_it: value follows (1)
                  mode: 00
              uid: no value
                  set_it: no value (0)
              gid: value follows
                  set_it: value follows (1)
                  gid: 8030
              size: value follows
                  set_it: value follows (1)
                  size: 0
              atime: don't change
                  set_it: don't change (0)
              mtime: don't change
                  set_it: don't change (0)
      
      Note that the mode is 00 (no rwx permission even for the owner) and that the
      request asks for the size to be set to 0.
      
      In the current nfsd (v2/v3) implementation, the server performs two main steps:
      1) It creates the file with the specified mode by calling vfs_create().
      2) It sets attributes for the file by calling nfsd_setattr().
      
      Step 2) ends up calling the filesystem-specific setattr() function, which may
      fail its permission check because changing the size needs WRITE permission and
      the file has none, since its mode is 000.
      
      In this case, where a new file has just been created, we can simply ignore the
      request to set the size to 0, so that WRITE permission is not needed and the
      open succeeds.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      --
       vfs.c |   19 +++++++++++++++++++
       1 file changed, 19 insertions(+)
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      4ac35c2f
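The idea in this commit (a size of 0 on a file that was just created is redundant, so the request to set it can be dropped before the attributes are applied) can be sketched like this. The struct, the flag values and the function name are assumptions for illustration; they only mimic the shape of the kernel's attribute mask.

```c
/* Hypothetical attribute-mask bits, loosely modelled on ATTR_MODE/ATTR_SIZE. */
#define ATTR_MODE_SKETCH (1u << 0)
#define ATTR_SIZE_SKETCH (1u << 3)

/* Hypothetical stand-in for the attribute-change request. */
struct iattr_sketch {
    unsigned int ia_valid;  /* which fields the client asked to set */
    unsigned int ia_mode;
    long long ia_size;
};

/* A freshly created file already has size 0, so a request to set size 0
 * is a no-op. Dropping it avoids a truncate that would need write
 * permission the new file (mode 000) does not grant. */
static void drop_redundant_truncate_sketch(struct iattr_sketch *iap,
                                           int newly_created)
{
    if (newly_created &&
        (iap->ia_valid & ATTR_SIZE_SKETCH) &&
        iap->ia_size == 0)
        iap->ia_valid &= ~ATTR_SIZE_SKETCH;
}
```

Requests against an existing file, or with a non-zero size, are left untouched, so a genuine truncate still goes through the normal permission check.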
  10. 16 Mar, 2009 1 commit
    • J
      Use f_lock to protect f_flags · db1dd4d3
      Committed by Jonathan Corbet
      Traditionally, changes to struct file->f_flags have been done under BKL
      protection, or with no protection at all.  This patch causes all f_flags
      changes after file open/creation time to be done under protection of
      f_lock.  This allows the removal of some BKL usage and fixes a number of
      longstanding (if microscopic) races.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      db1dd4d3
  11. 08 Jan, 2009 1 commit
    • J
      nfsd: fix double-locks of directory mutex · 9a8d248e
      Committed by J. Bruce Fields
      A number of nfsd operations depend on the i_mutex to cover more code
      than just the fsync, so the approach of 4c728ef5 "add a vfs_fsync
      helper" doesn't work for nfsd.  Revert the parts of those patches that
      touch nfsd.
      
      Note: we can't, however, remove the logic from vfs_fsync that was needed
      only for the special case of nfsd, because a vfs_fsync(NULL,...) call
      can still result indirectly from a stackable filesystem that was called
      by nfsd.  (Thanks to Christoph Hellwig for pointing this out.)
      Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      9a8d248e
  12. 06 Jan, 2009 2 commits
    • C
      add a vfs_fsync helper · 4c728ef5
      Committed by Christoph Hellwig
      Fsync currently has a fdatawrite/fdatawait pair around the method call,
      and a mutex_lock/unlock of the inode mutex.  All callers of fsync have
      to duplicate this, but we have a few and most of them don't quite get
      it right.  This patch adds a new vfs_fsync that takes care of this.
      It's a little more complicated than usual as ->fsync might get a NULL file
      pointer and just a dentry from nfsd, but otherwise gets a file and we
      want to take the mapping and file operations from it when it is there.
      
      Notes on the fsync callers:
      
       - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
         lower file
       - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
         file, and returning 0 when ->fsync was missing
       - shm wasn't calling either filemap_fdatawrite / filemap_fdatawait nor
         taking i_mutex.  Now given that shared memory doesn't have disk
         backing not doing anything in fsync seems fine and I left it out of
         the vfs_fsync conversion for now, but in that case we might just
         not pass it through to the lower file at all but just call the no-op
         simple_sync_file directly.
      
      [and now actually export vfs_fsync]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      4c728ef5
    • A
      inode->i_op is never NULL · acfa4380
      Committed by Al Viro
      We used to have rather schizophrenic set of checks for NULL ->i_op even
      though it had been eliminated years ago.  You'd need to go out of your
      way to set it to NULL explicitly _and_ a bunch of code would die on
      such inodes anyway.  After killing two remaining places that still
      did that bogosity, all that crap can go away.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      acfa4380
  13. 14 Nov, 2008 2 commits
  14. 10 Nov, 2008 1 commit
    • D
      Fix nfsd truncation of readdir results · b726e923
      Committed by Doug Nazar
      Commit 8d7c4203 "nfsd: fix failure to set eof in readdir in some
      situations" introduced a bug: on a directory in an exported ext3
      filesystem with dir_index unset, a READDIR will only return about 250
      entries, even if the directory was larger.
      
      Bisected it back to this commit; reverting it fixes the problem.
      
      It turns out that in this case ext3 reads a block at a time, then
      returns from readdir, which means we can end up with buf.full==0 but
      with more entries in the directory still to be read.  Before 8d7c4203
      (but after c002a6c7 "Optimise NFS readdir hack slightly"), this would
      cause us to return the READDIR result immediately, but with the eof bit
      unset.  That could cause a performance regression (because the client
      would need more roundtrips to the server to read the whole directory),
      but no loss in correctness, since the cleared eof bit caused the client
      to send another readdir.  After 8d7c4203, the setting of the eof bit
      made this a correctness problem.
      
      So, move nfserr_eof into the loop and remove the buf.full check so that
      we loop until buf.used==0.  The following seems to do the right thing
      and reduces the network traffic since we don't return a READDIR result
      until the buffer is full.
      
      Tested on an empty directory & large directory; eof is properly sent and
      there are no more short buffers.
      Signed-off-by: Doug Nazar <nazard@dragoninc.ca>
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      b726e923
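The loop structure described in this commit message (keep reading until a read returns nothing, and only then report eof) can be sketched with a toy directory that hands back a limited number of entries per call, much like ext3 returning one block at a time. Everything here is a hypothetical simplification; it is not the nfsd readdir code.

```c
#include <stddef.h>

/* Hypothetical directory that returns at most `per_call` entries per read. */
struct dir_sketch {
    size_t total;     /* entries in the directory */
    size_t pos;       /* entries already handed out */
    size_t per_call;  /* most entries a single read returns */
};

/* One read: returns how many entries were produced (0 means nothing left). */
static size_t read_block_sketch(struct dir_sketch *d, size_t max)
{
    size_t n = d->total - d->pos;
    if (n > d->per_call)
        n = d->per_call;
    if (n > max)
        n = max;
    d->pos += n;
    return n;
}

/* Fill a reply of up to `capacity` entries. eof is decided inside the loop:
 * it is set only when a read comes back empty, never merely because one
 * read returned less than a full buffer. */
static size_t fill_reply_sketch(struct dir_sketch *d, size_t capacity, int *eof)
{
    size_t filled = 0;

    *eof = 0;
    while (filled < capacity) {
        size_t used = read_block_sketch(d, capacity - filled);
        if (used == 0) {
            *eof = 1;
            break;
        }
        filled += used;
    }
    return filled;
}
```

With 1000 entries, 250 per read and room for 600 per reply, the first reply carries 600 entries without eof and the second carries the remaining 400 with eof set. Stopping after the first short read would instead have produced a 250-entry reply.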
  15. 31 Oct, 2008 1 commit
    • J
      nfsd: fix failure to set eof in readdir in some situations · 8d7c4203
      Committed by J. Bruce Fields
      Before 14f7dd63 "[PATCH] Copy XFS
      readdir hack into nfsd code", readdir_cd->err was reset to eof before
      each call to vfs_readdir; afterwards, it is set only once.  Similarly,
      c002a6c7 "[PATCH] Optimise NFS readdir
      hack slightly", can cause us to exit without nfserr_eof set.  Fix this.
      
      This ensures the "eof" bit is set when needed in readdir replies.  (The
      particular case I saw was an nfsv4 readdir of an empty directory, which
      returned with no entries (the protocol requires "." and ".." to be
      filtered out), but with eof unset.)
      
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      8d7c4203
  16. 23 Oct, 2008 5 commits
  17. 30 Sep, 2008 2 commits
    • J
      knfsd: allocate readahead cache in individual chunks · 54a66e54
      Committed by Jeff Layton
      I had a report from someone building a large NFS server that they were
      unable to start more than 585 nfsd threads. It was reported against an
      older kernel using the slab allocator, and I tracked it down to the
      large allocation in nfsd_racache_init failing.
      
      It appears that the slub allocator handles large allocations better,
      but large contiguous allocations can often be problematic. There
      doesn't seem to be any reason that the racache has to be allocated as a
      single large chunk. This patch breaks this up so that the racache is
      built up from separate allocations.
      
      (Thanks also to Takashi Iwai for a bugfix.)
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Cc: Takashi Iwai <tiwai@suse.de>
      54a66e54
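Breaking one large contiguous allocation into separately allocated chunks can be sketched in user-space C as below. The struct, the bucket layout and the function names are assumptions for the example and do not reflect how nfsd_racache_init is actually organised.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one readahead-cache entry. */
struct raparms_sketch { int in_use; };

/* Allocate `nbuckets` independent chunks of `per_bucket` entries each,
 * instead of one contiguous block of nbuckets * per_bucket entries.
 * On failure, everything allocated so far is released. */
static struct raparms_sketch **racache_alloc_sketch(size_t nbuckets,
                                                    size_t per_bucket)
{
    struct raparms_sketch **buckets = calloc(nbuckets, sizeof(*buckets));
    if (!buckets)
        return NULL;

    for (size_t i = 0; i < nbuckets; i++) {
        buckets[i] = calloc(per_bucket, sizeof(**buckets));
        if (!buckets[i]) {
            while (i > 0)
                free(buckets[--i]);
            free(buckets);
            return NULL;
        }
    }
    return buckets;
}

/* Release every chunk, then the table of chunk pointers. */
static void racache_free_sketch(struct raparms_sketch **buckets,
                                size_t nbuckets)
{
    if (!buckets)
        return;
    for (size_t i = 0; i < nbuckets; i++)
        free(buckets[i]);
    free(buckets);
}
```

Each request to the allocator is now small, so the total size of the cache no longer depends on finding one large contiguous region.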
    • J
      nfsd: permit unauthenticated stat of export root · 04716e66
      Committed by J. Bruce Fields
      RFC 2623 section 2.3.2 permits the server to bypass gss authentication
      checks for certain operations that a client may perform when mounting.
      In the case of a client that doesn't have some form of credentials
      available to it on boot, this allows it to perform the mount unattended.
      (Presumably real file access won't be needed until a user with
      credentials logs in.)
      
      Being slightly more lenient allows lots of old clients to access
      krb5-only exports, with the only loss being a small amount of
      information leaked about the root directory of the export.
      
      This affects only v2 and v3; v4 still requires authentication for all
      access.
      
      Thanks to Peter Staubach for testing against a Solaris client, which
      suggested adding v3 getattr to the list, and to Trond for noting
      that doing so exposes no additional information.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Cc: Peter Staubach <staubach@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      04716e66
  18. 27 Jul, 2008 2 commits
  19. 02 Jul, 2008 1 commit
  20. 24 Jun, 2008 1 commit
  21. 24 Apr, 2008 2 commits
    • J
      knfsd: clear both setuid and setgid whenever a chown is done · ca456252
      Committed by Jeff Layton
      Currently, knfsd only clears the setuid bit if the owner of a file is
      changed on a SETATTR call, and only clears the setgid bit if the group
      is changed. POSIX says this in the spec for chown():
      
          "If the specified file is a regular file, one or more of the
           S_IXUSR, S_IXGRP, or S_IXOTH bits of the file mode are set, and the
           process does not have appropriate privileges, the set-user-ID
           (S_ISUID) and set-group-ID (S_ISGID) bits of the file mode shall
           be cleared upon successful return from chown()."
      
      If I'm reading this correctly, then knfsd is doing this wrong. It should
      be clearing both the setuid and setgid bit on any SETATTR that changes
      the uid or gid. This wasn't really as noticeable before, but now that the
      ATTR_KILL_S*ID bits are a no-op for the NFS client, it's more evident.
      
      This patch corrects the nfsd_setattr logic so that this occurs. It also
      does a bit of cleanup to the function.
      
      There is also one small behavioral change. If a SETATTR call comes in
      that changes the uid/gid and the mode, then we now only clear the setgid
      bit if the group execute bit isn't set. The setgid bit without a group
      execute bit signifies mandatory locking and we likely don't want to
      clear the bit in that case. Since there is no call in POSIX that should
      generate a SETATTR call like this, then this should rarely happen, but
      it's worth noting.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      ca456252
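The behavioural change described in the last paragraph (when a SETATTR changes the owner or group and also carries a mode, always clear setuid, but clear setgid only if group-execute is set) can be sketched on plain mode bits. The function name is hypothetical and this is only the mode arithmetic, not the nfsd_setattr logic itself.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Given the mode supplied alongside a uid/gid change, return the mode
 * with the set-id bits sanitised:
 *  - setuid is always cleared;
 *  - setgid is cleared only when group-execute is set, because setgid
 *    without group-execute marks mandatory locking and is left alone. */
static mode_t sanitize_mode_on_chown_sketch(mode_t mode)
{
    mode &= ~(mode_t)S_ISUID;
    if (mode & S_IXGRP)
        mode &= ~(mode_t)S_ISGID;
    return mode;
}
```

For example, mode 06755 becomes 0755, while 02644 (setgid without group-execute) is returned unchanged.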
    • J
      knfsd: get rid of imode variable in nfsd_setattr · dee3209d
      Committed by Jeff Layton
      ...it's not really needed.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      dee3209d