1. 21 Apr, 2009 (1 commit)
    • Safer nfsd_cross_mnt() · 1644ccc8
      Al Viro committed
      AFAICS, we have a subtle bug there: if we have crossed mountpoint
      *and* it got mount --move'd away, we'll be holding only one
      reference to fs containing dentry - exp->ex_path.mnt.  IOW, we
      ought to dput() before exp_put().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
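      A minimal userspace model of the ordering problem described above. It assumes a crossed mount that has been moved away, so the export (exp->ex_path.mnt) holds the only remaining reference to the filesystem. All model_* names are hypothetical; model_dput() and model_exp_put() only stand in for dput() and exp_put().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a filesystem torn down when its last mount ref goes. */
struct model_fs {
    int mnt_refs;          /* references held on the vfsmount */
    int dentry_refs;       /* dentries still pinned in this fs */
    bool torn_down;        /* set once mnt_refs drops to zero */
    bool use_after_free;   /* set if a dentry is released after teardown */
};

/* Stand-in for dput(): drop one dentry reference. */
static void model_dput(struct model_fs *fs)
{
    if (fs->torn_down)
        fs->use_after_free = true;  /* the fs and its dentries are gone */
    else
        fs->dentry_refs--;
}

/* Stand-in for exp_put(): drop the export's mount reference. */
static void model_exp_put(struct model_fs *fs)
{
    assert(fs->mnt_refs > 0);
    if (--fs->mnt_refs == 0)
        fs->torn_down = true;
}

/* After mount --move, the export holds the only mount reference. */
static struct model_fs moved_mount(void)
{
    struct model_fs fs = { .mnt_refs = 1, .dentry_refs = 1 };
    return fs;
}
```

Calling model_dput() before model_exp_put() leaves use_after_free false; the reverse order tears the filesystem down first, and the later dput() touches freed state.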
  2. 26 Mar, 2009 (1 commit)
  3. 19 Mar, 2009 (3 commits)
    • Inconsistent setattr behaviour · 0953e620
      Sachin S. Prabhu committed
      There is an inconsistency seen in the behaviour of nfs compared to other local
      filesystems on linux when changing owner or group of a directory. If the
      directory has SUID/SGID flags set, on changing owner or group on the directory,
      the flags are stripped off on nfs. These flags are maintained on other
      filesystems such as ext3.
      
      To reproduce on an NFS share or a local filesystem, run the following commands:
      mkdir test; chmod +s+g test; chown user1 test; ls -ld test
      
      On the nfs share, the flags are stripped and the output seen is
      drwxr-xr-x 2 user1 root 4096 Feb 23  2009 test
      
      On other local filesystems (e.g. ext3), the flags are not stripped and the output
      seen is
      drwsr-sr-x 2 user1 root 4096 Feb 23 13:57 test
      
      chown_common() called from sys_chown() will only strip the flags if the inode is
      not a directory.
      static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
      {
      ..
              if (!S_ISDIR(inode->i_mode))
                      newattrs.ia_valid |=
                              ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
      ..
      }
      
      See: http://www.opengroup.org/onlinepubs/7990989775/xsh/chown.html
      
      "If the path argument refers to a regular file, the set-user-ID (S_ISUID) and
      set-group-ID (S_ISGID) bits of the file mode are cleared upon successful return
      from chown(), unless the call is made by a process with appropriate privileges,
      in which case it is implementation-dependent whether these bits are altered. If
      chown() is successfully invoked on a file that is not a regular file, these
      bits may be cleared. These bits are defined in <sys/stat.h>."
      
      The behaviour as it stands does not appear to violate POSIX.  However the
      actions performed are inconsistent when comparing ext3 and nfs.
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
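      The rule in the chown_common() excerpt above can be sketched in userspace as a pure function on the mode bits. mode_after_chown() is a hypothetical name; it models only the kill-SUID/SGID decision, not the privilege checks or the ATTR_KILL_PRIV handling.

```c
#define _XOPEN_SOURCE 700   /* for S_IFDIR / S_IFREG under strict modes */
#include <assert.h>
#include <sys/stat.h>

/* Hypothetical helper mirroring chown_common(): the set-user-ID and
 * set-group-ID bits are cleared on an owner/group change only when
 * the inode is not a directory. */
static mode_t mode_after_chown(mode_t mode)
{
    if (!S_ISDIR(mode))
        mode &= ~(mode_t)(S_ISUID | S_ISGID);
    return mode;
}
```

A directory keeps drwsr-sr-x after the change, which is the ext3 behaviour shown above; a regular file loses both bits.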
    • Short write in nfsd becomes a full write to the client · 31dec253
      David Shaw committed
      If a filesystem being written to via NFS returns a short write count
      (as opposed to an error) to nfsd, nfsd treats that as a success for
      the entire write, rather than the short count that actually succeeded.
      
      For example, given an 8192-byte write, if the underlying filesystem
      only writes 4096 bytes, nfsd will ack back to the nfs client that all
      8192 bytes were written.  The nfs client does have retry logic for
      short writes, but this is never called as the client is told the
      complete write succeeded.
      
      There are probably other ways it could happen, but in my case it
      happened with a fuse (filesystem in userspace) filesystem which can
      rather easily have a partial write.
      
      Here is a patch to properly return the short write count to the
      client.
      Signed-off-by: David Shaw <dshaw@jabberwocky.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
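      A sketch of why reporting the real count matters. It assumes a backing filesystem that accepts at most 4096 bytes per call, as in the example above. The model_* functions are hypothetical, and the client loop only stands in for the NFS client's short-write retry logic.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_CHUNK 4096  /* assumed per-call limit of the backing fs */

/* Hypothetical backing filesystem: writes at most MODEL_MAX_CHUNK bytes
 * per call and returns how many it actually wrote (a short count). */
static size_t model_fs_write(size_t *stored, size_t len)
{
    size_t n = len < MODEL_MAX_CHUNK ? len : MODEL_MAX_CHUNK;
    assert(n <= len);
    *stored += n;
    return n;
}

/* Server side after the fix: report the count actually written,
 * not the count that was requested. */
static size_t model_server_write(size_t *stored, size_t len)
{
    return model_fs_write(stored, len);
}

/* Client side: resend whatever the server reports as not yet written.
 * Returns the number of WRITE requests sent. */
static int model_client_write(size_t *stored, size_t len)
{
    int requests = 0;
    while (len > 0) {
        size_t done = model_server_write(stored, len);
        requests++;
        if (done == 0)
            break;          /* no progress: stop instead of looping forever */
        len -= done;
    }
    return requests;
}
```

Writing 8192 bytes takes two WRITE requests because the server reports 4096 after the first. Had it reported 8192, the client would have stopped after one request with half the data missing.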
    • nfsd(v2/v3): fix the failure of creation from HPUX client · 4ac35c2f
      wengang wang committed
      Sometimes an HP-UX NFS client sends a create request to a Linux NFS server (v2/v3).
      A dump of the request looks like this:
          obj_attributes
              mode: value follows
                  set_it: value follows (1)
                  mode: 00
              uid: no value
                  set_it: no value (0)
              gid: value follows
                  set_it: value follows (1)
                  gid: 8030
              size: value follows
                  set_it: value follows (1)
                  size: 0
              atime: don't change
                  set_it: don't change (0)
              mtime: don't change
                  set_it: don't change (0)
      
      Note that the mode is 00 (no rwx permission even for the owner) and the request
      asks for the size to be set to 0.
      
      In the current nfsd (v2/v3) implementation, the server performs two main steps:
      1) it creates the file with the specified mode by calling vfs_create();
      2) it sets the file's attributes by calling nfsd_setattr().
      
      In step 2) it eventually calls the filesystem-specific setattr() function, which
      may fail the permission check: changing the size requires WRITE permission, but
      the file has none because its mode is 000.
      
      In this case, since the file has just been created (and is therefore already
      empty), we can simply ignore the request to set its size to 0. WRITE permission
      is then not needed and the open succeeds.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      --
       vfs.c |   19 +++++++++++++++++++
       1 file changed, 19 insertions(+)
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
  4. 16 Mar, 2009 (1 commit)
    • Use f_lock to protect f_flags · db1dd4d3
      Jonathan Corbet committed
      Traditionally, changes to struct file->f_flags have been done under BKL
      protection, or with no protection at all.  This patch causes all f_flags
      changes after file open/creation time to be done under protection of
      f_lock.  This allows the removal of some BKL usage and fixes a number of
      longstanding (if microscopic) races.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
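      A userspace sketch of the locking rule, with a pthread mutex standing in for the f_lock spinlock. Setting or clearing a flag is a read-modify-write of f_flags, so two unlocked updaters can overwrite each other's change. Taking the lock around each update prevents that. The model_* names and struct toggler are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace model of struct file; a mutex stands in for the f_lock spinlock. */
struct model_file {
    pthread_mutex_t f_lock;
    unsigned int f_flags;
};

/* Every post-open change to f_flags is done under f_lock, so concurrent
 * read-modify-write updates cannot lose one another. */
static void model_set_flag(struct model_file *f, unsigned int flag, int on)
{
    pthread_mutex_lock(&f->f_lock);
    if (on)
        f->f_flags |= flag;
    else
        f->f_flags &= ~flag;
    pthread_mutex_unlock(&f->f_lock);
}

struct toggler {
    struct model_file *f;
    unsigned int flag;
};

/* Thread body: toggle one bit many times, finishing with it set. */
static void *model_toggle(void *arg)
{
    struct toggler *t = arg;
    for (int i = 0; i < 100000; i++)
        model_set_flag(t->f, t->flag, i % 2);
    model_set_flag(t->f, t->flag, 1);
    return NULL;
}
```

Two threads each toggling a different bit through model_set_flag() always end with both bits set.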
  5. 08 Jan, 2009 (1 commit)
    • nfsd: fix double-locks of directory mutex · 9a8d248e
      J. Bruce Fields committed
      A number of nfsd operations depend on the i_mutex to cover more code
      than just the fsync, so the approach of 4c728ef5 "add a vfs_fsync
      helper" doesn't work for nfsd.  Revert the parts of those patches that
      touch nfsd.
      
      Note: we can't, however, remove the logic from vfs_fsync that was needed
      only for the special case of nfsd, because a vfs_fsync(NULL,...) call
      can still result indirectly from a stackable filesystem that was called
      by nfsd.  (Thanks to Christoph Hellwig for pointing this out.)
      Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
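      The double-lock can be reproduced in userspace with an error-checking pthread mutex standing in for the directory's i_mutex: an nfsd-style path that already holds the mutex calls a helper that takes it again. The model_* functions are hypothetical. In the kernel, the second mutex_lock() would deadlock rather than return an error.

```c
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stands in for the directory's i_mutex. An error-checking mutex reports
 * a relock by its owner as EDEADLK instead of hanging. */
static pthread_mutex_t model_i_mutex;

static void model_init_i_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&model_i_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Like a generic fsync helper: takes i_mutex itself around the flush. */
static int model_fsync_helper(void)
{
    int err = pthread_mutex_lock(&model_i_mutex);
    if (err)
        return err;                  /* EDEADLK: caller already holds it */
    /* ... the flush would happen here ... */
    pthread_mutex_unlock(&model_i_mutex);
    return 0;
}

/* An nfsd-style operation that holds i_mutex across more code than the
 * fsync and calls the helper in the middle of it. */
static int model_nfsd_op(void)
{
    pthread_mutex_lock(&model_i_mutex);
    int err = model_fsync_helper();  /* relocks the mutex we already hold */
    pthread_mutex_unlock(&model_i_mutex);
    return err;
}
```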
  6. 06 Jan, 2009 (2 commits)
    • add a vfs_fsync helper · 4c728ef5
      Christoph Hellwig committed
      Fsync currently has a fdatawrite/fdatawait pair around the method call,
      and a mutex_lock/unlock of the inode mutex.  All callers of fsync have
      to duplicate this, but we have a few and most of them don't quite get
      it right.  This patch adds a new vfs_fsync that takes care of this.
      It's a little more complicated than usual because ->fsync might get a NULL
      file pointer and just a dentry from nfsd, but otherwise gets a file, and we
      want to take the mapping and file operations from it when it is there.
      
      Notes on the fsync callers:
      
       - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
         lower file
       - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
         file, and was returning 0 when ->fsync was missing
       - shm was neither calling filemap_fdatawrite / filemap_fdatawait nor
         taking i_mutex.  Now given that shared memory doesn't have disk
         backing, not doing anything in fsync seems fine and I left it out of
         the vfs_fsync conversion for now, but in that case we might just
         not pass it through to the lower file at all but just call the no-op
         simple_sync_file directly.
      
      [and now actually export vfs_fsync]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
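      A simplified userspace sketch of the sequence the helper centralizes: write out dirty pages, take i_mutex around ->fsync, release it, then wait for writeback. The mapping and operations come from the file when there is one and from the dentry's inode when nfsd passes a NULL file. All model_* types are hypothetical, the datasync argument is omitted, and the writeback calls are reduced to counters.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct model_file;
struct model_dentry;

/* Hypothetical, heavily simplified stand-ins for kernel objects. */
struct model_fops {
    int (*fsync)(struct model_file *file, struct model_dentry *dentry);
};
struct model_mapping {
    int written;   /* times the "fdatawrite" step ran */
    int waited;    /* times the "fdatawait" step ran */
};
struct model_inode {
    pthread_mutex_t i_mutex;
    struct model_mapping *i_mapping;
    const struct model_fops *i_fop;
};
struct model_dentry { struct model_inode *d_inode; };
struct model_file {
    struct model_mapping *f_mapping;
    const struct model_fops *f_op;
};

/* One place for the write / lock / ->fsync / unlock / wait sequence. */
static int model_vfs_fsync(struct model_file *file, struct model_dentry *dentry)
{
    struct model_inode *inode = dentry->d_inode;
    /* Prefer the file's mapping and operations; nfsd may pass file == NULL. */
    struct model_mapping *mapping = file ? file->f_mapping : inode->i_mapping;
    const struct model_fops *fop = file ? file->f_op : inode->i_fop;
    int ret;

    if (!fop || !fop->fsync)
        return -EINVAL;

    mapping->written++;                    /* models filemap_fdatawrite() */
    pthread_mutex_lock(&inode->i_mutex);   /* exclude concurrent writers */
    ret = fop->fsync(file, dentry);
    pthread_mutex_unlock(&inode->i_mutex);
    mapping->waited++;                     /* models filemap_fdatawait() */
    return ret;
}

/* Example ->fsync with nothing to flush. */
static int model_noop_fsync(struct model_file *file, struct model_dentry *dentry)
{
    (void)file;
    (void)dentry;
    return 0;
}
```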
    • inode->i_op is never NULL · acfa4380
      Al Viro committed
      We used to have a rather schizophrenic set of checks for NULL ->i_op even
      though it had been eliminated years ago.  You'd need to go out of your
      way to set it to NULL explicitly _and_ a bunch of code would die on
      such inodes anyway.  After killing two remaining places that still
      did that bogosity, all that crap can go away.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
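      A sketch of the invariant, with hypothetical types. Inode setup always installs an operations table, possibly one whose methods are all NULL, so code tests individual methods such as ->permission and never the i_op pointer itself.

```c
#include <assert.h>
#include <stddef.h>

struct model_inode;

/* Hypothetical, cut-down operations table. */
struct model_inode_ops {
    int (*permission)(struct model_inode *inode, int mask);
};

struct model_inode {
    const struct model_inode_ops *i_op;
};

/* Shared empty table: every method pointer is NULL. */
static const struct model_inode_ops model_empty_iops;

/* Inode setup always installs a table, so i_op itself is never NULL. */
static void model_inode_init(struct model_inode *inode)
{
    inode->i_op = &model_empty_iops;
}

/* Callers check only the method, not the table pointer. */
static int model_permission(struct model_inode *inode, int mask)
{
    assert(inode->i_op != NULL);
    if (inode->i_op->permission)
        return inode->i_op->permission(inode, mask);
    return 0;   /* a generic check would go here (elided) */
}
```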
  7. 14 Nov, 2008 (2 commits)
  8. 10 Nov, 2008 (1 commit)
    • Fix nfsd truncation of readdir results · b726e923
      Doug Nazar committed
      Commit 8d7c4203 "nfsd: fix failure to set eof in readdir in some
      situations" introduced a bug: on a directory in an exported ext3
      filesystem with dir_index unset, a READDIR will only return about 250
      entries, even if the directory was larger.
      
      Bisected it back to this commit; reverting it fixes the problem.
      
      It turns out that in this case ext3 reads a block at a time, then
      returns from readdir, which means we can end up with buf.full==0 but
      with more entries in the directory still to be read.  Before 8d7c4203
      (but after c002a6c7 "Optimise NFS readdir hack slightly"), this would
      cause us to return the READDIR result immediately, but with the eof bit
      unset.  That could cause a performance regression (because the client
      would need more roundtrips to the server to read the whole directory),
      but no loss in correctness, since the cleared eof bit caused the client
      to send another readdir.  After 8d7c4203, the setting of the eof bit
      made this a correctness problem.
      
      So, move nfserr_eof into the loop and remove the buf.full check so that
      we loop until buf.used==0.  The following seems to do the right thing
      and reduces the network traffic since we don't return a READDIR result
      until the buffer is full.
      
      Tested on an empty directory & large directory; eof is properly sent and
      there are no more short buffers.
      Signed-off-by: Doug Nazar <nazard@dragoninc.ca>
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
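      A userspace model of the fixed loop. It assumes a filesystem that returns at most one block (three entries here) per readdir call, as ext3 without dir_index does in the report above. The eof indication is reset before every call, and the loop stops only when a call returns nothing (buf.used == 0) or the reply is full. Names and sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BLOCK_ENTRIES 3   /* entries the fs hands back per call */

/* Hypothetical directory: 'total' entries, current read position 'pos'. */
struct model_dir { int total, pos; };

/* Return at most one block of entries per call, even when more remain,
 * and never more than 'room'. Returns the number of entries emitted. */
static int model_fs_readdir(struct model_dir *d, int room)
{
    int n = d->total - d->pos;
    if (n > MODEL_BLOCK_ENTRIES)
        n = MODEL_BLOCK_ENTRIES;
    if (n > room)
        n = room;
    d->pos += n;
    return n;
}

/* Server loop after the fix. Returns the entries placed in the reply. */
static int model_nfsd_readdir(struct model_dir *d, int reply_cap, bool *eof)
{
    int count = 0;
    *eof = false;
    while (count < reply_cap) {
        *eof = true;                 /* nfserr_eof, reset inside the loop */
        int used = model_fs_readdir(d, reply_cap - count);
        if (used == 0)
            break;                   /* directory exhausted */
        *eof = false;                /* entries came back, so maybe more */
        count += used;
    }
    return count;
}
```

With a 10-entry directory the calls return 3, 3, 3, 1 and then 0 entries, so one reply carries all 10 with eof set. Per the description above, the earlier code could return after the first block with eof set, truncating the listing.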
  9. 31 Oct, 2008 (1 commit)
    • nfsd: fix failure to set eof in readdir in some situations · 8d7c4203
      J. Bruce Fields committed
      Before 14f7dd63 "[PATCH] Copy XFS readdir hack into nfsd code",
      readdir_cd->err was reset to eof before each call to vfs_readdir;
      afterwards, it is set only once.  Similarly, c002a6c7 "[PATCH] Optimise
      NFS readdir hack slightly" can cause us to exit without nfserr_eof set.
      Fix this.
      
      This ensures the "eof" bit is set when needed in readdir replies.  (The
      particular case I saw was an nfsv4 readdir of an empty directory, which
      returned with no entries (the protocol requires "." and ".." to be
      filtered out), but with eof unset.)
      
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
  10. 23 Oct, 2008 (5 commits)
  11. 30 Sep, 2008 (2 commits)
    • knfsd: allocate readahead cache in individual chunks · 54a66e54
      Jeff Layton committed
      I had a report from someone building a large NFS server that they were
      unable to start more than 585 nfsd threads. It was reported against an
      older kernel using the slab allocator, and I tracked it down to the
      large allocation in nfsd_racache_init failing.
      
      It appears that the slub allocator handles large allocations better,
      but large contiguous allocations can often be problematic. There
      doesn't seem to be any reason that the racache has to be allocated as a
      single large chunk. This patch breaks this up so that the racache is
      built up from separate allocations.
      
      (Thanks also to Takashi Iwai for a bugfix.)
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Cc: Takashi Iwai <tiwai@suse.de>
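      A userspace sketch of the change. It assumes a simple singly linked list in place of the kernel's actual racache layout: each entry comes from its own small calloc() instead of one large contiguous allocation, and a failure part-way through frees what was already allocated. The model_* names and entry size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical readahead-parameter entry; real fields elided. */
struct model_raparms {
    struct model_raparms *p_next;
    char state[128];
};

/* Free a chain of entries. */
static void model_racache_shutdown(struct model_raparms *head)
{
    while (head) {
        struct model_raparms *next = head->p_next;
        free(head);
        head = next;
    }
}

/* Build the cache from n small allocations chained on a list rather than
 * one n * sizeof(entry) block. Returns 0 on success; on failure frees
 * whatever was allocated, sets *out to NULL and returns -1. */
static int model_racache_init(struct model_raparms **out, size_t n)
{
    struct model_raparms *head = NULL;
    for (size_t i = 0; i < n; i++) {
        struct model_raparms *ra = calloc(1, sizeof(*ra));
        if (!ra) {
            model_racache_shutdown(head);
            *out = NULL;
            return -1;
        }
        ra->p_next = head;
        head = ra;
    }
    *out = head;
    return 0;
}

static size_t model_racache_count(const struct model_raparms *head)
{
    size_t n = 0;
    for (; head; head = head->p_next)
        n++;
    return n;
}
```

Each allocation is now small, so building the cache for many threads does not depend on finding one large contiguous region.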
    • nfsd: permit unauthenticated stat of export root · 04716e66
      J. Bruce Fields committed
      RFC 2623 section 2.3.2 permits the server to bypass gss authentication
      checks for certain operations that a client may perform when mounting.
      In the case of a client that doesn't have some form of credentials
      available to it on boot, this allows it to perform the mount unattended.
      (Presumably real file access won't be needed until a user with
      credentials logs in.)
      
      Being slightly more lenient allows lots of old clients to access
      krb5-only exports, with the only loss being a small amount of
      information leaked about the root directory of the export.
      
      This affects only v2 and v3; v4 still requires authentication for all
      access.
      
      Thanks to Peter Staubach for testing against a Solaris client, which
      suggested the addition of v3 getattr to the list, and to Trond for noting
      that doing so exposes no additional information.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Cc: Peter Staubach <staubach@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
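      The policy above reduces to a predicate. This is a hypothetical sketch, not the nfsd implementation. An unauthenticated request is let through only for NFS v2/v3, only on the export root, and only for operations on the mount-time list (such as the v3 GETATTR mentioned above). Which operations are on that list is left as a flag here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request description; not kernel types. */
struct model_req {
    int vers;              /* NFS protocol version: 2, 3 or 4 */
    bool authenticated;    /* passed the export's GSS flavor check */
    bool is_export_root;   /* target file handle is the export's root */
    bool mount_time_op;    /* op is on the mount-time list */
};

static bool model_may_access(const struct model_req *r)
{
    if (r->authenticated)
        return true;
    /* RFC 2623 section 2.3.2 leniency: v2/v3 only, export root only,
     * and only the operations a client needs while mounting. */
    return (r->vers == 2 || r->vers == 3) && r->is_export_root &&
           r->mount_time_op;
}
```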
  12. 27 Jul, 2008 (2 commits)
  13. 02 Jul, 2008 (1 commit)
  14. 24 Jun, 2008 (1 commit)
  15. 24 Apr, 2008 (4 commits)
  16. 19 Apr, 2008 (6 commits)
  17. 15 Feb, 2008 (1 commit)
  18. 02 Feb, 2008 (2 commits)
    • nfsd: allow root to set uid and gid on create · 5c002b3b
      J. Bruce Fields committed
      The server silently ignores attempts to set the uid and gid on create.
      Based on the comment, this appears to have been done to prevent some
      overly-clever IRIX client from causing itself problems.
      
      Perhaps we should remove that hack completely.  For now, at least, it
      makes sense to allow root (when no_root_squash is set) to set uid and
      gid.
      
      While we're there, since nfsd_create and nfsd_create_v3 share the same
      logic, pull that out into a separate function.  And spell out the
      individual modifications of ia_valid instead of doing them both at once
      inside a conditional.
      
      Thanks to Roger Willcocks <roger@filmlight.ltd.uk> for the bug report
      and original patch on which this is based.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
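      A sketch of the adjusted ia_valid handling, with hypothetical MODEL_ATTR_* bits in place of the kernel's ATTR_* flags. It assumes the caller's fsuid is the uid after export squashing, so it is 0 only for root on a no_root_squash export. Each ia_valid change is spelled out separately, as the commit message describes.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical stand-ins for the kernel's iattr validity bits. */
enum {
    MODEL_ATTR_MODE = 1 << 0,
    MODEL_ATTR_UID  = 1 << 1,
    MODEL_ATTR_GID  = 1 << 2,
    MODEL_ATTR_SIZE = 1 << 3,
};

/* Returns the attribute mask still to be applied after a create. */
static unsigned int model_create_attrs(unsigned int ia_valid, uid_t fsuid)
{
    /* The mode was already applied by the create itself. */
    ia_valid &= ~(unsigned int)MODEL_ATTR_MODE;
    /* Only root may choose the new file's owner and group. */
    if (fsuid != 0) {
        ia_valid &= ~(unsigned int)MODEL_ATTR_UID;
        ia_valid &= ~(unsigned int)MODEL_ATTR_GID;
    }
    return ia_valid;
}
```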
    • NFSD: Adjust filename length argument of nfsd_lookup · 5a022fc8
      Chuck Lever committed
      Clean up: adjust the sign of the length argument of nfsd_lookup and
      nfsd_lookup_dentry, for consistency with recent changes.  NFSD version
      4 callers already pass an unsigned file name length.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Acked-By: NeilBrown <neilb@suse.de>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
  19. 20 Oct, 2007 (1 commit)
  20. 19 Oct, 2007 (1 commit)
  21. 17 Oct, 2007 (1 commit)
    • Implement file posix capabilities · b5376771
      Serge E. Hallyn committed
      Implement file posix capabilities.  This allows programs to be given a
      subset of root's powers regardless of who runs them, without having to use
      setuid and giving the binary all of root's powers.
      
      This version works with Kaigai Kohei's userspace tools, found at
      http://www.kaigai.gr.jp/index.php.  For more information on how to use this
      patch, Chris Friedhoff has posted a nice page at
      http://www.friedhoff.org/fscaps.html.
      
      Changelog:
      	Nov 27:
      	Incorporate fixes from Andrew Morton
      	(security-introduce-file-caps-tweaks and
      	security-introduce-file-caps-warning-fix)
      	Fix Kconfig dependency.
      	Fix change signaling behavior when file caps are not compiled in.
      
      	Nov 13:
      	Integrate comments from Alexey: Remove CONFIG_ ifdef from
      	capability.h, and use %zd for printing a size_t.
      
      	Nov 13:
      	Fix endianness warnings by sparse as suggested by Alexey
      	Dobriyan.
      
      	Nov 09:
      	Address warnings of unused variables at cap_bprm_set_security
      	when file capabilities are disabled, and simultaneously clean
      	up the code a little, by pulling the new code into a helper
      	function.
      
      	Nov 08:
      	For pointers to required userspace tools and how to use
      	them, see http://www.friedhoff.org/fscaps.html.
      
      	Nov 07:
      	Fix the calculation of the highest bit checked in
      	check_cap_sanity().
      
      	Nov 07:
      	Allow file caps to be enabled without CONFIG_SECURITY, since
      	capabilities are the default.
      	Hook cap_task_setscheduler when !CONFIG_SECURITY.
      	Move capable(TASK_KILL) to end of cap_task_kill to reduce
      	audit messages.
      
      	Nov 05:
      	Add secondary calls in selinux/hooks.c to task_setioprio and
      	task_setscheduler so that selinux and capabilities with file
      	cap support can be stacked.
      
      	Sep 05:
      	As Seth Arnold points out, uid checks are out of place
      	for capability code.
      
      	Sep 01:
      	Define task_setscheduler, task_setioprio, cap_task_kill, and
      	task_setnice to make sure a user cannot affect a process in which
      	they called a program with some fscaps.
      
      	One remaining question is the note under task_setscheduler: are we
      	ok with CAP_SYS_NICE being sufficient to confine a process to a
      	cpuset?
      
      	It is a semantic change, as without fscaps, attach_task doesn't
      	allow CAP_SYS_NICE to override the uid equivalence check.  But since
      	it uses security_task_setscheduler, which elsewhere is used where
      	CAP_SYS_NICE can be used to override the uid equivalence check,
      	fixing it might be tough.
      
      	     task_setscheduler
      		 note: this also controls cpuset:attach_task.  Are we ok with
      		     CAP_SYS_NICE being used to confine to a cpuset?
      	     task_setioprio
      	     task_setnice
      		 sys_setpriority uses this (through set_one_prio) for another
      		 process.  Need same checks as setrlimit
      
      	Aug 21:
      	Updated secureexec implementation to reflect the fact that
      	euid and uid might be the same and nonzero, but the process
      	might still have elevated caps.
      
      	Aug 15:
      	Handle endianness of xattrs.
      	Enforce capability version match between kernel and disk.
      	Enforce that no bits beyond the known max capability are
      	set, else return -EPERM.
      	With this extra processing, it may be worth reconsidering
      	doing all the work at bprm_set_security rather than
      	d_instantiate.
      
      	Aug 10:
      	Always call getxattr at bprm_set_security, rather than
      	caching it at d_instantiate.
      
      [morgan@kernel.org: file-caps clean up for linux/capability.h]
      [bunk@kernel.org: unexport cap_inode_killpriv]
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Andrew Morgan <morgan@kernel.org>
      Signed-off-by: Andrew Morgan <morgan@kernel.org>
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
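      The "Aug 15" changelog entry (little-endian xattrs, version match, no bits beyond the highest known capability) can be sketched as a decoder. The 12-byte layout (a revision word followed by permitted and inheritable masks), the revision constants, and the -EINVAL for a version mismatch are assumptions for this sketch. Only the -EPERM for unknown bits comes from the changelog.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: three little-endian 32-bit words
 * (revision, permitted, inheritable). */
#define MODEL_CAP_REVISION  0x01000000u
#define MODEL_REVISION_MASK 0xFF000000u
#define MODEL_XATTR_LEN     12

/* Read a little-endian u32 regardless of host byte order. */
static uint32_t model_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode and validate. last_cap is the highest capability number the
 * running kernel knows (must be below 32 in this sketch). */
static int model_decode_caps(const unsigned char *x, size_t len,
                             unsigned last_cap,
                             uint32_t *permitted, uint32_t *inheritable)
{
    assert(last_cap < 32);
    if (len != MODEL_XATTR_LEN)
        return -EINVAL;
    if ((model_le32(x) & MODEL_REVISION_MASK) != MODEL_CAP_REVISION)
        return -EINVAL;   /* kernel/disk version mismatch (code assumed) */
    uint32_t valid = last_cap == 31 ? 0xFFFFFFFFu
                                    : (1u << (last_cap + 1)) - 1;
    uint32_t p = model_le32(x + 4), i = model_le32(x + 8);
    if ((p | i) & ~valid)
        return -EPERM;    /* bits beyond the known maximum capability */
    *permitted = p;
    *inheritable = i;
    return 0;
}
```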