1. 08 1月, 2009 3 次提交
  2. 07 1月, 2009 7 次提交
  3. 06 1月, 2009 2 次提交
    • C
      add a vfs_fsync helper · 4c728ef5
      Christoph Hellwig 提交于
      Fsync currently has a fdatawrite/fdatawait pair around the method call,
      and a mutex_lock/unlock of the inode mutex.  All callers of fsync have
      to duplicate this, but we have a few and most of them don't quite get
      it right.  This patch adds a new vfs_fsync that takes care of this.
      It's a little more complicated as usual as ->fsync might get a NULL file
      pointer and just a dentry from nfsd, but otherwise gets afile and we
      want to take the mapping and file operations from it when it is there.
      
      Notes on the fsync callers:
      
       - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
         	lower file
       - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
      	file, and returning 0 when ->fsync was missing
       - shm wasn't calling either filemap_fdatawrite / filemap_fdatawait nor
         taking i_mutex.  Now given that shared memory doesn't have disk
         backing not doing anything in fsync seems fine and I left it out of
         the vfs_fsync conversion for now, but in that case we might just
         not pass it through to the lower file at all but just call the no-op
         simple_sync_file directly.
      
      [and now actually export vfs_fsync]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4c728ef5
    • A
      inode->i_op is never NULL · acfa4380
      Al Viro 提交于
      We used to have rather schizophrenic set of checks for NULL ->i_op even
      though it had been eliminated years ago.  You'd need to go out of your
      way to set it to NULL explicitly _and_ a bunch of code would die on
      such inodes anyway.  After killing two remaining places that still
      did that bogosity, all that crap can go away.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      acfa4380
  4. 24 12月, 2008 3 次提交
  5. 25 11月, 2008 2 次提交
    • J
      nfsd: use of unitialized list head on error exit in nfs4recover.c · e4625eb8
      J. Bruce Fields 提交于
      Thanks to Matthew Dodd for this bug report:
      
      A file label issue while running SELinux in MLS mode provoked the
      following bug, which is a result of use before init on a 'struct list_head'.
      
      In nfsd4_list_rec_dir() if the call to dentry_open() fails the 'goto
      out' skips INIT_LIST_HEAD() which results in the normally improbable
      case where list_entry() returns NULL.
      
      Trace follows.
      
      NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
      SELinux:  Context unconfined_t:object_r:var_lib_nfs_t:s0 is not valid
      (left unmapped).
      type=1400 audit(1227298063.609:282): avc:  denied  { read } for
      pid=1890 comm="rpc.nfsd" name="v4recovery" dev=dm-0 ino=148726
      scontext=system_u:system_r:nfsd_t:s0-s15:c0.c1023
      tcontext=system_u:object_r:unlabeled_t:s15:c0.c1023 tclass=dir
      BUG: unable to handle kernel NULL pointer dereference at 00000004
      IP: [<c050894e>] list_del+0x6/0x60
      *pde = 0d9ce067 *pte = 00000000
      Oops: 0000 [#1] SMP
      Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs autofs4
      sunrpc ipv6 dm_multipath scsi_dh ppdev parport_pc sg parport floppy
      ata_piix pata_acpi ata_generic libata pcnet32 i2c_piix4 mii pcspkr
      i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod BusLogic sd_mod
      scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last
      unloaded: microcode]
      
      Pid: 1890, comm: rpc.nfsd Not tainted (2.6.27.5-37.fc9.i686 #1)
      EIP: 0060:[<c050894e>] EFLAGS: 00010217 CPU: 0
      EIP is at list_del+0x6/0x60
      EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: cd99e480
      ESI: cf9caed8 EDI: 00000000 EBP: cf9caebc ESP: cf9caeb8
        DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process rpc.nfsd (pid: 1890, ti=cf9ca000 task=cf4de580 task.ti=cf9ca000)
      Stack: 00000000 cf9caef0 d0a9f139 c0496d04 d0a9f217 fffffff3 00000000
      00000000
              00000000 00000000 cf32b220 00000000 00000008 00000801 cf9caefc
      d0a9f193
              00000000 cf9caf08 d0a9b6ea 00000000 cf9caf1c d0a874f2 cf9c3004
      00000008
      Call Trace:
        [<d0a9f139>] ? nfsd4_list_rec_dir+0xf3/0x13a [nfsd]
        [<c0496d04>] ? do_path_lookup+0x12d/0x175
        [<d0a9f217>] ? load_recdir+0x0/0x26 [nfsd]
        [<d0a9f193>] ? nfsd4_recdir_load+0x13/0x34 [nfsd]
        [<d0a9b6ea>] ? nfs4_state_start+0x2a/0xc5 [nfsd]
        [<d0a874f2>] ? nfsd_svc+0x51/0xff [nfsd]
        [<d0a87f2d>] ? write_svc+0x0/0x1e [nfsd]
        [<d0a87f48>] ? write_svc+0x1b/0x1e [nfsd]
        [<d0a87854>] ? nfsctl_transaction_write+0x3a/0x61 [nfsd]
        [<c04b6a4e>] ? sys_nfsservctl+0x116/0x154
        [<c04975c1>] ? putname+0x24/0x2f
        [<c04975c1>] ? putname+0x24/0x2f
        [<c048d49f>] ? do_sys_open+0xad/0xb7
        [<c048d337>] ? filp_close+0x50/0x5a
        [<c048d4eb>] ? sys_open+0x1e/0x26
        [<c0403cca>] ? syscall_call+0x7/0xb
        [<c064007b>] ? init_cyrix+0x185/0x490
        =======================
      Code: 75 e1 8b 53 08 8d 4b 04 8d 46 04 e8 75 00 00 00 8b 53 10 8d 4b 0c
      8d 46 0c e8 67 00 00 00 5b 5e 5f 5d c3 90 90 55 89 e5 53 89 c3 <8b> 40
      04 8b 00 39 d8 74 16 50 53 68 3e d6 6f c0 6a 30 68 78 d6
      EIP: [<c050894e>] list_del+0x6/0x60 SS:ESP 0068:cf9caeb8
      ---[ end trace a89c4ad091c4ad53 ]---
      
      Cc: Matthew N. Dodd <Matthew.Dodd@spart.com>
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      e4625eb8
    • J
      nfsd: clean up grace period on early exit · 2c5e7615
      J. Bruce Fields 提交于
      If nfsd was shut down before the grace period ended, we could end up
      with a freed object still on grace_list.  Thanks to Jeff Moyer for
      reporting the resulting list corruption warnings.
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      Tested-by: NJeff Moyer <jmoyer@redhat.com>
      2c5e7615
  6. 14 11月, 2008 5 次提交
    • D
      CRED: Differentiate objective and effective subjective credentials on a task · 3b11a1de
      David Howells 提交于
      Differentiate the objective and real subjective credentials from the effective
      subjective credentials on a task by introducing a second credentials pointer
      into the task_struct.
      
      task_struct::real_cred then refers to the objective and apparent real
      subjective credentials of a task, as perceived by the other tasks in the
      system.
      
      task_struct::cred then refers to the effective subjective credentials of a
      task, as used by that task when it's actually running.  These are not visible
      to the other tasks in the system.
      
      __task_cred(task) then refers to the objective/real credentials of the task in
      question.
      
      current_cred() refers to the effective subjective credentials of the current
      task.
      
      prepare_creds() uses the objective creds as a base and commit_creds() changes
      both pointers in the task_struct (indeed commit_creds() requires them to be the
      same).
      
      override_creds() and revert_creds() change the subjective creds pointer only,
      and the former returns the old subjective creds.  These are used by NFSD,
      faccessat() and do_coredump(), and will by used by CacheFiles.
      
      In SELinux, current_has_perm() is provided as an alternative to
      task_has_perm().  This uses the effective subjective context of current,
      whereas task_has_perm() uses the objective/real context of the subject.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      3b11a1de
    • D
      CRED: Inaugurate COW credentials · d84f4f99
      David Howells 提交于
      Inaugurate copy-on-write credentials management.  This uses RCU to manage the
      credentials pointer in the task_struct with respect to accesses by other tasks.
      A process may only modify its own credentials, and so does not need locking to
      access or modify its own credentials.
      
      A mutex (cred_replace_mutex) is added to the task_struct to control the effect
      of PTRACE_ATTACHED on credential calculations, particularly with respect to
      execve().
      
      With this patch, the contents of an active credentials struct may not be
      changed directly; rather a new set of credentials must be prepared, modified
      and committed using something like the following sequence of events:
      
      	struct cred *new = prepare_creds();
      	int ret = blah(new);
      	if (ret < 0) {
      		abort_creds(new);
      		return ret;
      	}
      	return commit_creds(new);
      
      There are some exceptions to this rule: the keyrings pointed to by the active
      credentials may be instantiated - keyrings violate the COW rule as managing
      COW keyrings is tricky, given that it is possible for a task to directly alter
      the keys in a keyring in use by another task.
      
      To help enforce this, various pointers to sets of credentials, such as those in
      the task_struct, are declared const.  The purpose of this is compile-time
      discouragement of altering credentials through those pointers.  Once a set of
      credentials has been made public through one of these pointers, it may not be
      modified, except under special circumstances:
      
        (1) Its reference count may incremented and decremented.
      
        (2) The keyrings to which it points may be modified, but not replaced.
      
      The only safe way to modify anything else is to create a replacement and commit
      using the functions described in Documentation/credentials.txt (which will be
      added by a later patch).
      
      This patch and the preceding patches have been tested with the LTP SELinux
      testsuite.
      
      This patch makes several logical sets of alteration:
      
       (1) execve().
      
           This now prepares and commits credentials in various places in the
           security code rather than altering the current creds directly.
      
       (2) Temporary credential overrides.
      
           do_coredump() and sys_faccessat() now prepare their own credentials and
           temporarily override the ones currently on the acting thread, whilst
           preventing interference from other threads by holding cred_replace_mutex
           on the thread being dumped.
      
           This will be replaced in a future patch by something that hands down the
           credentials directly to the functions being called, rather than altering
           the task's objective credentials.
      
       (3) LSM interface.
      
           A number of functions have been changed, added or removed:
      
           (*) security_capset_check(), ->capset_check()
           (*) security_capset_set(), ->capset_set()
      
           	 Removed in favour of security_capset().
      
           (*) security_capset(), ->capset()
      
           	 New.  This is passed a pointer to the new creds, a pointer to the old
           	 creds and the proposed capability sets.  It should fill in the new
           	 creds or return an error.  All pointers, barring the pointer to the
           	 new creds, are now const.
      
           (*) security_bprm_apply_creds(), ->bprm_apply_creds()
      
           	 Changed; now returns a value, which will cause the process to be
           	 killed if it's an error.
      
           (*) security_task_alloc(), ->task_alloc_security()
      
           	 Removed in favour of security_prepare_creds().
      
           (*) security_cred_free(), ->cred_free()
      
           	 New.  Free security data attached to cred->security.
      
           (*) security_prepare_creds(), ->cred_prepare()
      
           	 New. Duplicate any security data attached to cred->security.
      
           (*) security_commit_creds(), ->cred_commit()
      
           	 New. Apply any security effects for the upcoming installation of new
           	 security by commit_creds().
      
           (*) security_task_post_setuid(), ->task_post_setuid()
      
           	 Removed in favour of security_task_fix_setuid().
      
           (*) security_task_fix_setuid(), ->task_fix_setuid()
      
           	 Fix up the proposed new credentials for setuid().  This is used by
           	 cap_set_fix_setuid() to implicitly adjust capabilities in line with
           	 setuid() changes.  Changes are made to the new credentials, rather
           	 than the task itself as in security_task_post_setuid().
      
           (*) security_task_reparent_to_init(), ->task_reparent_to_init()
      
           	 Removed.  Instead the task being reparented to init is referred
           	 directly to init's credentials.
      
      	 NOTE!  This results in the loss of some state: SELinux's osid no
      	 longer records the sid of the thread that forked it.
      
           (*) security_key_alloc(), ->key_alloc()
           (*) security_key_permission(), ->key_permission()
      
           	 Changed.  These now take cred pointers rather than task pointers to
           	 refer to the security context.
      
       (4) sys_capset().
      
           This has been simplified and uses less locking.  The LSM functions it
           calls have been merged.
      
       (5) reparent_to_kthreadd().
      
           This gives the current thread the same credentials as init by simply using
           commit_thread() to point that way.
      
       (6) __sigqueue_alloc() and switch_uid()
      
           __sigqueue_alloc() can't stop the target task from changing its creds
           beneath it, so this function gets a reference to the currently applicable
           user_struct which it then passes into the sigqueue struct it returns if
           successful.
      
           switch_uid() is now called from commit_creds(), and possibly should be
           folded into that.  commit_creds() should take care of protecting
           __sigqueue_alloc().
      
       (7) [sg]et[ug]id() and co and [sg]et_current_groups.
      
           The set functions now all use prepare_creds(), commit_creds() and
           abort_creds() to build and check a new set of credentials before applying
           it.
      
           security_task_set[ug]id() is called inside the prepared section.  This
           guarantees that nothing else will affect the creds until we've finished.
      
           The calling of set_dumpable() has been moved into commit_creds().
      
           Much of the functionality of set_user() has been moved into
           commit_creds().
      
           The get functions all simply access the data directly.
      
       (8) security_task_prctl() and cap_task_prctl().
      
           security_task_prctl() has been modified to return -ENOSYS if it doesn't
           want to handle a function, or otherwise return the return value directly
           rather than through an argument.
      
           Additionally, cap_task_prctl() now prepares a new set of credentials, even
           if it doesn't end up using it.
      
       (9) Keyrings.
      
           A number of changes have been made to the keyrings code:
      
           (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
           	 all been dropped and built in to the credentials functions directly.
           	 They may want separating out again later.
      
           (b) key_alloc() and search_process_keyrings() now take a cred pointer
           	 rather than a task pointer to specify the security context.
      
           (c) copy_creds() gives a new thread within the same thread group a new
           	 thread keyring if its parent had one, otherwise it discards the thread
           	 keyring.
      
           (d) The authorisation key now points directly to the credentials to extend
           	 the search into rather pointing to the task that carries them.
      
           (e) Installing thread, process or session keyrings causes a new set of
           	 credentials to be created, even though it's not strictly necessary for
           	 process or session keyrings (they're shared).
      
      (10) Usermode helper.
      
           The usermode helper code now carries a cred struct pointer in its
           subprocess_info struct instead of a new session keyring pointer.  This set
           of credentials is derived from init_cred and installed on the new process
           after it has been cloned.
      
           call_usermodehelper_setup() allocates the new credentials and
           call_usermodehelper_freeinfo() discards them if they haven't been used.  A
           special cred function (prepare_usermodeinfo_creds()) is provided
           specifically for call_usermodehelper_setup() to call.
      
           call_usermodehelper_setkeys() adjusts the credentials to sport the
           supplied keyring as the new session keyring.
      
      (11) SELinux.
      
           SELinux has a number of changes, in addition to those to support the LSM
           interface changes mentioned above:
      
           (a) selinux_setprocattr() no longer does its check for whether the
           	 current ptracer can access processes with the new SID inside the lock
           	 that covers getting the ptracer's SID.  Whilst this lock ensures that
           	 the check is done with the ptracer pinned, the result is only valid
           	 until the lock is released, so there's no point doing it inside the
           	 lock.
      
      (12) is_single_threaded().
      
           This function has been extracted from selinux_setprocattr() and put into
           a file of its own in the lib/ directory as join_session_keyring() now
           wants to use it too.
      
           The code in SELinux just checked to see whether a task shared mm_structs
           with other tasks (CLONE_VM), but that isn't good enough.  We really want
           to know if they're part of the same thread group (CLONE_THREAD).
      
      (13) nfsd.
      
           The NFS server daemon now has to use the COW credentials to set the
           credentials it is going to use.  It really needs to pass the credentials
           down to the functions it calls, but it can't do that until other patches
           in this series have been applied.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NJames Morris <jmorris@namei.org>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      d84f4f99
    • D
      CRED: Pass credentials through dentry_open() · 745ca247
      David Howells 提交于
      Pass credentials through dentry_open() so that the COW creds patch can have
      SELinux's flush_unauthorized_files() pass the appropriate creds back to itself
      when it opens its null chardev.
      
      The security_dentry_open() call also now takes a creds pointer, as does the
      dentry_open hook in struct security_operations.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NJames Morris <jmorris@namei.org>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      745ca247
    • D
      CRED: Separate task security context from task_struct · b6dff3ec
      David Howells 提交于
      Separate the task security context from task_struct.  At this point, the
      security data is temporarily embedded in the task_struct with two pointers
      pointing to it.
      
      Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
      entry.S via asm-offsets.
      
      With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NJames Morris <jmorris@namei.org>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      b6dff3ec
    • D
      CRED: Wrap task credential accesses in the NFS daemon · 5cc0a840
      David Howells 提交于
      Wrap access to task credentials so that they can be separated more easily from
      the task_struct during the introduction of COW creds.
      
      Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
      
      Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
      sense to use RCU directly rather than a convenient wrapper; these will be
      addressed by later patches.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NJames Morris <jmorris@namei.org>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-nfs@vger.kernel.org
      Signed-off-by: NJames Morris <jmorris@namei.org>
      5cc0a840
  7. 10 11月, 2008 1 次提交
    • D
      Fix nfsd truncation of readdir results · b726e923
      Doug Nazar 提交于
      Commit 8d7c4203 "nfsd: fix failure to set eof in readdir in some
      situations" introduced a bug: on a directory in an exported ext3
      filesystem with dir_index unset, a READDIR will only return about 250
      entries, even if the directory was larger.
      
      Bisected it back to this commit; reverting it fixes the problem.
      
      It turns out that in this case ext3 reads a block at a time, then
      returns from readdir, which means we can end up with buf.full==0 but
      with more entries in the directory still to be read.  Before 8d7c4203
      (but after c002a6c7 "Optimise NFS readdir hack slightly"), this would
      cause us to return the READDIR result immediately, but with the eof bit
      unset.  That could cause a performance regression (because the client
      would need more roundtrips to the server to read the whole directory),
      but no loss in correctness, since the cleared eof bit caused the client
      to send another readdir.  After 8d7c4203, the setting of the eof bit
      made this a correctness problem.
      
      So, move nfserr_eof into the loop and remove the buf.full check so that
      we loop until buf.used==0.  The following seems to do the right thing
      and reduces the network traffic since we don't return a READDIR result
      until the buffer is full.
      
      Tested on an empty directory & large directory; eof is properly sent and
      there are no more short buffers.
      Signed-off-by: NDoug Nazar <nazard@dragoninc.ca>
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      b726e923
  8. 31 10月, 2008 2 次提交
  9. 23 10月, 2008 10 次提交
  10. 05 10月, 2008 2 次提交
  11. 04 10月, 2008 1 次提交
    • J
      nfsd: common grace period control · af558e33
      J. Bruce Fields 提交于
      Rewrite grace period code to unify management of grace period across
      lockd and nfsd.  The current code has lockd and nfsd cooperate to
      compute a grace period which is satisfactory to them both, and then
      individually enforce it.  This creates a slight race condition, since
      the enforcement is not coordinated.  It's also more complicated than
      necessary.
      
      Here instead we have lockd and nfsd each inform common code when they
      enter the grace period, and when they're ready to leave the grace
      period, and allow normal locking only after both of them are ready to
      leave.
      
      We also expect the locks_start_grace()/locks_end_grace() interface here
      to be simpler to build on for future cluster/high-availability work,
      which may require (for example) putting individual filesystems into
      grace, or enforcing grace periods across multiple cluster nodes.
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      af558e33
  12. 30 9月, 2008 2 次提交