1. 24 Feb 2011, 1 commit
    • Fix over-zealous flush_disk when changing device size. · 93b270f7
      NeilBrown committed
      There are two cases when we call flush_disk.
      In one, the device has disappeared (check_disk_change) so any
      data we hold becomes irrelevant.
      In the other, the device has changed size (check_disk_size_change)
      so data we hold may be irrelevant.
      
      In both cases it makes sense to discard any 'clean' buffers,
      so they will be read back from the device if needed.
      
      In the former case it makes sense to discard 'dirty' buffers
      as there will never be anywhere safe to write the data.  In the
      second case it *does*not* make sense to discard dirty buffers
      as that will lead to file system corruption when you simply enlarge
      the containing devices.
      
      flush_disk calls __invalidate_device.
      __invalidate_device calls both invalidate_inodes and invalidate_bdev.
      
      invalidate_inodes *does* discard I_DIRTY inodes and this does lead
      to fs corruption.
      
      invalidate_bdev *does*not* discard dirty pages, but I don't really care
      about that at present.
      
      So this patch adds a flag to __invalidate_device (calling it
      __invalidate_device2) to indicate whether dirty buffers should be
      killed, and this is passed to invalidate_inodes which can choose to
      skip dirty inodes.
      
      flush_disk then passes true from check_disk_change and false from
      check_disk_size_change.
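The core of the change is a flag saying whether dirty inodes may be thrown away. The following is a minimal userspace sketch of that decision, assuming an in-core inode can be reduced to two booleans; `model_invalidate_inodes()` and the two `model_flush_*` wrappers are hypothetical stand-ins for `invalidate_inodes()` and the two flush_disk call sites, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an in-core inode: only the two states that matter here. */
struct model_inode {
	bool cached; /* still held in memory */
	bool dirty;  /* holds data not yet written back */
};

/*
 * Drop cached inodes. Clean inodes are always dropped (they can be re-read
 * from the device); dirty ones are dropped only when kill_dirty is true.
 * Returns the number of inodes dropped.
 */
size_t model_invalidate_inodes(struct model_inode *inodes, size_t n, bool kill_dirty)
{
	size_t dropped = 0;

	for (size_t i = 0; i < n; i++) {
		if (!inodes[i].cached)
			continue;
		if (inodes[i].dirty && !kill_dirty)
			continue; /* device only resized: keep data so it can still be written */
		inodes[i].cached = false;
		inodes[i].dirty = false;
		dropped++;
	}
	return dropped;
}

/* Device vanished (check_disk_change path): nowhere to write, discard everything. */
size_t model_flush_after_media_change(struct model_inode *inodes, size_t n)
{
	return model_invalidate_inodes(inodes, n, true);
}

/* Device resized (check_disk_size_change path): keep dirty inodes. */
size_t model_flush_after_resize(struct model_inode *inodes, size_t n)
{
	return model_invalidate_inodes(inodes, n, false);
}
```

With one dirty inode among three, a resize drops only the two clean ones, while a media change afterwards drops the remaining dirty one as well.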
      
      dm avoids tripping over this problem by calling i_size_write directly
      rather than using check_disk_size_change.
      
      md does use check_disk_size_change and so is affected.
      
      This regression was introduced by commit 608aeef1, which made
      check_disk_size_change call flush_disk, so this fix is suitable for any
      kernel since 2.6.27.
      
      Cc: stable@kernel.org
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Andrew Patterson <andrew.patterson@hp.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NeilBrown <neilb@suse.de>
  2. 17 Jan 2011, 3 commits
    • tidy up around finish_automount() · b1e75df4
      Al Viro committed
      do_add_mount() and mnt_clear_expiry() are not needed outside of
      namespace.c anymore, now that namei has finish_automount() to
      use.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Take the completion of automount into new helper · 19a167af
      Al Viro committed
      ... and shift it from namei.c to namespace.c
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • sanitize vfsmount refcounting changes · f03c6599
      Al Viro committed
      Instead of splitting refcount between (per-cpu) mnt_count
      and (SMP-only) mnt_longrefs, make all references contribute
      to mnt_count again and keep track of how many are longterm
      ones.
      
      Accounting rules for longterm count:
      	* 1 for each fs_struct.root.mnt
      	* 1 for each fs_struct.pwd.mnt
      	* 1 for having non-NULL ->mnt_ns
      	* decrement to 0 happens only under vfsmount lock exclusive
      
      That allows a nice common case for mntput(): since we can't drop the
      final reference until after mnt_longterm has reached 0 due to the rules
      above, mntput() can grab the vfsmount lock shared and check mnt_longterm.
      If it turns out to be non-zero (which is the common case), we know
      that this is not the final mntput() and can just blindly decrement
      the percpu mnt_count.  Otherwise we grab the vfsmount lock exclusive and
      do the usual decrement-and-check of the percpu mnt_count.
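The mntput() fast path described above can be sketched in userspace as follows. This is an illustrative model only: a `pthread_rwlock_t` stands in for the vfsmount lock, an array indexed by a caller-supplied CPU number stands in for the per-CPU mnt_count, and all `model_*` names and `MODEL_NR_CPUS` are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose pthread_rwlock_* under strict -std= modes */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for the model */

/*
 * Hypothetical model of a mount's reference state: every reference is
 * counted in count[], and longterm additionally records how many of them
 * are long-lived (fs_struct root/pwd, attached to a namespace).
 */
struct model_mount {
	pthread_rwlock_t lock;     /* stands in for the vfsmount lock */
	long count[MODEL_NR_CPUS]; /* per-CPU deltas; only their sum is meaningful */
	int longterm;
};

void model_mount_init(struct model_mount *m)
{
	pthread_rwlock_init(&m->lock, NULL);
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		m->count[i] = 0;
	m->longterm = 0;
}

/* Taking a reference only touches the caller's own CPU slot. */
void model_mntget(struct model_mount *m, int cpu)
{
	m->count[cpu]++;
}

void model_make_longterm(struct model_mount *m)
{
	pthread_rwlock_wrlock(&m->lock);
	m->longterm++;
	pthread_rwlock_unlock(&m->lock);
}

/* Per the accounting rules, longterm only drops under the exclusive lock. */
void model_make_shortterm(struct model_mount *m)
{
	pthread_rwlock_wrlock(&m->lock);
	assert(m->longterm > 0);
	m->longterm--;
	pthread_rwlock_unlock(&m->lock);
}

static long model_sum(const struct model_mount *m)
{
	long sum = 0;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		sum += m->count[i];
	return sum;
}

/*
 * Drop a reference from slot `cpu` (assumed to be used by one thread at a
 * time, like a per-CPU variable with preemption off). Returns true if this
 * was the final reference and the caller should free the mount.
 */
bool model_mntput(struct model_mount *m, int cpu)
{
	/* Fast path: non-zero longterm means this cannot be the last reference. */
	pthread_rwlock_rdlock(&m->lock);
	if (m->longterm != 0) {
		m->count[cpu]--;
		pthread_rwlock_unlock(&m->lock);
		return false;
	}
	pthread_rwlock_unlock(&m->lock);

	/* Slow path: exclusive lock, decrement, then sum every CPU's slot. */
	pthread_rwlock_wrlock(&m->lock);
	m->count[cpu]--;
	bool last = (model_sum(m) == 0);
	pthread_rwlock_unlock(&m->lock);
	return last;
}
```

The exclusive lock and the full sum are paid only once `longterm` has reached 0, which by the accounting rules above happens only after the mount has no fs_struct root/pwd users and no longer has a namespace.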
      
      For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
      namespace.c uses the latter in places where we don't already hold
      vfsmount lock exclusive and opencodes a few remaining spots where
      we need to manipulate mnt_longterm.
      
      Note that we mostly revert the code outside of fs/namespace.c back
      to what we used to have; in particular, normal code doesn't need
      to care about two kinds of references, etc.  And we get to keep
      the optimization Nick's variant had bought us...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 16 Jan 2011, 1 commit
    • Unexport do_add_mount() and add in follow_automount(), not ->d_automount() · ea5b778a
      David Howells committed
      Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
      added rather than calling do_add_mount() itself.  follow_automount() will then
      do the addition.
      
      This slightly complicates things as ->d_automount() normally wants to add the
      new vfsmount to an expiration list and start an expiration timer.  The problem
      with that is that the vfsmount will be deleted if it has a refcount of 1 and
      the timer will not repeat if the expiration list is empty.
      
      To this end, we require the vfsmount to be returned from d_automount() with a
      refcount of (at least) 2.  One of these refs will be dropped unconditionally.
      In addition, follow_automount() must get a 3rd ref around the call to
      do_add_mount() lest it eat a ref and return an error, leaving the mount we
      have open to being expired as we would otherwise have only 1 ref on it.
      
      d_automount() should also add the vfsmount to the expiration list (by
      calling mnt_set_expiry()) and start the expiration timer before returning, if
      this mechanism is to be used.  The vfsmount will be unlinked from the
      expiration list by follow_automount() if do_add_mount() fails.
      
      This patch also fixes the call to do_add_mount() for AFS to propagate the mount
      flags from the parent vfsmount.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 07 Jan 2011, 1 commit
    • fs: scale mntget/mntput · b3e19d92
      Nick Piggin committed
      The problem that this patch aims to fix is vfsmount refcounting scalability.
      We need to take a reference on the vfsmount for every successful path lookup,
      which often go to the same mount point.
      
      The fundamental difficulty is that a "simple" reference count can never be made
      scalable, because any time a reference is dropped, we must check whether that
      was the last reference. To do that requires communication with all other CPUs
      that may have taken a reference count.
      
      We can make refcounts more scalable in a couple of ways, involving keeping
      distributed counters, and checking for the global-zero condition less
      frequently.
      
      - check the global sum once every interval (this will delay zero detection
        for some interval, so it's probably a showstopper for vfsmounts).
      
      - keep a local count and only take the global sum when the local count reaches 0 (this
        is difficult for vfsmounts, because we can't hold preempt off for the life of
        a reference, so a counter would need to be per-thread or tied strongly to a
        particular CPU which requires more locking).
      
      - keep a local difference of increments and decrements, which allows us to sum
        the total difference and hence find the refcount when summing all CPUs. Then,
        keep a single integer "long" refcount for slow and long lasting references,
        and only take the global sum of local counters when the long refcount is 0.
      
      This last scheme is what I implemented here. Attached mounts and process root
      and working directory references are "long" references, and everything else is
      a short reference.
      
      This allows scalable vfsmount references during path walking over mounted
      subtrees and unattached (lazy umounted) mounts with processes still running
      in them.
      
      This results in one fewer atomic op in the fastpath: mntget is now just a
      per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
      and non-atomic decrement in the common case. However code is otherwise bigger
      and heavier, so single threaded performance is basically a wash.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
  5. 29 Oct 2010, 1 commit
  6. 26 Oct 2010, 3 commits
  7. 18 Aug 2010, 2 commits
    • fs: brlock vfsmount_lock · 99b7db7b
      Nick Piggin committed
      Use a brlock for the vfsmount lock. It must be taken for write whenever
      modifying the mount hash or associated fields, and may be taken for read when
      performing mount hash lookups.
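A brlock (big-reader lock) makes read-side locking CPU-local at the price of a write side that must take every CPU's lock. A minimal sketch, assuming one `pthread_mutex_t` per simulated CPU; the kernel's brlock is built from per-CPU spinlocks, and the `model_br_*` names and `MODEL_NR_CPUS` here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for the model */

/* A big-reader lock modelled as one mutex per simulated CPU. */
struct model_brlock {
	pthread_mutex_t cpu_lock[MODEL_NR_CPUS];
};

void model_br_init(struct model_brlock *b)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		pthread_mutex_init(&b->cpu_lock[i], NULL);
}

/* Readers (e.g. mount hash lookups) take only their own CPU's lock. */
void model_br_read_lock(struct model_brlock *b, int cpu)
{
	pthread_mutex_lock(&b->cpu_lock[cpu]);
}

void model_br_read_unlock(struct model_brlock *b, int cpu)
{
	pthread_mutex_unlock(&b->cpu_lock[cpu]);
}

/*
 * Writers (mount hash changes) take every CPU's lock, always in ascending
 * order so that two writers cannot deadlock against each other.
 */
void model_br_write_lock(struct model_brlock *b)
{
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		pthread_mutex_lock(&b->cpu_lock[i]);
}

void model_br_write_unlock(struct model_brlock *b)
{
	for (int i = MODEL_NR_CPUS - 1; i >= 0; i--)
		pthread_mutex_unlock(&b->cpu_lock[i]);
}
```

This also shows why the slowpath gets slower: every writer must acquire all `MODEL_NR_CPUS` locks instead of one.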
      
      A new lock is added for the mnt-id allocator, so it doesn't need to take
      the heavy vfsmount write-lock.
      
      The number of atomics should remain the same for fastpath rlock cases, though
      code would be slightly slower due to per-cpu access. Scalability will not be
      much improved in common cases yet, due to other locks (i.e. dcache_lock) getting
      in the way. However path lookups crossing mountpoints should be one case where
      scalability is improved (currently requiring the global lock).
      
      The slowpath is slower due to use of brlock. On a 64-core, 64-socket, 32-node
      Altix system (high latency to remote nodes), a simple umount microbenchmark
      (mount --bind mnt mnt2 ; umount mnt2, looped 1000 times) took 6.8s before this
      patch and 7.1s afterwards, about 5% slower.
      
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • tty: fix fu_list abuse · d996b62a
      Nick Piggin committed
      tty code abuses fu_list, which causes a bug in remount,ro handling.
      
      If a tty device node is opened on a filesystem and then the last link to the
      inode is removed, the filesystem will be allowed to be remounted read-only. This
      is because fs_may_remount_ro does not find the 0-link tty inode on the file sb
      list (because the tty code incorrectly removed it for its own use).
      This can result in a filesystem with errors after it is marked "clean".
      
      Taking the idea from Christoph's initial patch, allocate a tty private struct
      at file->private_data and put our required list fields in there, linking
      file and tty. This makes tty nodes behave the same way as other device nodes,
      avoids meddling with the vfs, and avoids this bug.
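A rough sketch of the new linkage, assuming simplified stand-in structs: each open of the tty gets its own private record, reachable from `file->private_data` and chained on a list owned by the tty, so the superblock's file list is left alone. The `model_*` types and functions are hypothetical, not the tty layer's actual definitions.

```c
#include <assert.h>
#include <stdlib.h>

struct model_file;
struct model_tty;

/* Hypothetical per-open record hung off file->private_data. */
struct model_tty_file_private {
	struct model_tty *tty;
	struct model_file *file;
	struct model_tty_file_private *next; /* list owned by the tty */
};

struct model_file {
	void *private_data;
};

struct model_tty {
	struct model_tty_file_private *files; /* the tty's own list of open files */
};

/* Link an open file to its tty without touching any superblock list. */
void model_tty_add_file(struct model_tty *tty, struct model_file *file)
{
	struct model_tty_file_private *priv = malloc(sizeof(*priv));

	/* The patch relies on __GFP_NOFAIL; userspace can only bail out. */
	if (!priv)
		abort();
	priv->tty = tty;
	priv->file = file;
	priv->next = tty->files;
	tty->files = priv;
	file->private_data = priv;
}

int model_tty_count_files(const struct model_tty *tty)
{
	int n = 0;

	for (const struct model_tty_file_private *p = tty->files; p; p = p->next)
		n++;
	return n;
}
```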
      
      The error handling is not trivial in the tty code, so for this bugfix, I take
      the simple approach of using __GFP_NOFAIL and not worrying about memory errors.
      This is not a problem because our allocator doesn't fail small allocs as a rule
      anyway. So proper error handling is left as an exercise for tty hackers.
      
      [ Arguably filesystem's device inode would ideally be divorced from the
      driver's pseudo inode when it is opened, but in practice it's not clear whether
      that will ever be worth implementing. ]
      
      Cc: linux-kernel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  8. 22 May 2010, 1 commit
  9. 04 Mar 2010, 1 commit
  10. 23 Dec 2009, 1 commit
  11. 17 Dec 2009, 1 commit
  12. 24 Sep 2009, 1 commit
    • fs: fix overflow in sys_mount() for in-kernel calls · eca6f534
      Vegard Nossum committed
      sys_mount() reads/copies a whole page for its "type" parameter.  When
      do_mount_root() passes a kernel address that points to an object which is
      smaller than a whole page, copy_mount_options() will happily go past this
      memory object, possibly dereferencing "wild" pointers that could be in any
      state (hence the kmemcheck warning, which shows that parts of the next
      page are not even allocated).
      
      (The likelihood of something going wrong here is pretty low -- first of
      all this only applies to kernel calls to sys_mount(), which are mostly
      found in the boot code.  Secondly, I guess if the page was not mapped,
      exact_copy_from_user() _would_ in fact handle it correctly because of its
      access_ok(), etc.  checks.)
      
      But it is much nicer to avoid the dubious reads altogether, by stopping as
      soon as we find a NUL byte.  Is there a good reason why we can't do
      something like this, using the already existing strndup_from_user()?
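The bounded copy can be sketched in userspace as a scan that stops at the first NUL and never reads past one page. This assumes a plain in-memory source pointer (no user-copy faults to handle) and a hypothetical 4096-byte `MODEL_PAGE_SIZE`; the patch proposes reusing strndup_from_user(), whereas `model_copy_mount_string()` is an illustrative reimplementation of the same idea.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096 /* hypothetical page size for the model */

/*
 * Copy a NUL-terminated option string, reading no further than the
 * terminating NUL and never more than one page. Returns NULL if no NUL
 * was found within the page, or on allocation failure.
 */
char *model_copy_mount_string(const char *src)
{
	size_t len = 0;

	/* Stop at the first NUL instead of blindly copying a whole page. */
	while (len < MODEL_PAGE_SIZE && src[len] != '\0')
		len++;
	if (len == MODEL_PAGE_SIZE)
		return NULL;

	char *dst = malloc(len + 1);
	if (!dst)
		return NULL;
	memcpy(dst, src, len + 1); /* includes the terminating NUL */
	return dst;
}
```

A short type string such as "ext4" is now read only up to its own terminator, so a small kernel object passed by do_mount_root() is never over-read.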
      
      [akpm@linux-foundation.org: make copy_mount_string() static]
      [AV: fix compat mount breakage, which involves undoing akpm's change above]
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: al <al@dizzy.pdmi.ras.ru>
  13. 12 Jun 2009, 4 commits
    • Trim a bit of crap from fs.h · 62c6943b
      Al Viro committed
      do_remount_sb() is fs/internal.h fodder, fsync_no_super() is long gone.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Make sys_sync() use fsync_super() (version 4) · 5cee5815
      Jan Kara committed
      It is unnecessarily fragile to have two places (fsync_super() and do_sync())
      doing data integrity sync of the filesystem. Alter __fsync_super() to
      accommodate needs of both callers and use it. So after this patch
      __fsync_super() is the only place where we gather all the calls needed to
      properly send all data on a filesystem to disk.
      
      A nice bonus is that we get complete livelock avoidance, and write_supers()
      is now only used for periodic writeback of superblocks.
      
      sync_blockdevs() introduced a couple of patches ago is gone now.
      
      [build fixes folded]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Fix sys_sync() and fsync_super() reliability (version 4) · 5a3e5cb8
      Jan Kara committed
      So far, do_sync() called:
        sync_inodes(0);
        sync_supers();
        sync_filesystems(0);
        sync_filesystems(1);
        sync_inodes(1);
      
      This ordering makes it kind of hard for filesystems, as sync_inodes(0) need not
      submit all the IO (for example, it skips inodes with I_SYNC set), so e.g. forcing
      a transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has
      not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2
      and others are hit by this) when racing e.g. with background writeback. A
      similar problem also hits other filesystems (e.g. ext2) because of
      write_supers() being called before the sync_inodes(1).
      
      Change the ordering of calls in do_sync() - this requires a new function
      sync_blockdevs() to preserve the property that block devices are always synced
      after write_super() / sync_fs() call.
      
      The same issue is fixed in __fsync_super() function used on umount /
      remount read-only.
      
      [AV: build fixes]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: move mark_files_ro into file_table.c · 864d7c4c
      npiggin@suse.de committed
      This function walks the s_files list, and operates primarily on the
      files in a superblock, so it better belongs here (eg. see also
      fs_may_remount_ro).
      
      [AV: ... and it shouldn't be static after that move]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  14. 01 Apr 2009, 2 commits
    • New locking/refcounting for fs_struct · 498052bb
      Al Viro committed
      * all changes of current->fs are done under task_lock and write_lock of
        old fs->lock
      * refcount is not atomic anymore (same protection)
      * its decrements are done when removing reference from current; at the
        same time we decide whether to free it.
      * put_fs_struct() is gone
      * new field - ->in_exec.  Set by check_unsafe_exec() if we are trying to do
        execve() and only subthreads share fs_struct.  Cleared when finishing exec
        (success and failure alike).  Makes CLONE_FS fail with -EAGAIN if set.
      * check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread
        is in progress.
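The `in_exec` rule, combined with the check_unsafe_exec() sharing test, can be modelled roughly as below. This is a simplified sketch under stated assumptions: the task_lock and fs->lock locking is omitted, `n_fs` is assumed to be the number of threads in the exec'ing thread group (caller included) that share the fs_struct, and all `model_*` names are hypothetical rather than the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of fs_struct state relevant to the in_exec rule. */
struct model_fs_struct {
	int users;    /* how many tasks share this fs_struct */
	bool in_exec; /* set while an execve() by one of those tasks is in flight */
};

/*
 * Returns 1 if this exec claimed in_exec, 0 if the struct is shared outside
 * the thread group (*unsafe_share is then set), or -EAGAIN if another exec
 * from a subthread is already in progress.
 */
int model_check_unsafe_exec(struct model_fs_struct *fs, int n_fs, bool *unsafe_share)
{
	*unsafe_share = false;
	if (fs->users > n_fs) {
		*unsafe_share = true; /* someone outside our threads shares it */
		return 0;
	}
	if (fs->in_exec)
		return -EAGAIN;
	fs->in_exec = true;
	return 1;
}

/* Cleared when exec finishes, on success and failure alike. */
void model_finish_exec(struct model_fs_struct *fs)
{
	fs->in_exec = false;
}

/* clone(CLONE_FS) while an exec is in flight fails with -EAGAIN. */
int model_clone_fs(struct model_fs_struct *fs)
{
	if (fs->in_exec)
		return -EAGAIN;
	fs->users++;
	return 0;
}
```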
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Take fs_struct handling to new file (fs/fs_struct.c) · 3e93cd67
      Al Viro committed
      Pure code move; two new helper functions for nfsd and daemonize
      (unshare_fs_struct() and daemonize_fs_struct() resp.; for now -
      the same code as used to be in callers).  unshare_fs_struct()
      exported (for nfsd, as copy_fs_struct()/exit_fs() used to be),
      copy_fs_struct() and exit_fs() don't need exports anymore.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  15. 29 Mar 2009, 1 commit
    • fix setuid sometimes doesn't · e426b64c
      Hugh Dickins committed
      Joe Malicki reports that setuid sometimes doesn't: very rarely,
      a setuid root program does not get root euid; and, by the way,
      they have a health check running lsof every few minutes.
      
      Right, check_unsafe_exec() notes whether the files_struct is being
      shared by more threads than will get killed by the exec, and if so
      sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
      But /proc/<pid>/fd and /proc/<pid>/fdinfo lookups make transient
      use of get_files_struct(), which also raises that sharing count.
      
      There's a rather simple fix for this: exec's check on files->count
      has been redundant ever since 2.6.1 made it unshare_files() (except
      while compat_do_execve() omitted to do so) - just remove that check.
      
      [Note to -stable: this patch will not apply before 2.6.29: earlier
      releases should just remove the files->count line from unsafe_exec().]
      Reported-by: Joe Malicki <jmalicki@metacarta.com>
      Narrowed-down-by: Michael Itz <mitz@metacarta.com>
      Tested-by: Joe Malicki <jmalicki@metacarta.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 07 Feb 2009, 1 commit
    • CRED: Fix SUID exec regression · 0bf2f3ae
      David Howells committed
      The patch:
      
      	commit a6f76f23
      	CRED: Make execve() take advantage of copy-on-write credentials
      
      moved the place in which the 'safeness' of a SUID/SGID exec was performed to
      before de_thread() was called.  This means that LSM_UNSAFE_SHARE is now
      calculated incorrectly.  This flag is set if any of the usage counts for
      fs_struct, files_struct and sighand_struct are greater than 1 at the time the
      determination is made.  All of which are true for threads created by the
      pthread library.
      
      However, since we wish to make the security calculation before irrevocably
      damaging the process so that we can return it an error code in the case where
      we decide we want to reject the exec request on this basis, we have to make the
      determination before calling de_thread().
      
      So, instead, we count up the number of threads (CLONE_THREAD) that are sharing
      our fs_struct (CLONE_FS), files_struct (CLONE_FILES) and sighand_structs
      (CLONE_SIGHAND/CLONE_THREAD) with us.  These will be killed by de_thread() and
      so can be discounted by check_unsafe_exec().
      
      We do have to be careful because CLONE_THREAD does not imply FS or FILES.
      
      We _assume_ that there will be no extra references to these structs held by the
      threads we're going to kill.
      
      This can be tested with the attached pair of programs.  Build the two programs
      using the Makefile supplied, and run ./test1 as a non-root user.  If
      successful, you should see something like:
      
      	[dhowells@andromeda tmp]$ ./test1
      	--TEST1--
      	uid=4043, euid=4043 suid=4043
      	exec ./test2
      	--TEST2--
      	uid=4043, euid=0 suid=0
      	SUCCESS - Correct effective user ID
      
      and if unsuccessful, something like:
      
      	[dhowells@andromeda tmp]$ ./test1
      	--TEST1--
      	uid=4043, euid=4043 suid=4043
      	exec ./test2
      	--TEST2--
      	uid=4043, euid=4043 suid=4043
      	ERROR - Incorrect effective user ID!
      
      The non-root user ID you see will depend on the user you run as.
      
      [test1.c]
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <pthread.h>
      
      static void *thread_func(void *arg)
      {
      	while (1) {}
      }
      
      int main(int argc, char **argv)
      {
      	pthread_t tid;
      	uid_t uid, euid, suid;
      	int err;
      
      	printf("--TEST1--\n");
      	getresuid(&uid, &euid, &suid);
      	printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);
      
      	/* pthread_create() returns an error number; it does not set errno */
      	err = pthread_create(&tid, NULL, thread_func, NULL);
      	if (err != 0) {
      		fprintf(stderr, "pthread_create: %s\n", strerror(err));
      		exit(1);
      	}
      
      	printf("exec ./test2\n");
      	execlp("./test2", "test2", NULL);
      	perror("./test2");
      	_exit(1);
      }
      
      [test2.c]
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      
      int main(int argc, char **argv)
      {
      	uid_t uid, euid, suid;
      
      	getresuid(&uid, &euid, &suid);
      	printf("--TEST2--\n");
      	printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);
      
      	if (euid != 0) {
      		fprintf(stderr, "ERROR - Incorrect effective user ID!\n");
      		exit(1);
      	}
      	printf("SUCCESS - Correct effective user ID\n");
      	exit(0);
      }
      
      [Makefile]
      CFLAGS = -D_GNU_SOURCE -Wall -Werror -Wunused
      all: test1 test2
      
      test1: test1.c
      	gcc $(CFLAGS) -o test1 test1.c -lpthread
      
      test2: test2.c
      	gcc $(CFLAGS) -o test2 test2.c
      	sudo chown root:root test2
      	sudo chmod +s test2
      Reported-by: David Smith <dsmith@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: David Smith <dsmith@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
  17. 14 Nov 2008, 1 commit
    • CRED: Make execve() take advantage of copy-on-write credentials · a6f76f23
      David Howells committed
      Make execve() take advantage of copy-on-write credentials, allowing it to set
      up the credentials in advance, and then commit the whole lot after the point
      of no return.
      
      This patch and the preceding patches have been tested with the LTP SELinux
      testsuite.
      
      This patch makes several logical sets of alteration:
      
       (1) execve().
      
           The credential bits from struct linux_binprm are, for the most part,
           replaced with a single credentials pointer (bprm->cred).  This means that
           all the creds can be calculated in advance and then applied at the point
           of no return with no possibility of failure.
      
           I would like to replace bprm->cap_effective with:
      
      	cap_isclear(bprm->cap_effective)
      
           but this seems impossible due to special behaviour for processes of pid 1
           (they always retain their parent's capability masks where normally they'd
           be changed - see cap_bprm_set_creds()).
      
           The following sequence of events now happens:
      
           (a) At the start of do_execve, the current task's cred_exec_mutex is
           	 locked to prevent PTRACE_ATTACH from obsoleting the calculation of
           	 creds that we make.
      
           (a) prepare_exec_creds() is then called to make a copy of the current
           	 task's credentials and prepare it.  This copy is then assigned to
           	 bprm->cred.
      
            	 This renders security_bprm_alloc() and security_bprm_free()
           	 unnecessary, and so they've been removed.
      
           (b) The determination of unsafe execution is now performed immediately
           	 after (a) rather than later on in the code.  The result is stored in
           	 bprm->unsafe for future reference.
      
           (c) prepare_binprm() is called, possibly multiple times.
      
           	 (i) This applies the result of set[ug]id binaries to the new creds
           	     attached to bprm->cred.  Personality bit clearance is recorded,
           	     but now deferred on the basis that the exec procedure may yet
           	     fail.
      
               (ii) This then calls the new security_bprm_set_creds().  This should
      	     calculate the new LSM and capability credentials into *bprm->cred.
      
      	     This folds together security_bprm_set() and parts of
      	     security_bprm_apply_creds() (these two have been removed).
      	     Anything that might fail must be done at this point.
      
               (iii) bprm->cred_prepared is set to 1.
      
      	     bprm->cred_prepared is 0 on the first pass of the security
      	     calculations, and 1 on all subsequent passes.  This allows SELinux
      	     in (ii) to base its calculations only on the initial script and
      	     not on the interpreter.
      
           (d) flush_old_exec() is called to commit the task to execution.  This
           	 performs the following steps with regard to credentials:
      
      	 (i) Clear pdeath_signal and set dumpable in certain circumstances that
      	     may not be covered by commit_creds().
      
               (ii) Clear any bits in current->personality that were deferred from
                   (c.i).
      
           (e) install_exec_creds() [compute_creds() as was] is called to install the
           	 new credentials.  This performs the following steps with regard to
           	 credentials:
      
               (i) Calls security_bprm_committing_creds() to apply any security
                   requirements, such as flushing unauthorised files in SELinux, that
                   must be done before the credentials are changed.
      
      	     This is made up of bits of security_bprm_apply_creds() and
      	     security_bprm_post_apply_creds(), both of which have been removed.
      	     This function is not allowed to fail; anything that might fail
      	     must have been done in (c.ii).
      
               (ii) Calls commit_creds() to apply the new credentials in a single
                   assignment (more or less).  Possibly pdeath_signal and dumpable
                   should be part of struct creds.
      
      	 (iii) Unlocks the task's cred_replace_mutex, thus allowing
      	     PTRACE_ATTACH to take place.
      
               (iv) Clears the bprm->cred pointer as the credentials it was holding
                   are now immutable.
      
               (v) Calls security_bprm_committed_creds() to apply any security
                   alterations that must be done after the creds have been changed.
                   SELinux uses this to flush signals and signal handlers.
      
           (f) If an error occurs before (d.i), bprm_free() will call abort_creds()
           	 to destroy the proposed new credentials and will then unlock
           	 cred_replace_mutex.  No changes to the credentials will have been
           	 made.
      
       (2) LSM interface.
      
           A number of functions have been changed, added or removed:
      
           (*) security_bprm_alloc(), ->bprm_alloc_security()
           (*) security_bprm_free(), ->bprm_free_security()
      
           	 Removed in favour of preparing new credentials and modifying those.
      
           (*) security_bprm_apply_creds(), ->bprm_apply_creds()
           (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds()
      
           	 Removed; split between security_bprm_set_creds(),
           	 security_bprm_committing_creds() and security_bprm_committed_creds().
      
           (*) security_bprm_set(), ->bprm_set_security()
      
           	 Removed; folded into security_bprm_set_creds().
      
           (*) security_bprm_set_creds(), ->bprm_set_creds()
      
           	 New.  The new credentials in bprm->cred should be checked and set up
           	 as appropriate.  bprm->cred_prepared is 0 on the first call, 1 on the
           	 second and subsequent calls.
      
           (*) security_bprm_committing_creds(), ->bprm_committing_creds()
           (*) security_bprm_committed_creds(), ->bprm_committed_creds()
      
           	 New.  Apply the security effects of the new credentials.  This
           	 includes closing unauthorised files in SELinux.  This function may not
           	 fail.  When the former is called, the creds haven't yet been applied
           	 to the process; when the latter is called, they have.
      
       	 The former may access bprm->cred, the latter may not.
      
       (3) SELinux.
      
           SELinux has a number of changes, in addition to those to support the LSM
           interface changes mentioned above:
      
           (a) The bprm_security_struct struct has been removed in favour of using
           	 the credentials-under-construction approach.
      
           (c) flush_unauthorized_files() now takes a cred pointer and passes it on
           	 to inode_has_perm(), file_has_perm() and dentry_open().
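The prepare / modify / commit-or-abort pattern at the heart of this patch can be illustrated with a small refcounted record. This is a sketch only: `struct model_cred` holds just two IDs, there is no RCU or locking, and the `model_*` functions are hypothetical analogues of prepare_exec_creds(), commit_creds() and abort_creds().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much-reduced credential record. */
struct model_cred {
	int usage; /* reference count */
	int uid;
	int euid;
};

struct model_task {
	struct model_cred *cred; /* once published, treated as read-only */
};

static void model_put_cred(struct model_cred *c)
{
	if (--c->usage == 0)
		free(c);
}

/* Take a private, writable copy of the task's current creds. */
struct model_cred *model_prepare_creds(const struct model_task *t)
{
	struct model_cred *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	*new = *t->cred;
	new->usage = 1;
	return new;
}

/* Point of no return: publish the new creds with a single pointer swap. */
void model_commit_creds(struct model_task *t, struct model_cred *new)
{
	struct model_cred *old = t->cred;

	t->cred = new;
	model_put_cred(old);
}

/* Failure before the point of no return: discard; the task is untouched. */
void model_abort_creds(struct model_cred *new)
{
	model_put_cred(new);
}
```

Because nothing is written into the task until `model_commit_creds()`, every check that can fail runs against the private copy first, which is the property the exec sequence above relies on.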
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: James Morris <jmorris@namei.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: James Morris <jmorris@namei.org>
  18. 22 Apr 2008, 1 commit
  19. 09 May 2007, 1 commit
  20. 01 Oct 2006, 4 commits
    • [PATCH] CONFIG_BLOCK internal.h cleanups · 5e6d12b2
      Andrew Morton committed
      - forward declare struct superblock
      - use inlines, not macros
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • [PATCH] BLOCK: Make it possible to disable the block layer [try #6] · 9361401e
      David Howells committed
      Make it possible to disable the block layer.  Not all embedded devices require
      it; some can make do with just JFFS2, NFS, ramfs, etc., none of which require
      the block layer to be present.
      
      This patch does the following:
      
       (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
           support.
      
       (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
           an item that uses the block layer.  This includes:
      
           (*) Block I/O tracing.
      
           (*) Disk partition code.
      
           (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
      
           (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
           	 block layer to do scheduling.  Some drivers that use SCSI facilities -
           	 such as USB storage - end up disabled indirectly from this.
      
           (*) Various block-based device drivers, such as IDE and the old CDROM
           	 drivers.
      
           (*) MTD blockdev handling and FTL.
      
           (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
           	 taking a leaf out of JFFS2's book.
      
       (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
           linux/elevator.h contingent on CONFIG_BLOCK being set.  sector_div() is,
           however, still used in places, and so is still available.
      
       (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
           parts of linux/fs.h.
      
       (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
      
       (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
      
       (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
           is not enabled.
      
       (*) fs/no-block.c is created to hold out-of-line stubs and things that are
           required when CONFIG_BLOCK is not set:
      
           (*) Default blockdev file operations (to give error ENODEV on opening).
      
       (*) Makes some /proc changes:
      
           (*) /proc/devices does not list any blockdevs.
      
           (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
      
       (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
      
       (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
           given a command other than Q_SYNC or if a special device is specified.
      
       (*) In init/do_mounts.c, no reference is made to the blockdev routines if
           CONFIG_BLOCK is not defined.  This does not prohibit NFS roots or JFFS2.
      
       (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
           error ENOSYS by way of cond_syscall if so).
      
       (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
           CONFIG_BLOCK is not set, since they can't then happen.
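Two of the behaviours listed above, the ENODEV blockdev open stub and the restricted quotactl(), follow a common compile-time stub pattern. A minimal sketch, assuming `CONFIG_BLOCK` is simply left undefined; the `model_*` functions and the `MODEL_Q_*` command values are hypothetical and do not match the real quota ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* CONFIG_BLOCK is deliberately left undefined here to exercise the stubs. */

#define MODEL_Q_SYNC     1 /* hypothetical command code, not the real Q_SYNC value */
#define MODEL_Q_GETQUOTA 2 /* hypothetical command code */

#ifdef CONFIG_BLOCK
/* The real block-layer implementations would be built here. */
#else
/* Default blockdev open: without a block layer, opening fails with -ENODEV. */
static inline int model_blkdev_open(void)
{
	return -ENODEV;
}

/* quotactl without a block layer: only Q_SYNC with no special device is allowed. */
static inline int model_quotactl(int cmd, const char *special)
{
	if (cmd != MODEL_Q_SYNC || special != NULL)
		return -ENODEV;
	return 0; /* would go on to sync quotas on all filesystems */
}
#endif
```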
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • [PATCH] BLOCK: Remove dependence on existence of blockdev_superblock [try #6] · 7b0de42d
      David Howells committed
      Move blockdev_superblock extern declaration from fs/fs-writeback.c to a
      header file and remove the dependence on it by wrapping it in a macro.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • [PATCH] BLOCK: Move extern declarations out of fs/*.c into header files [try #6] · 07f3f05c
      David Howells committed
      Create a new header file, fs/internal.h, for common definitions local to the
      sources in the fs/ directory.
      
      Move extern definitions that should be in header files from fs/*.c to
      fs/internal.h or other main header files where they span directories.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>