1. 04 4月, 2011 1 次提交
    • E
      capabilites: allow the application of capability limits to usermode helpers · 17f60a7d
      Eric Paris 提交于
      There is no way to limit the capabilities of usermodehelpers. This problem
      reared its head recently when someone complained that any user with
      cap_net_admin was able to load arbitrary kernel modules, even though the user
      didn't have cap_sys_module.  The reason is because the actual load is done by
      a usermode helper and those always have the full cap set.  This patch addes new
      sysctls which allow us to bound the permissions of usermode helpers.
      
      /proc/sys/kernel/usermodehelper/bset
      /proc/sys/kernel/usermodehelper/inheritable
      
      You must have CAP_SYS_MODULE  and CAP_SETPCAP to change these (changes are
      &= ONLY).  When the kernel launches a usermodehelper it will do so with these
      as the bset and pI.
      
      -v2:	make globals static
      	create spinlock to protect globals
      
      -v3:	require both CAP_SETPCAP and CAP_SYS_MODULE
      -v4:	fix the typo s/CAP_SET_PCAP/CAP_SETPCAP/ because I didn't commit
      Signed-off-by: NEric Paris <eparis@redhat.com>
      No-objection-from: Serge E. Hallyn <serge.hallyn@canonical.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
      Acked-by: NAndrew G. Morgan <morgan@kernel.org>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      17f60a7d
  2. 18 8月, 2010 1 次提交
    • D
      Make do_execve() take a const filename pointer · d7627467
      David Howells 提交于
      Make do_execve() take a const filename pointer so that kernel_execve() compiles
      correctly on ARM:
      
      arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
      
      This also requires the argv and envp arguments to be consted twice, once for
      the pointer array and once for the strings the array points to.  This is
      because do_execve() passes a pointer to the filename (now const) to
      copy_strings_kernel().  A simpler alternative would be to cast the filename
      pointer in do_execve() when it's passed to copy_strings_kernel().
      
      do_execve() may not change any of the strings it is passed as part of the argv
      or envp lists as they are some of them in .rodata, so marking these strings as
      const should be fine.
      
      Further kernel_execve() and sys_execve() need to be changed to match.
      
      This has been test built on x86_64, frv, arm and mips.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7627467
  3. 28 5月, 2010 8 次提交
    • O
      call_usermodehelper: UMH_WAIT_EXEC ignores kernel_thread() failure · 04b1c384
      Oleg Nesterov 提交于
      UMH_WAIT_EXEC should report the error if kernel_thread() fails, like
      UMH_WAIT_PROC does.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      04b1c384
    • O
      call_usermodehelper: simplify/fix UMH_NO_WAIT case · d47419cd
      Oleg Nesterov 提交于
      __call_usermodehelper(UMH_NO_WAIT) has 2 problems:
      
      	- if kernel_thread() fails, call_usermodehelper_freeinfo()
      	  is not called.
      
      	- for unknown reason UMH_NO_WAIT has UMH_WAIT_PROC logic,
      	  we spawn yet another thread which waits until the user
      	  mode application exits.
      
      Change the UMH_NO_WAIT code to use ____call_usermodehelper() instead of
      wait_for_helper(), and do call_usermodehelper_freeinfo() unconditionally.
      We can rely on CLONE_VFORK, do_fork(CLONE_VFORK) until the child exits or
      execs.
      
      With or without this patch UMH_NO_WAIT does not report the error if
      kernel_thread() fails, this is correct since the caller doesn't wait for
      result.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d47419cd
    • O
      wait_for_helper: SIGCHLD from user-space can lead to use-after-free · 7d642242
      Oleg Nesterov 提交于
      1. wait_for_helper() calls allow_signal(SIGCHLD) to ensure the child
         can't autoreap itself.
      
         However, this means that a spurious SIGCHILD from user-space can
         set TIF_SIGPENDING and:
      
         	- kernel_thread() or sys_wait4() can fail due to signal_pending()
      
         	- worse, wait4() can fail before ____call_usermodehelper() execs
         	  or exits. In this case the caller may kfree(subprocess_info)
         	  while the child still uses this memory.
      
         Change the code to use SIG_DFL instead of magic "(void __user *)2"
         set by allow_signal(). This means that SIGCHLD won't be delivered,
         yet the child won't autoreap itsefl.
      
         The problem is minor, only root can send a signal to this kthread.
      
      2. If sys_wait4(&ret) fails it doesn't populate "ret", in this case
         wait_for_helper() reports a random value from uninitialized var.
      
         With this patch sys_wait4() should never fail, but still it makes
         sense to initialize ret = -ECHILD so that the caller can notice
         the problem.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d642242
    • O
      call_usermodehelper: no need to unblock signals · 363da402
      Oleg Nesterov 提交于
      ____call_usermodehelper() correctly calls flush_signal_handlers() to set
      SIG_DFL, but sigemptyset(->blocked) and recalc_sigpending() are not
      needed.
      
      This kthread was forked by workqueue thread, all signals must be unblocked
      and ignored, no pending signal is possible.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      363da402
    • O
      umh: creds: kill subprocess_info->cred logic · c70a626d
      Oleg Nesterov 提交于
      Now that nobody ever changes subprocess_info->cred we can kill this member
      and related code.  ____call_usermodehelper() always runs in the context of
      freshly forked kernel thread, it has the proper ->cred copied from its
      parent kthread, keventd.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c70a626d
    • O
      umh: creds: convert call_usermodehelper_keys() to use subprocess_info->init() · 685bfd2c
      Oleg Nesterov 提交于
      call_usermodehelper_keys() uses call_usermodehelper_setkeys() to change
      subprocess_info->cred in advance.  Now that we have info->init() we can
      change this code to set tgcred->session_keyring in context of execing
      kernel thread.
      
      Note: since currently call_usermodehelper_keys() is never called with
      UMH_NO_WAIT, call_usermodehelper_keys()->key_get() and umh_keys_cleanup()
      are not really needed, we could rely on install_session_keyring_to_cred()
      which does key_get() on success.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      685bfd2c
    • N
      exec: replace call_usermodehelper_pipe with use of umh init function and resolve limit · 898b374a
      Neil Horman 提交于
      The first patch in this series introduced an init function to the
      call_usermodehelper api so that processes could be customized by caller.
      This patch takes advantage of that fact, by customizing the helper in
      do_coredump to create the pipe and set its core limit to one (for our
      recusrsion check).  This lets us clean up the previous uglyness in the
      usermodehelper internals and factor call_usermodehelper out entirely.
      While I'm at it, we can also modify the helper setup to look for a core
      limit value of 1 rather than zero for our recursion check
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      898b374a
    • N
      kmod: add init function to usermodehelper · a06a4dc3
      Neil Horman 提交于
      About 6 months ago, I made a set of changes to how the core-dump-to-a-pipe
      feature in the kernel works.  We had reports of several races, including
      some reports of apps bypassing our recursion check so that a process that
      was forked as part of a core_pattern setup could infinitely crash and
      refork until the system crashed.
      
      We fixed those by improving our recursion checks.  The new check basically
      refuses to fork a process if its core limit is zero, which works well.
      
      Unfortunately, I've been getting grief from maintainer of user space
      programs that are inserted as the forked process of core_pattern.  They
      contend that in order for their programs (such as abrt and apport) to
      work, all the running processes in a system must have their core limits
      set to a non-zero value, to which I say 'yes'.  I did this by design, and
      think thats the right way to do things.
      
      But I've been asked to ease this burden on user space enough times that I
      thought I would take a look at it.  The first suggestion was to make the
      recursion check fail on a non-zero 'special' number, like one.  That way
      the core collector process could set its core size ulimit to 1, and enable
      the kernel's recursion detection.  This isn't a bad idea on the surface,
      but I don't like it since its opt-in, in that if a program like abrt or
      apport has a bug and fails to set such a core limit, we're left with a
      recursively crashing system again.
      
      So I've come up with this.  What I've done is modify the
      call_usermodehelper api such that an extra parameter is added, a function
      pointer which will be called by the user helper task, after it forks, but
      before it exec's the required process.  This will give the caller the
      opportunity to get a call back in the processes context, allowing it to do
      whatever it needs to to the process in the kernel prior to exec-ing the
      user space code.  In the case of do_coredump, this callback is ues to set
      the core ulimit of the helper process to 1.  This elimnates the opt-in
      problem that I had above, as it allows the ulimit for core sizes to be set
      to the value of 1, which is what the recursion check looks for in
      do_coredump.
      
      This patch:
      
      Create new function call_usermodehelper_fns() and allow it to assign both
      an init and cleanup function, as we'll as arbitrary data.
      
      The init function is called from the context of the forked process and
      allows for customization of the helper process prior to calling exec.  Its
      return code gates the continuation of the process, or causes its exit.
      Also add an arbitrary data pointer to the subprocess_info struct allowing
      for data to be passed from the caller to the new process, and the
      subsequent cleanup process
      
      Also, use this patch to cleanup the cleanup function.  It currently takes
      an argp and envp pointer for freeing, which is ugly.  Lets instead just
      make the subprocess_info structure public, and pass that to the cleanup
      and init routines
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a06a4dc3
  4. 12 1月, 2010 1 次提交
    • M
      kmod: fix resource leak in call_usermodehelper_pipe() · 8767ba27
      Masami Hiramatsu 提交于
      Fix resource (write-pipe file) leak in call_usermodehelper_pipe().
      
      When call_usermodehelper_exec() fails, write-pipe file is opened and
      call_usermodehelper_pipe() just returns an error.  Since it is hard for
      caller to determine whether the error occured when opening the pipe or
      executing the helper, the caller cannot close the pipe by themselves.
      
      I've found this resoruce leak when testing coredump.  You can check how
      the resource leaks as below;
      
      $ echo "|nocommand" > /proc/sys/kernel/core_pattern
      $ ulimit -c unlimited
      $ while [ 1 ]; do ./segv; done &> /dev/null &
      $ cat /proc/meminfo (<- repeat it)
      
      where segv.c is;
      //-----
      int main () {
              char *p = 0;
              *p = 1;
      }
      //-----
      
      This patch closes write-pipe file if call_usermodehelper_exec() failed.
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8767ba27
  5. 10 11月, 2009 1 次提交
    • E
      security: report the module name to security_module_request · dd8dbf2e
      Eric Paris 提交于
      For SELinux to do better filtering in userspace we send the name of the
      module along with the AVC denial when a program is denied module_request.
      
      Example output:
      
      type=SYSCALL msg=audit(11/03/2009 10:59:43.510:9) : arch=x86_64 syscall=write success=yes exit=2 a0=3 a1=7fc28c0d56c0 a2=2 a3=7fffca0d7440 items=0 ppid=1727 pid=1729 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.nfsd exe=/usr/sbin/rpc.nfsd subj=system_u:system_r:nfsd_t:s0 key=(null)
      type=AVC msg=audit(11/03/2009 10:59:43.510:9) : avc:  denied  { module_request } for  pid=1729 comm=rpc.nfsd kmod="net-pf-10" scontext=system_u:system_r:nfsd_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=system
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      dd8dbf2e
  6. 24 9月, 2009 1 次提交
    • S
      Revert "kmod: fix race in usermodehelper code" · 95e0d86b
      Sebastian Andrzej Siewior 提交于
      This reverts commit c02e3f36 ("kmod: fix race in usermodehelper code")
      
      The patch is wrong.  UMH_WAIT_EXEC is called with VFORK what ensures
      that the child finishes prior returing back to the parent.  No race.
      
      In fact, the patch makes it even worse because it does the thing it
      claims not do:
      
       - It calls ->complete() on UMH_WAIT_EXEC
      
       - the complete() callback may de-allocated subinfo as seen in the
         following call chain:
      
          [<c009f904>] (__link_path_walk+0x20/0xeb4) from [<c00a094c>] (path_walk+0x48/0x94)
          [<c00a094c>] (path_walk+0x48/0x94) from [<c00a0a34>] (do_path_lookup+0x24/0x4c)
          [<c00a0a34>] (do_path_lookup+0x24/0x4c) from [<c00a158c>] (do_filp_open+0xa4/0x83c)
          [<c00a158c>] (do_filp_open+0xa4/0x83c) from [<c009ba90>] (open_exec+0x24/0xe0)
          [<c009ba90>] (open_exec+0x24/0xe0) from [<c009bfa8>] (do_execve+0x7c/0x2e4)
          [<c009bfa8>] (do_execve+0x7c/0x2e4) from [<c0026a80>] (kernel_execve+0x34/0x80)
          [<c0026a80>] (kernel_execve+0x34/0x80) from [<c004b514>] (____call_usermodehelper+0x130/0x148)
          [<c004b514>] (____call_usermodehelper+0x130/0x148) from [<c0024858>] (kernel_thread_exit+0x0/0x8)
      
         and the path pointer was NULL.  Good that ARM's kernel_execve()
         doesn't check the pointer for NULL or else I wouldn't notice it.
      
      The only race there might be is with UMH_NO_WAIT but it is too late for
      me to investigate it now.  UMH_WAIT_PROC could probably also use VFORK
      and we could save one exec.  So the only race I see is with UMH_NO_WAIT
      and recent scheduler changes where the child does not always run first
      might have trigger here something but as I said, it is late....
      Signed-off-by: NSebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      95e0d86b
  7. 23 9月, 2009 1 次提交
    • N
      kmod: fix race in usermodehelper code · c02e3f36
      Neil Horman 提交于
      The user mode helper code has a race in it.  call_usermodehelper_exec()
      takes an allocated subprocess_info structure, which it passes to a
      workqueue, and then passes it to a kernel thread which it creates, after
      which it calls complete to signal to the caller of
      call_usermodehelper_exec() that it can free the subprocess_info struct.
      
      But since we use that structure in the created thread, we can't call
      complete from __call_usermodehelper(), which is where we create the kernel
      thread.  We need to call complete() from within the kernel thread and then
      not use subprocess_info afterward in the case of UMH_WAIT_EXEC.  Tested
      successfully by me.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c02e3f36
  8. 02 9月, 2009 1 次提交
    • D
      CRED: Add some configurable debugging [try #6] · e0e81739
      David Howells 提交于
      Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking
      for credential management.  The additional code keeps track of the number of
      pointers from task_structs to any given cred struct, and checks to see that
      this number never exceeds the usage count of the cred struct (which includes
      all references, not just those from task_structs).
      
      Furthermore, if SELinux is enabled, the code also checks that the security
      pointer in the cred struct is never seen to be invalid.
      
      This attempts to catch the bug whereby inode_has_perm() faults in an nfsd
      kernel thread on seeing cred->security be a NULL pointer (it appears that the
      credential struct has been previously released):
      
      	http://www.kerneloops.org/oops.php?number=252883Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      e0e81739
  9. 17 8月, 2009 1 次提交
    • L
      tracing/events: Add module tracepoints · 7ead8b83
      Li Zefan 提交于
      Add trace points to trace module_load, module_free, module_get,
      module_put and module_request, and use trace_event facility to
      get the trace output.
      
      Here's the sample output:
      
           TASK-PID    CPU#    TIMESTAMP  FUNCTION
              | |       |          |         |
          <...>-42    [000]     1.758380: module_request: fb0 wait=1 call_site=fb_open
          ...
          <...>-60    [000]     3.269403: module_load: scsi_wait_scan
          <...>-60    [000]     3.269432: module_put: scsi_wait_scan call_site=sys_init_module refcnt=0
          <...>-61    [001]     3.273168: module_free: scsi_wait_scan
          ...
          <...>-1021  [000]    13.836081: module_load: sunrpc
          <...>-1021  [000]    13.840589: module_put: sunrpc call_site=sys_init_module refcnt=-1
          <...>-1027  [000]    13.848098: module_get: sunrpc call_site=try_module_get refcnt=0
          <...>-1027  [000]    13.848308: module_get: sunrpc call_site=get_filesystem refcnt=1
          <...>-1027  [000]    13.848692: module_put: sunrpc call_site=put_filesystem refcnt=0
          ...
       modprobe-2587  [001]  1088.437213: module_load: trace_events_sample F
       modprobe-2587  [001]  1088.437786: module_put: trace_events_sample call_site=sys_init_module refcnt=0
      
      Note:
      
      - the taints flag can be 'F', 'C' and/or 'P' if mod->taints != 0
      
      - the module refcnt is percpu, so it can be negative in a
        specific cpu
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <4A891B3C.5030608@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7ead8b83
  10. 14 8月, 2009 1 次提交
  11. 09 7月, 2009 1 次提交
  12. 27 5月, 2009 1 次提交
  13. 31 3月, 2009 1 次提交
    • A
      module: create a request_module_nowait() · acae0515
      Arjan van de Ven 提交于
      There seems to be a common pattern in the kernel where drivers want to
      call request_module() from inside a module_init() function. Currently
      this would deadlock.
      
      As a result, several drivers go through hoops like scheduling things via
      kevent, or creating custom work queues (because kevent can deadlock on them).
      
      This patch changes this to use a request_module_nowait() function macro instead,
      which just fires the modprobe off but doesn't wait for it, and thus avoids the
      original deadlock entirely.
      
      On my laptop this already results in one less kernel thread running..
      
      (Includes Jiri's patch to use enum umh_wait)
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (bool-ified)
      Cc: Jiri Slaby <jirislaby@gmail.com>
      acae0515
  14. 30 3月, 2009 1 次提交
    • R
      cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL · 1a2142af
      Rusty Russell 提交于
      Impact: cleanup
      
      (Thanks to Al Viro for reminding me of this, via Ingo)
      
      CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:
      
      	#define CPU_MASK_ALL (cpumask_t) { { ... } }
      
      Taking the address of such a temporary is questionable at best,
      unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added
      CPU_MASK_ALL_PTR:
      
      	#define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)
      
      Which formalizes this practice.  One day gcc could bite us over this
      usage (though we seem to have gotten away with it so far).
      
      So replace everywhere which used &CPU_MASK_ALL or CPU_MASK_ALL_PTR
      with the modern "cpu_all_mask" (a real const struct cpumask *).
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Reported-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Mike Travis <travis@sgi.com>
      1a2142af
  15. 07 1月, 2009 1 次提交
  16. 14 11月, 2008 2 次提交
    • D
      CRED: Inaugurate COW credentials · d84f4f99
      David Howells 提交于
      Inaugurate copy-on-write credentials management.  This uses RCU to manage the
      credentials pointer in the task_struct with respect to accesses by other tasks.
      A process may only modify its own credentials, and so does not need locking to
      access or modify its own credentials.
      
      A mutex (cred_replace_mutex) is added to the task_struct to control the effect
      of PTRACE_ATTACHED on credential calculations, particularly with respect to
      execve().
      
      With this patch, the contents of an active credentials struct may not be
      changed directly; rather a new set of credentials must be prepared, modified
      and committed using something like the following sequence of events:
      
      	struct cred *new = prepare_creds();
      	int ret = blah(new);
      	if (ret < 0) {
      		abort_creds(new);
      		return ret;
      	}
      	return commit_creds(new);
      
      There are some exceptions to this rule: the keyrings pointed to by the active
      credentials may be instantiated - keyrings violate the COW rule as managing
      COW keyrings is tricky, given that it is possible for a task to directly alter
      the keys in a keyring in use by another task.
      
      To help enforce this, various pointers to sets of credentials, such as those in
      the task_struct, are declared const.  The purpose of this is compile-time
      discouragement of altering credentials through those pointers.  Once a set of
      credentials has been made public through one of these pointers, it may not be
      modified, except under special circumstances:
      
        (1) Its reference count may incremented and decremented.
      
        (2) The keyrings to which it points may be modified, but not replaced.
      
      The only safe way to modify anything else is to create a replacement and commit
      using the functions described in Documentation/credentials.txt (which will be
      added by a later patch).
      
      This patch and the preceding patches have been tested with the LTP SELinux
      testsuite.
      
      This patch makes several logical sets of alteration:
      
       (1) execve().
      
           This now prepares and commits credentials in various places in the
           security code rather than altering the current creds directly.
      
       (2) Temporary credential overrides.
      
           do_coredump() and sys_faccessat() now prepare their own credentials and
           temporarily override the ones currently on the acting thread, whilst
           preventing interference from other threads by holding cred_replace_mutex
           on the thread being dumped.
      
           This will be replaced in a future patch by something that hands down the
           credentials directly to the functions being called, rather than altering
           the task's objective credentials.
      
       (3) LSM interface.
      
           A number of functions have been changed, added or removed:
      
           (*) security_capset_check(), ->capset_check()
           (*) security_capset_set(), ->capset_set()
      
           	 Removed in favour of security_capset().
      
           (*) security_capset(), ->capset()
      
           	 New.  This is passed a pointer to the new creds, a pointer to the old
           	 creds and the proposed capability sets.  It should fill in the new
           	 creds or return an error.  All pointers, barring the pointer to the
           	 new creds, are now const.
      
           (*) security_bprm_apply_creds(), ->bprm_apply_creds()
      
           	 Changed; now returns a value, which will cause the process to be
           	 killed if it's an error.
      
           (*) security_task_alloc(), ->task_alloc_security()
      
           	 Removed in favour of security_prepare_creds().
      
           (*) security_cred_free(), ->cred_free()
      
           	 New.  Free security data attached to cred->security.
      
           (*) security_prepare_creds(), ->cred_prepare()
      
           	 New. Duplicate any security data attached to cred->security.
      
           (*) security_commit_creds(), ->cred_commit()
      
           	 New. Apply any security effects for the upcoming installation of new
           	 security by commit_creds().
      
           (*) security_task_post_setuid(), ->task_post_setuid()
      
           	 Removed in favour of security_task_fix_setuid().
      
           (*) security_task_fix_setuid(), ->task_fix_setuid()
      
           	 Fix up the proposed new credentials for setuid().  This is used by
           	 cap_set_fix_setuid() to implicitly adjust capabilities in line with
           	 setuid() changes.  Changes are made to the new credentials, rather
           	 than the task itself as in security_task_post_setuid().
      
           (*) security_task_reparent_to_init(), ->task_reparent_to_init()
      
           	 Removed.  Instead the task being reparented to init is referred
           	 directly to init's credentials.
      
      	 NOTE!  This results in the loss of some state: SELinux's osid no
      	 longer records the sid of the thread that forked it.
      
           (*) security_key_alloc(), ->key_alloc()
           (*) security_key_permission(), ->key_permission()
      
           	 Changed.  These now take cred pointers rather than task pointers to
           	 refer to the security context.
      
       (4) sys_capset().
      
           This has been simplified and uses less locking.  The LSM functions it
           calls have been merged.
      
       (5) reparent_to_kthreadd().
      
           This gives the current thread the same credentials as init by simply using
           commit_thread() to point that way.
      
       (6) __sigqueue_alloc() and switch_uid()
      
           __sigqueue_alloc() can't stop the target task from changing its creds
           beneath it, so this function gets a reference to the currently applicable
           user_struct which it then passes into the sigqueue struct it returns if
           successful.
      
           switch_uid() is now called from commit_creds(), and possibly should be
           folded into that.  commit_creds() should take care of protecting
           __sigqueue_alloc().
      
       (7) [sg]et[ug]id() and co and [sg]et_current_groups.
      
           The set functions now all use prepare_creds(), commit_creds() and
           abort_creds() to build and check a new set of credentials before applying
           it.
      
           security_task_set[ug]id() is called inside the prepared section.  This
           guarantees that nothing else will affect the creds until we've finished.
      
           The calling of set_dumpable() has been moved into commit_creds().
      
           Much of the functionality of set_user() has been moved into
           commit_creds().
      
           The get functions all simply access the data directly.
      
       (8) security_task_prctl() and cap_task_prctl().
      
           security_task_prctl() has been modified to return -ENOSYS if it doesn't
           want to handle a function, or otherwise return the return value directly
           rather than through an argument.
      
           Additionally, cap_task_prctl() now prepares a new set of credentials, even
           if it doesn't end up using it.
      
       (9) Keyrings.
      
           A number of changes have been made to the keyrings code:
      
           (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
           	 all been dropped and built in to the credentials functions directly.
           	 They may want separating out again later.
      
           (b) key_alloc() and search_process_keyrings() now take a cred pointer
           	 rather than a task pointer to specify the security context.
      
           (c) copy_creds() gives a new thread within the same thread group a new
           	 thread keyring if its parent had one, otherwise it discards the thread
           	 keyring.
      
           (d) The authorisation key now points directly to the credentials to extend
           	 the search into rather pointing to the task that carries them.
      
           (e) Installing thread, process or session keyrings causes a new set of
           	 credentials to be created, even though it's not strictly necessary for
           	 process or session keyrings (they're shared).
      
      (10) Usermode helper.
      
           The usermode helper code now carries a cred struct pointer in its
           subprocess_info struct instead of a new session keyring pointer.  This set
           of credentials is derived from init_cred and installed on the new process
           after it has been cloned.
      
           call_usermodehelper_setup() allocates the new credentials and
           call_usermodehelper_freeinfo() discards them if they haven't been used.  A
           special cred function (prepare_usermodeinfo_creds()) is provided
           specifically for call_usermodehelper_setup() to call.
      
           call_usermodehelper_setkeys() adjusts the credentials to sport the
           supplied keyring as the new session keyring.
      
      (11) SELinux.
      
           SELinux has a number of changes, in addition to those to support the LSM
           interface changes mentioned above:
      
           (a) selinux_setprocattr() no longer does its check for whether the
           	 current ptracer can access processes with the new SID inside the lock
           	 that covers getting the ptracer's SID.  Whilst this lock ensures that
           	 the check is done with the ptracer pinned, the result is only valid
           	 until the lock is released, so there's no point doing it inside the
           	 lock.
      
      (12) is_single_threaded().
      
           This function has been extracted from selinux_setprocattr() and put into
           a file of its own in the lib/ directory as join_session_keyring() now
           wants to use it too.
      
           The code in SELinux just checked to see whether a task shared mm_structs
           with other tasks (CLONE_VM), but that isn't good enough.  We really want
           to know if they're part of the same thread group (CLONE_THREAD).
      
      (13) nfsd.
      
           The NFS server daemon now has to use the COW credentials to set the
           credentials it is going to use.  It really needs to pass the credentials
           down to the functions it calls, but it can't do that until other patches
           in this series have been applied.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NJames Morris <jmorris@namei.org>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      d84f4f99
    • D
      KEYS: Alter use of key instantiation link-to-keyring argument · 8bbf4976
      David Howells 提交于
      Alter the use of the key instantiation and negation functions' link-to-keyring
      arguments.  Currently this specifies a keyring in the target process to link
      the key into, creating the keyring if it doesn't exist.  This, however, can be
      a problem for copy-on-write credentials as it means that the instantiating
      process can alter the credentials of the requesting process.
      
      This patch alters the behaviour such that:
      
       (1) If keyctl_instantiate_key() or keyctl_negate_key() are given a specific
           keyring by ID (ringid >= 0), then that keyring will be used.
      
       (2) If keyctl_instantiate_key() or keyctl_negate_key() are given one of the
           special constants that refer to the requesting process's keyrings
           (KEY_SPEC_*_KEYRING, all <= 0), then:
      
           (a) If sys_request_key() was given a keyring to use (destringid) then the
           	 key will be attached to that keyring.
      
           (b) If sys_request_key() was given a NULL keyring, then the key being
           	 instantiated will be attached to the default keyring as set by
           	 keyctl_set_reqkey_keyring().
      
       (3) No extra link will be made.
      
      Decision point (1) follows current behaviour, and allows those instantiators
      who've searched for a specifically named keyring in the requestor's keyring so
      as to partition the keys by type to still have their named keyrings.
      
      Decision point (2) allows the requestor to make sure that the key or keys that
      get produced by request_key() go where they want, whilst allowing the
      instantiator to request that the key is retained.  This is mainly useful for
      situations where the instantiator makes a secondary request, the key for which
      should be retained by the initial requestor:
      
      	+-----------+        +--------------+        +--------------+
      	|           |        |              |        |              |
      	| Requestor |------->| Instantiator |------->| Instantiator |
      	|           |        |              |        |              |
      	+-----------+        +--------------+        +--------------+
      	           request_key()           request_key()
      
      This might be useful, for example, in Kerberos, where the requestor requests a
      ticket, and then the ticket instantiator requests the TGT, which someone else
      then has to go and fetch.  The TGT, however, should be retained in the
      keyrings of the requestor, not the first instantiator.  To make this explict
      an extra special keyring constant is also added.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NJames Morris <jmorris@namei.org>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      8bbf4976
  17. 17 10月, 2008 1 次提交
  18. 16 10月, 2008 1 次提交
  19. 26 7月, 2008 1 次提交
  20. 25 7月, 2008 1 次提交
    • U
      flag parameters: NONBLOCK in pipe · be61a86d
      Ulrich Drepper 提交于
      This patch adds O_NONBLOCK support to pipe2.  It is minimally more involved
      than the patches for eventfd et.al but still trivial.  The interfaces of the
      create_write_pipe and create_read_pipe helper functions were changed and the
      one other caller as well.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fds[2];
        if (syscall (__NR_pipe2, fds, 0) == -1)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (fl & O_NONBLOCK)
              {
                printf ("pipe2(0) set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        if (syscall (__NR_pipe2, fds, O_NONBLOCK) == -1)
          {
            puts ("pipe2(O_NONBLOCK) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((fl & O_NONBLOCK) == 0)
              {
                printf ("pipe2(O_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      be61a86d
  21. 22 7月, 2008 1 次提交
  22. 02 5月, 2008 1 次提交
  23. 20 4月, 2008 1 次提交
    • M
      generic: use new set_cpus_allowed_ptr function · f70316da
      Mike Travis 提交于
        * Use new set_cpus_allowed_ptr() function added by previous patch,
          which instead of passing the "newly allowed cpus" cpumask_t arg
          by value,  pass it by pointer:
      
          -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
          +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
      
        * Modify CPU_MASK_ALL
      
      Depends on:
      	[sched-devel]: sched: add new set_cpus_allowed_ptr function
      Signed-off-by: NMike Travis <travis@sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f70316da
  24. 15 2月, 2008 1 次提交
  25. 18 1月, 2008 1 次提交
  26. 12 9月, 2007 1 次提交
    • M
      Restore call_usermodehelper_pipe() behaviour · 3210f0ec
      Michael Ellerman 提交于
      The semantics of call_usermodehelper_pipe() used to be that it would fork
      the helper, and wait for the kernel thread to be started.  This was
      implemented by setting sub_info.wait to 0 (implicitly), and doing a
      wait_for_completion().
      
      As part of the cleanup done in 0ab4dc92,
      call_usermodehelper_pipe() was changed to pass 1 as the value for wait to
      call_usermodehelper_exec().
      
      This is equivalent to setting sub_info.wait to 1, which is a change from
      the previous behaviour.  Using 1 instead of 0 causes
      __call_usermodehelper() to start the kernel thread running
      wait_for_helper(), rather than directly calling ____call_usermodehelper().
      
      The end result is that the calling kernel code blocks until the user mode
      helper finishes.  As the helper is expecting input on stdin, and now no one
      is writing anything, everything locks up (observed in do_coredump).
      
      The fix is to change the 1 to UMH_WAIT_EXEC (aka 0), indicating that we
      want to wait for the kernel thread to be started, but not for the helper to
      finish.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Acked-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3210f0ec
  27. 27 7月, 2007 1 次提交
  28. 20 7月, 2007 2 次提交
  29. 18 7月, 2007 2 次提交
    • J
      usermodehelper: Tidy up waiting · 86313c48
      Jeremy Fitzhardinge 提交于
      Rather than using a tri-state integer for the wait flag in
      call_usermodehelper_exec, define a proper enum, and use that.  I've
      preserved the integer values so that any callers I've missed should
      still work OK.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: David Howells <dhowells@redhat.com>
      86313c48
    • J
      usermodehelper: split setup from execution · 0ab4dc92
      Jeremy Fitzhardinge 提交于
      Rather than having hundreds of variations of call_usermodehelper for
      various pieces of usermode state which could be set up, split the
      info allocation and initialization from the actual process execution.
      
      This means the general pattern becomes:
       info = call_usermodehelper_setup(path, argv, envp); /* basic state */
       call_usermodehelper_<SET EXTRA STATE>(info, stuff...);	/* extra state */
       call_usermodehelper_exec(info, wait);	/* run process and free info */
      
      This patch introduces wrappers for all the existing calling styles for
      call_usermodehelper_*, but folds their implementations into one.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Bj?rn Steinbrink <B.Steinbrink@gmx.de>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      0ab4dc92
  30. 10 5月, 2007 1 次提交