1. 14 11月, 2008 5 次提交
  2. 11 11月, 2008 2 次提交
  3. 09 11月, 2008 1 次提交
  4. 05 11月, 2008 1 次提交
  5. 01 11月, 2008 1 次提交
  6. 30 10月, 2008 1 次提交
  7. 14 10月, 2008 3 次提交
  8. 10 10月, 2008 6 次提交
  9. 29 9月, 2008 1 次提交
    • S
      selinux: use default proc sid on symlinks · ea6b184f
      Stephen Smalley 提交于
      As we are not concerned with fine-grained control over reading of
      symlinks in proc, always use the default proc SID for all proc symlinks.
      This should help avoid permission issues upon changes to the proc tree
      as in the /proc/net -> /proc/self/net example.
      This does not alter labeling of symlinks within /proc/pid directories.
      ls -Zd /proc/net output before and after the patch should show the difference.
      Signed-off-by: NStephen D. Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      ea6b184f
  10. 14 9月, 2008 1 次提交
    • F
      timers: fix itimer/many thread hang · f06febc9
      Frank Mayhar 提交于
      Overview
      
      This patch reworks the handling of POSIX CPU timers, including the
      ITIMER_PROF, ITIMER_VIRT timers and rlimit handling.  It was put together
      with the help of Roland McGrath, the owner and original writer of this code.
      
      The problem we ran into, and the reason for this rework, has to do with using
      a profiling timer in a process with a large number of threads.  It appears
      that the performance of the old implementation of run_posix_cpu_timers() was
      at least O(n*3) (where "n" is the number of threads in a process) or worse.
      Everything is fine with an increasing number of threads until the time taken
      for that routine to run becomes the same as or greater than the tick time, at
      which point things degrade rather quickly.
      
      This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF."
      
      Code Changes
      
      This rework corrects the implementation of run_posix_cpu_timers() to make it
      run in constant time for a particular machine.  (Performance may vary between
      one machine and another depending upon whether the kernel is built as single-
      or multiprocessor and, in the latter case, depending upon the number of
      running processors.)  To do this, at each tick we now update fields in
      signal_struct as well as task_struct.  The run_posix_cpu_timers() function
      uses those fields to make its decisions.
      
      We define a new structure, "task_cputime," to contain user, system and
      scheduler times and use these in appropriate places:
      
      struct task_cputime {
      	cputime_t utime;
      	cputime_t stime;
      	unsigned long long sum_exec_runtime;
      };
      
      This is included in the structure "thread_group_cputime," which is a new
      substructure of signal_struct and which varies for uniprocessor versus
      multiprocessor kernels.  For uniprocessor kernels, it uses "task_cputime" as
      a simple substructure, while for multiprocessor kernels it is a pointer:
      
      struct thread_group_cputime {
      	struct task_cputime totals;
      };
      
      struct thread_group_cputime {
      	struct task_cputime *totals;
      };
      
      We also add a new task_cputime substructure directly to signal_struct, to
      cache the earliest expiration of process-wide timers, and task_cputime also
      replaces the it_*_expires fields of task_struct (used for earliest expiration
      of thread timers).  The "thread_group_cputime" structure contains process-wide
      timers that are updated via account_user_time() and friends.  In the non-SMP
      case the structure is a simple aggregator; unfortunately in the SMP case that
      simplicity was not achievable due to cache-line contention between CPUs (in
      one measured case performance was actually _worse_ on a 16-cpu system than
      the same test on a 4-cpu system, due to this contention).  For SMP, the
      thread_group_cputime counters are maintained as a per-cpu structure allocated
      using alloc_percpu().  The timer functions update only the timer field in
      the structure corresponding to the running CPU, obtained using per_cpu_ptr().
      
      We define a set of inline functions in sched.h that we use to maintain the
      thread_group_cputime structure and hide the differences between UP and SMP
      implementations from the rest of the kernel.  The thread_group_cputime_init()
      function initializes the thread_group_cputime structure for the given task.
      The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the
      out-of-line function thread_group_cputime_alloc_smp() to allocate and fill
      in the per-cpu structures and fields.  The thread_group_cputime_free()
      function, also a no-op for UP, in SMP frees the per-cpu structures.  The
      thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls
      thread_group_cputime_alloc() if the per-cpu structures haven't yet been
      allocated.  The thread_group_cputime() function fills the task_cputime
      structure it is passed with the contents of the thread_group_cputime fields;
      in UP it's that simple but in SMP it must also safely check that tsk->signal
      is non-NULL (if it is it just uses the appropriate fields of task_struct) and,
      if so, sums the per-cpu values for each online CPU.  Finally, the three
      functions account_group_user_time(), account_group_system_time() and
      account_group_exec_runtime() are used by timer functions to update the
      respective fields of the thread_group_cputime structure.
      
      Non-SMP operation is trivial and will not be mentioned further.
      
      The per-cpu structure is always allocated when a task creates its first new
      thread, via a call to thread_group_cputime_clone_thread() from copy_signal().
      It is freed at process exit via a call to thread_group_cputime_free() from
      cleanup_signal().
      
      All functions that formerly summed utime/stime/sum_sched_runtime values from
      from all threads in the thread group now use thread_group_cputime() to
      snapshot the values in the thread_group_cputime structure or the values in
      the task structure itself if the per-cpu structure hasn't been allocated.
      
      Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit.
      The run_posix_cpu_timers() function has been split into a fast path and a
      slow path; the former safely checks whether there are any expired thread
      timers and, if not, just returns, while the slow path does the heavy lifting.
      With the dedicated thread group fields, timers are no longer "rebalanced" and
      the process_timer_rebalance() function and related code has gone away.  All
      summing loops are gone and all code that used them now uses the
      thread_group_cputime() inline.  When process-wide timers are set, the new
      task_cputime structure in signal_struct is used to cache the earliest
      expiration; this is checked in the fast path.
      
      Performance
      
      The fix appears not to add significant overhead to existing operations.  It
      generally performs the same as the current code except in two cases, one in
      which it performs slightly worse (Case 5 below) and one in which it performs
      very significantly better (Case 2 below).  Overall it's a wash except in those
      two cases.
      
      I've since done somewhat more involved testing on a dual-core Opteron system.
      
      Case 1: With no itimer running, for a test with 100,000 threads, the fixed
      	kernel took 1428.5 seconds, 513 seconds more than the unfixed system,
      	all of which was spent in the system.  There were twice as many
      	voluntary context switches with the fix as without it.
      
      Case 2: With an itimer running at .01 second ticks and 4000 threads (the most
      	an unmodified kernel can handle), the fixed kernel ran the test in
      	eight percent of the time (5.8 seconds as opposed to 70 seconds) and
      	had better tick accuracy (.012 seconds per tick as opposed to .023
      	seconds per tick).
      
      Case 3: A 4000-thread test with an initial timer tick of .01 second and an
      	interval of 10,000 seconds (i.e. a timer that ticks only once) had
      	very nearly the same performance in both cases:  6.3 seconds elapsed
      	for the fixed kernel versus 5.5 seconds for the unfixed kernel.
      
      With fewer threads (eight in these tests), the Case 1 test ran in essentially
      the same time on both the modified and unmodified kernels (5.2 seconds versus
      5.8 seconds).  The Case 2 test ran in about the same time as well, 5.9 seconds
      versus 5.4 seconds but again with much better tick accuracy, .013 seconds per
      tick versus .025 seconds per tick for the unmodified kernel.
      
      Since the fix affected the rlimit code, I also tested soft and hard CPU limits.
      
      Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer
      	running), the modified kernel was very slightly favored in that while
      	it killed the process in 19.997 seconds of CPU time (5.002 seconds of
      	wall time), only .003 seconds of that was system time, the rest was
      	user time.  The unmodified kernel killed the process in 20.001 seconds
      	of CPU (5.014 seconds of wall time) of which .016 seconds was system
      	time.  Really, though, the results were too close to call.  The results
      	were essentially the same with no itimer running.
      
      Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds
      	(where the hard limit would never be reached) and an itimer running,
      	the modified kernel exhibited worse tick accuracy than the unmodified
      	kernel: .050 seconds/tick versus .028 seconds/tick.  Otherwise,
      	performance was almost indistinguishable.  With no itimer running this
      	test exhibited virtually identical behavior and times in both cases.
      
      In times past I did some limited performance testing.  those results are below.
      
      On a four-cpu Opteron system without this fix, a sixteen-thread test executed
      in 3569.991 seconds, of which user was 3568.435s and system was 1.556s.  On
      the same system with the fix, user and elapsed time were about the same, but
      system time dropped to 0.007 seconds.  Performance with eight, four and one
      thread were comparable.  Interestingly, the timer ticks with the fix seemed
      more accurate:  The sixteen-thread test with the fix received 149543 ticks
      for 0.024 seconds per tick, while the same test without the fix received 58720
      for 0.061 seconds per tick.  Both cases were configured for an interval of
      0.01 seconds.  Again, the other tests were comparable.  Each thread in this
      test computed the primes up to 25,000,000.
      
      I also did a test with a large number of threads, 100,000 threads, which is
      impossible without the fix.  In this case each thread computed the primes only
      up to 10,000 (to make the runtime manageable).  System time dominated, at
      1546.968 seconds out of a total 2176.906 seconds (giving a user time of
      629.938s).  It received 147651 ticks for 0.015 seconds per tick, still quite
      accurate.  There is obviously no comparable test without the fix.
      Signed-off-by: NFrank Mayhar <fmayhar@google.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f06febc9
  11. 28 8月, 2008 1 次提交
    • K
      SELinux: add boundary support and thread context assignment · d9250dea
      KaiGai Kohei 提交于
      The purpose of this patch is to assign per-thread security context
      under a constraint. It enables multi-threaded server application
      to kick a request handler with its fair security context, and
      helps some of userspace object managers to handle user's request.
      
      When we assign a per-thread security context, it must not have wider
      permissions than the original one. Because a multi-threaded process
      shares a single local memory, an arbitary per-thread security context
      also means another thread can easily refer violated information.
      
      The constraint on a per-thread security context requires a new domain
      has to be equal or weaker than its original one, when it tries to assign
      a per-thread security context.
      
      Bounds relationship between two types is a way to ensure a domain can
      never have wider permission than its bounds. We can define it in two
      explicit or implicit ways.
      
      The first way is using new TYPEBOUNDS statement. It enables to define
      a boundary of types explicitly. The other one expand the concept of
      existing named based hierarchy. If we defines a type with "." separated
      name like "httpd_t.php", toolchain implicitly set its bounds on "httpd_t".
      
      This feature requires a new policy version.
      The 24th version (POLICYDB_VERSION_BOUNDARY) enables to ship them into
      kernel space, and the following patch enables to handle it.
      Signed-off-by: NKaiGai Kohei <kaigai@ak.jp.nec.com>
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      d9250dea
  12. 14 8月, 2008 1 次提交
    • D
      security: Fix setting of PF_SUPERPRIV by __capable() · 5cd9c58f
      David Howells 提交于
      Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags
      the target process if that is not the current process and it is trying to
      change its own flags in a different way at the same time.
      
      __capable() is using neither atomic ops nor locking to protect t->flags.  This
      patch removes __capable() and introduces has_capability() that doesn't set
      PF_SUPERPRIV on the process being queried.
      
      This patch further splits security_ptrace() in two:
      
       (1) security_ptrace_may_access().  This passes judgement on whether one
           process may access another only (PTRACE_MODE_ATTACH for ptrace() and
           PTRACE_MODE_READ for /proc), and takes a pointer to the child process.
           current is the parent.
      
       (2) security_ptrace_traceme().  This passes judgement on PTRACE_TRACEME only,
           and takes only a pointer to the parent process.  current is the child.
      
           In Smack and commoncap, this uses has_capability() to determine whether
           the parent will be permitted to use PTRACE_ATTACH if normal checks fail.
           This does not set PF_SUPERPRIV.
      
      Two of the instances of __capable() actually only act on current, and so have
      been changed to calls to capable().
      
      Of the places that were using __capable():
      
       (1) The OOM killer calls __capable() thrice when weighing the killability of a
           process.  All of these now use has_capability().
      
       (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see
           whether the parent was allowed to trace any process.  As mentioned above,
           these have been split.  For PTRACE_ATTACH and /proc, capable() is now
           used, and for PTRACE_TRACEME, has_capability() is used.
      
       (3) cap_safe_nice() only ever saw current, so now uses capable().
      
       (4) smack_setprocattr() rejected accesses to tasks other than current just
           after calling __capable(), so the order of these two tests have been
           switched and capable() is used instead.
      
       (5) In smack_file_send_sigiotask(), we need to allow privileged processes to
           receive SIGIO on files they're manipulating.
      
       (6) In smack_task_wait(), we let a process wait for a privileged process,
           whether or not the process doing the waiting is privileged.
      
      I've tested this with the LTP SELinux and syscalls testscripts.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Acked-by: NCasey Schaufler <casey@schaufler-ca.com>
      Acked-by: NAndrew G. Morgan <morgan@kernel.org>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      5cd9c58f
  13. 05 8月, 2008 2 次提交
  14. 30 7月, 2008 1 次提交
    • E
      SELinux: /proc/mounts should show what it can · 383795c2
      Eric Paris 提交于
      Given a hosed SELinux config in which a system never loads policy or
      disables SELinux we currently just return -EINVAL for anyone trying to
      read /proc/mounts.  This is a configuration problem but we can certainly
      be more graceful.  This patch just ignores -EINVAL when displaying LSM
      options and causes /proc/mounts display everything else it can.  If
      policy isn't loaded the obviously there are no options, so we aren't
      really loosing any information here.
      
      This is safe as the only other return of EINVAL comes from
      security_sid_to_context_core() in the case of an invalid sid.  Even if a
      FS was mounted with a now invalidated context that sid should have been
      remapped to unlabeled and so we won't hit the EINVAL and will work like
      we should.  (yes, I tested to make sure it worked like I thought)
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Tested-by: NMarc Dionne <marc.c.dionne@gmail.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      383795c2
  15. 27 7月, 2008 3 次提交
  16. 15 7月, 2008 1 次提交
  17. 14 7月, 2008 9 次提交
    • J
      security: remove register_security hook · 6f0f0fd4
      James Morris 提交于
      The register security hook is no longer required, as the capability
      module is always registered.  LSMs wishing to stack capability as
      a secondary module should do so explicitly.
      Signed-off-by: NJames Morris <jmorris@namei.org>
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
      6f0f0fd4
    • M
      security: remove unused sb_get_mnt_opts hook · b478a9f9
      Miklos Szeredi 提交于
      The sb_get_mnt_opts() hook is unused, and is superseded by the
      sb_show_options() hook.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Acked-by: NJames Morris <jmorris@namei.org>
      b478a9f9
    • E
      LSM/SELinux: show LSM mount options in /proc/mounts · 2069f457
      Eric Paris 提交于
      This patch causes SELinux mount options to show up in /proc/mounts.  As
      with other code in the area seq_put errors are ignored.  Other LSM's
      will not have their mount options displayed until they fill in their own
      security_sb_show_options() function.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      2069f457
    • E
      SELinux: allow fstype unknown to policy to use xattrs if present · 811f3799
      Eric Paris 提交于
      Currently if a FS is mounted for which SELinux policy does not define an
      fs_use_* that FS will either be genfs labeled or not labeled at all.
      This decision is based on the existence of a genfscon rule in policy and
      is irrespective of the capabilities of the filesystem itself.  This
      patch allows the kernel to check if the filesystem supports security
      xattrs and if so will use those if there is no fs_use_* rule in policy.
      An fstype with a no fs_use_* rule but with a genfs rule will use xattrs
      if available and will follow the genfs rule.
      
      This can be particularly interesting for things like ecryptfs which
      actually overlays a real underlying FS.  If we define excryptfs in
      policy to use xattrs we will likely get this wrong at times, so with
      this path we just don't need to define it!
      
      Overlay ecryptfs on top of NFS with no xattr support:
      SELinux: initialized (dev ecryptfs, type ecryptfs), uses genfs_contexts
      Overlay ecryptfs on top of ext4 with xattr support:
      SELinux: initialized (dev ecryptfs, type ecryptfs), uses xattr
      
      It is also useful as the kernel adds new FS we don't need to add them in
      policy if they support xattrs and that is how we want to handle them.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      811f3799
    • J
      SELinux: use do_each_thread as a proper do/while block · 2baf06df
      James Morris 提交于
      Use do_each_thread as a proper do/while block.  Sparse complained.
      Signed-off-by: NJames Morris <jmorris@namei.org>
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      2baf06df
    • J
      SELinux: remove unused and shadowed addrlen variable · e399f982
      James Morris 提交于
      Remove unused and shadowed addrlen variable.  Picked up by sparse.
      Signed-off-by: NJames Morris <jmorris@namei.org>
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: NPaul Moore <paul.moore@hp.com>
      e399f982
    • S
      selinux: simplify ioctl checking · 242631c4
      Stephen Smalley 提交于
      Simplify and improve the robustness of the SELinux ioctl checking by
      using the "access mode" bits of the ioctl command to determine the
      permission check rather than dealing with individual command values.
      This removes any knowledge of specific ioctl commands from SELinux
      and follows the same guidance we gave to Smack earlier.
      Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      242631c4
    • S
      SELinux: enable processes with mac_admin to get the raw inode contexts · abc69bb6
      Stephen Smalley 提交于
      Enable processes with CAP_MAC_ADMIN + mac_admin permission in policy
      to get undefined contexts on inodes.  This extends the support for
      deferred mapping of security contexts in order to permit restorecon
      and similar programs to see the raw file contexts unknown to the
      system policy in order to check them.
      Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      abc69bb6
    • S
      Security: split proc ptrace checking into read vs. attach · 006ebb40
      Stephen Smalley 提交于
      Enable security modules to distinguish reading of process state via
      proc from full ptrace access by renaming ptrace_may_attach to
      ptrace_may_access and adding a mode argument indicating whether only
      read access or full attach access is requested.  This allows security
      modules to permit access to reading process state without granting
      full ptrace access.  The base DAC/capability checking remains unchanged.
      
      Read access to /proc/pid/mem continues to apply a full ptrace attach
      check since check_mem_permission() already requires the current task
      to already be ptracing the target.  The other ptrace checks within
      proc for elements like environ, maps, and fds are changed to pass the
      read mode instead of attach.
      
      In the SELinux case, we model such reading of process state as a
      reading of a proc file labeled with the target process' label.  This
      enables SELinux policy to permit such reading of process state without
      permitting control or manipulation of the target process, as there are
      a number of cases where programs probe for such information via proc
      but do not need to be able to control the target (e.g. procps,
      lsof, PolicyKit, ConsoleKit).  At present we have to choose between
      allowing full ptrace in policy (more permissive than required/desired)
      or breaking functionality (or in some cases just silencing the denials
      via dontaudit rules but this can hide genuine attacks).
      
      This version of the patch incorporates comments from Casey Schaufler
      (change/replace existing ptrace_may_attach interface, pass access
      mode), and Chris Wright (provide greater consistency in the checking).
      
      Note that like their predecessors __ptrace_may_attach and
      ptrace_may_attach, the __ptrace_may_access and ptrace_may_access
      interfaces use different return value conventions from each other (0
      or -errno vs. 1 or 0).  I retained this difference to avoid any
      changes to the caller logic but made the difference clearer by
      changing the latter interface to return a bool rather than an int and
      by adding a comment about it to ptrace.h for any future callers.
      Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: NChris Wright <chrisw@sous-sol.org>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      006ebb40