- 14 11月, 2008 9 次提交
-
-
由 David Howells 提交于
Make execve() take advantage of copy-on-write credentials, allowing it to set up the credentials in advance, and then commit the whole lot after the point of no return. This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). The credential bits from struct linux_binprm are, for the most part, replaced with a single credentials pointer (bprm->cred). This means that all the creds can be calculated in advance and then applied at the point of no return with no possibility of failure. I would like to replace bprm->cap_effective with: cap_isclear(bprm->cap_effective) but this seems impossible due to special behaviour for processes of pid 1 (they always retain their parent's capability masks where normally they'd be changed - see cap_bprm_set_creds()). The following sequence of events now happens: (a) At the start of do_execve, the current task's cred_exec_mutex is locked to prevent PTRACE_ATTACH from obsoleting the calculation of creds that we make. (a) prepare_exec_creds() is then called to make a copy of the current task's credentials and prepare it. This copy is then assigned to bprm->cred. This renders security_bprm_alloc() and security_bprm_free() unnecessary, and so they've been removed. (b) The determination of unsafe execution is now performed immediately after (a) rather than later on in the code. The result is stored in bprm->unsafe for future reference. (c) prepare_binprm() is called, possibly multiple times. (i) This applies the result of set[ug]id binaries to the new creds attached to bprm->cred. Personality bit clearance is recorded, but now deferred on the basis that the exec procedure may yet fail. (ii) This then calls the new security_bprm_set_creds(). This should calculate the new LSM and capability credentials into *bprm->cred. This folds together security_bprm_set() and parts of security_bprm_apply_creds() (these two have been removed). Anything that might fail must be done at this point. (iii) bprm->cred_prepared is set to 1. bprm->cred_prepared is 0 on the first pass of the security calculations, and 1 on all subsequent passes. This allows SELinux in (ii) to base its calculations only on the initial script and not on the interpreter. (d) flush_old_exec() is called to commit the task to execution. This performs the following steps with regard to credentials: (i) Clear pdeath_signal and set dumpable on certain circumstances that may not be covered by commit_creds(). (ii) Clear any bits in current->personality that were deferred from (c.i). (e) install_exec_creds() [compute_creds() as was] is called to install the new credentials. This performs the following steps with regard to credentials: (i) Calls security_bprm_committing_creds() to apply any security requirements, such as flushing unauthorised files in SELinux, that must be done before the credentials are changed. This is made up of bits of security_bprm_apply_creds() and security_bprm_post_apply_creds(), both of which have been removed. This function is not allowed to fail; anything that might fail must have been done in (c.ii). (ii) Calls commit_creds() to apply the new credentials in a single assignment (more or less). Possibly pdeath_signal and dumpable should be part of struct creds. (iii) Unlocks the task's cred_replace_mutex, thus allowing PTRACE_ATTACH to take place. (iv) Clears The bprm->cred pointer as the credentials it was holding are now immutable. (v) Calls security_bprm_committed_creds() to apply any security alterations that must be done after the creds have been changed. SELinux uses this to flush signals and signal handlers. (f) If an error occurs before (d.i), bprm_free() will call abort_creds() to destroy the proposed new credentials and will then unlock cred_replace_mutex. No changes to the credentials will have been made. (2) LSM interface. A number of functions have been changed, added or removed: (*) security_bprm_alloc(), ->bprm_alloc_security() (*) security_bprm_free(), ->bprm_free_security() Removed in favour of preparing new credentials and modifying those. (*) security_bprm_apply_creds(), ->bprm_apply_creds() (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds() Removed; split between security_bprm_set_creds(), security_bprm_committing_creds() and security_bprm_committed_creds(). (*) security_bprm_set(), ->bprm_set_security() Removed; folded into security_bprm_set_creds(). (*) security_bprm_set_creds(), ->bprm_set_creds() New. The new credentials in bprm->creds should be checked and set up as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the second and subsequent calls. (*) security_bprm_committing_creds(), ->bprm_committing_creds() (*) security_bprm_committed_creds(), ->bprm_committed_creds() New. Apply the security effects of the new credentials. This includes closing unauthorised files in SELinux. This function may not fail. When the former is called, the creds haven't yet been applied to the process; when the latter is called, they have. The former may access bprm->cred, the latter may not. (3) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) The bprm_security_struct struct has been removed in favour of using the credentials-under-construction approach. (c) flush_unauthorized_files() now takes a cred pointer and passes it on to inode_has_perm(), file_has_perm() and dentry_open(). Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred *new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: (*) security_capset_check(), ->capset_check() (*) security_capset_set(), ->capset_set() Removed in favour of security_capset(). (*) security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. (*) security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. (*) security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). (*) security_cred_free(), ->cred_free() New. Free security data attached to cred->security. (*) security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. (*) security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). (*) security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). (*) security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). (*) security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. (*) security_key_alloc(), ->key_alloc() (*) security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Pass credentials through dentry_open() so that the COW creds patch can have SELinux's flush_unauthorized_files() pass the appropriate creds back to itself when it opens its null chardev. The security_dentry_open() call also now takes a creds pointer, as does the dentry_open hook in struct security_operations. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Make inode_has_perm() and file_has_perm() take a cred pointer rather than a task pointer. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Wrap access to SELinux's task SID, using task_sid() and current_sid() as appropriate. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Detach the credentials from task_struct, duplicating them in copy_process() and releasing them in __put_task_struct(). Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Constify the kernel_cap_t arguments to the capset LSM hooks. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Acked-by: NJames Morris <jmorris@namei.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Take away the ability for sys_capset() to affect processes other than current. This means that current will not need to lock its own credentials when reading them against interference by other processes. This has effectively been the case for a while anyway, since: (1) Without LSM enabled, sys_capset() is disallowed. (2) With file-based capabilities, sys_capset() is neutered. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Acked-by: NAndrew G. Morgan <morgan@kernel.org> Acked-by: NJames Morris <jmorris@namei.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 11 11月, 2008 2 次提交
-
-
由 Eric Paris 提交于
check when determining if a process has additional powers to override memory limits or when trying to read/write illegal file labels. Use the new noaudit call instead. Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NStephen Smalley <sds@tycho.nsa.gov> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Eric Paris 提交于
make an A or B type decision instead of a security decision. Currently this is the case at least for filesystems when deciding if a process can use the reserved 'root' blocks and for the case of things like the oom algorithm determining if processes are root processes and should be less likely to be killed. These types of security system requests should not be audited or logged since they are not really security decisions. It would be possible to solve this problem like the vm_enough_memory security check did by creating a new LSM interface and moving all of the policy into that interface but proves the needlessly bloat the LSM and provide complex indirection. This merely allows those decisions to be made where they belong and to not flood logs or printk with denials for thing that are not security decisions. Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NStephen Smalley <sds@tycho.nsa.gov> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 09 11月, 2008 1 次提交
-
-
由 Eric Paris 提交于
Currently when SELinux has not been updated to handle a netlink message type the operation is denied with EINVAL. This patch will leave the audit/warning message so things get fixed but if policy chose to allow unknowns this will allow the netlink operation. Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NStephen Smalley <sds@tycho.nsa.gov> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 05 11月, 2008 1 次提交
-
-
由 Eric Paris 提交于
SELinux has long been calling wake_up_interruptible() on current->parent->signal->wait_chldexit without holding any locks. It appears that this operation should hold the tasklist_lock to dereference current->parent and we should hold the siglock when waking up the signal->wait_chldexit. Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 01 11月, 2008 1 次提交
-
-
由 Eric Paris 提交于
SELinux has wrongly (since 2004) had an incorrect test for an empty tty->tty_files list. With an empty list selinux would be pointing to part of the tty struct itself and would then proceed to dereference that value and again dereference that result. An F10 change to plymouth on a ppc64 system is actually currently triggering this bug. This patch uses list_empty() to handle empty lists rather than looking at a meaningless location. [note, this fixes the oops reported in https://bugzilla.redhat.com/show_bug.cgi?id=469079] Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 30 10月, 2008 1 次提交
-
-
由 Eric Paris 提交于
Some operations, like searching a directory path or connecting a unix domain socket, make explicit calls into inode_permission. Our choices are to either try to come up with a signature for all of the explicit calls to inode_permission and do not check open on those, or to move the open checks to dentry_open where we know this is always an open operation. This patch moves the checks to dentry_open. Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NStephen Smalley <sds@tycho.nsa.gov> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 14 10月, 2008 3 次提交
-
-
由 Steven Whitehouse 提交于
This is a much better version of a previous patch to make the parser tables constant. Rather than changing the typedef, we put the "const" in all the various places where its required, allowing the __initconst exception for nfsroot which was the cause of the previous trouble. This was posted for review some time ago and I believe its been in -mm since then. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: Alexander Viro <aviro@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alan Cox 提交于
Currently it is sometimes locked by the tty mutex and sometimes by the sighand lock. The latter is in fact correct and now we can hand back referenced objects we can fix this up without problems around sleeping functions. Signed-off-by: NAlan Cox <alan@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alan Cox 提交于
We now return a kref covered tty reference. That ensures the tty structure doesn't go away when you have a return from get_current_tty. This is not enough to protect you from most of the resources being freed behind your back - yet. [Updated to include fixes for SELinux problems found by Andrew Morton and an s390 leak found while debugging the former] Signed-off-by: NAlan Cox <alan@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 10月, 2008 6 次提交
-
-
由 Paul Moore 提交于
Previous work enabled the use of address based NetLabel selectors, which while highly useful, brought the potential for additional per-packet overhead when used. This patch attempts to mitigate some of that overhead by caching the NetLabel security attribute struct within the SELinux socket security structure. This should help eliminate the need to recreate the NetLabel secattr structure for each packet resulting in less overhead. Signed-off-by: NPaul Moore <paul.moore@hp.com> Acked-by: NJames Morris <jmorris@namei.org>
-
由 Paul Moore 提交于
Previous work enabled the use of address based NetLabel selectors, which while highly useful, brought the potential for additional per-packet overhead when used. This patch attempts to solve that by applying NetLabel socket labels when sockets are connect()'d. This should alleviate the per-packet NetLabel labeling for all connected sockets (yes, it even works for connected DGRAM sockets). Signed-off-by: NPaul Moore <paul.moore@hp.com> Reviewed-by: NJames Morris <jmorris@namei.org>
-
由 Paul Moore 提交于
This patch builds upon the new NetLabel address selector functionality by providing the NetLabel KAPI and CIPSO engine support needed to enable the new packet-based labeling. The only new addition to the NetLabel KAPI at this point is shown below: * int netlbl_skbuff_setattr(skb, family, secattr) ... and is designed to be called from a Netfilter hook after the packet's IP header has been populated such as in the FORWARD or LOCAL_OUT hooks. This patch also provides the necessary SELinux hooks to support this new functionality. Smack support is not currently included due to uncertainty regarding the permissions needed to expand the Smack network access controls. Signed-off-by: NPaul Moore <paul.moore@hp.com> Reviewed-by: NJames Morris <jmorris@namei.org>
-
由 Paul Moore 提交于
At some point I think I messed up and dropped the calls to netlbl_skbuff_err() which are necessary for CIPSO to send error notifications to remote systems. This patch re-introduces the error handling calls into the SELinux code. Signed-off-by: NPaul Moore <paul.moore@hp.com> Acked-by: NJames Morris <jmorris@namei.org>
-
由 Paul Moore 提交于
It turns out that checking to see if skb->sk is NULL is not a very good indicator of a forwarded packet as some locally generated packets also have skb->sk set to NULL. Fix this by not only checking the skb->sk field but also the IP[6]CB(skb)->flags field for the IP[6]SKB_FORWARDED flag. While we are at it, we are calling selinux_parse_skb() much earlier than we really should resulting in potentially wasted cycles parsing packets for information we might no use; so shuffle the code around a bit to fix this. Signed-off-by: NPaul Moore <paul.moore@hp.com> Acked-by: NJames Morris <jmorris@namei.org>
-
由 Paul Moore 提交于
We did the right thing in a few cases but there were several areas where we determined a packet's address family based on the socket's address family which is not the right thing to do since we can get IPv4 packets on IPv6 sockets. This patch fixes these problems by either taking the address family directly from the packet. Signed-off-by: NPaul Moore <paul.moore@hp.com> Acked-by: NJames Morris <jmorris@namei.org>
-
- 29 9月, 2008 1 次提交
-
-
由 Stephen Smalley 提交于
As we are not concerned with fine-grained control over reading of symlinks in proc, always use the default proc SID for all proc symlinks. This should help avoid permission issues upon changes to the proc tree as in the /proc/net -> /proc/self/net example. This does not alter labeling of symlinks within /proc/pid directories. ls -Zd /proc/net output before and after the patch should show the difference. Signed-off-by: NStephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 14 9月, 2008 1 次提交
-
-
由 Frank Mayhar 提交于
Overview This patch reworks the handling of POSIX CPU timers, including the ITIMER_PROF, ITIMER_VIRT timers and rlimit handling. It was put together with the help of Roland McGrath, the owner and original writer of this code. The problem we ran into, and the reason for this rework, has to do with using a profiling timer in a process with a large number of threads. It appears that the performance of the old implementation of run_posix_cpu_timers() was at least O(n*3) (where "n" is the number of threads in a process) or worse. Everything is fine with an increasing number of threads until the time taken for that routine to run becomes the same as or greater than the tick time, at which point things degrade rather quickly. This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF." Code Changes This rework corrects the implementation of run_posix_cpu_timers() to make it run in constant time for a particular machine. (Performance may vary between one machine and another depending upon whether the kernel is built as single- or multiprocessor and, in the latter case, depending upon the number of running processors.) To do this, at each tick we now update fields in signal_struct as well as task_struct. The run_posix_cpu_timers() function uses those fields to make its decisions. We define a new structure, "task_cputime," to contain user, system and scheduler times and use these in appropriate places: struct task_cputime { cputime_t utime; cputime_t stime; unsigned long long sum_exec_runtime; }; This is included in the structure "thread_group_cputime," which is a new substructure of signal_struct and which varies for uniprocessor versus multiprocessor kernels. For uniprocessor kernels, it uses "task_cputime" as a simple substructure, while for multiprocessor kernels it is a pointer: struct thread_group_cputime { struct task_cputime totals; }; struct thread_group_cputime { struct task_cputime *totals; }; We also add a new task_cputime substructure directly to signal_struct, to cache the earliest expiration of process-wide timers, and task_cputime also replaces the it_*_expires fields of task_struct (used for earliest expiration of thread timers). The "thread_group_cputime" structure contains process-wide timers that are updated via account_user_time() and friends. In the non-SMP case the structure is a simple aggregator; unfortunately in the SMP case that simplicity was not achievable due to cache-line contention between CPUs (in one measured case performance was actually _worse_ on a 16-cpu system than the same test on a 4-cpu system, due to this contention). For SMP, the thread_group_cputime counters are maintained as a per-cpu structure allocated using alloc_percpu(). The timer functions update only the timer field in the structure corresponding to the running CPU, obtained using per_cpu_ptr(). We define a set of inline functions in sched.h that we use to maintain the thread_group_cputime structure and hide the differences between UP and SMP implementations from the rest of the kernel. The thread_group_cputime_init() function initializes the thread_group_cputime structure for the given task. The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the out-of-line function thread_group_cputime_alloc_smp() to allocate and fill in the per-cpu structures and fields. The thread_group_cputime_free() function, also a no-op for UP, in SMP frees the per-cpu structures. The thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls thread_group_cputime_alloc() if the per-cpu structures haven't yet been allocated. The thread_group_cputime() function fills the task_cputime structure it is passed with the contents of the thread_group_cputime fields; in UP it's that simple but in SMP it must also safely check that tsk->signal is non-NULL (if it is it just uses the appropriate fields of task_struct) and, if so, sums the per-cpu values for each online CPU. Finally, the three functions account_group_user_time(), account_group_system_time() and account_group_exec_runtime() are used by timer functions to update the respective fields of the thread_group_cputime structure. Non-SMP operation is trivial and will not be mentioned further. The per-cpu structure is always allocated when a task creates its first new thread, via a call to thread_group_cputime_clone_thread() from copy_signal(). It is freed at process exit via a call to thread_group_cputime_free() from cleanup_signal(). All functions that formerly summed utime/stime/sum_sched_runtime values from from all threads in the thread group now use thread_group_cputime() to snapshot the values in the thread_group_cputime structure or the values in the task structure itself if the per-cpu structure hasn't been allocated. Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit. The run_posix_cpu_timers() function has been split into a fast path and a slow path; the former safely checks whether there are any expired thread timers and, if not, just returns, while the slow path does the heavy lifting. With the dedicated thread group fields, timers are no longer "rebalanced" and the process_timer_rebalance() function and related code has gone away. All summing loops are gone and all code that used them now uses the thread_group_cputime() inline. When process-wide timers are set, the new task_cputime structure in signal_struct is used to cache the earliest expiration; this is checked in the fast path. Performance The fix appears not to add significant overhead to existing operations. It generally performs the same as the current code except in two cases, one in which it performs slightly worse (Case 5 below) and one in which it performs very significantly better (Case 2 below). Overall it's a wash except in those two cases. I've since done somewhat more involved testing on a dual-core Opteron system. Case 1: With no itimer running, for a test with 100,000 threads, the fixed kernel took 1428.5 seconds, 513 seconds more than the unfixed system, all of which was spent in the system. There were twice as many voluntary context switches with the fix as without it. Case 2: With an itimer running at .01 second ticks and 4000 threads (the most an unmodified kernel can handle), the fixed kernel ran the test in eight percent of the time (5.8 seconds as opposed to 70 seconds) and had better tick accuracy (.012 seconds per tick as opposed to .023 seconds per tick). Case 3: A 4000-thread test with an initial timer tick of .01 second and an interval of 10,000 seconds (i.e. a timer that ticks only once) had very nearly the same performance in both cases: 6.3 seconds elapsed for the fixed kernel versus 5.5 seconds for the unfixed kernel. With fewer threads (eight in these tests), the Case 1 test ran in essentially the same time on both the modified and unmodified kernels (5.2 seconds versus 5.8 seconds). The Case 2 test ran in about the same time as well, 5.9 seconds versus 5.4 seconds but again with much better tick accuracy, .013 seconds per tick versus .025 seconds per tick for the unmodified kernel. Since the fix affected the rlimit code, I also tested soft and hard CPU limits. Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer running), the modified kernel was very slightly favored in that while it killed the process in 19.997 seconds of CPU time (5.002 seconds of wall time), only .003 seconds of that was system time, the rest was user time. The unmodified kernel killed the process in 20.001 seconds of CPU (5.014 seconds of wall time) of which .016 seconds was system time. Really, though, the results were too close to call. The results were essentially the same with no itimer running. Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds (where the hard limit would never be reached) and an itimer running, the modified kernel exhibited worse tick accuracy than the unmodified kernel: .050 seconds/tick versus .028 seconds/tick. Otherwise, performance was almost indistinguishable. With no itimer running this test exhibited virtually identical behavior and times in both cases. In times past I did some limited performance testing. those results are below. On a four-cpu Opteron system without this fix, a sixteen-thread test executed in 3569.991 seconds, of which user was 3568.435s and system was 1.556s. On the same system with the fix, user and elapsed time were about the same, but system time dropped to 0.007 seconds. Performance with eight, four and one thread were comparable. Interestingly, the timer ticks with the fix seemed more accurate: The sixteen-thread test with the fix received 149543 ticks for 0.024 seconds per tick, while the same test without the fix received 58720 for 0.061 seconds per tick. Both cases were configured for an interval of 0.01 seconds. Again, the other tests were comparable. Each thread in this test computed the primes up to 25,000,000. I also did a test with a large number of threads, 100,000 threads, which is impossible without the fix. In this case each thread computed the primes only up to 10,000 (to make the runtime manageable). System time dominated, at 1546.968 seconds out of a total 2176.906 seconds (giving a user time of 629.938s). It received 147651 ticks for 0.015 seconds per tick, still quite accurate. There is obviously no comparable test without the fix. Signed-off-by: NFrank Mayhar <fmayhar@google.com> Cc: Roland McGrath <roland@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 28 8月, 2008 1 次提交
-
-
由 KaiGai Kohei 提交于
The purpose of this patch is to assign per-thread security context under a constraint. It enables multi-threaded server application to kick a request handler with its fair security context, and helps some of userspace object managers to handle user's request. When we assign a per-thread security context, it must not have wider permissions than the original one. Because a multi-threaded process shares a single local memory, an arbitary per-thread security context also means another thread can easily refer violated information. The constraint on a per-thread security context requires a new domain has to be equal or weaker than its original one, when it tries to assign a per-thread security context. Bounds relationship between two types is a way to ensure a domain can never have wider permission than its bounds. We can define it in two explicit or implicit ways. The first way is using new TYPEBOUNDS statement. It enables to define a boundary of types explicitly. The other one expand the concept of existing named based hierarchy. If we defines a type with "." separated name like "httpd_t.php", toolchain implicitly set its bounds on "httpd_t". This feature requires a new policy version. The 24th version (POLICYDB_VERSION_BOUNDARY) enables to ship them into kernel space, and the following patch enables to handle it. Signed-off-by: NKaiGai Kohei <kaigai@ak.jp.nec.com> Acked-by: NStephen Smalley <sds@tycho.nsa.gov> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 14 8月, 2008 1 次提交
-
-
由 David Howells 提交于
Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags the target process if that is not the current process and it is trying to change its own flags in a different way at the same time. __capable() is using neither atomic ops nor locking to protect t->flags. This patch removes __capable() and introduces has_capability() that doesn't set PF_SUPERPRIV on the process being queried. This patch further splits security_ptrace() in two: (1) security_ptrace_may_access(). This passes judgement on whether one process may access another only (PTRACE_MODE_ATTACH for ptrace() and PTRACE_MODE_READ for /proc), and takes a pointer to the child process. current is the parent. (2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only, and takes only a pointer to the parent process. current is the child. In Smack and commoncap, this uses has_capability() to determine whether the parent will be permitted to use PTRACE_ATTACH if normal checks fail. This does not set PF_SUPERPRIV. Two of the instances of __capable() actually only act on current, and so have been changed to calls to capable(). Of the places that were using __capable(): (1) The OOM killer calls __capable() thrice when weighing the killability of a process. All of these now use has_capability(). (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see whether the parent was allowed to trace any process. As mentioned above, these have been split. For PTRACE_ATTACH and /proc, capable() is now used, and for PTRACE_TRACEME, has_capability() is used. (3) cap_safe_nice() only ever saw current, so now uses capable(). (4) smack_setprocattr() rejected accesses to tasks other than current just after calling __capable(), so the order of these two tests have been switched and capable() is used instead. (5) In smack_file_send_sigiotask(), we need to allow privileged processes to receive SIGIO on files they're manipulating. (6) In smack_task_wait(), we let a process wait for a privileged process, whether or not the process doing the waiting is privileged. I've tested this with the LTP SELinux and syscalls testscripts. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Acked-by: NCasey Schaufler <casey@schaufler-ca.com> Acked-by: NAndrew G. Morgan <morgan@kernel.org> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 05 8月, 2008 2 次提交
-
-
由 David Howells 提交于
Fix a potentially uninitialised variable in SELinux hooks that's given a pointer to the network address by selinux_parse_skb() passing a pointer back through its argument list. By restructuring selinux_parse_skb(), the compiler can see that the error case need not set it as the caller will return immediately. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Adrian Bunk 提交于
This patch makes the needlessly global selinux_write_opts() static. Signed-off-by: NAdrian Bunk <bunk@kernel.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 30 7月, 2008 1 次提交
-
-
由 Eric Paris 提交于
Given a hosed SELinux config in which a system never loads policy or disables SELinux we currently just return -EINVAL for anyone trying to read /proc/mounts. This is a configuration problem but we can certainly be more graceful. This patch just ignores -EINVAL when displaying LSM options and causes /proc/mounts display everything else it can. If policy isn't loaded the obviously there are no options, so we aren't really loosing any information here. This is safe as the only other return of EINVAL comes from security_sid_to_context_core() in the case of an invalid sid. Even if a FS was mounted with a now invalidated context that sid should have been remapped to unlabeled and so we won't hit the EINVAL and will work like we should. (yes, I tested to make sure it worked like I thought) Signed-off-by: NEric Paris <eparis@redhat.com> Tested-by: NMarc Dionne <marc.c.dionne@gmail.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 27 7月, 2008 3 次提交
-
-
由 Al Viro 提交于
... and get rid of the last "let's deduce mask from nameidata->flags" bit. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Acked-by: NJames Morris <jmorris@namei.org> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roland McGrath 提交于
This adds the tracehook_tracer_task() hook to consolidate all forms of "Who is using ptrace on me?" logic. This is used for "TracerPid:" in /proc and for permission checks. We also clean up the selinux code the called an identical accessor. Signed-off-by: NRoland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Reviewed-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 7月, 2008 1 次提交
-
-
由 James Morris 提交于
This reverts commit 811f3799. From Eric Paris: "Please drop this patch for now. It deadlocks on ntfs-3g. I need to rework it to handle fuse filesystems better. (casey was right)"
-
- 14 7月, 2008 5 次提交
-
-
由 James Morris 提交于
The register security hook is no longer required, as the capability module is always registered. LSMs wishing to stack capability as a secondary module should do so explicitly. Signed-off-by: NJames Morris <jmorris@namei.org> Acked-by: NStephen Smalley <sds@tycho.nsa.gov> Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Miklos Szeredi 提交于
The sb_get_mnt_opts() hook is unused, and is superseded by the sb_show_options() hook. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Acked-by: NJames Morris <jmorris@namei.org>
-
由 Eric Paris 提交于
This patch causes SELinux mount options to show up in /proc/mounts. As with other code in the area seq_put errors are ignored. Other LSM's will not have their mount options displayed until they fill in their own security_sb_show_options() function. Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Eric Paris 提交于
Currently if a FS is mounted for which SELinux policy does not define an fs_use_* that FS will either be genfs labeled or not labeled at all. This decision is based on the existence of a genfscon rule in policy and is irrespective of the capabilities of the filesystem itself. This patch allows the kernel to check if the filesystem supports security xattrs and if so will use those if there is no fs_use_* rule in policy. An fstype with a no fs_use_* rule but with a genfs rule will use xattrs if available and will follow the genfs rule. This can be particularly interesting for things like ecryptfs which actually overlays a real underlying FS. If we define excryptfs in policy to use xattrs we will likely get this wrong at times, so with this path we just don't need to define it! Overlay ecryptfs on top of NFS with no xattr support: SELinux: initialized (dev ecryptfs, type ecryptfs), uses genfs_contexts Overlay ecryptfs on top of ext4 with xattr support: SELinux: initialized (dev ecryptfs, type ecryptfs), uses xattr It is also useful as the kernel adds new FS we don't need to add them in policy if they support xattrs and that is how we want to handle them. Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NStephen Smalley <sds@tycho.nsa.gov> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 James Morris 提交于
Use do_each_thread as a proper do/while block. Sparse complained. Signed-off-by: NJames Morris <jmorris@namei.org> Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
-