- 18 11月, 2008 1 次提交
-
-
由 Kumar Gala 提交于
For some unknown reason at Steven Rostedt added in disabling of the SPE instruction generation for e500 based PPC cores in commit 6ec56232. We are removing it because: 1. It generates e500 kernels that don't work 2. its not the correct set of flags to do this 3. we handle this in the arch/powerpc/Makefile already 4. its unknown in talking to Steven why he did this Signed-off-by: NKumar Gala <galak@kernel.crashing.org> Tested-and-Acked-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 11月, 2008 1 次提交
-
-
由 Rusty Russell 提交于
Bug #11989: Suspend failure on NForce4-based boards due to chanes in stop_machine We should not access active.fnret outside the lock; in theory the next stop_machine could overwrite it. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Tested-by: N"Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 11月, 2008 2 次提交
-
-
由 Al Viro 提交于
Inotify watch removals suck violently. To kick the watch out we need (in this order) inode->inotify_mutex and ih->mutex. That's fine if we have a hold on inode; however, for all other cases we need to make damn sure we don't race with umount. We can *NOT* just grab a reference to a watch - inotify_unmount_inodes() will happily sail past it and we'll end with reference to inode potentially outliving its superblock. Ideally we just want to grab an active reference to superblock if we can; that will make sure we won't go into inotify_umount_inodes() until we are done. Cleanup is just deactivate_super(). However, that leaves a messy case - what if we *are* racing with umount() and active references to superblock can't be acquired anymore? We can bump ->s_count, grab ->s_umount, which will almost certainly wait until the superblock is shut down and the watch in question is pining for fjords. That's fine, but there is a problem - we might have hit the window between ->s_active getting to 0 / ->s_count - below S_BIAS (i.e. the moment when superblock is past the point of no return and is heading for shutdown) and the moment when deactivate_super() acquires ->s_umount. We could just do drop_super() yield() and retry, but that's rather antisocial and this stuff is luser-triggerable. OTOH, having grabbed ->s_umount and having found that we'd got there first (i.e. that ->s_root is non-NULL) we know that we won't race with inotify_umount_inodes(). So we could grab a reference to watch and do the rest as above, just with drop_super() instead of deactivate_super(), right? Wrong. We had to drop ih->mutex before we could grab ->s_umount. So the watch could've been gone already. That still can be dealt with - we need to save watch->wd, do idr_find() and compare its result with our pointer. If they match, we either have the damn thing still alive or we'd lost not one but two races at once, the watch had been killed and a new one got created with the same ->wd at the same address. That couldn't have happened in inotify_destroy(), but inotify_rm_wd() could run into that. Still, "new one got created" is not a problem - we have every right to kill it or leave it alone, whatever's more convenient. So we can use idr_find(...) == watch && watch->inode->i_sb == sb as "grab it and kill it" check. If it's been our original watch, we are fine, if it's a newcomer - nevermind, just pretend that we'd won the race and kill the fscker anyway; we are safe since we know that its superblock won't be going away. And yes, this is far beyond mere "not very pretty"; so's the entire concept of inotify to start with. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Acked-by: NGreg KH <greg@kroah.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
We don't want to get rid of the futexes just at exit() time, we want to drop them when doing an execve() too, since that gets rid of the previous VM image too. Doing it at mm_release() time means that we automatically always do it when we disassociate a VM map from the task. Reported-by: pageexec@freemail.hu Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Brad Spengler <spender@grsecurity.net> Cc: Alex Efros <powerman@powerman.name> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 11月, 2008 14 次提交
-
-
由 David Howells 提交于
Allow kernel services to override LSM settings appropriate to the actions performed by a task by duplicating a set of credentials, modifying it and then using task_struct::cred to point to it when performing operations on behalf of a task. This is used, for example, by CacheFiles which has to transparently access the cache on behalf of a process that thinks it is doing, say, NFS accesses with a potentially inappropriate (with respect to accessing the cache) set of credentials. This patch provides two LSM hooks for modifying a task security record: (*) security_kernel_act_as() which allows modification of the security datum with which a task acts on other objects (most notably files). (*) security_kernel_create_files_as() which allows modification of the security datum that is used to initialise the security data on a file that a task creates. The patch also provides four new credentials handling functions, which wrap the LSM functions: (1) prepare_kernel_cred() Prepare a set of credentials for a kernel service to use, based either on a daemon's credentials or on init_cred. All the keyrings are cleared. (2) set_security_override() Set the LSM security ID in a set of credentials to a specific security context, assuming permission from the LSM policy. (3) set_security_override_from_ctx() As (2), but takes the security context as a string. (4) set_create_files_as() Set the file creation LSM security ID in a set of credentials to be the same as that on a particular inode. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [Smack changes] Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Differentiate the objective and real subjective credentials from the effective subjective credentials on a task by introducing a second credentials pointer into the task_struct. task_struct::real_cred then refers to the objective and apparent real subjective credentials of a task, as perceived by the other tasks in the system. task_struct::cred then refers to the effective subjective credentials of a task, as used by that task when it's actually running. These are not visible to the other tasks in the system. __task_cred(task) then refers to the objective/real credentials of the task in question. current_cred() refers to the effective subjective credentials of the current task. prepare_creds() uses the objective creds as a base and commit_creds() changes both pointers in the task_struct (indeed commit_creds() requires them to be the same). override_creds() and revert_creds() change the subjective creds pointer only, and the former returns the old subjective creds. These are used by NFSD, faccessat() and do_coredump(), and will by used by CacheFiles. In SELinux, current_has_perm() is provided as an alternative to task_has_perm(). This uses the effective subjective context of current, whereas task_has_perm() uses the objective/real context of the subject. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Document credentials and the new credentials API. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Make execve() take advantage of copy-on-write credentials, allowing it to set up the credentials in advance, and then commit the whole lot after the point of no return. This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). The credential bits from struct linux_binprm are, for the most part, replaced with a single credentials pointer (bprm->cred). This means that all the creds can be calculated in advance and then applied at the point of no return with no possibility of failure. I would like to replace bprm->cap_effective with: cap_isclear(bprm->cap_effective) but this seems impossible due to special behaviour for processes of pid 1 (they always retain their parent's capability masks where normally they'd be changed - see cap_bprm_set_creds()). The following sequence of events now happens: (a) At the start of do_execve, the current task's cred_exec_mutex is locked to prevent PTRACE_ATTACH from obsoleting the calculation of creds that we make. (a) prepare_exec_creds() is then called to make a copy of the current task's credentials and prepare it. This copy is then assigned to bprm->cred. This renders security_bprm_alloc() and security_bprm_free() unnecessary, and so they've been removed. (b) The determination of unsafe execution is now performed immediately after (a) rather than later on in the code. The result is stored in bprm->unsafe for future reference. (c) prepare_binprm() is called, possibly multiple times. (i) This applies the result of set[ug]id binaries to the new creds attached to bprm->cred. Personality bit clearance is recorded, but now deferred on the basis that the exec procedure may yet fail. (ii) This then calls the new security_bprm_set_creds(). This should calculate the new LSM and capability credentials into *bprm->cred. This folds together security_bprm_set() and parts of security_bprm_apply_creds() (these two have been removed). Anything that might fail must be done at this point. (iii) bprm->cred_prepared is set to 1. bprm->cred_prepared is 0 on the first pass of the security calculations, and 1 on all subsequent passes. This allows SELinux in (ii) to base its calculations only on the initial script and not on the interpreter. (d) flush_old_exec() is called to commit the task to execution. This performs the following steps with regard to credentials: (i) Clear pdeath_signal and set dumpable on certain circumstances that may not be covered by commit_creds(). (ii) Clear any bits in current->personality that were deferred from (c.i). (e) install_exec_creds() [compute_creds() as was] is called to install the new credentials. This performs the following steps with regard to credentials: (i) Calls security_bprm_committing_creds() to apply any security requirements, such as flushing unauthorised files in SELinux, that must be done before the credentials are changed. This is made up of bits of security_bprm_apply_creds() and security_bprm_post_apply_creds(), both of which have been removed. This function is not allowed to fail; anything that might fail must have been done in (c.ii). (ii) Calls commit_creds() to apply the new credentials in a single assignment (more or less). Possibly pdeath_signal and dumpable should be part of struct creds. (iii) Unlocks the task's cred_replace_mutex, thus allowing PTRACE_ATTACH to take place. (iv) Clears The bprm->cred pointer as the credentials it was holding are now immutable. (v) Calls security_bprm_committed_creds() to apply any security alterations that must be done after the creds have been changed. SELinux uses this to flush signals and signal handlers. (f) If an error occurs before (d.i), bprm_free() will call abort_creds() to destroy the proposed new credentials and will then unlock cred_replace_mutex. No changes to the credentials will have been made. (2) LSM interface. A number of functions have been changed, added or removed: (*) security_bprm_alloc(), ->bprm_alloc_security() (*) security_bprm_free(), ->bprm_free_security() Removed in favour of preparing new credentials and modifying those. (*) security_bprm_apply_creds(), ->bprm_apply_creds() (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds() Removed; split between security_bprm_set_creds(), security_bprm_committing_creds() and security_bprm_committed_creds(). (*) security_bprm_set(), ->bprm_set_security() Removed; folded into security_bprm_set_creds(). (*) security_bprm_set_creds(), ->bprm_set_creds() New. The new credentials in bprm->creds should be checked and set up as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the second and subsequent calls. (*) security_bprm_committing_creds(), ->bprm_committing_creds() (*) security_bprm_committed_creds(), ->bprm_committed_creds() New. Apply the security effects of the new credentials. This includes closing unauthorised files in SELinux. This function may not fail. When the former is called, the creds haven't yet been applied to the process; when the latter is called, they have. The former may access bprm->cred, the latter may not. (3) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) The bprm_security_struct struct has been removed in favour of using the credentials-under-construction approach. (c) flush_unauthorized_files() now takes a cred pointer and passes it on to inode_has_perm(), file_has_perm() and dentry_open(). Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred *new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: (*) security_capset_check(), ->capset_check() (*) security_capset_set(), ->capset_set() Removed in favour of security_capset(). (*) security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. (*) security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. (*) security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). (*) security_cred_free(), ->cred_free() New. Free security data attached to cred->security. (*) security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. (*) security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). (*) security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). (*) security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). (*) security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. (*) security_key_alloc(), ->key_alloc() (*) security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Rename is_single_threaded() to is_wq_single_threaded() so that a new is_single_threaded() can be created that refers to tasks rather than waitqueues. Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-by: NJames Morris <jmorris@namei.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Separate per-task-group keyrings from signal_struct and dangle their anchor from the cred struct rather than the signal_struct. Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-by: NJames Morris <jmorris@namei.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Use RCU to access another task's creds and to release a task's own creds. This means that it will be possible for the credentials of a task to be replaced without another task (a) requiring a full lock to read them, and (b) seeing deallocated memory. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Wrap current->cred and a few other accessors to hide their actual implementation. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Detach the credentials from task_struct, duplicating them in copy_process() and releasing them in __put_task_struct(). Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Take away the ability for sys_capset() to affect processes other than current. This means that current will not need to lock its own credentials when reading them against interference by other processes. This has effectively been the case for a while anyway, since: (1) Without LSM enabled, sys_capset() is disallowed. (2) With file-based capabilities, sys_capset() is neutered. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Acked-by: NAndrew G. Morgan <morgan@kernel.org> Acked-by: NJames Morris <jmorris@namei.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Alter the use of the key instantiation and negation functions' link-to-keyring arguments. Currently this specifies a keyring in the target process to link the key into, creating the keyring if it doesn't exist. This, however, can be a problem for copy-on-write credentials as it means that the instantiating process can alter the credentials of the requesting process. This patch alters the behaviour such that: (1) If keyctl_instantiate_key() or keyctl_negate_key() are given a specific keyring by ID (ringid >= 0), then that keyring will be used. (2) If keyctl_instantiate_key() or keyctl_negate_key() are given one of the special constants that refer to the requesting process's keyrings (KEY_SPEC_*_KEYRING, all <= 0), then: (a) If sys_request_key() was given a keyring to use (destringid) then the key will be attached to that keyring. (b) If sys_request_key() was given a NULL keyring, then the key being instantiated will be attached to the default keyring as set by keyctl_set_reqkey_keyring(). (3) No extra link will be made. Decision point (1) follows current behaviour, and allows those instantiators who've searched for a specifically named keyring in the requestor's keyring so as to partition the keys by type to still have their named keyrings. Decision point (2) allows the requestor to make sure that the key or keys that get produced by request_key() go where they want, whilst allowing the instantiator to request that the key is retained. This is mainly useful for situations where the instantiator makes a secondary request, the key for which should be retained by the initial requestor: +-----------+ +--------------+ +--------------+ | | | | | | | Requestor |------->| Instantiator |------->| Instantiator | | | | | | | +-----------+ +--------------+ +--------------+ request_key() request_key() This might be useful, for example, in Kerberos, where the requestor requests a ticket, and then the ticket instantiator requests the TGT, which someone else then has to go and fetch. The TGT, however, should be retained in the keyrings of the requestor, not the first instantiator. To make this explict an extra special keyring constant is also added. Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-by: NJames Morris <jmorris@namei.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-audit@redhat.com Cc: containers@lists.linux-foundation.org Cc: linux-mm@kvack.org Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 13 11月, 2008 5 次提交
-
-
由 Andrew Morton 提交于
We only need the cacheline padding on SMP kernels. Saves 6k: text data bss dec hex filename 5713 388 8840 14941 3a5d kernel/kprobes.o 5713 388 2632 8733 221d kernel/kprobes.o Acked-by: NMasami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Masami Hiramatsu 提交于
__register_kprobe() can be preempted after checking probing address but before module_text_address() or try_module_get(), and in this interval the module can be unloaded. In that case, try_module_get(probed_mod) will access to invalid address, or kprobe will probe invalid address. This patch uses preempt_disable() to protect it and uses __module_text_address() and __kernel_text_address(). Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Li Zefan 提交于
With this change, control file 'freezer.state' doesn't exist in root cgroup, making root cgroup unfreezable. I think it's reasonable to disallow freeze tasks in the root cgroup. And then we can avoid fork overhead when freezer subsystem is compiled but not used. Also make writing invalid value to freezer.state returns EINVAL rather than EIO. This is more consistent with other cgroup subsystem. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Acked-by: NPaul Menage <menage@google.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Li Zefan 提交于
In theory the task can be moved to another cgroup and the freezer will be freed right after task_lock is dropped, so the lock results in zero protection. But in the case of freezer_fork() no lock is needed, since the task is not in tasklist yet so it won't be moved to another cgroup, so task->cgroups won't be changed or invalidated. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ingo Molnar 提交于
Maciej Rutecki reported: > I have this bug during suspend to disk: > > [ 188.592151] Enabling non-boot CPUs ... > [ 188.592151] SMP alternatives: switching to SMP code > [ 188.666058] BUG: using smp_processor_id() in preemptible > [00000000] > code: suspend_to_disk/2934 > [ 188.666064] caller is native_sched_clock+0x2b/0x80 Which, as noted by Linus, was caused by me, via: 7cbaef9c "sched: optimize sched_clock() a bit" Move the rq locking a bit earlier in the initialization sequence, that will make the sched_clock() call in init_idle() non-preemptible. Reported-by: NMaciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 11月, 2008 2 次提交
-
-
由 Balbir Singh 提交于
Impact: fix load balancer load average calculation accuracy cpu_avg_load_per_task() returns a stale value when nr_running is 0. It returns an older stale (caculated when nr_running was non zero) value. This patch returns and sets rq->avg_load_per_task to zero when nr_running is 0. Compile and boot tested on a x86_64 box. Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Impact: cleanup git grep HRTIMER_CB_IRQSAFE revealed half the callback modes are actually unused. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 11 11月, 2008 11 次提交
-
-
由 Eric Paris 提交于
If an invalid (large) capability is requested the capabilities system may panic as it is dereferencing an array of fixed (short) length. Its possible (and actually often happens) that the capability system accidentally stumbled into a valid memory region but it also regularly happens that it hits invalid memory and BUGs. If such an operation does get past cap_capable then the selinux system is sure to have problems as it already does a (simple) validity check and BUG. This is known to happen by the broken and buggy firegl driver. This patch cleanly checks all capable calls and BUG if a call is for an invalid capability. This will likely break the firegl driver for some situations, but it is the right thing to do. Garbage into a security system gets you killed/bugged Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NArjan van de Ven <arjan@linux.intel.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Acked-by: NAndrew G. Morgan <morgan@kernel.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Peter Zijlstra 提交于
Clear buddies on yield, so that the buddy rules don't schedule them despite them being placed right-most. This fixed a performance regression with yield-happy binary JVMs. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu> Tested-by: NLin Ming <ming.m.lin@intel.com>
-
由 Eric Paris 提交于
actual capbilities being added/removed. This patch adds a new record type which emits the target pid and the eff, inh, and perm cap sets. example output if you audit capset syscalls would be: type=SYSCALL msg=audit(1225743140.465:76): arch=c000003e syscall=126 success=yes exit=0 a0=17f2014 a1=17f201c a2=80000000 a3=7fff2ab7f060 items=0 ppid=2160 pid=2223 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=1 comm="setcap" exe="/usr/sbin/setcap" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=UNKNOWN[1322] msg=audit(1225743140.465:76): pid=0 cap_pi=ffffffffffffffff cap_pp=ffffffffffffffff cap_pe=ffffffffffffffff Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Eric Paris 提交于
non-zero pE we will crate a new audit record which contains the entire set of known information about the executable in question, fP, fI, fE, fversion and includes the process's pE, pI, pP. Before and after the bprm capability are applied. This record type will only be emitted from execve syscalls. an example of making ping use fcaps instead of setuid: setcap "cat_net_raw+pe" /bin/ping type=SYSCALL msg=audit(1225742021.015:236): arch=c000003e syscall=59 success=yes exit=0 a0=1457f30 a1=14606b0 a2=1463940 a3=321b770a70 items=2 ppid=2929 pid=2963 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=pts0 ses=3 comm="ping" exe="/bin/ping" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=UNKNOWN[1321] msg=audit(1225742021.015:236): fver=2 fp=0000000000002000 fi=0000000000000000 fe=1 old_pp=0000000000000000 old_pi=0000000000000000 old_pe=0000000000000000 new_pp=0000000000002000 new_pi=0000000000000000 new_pe=0000000000002000 type=EXECVE msg=audit(1225742021.015:236): argc=2 a0="ping" a1="127.0.0.1" type=CWD msg=audit(1225742021.015:236): cwd="/home/test" type=PATH msg=audit(1225742021.015:236): item=0 name="/bin/ping" inode=49256 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ping_exec_t:s0 cap_fp=0000000000002000 cap_fe=1 cap_fver=2 type=PATH msg=audit(1225742021.015:236): item=1 name=(null) inode=507915 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0 Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Eric Paris 提交于
records of any file that has file capabilities set. Files which do not have fcaps set will not have different PATH records. An example audit record if you run: setcap "cap_net_admin+pie" /bin/bash /bin/bash type=SYSCALL msg=audit(1225741937.363:230): arch=c000003e syscall=59 success=yes exit=0 a0=2119230 a1=210da30 a2=20ee290 a3=8 items=2 ppid=2149 pid=2923 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="ping" exe="/bin/ping" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=EXECVE msg=audit(1225741937.363:230): argc=2 a0="ping" a1="www.google.com" type=CWD msg=audit(1225741937.363:230): cwd="/root" type=PATH msg=audit(1225741937.363:230): item=0 name="/bin/ping" inode=49256 dev=fd:00 mode=0104755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ping_exec_t:s0 cap_fp=0000000000002000 cap_fi=0000000000002000 cap_fe=1 cap_fver=2 type=PATH msg=audit(1225741937.363:230): item=1 name=(null) inode=507915 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0 Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Gautham R Shenoy 提交于
Impact: fix incorrect locking triggered during hotplug-intense stress-tests While migrating the the CB_IRQSAFE_UNLOCKED timers during a cpu-offline, we queue them on the cb_pending list, so that they won't go stale. Thus, when the callbacks of the timers run from the softirq context, they could run into potential deadlocks, since these callbacks assume that they're running with irq's disabled, thereby annoying lockdep! Fix this by emulating hardirq context while running these callbacks from the hrtimer softirq. ================================= [ INFO: inconsistent lock state ] 2.6.27 #2 -------------------------------- inconsistent {in-hardirq-W} -> {hardirq-on-W} usage. ksoftirqd/0/4 [HC0[0]:SC1[1]:HE1:SE0] takes: (&rq->lock){++..}, at: [<c011db84>] sched_rt_period_timer+0x9e/0x1fc {in-hardirq-W} state was registered at: [<c014103c>] __lock_acquire+0x549/0x121e [<c0107890>] native_sched_clock+0x88/0x99 [<c013aa12>] clocksource_get_next+0x39/0x3f [<c0139abc>] update_wall_time+0x616/0x7df [<c0141d6b>] lock_acquire+0x5a/0x74 [<c0121724>] scheduler_tick+0x3a/0x18d [<c047ed45>] _spin_lock+0x1c/0x45 [<c0121724>] scheduler_tick+0x3a/0x18d [<c0121724>] scheduler_tick+0x3a/0x18d [<c012c436>] update_process_times+0x3a/0x44 [<c013c044>] tick_periodic+0x63/0x6d [<c013c062>] tick_handle_periodic+0x14/0x5e [<c010568c>] timer_interrupt+0x44/0x4a [<c0150c9f>] handle_IRQ_event+0x13/0x3d [<c0151c14>] handle_level_irq+0x79/0xbd [<c0105634>] do_IRQ+0x69/0x7d [<c01041e4>] common_interrupt+0x28/0x30 [<c047007b>] aac_probe_one+0x1a3/0x3f3 [<c047ec2d>] _spin_unlock_irqrestore+0x36/0x39 [<c01512b4>] setup_irq+0x1be/0x1f9 [<c065d70b>] start_kernel+0x259/0x2c5 [<ffffffff>] 0xffffffff irq event stamp: 50102 hardirqs last enabled at (50102): [<c047ebf4>] _spin_unlock_irq+0x20/0x23 hardirqs last disabled at (50101): [<c047edc2>] _spin_lock_irq+0xa/0x4b softirqs last enabled at (50088): [<c0128ba6>] do_softirq+0x37/0x4d softirqs last disabled at (50099): [<c0128ba6>] do_softirq+0x37/0x4d other info that might help us debug this: no locks held by ksoftirqd/0/4. stack backtrace: Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.27 #2 [<c013f6cb>] print_usage_bug+0x13e/0x147 [<c013fef5>] mark_lock+0x493/0x797 [<c01410b1>] __lock_acquire+0x5be/0x121e [<c0141d6b>] lock_acquire+0x5a/0x74 [<c011db84>] sched_rt_period_timer+0x9e/0x1fc [<c047ed45>] _spin_lock+0x1c/0x45 [<c011db84>] sched_rt_period_timer+0x9e/0x1fc [<c011db84>] sched_rt_period_timer+0x9e/0x1fc [<c01210fd>] finish_task_switch+0x41/0xbd [<c0107890>] native_sched_clock+0x88/0x99 [<c011dae6>] sched_rt_period_timer+0x0/0x1fc [<c0136dda>] run_hrtimer_pending+0x54/0xe5 [<c011dae6>] sched_rt_period_timer+0x0/0x1fc [<c0128afb>] __do_softirq+0x7b/0xef [<c0128ba6>] do_softirq+0x37/0x4d [<c0128c12>] ksoftirqd+0x56/0xc5 [<c0128bbc>] ksoftirqd+0x0/0xc5 [<c0134649>] kthread+0x38/0x5d [<c0134611>] kthread+0x0/0x5d [<c0104477>] kernel_thread_helper+0x7/0x10 ======================= Signed-off-by: NGautham R Shenoy <ego@in.ibm.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Oleg Nesterov 提交于
Impact: fix hang/crash on ia64 under high load This is ugly, but the simplest patch by far. Unlike other similar routines, account_group_exec_runtime() could be called "implicitly" from within scheduler after exit_notify(). This means we can race with the parent doing release_task(), we can't just check ->signal != NULL. Change __exit_signal() to do spin_unlock_wait(&task_rq(tsk)->lock) before __cleanup_signal() to make sure ->signal can't be freed under task_rq(tsk)->lock. Note that task_rq_unlock_wait() doesn't care about the case when tsk changes cpu/rq under us, this should be OK. Thanks to Ingo who nacked my previous buggy patch. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@elte.hu> Reported-by: NDoug Chapman <doug.chapman@hp.com>
-
由 Steven Rostedt 提交于
Impact: removal of unnecessary looping The lockless part of the ring buffer allows for reentry into the code from interrupts. A timestamp is taken, a test is preformed and if it detects that an interrupt occurred that did tracing, it tries again. The problem arises if the timestamp code itself causes a trace. The detection will detect this and loop again. The difference between this and an interrupt doing tracing, is that this will fail every time, and cause an infinite loop. Currently, we test if the loop happens 1000 times, and if so, it will produce a warning and disable the ring buffer. The problem with this approach is that it makes it difficult to perform some types of tracing (tracing the timestamp code itself). Each trace entry has a delta timestamp from the previous entry. If a trace entry is reserved but and interrupt occurs and traces before the previous entry is commited, the delta timestamp for that entry will be zero. This actually makes sense in terms of tracing, because the interrupt entry happened before the preempted entry was commited, so one may consider the two happening at the same time. The order is still preserved in the buffer. With this idea, instead of trying to get a new timestamp if an interrupt made it in between the timestamp and the test, the entry could simply make the delta zero and continue. This will prevent interrupts or tracers in the timer code from causing the above loop. Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
-
由 Steven Rostedt 提交于
Impact: fix for bug on resize This patch addresses the bug found here: http://bugzilla.kernel.org/show_bug.cgi?id=11996 When ftrace converted to the new unified trace buffer, the resizing of the buffer was not protected as much as it was originally. If tracing is performed while the resize occurs, then the buffer can be corrupted. This patch disables all ftrace buffer modifications before a resize takes place. Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
-
由 Thomas Gleixner 提交于
Impact: nohz powersavings and wakeup regression commit fb02fbc1 (NOHZ: restart tick device from irq_enter()) causes a serious wakeup regression. While the patch is correct it does not take into account that spurious wakeups happen on x86. A fix for this issue is available, but we just revert to the .27 behaviour and let long running softirqs screw themself. Disable it for now. Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
Impact: avoid spurious ksoftirqd wakeups The tick idle check which is called from irq_enter() was run before the call to __irq_enter() which did not set the in_interrupt() bits in preempt_count. That way the raise of a softirq woke up softirqd for nothing as the softirq was handled on return from interrupt. Call __irq_enter() before calling into the tick idle check code. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 10 11月, 2008 1 次提交
-
-
由 Peter Zijlstra 提交于
Impact: clean up and fix debug info printout While looking over the sched_debug code I noticed that we printed the rq schedstats for every cfs_rq, ammend this. Also change nr_spead_over into an int, and fix a little buglet in min_vruntime printing. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 07 11月, 2008 3 次提交
-
-
由 Li Zefan 提交于
Impact: fix rare memory leak in the sched-domains manual reconfiguration code In the failure path, rd is not attached to a sched domain, so it causes a leak. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Li Zefan 提交于
Impact: re-add incorrectly eliminated sched domain layers (1) on i386 with SCHED_SMT and SCHED_MC enabled # mount -t cgroup -o cpuset xxx /mnt # echo 0 > /mnt/cpuset.sched_load_balance # mkdir /mnt/0 # echo 0 > /mnt/0/cpuset.cpus # dmesg CPU0 attaching sched-domain: domain 0: span 0 level CPU groups: 0 (2) on i386 with SCHED_MC enabled but SCHED_SMT disabled # same with (1) # dmesg CPU0 attaching NULL sched-domain. The bug is that some sched domains may be skipped unintentionally when degenerating (optimizing) sched domains. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Li Zefan 提交于
This fixes an oops when reading /proc/sched_debug. A cgroup won't be removed completely until finishing cgroup_diput(), so we shouldn't invalidate cgrp->dentry in cgroup_rmdir(). Otherwise, when a group is being removed while cgroup_path() gets called, we may trigger NULL dereference BUG. The bug can be reproduced: # cat test.sh #!/bin/sh mount -t cgroup -o cpu xxx /mnt for (( ; ; )) { mkdir /mnt/sub rmdir /mnt/sub } # ./test.sh & # cat /proc/sched_debug BUG: unable to handle kernel NULL pointer dereference at 00000038 IP: [<c045a47f>] cgroup_path+0x39/0x90 ... Call Trace: [<c0420344>] ? print_cfs_rq+0x6e/0x75d [<c0421160>] ? sched_debug_show+0x72d/0xc1e ... Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Acked-by: NPaul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> [2.6.26.x, 2.6.27.x] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-