- 03 Oct, 2012 2 commits
-
-
Committed by David Howells
Reduce the initial permissions on new keys to grant the possessor everything, view permission only to the user (so the keys can be seen in /proc/keys) and nothing else. This gives the creator a chance to adjust the permissions mask before other processes can access the new key or create a link to it.

To aid with this, keyring_alloc() now takes a permission argument rather than setting the permissions itself.

The following permissions are now set:

(1) The user and user-session keyrings grant the user that owns them full permissions and grant a possessor everything bar SETATTR.

(2) The process and thread keyrings grant the possessor full permissions but only grant the user VIEW. This permits the user to see them in /proc/keys, but not to do anything with them.

(3) Anonymous session keyrings grant the possessor full permissions, but only grant the user VIEW and READ. This means that the user can see them in /proc/keys and can list them, but nothing else. Possibly READ shouldn't be provided either.

(4) Named session keyrings grant everything an anonymous session keyring does, plus they grant the user LINK permission. The whole point of named session keyrings is that others can also subscribe to them. Possibly this should be a separate permission to LINK.

(5) The temporary session keyring created by call_sbin_request_key() gets the same permissions as an anonymous session keyring.

(6) Keys created by add_key() get VIEW, SEARCH, LINK and SETATTR for the possessor, plus READ and/or WRITE if the key type supports them. The user only gets VIEW now.

(7) Keys created by request_key() now get the same as those created by add_key().

Reported-by: Lennart Poettering <lennart@poettering.net>
Reported-by: Stef Walter <stefw@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
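The masks described in items (1) to (6) can be worked out by hand. The following is a minimal userspace sketch in Python, not kernel code: the bit values mirror the `KEY_POS_*` and `KEY_USR_*` constants from the kernel's key headers, while the function name and dictionary keys are made up for illustration.

```python
# Permission bits: each class (possessor, user, group, other) occupies one
# byte of the 32-bit mask. Values mirror the kernel's KEY_POS_*/KEY_USR_*.
KEY_POS_VIEW = 0x01000000
KEY_POS_READ = 0x02000000
KEY_POS_WRITE = 0x04000000
KEY_POS_SEARCH = 0x08000000
KEY_POS_LINK = 0x10000000
KEY_POS_SETATTR = 0x20000000
KEY_POS_ALL = 0x3F000000

KEY_USR_VIEW = 0x00010000
KEY_USR_READ = 0x00020000
KEY_USR_LINK = 0x00100000
KEY_USR_ALL = 0x003F0000


def initial_perms():
    """Return the initial masks listed in the commit message.

    The dictionary keys are illustrative names, not kernel identifiers.
    """
    anon_session = KEY_POS_ALL | KEY_USR_VIEW | KEY_USR_READ
    return {
        # (1) user and user-session keyrings: possessor lacks SETATTR
        "user_keyring": (KEY_POS_ALL & ~KEY_POS_SETATTR) | KEY_USR_ALL,
        # (2) process and thread keyrings: user may only VIEW
        "process_keyring": KEY_POS_ALL | KEY_USR_VIEW,
        # (3) and (5) anonymous session keyrings: user may VIEW and READ
        "anon_session": anon_session,
        # (4) named session keyrings additionally grant the user LINK
        "named_session": anon_session | KEY_USR_LINK,
        # (6) add_key() baseline, before optional READ/WRITE are added
        "add_key_base": (KEY_POS_VIEW | KEY_POS_SEARCH | KEY_POS_LINK
                         | KEY_POS_SETATTR | KEY_USR_VIEW),
    }


if __name__ == "__main__":
    for name, mask in initial_perms().items():
        print(f"{name:16s} {mask:08x}")
```

The hexadecimal values printed here use the same layout as the `perm` column of /proc/keys.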
-
Committed by David Howells
Make the session keyring per-thread rather than per-process, but still inherited from the parent thread, to solve a problem with PAM and gdm.

The problem is that join_session_keyring() will reject attempts to change the session keyring of a multithreaded program, but gdm is now multithreaded before it gets to the point of starting PAM and running pam_keyinit to create the session keyring. See:

https://bugs.freedesktop.org/show_bug.cgi?id=49211

The reason that join_session_keyring() will only change the session keyring under a single-threaded environment is that it's hard to alter the other threads' credentials to effect the change in a multi-threaded program. The problems include:

(1) How to prevent two threads both running join_session_keyring() from racing.

(2) Another thread's credentials may not be modified directly by this process.

(3) The number of threads is uncertain whilst we're not holding the appropriate spinlock, making preallocation slightly tricky.

(4) We could use TIF_NOTIFY_RESUME and key_replace_session_keyring() to get another thread to replace its keyring, but that means preallocating for each thread.

A reasonable way around this is to make the session keyring per-thread rather than per-process and just document that if you want a common session keyring, you must get it before you spawn any threads - which is the current situation anyway.

Whilst we're at it, we can make the process keyring behave in the same way. This means we can clean up some of the ickiness in the creds code.

Basically, after this patch, the session, process and thread keyrings are about inheritance rules only and not about sharing changes of keyring.

Reported-by: Mantas M. <grawity@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ray Strode <rstrode@redhat.com>
-
- 28 Sep, 2012 2 commits
-
-
Committed by Alan Cox
On an error, iov may still have been reallocated and need freeing.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by Alan Cox
We set ret to NULL then test it. Remove the bogus test.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
-
- 23 Aug, 2012 1 commit
-
-
Committed by Kent Yoder
Move the tpm_get_random API from the trusted keys code into the TPM device driver itself so that other callers can make use of it. Also, change the API slightly so that the number of bytes read is returned in the call, since the TPM command can potentially return fewer bytes than requested.

Acked-by: David Safford <safford@linux.vnet.ibm.com>
Reviewed-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
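Because the call can now report fewer bytes than requested, a caller that needs an exact amount has to loop. The sketch below models that caller-side pattern in Python under stated assumptions: `read_exact`, `read_some` and `make_short_source` are hypothetical names, the stub source is deterministic rather than random, and none of this is the kernel's own code.

```python
def read_exact(read_some, want, max_attempts=8):
    """Collect up to `want` bytes from a source that may return short reads.

    `read_some(n)` is a hypothetical callable returning at most n bytes.
    The result can still be short if max_attempts is exhausted, so the
    caller must check its length.
    """
    out = bytearray()
    for _ in range(max_attempts):
        if len(out) >= want:
            break
        out += read_some(want - len(out))
    return bytes(out)


def make_short_source(max_chunk):
    """Build a deterministic stand-in source that never returns more than
    max_chunk bytes per call (a real source would return random data)."""
    counter = 0

    def read_some(n):
        nonlocal counter
        size = min(n, max_chunk)
        chunk = bytes((counter + i) % 256 for i in range(size))
        counter += size
        return chunk

    return read_some


if __name__ == "__main__":
    data = read_exact(make_short_source(4), 10)
    print(len(data), data.hex())
```

Bounding the number of attempts keeps the loop from spinning forever if the source keeps returning nothing.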
-
- 23 Jul, 2012 3 commits
-
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
task_work and rcu_head are identical now; merge them (calling the result struct callback_head, with rcu_head #define'd to it) and kill the separate allocation in security/keys, since we can just use cred->rcu now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Get rid of the only user of ->data; this is _not_ the final variant - in the end we'll have task_work and rcu_head identical and just use cred->rcu, at which point the separate allocation will be gone completely.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 01 Jun, 2012 3 commits
-
-
Committed by Christopher Yeoh
A cleanup of rw_copy_check_uvector and compat_rw_copy_check_uvector after changes made to support CMA in an earlier patch.

Rather than having an additional check_access parameter to these functions, the first parameter type is overloaded to allow the caller to specify CHECK_IOVEC_ONLY, which means check that the contents of the iovec are valid, but do not check the memory that they point to. This is used by process_vm_readv/writev, where we need to validate that an iovec passed to the syscall is valid but do not want to check the memory that it points to at this point because it refers to an address space in another process.

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Boaz Harrosh
Both kernel/sys.c and security/keys/request_key.c were inlining the exact same code as call_usermodehelper_fns(), so simply convert these sites to use call_usermodehelper_fns() directly.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Andrew Morton
This allocation may be large. The code is probing to see if it will succeed and, if not, it falls back to vmalloc(). We should suppress any page-allocation failure messages when the fallback happens.

Reported-by: Dave Jones <davej@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 25 May, 2012 1 commit
-
-
Committed by David Howells
Fix some sparse warnings in the keyrings code:

(1) compat_keyctl_instantiate_key_iov() should be static.

(2) There were a couple of places where a pointer was being compared against integer 0 rather than NULL.

(3) keyctl_instantiate_key_common() should not take a __user-labelled iovec pointer as the caller must have copied the iovec to kernel space.

(4) __key_link_begin() takes and __key_link_end() releases keyring_serialise_link_sem under some circumstances and so this should be declared.

Note that adding __acquires() and __releases() for this doesn't help cure the warning messages - only commenting out both helps.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
-
- 24 May, 2012 2 commits
-
-
Committed by Oleg Nesterov
Change keyctl_session_to_parent() to use task_work_add() and move the key_replace_session_keyring() logic into task_work->func().

Note that we do task_work_cancel() before task_work_add() to ensure that only one work can be pending at any time. This is important: we must not allow user-space to abuse the parent's ->task_works list.

The callback, replace_session_keyring(), checks PF_EXITING. I guess this is not really needed but looks better.

As a side effect, this fixes the (unlikely) race. The callers of key_replace_session_keyring() and keyctl_session_to_parent() lack the necessary barriers, so the parent can miss the request.

Now we can remove task_struct->replacement_session_keyring and related code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Smith <dsmith@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 15 May, 2012 1 commit
-
-
Committed by David Howells
Don't bother checking for a NULL key pointer in key_validate(), as all of the places that call it will crash anyway if the relevant key pointer is NULL by the time they call key_validate(). Therefore, the checking must be done prior to calling here.

Whilst we're at it, simplify the key_validate() function a bit and mark its argument const.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
-
- 11 May, 2012 7 commits
-
-
Committed by David Howells
Add support for invalidating a key - which renders it immediately invisible to further searches and causes the garbage collector to immediately wake up, remove it from keyrings and then destroy it when it's no longer referenced.

It's better not to do this with keyctl_revoke(), as that marks the key to start returning -EKEYREVOKED to searches when what is actually desired is to have the key refetched.

To invalidate a key, the caller must be granted SEARCH permission by the key. This may be too strict. It may be better to also permit invalidation if the caller has any of READ, WRITE or SETATTR permission.

The primary use for this is to evict keys that are cached in special keyrings, such as the DNS resolver or an ID mapper.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Do an LRU discard in keyrings that are full rather than returning ENFILE. To perform this, a time_t is added to the key struct and updated by the creation of a link to a key and by a key being found as the result of a search. At the completion of a successful search, the keyrings in the path between the root of the search and the first found link to it also have their last-used times updated.

Note that discarding a link to a key from a keyring does not necessarily destroy the key, as there may be references held by other places.

An alternate discard method that might suffice is to perform FIFO discard from the keyring, using the spare 2-byte hole in the keylist header as the index of the next link to be discarded.

This is useful when using a keyring as a cache for DNS results or foreign filesystem IDs.

This can be tested by the following. As root do:

    echo 1000 >/proc/sys/kernel/keys/root_maxkeys

    kr=`keyctl newring foo @s`
    for ((i=0; i<2000; i++)); do keyctl add user a$i a $kr; done

Without this patch, ENFILE should be reported when the keyring fills up. With this patch, the keyring discards keys in an LRU fashion. Note that the stored LRU time has a granularity of 1s.

After doing this, /proc/key-users can be observed and should show that most of the 2000 keys have been discarded:

    [root@andromeda ~]# cat /proc/key-users
    0: 517 516/516 513/1000 5249/20000

The "513/1000" here is the number of quota-accounted keys present for this user out of the maximum permitted.

In /proc/keys, the keyring shows the number of keys it has and the number of slots it has allocated:

    [root@andromeda ~]# grep foo /proc/keys
    200c64c4 I--Q-- 1 perm 3b3f0000 0 0 keyring foo: 509/509

The maximum is (PAGE_SIZE - header) / key pointer size. That's typically 509 on a 64-bit system and 1020 on a 32-bit system.

Signed-off-by: David Howells <dhowells@redhat.com>
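The discard policy and the capacity formula above can be modelled in a few lines. This is a rough userspace sketch in Python, not the kernel implementation: `LruKeyring` and `max_links` are invented names, and the header sizes used in the demo (24 bytes on 64-bit, 16 bytes on 32-bit, with 4096-byte pages) are assumptions chosen because they reproduce the 509 and 1020 figures quoted in the message.

```python
class LruKeyring:
    """Toy keyring that discards the least recently used link when full."""

    def __init__(self, max_links):
        self.max_links = max_links
        self.links = {}  # description -> last-used time, whole seconds

    def link(self, desc, now):
        """Link a key, discarding the stalest link instead of failing."""
        if desc not in self.links and len(self.links) >= self.max_links:
            victim = min(self.links, key=self.links.get)
            del self.links[victim]
        self.links[desc] = int(now)  # 1s granularity, as in the message

    def search(self, desc, now):
        """A successful search refreshes the link's last-used time."""
        if desc in self.links:
            self.links[desc] = int(now)
            return True
        return False


def max_links(page_size, header_size, pointer_size):
    """(PAGE_SIZE - header) / key pointer size, rounded down."""
    return (page_size - header_size) // pointer_size


if __name__ == "__main__":
    kr = LruKeyring(3)
    for t, name in enumerate(["a", "b", "c"]):
        kr.link(name, t)
    kr.search("a", 3)   # refresh "a" so that "b" becomes the oldest link
    kr.link("d", 4)     # keyring is full: "b" is discarded
    print(sorted(kr.links))
    # Header sizes below are assumptions, not taken from the source.
    print(max_links(4096, 24, 8), max_links(4096, 16, 4))
```

As the message notes, dropping a link does not by itself destroy the key; the model only tracks links, not key lifetimes.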
-
Committed by David Howells
Make use of the previous patch that makes the garbage collector perform RCU synchronisation before destroying defunct keys. Key pointers can now be replaced in-place without creating a new keyring payload and replacing the whole thing, as the discarded keys will not be destroyed until all currently held RCU read locks are released.

If the keyring payload space needs to be expanded or contracted, then a replacement will still need allocating, and the original will still have to be freed by RCU.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Make the keys garbage collector invoke synchronize_rcu() prior to destroying keys with a zero usage count. This means that a key can be examined under the RCU read lock in the safe knowledge that it won't get deallocated until after the lock is released - even if its usage count becomes zero whilst we're looking at it.

This is useful in keyring search vs key link. Consider a keyring containing a link to a key. That link can be replaced in-place in the keyring without requiring an RCU copy-and-replace on the keyring contents, and without breaking a search underway on that keyring when the displaced key is released, provided the key is actually destroyed only after the RCU read lock held by the search algorithm is released.

This permits __key_link() to replace a key without having to reallocate the key payload. A key gets replaced if a new key being linked into a keyring has the same type and description.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
-
Committed by David Howells
Announce the (un)registration of a key type in the core key code rather than in the callers.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
-
Committed by David Howells
Reorganise the keys directory Makefile to put all the core bits together and the type-specific bits after.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
-
Committed by David Howells
Move the key config into security/keys/Kconfig as there are going to be a lot of key-related options.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
-
- 03 May, 2012 1 commit
-
-
Committed by Eric W. Biederman
As a first step to converting struct cred to be all kuid_t and kgid_t values, convert the group values stored in group_info to always be kgid_t values. Unless user namespaces are used, this change should have no effect.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
-
- 08 Apr, 2012 2 commits
-
-
Committed by Eric W. Biederman
struct user_struct will shortly lose its user_ns reference, so make the cred user_ns reference a proper reference complete with reference counting.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
-
Committed by Eric W. Biederman
Optimize performance and prepare for the removal of the user_ns reference from user_struct. Remove the slow long walk through cred->user->user_ns and instead go straight to cred->user_ns.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
-
- 24 Mar, 2012 1 commit
-
-
Committed by Oleg Nesterov
No functional changes. It is not sane to use UMH_KILLABLE with enum umh_wait, but obviously we do not want another argument in the call_usermodehelper_* helpers. Kill this enum and use a plain int.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 07 Mar, 2012 1 commit
-
-
Committed by Dan Carpenter
The test for "if (cred->request_key_auth->flags & KEY_FLAG_REVOKED) {" should actually be testing that the (1 << KEY_FLAG_REVOKED) bit is set. The current code actually checks for KEY_FLAG_DEAD.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
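The bug is a bit-number versus bit-mask mix-up, which a small model makes concrete. The sketch below is Python rather than kernel C, the function names are invented, and the bit numbers (DEAD = 1, REVOKED = 2) are an assumption chosen to be consistent with the message: masking with the number 2 selects bit 1, which is the DEAD bit.

```python
# Bit *numbers*, not masks. The exact values are an assumption that matches
# the behaviour described in the commit message.
KEY_FLAG_DEAD = 1
KEY_FLAG_REVOKED = 2


def buggy_is_revoked(flags):
    """Uses the bit number as if it were a mask: really tests bit 1 (DEAD)."""
    return bool(flags & KEY_FLAG_REVOKED)


def fixed_is_revoked(flags):
    """Shifts the bit number into a mask before testing."""
    return bool(flags & (1 << KEY_FLAG_REVOKED))


if __name__ == "__main__":
    dead_only = 1 << KEY_FLAG_DEAD
    revoked_only = 1 << KEY_FLAG_REVOKED
    print(buggy_is_revoked(dead_only), fixed_is_revoked(dead_only))
    print(buggy_is_revoked(revoked_only), fixed_is_revoked(revoked_only))
```

With these values the buggy test reports a dead key as revoked and misses a key that really is revoked.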
-
- 02 Mar, 2012 1 commit
-
-
Committed by Bryan Schumaker
The keyctl_set_timeout function isn't exported to other parts of the kernel, but I want to use it for the NFS idmapper. I already have the key, but I wanted a generic way to set the timeout.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 19 Jan, 2012 2 commits
-
-
Committed by Mimi Zohar
Replace the rcu_assign_pointer() calls with rcu_assign_keypointer().

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
-
Committed by David Howells
The kernel contains some special internal keyrings, for instance the DNS resolver keyring:

    2a93faf1 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: empty

It would occasionally be useful to allow the contents of such keyrings to be flushed by root (cache invalidation).

Allow a flag to be set on a keyring to mark that someone possessing the sysadmin capability can clear the keyring, even without normal write access to the keyring.

Set this flag on the special keyrings created by the DNS resolver, the NFS identity mapper and the CIFS identity mapper.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
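The resulting access rule for clearing a keyring reduces to a small boolean expression. The following Python sketch only restates the rule as described in the message; the function and parameter names are hypothetical and do not correspond to kernel identifiers.

```python
def may_clear_keyring(has_write, root_can_clear, has_sysadmin):
    """Decide whether a caller may clear a keyring.

    has_write:      caller has normal write access to the keyring
    root_can_clear: the keyring carries the new "root may clear" flag
    has_sysadmin:   caller possesses the sysadmin capability
    All three names are illustrative, not kernel identifiers.
    """
    return has_write or (root_can_clear and has_sysadmin)


if __name__ == "__main__":
    # A sysadmin without write access can clear only flagged keyrings.
    print(may_clear_keyring(False, True, True))
    print(may_clear_keyring(False, False, True))
```

The flag is opt-in per keyring, so the capability alone does not grant the right to clear arbitrary keyrings.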
-
- 18 Jan, 2012 4 commits
-
-
Committed by Jeff Layton
For CIFS, we want to be able to store NTLM credentials (aka username and password) in the keyring. We do not, however, want to allow users to fetch those keys back out of the keyring since that would be a security risk.

Unfortunately, due to the nuances of key permission bits, it's not possible to do this. We need to grant search permissions so the kernel can find these keys, but that also implies permissions to read the payload.

Resolve this by adding a new key_type. This key type is essentially the same as key_type_user, but does not define a .read op. This prevents the payload from ever being visible from userspace. This key type also vets the description to ensure that it's "qualified" by checking that it has a ':' in it that is preceded by other characters.

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
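The "qualified" description rule is easy to state as a predicate. Here is a minimal Python sketch of the check as the message words it (a ':' preceded by at least one other character); `is_qualified` is an invented name and the sample descriptions are made up.

```python
def is_qualified(description):
    """Return True if the description contains a ':' that is preceded by
    at least one other character, e.g. "service:name"."""
    # str.find returns -1 when there is no ':' and 0 when ':' comes first,
    # so only a strictly positive index counts as qualified.
    return description.find(":") > 0


if __name__ == "__main__":
    for desc in ["cifs:user", ":user", "nocolon", ""]:
        print(repr(desc), is_qualified(desc))
```

Only the first ':' matters in this sketch, so a description such as "a:b:c" is accepted while ":a:b" is rejected.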
-
Committed by Mimi Zohar
Enabling CONFIG_PROVE_RCU and CONFIG_SPARSE_RCU_POINTER resulted in "suspicious rcu_dereference_check() usage!" and "incompatible types in comparison expression (different address spaces)" messages.

Access the masterkey directly when holding the rwsem.

Changelog v1:
- Use either rcu_read_lock()/rcu_dereference_key()/rcu_read_unlock() or remove the unnecessary rcu_dereference() (David Howells)

Reported-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
-
Committed by Mimi Zohar
Define rcu_assign_keypointer(), which uses the key payload.rcudata instead of payload.data, to resolve the CONFIG_SPARSE_RCU_POINTER message: "incompatible types in comparison expression (different address spaces)"

Replace the rcu_assign_pointer() calls in encrypted/trusted keys with rcu_assign_keypointer().

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
-
Committed by David Howells
Add missing smp_rmb() primitives to the keyring search code.

When keyring payloads are appended to without replacement (thus using up spare slots in the key pointer array), an smp_wmb() is issued between the pointer assignment and the increment of the key count (nkeys).

There should be corresponding read barriers between the read of nkeys and dereferences of keys[n] when n is dependent on the value of nkeys.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
-
- 17 Nov, 2011 3 commits
-
-
Committed by David Howells
Give keys their own lockdep class to differentiate them from each other in case a key of one type has to refer to a key of another type.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
-
Committed by Mimi Zohar
Encrypted keys are encrypted/decrypted using either a trusted or user-defined key type, which is referred to as the 'master' key. The master key may be of type trusted iff the trusted key is builtin or both the trusted key and encrypted keys are built as modules. This patch resolves the build dependency problem.

- Use the "masterkey-$(CONFIG_TRUSTED_KEYS)-$(CONFIG_ENCRYPTED_KEYS)" construct to encapsulate the above logic. (Suggested by Dmitry Kasatkin.)
- Fixing the encrypted-keys Makefile results in a module name change from encrypted.ko to encrypted-keys.ko.
- Add a module dependency for the request_trusted_key() definition.

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
-
Committed by Mimi Zohar
Fix request_master_key() error return code.

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
-
- 16 Nov, 2011 1 commit
-
-
Committed by David Howells
Fix a NULL pointer deref in the user-defined key type whereby updating a negative key into a fully instantiated key will cause an oops to occur when the code attempts to free the non-existent old payload.

This results in an oops that looks something like the following:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: [<ffffffff81085fa1>] __call_rcu+0x11/0x13e
    PGD 3391d067 PUD 3894a067 PMD 0
    Oops: 0002 [#1] SMP
    CPU 1
    Pid: 4354, comm: keyctl Not tainted 3.1.0-fsdevel+ #1140 /DG965RY
    RIP: 0010:[<ffffffff81085fa1>] [<ffffffff81085fa1>] __call_rcu+0x11/0x13e
    RSP: 0018:ffff88003d591df8 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000006e
    RDX: ffffffff8161d0c0 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: ffff88003d591e18 R08: 0000000000000000 R09: ffffffff8152fa6c
    R10: 0000000000000000 R11: 0000000000000300 R12: ffff88003b8f9538
    R13: ffffffff8161d0c0 R14: ffff88003b8f9d50 R15: ffff88003c69f908
    FS: 00007f97eb18c720(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000008 CR3: 000000003d47a000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process keyctl (pid: 4354, threadinfo ffff88003d590000, task ffff88003c78a040)
    Stack:
     ffff88003e0ffde0 ffff88003b8f9538 0000000000000001 ffff88003b8f9d50
     ffff88003d591e28 ffffffff810860f0 ffff88003d591e68 ffffffff8117bfea
     ffff88003d591e68 ffffffff00000000 ffff88003e0ffde1 ffff88003e0ffde0
    Call Trace:
     [<ffffffff810860f0>] call_rcu_sched+0x10/0x12
     [<ffffffff8117bfea>] user_update+0x8d/0xa2
     [<ffffffff8117723a>] key_create_or_update+0x236/0x270
     [<ffffffff811789b1>] sys_add_key+0x123/0x17e
     [<ffffffff813b84bb>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Horman <nhorman@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 01 Nov, 2011 2 commits
-
-
Committed by Andy Shevchenko
There is no functional change.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Mimi Zohar <zohar@us.ibm.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christopher Yeoh
The basic idea behind cross memory attach is to allow MPI programs doing intra-node communication to do a single copy of the message rather than a double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination process, given an address and size from a source process, to copy memory directly from the source process into its own address space via a system call. There is also a symmetrical ability to copy from the current process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with using it:
  - Does not allow for specifying iovecs for both src and dest; assuming preadv or pwritev was implemented, either the area read from or written to would need to be contiguous.
  - Currently mem_read allows only processes that are currently ptrace'ing the target and are still able to ptrace the target to read from the target. This check could possibly be moved to the open call, but it's not clear exactly what race this restriction is stopping (the reason appears to have been lost).
  - Having to send the fd of /proc/self/mem via SCM_RIGHTS on a unix domain socket is a bit ugly from a userspace point of view, especially when you may have hundreds if not (eventually) thousands of processes that all need to do this with each other.
  - Doesn't allow for some future uses of the interface that we would like to consider adding (see below).
  - Interestingly, reading from /proc/pid/mem currently actually involves two copies! (But this could be fixed pretty easily.)

As mentioned previously, use of vmsplice instead was considered, but it has problems. Since you need the reader and writer working co-operatively, if the pipe is not drained then you block. This requires some wrapping to do non-blocking on the send side or polling on the receive side. In all-to-all communication it requires ordering, otherwise you can deadlock. And in the example of many MPI tasks writing to one MPI task, vmsplice serialises the copying.

There are some cases of MPI collectives where even a single-copy interface does not get us the performance gain we could achieve. For example, in an MPI_Reduce, rather than copy the data from the source we would like to instead use it directly in a mathops (say the reduce is doing a sum), as this would save us doing a copy. We don't need to keep a copy of the data from the source. I haven't implemented this, but I think this interface could in the future do all this through the use of the flags - eg could specify the math operation and type, and the kernel, rather than just copying the data, would apply the specified operation between the source and destination and store it in the destination.

We don't have a "second user" of the interface (though I've had some nibbles from people who may be interested in using it for intra-process messaging which is not MPI). This interface is something which hardware vendors are already doing for their custom drivers to implement fast local communication. And so in addition to this being useful for OpenMPI, it would mean the driver maintainers don't have to fix things up when the mm changes.

There was some discussion about how much faster a true zero copy would go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc; other architectures should mainly (I think) just need to add syscall numbers for process_vm_readv and process_vm_writev. There are 32-bit compatibility versions for 64-bit kernels.

For arch maintainers, there are some simple tests to be able to quickly verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-