1. 02 3月, 2017 1 次提交
  2. 25 12月, 2016 1 次提交
  3. 06 12月, 2016 1 次提交
    • A
      [iov_iter] new primitives - copy_from_iter_full() and friends · cbbd26b8
      Al Viro 提交于
      copy_from_iter_full(), copy_from_iter_full_nocache() and
      csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
      et.al., advancing iterator only in case of successful full copy
      and returning whether it had been successful or not.
      
      Convert some obvious users.  *NOTE* - do not blindly assume that
      something is a good candidate for those unless you are sure that
      not advancing iov_iter in failure case is the right thing in
      this case.  Anything that does short read/short write kind of
      stuff (or is in a loop, etc.) is unlikely to be a good one.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cbbd26b8
  4. 03 6月, 2016 1 次提交
  5. 13 4月, 2016 1 次提交
    • M
      KEYS: Add KEYCTL_DH_COMPUTE command · ddbb4114
      Mat Martineau 提交于
      This adds userspace access to Diffie-Hellman computations through a
      new keyctl() syscall command to calculate shared secrets or public
      keys using input parameters stored in the keyring.
      
      Input key ids are provided in a struct due to the current 5-arg limit
      for the keyctl syscall. Only user keys are supported in order to avoid
      exposing the content of logon or encrypted keys.
      
      The output is written to the provided buffer, based on the assumption
      that the values are only needed in userspace.
      
      Future support for other types of key derivation would involve a new
      command, like KEYCTL_ECDH_COMPUTE.
      
      Once Diffie-Hellman support is included in the crypto API, this code
      can be converted to use the crypto API to take advantage of possible
      hardware acceleration and reduce redundant code.
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      ddbb4114
  6. 08 1月, 2016 1 次提交
  7. 19 12月, 2015 1 次提交
    • D
      KEYS: Fix race between read and revoke · b4a1b4f5
      David Howells 提交于
      This fixes CVE-2015-7550.
      
      There's a race between keyctl_read() and keyctl_revoke().  If the revoke
      happens between keyctl_read() checking the validity of a key and the key's
      semaphore being taken, then the key type read method will see a revoked key.
      
      This causes a problem for the user-defined key type because it assumes in
      its read method that there will always be a payload in a non-revoked key
      and doesn't check for a NULL pointer.
      
      Fix this by making keyctl_read() check the validity of a key after taking
      semaphore instead of before.
      
      I think the bug was introduced with the original keyrings code.
      
      This was discovered by a multithreaded test program generated by syzkaller
      (http://github.com/google/syzkaller).  Here's a cleaned up version:
      
      	#include <sys/types.h>
      	#include <keyutils.h>
      	#include <pthread.h>
      	void *thr0(void *arg)
      	{
      		key_serial_t key = (unsigned long)arg;
      		keyctl_revoke(key);
      		return 0;
      	}
      	void *thr1(void *arg)
      	{
      		key_serial_t key = (unsigned long)arg;
      		char buffer[16];
      		keyctl_read(key, buffer, 16);
      		return 0;
      	}
      	int main()
      	{
      		key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING);
      		pthread_t th[5];
      		pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key);
      		pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key);
      		pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key);
      		pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key);
      		pthread_join(th[0], 0);
      		pthread_join(th[1], 0);
      		pthread_join(th[2], 0);
      		pthread_join(th[3], 0);
      		return 0;
      	}
      
      Build as:
      
      	cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread
      
      Run as:
      
      	while keyctl-race; do :; done
      
      as it may need several iterations to crash the kernel.  The crash can be
      summarised as:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      	IP: [<ffffffff81279b08>] user_read+0x56/0xa3
      	...
      	Call Trace:
      	 [<ffffffff81276aa9>] keyctl_read_key+0xb6/0xd7
      	 [<ffffffff81277815>] SyS_keyctl+0x83/0xe0
      	 [<ffffffff815dbb97>] entry_SYSCALL_64_fastpath+0x12/0x6f
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      b4a1b4f5
  8. 15 12月, 2015 1 次提交
    • M
      KEYS: prevent keys from being removed from specified keyrings · d3600bcf
      Mimi Zohar 提交于
      Userspace should not be allowed to remove keys from certain keyrings
      (eg. blacklist), though the keys themselves can expire.
      
      This patch defines a new key flag named KEY_FLAG_KEEP to prevent
      userspace from being able to unlink, revoke, invalidate or timed
      out a key on a keyring.  When this flag is set on the keyring, all
      keys subsequently added are flagged.
      
      In addition, when this flag is set, the keyring itself can not be
      cleared.
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      d3600bcf
  9. 21 10月, 2015 2 次提交
  10. 12 4月, 2015 1 次提交
  11. 02 12月, 2014 1 次提交
    • D
      KEYS: Fix the size of the key description passed to/from userspace · aa9d4437
      David Howells 提交于
      When a key description argument is imported into the kernel from userspace, as
      happens in add_key(), request_key(), KEYCTL_JOIN_SESSION_KEYRING,
      KEYCTL_SEARCH, the description is copied into a buffer up to PAGE_SIZE in size.
      PAGE_SIZE, however, is a variable quantity, depending on the arch.  Fix this at
      4096 instead (ie. 4095 plus a NUL termination) and define a constant
      (KEY_MAX_DESC_SIZE) to this end.
      
      When reading the description back with KEYCTL_DESCRIBE, a PAGE_SIZE internal
      buffer is allocated into which the information and description will be
      rendered.  This means that the description will get truncated if an extremely
      long description it has to be crammed into the buffer with the stringified
      information.  There is no particular need to copy the description into the
      buffer, so just copy it directly to userspace in a separate operation.
      Reported-by: NChristian Kastner <debian@kvr.at>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NChristian Kastner <debian@kvr.at>
      aa9d4437
  12. 17 9月, 2014 1 次提交
    • D
      KEYS: Reinstate EPERM for a key type name beginning with a '.' · 54e2c2c1
      David Howells 提交于
      Reinstate the generation of EPERM for a key type name beginning with a '.' in
      a userspace call.  Types whose name begins with a '.' are internal only.
      
      The test was removed by:
      
      	commit a4e3b8d7
      	Author: Mimi Zohar <zohar@linux.vnet.ibm.com>
      	Date:   Thu May 22 14:02:23 2014 -0400
      	Subject: KEYS: special dot prefixed keyring name bug fix
      
      I think we want to keep the restriction on type name so that userspace can't
      add keys of a special internal type.
      
      Note that removal of the test causes several of the tests in the keyutils
      testsuite to fail.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      54e2c2c1
  13. 18 7月, 2014 1 次提交
    • D
      KEYS: Allow special keys (eg. DNS results) to be invalidated by CAP_SYS_ADMIN · 0c7774ab
      David Howells 提交于
      Special kernel keys, such as those used to hold DNS results for AFS, CIFS and
      NFS and those used to hold idmapper results for NFS, used to be
      'invalidateable' with key_revoke().  However, since the default permissions for
      keys were reduced:
      
      	Commit: 96b5c8fe
      	KEYS: Reduce initial permissions on keys
      
      it has become impossible to do this.
      
      Add a key flag (KEY_FLAG_ROOT_CAN_INVAL) that will permit a key to be
      invalidated by root.  This should not be used for system keyrings as the
      garbage collector will try and remove any invalidate key.  For system keyrings,
      KEY_FLAG_ROOT_CAN_CLEAR can be used instead.
      
      After this, from userspace, keyctl_invalidate() and "keyctl invalidate" can be
      used by any possessor of CAP_SYS_ADMIN (typically root) to invalidate DNS and
      idmapper keys.  Invalidated keys are immediately garbage collected and will be
      immediately rerequested if needed again.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NSteve Dickson <steved@redhat.com>
      0c7774ab
  14. 17 7月, 2014 1 次提交
  15. 15 3月, 2014 1 次提交
  16. 24 9月, 2013 1 次提交
    • D
      KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches · f36f8c75
      David Howells 提交于
      Add support for per-user_namespace registers of persistent per-UID kerberos
      caches held within the kernel.
      
      This allows the kerberos cache to be retained beyond the life of all a user's
      processes so that the user's cron jobs can work.
      
      The kerberos cache is envisioned as a keyring/key tree looking something like:
      
      	struct user_namespace
      	  \___ .krb_cache keyring		- The register
      		\___ _krb.0 keyring		- Root's Kerberos cache
      		\___ _krb.5000 keyring		- User 5000's Kerberos cache
      		\___ _krb.5001 keyring		- User 5001's Kerberos cache
      			\___ tkt785 big_key	- A ccache blob
      			\___ tkt12345 big_key	- Another ccache blob
      
      Or possibly:
      
      	struct user_namespace
      	  \___ .krb_cache keyring		- The register
      		\___ _krb.0 keyring		- Root's Kerberos cache
      		\___ _krb.5000 keyring		- User 5000's Kerberos cache
      		\___ _krb.5001 keyring		- User 5001's Kerberos cache
      			\___ tkt785 keyring	- A ccache
      				\___ krbtgt/REDHAT.COM@REDHAT.COM big_key
      				\___ http/REDHAT.COM@REDHAT.COM user
      				\___ afs/REDHAT.COM@REDHAT.COM user
      				\___ nfs/REDHAT.COM@REDHAT.COM user
      				\___ krbtgt/KERNEL.ORG@KERNEL.ORG big_key
      				\___ http/KERNEL.ORG@KERNEL.ORG big_key
      
      What goes into a particular Kerberos cache is entirely up to userspace.  Kernel
      support is limited to giving you the Kerberos cache keyring that you want.
      
      The user asks for their Kerberos cache by:
      
      	krb_cache = keyctl_get_krbcache(uid, dest_keyring);
      
      The uid is -1 or the user's own UID for the user's own cache or the uid of some
      other user's cache (requires CAP_SETUID).  This permits rpc.gssd or whatever to
      mess with the cache.
      
      The cache returned is a keyring named "_krb.<uid>" that the possessor can read,
      search, clear, invalidate, unlink from and add links to.  Active LSMs get a
      chance to rule on whether the caller is permitted to make a link.
      
      Each uid's cache keyring is created when it first accessed and is given a
      timeout that is extended each time this function is called so that the keyring
      goes away after a while.  The timeout is configurable by sysctl but defaults to
      three days.
      
      Each user_namespace struct gets a lazily-created keyring that serves as the
      register.  The cache keyrings are added to it.  This means that standard key
      search and garbage collection facilities are available.
      
      The user_namespace struct's register goes away when it does and anything left
      in it is then automatically gc'd.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NSimo Sorce <simo@redhat.com>
      cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      cc: Eric W. Biederman <ebiederm@xmission.com>
      f36f8c75
  17. 08 5月, 2013 1 次提交
  18. 08 10月, 2012 1 次提交
    • D
      KEYS: Add payload preparsing opportunity prior to key instantiate or update · cf7f601c
      David Howells 提交于
      Give the key type the opportunity to preparse the payload prior to the
      instantiation and update routines being called.  This is done with the
      provision of two new key type operations:
      
      	int (*preparse)(struct key_preparsed_payload *prep);
      	void (*free_preparse)(struct key_preparsed_payload *prep);
      
      If the first operation is present, then it is called before key creation (in
      the add/update case) or before the key semaphore is taken (in the update and
      instantiate cases).  The second operation is called to clean up if the first
      was called.
      
      preparse() is given the opportunity to fill in the following structure:
      
      	struct key_preparsed_payload {
      		char		*description;
      		void		*type_data[2];
      		void		*payload;
      		const void	*data;
      		size_t		datalen;
      		size_t		quotalen;
      	};
      
      Before the preparser is called, the first three fields will have been cleared,
      the payload pointer and size will be stored in data and datalen and the default
      quota size from the key_type struct will be stored into quotalen.
      
      The preparser may parse the payload in any way it likes and may store data in
      the type_data[] and payload fields for use by the instantiate() and update()
      ops.
      
      The preparser may also propose a description for the key by attaching it as a
      string to the description field.  This can be used by passing a NULL or ""
      description to the add_key() system call or the key_create_or_update()
      function.  This cannot work with request_key() as that required the description
      to tell the upcall about the key to be created.
      
      This, for example permits keys that store PGP public keys to generate their own
      name from the user ID and public key fingerprint in the key.
      
      The instantiate() and update() operations are then modified to look like this:
      
      	int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
      	int (*update)(struct key *key, struct key_preparsed_payload *prep);
      
      and the new payload data is passed in *prep, whether or not it was preparsed.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      cf7f601c
  19. 03 10月, 2012 1 次提交
    • D
      KEYS: Make the session and process keyrings per-thread · 3a50597d
      David Howells 提交于
      Make the session keyring per-thread rather than per-process, but still
      inherited from the parent thread to solve a problem with PAM and gdm.
      
      The problem is that join_session_keyring() will reject attempts to change the
      session keyring of a multithreaded program but gdm is now multithreaded before
      it gets to the point of starting PAM and running pam_keyinit to create the
      session keyring.  See:
      
      	https://bugs.freedesktop.org/show_bug.cgi?id=49211
      
      The reason that join_session_keyring() will only change the session keyring
      under a single-threaded environment is that it's hard to alter the other
      thread's credentials to effect the change in a multi-threaded program.  The
      problems are such as:
      
       (1) How to prevent two threads both running join_session_keyring() from
           racing.
      
       (2) Another thread's credentials may not be modified directly by this process.
      
       (3) The number of threads is uncertain whilst we're not holding the
           appropriate spinlock, making preallocation slightly tricky.
      
       (4) We could use TIF_NOTIFY_RESUME and key_replace_session_keyring() to get
           another thread to replace its keyring, but that means preallocating for
           each thread.
      
      A reasonable way around this is to make the session keyring per-thread rather
      than per-process and just document that if you want a common session keyring,
      you must get it before you spawn any threads - which is the current situation
      anyway.
      
      Whilst we're at it, we can the process keyring behave in the same way.  This
      means we can clean up some of the ickyness in the creds code.
      
      Basically, after this patch, the session, process and thread keyrings are about
      inheritance rules only and not about sharing changes of keyring.
      Reported-by: NMantas M. <grawity@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NRay Strode <rstrode@redhat.com>
      3a50597d
  20. 28 9月, 2012 1 次提交
  21. 14 9月, 2012 1 次提交
    • E
      userns: Convert security/keys to the new userns infrastructure · 9a56c2db
      Eric W. Biederman 提交于
      - Replace key_user ->user_ns equality checks with kuid_has_mapping checks.
      - Use from_kuid to generate key descriptions
      - Use kuid_t and kgid_t and the associated helpers instead of uid_t and gid_t
      - Avoid potential problems with file descriptor passing by displaying
        keys in the user namespace of the opener of key status proc files.
      
      Cc: linux-security-module@vger.kernel.org
      Cc: keyrings@linux-nfs.org
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      9a56c2db
  22. 13 9月, 2012 2 次提交
    • O
      task_work: Revert "hold task_lock around checks in keyctl" · b3f68f16
      Oleg Nesterov 提交于
      This reverts commit d35abdb2.
      
      task_lock() was added to ensure exit_mm() and thus exit_task_work() is
      not possible before task_work_add().
      
      This is wrong, task_lock() must not be nested with write_lock(tasklist).
      And this is no longer needed, task_work_add() now fails if it is called
      after exit_task_work().
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20120826191214.GA4231@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b3f68f16
    • D
      KEYS: Add payload preparsing opportunity prior to key instantiate or update · d4f65b5d
      David Howells 提交于
      Give the key type the opportunity to preparse the payload prior to the
      instantiation and update routines being called.  This is done with the
      provision of two new key type operations:
      
      	int (*preparse)(struct key_preparsed_payload *prep);
      	void (*free_preparse)(struct key_preparsed_payload *prep);
      
      If the first operation is present, then it is called before key creation (in
      the add/update case) or before the key semaphore is taken (in the update and
      instantiate cases).  The second operation is called to clean up if the first
      was called.
      
      preparse() is given the opportunity to fill in the following structure:
      
      	struct key_preparsed_payload {
      		char		*description;
      		void		*type_data[2];
      		void		*payload;
      		const void	*data;
      		size_t		datalen;
      		size_t		quotalen;
      	};
      
      Before the preparser is called, the first three fields will have been cleared,
      the payload pointer and size will be stored in data and datalen and the default
      quota size from the key_type struct will be stored into quotalen.
      
      The preparser may parse the payload in any way it likes and may store data in
      the type_data[] and payload fields for use by the instantiate() and update()
      ops.
      
      The preparser may also propose a description for the key by attaching it as a
      string to the description field.  This can be used by passing a NULL or ""
      description to the add_key() system call or the key_create_or_update()
      function.  This cannot work with request_key() as that required the description
      to tell the upcall about the key to be created.
      
      This, for example permits keys that store PGP public keys to generate their own
      name from the user ID and public key fingerprint in the key.
      
      The instantiate() and update() operations are then modified to look like this:
      
      	int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
      	int (*update)(struct key *key, struct key_preparsed_payload *prep);
      
      and the new payload data is passed in *prep, whether or not it was preparsed.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      d4f65b5d
  23. 23 7月, 2012 3 次提交
  24. 01 6月, 2012 2 次提交
  25. 25 5月, 2012 1 次提交
    • D
      KEYS: Fix some sparse warnings · 423b9788
      David Howells 提交于
      Fix some sparse warnings in the keyrings code:
      
       (1) compat_keyctl_instantiate_key_iov() should be static.
      
       (2) There were a couple of places where a pointer was being compared against
           integer 0 rather than NULL.
      
       (3) keyctl_instantiate_key_common() should not take a __user-labelled iovec
           pointer as the caller must have copied the iovec to kernel space.
      
       (4) __key_link_begin() takes and __key_link_end() releases
           keyring_serialise_link_sem under some circumstances and so this should be
           declared.
      
           Note that adding __acquires() and __releases() for this doesn't help cure
           the warnings messages - something only commenting out both helps.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      423b9788
  26. 24 5月, 2012 2 次提交
    • O
      keys: change keyctl_session_to_parent() to use task_work_add() · 413cd3d9
      Oleg Nesterov 提交于
      Change keyctl_session_to_parent() to use task_work_add() and move
      key_replace_session_keyring() logic into task_work->func().
      
      Note that we do task_work_cancel() before task_work_add() to ensure that
      only one work can be pending at any time.  This is important, we must not
      allow user-space to abuse the parent's ->task_works list.
      
      The callback, replace_session_keyring(), checks PF_EXITING.  I guess this
      is not really needed but looks better.
      
      As a side effect, this fixes the (unlikely) race.  The callers of
      key_replace_session_keyring() and keyctl_session_to_parent() lack the
      necessary barriers, the parent can miss the request.
      
      Now we can remove task_struct->replacement_session_keyring and related
      code.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Smith <dsmith@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      413cd3d9
    • A
      TIF_NOTIFY_RESUME is defined on all targets now · 1227dd77
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1227dd77
  27. 11 5月, 2012 1 次提交
    • D
      KEYS: Add invalidation support · fd75815f
      David Howells 提交于
      Add support for invalidating a key - which renders it immediately invisible to
      further searches and causes the garbage collector to immediately wake up,
      remove it from keyrings and then destroy it when it's no longer referenced.
      
      It's better not to do this with keyctl_revoke() as that marks the key to start
      returning -EKEYREVOKED to searches when what is actually desired is to have the
      key refetched.
      
      To invalidate a key the caller must be granted SEARCH permission by the key.
      This may be too strict.  It may be better to also permit invalidation if the
      caller has any of READ, WRITE or SETATTR permission.
      
      The primary use for this is to evict keys that are cached in special keyrings,
      such as the DNS resolver or an ID mapper.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      fd75815f
  28. 02 3月, 2012 1 次提交
  29. 19 1月, 2012 1 次提交
  30. 01 11月, 2011 1 次提交
    • C
      Cross Memory Attach · fcf63409
      Christopher Yeoh 提交于
      The basic idea behind cross memory attach is to allow MPI programs doing
      intra-node communication to do a single copy of the message rather than a
      double copy of the message via shared memory.
      
      The following patch attempts to achieve this by allowing a destination
      process, given an address and size from a source process, to copy memory
      directly from the source process into its own address space via a system
      call.  There is also a symmetrical ability to copy from the current
      process's address space into a destination process's address space.
      
      - Use of /proc/pid/mem has been considered, but there are issues with
        using it:
        - Does not allow for specifying iovecs for both src and dest, assuming
          preadv or pwritev was implemented either the area read from or
        written to would need to be contiguous.
        - Currently mem_read allows only processes who are currently
        ptrace'ing the target and are still able to ptrace the target to read
        from the target. This check could possibly be moved to the open call,
        but its not clear exactly what race this restriction is stopping
        (reason  appears to have been lost)
        - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
        domain socket is a bit ugly from a userspace point of view,
        especially when you may have hundreds if not (eventually) thousands
        of processes  that all need to do this with each other
        - Doesn't allow for some future use of the interface we would like to
        consider adding in the future (see below)
        - Interestingly reading from /proc/pid/mem currently actually
        involves two copies! (But this could be fixed pretty easily)
      
      As mentioned previously use of vmsplice instead was considered, but has
      problems.  Since you need the reader and writer working co-operatively if
      the pipe is not drained then you block.  Which requires some wrapping to
      do non blocking on the send side or polling on the receive.  In all to all
      communication it requires ordering otherwise you can deadlock.  And in the
      example of many MPI tasks writing to one MPI task vmsplice serialises the
      copying.
      
      There are some cases of MPI collectives where even a single copy interface
      does not get us the performance gain we could.  For example in an
      MPI_Reduce rather than copy the data from the source we would like to
      instead use it directly in a mathops (say the reduce is doing a sum) as
      this would save us doing a copy.  We don't need to keep a copy of the data
      from the source.  I haven't implemented this, but I think this interface
      could in the future do all this through the use of the flags - eg could
      specify the math operation and type and the kernel rather than just
      copying the data would apply the specified operation between the source
      and destination and store it in the destination.
      
      Although we don't have a "second user" of the interface (though I've had
      some nibbles from people who may be interested in using it for intra
      process messaging which is not MPI).  This interface is something which
      hardware vendors are already doing for their custom drivers to implement
      fast local communication.  And so in addition to this being useful for
      OpenMPI it would mean the driver maintainers don't have to fix things up
      when the mm changes.
      
      There was some discussion about how much faster a true zero copy would
      go. Here's a link back to the email with some testing I did on that:
      
      http://marc.info/?l=linux-mm&m=130105930902915&w=2
      
      There is a basic man page for the proposed interface here:
      
      http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt
      
      This has been implemented for x86 and powerpc, other architecture should
      mainly (I think) just need to add syscall numbers for the process_vm_readv
      and process_vm_writev. There are 32 bit compatibility versions for
      64-bit kernels.
      
      For arch maintainers there are some simple tests to be able to quickly
      verify that the syscalls are working correctly here:
      
      http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgzSigned-off-by: NChris Yeoh <yeohc@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: <linux-man@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fcf63409
  31. 17 3月, 2011 1 次提交
    • D
      KEYS: Make request_key() and co. return an error for a negative key · 4aab1e89
      David Howells 提交于
      Make request_key() and co. return an error for a negative or rejected key.  If
      the key was simply negated, then return ENOKEY, otherwise return the error
      with which it was rejected.
      
      Without this patch, the following command returns a key number (with the latest
      keyutils):
      
      	[root@andromeda ~]# keyctl request2 user debug:foo rejected @s
      	586569904
      
      Trying to print the key merely gets you a permission denied error:
      
      	[root@andromeda ~]# keyctl print 586569904
      	keyctl_read_alloc: Permission denied
      
      Doing another request_key() call does get you the error, as long as it hasn't
      expired yet:
      
      	[root@andromeda ~]# keyctl request user debug:foo
      	request_key: Key was rejected by service
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      4aab1e89
  32. 08 3月, 2011 2 次提交
  33. 22 1月, 2011 1 次提交