1. 29 Jan, 2008 10 commits
  2. 29 Nov, 2007 1 commit
    • [UNIX]: EOF on non-blocking SOCK_SEQPACKET · 0a112258
      Authored by Florian Zumbiehl
      I am not absolutely sure whether this actually is a bug (as in: I've got
      no clue what the standards say or what other implementations do), but at
      least I was pretty surprised when I noticed that a recv() on a
      non-blocking unix domain socket of type SOCK_SEQPACKET (which is connection
      oriented, after all) where the remote end has closed the connection
      returned -1 (EAGAIN) rather than 0 to indicate end of file.
      
      This is a test case:
      
      | #include <sys/types.h>
      | #include <unistd.h>
      | #include <sys/socket.h>
      | #include <sys/un.h>
      | #include <fcntl.h>
      | #include <string.h>
      | #include <stdlib.h>
      | 
      | int main(){
      | 	int sock;
      | 	struct sockaddr_un addr;
      | 	char buf[4096];
      | 	int pfds[2];
      | 
      | 	pipe(pfds);
      | 	sock=socket(PF_UNIX,SOCK_SEQPACKET,0);
      | 	addr.sun_family=AF_UNIX;
      | 	strcpy(addr.sun_path,"/tmp/foobar_testsock");
      | 	bind(sock,(struct sockaddr *)&addr,sizeof(addr));
      | 	listen(sock,1);
      | 	if(fork()){
      | 		close(sock);
      | 		sock=socket(PF_UNIX,SOCK_SEQPACKET,0);
      | 		connect(sock,(struct sockaddr *)&addr,sizeof(addr));
      | 		fcntl(sock,F_SETFL,fcntl(sock,F_GETFL)|O_NONBLOCK);
      | 		close(pfds[1]);
      | 		read(pfds[0],buf,sizeof(buf));
      | 		recv(sock,buf,sizeof(buf),0); // <-- this one
      | 	}else accept(sock,NULL,NULL);
      | 	exit(0);
      | }
      
      If you try it, make sure /tmp/foobar_testsock doesn't exist.
      
      The marked recv() returns -1 (EAGAIN) on 2.6.23.9. Below you find a
      patch that fixes that.
      Signed-off-by: Florian Zumbiehl <florz@florz.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
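The distinction the report turns on (0 for end of file versus -1 with EAGAIN for "no data yet") can be sketched from the caller's side. This is a minimal illustration, not part of the patch: `classify_recv` and `enum recv_status` are hypothetical names, and the sketch assumes the protocol never sends zero-length records, since an empty SOCK_SEQPACKET record would also make recv() return 0.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: classify the outcome of one non-blocking recv(). */
enum recv_status { RECV_DATA, RECV_EOF, RECV_WOULD_BLOCK, RECV_ERROR };

static enum recv_status classify_recv(int fd, void *buf, size_t len,
				      ssize_t *n_out)
{
	/* MSG_DONTWAIT makes this single call non-blocking. */
	ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);

	if (n_out)
		*n_out = n;
	if (n > 0)
		return RECV_DATA;
	if (n == 0)
		return RECV_EOF;          /* peer closed or shut down */
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return RECV_WOULD_BLOCK;  /* nothing queued, peer still there */
	return RECV_ERROR;
}
```

On a kernel with the behaviour this commit aims for, a caller looping on `classify_recv()` would see `RECV_EOF` once the peer is gone; on an affected kernel such as the 2.6.23.9 mentioned above, it would presumably keep seeing `RECV_WOULD_BLOCK`.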
  3. 11 Nov, 2007 3 commits
  4. 01 Nov, 2007 1 commit
  5. 20 Oct, 2007 1 commit
    • pid namespaces: changes to show virtual ids to user · b488893a
      Authored by Pavel Emelyanov
      This is the largest patch in the set. Make all (I hope) the places where
      a pid is shown to or obtained from the user operate on virtual pids.
      
      The idea is:
       - all in-kernel data structures must store either struct pid itself
         or the pid's global nr, obtained with pid_nr() call;
       - when seeking the task from kernel code with the stored id one
         should use find_task_by_pid() call that works with global pids;
       - when showing a pid's numerical value to the user, the virtual one
         should be used; however, when a task's pid is shown outside that
         task's namespace, the global one is to be used;
       - when getting a pid from userspace, one needs to treat it as
         the virtual one and use the appropriate task/pid-searching functions.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: nuther build fix]
      [akpm@linux-foundation.org: yet nuther build fix]
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
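The global-versus-virtual rule listed in this commit can be illustrated with a toy userspace model. Everything here is an assumption made for illustration: `struct toy_pid`, `toy_pid_nr()` and `toy_pid_vnr()` are hypothetical, namespaces are reduced to a single chain identified by nesting level, and none of it is the kernel's actual data layout.

```c
#include <stddef.h>

#define TOY_MAX_LEVEL 4

/* Toy stand-in for a pid: one number per namespace nesting level. */
struct toy_pid {
	int level;                  /* deepest level this pid exists in */
	int nr[TOY_MAX_LEVEL + 1];  /* nr[0] is the global id */
};

/* Global id: what in-kernel data structures would store. */
static int toy_pid_nr(const struct toy_pid *pid)
{
	return pid ? pid->nr[0] : 0;
}

/*
 * Virtual id as seen from a namespace at ns_level.
 * Returns 0 when the pid is not visible from that namespace.
 */
static int toy_pid_vnr(const struct toy_pid *pid, int ns_level)
{
	if (!pid || ns_level < 0 || ns_level > pid->level)
		return 0;
	return pid->nr[ns_level];
}
```

In this model a task that is pid 1 inside a nested namespace and 4321 globally would be stored as 4321, shown as 1 to observers at level 1, and shown as 4321 to observers in the initial namespace.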
  6. 15 Oct, 2007 1 commit
    • sched: affine sync wakeups · 71e20f18
      Authored by Ingo Molnar
      make sync wakeups affine for cache-cold tasks: if a cache-cold task
      is woken up by a sync wakeup then use the opportunity to migrate it
      straight away. (the two tasks are 'related' because they communicate)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 11 Oct, 2007 3 commits
    • [NET]: Make core networking code use seq_open_private · cf7732e4
      Authored by Pavel Emelyanov
      This concerns the ipv4 and ipv6 code mostly, but also the netlink
      and unix sockets.
      
      The netlink code is an example of how to use the __seq_open_private()
      call: it saves the net namespace in the private data.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Make socket creation namespace safe. · 1b8d7ae4
      Authored by Eric W. Biederman
      This patch passes in the namespace a new socket should be created in
      and has the socket code do the appropriate reference counting.  By
      virtue of this all socket create methods are touched.  In addition
      the socket create methods are modified so that they will fail if
      you attempt to create a socket in a non-default network namespace.
      
      Failing if we attempt to create a socket outside of the default
      network namespace ensures that as we incrementally make the network stack
      network namespace aware we will not export functionality that someone
      has not audited and made certain is network namespace safe.  This
      allows us to partially enable network namespaces before all of the
      exotic protocols are supported.
      
      Any protocol layers I have missed will fail to compile because I now
      pass an extra parameter into the socket creation code.
      
      [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Make /proc/net per network namespace · 457c4cbc
      Authored by Eric W. Biederman
      This patch makes /proc/net per network namespace.  It modifies the global
      variables proc_net and proc_net_stat to be per network namespace.
      The proc_net file helpers are modified to take a network namespace argument,
      and all of their callers are fixed to pass &init_net for that argument.
      This ensures that all of the /proc/net files are only visible and
      usable in the initial network namespace until the code behind them
      has been updated to handle multiple network namespaces.
      
      Making /proc/net per namespace is necessary as at least some files
      in /proc/net depend upon the set of network devices which is per
      network namespace, and even more files in /proc/net have contents
      that are relevant to a single network namespace.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 31 Jul, 2007 1 commit
  9. 12 Jul, 2007 1 commit
    • [AF_UNIX]: Rewrite garbage collector, fixes race. · 1fd05ba5
      Authored by Miklos Szeredi
      Throw out the old mark & sweep garbage collector and put in a
      refcounting cycle detecting one.
      
      The old one had a race with recvmsg, that resulted in false positives
      and hence data loss.  The old algorithm operated on all unix sockets
      in the system, so any additional locking would have meant performance
      problems for all users of these.
      
      The new algorithm instead only operates on "in flight" sockets, which
      are very rare, and the additional locking for these doesn't negatively
      impact the vast majority of users.
      
      In fact, it's probable that there weren't *any* heavy senders of
      sockets over sockets, otherwise the above race would have been
      discovered long ago.
      
      The patch works OK with the app that exposed the race with the old
      code.  The garbage collection has also been verified to work in a few
      simple cases.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
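The commit message does not spell out the algorithm, so the following is only a toy model of one way a refcount-based cycle detector restricted to in-flight sockets can work; it is not the kernel code. All names (`struct toy_gc`, `toy_collect`) and the fixed-size matrix representation are assumptions. A socket is a candidate when every reference to it sits in some receive queue; references coming from other candidates are then discounted, and any candidate still referenced from outside the candidate set is marked live again together with whatever its queue keeps alive.

```c
#include <stdbool.h>

#define TOY_N 8

struct toy_gc {
	int total_refs[TOY_N];     /* every reference to socket j */
	int queued[TOY_N][TOY_N];  /* queued[i][j]: refs to j waiting in i's receive queue */
};

/* Sets garbage[j] to true for sockets reachable only through dead cycles. */
static void toy_collect(const struct toy_gc *g, bool garbage[TOY_N])
{
	int left[TOY_N];  /* in-flight refs not explained by other candidates */
	bool changed;
	int i, j;

	/* Candidates: in flight, and all references are in-flight ones. */
	for (j = 0; j < TOY_N; j++) {
		int inflight = 0;

		for (i = 0; i < TOY_N; i++)
			inflight += g->queued[i][j];
		garbage[j] = inflight > 0 && inflight == g->total_refs[j];
		left[j] = inflight;
	}

	/* Discount references held in the queues of other candidates. */
	for (i = 0; i < TOY_N; i++)
		for (j = 0; j < TOY_N; j++)
			if (garbage[i] && garbage[j])
				left[j] -= g->queued[i][j];

	/* A candidate referenced from a live queue is live; restore its children. */
	do {
		changed = false;
		for (i = 0; i < TOY_N; i++) {
			if (!garbage[i] || left[i] == 0)
				continue;
			garbage[i] = false;
			changed = true;
			for (j = 0; j < TOY_N; j++)
				if (garbage[j])
					left[j] += g->queued[i][j];
		}
	} while (changed);
}
```

Only sockets with at least one queued reference are ever examined as candidates, which mirrors the point made above that sockets not in flight are left alone.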
  10. 11 Jul, 2007 1 commit
  11. 08 Jun, 2007 1 commit
    • [AF_UNIX]: Fix stream recvmsg() race. · 3c0d2f37
      Authored by Miklos Szeredi
      A recv() on an AF_UNIX, SOCK_STREAM socket can race with a
      send()+close() on the peer, causing recv() to return zero, even though
      the sent data should be received.
      
      This happens if the send() and the close() are performed between
      skb_dequeue() and checking sk->sk_shutdown in unix_stream_recvmsg():
      
      process A  skb_dequeue() returns NULL, there's no data in the socket queue
      process B  new data is inserted onto the queue by unix_stream_sendmsg()
      process B  sk->sk_shutdown is set to SHUTDOWN_MASK by unix_release_sock()
      process A  sk->sk_shutdown is checked, unix_stream_recvmsg() returns zero
      
      I'm surprised nobody noticed this, it's not hard to trigger.  Maybe
      it's just (un)luck with the timing.
      
      It's possible to work around this bug in userspace, by retrying the
      recv() once in case of a zero return value.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
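The userspace workaround mentioned in this commit, retrying recv() once after a zero return, might look like the sketch below. `recv_retry_once` is a hypothetical wrapper name; it assumes a blocking AF_UNIX SOCK_STREAM socket and a non-zero buffer length, so that a zero return can only mean end of file or the race described above.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical wrapper: on an affected kernel a zero return may hide data
 * that was sent just before the peer closed, so ask once more.
 */
static ssize_t recv_retry_once(int fd, void *buf, size_t len, int flags)
{
	ssize_t n = recv(fd, buf, len, flags);

	if (n == 0 && len > 0)
		n = recv(fd, buf, len, flags);  /* second look at the queue */
	return n;
}
```

A genuine end of file simply yields zero twice, so the wrapper costs one extra system call per connection teardown and is harmless on kernels that include the fix.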
  12. 04 Jun, 2007 2 commits
    • [AF_UNIX]: Fix datagram connect race causing an OOPS. · 278a3de5
      Authored by David S. Miller
      Based upon an excellent bug report and initial patch by
      Frederik Deweerdt.
      
      The UNIX datagram connect code blindly dereferences other->sk_socket
      via the call down to the security_unix_may_send() function.
      
      Without locking 'other' that pointer can go NULL via unix_release_sock()
      which does sock_orphan() which also marks the socket SOCK_DEAD.
      
      So we have to lock both 'sk' and 'other' yet avoid all kinds of
      potential deadlocks (connect to self is OK for datagram sockets and it
      is possible for two datagram sockets to perform a simultaneous connect
      to each other).  So what we do is have a "double lock" function similar
      to how we handle this situation in other areas of the kernel.  We take
      the lock of the socket pointer with the smallest address first in
      order to avoid ABBA style deadlocks.
      
      Once we have them both locked, we check to see if SOCK_DEAD is set
      for 'other' and if so, drop everything and retry the lookup.
      Signed-off-by: David S. Miller <davem@davemloft.net>
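The address-ordered "double lock" idea described in this commit can be sketched in userspace with pthread mutexes standing in for the kernel's socket state locks. The function names are hypothetical and this is not the kernel helper; the SOCK_DEAD check and lookup retry are not modelled.

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Lock two mutexes in a globally consistent order (lowest address first)
 * so that two threads locking the same pair in opposite argument order
 * cannot deadlock ABBA-style. Returns the mutex that was taken first.
 */
static pthread_mutex_t *toy_double_lock(pthread_mutex_t *a, pthread_mutex_t *b)
{
	if (a == b) {                /* "connect to self": only one lock exists */
		pthread_mutex_lock(a);
		return a;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		pthread_mutex_t *tmp = a;

		a = b;
		b = tmp;
	}
	pthread_mutex_lock(a);
	pthread_mutex_lock(b);
	return a;
}

/* Release both locks; unlock order does not matter for deadlock avoidance. */
static void toy_double_unlock(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_unlock(a);
	if (a != b)
		pthread_mutex_unlock(b);
}
```

Because every caller agrees on the same order, two sockets connecting to each other at the same time both try to take the lower-addressed lock first, and one of them simply waits.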
    • [AF_UNIX]: Make socket locking much less confusing. · 1c92b4e5
      Authored by David S. Miller
      The unix_state_*() locking macros imply that there is some
      rwlock kind of thing going on, but the implementation is
      actually a spinlock which makes the code more confusing than
      it needs to be.
      
      So use plain unix_state_lock and unix_state_unlock.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 09 May, 2007 1 commit
  14. 26 Apr, 2007 1 commit
  15. 07 Mar, 2007 1 commit
  16. 03 Mar, 2007 1 commit
  17. 15 Feb, 2007 2 commits
    • [PATCH] sysctl: remove insert_at_head from register_sysctl · 0b4d4147
      Authored by Eric W. Biederman
      The semantic effect of insert_at_head is that it would allow newly
      registered sysctl entries to override existing sysctl entries of the same
      name, which is a pain for caching and which the proc interface never
      implemented.
      
      I have done an audit and discovered that none of the current users of
      register_sysctl care as (except for directories) they do not register
      duplicate sysctl entries.
      
      So this patch simply removes the support for overriding existing entries in
      the sys_sysctl interface since no one uses it or cares and it makes future
      enhancements harder.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Corey Minyard <minyard@acm.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] remove many unneeded #includes of sched.h · cd354f1a
      Authored by Tim Schmielau
      After Al Viro (finally) succeeded in removing the sched.h #include in module.h
      recently, it makes sense again to remove other superfluous sched.h includes.
      There are quite a lot of files which include it but don't actually need
      anything defined in there.  Presumably these includes were once needed for
      macros that used to live in sched.h, but moved to other header files in the
      course of cleaning it up.
      
      To ease the pain, this time I did not fiddle with any header files and only
      removed #includes from .c-files, which tend to cause less trouble.
      
      Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
      arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
      allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
      configs in arch/arm/configs on arm.  I also checked that no new warnings were
      introduced by the patch (actually, some warnings are removed that were emitted
      by unnecessarily included header files).
      Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 13 Feb, 2007 1 commit
  19. 11 Feb, 2007 1 commit
  20. 09 Dec, 2006 1 commit
  21. 03 Dec, 2006 1 commit
  22. 23 Sep, 2006 2 commits
  23. 03 Aug, 2006 1 commit
    • [AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patch · dc49c1f9
      Authored by Catherine Zhang
      From: Catherine Zhang <cxzhang@watson.ibm.com>
      
      This patch implements a cleaner fix for the memory leak problem of the
      original unix datagram getpeersec patch.  Instead of creating a
      security context each time a unix datagram is sent, we only create the
      security context when the receiver requests it.
      
      This new design requires modification of the current
      unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely,
      secid_to_secctx and release_secctx.  The former retrieves the security
      context and the latter releases it.  A hook is required for releasing
      the security context because it is up to the security module to decide
      how that's done.  In the case of SELinux, it's a simple kfree
      operation.
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 22 Jul, 2006 1 commit