1. 27 11月, 2008 1 次提交
  2. 10 11月, 2008 1 次提交
    • M
      net: unix: fix inflight counting bug in garbage collector · 6209344f
      Miklos Szeredi 提交于
      Previously I assumed that the receive queues of candidates don't
      change during the GC.  This is only half true, nothing can be received
      from the queues (see comment in unix_gc()), but buffers could be added
      through the other half of the socket pair, which may still have file
      descriptors referring to it.
      
      This can result in inc_inflight_move_tail() erronously increasing the
      "inflight" counter for a unix socket for which dec_inflight() wasn't
      previously called.  This in turn can trigger the "BUG_ON(total_refs <
      inflight_refs)" in a later garbage collection run.
      
      Fix this by only manipulating the "inflight" counter for sockets which
      are candidates themselves.  Duplicating the file references in
      unix_attach_fds() is also needed to prevent a socket becoming a
      candidate for GC while the skb that contains it is not yet queued.
      Reported-by: NAndrea Bittau <a.bittau@cs.ucl.ac.uk>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      CC: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6209344f
  3. 02 11月, 2008 1 次提交
  4. 23 10月, 2008 1 次提交
  5. 14 10月, 2008 1 次提交
  6. 27 7月, 2008 1 次提交
  7. 26 7月, 2008 1 次提交
  8. 28 6月, 2008 1 次提交
    • R
      af_unix: fix 'poll for write'/connected DGRAM sockets · ec0d215f
      Rainer Weikusat 提交于
      For n:1 'datagram connections' (eg /dev/log), the unix_dgram_sendmsg
      routine implements a form of receiver-imposed flow control by
      comparing the length of the receive queue of the 'peer socket' with
      the max_ack_backlog value stored in the corresponding sock structure,
      either blocking the thread which caused the send-routine to be called
      or returning EAGAIN. This routine is used by both SOCK_DGRAM and
      SOCK_SEQPACKET sockets. The poll-implementation for these socket types
      is datagram_poll from core/datagram.c. A socket is deemed to be
      writeable by this routine when the memory presently consumed by
      datagrams owned by it is less than the configured socket send buffer
      size. This is always wrong for PF_UNIX non-stream sockets connected to
      server sockets dealing with (potentially) multiple clients if the
      abovementioned receive queue is currently considered to be full.
      'poll' will then return, indicating that the socket is writeable, but
      a subsequent write result in EAGAIN, effectively causing an (usual)
      application to 'poll for writeability by repeated send request with
      O_NONBLOCK set' until it has consumed its time quantum.
      
      The change below uses a suitably modified variant of the datagram_poll
      routines for both type of PF_UNIX sockets, which tests if the
      recv-queue of the peer a socket is connected to is presently
      considered to be 'full' as part of the 'is this socket
      writeable'-checking code. The socket being polled is additionally
      put onto the peer_wait wait queue associated with its peer, because the
      unix_dgram_recvmsg routine does a wake up on this queue after a
      datagram was received and the 'other wakeup call' is done implicitly
      as part of skb destruction, meaning, a process blocked in poll
      because of a full peer receive queue could otherwise sleep forever
      if no datagram owned by its socket was already sitting on this queue.
      Among this change is a small (inline) helper routine named
      'unix_recvq_full', which consolidates the actual testing code (in three
      different places) into a single location.
      Signed-off-by: NRainer Weikusat <rweikusat@mssgmbh.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec0d215f
  9. 18 6月, 2008 1 次提交
    • R
      af_unix: fix 'poll for write'/ connected DGRAM sockets · 3c73419c
      Rainer Weikusat 提交于
      The unix_dgram_sendmsg routine implements a (somewhat crude)
      form of receiver-imposed flow control by comparing the length of the
      receive queue of the 'peer socket' with the max_ack_backlog value
      stored in the corresponding sock structure, either blocking
      the thread which caused the send-routine to be called or returning
      EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET
      sockets. The poll-implementation for these socket types is
      datagram_poll from core/datagram.c. A socket is deemed to be writeable
      by this routine when the memory presently consumed by datagrams
      owned by it is less than the configured socket send buffer size. This
      is always wrong for connected PF_UNIX non-stream sockets when the
      abovementioned receive queue is currently considered to be full.
      'poll' will then return, indicating that the socket is writeable, but
      a subsequent write result in EAGAIN, effectively causing an
      (usual) application to 'poll for writeability by repeated send request
      with O_NONBLOCK set' until it has consumed its time quantum.
      
      The change below uses a suitably modified variant of the datagram_poll
      routines for both type of PF_UNIX sockets, which tests if the
      recv-queue of the peer a socket is connected to is presently
      considered to be 'full' as part of the 'is this socket
      writeable'-checking code. The socket being polled is additionally
      put onto the peer_wait wait queue associated with its peer, because the
      unix_dgram_sendmsg routine does a wake up on this queue after a
      datagram was received and the 'other wakeup call' is done implicitly
      as part of skb destruction, meaning, a process blocked in poll
      because of a full peer receive queue could otherwise sleep forever
      if no datagram owned by its socket was already sitting on this queue.
      Among this change is a small (inline) helper routine named
      'unix_recvq_full', which consolidates the actual testing code (in three
      different places) into a single location.
      Signed-off-by: NRainer Weikusat <rweikusat@mssgmbh.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c73419c
  10. 12 6月, 2008 1 次提交
  11. 24 4月, 2008 1 次提交
  12. 19 4月, 2008 1 次提交
  13. 13 4月, 2008 1 次提交
  14. 26 3月, 2008 3 次提交
  15. 06 3月, 2008 1 次提交
  16. 15 2月, 2008 2 次提交
  17. 29 1月, 2008 9 次提交
  18. 29 11月, 2007 1 次提交
    • F
      [UNIX]: EOF on non-blocking SOCK_SEQPACKET · 0a112258
      Florian Zumbiehl 提交于
      I am not absolutely sure whether this actually is a bug (as in: I've got
      no clue what the standards say or what other implementations do), but at
      least I was pretty surprised when I noticed that a recv() on a
      non-blocking unix domain socket of type SOCK_SEQPACKET (which is connection
      oriented, after all) where the remote end has closed the connection
      returned -1 (EAGAIN) rather than 0 to indicate end of file.
      
      This is a test case:
      
      | #include <sys/types.h>
      | #include <unistd.h>
      | #include <sys/socket.h>
      | #include <sys/un.h>
      | #include <fcntl.h>
      | #include <string.h>
      | #include <stdlib.h>
      | 
      | int main(){
      | 	int sock;
      | 	struct sockaddr_un addr;
      | 	char buf[4096];
      | 	int pfds[2];
      | 
      | 	pipe(pfds);
      | 	sock=socket(PF_UNIX,SOCK_SEQPACKET,0);
      | 	addr.sun_family=AF_UNIX;
      | 	strcpy(addr.sun_path,"/tmp/foobar_testsock");
      | 	bind(sock,(struct sockaddr *)&addr,sizeof(addr));
      | 	listen(sock,1);
      | 	if(fork()){
      | 		close(sock);
      | 		sock=socket(PF_UNIX,SOCK_SEQPACKET,0);
      | 		connect(sock,(struct sockaddr *)&addr,sizeof(addr));
      | 		fcntl(sock,F_SETFL,fcntl(sock,F_GETFL)|O_NONBLOCK);
      | 		close(pfds[1]);
      | 		read(pfds[0],buf,sizeof(buf));
      | 		recv(sock,buf,sizeof(buf),0); // <-- this one
      | 	}else accept(sock,NULL,NULL);
      | 	exit(0);
      | }
      
      If you try it, make sure /tmp/foobar_testsock doesn't exist.
      
      The marked recv() returns -1 (EAGAIN) on 2.6.23.9. Below you find a
      patch that fixes that.
      Signed-off-by: NFlorian Zumbiehl <florz@florz.de>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      0a112258
  19. 11 11月, 2007 2 次提交
  20. 01 11月, 2007 1 次提交
  21. 20 10月, 2007 1 次提交
    • P
      pid namespaces: changes to show virtual ids to user · b488893a
      Pavel Emelyanov 提交于
      This is the largest patch in the set. Make all (I hope) the places where
      the pid is shown to or get from user operate on the virtual pids.
      
      The idea is:
       - all in-kernel data structures must store either struct pid itself
         or the pid's global nr, obtained with pid_nr() call;
       - when seeking the task from kernel code with the stored id one
         should use find_task_by_pid() call that works with global pids;
       - when showing pid's numerical value to the user the virtual one
         should be used, but however when one shows task's pid outside this
         task's namespace the global one is to be used;
       - when getting the pid from userspace one need to consider this as
         the virtual one and use appropriate task/pid-searching functions.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: nuther build fix]
      [akpm@linux-foundation.org: yet nuther build fix]
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NAlexey Dobriyan <adobriyan@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b488893a
  22. 15 10月, 2007 1 次提交
    • I
      sched: affine sync wakeups · 71e20f18
      Ingo Molnar 提交于
      make sync wakeups affine for cache-cold tasks: if a cache-cold task
      is woken up by a sync wakeup then use the opportunity to migrate it
      straight away. (the two tasks are 'related' because they communicate)
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      71e20f18
  23. 11 10月, 2007 3 次提交
    • P
      [NET]: Make core networking code use seq_open_private · cf7732e4
      Pavel Emelyanov 提交于
      This concerns the ipv4 and ipv6 code mostly, but also the netlink
      and unix sockets.
      
      The netlink code is an example of how to use the __seq_open_private()
      call - it saves the net namespace on this private.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf7732e4
    • E
      [NET]: Make socket creation namespace safe. · 1b8d7ae4
      Eric W. Biederman 提交于
      This patch passes in the namespace a new socket should be created in
      and has the socket code do the appropriate reference counting.  By
      virtue of this all socket create methods are touched.  In addition
      the socket create methods are modified so that they will fail if
      you attempt to create a socket in a non-default network namespace.
      
      Failing if we attempt to create a socket outside of the default
      network namespace ensures that as we incrementally make the network stack
      network namespace aware we will not export functionality that someone
      has not audited and made certain is network namespace safe.
      Allowing us to partially enable network namespaces before all of the
      exotic protocols are supported.
      
      Any protocol layers I have missed will fail to compile because I now
      pass an extra parameter into the socket creation code.
      
      [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b8d7ae4
    • E
      [NET]: Make /proc/net per network namespace · 457c4cbc
      Eric W. Biederman 提交于
      This patch makes /proc/net per network namespace.  It modifies the global
      variables proc_net and proc_net_stat to be per network namespace.
      The proc_net file helpers are modified to take a network namespace argument,
      and all of their callers are fixed to pass &init_net for that argument.
      This ensures that all of the /proc/net files are only visible and
      usable in the initial network namespace until the code behind them
      has been updated to be handle multiple network namespaces.
      
      Making /proc/net per namespace is necessary as at least some files
      in /proc/net depend upon the set of network devices which is per
      network namespace, and even more files in /proc/net have contents
      that are relevant to a single network namespace.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      457c4cbc
  24. 31 7月, 2007 1 次提交
  25. 12 7月, 2007 1 次提交
    • M
      [AF_UNIX]: Rewrite garbage collector, fixes race. · 1fd05ba5
      Miklos Szeredi 提交于
      Throw out the old mark & sweep garbage collector and put in a
      refcounting cycle detecting one.
      
      The old one had a race with recvmsg, that resulted in false positives
      and hence data loss.  The old algorithm operated on all unix sockets
      in the system, so any additional locking would have meant performance
      problems for all users of these.
      
      The new algorithm instead only operates on "in flight" sockets, which
      are very rare, and the additional locking for these doesn't negatively
      impact the vast majority of users.
      
      In fact it's probable, that there weren't *any* heavy senders of
      sockets over sockets, otherwise the above race would have been
      discovered long ago.
      
      The patch works OK with the app that exposed the race with the old
      code.  The garbage collection has also been verified to work in a few
      simple cases.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1fd05ba5
  26. 11 7月, 2007 1 次提交