1. 25 1月, 2016 1 次提交
  2. 11 1月, 2016 1 次提交
  3. 05 1月, 2016 1 次提交
    • R
      af_unix: Fix splice-bind deadlock · c845acb3
      Rainer Weikusat 提交于
      On 2015/11/06, Dmitry Vyukov reported a deadlock involving the splice
      system call and AF_UNIX sockets,
      
      http://lists.openwall.net/netdev/2015/11/06/24
      
      The situation was analyzed as
      
      (a while ago) A: socketpair()
      B: splice() from a pipe to /mnt/regular_file
      	does sb_start_write() on /mnt
      C: try to freeze /mnt
      	wait for B to finish with /mnt
      A: bind() try to bind our socket to /mnt/new_socket_name
      	lock our socket, see it not bound yet
      	decide that it needs to create something in /mnt
      	try to do sb_start_write() on /mnt, block (it's
      	waiting for C).
      D: splice() from the same pipe to our socket
      	lock the pipe, see that socket is connected
      	try to lock the socket, block waiting for A
      B:	get around to actually feeding a chunk from
      	pipe to file, try to lock the pipe.  Deadlock.
      
      on 2015/11/10 by Al Viro,
      
      http://lists.openwall.net/netdev/2015/11/10/4
      
      The patch fixes this by removing the kern_path_create related code from
      unix_mknod and executing it as part of unix_bind prior acquiring the
      readlock of the socket in question. This means that A (as used above)
      will sb_start_write on /mnt before it acquires the readlock, hence, it
      won't indirectly block B which first did a sb_start_write and then
      waited for a thread trying to acquire the readlock. Consequently, A
      being blocked by C waiting for B won't cause a deadlock anymore
      (effectively, both A and B acquire two locks in opposite order in the
      situation described above).
      
      Dmitry Vyukov(<dvyukov@google.com>) tested the original patch.
      Signed-off-by: NRainer Weikusat <rweikusat@mobileactivedefense.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c845acb3
  4. 18 12月, 2015 1 次提交
  5. 07 12月, 2015 1 次提交
    • R
      af_unix: fix unix_dgram_recvmsg entry locking · 64874280
      Rainer Weikusat 提交于
      The current unix_dgram_recvsmg code acquires the u->readlock mutex in
      order to protect access to the peek offset prior to calling
      __skb_recv_datagram for actually receiving data. This implies that a
      blocking reader will go to sleep with this mutex held if there's
      presently no data to return to userspace. Two non-desirable side effects
      of this are that a later non-blocking read call on the same socket will
      block on the ->readlock mutex until the earlier blocking call releases it
      (or the readers is interrupted) and that later blocking read calls
      will wait longer than the effective socket read timeout says they
      should: The timeout will only start 'ticking' once such a reader hits
      the schedule_timeout in wait_for_more_packets (core.c) while the time it
      already had to wait until it could acquire the mutex is unaccounted for.
      
      The patch avoids both by using the __skb_try_recv_datagram and
      __skb_wait_for_more packets functions created by the first patch to
      implement a unix_dgram_recvmsg read loop which releases the readlock
      mutex prior to going to sleep and reacquires it as needed
      afterwards. Non-blocking readers will thus immediately return with
      -EAGAIN if there's no data available regardless of any concurrent
      blocking readers and all blocking readers will end up sleeping via
      schedule_timeout, thus honouring the configured socket receive timeout.
      Signed-off-by: NRainer Weikusat <rweikusat@mobileactivedefense.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64874280
  6. 02 12月, 2015 2 次提交
  7. 01 12月, 2015 2 次提交
  8. 24 11月, 2015 1 次提交
    • R
      unix: avoid use-after-free in ep_remove_wait_queue · 7d267278
      Rainer Weikusat 提交于
      Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
      An AF_UNIX datagram socket being the client in an n:1 association with
      some server socket is only allowed to send messages to the server if the
      receive queue of this socket contains at most sk_max_ack_backlog
      datagrams. This implies that prospective writers might be forced to go
      to sleep despite none of the message presently enqueued on the server
      receive queue were sent by them. In order to ensure that these will be
      woken up once space becomes again available, the present unix_dgram_poll
      routine does a second sock_poll_wait call with the peer_wait wait queue
      of the server socket as queue argument (unix_dgram_recvmsg does a wake
      up on this queue after a datagram was received). This is inherently
      problematic because the server socket is only guaranteed to remain alive
      for as long as the client still holds a reference to it. In case the
      connection is dissolved via connect or by the dead peer detection logic
      in unix_dgram_sendmsg, the server socket may be freed despite "the
      polling mechanism" (in particular, epoll) still has a pointer to the
      corresponding peer_wait queue. There's no way to forcibly deregister a
      wait queue with epoll.
      
      Based on an idea by Jason Baron, the patch below changes the code such
      that a wait_queue_t belonging to the client socket is enqueued on the
      peer_wait queue of the server whenever the peer receive queue full
      condition is detected by either a sendmsg or a poll. A wake up on the
      peer queue is then relayed to the ordinary wait queue of the client
      socket via wake function. The connection to the peer wait queue is again
      dissolved if either a wake up is about to be relayed or the client
      socket reconnects or a dead peer is detected or the client socket is
      itself closed. This enables removing the second sock_poll_wait from
      unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
      that no blocked writer sleeps forever.
      Signed-off-by: NRainer Weikusat <rweikusat@mobileactivedefense.com>
      Fixes: ec0d215f ("af_unix: fix 'poll for write'/connected DGRAM sockets")
      Reviewed-by: NJason Baron <jbaron@akamai.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d267278
  9. 18 11月, 2015 1 次提交
  10. 17 11月, 2015 1 次提交
  11. 16 11月, 2015 1 次提交
    • H
      af-unix: fix use-after-free with concurrent readers while splicing · 73ed5d25
      Hannes Frederic Sowa 提交于
      During splicing an af-unix socket to a pipe we have to drop all
      af-unix socket locks. While doing so we allow another reader to enter
      unix_stream_read_generic which can read, copy and finally free another
      skb. If exactly this skb is just in process of being spliced we get a
      use-after-free report by kasan.
      
      First, we must make sure to not have a free while the skb is used during
      the splice operation. We simply increment its use counter before unlocking
      the reader lock.
      
      Stream sockets have the nice characteristic that we don't care about
      zero length writes and they never reach the peer socket's queue. That
      said, we can take the UNIXCB.consumed field as the indicator if the
      skb was already freed from the socket's receive queue. If the skb was
      fully consumed after we locked the reader side again we know it has been
      dropped by a second reader. We indicate a short read to user space and
      abort the current splice operation.
      
      This bug has been found with syzkaller
      (http://github.com/google/syzkaller) by Dmitry Vyukov.
      
      Fixes: 2b514574 ("net: af_unix: implement splice for stream af_unix sockets")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73ed5d25
  12. 25 10月, 2015 1 次提交
  13. 05 10月, 2015 1 次提交
  14. 30 9月, 2015 1 次提交
  15. 11 6月, 2015 1 次提交
  16. 27 5月, 2015 1 次提交
  17. 25 5月, 2015 2 次提交
  18. 11 5月, 2015 1 次提交
  19. 16 4月, 2015 2 次提交
  20. 03 3月, 2015 1 次提交
  21. 29 1月, 2015 1 次提交
    • C
      net: remove sock_iocb · 7cc05662
      Christoph Hellwig 提交于
      The sock_iocb structure is allocate on stack for each read/write-like
      operation on sockets, and contains various fields of which only the
      embedded msghdr and sometimes a pointer to the scm_cookie is ever used.
      Get rid of the sock_iocb and put a msghdr directly on the stack and pass
      the scm_cookie explicitly to netlink_mmap_sendmsg.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7cc05662
  22. 10 12月, 2014 1 次提交
    • A
      put iov_iter into msghdr · c0371da6
      Al Viro 提交于
      Note that the code _using_ ->msg_iter at that point will be very
      unhappy with anything other than unshifted iovec-backed iov_iter.
      We still need to convert users to proper primitives.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c0371da6
  23. 24 11月, 2014 1 次提交
  24. 06 11月, 2014 1 次提交
    • D
      net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b
      David S. Miller 提交于
      This encapsulates all of the skb_copy_datagram_iovec() callers
      with call argument signature "skb, offset, msghdr->msg_iov, length".
      
      When we move to iov_iters in the networking, the iov_iter object will
      sit in the msghdr.
      
      Having a helper like this means there will be less places to touch
      during that transformation.
      
      Based upon descriptions and patch from Al Viro.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51f3d02b
  25. 17 5月, 2014 1 次提交
  26. 18 4月, 2014 1 次提交
  27. 12 4月, 2014 1 次提交
    • D
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller 提交于
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      676d2369
  28. 27 3月, 2014 1 次提交
  29. 07 3月, 2014 1 次提交
    • A
      net: unix socket code abuses csum_partial · 0a13404d
      Anton Blanchard 提交于
      The unix socket code is using the result of csum_partial to
      hash into a lookup table:
      
      	unix_hash_fold(csum_partial(sunaddr, len, 0));
      
      csum_partial is only guaranteed to produce something that can be
      folded into a checksum, as its prototype explains:
      
       * returns a 32-bit number suitable for feeding into itself
       * or csum_tcpudp_magic
      
      The 32bit value should not be used directly.
      
      Depending on the alignment, the ppc64 csum_partial will return
      different 32bit partial checksums that will fold into the same
      16bit checksum.
      
      This difference causes the following testcase (courtesy of
      Gustavo) to sometimes fail:
      
      #include <sys/socket.h>
      #include <stdio.h>
      
      int main()
      {
      	int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
      
      	int i = 1;
      	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);
      
      	struct sockaddr addr;
      	addr.sa_family = AF_LOCAL;
      	bind(fd, &addr, 2);
      
      	listen(fd, 128);
      
      	struct sockaddr_storage ss;
      	socklen_t sslen = (socklen_t)sizeof(ss);
      	getsockname(fd, (struct sockaddr*)&ss, &sslen);
      
      	fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
      
      	if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
      		perror(NULL);
      		return 1;
      	}
      	printf("OK\n");
      	return 0;
      }
      
      As suggested by davem, fix this by using csum_fold to fold the
      partial 32bit checksum into a 16bit checksum before using it.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a13404d
  30. 19 1月, 2014 1 次提交
  31. 18 12月, 2013 1 次提交
  32. 11 12月, 2013 1 次提交
  33. 07 12月, 2013 1 次提交
  34. 21 11月, 2013 1 次提交
    • H
      net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426
      Hannes Frederic Sowa 提交于
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so, because all the recvmsg handlers must
      cope with the case a plain read() is called on them. read() also sets
      msg_name to NULL.
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove "
      if (!uaddr || msg_sys->msg_namelen == 0)
      	msg->msg_name = NULL
      ".
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3d33426
  35. 20 10月, 2013 1 次提交
    • D
      net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race · 90c6bd34
      Daniel Borkmann 提交于
      In the case of credentials passing in unix stream sockets (dgram
      sockets seem not affected), we get a rather sparse race after
      commit 16e57262 ("af_unix: dont send SCM_CREDENTIALS by default").
      
      We have a stream server on receiver side that requests credential
      passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
      on each spawned/accepted socket on server side to 1 first (as it's
      not inherited), it can happen that in the time between accept() and
      setsockopt() we get interrupted, the sender is being scheduled and
      continues with passing data to our receiver. At that time SO_PASSCRED
      is neither set on sender nor receiver side, hence in cmsg's
      SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
      (== overflow{u,g}id) instead of what we actually would like to see.
      
      On the sender side, here nc -U, the tests in maybe_add_creds()
      invoked through unix_stream_sendmsg() would fail, as at that exact
      time, as mentioned, the sender has neither SO_PASSCRED on his side
      nor sees it on the server side, and we have a valid 'other' socket
      in place. Thus, sender believes it would just look like a normal
      connection, not needing/requesting SO_PASSCRED at that time.
      
      As reverting 16e57262 would not be an option due to the significant
      performance regression reported when having creds always passed,
      one way/trade-off to prevent that would be to set SO_PASSCRED on
      the listener socket and allow inheriting these flags to the spawned
      socket on server side in accept(). It seems also logical to do so
      if we'd tell the listener socket to pass those flags onwards, and
      would fix the race.
      
      Before, strace:
      
      recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
              msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
              cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
              msg_flags=0}, 0) = 5
      
      After, strace:
      
      recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
              msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
              cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
              msg_flags=0}, 0) = 5
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90c6bd34
  36. 12 8月, 2013 1 次提交