1. 20 7月, 2012 1 次提交
    • Y
      net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) · cf60af03
      Yuchung Cheng 提交于
      sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2)
      and write(2). The application should replace connect() with it to
      send data in the opening SYN packet.
      
      For blocking socket, sendmsg() blocks until all the data are buffered
      locally and the handshake is completed like connect() call. It
      returns similar errno like connect() if the TCP handshake fails.
      
      For non-blocking socket, it returns the number of bytes queued (and
      transmitted in the SYN-data packet) if cookie is available. If cookie
      is not available, it transmits a data-less SYN packet with Fast Open
      cookie request option and returns -EINPROGRESS like connect().
      
      Using MSG_FASTOPEN on connecting or connected socket will result in
      simlar errno like repeating connect() calls. Therefore the application
      should only use this flag on new sockets.
      
      The buffer size of sendmsg() is independent of the MSS of the connection.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf60af03
  2. 16 4月, 2012 1 次提交
  3. 06 4月, 2012 1 次提交
    • E
      tcp: tcp_sendpages() should call tcp_push() once · 35f9c09f
      Eric Dumazet 提交于
      commit 2f533844 (tcp: allow splice() to build full TSO packets) added
      a regression for splice() calls using SPLICE_F_MORE.
      
      We need to call tcp_flush() at the end of the last page processed in
      tcp_sendpages(), or else transmits can be deferred and future sends
      stall.
      
      Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
      with different semantic.
      
      For all sendpage() providers, its a transparent change. Only
      sock_sendpage() and tcp_sendpages() can differentiate the two different
      flags provided by pipe_to_sendpage()
      Reported-by: NTom Herbert <therbert@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: H.K. Jerry Chu <hkchu@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail&gt;com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35f9c09f
  4. 12 3月, 2012 1 次提交
  5. 08 8月, 2011 1 次提交
  6. 06 7月, 2011 1 次提交
  7. 06 5月, 2011 1 次提交
    • A
      net: Add sendmmsg socket system call · 228e548e
      Anton Blanchard 提交于
      This patch adds a multiple message send syscall and is the send
      version of the existing recvmmsg syscall. This is heavily
      based on the patch by Arnaldo that added recvmmsg.
      
      I wrote a microbenchmark to test the performance gains of using
      this new syscall:
      
      http://ozlabs.org/~anton/junkcode/sendmmsg_test.c
      
      The test was run on a ppc64 box with a 10 Gbit network card. The
      benchmark can send both UDP and RAW ethernet packets.
      
      64B UDP
      
      batch   pkts/sec
      1       804570
      2       872800 (+ 8 %)
      4       916556 (+14 %)
      8       939712 (+17 %)
      16      952688 (+18 %)
      32      956448 (+19 %)
      64      964800 (+20 %)
      
      64B raw socket
      
      batch   pkts/sec
      1       1201449
      2       1350028 (+12 %)
      4       1461416 (+22 %)
      8       1513080 (+26 %)
      16      1541216 (+28 %)
      32      1553440 (+29 %)
      64      1557888 (+30 %)
      
      We see a 20% improvement in throughput on UDP send and 30%
      on raw socket send.
      
      [ Add sparc syscall entries. -DaveM ]
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      228e548e
  8. 31 3月, 2011 1 次提交
  9. 07 1月, 2011 1 次提交
  10. 19 11月, 2010 1 次提交
  11. 29 10月, 2010 1 次提交
    • D
      net: Limit socket I/O iovec total length to INT_MAX. · 8acfe468
      David S. Miller 提交于
      This helps protect us from overflow issues down in the
      individual protocol sendmsg/recvmsg handlers.  Once
      we hit INT_MAX we truncate out the rest of the iovec
      by setting the iov_len members to zero.
      
      This works because:
      
      1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
         writes are allowed and the application will just continue
         with another write to send the rest of the data.
      
      2) For datagram oriented sockets, where there must be a
         one-to-one correspondance between write() calls and
         packets on the wire, INT_MAX is going to be far larger
         than the packet size limit the protocol is going to
         check for and signal with -EMSGSIZE.
      
      Based upon a patch by Linus Torvalds.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8acfe468
  12. 21 10月, 2010 1 次提交
  13. 28 9月, 2010 1 次提交
  14. 17 6月, 2010 1 次提交
  15. 31 3月, 2010 1 次提交
  16. 27 3月, 2010 1 次提交
  17. 29 10月, 2009 1 次提交
  18. 13 10月, 2009 1 次提交
    • A
      net: Introduce recvmmsg socket syscall · a2e27255
      Arnaldo Carvalho de Melo 提交于
      Meaning receive multiple messages, reducing the number of syscalls and
      net stack entry/exit operations.
      
      Next patches will introduce mechanisms where protocols that want to
      optimize this operation will provide an unlocked_recvmsg operation.
      
      This takes into account comments made by:
      
      . Paul Moore: sock_recvmsg is called only for the first datagram,
        sock_recvmsg_nosec is used for the rest.
      
      . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
        works in the same fashion as the ppoll one.
      
        If the underlying protocol returns a datagram with MSG_OOB set, this
        will make recvmmsg return right away with as many datagrams (+ the OOB
        one) it has received so far.
      
      . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
        datagrams and then recvmsg returns an error, recvmmsg will return
        the successfully received datagrams, store the error and return it
        in the next call.
      
      This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
      where we will be able to acquire the lock only at batch start and end, not at
      every underlying recvmsg call.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2e27255
  19. 05 10月, 2009 1 次提交
  20. 09 6月, 2009 1 次提交
  21. 23 4月, 2009 1 次提交
  22. 21 4月, 2009 2 次提交
  23. 27 2月, 2009 1 次提交
  24. 03 2月, 2009 1 次提交
  25. 06 10月, 2008 1 次提交
  26. 23 9月, 2008 1 次提交
  27. 27 7月, 2008 1 次提交
  28. 20 7月, 2008 1 次提交
  29. 29 1月, 2008 2 次提交
  30. 22 10月, 2007 1 次提交
  31. 17 7月, 2007 1 次提交
    • U
      O_CLOEXEC for SCM_RIGHTS · 4a19542e
      Ulrich Drepper 提交于
      Part two in the O_CLOEXEC saga: adding support for file descriptors received
      through Unix domain sockets.
      
      The patch is once again pretty minimal, it introduces a new flag for recvmsg
      and passes it just like the existing MSG_CMSG_COMPAT flag.  I think this bit
      is not used otherwise but the networking people will know better.
      
      This new flag is not recognized by recvfrom and recv.  These functions cannot
      be used for that purpose and the asymmetry this introduces is not worse than
      the already existing MSG_CMSG_COMPAT situations.
      
      The patch must be applied on the patch which introduced O_CLOEXEC.  It has to
      remove static from the new get_unused_fd_flags function but since scm.c cannot
      live in a module the function still hasn't to be exported.
      
      Here's a test program to make sure the code works.  It's so much longer than
      the actual patch...
      
      #include <errno.h>
      #include <error.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/socket.h>
      #include <sys/un.h>
      
      #ifndef O_CLOEXEC
      # define O_CLOEXEC 02000000
      #endif
      #ifndef MSG_CMSG_CLOEXEC
      # define MSG_CMSG_CLOEXEC 0x40000000
      #endif
      
      int
      main (int argc, char *argv[])
      {
        if (argc > 1)
          {
            int fd = atol (argv[1]);
            printf ("child: fd = %d\n", fd);
            if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
              {
                puts ("file descriptor valid in child");
                return 1;
              }
            return 0;
      
          }
      
        struct sockaddr_un sun;
        strcpy (sun.sun_path, "./testsocket");
        sun.sun_family = AF_UNIX;
      
        char databuf[] = "hello";
        struct iovec iov[1];
        iov[0].iov_base = databuf;
        iov[0].iov_len = sizeof (databuf);
      
        union
        {
          struct cmsghdr hdr;
          char bytes[CMSG_SPACE (sizeof (int))];
        } buf;
        struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
                              .msg_control = buf.bytes,
                              .msg_controllen = sizeof (buf) };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);
      
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN (sizeof (int));
      
        msg.msg_controllen = cmsg->cmsg_len;
      
        pid_t child = fork ();
        if (child == -1)
          error (1, errno, "fork");
        if (child == 0)
          {
            int sock = socket (PF_UNIX, SOCK_STREAM, 0);
            if (sock < 0)
              error (1, errno, "socket");
      
            if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
              error (1, errno, "bind");
            if (listen (sock, SOMAXCONN) < 0)
              error (1, errno, "listen");
      
            int conn = accept (sock, NULL, NULL);
            if (conn == -1)
              error (1, errno, "accept");
      
            *(int *) CMSG_DATA (cmsg) = sock;
            if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0)
              error (1, errno, "sendmsg");
      
            return 0;
          }
      
        /* For a test suite this should be more robust like a
           barrier in shared memory.  */
        sleep (1);
      
        int sock = socket (PF_UNIX, SOCK_STREAM, 0);
        if (sock < 0)
          error (1, errno, "socket");
      
        if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
          error (1, errno, "connect");
        unlink (sun.sun_path);
      
        *(int *) CMSG_DATA (cmsg) = -1;
      
        if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0)
          error (1, errno, "recvmsg");
      
        int fd = *(int *) CMSG_DATA (cmsg);
        if (fd == -1)
          error (1, 0, "no descriptor received");
      
        char fdname[20];
        snprintf (fdname, sizeof (fdname), "%d", fd);
        execl ("/proc/self/exe", argv[0], fdname, NULL);
        puts ("execl failed");
        return 1;
      }
      
      [akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Michael Buesch <mb@bu3sch.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4a19542e
  32. 11 7月, 2007 1 次提交
    • J
      [L2TP]: Changes to existing ppp and socket kernel headers for L2TP · cf14a4d0
      James Chapman 提交于
      Add struct sockaddr_pppol2tp to carry L2TP-specific address
      information for the PPPoX (PPPoL2TP) socket. Unfortunately we can't
      use the union inside struct sockaddr_pppox because the L2TP-specific
      data is larger than the current size of the union and we must preserve
      the size of struct sockaddr_pppox for binary compatibility.
      
      Also add a PPPIOCGL2TPSTATS ioctl to allow userspace to obtain
      L2TP counters and state from the kernel.
      
      Add new if_pppol2tp.h header.
      
      [ Modified to use aligned_u64 in statistics structure -DaveM ]
      Signed-off-by: NJames Chapman <jchapman@katalix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf14a4d0
  33. 27 4月, 2007 1 次提交
  34. 01 3月, 2007 1 次提交
  35. 12 2月, 2007 1 次提交
  36. 09 2月, 2007 1 次提交
  37. 03 12月, 2006 2 次提交
    • A
      [NET]: Annotate csum_partial() callers in net/* · 44bb9363
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44bb9363
    • G
      [NET]: Supporting UDP-Lite (RFC 3828) in Linux · ba4e58ec
      Gerrit Renker 提交于
      This is a revision of the previously submitted patch, which alters
      the way files are organized and compiled in the following manner:
      
      	* UDP and UDP-Lite now use separate object files
      	* source file dependencies resolved via header files
      	  net/ipv{4,6}/udp_impl.h
      	* order of inclusion files in udp.c/udplite.c adapted
      	  accordingly
      
      [NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828)
      
      This patch adds support for UDP-Lite to the IPv4 stack, provided as an
      extension to the existing UDPv4 code:
              * generic routines are all located in net/ipv4/udp.c
              * UDP-Lite specific routines are in net/ipv4/udplite.c
              * MIB/statistics support in /proc/net/snmp and /proc/net/udplite
              * shared API with extensions for partial checksum coverage
      
      [NET/IPv6]: Extension for UDP-Lite over IPv6
      
      It extends the existing UDPv6 code base with support for UDP-Lite
      in the same manner as per UDPv4. In particular,
              * UDPv6 generic and shared code is in net/ipv6/udp.c
              * UDP-Litev6 specific extensions are in net/ipv6/udplite.c
              * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6
              * support for IPV6_ADDRFORM
              * aligned the coding style of protocol initialisation with af_inet6.c
              * made the error handling in udpv6_queue_rcv_skb consistent;
                to return `-1' on error on all error cases
              * consolidation of shared code
      
      [NET]: UDP-Lite Documentation and basic XFRM/Netfilter support
      
      The UDP-Lite patch further provides
              * API documentation for UDP-Lite
              * basic xfrm support
              * basic netfilter support for IPv4 and IPv6 (LOG target)
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba4e58ec