1. 06 Jul, 2011 1 commit
  2. 06 May, 2011 1 commit
    • net: Add sendmmsg socket system call · 228e548e
      Anton Blanchard committed
      This patch adds a multiple message send syscall and is the send
      version of the existing recvmmsg syscall. This is heavily
      based on the patch by Arnaldo that added recvmmsg.
      
      I wrote a microbenchmark to test the performance gains of using
      this new syscall:
      
      http://ozlabs.org/~anton/junkcode/sendmmsg_test.c
      
      The test was run on a ppc64 box with a 10 Gbit network card. The
      benchmark can send both UDP and RAW ethernet packets.
      
      64B UDP
      
      batch   pkts/sec
      1       804570
      2       872800 (+ 8 %)
      4       916556 (+14 %)
      8       939712 (+17 %)
      16      952688 (+18 %)
      32      956448 (+19 %)
      64      964800 (+20 %)
      
      64B raw socket
      
      batch   pkts/sec
      1       1201449
      2       1350028 (+12 %)
      4       1461416 (+22 %)
      8       1513080 (+26 %)
      16      1541216 (+28 %)
      32      1553440 (+29 %)
      64      1557888 (+30 %)
      
      At a batch size of 64 we see a 20% improvement in throughput on UDP
      send and 30% on raw socket send.
      
      [ Add sparc syscall entries. -DaveM ]
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 31 Mar, 2011 1 commit
  4. 07 Jan, 2011 1 commit
  5. 19 Nov, 2010 1 commit
  6. 29 Oct, 2010 1 commit
    • net: Limit socket I/O iovec total length to INT_MAX. · 8acfe468
      David S. Miller committed
      This helps protect us from overflow issues down in the
      individual protocol sendmsg/recvmsg handlers.  Once
      we hit INT_MAX we truncate out the rest of the iovec
      by setting the iov_len members to zero.
      
      This works because:
      
      1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
         writes are allowed and the application will just continue
         with another write to send the rest of the data.
      
      2) For datagram oriented sockets, where there must be a
         one-to-one correspondence between write() calls and
         packets on the wire, INT_MAX is going to be far larger
         than the packet size limit the protocol is going to
         check for and signal with -EMSGSIZE.
      
      Based upon a patch by Linus Torvalds.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 21 Oct, 2010 1 commit
  8. 28 Sep, 2010 1 commit
  9. 17 Jun, 2010 1 commit
  10. 31 Mar, 2010 1 commit
  11. 27 Mar, 2010 1 commit
  12. 29 Oct, 2009 1 commit
  13. 13 Oct, 2009 1 commit
    • net: Introduce recvmmsg socket syscall · a2e27255
      Arnaldo Carvalho de Melo committed
      Meaning receive multiple messages, reducing the number of syscalls and
      net stack entry/exit operations.
      
      Next patches will introduce mechanisms where protocols that want to
      optimize this operation will provide an unlocked_recvmsg operation.
      
      This takes into account comments made by:
      
      . Paul Moore: sock_recvmsg is called only for the first datagram,
        sock_recvmsg_nosec is used for the rest.
      
      . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
        works in the same fashion as the ppoll one.
      
        If the underlying protocol returns a datagram with MSG_OOB set, this
        will make recvmmsg return right away with as many datagrams (+ the OOB
        one) as it has received so far.
      
      . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
        datagrams and then recvmsg returns an error, recvmmsg will return
        the successfully received datagrams, store the error and return it
        in the next call.
      
      This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
      where we will be able to acquire the lock only at batch start and end, not at
      every underlying recvmsg call.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 05 Oct, 2009 1 commit
  15. 09 Jun, 2009 1 commit
  16. 23 Apr, 2009 1 commit
  17. 21 Apr, 2009 2 commits
  18. 27 Feb, 2009 1 commit
  19. 03 Feb, 2009 1 commit
  20. 06 Oct, 2008 1 commit
  21. 23 Sep, 2008 1 commit
  22. 27 Jul, 2008 1 commit
  23. 20 Jul, 2008 1 commit
  24. 29 Jan, 2008 2 commits
  25. 22 Oct, 2007 1 commit
  26. 17 Jul, 2007 1 commit
    • O_CLOEXEC for SCM_RIGHTS · 4a19542e
      Ulrich Drepper committed
      Part two in the O_CLOEXEC saga: adding support for file descriptors received
      through Unix domain sockets.
      
      The patch is once again pretty minimal, it introduces a new flag for recvmsg
      and passes it just like the existing MSG_CMSG_COMPAT flag.  I think this bit
      is not used otherwise but the networking people will know better.
      
      This new flag is not recognized by recvfrom and recv.  These functions cannot
      be used for that purpose and the asymmetry this introduces is not worse than
      the already existing MSG_CMSG_COMPAT situations.
      
      The patch must be applied on top of the patch which introduced O_CLOEXEC.
      It has to remove static from the new get_unused_fd_flags function, but since
      scm.c cannot live in a module, the function still does not need to be exported.
      
      Here's a test program to make sure the code works.  It's so much longer than
      the actual patch...
      
      #include <errno.h>
      #include <error.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/socket.h>
      #include <sys/un.h>
      
      #ifndef O_CLOEXEC
      # define O_CLOEXEC 02000000
      #endif
      #ifndef MSG_CMSG_CLOEXEC
      # define MSG_CMSG_CLOEXEC 0x40000000
      #endif
      
      int
      main (int argc, char *argv[])
      {
        if (argc > 1)
          {
            int fd = atol (argv[1]);
            printf ("child: fd = %d\n", fd);
            if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
              {
                puts ("file descriptor valid in child");
                return 1;
              }
            return 0;
      
          }
      
        struct sockaddr_un sun;
        strcpy (sun.sun_path, "./testsocket");
        sun.sun_family = AF_UNIX;
      
        char databuf[] = "hello";
        struct iovec iov[1];
        iov[0].iov_base = databuf;
        iov[0].iov_len = sizeof (databuf);
      
        union
        {
          struct cmsghdr hdr;
          char bytes[CMSG_SPACE (sizeof (int))];
        } buf;
        struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
                              .msg_control = buf.bytes,
                              .msg_controllen = sizeof (buf) };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);
      
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN (sizeof (int));
      
        msg.msg_controllen = cmsg->cmsg_len;
      
        pid_t child = fork ();
        if (child == -1)
          error (1, errno, "fork");
        if (child == 0)
          {
            int sock = socket (PF_UNIX, SOCK_STREAM, 0);
            if (sock < 0)
              error (1, errno, "socket");
      
            if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
              error (1, errno, "bind");
            if (listen (sock, SOMAXCONN) < 0)
              error (1, errno, "listen");
      
            int conn = accept (sock, NULL, NULL);
            if (conn == -1)
              error (1, errno, "accept");
      
            *(int *) CMSG_DATA (cmsg) = sock;
            if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0)
              error (1, errno, "sendmsg");
      
            return 0;
          }
      
        /* For a test suite this should be more robust like a
           barrier in shared memory.  */
        sleep (1);
      
        int sock = socket (PF_UNIX, SOCK_STREAM, 0);
        if (sock < 0)
          error (1, errno, "socket");
      
        if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
          error (1, errno, "connect");
        unlink (sun.sun_path);
      
        *(int *) CMSG_DATA (cmsg) = -1;
      
        if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0)
          error (1, errno, "recvmsg");
      
        int fd = *(int *) CMSG_DATA (cmsg);
        if (fd == -1)
          error (1, 0, "no descriptor received");
      
        char fdname[20];
        snprintf (fdname, sizeof (fdname), "%d", fd);
        execl ("/proc/self/exe", argv[0], fdname, NULL);
        puts ("execl failed");
        return 1;
      }
      
      [akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Michael Buesch <mb@bu3sch.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. 11 Jul, 2007 1 commit
    • [L2TP]: Changes to existing ppp and socket kernel headers for L2TP · cf14a4d0
      James Chapman committed
      Add struct sockaddr_pppol2tp to carry L2TP-specific address
      information for the PPPoX (PPPoL2TP) socket. Unfortunately we can't
      use the union inside struct sockaddr_pppox because the L2TP-specific
      data is larger than the current size of the union and we must preserve
      the size of struct sockaddr_pppox for binary compatibility.
      
      Also add a PPPIOCGL2TPSTATS ioctl to allow userspace to obtain
      L2TP counters and state from the kernel.
      
      Add new if_pppol2tp.h header.
      
      [ Modified to use aligned_u64 in statistics structure -DaveM ]
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 27 Apr, 2007 1 commit
  29. 01 Mar, 2007 1 commit
  30. 12 Feb, 2007 1 commit
  31. 09 Feb, 2007 1 commit
  32. 03 Dec, 2006 2 commits
    • [NET]: Annotate csum_partial() callers in net/* · 44bb9363
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Supporting UDP-Lite (RFC 3828) in Linux · ba4e58ec
      Gerrit Renker committed
      This is a revision of the previously submitted patch, which alters
      the way files are organized and compiled in the following manner:
      
      	* UDP and UDP-Lite now use separate object files
      	* source file dependencies resolved via header files
      	  net/ipv{4,6}/udp_impl.h
      	* order of inclusion files in udp.c/udplite.c adapted
      	  accordingly
      
      [NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828)
      
      This patch adds support for UDP-Lite to the IPv4 stack, provided as an
      extension to the existing UDPv4 code:
              * generic routines are all located in net/ipv4/udp.c
              * UDP-Lite specific routines are in net/ipv4/udplite.c
              * MIB/statistics support in /proc/net/snmp and /proc/net/udplite
              * shared API with extensions for partial checksum coverage
      
      [NET/IPv6]: Extension for UDP-Lite over IPv6
      
      It extends the existing UDPv6 code base with support for UDP-Lite
      in the same manner as per UDPv4. In particular,
              * UDPv6 generic and shared code is in net/ipv6/udp.c
              * UDP-Litev6 specific extensions are in net/ipv6/udplite.c
              * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6
              * support for IPV6_ADDRFORM
              * aligned the coding style of protocol initialisation with af_inet6.c
              * made the error handling in udpv6_queue_rcv_skb consistent;
                to return `-1' on error on all error cases
              * consolidation of shared code
      
      [NET]: UDP-Lite Documentation and basic XFRM/Netfilter support
      
      The UDP-Lite patch further provides
              * API documentation for UDP-Lite
              * basic xfrm support
              * basic netfilter support for IPv4 and IPv6 (LOG target)
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  33. 25 Apr, 2006 1 commit
  34. 21 Mar, 2006 1 commit
    • [SECURITY]: TCP/UDP getpeersec · 2c7946a7
      Catherine Zhang committed
      This patch implements an application of the LSM-IPSec networking
      controls whereby an application can determine the label of the
      security association its TCP or UDP sockets are currently connected to
      via getsockopt and the auxiliary data mechanism of recvmsg.
      
      Patch purpose:
      
      This patch enables a security-aware application to retrieve the
      security context of an IPSec security association a particular TCP or
      UDP socket is using.  The application can then use this security
      context to determine the security context for processing on behalf of
      the peer at the other end of this connection.  In the case of UDP, the
      security context is for each individual packet.  An example
      application is the inetd daemon, which could be modified to start
      daemons running at security contexts dependent on the remote client.
      
      Patch design approach:
      
      - Design for TCP
      The patch enables the SELinux LSM to set the peer security context for
      a socket based on the security context of the IPSec security
      association.  The application may retrieve this context using
      getsockopt.  When called, the kernel determines if the socket is a
      connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
      cache on the socket to retrieve the security associations.  If a
      security association has a security context, the context string is
      returned, as for UNIX domain sockets.
      
      - Design for UDP
      Unlike TCP, UDP is connectionless.  This requires a somewhat different
      API to retrieve the peer security context.  With TCP, the peer
      security context stays the same throughout the connection, thus it can
      be retrieved at any time between when the connection is established
      and when it is torn down.  With UDP, each read/write can have a
      different peer, and thus the security context might change every time.
      As a result the security context retrieval must be done TOGETHER with
      the packet retrieval.
      
      The solution is to build upon the existing Unix domain socket API for
      retrieving user credentials.  Linux offers the API for obtaining user
      credentials via ancillary messages (i.e., out of band/control messages
      that are bundled together with a normal message).
      
      Patch implementation details:
      
      - Implementation for TCP
      The security context can be retrieved by applications using getsockopt
      with the existing SO_PEERSEC flag.  As an example (ignoring error
      checking):
      
      getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
      printf("Socket peer context is: %s\n", optbuf);
      
      The SELinux function, selinux_socket_getpeersec, is extended to check
      for labeled security associations for connected (TCP_ESTABLISHED ==
      sk->sk_state) TCP sockets only.  If so, the socket has a dst_cache of
      struct dst_entry values that may refer to security associations.  If
      these have security associations with security contexts, the security
      context is returned.
      
      getsockopt returns a buffer that contains a security context string or
      the buffer is unmodified.
      
      - Implementation for UDP
      To retrieve the security context, the application first indicates to
      the kernel such desire by setting the IP_PASSSEC option via
      setsockopt.  Then the application retrieves the security context using
      the auxiliary data mechanism.
      
      An example server application for UDP should look like this:
      
      toggle = 1;
      toggle_len = sizeof(toggle);
      
      setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, toggle_len);
      recvmsg(sockfd, &msg_hdr, 0);
      if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
          cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
          if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
              cmsg_hdr->cmsg_level == SOL_IP &&
              cmsg_hdr->cmsg_type == SCM_SECURITY) {
              memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
          }
      }
      
      ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow
      a server socket to receive the security context of the peer.  A new
      ancillary message type, SCM_SECURITY, is also added.
      
      When the packet is received we get the security context from the
      sec_path pointer which is contained in the sk_buff, and copy it to the
      ancillary message space.  An additional LSM hook,
      selinux_socket_getpeersec_udp, is defined to retrieve the security
      context from the SELinux space.  The existing function,
      selinux_socket_getpeersec does not suit our purpose, because the
      security context is copied directly to user space, rather than to
      kernel space.
      
      Testing:
      
      We have tested the patch by setting up TCP and UDP connections between
      applications on two machines using the IPSec policies that result in
      labeled security associations being built.  For TCP, we can then
      extract the peer security context using getsockopt on either end.  For
      UDP, the receiving end can retrieve the security context using the
      auxiliary data mechanism of recvmsg.
      Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
      Acked-by: James Morris <jmorris@namei.org>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  35. 13 Jan, 2006 1 commit
  36. 04 Jan, 2006 1 commit
  37. 30 Aug, 2005 1 commit