1. 13 10月, 2009 2 次提交
    • A
      net: Introduce recvmmsg socket syscall · a2e27255
      Arnaldo Carvalho de Melo 提交于
      Meaning receive multiple messages, reducing the number of syscalls and
      net stack entry/exit operations.
      
      Next patches will introduce mechanisms where protocols that want to
      optimize this operation will provide an unlocked_recvmsg operation.
      
      This takes into account comments made by:
      
      . Paul Moore: sock_recvmsg is called only for the first datagram,
        sock_recvmsg_nosec is used for the rest.
      
      . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
        works in the same fashion as the ppoll one.
      
        If the underlying protocol returns a datagram with MSG_OOB set, this
        will make recvmmsg return right away with as many datagrams (+ the OOB
        one) it has received so far.
      
      . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
        datagrams and then recvmsg returns an error, recvmmsg will return
        the successfully received datagrams, store the error and return it
        in the next call.
      
      This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
      where we will be able to acquire the lock only at batch start and end, not at
      every underlying recvmsg call.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2e27255
    • N
      net: Generalize socket rx gap / receive queue overflow cmsg · 3b885787
      Neil Horman 提交于
      Create a new socket level option to report number of queue overflows
      
      Recently I augmented the AF_PACKET protocol to report the number of frames lost
      on the socket receive queue between any two enqueued frames.  This value was
      exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
      requested that this feature be generalized so that any datagram oriented socket
      could make use of this option.  As such I've created this patch, It creates a
      new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
      SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
      overflowed between any two given frames.  It also augments the AF_PACKET
      protocol to take advantage of this new feature (as it previously did not touch
      sk->sk_drops, which this patch uses to record the overflow count).  Tested
      successfully by me.
      
      Notes:
      
      1) Unlike my previous patch, this patch simply records the sk_drops value, which
      is not a number of drops between packets, but rather a total number of drops.
      Deltas must be computed in user space.
      
      2) While this patch currently works with datagram oriented protocols, it will
      also be accepted by non-datagram oriented protocols. I'm not sure if thats
      agreeable to everyone, but my argument in favor of doing so is that, for those
      protocols which aren't applicable to this option, sk_drops will always be zero,
      and reporting no drops on a receive queue that isn't used for those
      non-participating protocols seems reasonable to me.  This also saves us having
      to code in a per-protocol opt in mechanism.
      
      3) This applies cleanly to net-next assuming that commit
      97775007 (my af packet cmsg patch) is reverted
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b885787
  2. 08 10月, 2009 1 次提交
    • J
      wext: refactor · 3d23e349
      Johannes Berg 提交于
      Refactor wext to
       * split out iwpriv handling
       * split out iwspy handling
       * split out procfs support
       * allow cfg80211 to have wireless extensions compat code
         w/o CONFIG_WIRELESS_EXT
      
      After this, drivers need to
       - select WIRELESS_EXT	- for wext support
       - select WEXT_PRIV	- for iwpriv support
       - select WEXT_SPY	- for iwspy support
      
      except cfg80211 -- which gets new hooks in wext-core.c
      and can then get wext handlers without CONFIG_WIRELESS_EXT.
      
      Wireless extensions procfs support is auto-selected
      based on PROC_FS and anything that requires the wext core
      (i.e. WIRELESS_EXT or CFG80211_WEXT).
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      3d23e349
  3. 07 10月, 2009 1 次提交
    • E
      net: speedup sk_wake_async() · bcdce719
      Eric Dumazet 提交于
      An incoming datagram must bring into cpu cache *lot* of cache lines,
      in particular : (other parts omitted (hash chains, ip route cache...))
      
      On 32bit arches :
      
      offsetof(struct sock, sk_rcvbuf)       =0x30    (read)
      offsetof(struct sock, sk_lock)         =0x34   (rw)
      
      offsetof(struct sock, sk_sleep)        =0x50 (read)
      offsetof(struct sock, sk_rmem_alloc)   =0x64   (rw)
      offsetof(struct sock, sk_receive_queue)=0x74   (rw)
      
      offsetof(struct sock, sk_forward_alloc)=0x98   (rw)
      
      offsetof(struct sock, sk_callback_lock)=0xcc    (rw)
      offsetof(struct sock, sk_drops)        =0xd8 (read if we add dropcount support, rw if frame dropped)
      offsetof(struct sock, sk_filter)       =0xf8    (read)
      
      offsetof(struct sock, sk_socket)       =0x138 (read)
      
      offsetof(struct sock, sk_data_ready)   =0x15c   (read)
      
      
      We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
      with no fasync() structures. (socket->fasync_list ptr is probably already in cache
      because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)
      
      This avoids one cache line load per incoming packet for common cases (no fasync())
      
      We can leave (or even move in a future patch) sk->sk_socket in a cold location
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bcdce719
  4. 01 10月, 2009 1 次提交
  5. 29 9月, 2009 1 次提交
    • A
      net: Add explicit bound checks in net/socket.c · 47379052
      Arjan van de Ven 提交于
      The sys_socketcall() function has a very clever system for the copy
      size of its arguments. Unfortunately, gcc cannot deal with this in
      terms of proving that the copy_from_user() is then always in bounds.
      This is the last (well 9th of this series, but last in the kernel) such
      case around.
      
      With this patch, we can turn on code to make having the boundary provably
      right for the whole kernel, and detect introduction of new security
      accidents of this type early on.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47379052
  6. 23 9月, 2009 1 次提交
  7. 22 9月, 2009 1 次提交
  8. 15 9月, 2009 1 次提交
  9. 14 8月, 2009 1 次提交
  10. 05 4月, 2009 1 次提交
    • E
      socket: use percpu_add() while updating sockets_in_use · 4e69489a
      Eric Dumazet 提交于
      sock_alloc() currently uses following code to update sockets_in_use
      
      get_cpu_var(sockets_in_use)++;
      put_cpu_var(sockets_in_use);
      
      This translates to :
      
      c0436274:       b8 01 00 00 00          mov    $0x1,%eax
      c0436279:       e8 42 40 df ff          call   c022a2c0 <add_preempt_count>
      c043627e:       bb 20 4f 6a c0          mov    $0xc06a4f20,%ebx
      c0436283:       e8 18 ca f0 ff          call   c0342ca0 <debug_smp_processor_id>
      c0436288:       03 1c 85 60 4a 65 c0    add    -0x3f9ab5a0(,%eax,4),%ebx
      c043628f:       ff 03                   incl   (%ebx)
      c0436291:       b8 01 00 00 00          mov    $0x1,%eax
      c0436296:       e8 75 3f df ff          call   c022a210 <sub_preempt_count>
      c043629b:       89 e0                   mov    %esp,%eax
      c043629d:       25 00 e0 ff ff          and    $0xffffe000,%eax
      c04362a2:       f6 40 08 08             testb  $0x8,0x8(%eax)
      c04362a6:       75 07                   jne    c04362af <sock_alloc+0x7f>
      c04362a8:       8d 46 d8                lea    -0x28(%esi),%eax
      c04362ab:       5b                      pop    %ebx
      c04362ac:       5e                      pop    %esi
      c04362ad:       c9                      leave
      c04362ae:       c3                      ret
      c04362af:       e8 cc 5d 09 00          call   c04cc080 <preempt_schedule>
      c04362b4:       8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
      c04362b8:       eb ee                   jmp    c04362a8 <sock_alloc+0x78>
      
      While percpu_add(sockets_in_use, 1) translates to a single instruction :
      
      c0436275:   64 83 05 20 5f 6a c0    addl   $0x1,%fs:0xc06a5f20
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e69489a
  11. 28 3月, 2009 2 次提交
  12. 16 3月, 2009 1 次提交
    • J
      Move FASYNC bit handling to f_op->fasync() · 76398425
      Jonathan Corbet 提交于
      Removing the BKL from FASYNC handling ran into the challenge of keeping the
      setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
      the underlying fasync() function.  Andi Kleen suggested moving the handling
      of that bit into fasync(); this patch does exactly that.  As a result, we
      have a couple of internal API changes: fasync() must now manage the FASYNC
      bit, and it will be called without the BKL held.
      
      As it happens, every fasync() implementation in the kernel with one
      exception calls fasync_helper().  So, if we make fasync_helper() set the
      FASYNC bit, we can avoid making any changes to the other fasync()
      functions - as long as those functions, themselves, have proper locking.
      Most fasync() implementations do nothing but call fasync_helper() - which
      has its own lock - so they are easily verified as correct.  The BKL had
      already been pushed down into the rest.
      
      The networking code has its own version of fasync_helper(), so that code
      has been augmented with explicit FASYNC bit handling.
      
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJonathan Corbet <corbet@lwn.net>
      76398425
  13. 16 2月, 2009 1 次提交
  14. 14 1月, 2009 3 次提交
  15. 05 1月, 2009 2 次提交
  16. 19 12月, 2008 1 次提交
    • W
      net: Fix module refcount leak in kernel_accept() · 1b08534e
      Wei Yongjun 提交于
      The kernel_accept() does not hold the module refcount of newsock->ops->owner,
      so we need __module_get(newsock->ops->owner) code after call kernel_accept()
      by hand.
      In sunrpc, the module refcount is missing to hold. So this cause kernel panic.
      
      Used following script to reproduct:
      
      while [ 1 ];
      do
          mount -t nfs4 192.168.0.19:/ /mnt
          touch /mnt/file
          umount /mnt
          lsmod | grep ipv6
      done
      
      This patch fixed the problem by add __module_get(newsock->ops->owner) to
      kernel_accept(). So we do not need to used __module_get(newsock->ops->owner)
      in every place when used kernel_accept().
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b08534e
  17. 20 11月, 2008 1 次提交
    • U
      reintroduce accept4 · de11defe
      Ulrich Drepper 提交于
      Introduce a new accept4() system call.  The addition of this system call
      matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(),
      inotify_init1(), epoll_create1(), pipe2()) which added new system calls
      that differed from analogous traditional system calls in adding a flags
      argument that can be used to access additional functionality.
      
      The accept4() system call is exactly the same as accept(), except that
      it adds a flags bit-mask argument.  Two flags are initially implemented.
      (Most of the new system calls in 2.6.27 also had both of these flags.)
      
      SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
      for the new file descriptor returned by accept4().  This is a useful
      security feature to avoid leaking information in a multithreaded
      program where one thread is doing an accept() at the same time as
      another thread is doing a fork() plus exec().  More details here:
      http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling",
      Ulrich Drepper).
      
      The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
      to be enabled on the new open file description created by accept4().
      (This flag is merely a convenience, saving the use of additional calls
      fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result.
      
      Here's a test program.  Works on x86-32.  Should work on x86-64, but
      I (mtk) don't have a system to hand to test with.
      
      It tests accept4() with each of the four possible combinations of
      SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
      that the appropriate flags are set on the file descriptor/open file
      description returned by accept4().
      
      I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
      and it passes according to my test program.
      
      /* test_accept4.c
      
        Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk
             <mtk.manpages@gmail.com>
      
        Licensed under the GNU GPLv2 or later.
      */
      #define _GNU_SOURCE
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      
      #define PORT_NUM 33333
      
      #define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
      
      /**********************************************************************/
      
      /* The following is what we need until glibc gets a wrapper for
        accept4() */
      
      /* Flags for socket(), socketpair(), accept4() */
      #ifndef SOCK_CLOEXEC
      #define SOCK_CLOEXEC    O_CLOEXEC
      #endif
      #ifndef SOCK_NONBLOCK
      #define SOCK_NONBLOCK   O_NONBLOCK
      #endif
      
      #ifdef __x86_64__
      #define SYS_accept4 288
      #elif __i386__
      #define USE_SOCKETCALL 1
      #define SYS_ACCEPT4 18
      #else
      #error "Sorry -- don't know the syscall # on this architecture"
      #endif
      
      static int
      accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
      {
         printf("Calling accept4(): flags = %x", flags);
         if (flags != 0) {
             printf(" (");
             if (flags & SOCK_CLOEXEC)
                 printf("SOCK_CLOEXEC");
             if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
                 printf(" ");
             if (flags & SOCK_NONBLOCK)
                 printf("SOCK_NONBLOCK");
             printf(")");
         }
         printf("\n");
      
      #if USE_SOCKETCALL
         long args[6];
      
         args[0] = fd;
         args[1] = (long) sockaddr;
         args[2] = (long) addrlen;
         args[3] = flags;
      
         return syscall(SYS_socketcall, SYS_ACCEPT4, args);
      #else
         return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
      #endif
      }
      
      /**********************************************************************/
      
      static int
      do_test(int lfd, struct sockaddr_in *conn_addr,
             int closeonexec_flag, int nonblock_flag)
      {
         int connfd, acceptfd;
         int fdf, flf, fdf_pass, flf_pass;
         struct sockaddr_in claddr;
         socklen_t addrlen;
      
         printf("=======================================\n");
      
         connfd = socket(AF_INET, SOCK_STREAM, 0);
         if (connfd == -1)
             die("socket");
         if (connect(connfd, (struct sockaddr *) conn_addr,
                     sizeof(struct sockaddr_in)) == -1)
             die("connect");
      
         addrlen = sizeof(struct sockaddr_in);
         acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
                            closeonexec_flag | nonblock_flag);
         if (acceptfd == -1) {
             perror("accept4()");
             close(connfd);
             return 0;
         }
      
         fdf = fcntl(acceptfd, F_GETFD);
         if (fdf == -1)
             die("fcntl:F_GETFD");
         fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
                    ((closeonexec_flag & SOCK_CLOEXEC) != 0);
         printf("Close-on-exec flag is %sset (%s); ",
                 (fdf & FD_CLOEXEC) ? "" : "not ",
                 fdf_pass ? "OK" : "failed");
      
         flf = fcntl(acceptfd, F_GETFL);
         if (flf == -1)
             die("fcntl:F_GETFD");
         flf_pass = ((flf & O_NONBLOCK) != 0) ==
                    ((nonblock_flag & SOCK_NONBLOCK) !=0);
         printf("nonblock flag is %sset (%s)\n",
                 (flf & O_NONBLOCK) ? "" : "not ",
                 flf_pass ? "OK" : "failed");
      
         close(acceptfd);
         close(connfd);
      
         printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
         return fdf_pass && flf_pass;
      }
      
      static int
      create_listening_socket(int port_num)
      {
         struct sockaddr_in svaddr;
         int lfd;
         int optval;
      
         memset(&svaddr, 0, sizeof(struct sockaddr_in));
         svaddr.sin_family = AF_INET;
         svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
         svaddr.sin_port = htons(port_num);
      
         lfd = socket(AF_INET, SOCK_STREAM, 0);
         if (lfd == -1)
             die("socket");
      
         optval = 1;
         if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
                        sizeof(optval)) == -1)
             die("setsockopt");
      
         if (bind(lfd, (struct sockaddr *) &svaddr,
                  sizeof(struct sockaddr_in)) == -1)
             die("bind");
      
         if (listen(lfd, 5) == -1)
             die("listen");
      
         return lfd;
      }
      
      int
      main(int argc, char *argv[])
      {
         struct sockaddr_in conn_addr;
         int lfd;
         int port_num;
         int passed;
      
         passed = 1;
      
         port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;
      
         memset(&conn_addr, 0, sizeof(struct sockaddr_in));
         conn_addr.sin_family = AF_INET;
         conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
         conn_addr.sin_port = htons(port_num);
      
         lfd = create_listening_socket(port_num);
      
         if (!do_test(lfd, &conn_addr, 0, 0))
             passed = 0;
         if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
             passed = 0;
         if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
             passed = 0;
         if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
             passed = 0;
      
         close(lfd);
      
         exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
      }
      
      [mtk.manpages@gmail.com: rewrote changelog, updated test program]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Tested-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Cc: <linux-api@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de11defe
  18. 14 11月, 2008 1 次提交
  19. 04 11月, 2008 1 次提交
  20. 02 11月, 2008 1 次提交
    • A
      saner FASYNC handling on file close · 233e70f4
      Al Viro 提交于
      As it is, all instances of ->release() for files that have ->fasync()
      need to remember to evict file from fasync lists; forgetting that
      creates a hole and we actually have a bunch that *does* forget.
      
      So let's keep our lives simple - let __fput() check FASYNC in
      file->f_flags and call ->fasync() there if it's been set.  And lose that
      crap in ->release() instances - leaving it there is still valid, but we
      don't have to bother anymore.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      233e70f4
  21. 17 10月, 2008 1 次提交
  22. 23 9月, 2008 1 次提交
    • M
      sys_paccept: disable paccept() until API design is resolved · 2d4c8266
      Michael Kerrisk 提交于
      The reasons for disabling paccept() are as follows:
      
      * The API is more complex than needed.  There is AFAICS no demonstrated
        use case that the sigset argument of this syscall serves that couldn't
        equally be served by the use of pselect/ppoll/epoll_pwait + traditional
        accept().  Roland seems to concur with this opinion
        (http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255).  I
        have (more than once) asked Ulrich to explain otherwise
        (http://thread.gmane.org/gmane.linux.kernel/723952/focus=731018), but he
        does not respond, so one is left to assume that he doesn't know of such
        a case.
      
      * The use of a sigset argument is not consistent with other I/O APIs
        that can block on a single file descriptor (e.g., read(), recv(),
        connect()).
      
      * The behavior of paccept() when interrupted by a signal is IMO strange:
        the kernel restarts the system call if SA_RESTART was set for the
        handler.  I think that it should not do this -- that it should behave
        consistently with paccept()/ppoll()/epoll_pwait(), which never restart,
        regardless of SA_RESTART.  The reasoning here is that the very purpose
        of paccept() is to wait for a connection or a signal, and that
        restarting in the latter case is probably never useful.  (Note: Roland
        disagrees on this point, believing that rather paccept() should be
        consistent with accept() in its behavior wrt EINTR
        (http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255).)
      
      I believe that instead, a simpler API, consistent with Ulrich's other
      recent additions, is preferable:
      
      accept4(int fd, struct sockaddr *sa, socklen_t *salen, ind flags);
      
      (This simpler API was originally proposed by Ulrich:
      http://thread.gmane.org/gmane.linux.network/92072)
      
      If this simpler API is added, then if we later decide that the sigset
      argument really is required, then a suitable bit in 'flags' could be added
      to indicate the presence of the sigset argument.
      
      At this point, I am hoping we either will get a counter-argument from
      Ulrich about why we really do need paccept()'s sigset argument, or that he
      will resubmit the original accept4() patch.
      Signed-off-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Alan Cox <alan@redhat.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2d4c8266
  23. 27 7月, 2008 1 次提交
  24. 25 7月, 2008 5 次提交
    • U
      flag parameters: check magic constants · e38b36f3
      Ulrich Drepper 提交于
      This patch adds test that ensure the boundary conditions for the various
      constants introduced in the previous patches is met.  No code is generated.
      
      [akpm@linux-foundation.org: fix alpha]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e38b36f3
    • U
      flag parameters: NONBLOCK in socket and socketpair · 77d27200
      Ulrich Drepper 提交于
      This patch introduces support for the SOCK_NONBLOCK flag in socket,
      socketpair, and  paccept.  To do this the internal function sock_attach_fd
      gets an additional parameter which it uses to set the appropriate flag for
      the file descriptor.
      
      Given that in modern, scalable programs almost all socket connections are
      non-blocking and the minimal additional cost for the new functionality
      I see no reason not to add this code.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <netinet/in.h>
      #include <sys/socket.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_paccept
      # ifdef __x86_64__
      #  define __NR_paccept 288
      # elif defined __i386__
      #  define SYS_PACCEPT 18
      #  define USE_SOCKETCALL 1
      # else
      #  error "need __NR_paccept"
      # endif
      #endif
      
      #ifdef USE_SOCKETCALL
      # define paccept(fd, addr, addrlen, mask, flags) \
        ({ long args[6] = { \
             (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
           syscall (__NR_socketcall, SYS_PACCEPT, args); })
      #else
      # define paccept(fd, addr, addrlen, mask, flags) \
        syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
      #endif
      
      #define PORT 57392
      
      #define SOCK_NONBLOCK O_NONBLOCK
      
      static pthread_barrier_t b;
      
      static void *
      tf (void *arg)
      {
        pthread_barrier_wait (&b);
        int s = socket (AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in sin;
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
        sin.sin_port = htons (PORT);
        connect (s, (const struct sockaddr *) &sin, sizeof (sin));
        close (s);
        pthread_barrier_wait (&b);
      
        pthread_barrier_wait (&b);
        s = socket (AF_INET, SOCK_STREAM, 0);
        sin.sin_port = htons (PORT);
        connect (s, (const struct sockaddr *) &sin, sizeof (sin));
        close (s);
        pthread_barrier_wait (&b);
      
        return NULL;
      }
      
      int
      main (void)
      {
        int fd;
        fd = socket (PF_INET, SOCK_STREAM, 0);
        if (fd == -1)
          {
            puts ("socket(0) failed");
            return 1;
          }
        int fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (fl & O_NONBLOCK)
          {
            puts ("socket(0) set non-blocking mode");
            return 1;
          }
        close (fd);
      
        fd = socket (PF_INET, SOCK_STREAM|SOCK_NONBLOCK, 0);
        if (fd == -1)
          {
            puts ("socket(SOCK_NONBLOCK) failed");
            return 1;
          }
        fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((fl & O_NONBLOCK) == 0)
          {
            puts ("socket(SOCK_NONBLOCK) does not set non-blocking mode");
            return 1;
          }
        close (fd);
      
        int fds[2];
        if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1)
          {
            puts ("socketpair(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (fl & O_NONBLOCK)
              {
                printf ("socketpair(0) set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_NONBLOCK, 0, fds) == -1)
          {
            puts ("socketpair(SOCK_NONBLOCK) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((fl & O_NONBLOCK) == 0)
              {
                printf ("socketpair(SOCK_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        pthread_barrier_init (&b, NULL, 2);
      
        struct sockaddr_in sin;
        pthread_t th;
        if (pthread_create (&th, NULL, tf, NULL) != 0)
          {
            puts ("pthread_create failed");
            return 1;
          }
      
        int s = socket (AF_INET, SOCK_STREAM, 0);
        int reuse = 1;
        setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
        sin.sin_port = htons (PORT);
        bind (s, (struct sockaddr *) &sin, sizeof (sin));
        listen (s, SOMAXCONN);
      
        pthread_barrier_wait (&b);
      
        int s2 = paccept (s, NULL, 0, NULL, 0);
        if (s2 < 0)
          {
            puts ("paccept(0) failed");
            return 1;
          }
      
        fl = fcntl (s2, F_GETFL);
        if (fl & O_NONBLOCK)
          {
            puts ("paccept(0) set non-blocking mode");
            return 1;
          }
        close (s2);
        close (s);
      
        pthread_barrier_wait (&b);
      
        s = socket (AF_INET, SOCK_STREAM, 0);
        sin.sin_port = htons (PORT);
        setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
        bind (s, (struct sockaddr *) &sin, sizeof (sin));
        listen (s, SOMAXCONN);
      
        pthread_barrier_wait (&b);
      
        s2 = paccept (s, NULL, 0, NULL, SOCK_NONBLOCK);
        if (s2 < 0)
          {
            puts ("paccept(SOCK_NONBLOCK) failed");
            return 1;
          }
      
        fl = fcntl (s2, F_GETFL);
        if ((fl & O_NONBLOCK) == 0)
          {
            puts ("paccept(SOCK_NONBLOCK) does not set non-blocking mode");
            return 1;
          }
        close (s2);
        close (s);
      
        pthread_barrier_wait (&b);
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      77d27200
    • U
      flag parameters: paccept w/out set_restore_sigmask · c019bbc6
      Ulrich Drepper 提交于
      Some platforms do not have support to restore the signal mask in the
      return path from a syscall.  For those platforms syscalls like pselect are
      not defined at all.  This is, I think, not a good choice for paccept()
      since paccept() adds more value on top of accept() than just the signal
      mask handling.
      
      Therefore this patch defines a scaled down version of the sys_paccept
      function for those platforms.  It returns -EINVAL in case the signal mask
      is non-NULL but behaves the same otherwise.
      
      Note that I explicitly included <linux/thread_info.h>.  I saw that it is
      currently included but indirectly two levels down.  There is too much risk
      in relying on this.  The header might change and then suddenly the
      function definition would change without anyone immediately noticing.
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c019bbc6
    • U
      flag parameters: paccept · aaca0bdc
      Ulrich Drepper 提交于
      This patch is by far the most complex in the series.  It adds a new syscall
      paccept.  This syscall differs from accept in that it adds (at the userlevel)
      two additional parameters:
      
      - a signal mask
      - a flags value
      
      The flags parameter can be used to set flag like SOCK_CLOEXEC.  This is
      imlpemented here as well.  Some people argued that this is a property which
      should be inherited from the file desriptor for the server but this is against
      POSIX.  Additionally, we really want the signal mask parameter as well
      (similar to pselect, ppoll, etc).  So an interface change in inevitable.
      
      The flag value is the same as for socket and socketpair.  I think diverging
      here will only create confusion.  Similar to the filesystem interfaces where
      the use of the O_* constants differs, it is acceptable here.
      
      The signal mask is handled as for pselect etc.  The mask is temporarily
      installed for the thread and removed before the call returns.  I modeled the
      code after pselect.  If there is a problem it's likely also in pselect.
      
      For architectures which use socketcall I maintained this interface instead of
      adding a system call.  The symmetry shouldn't be broken.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <errno.h>
      #include <fcntl.h>
      #include <pthread.h>
      #include <signal.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <netinet/in.h>
      #include <sys/socket.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_paccept
      # ifdef __x86_64__
      #  define __NR_paccept 288
      # elif defined __i386__
      #  define SYS_PACCEPT 18
      #  define USE_SOCKETCALL 1
      # else
      #  error "need __NR_paccept"
      # endif
      #endif
      
      #ifdef USE_SOCKETCALL
      # define paccept(fd, addr, addrlen, mask, flags) \
        ({ long args[6] = { \
             (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
           syscall (__NR_socketcall, SYS_PACCEPT, args); })
      #else
      # define paccept(fd, addr, addrlen, mask, flags) \
        syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
      #endif
      
      #define PORT 57392
      
      #define SOCK_CLOEXEC O_CLOEXEC
      
      static pthread_barrier_t b;
      
      static void *
      tf (void *arg)
      {
        pthread_barrier_wait (&b);
        int s = socket (AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in sin;
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
        sin.sin_port = htons (PORT);
        connect (s, (const struct sockaddr *) &sin, sizeof (sin));
        close (s);
      
        pthread_barrier_wait (&b);
        s = socket (AF_INET, SOCK_STREAM, 0);
        sin.sin_port = htons (PORT);
        connect (s, (const struct sockaddr *) &sin, sizeof (sin));
        close (s);
        pthread_barrier_wait (&b);
      
        pthread_barrier_wait (&b);
        sleep (2);
        pthread_kill ((pthread_t) arg, SIGUSR1);
      
        return NULL;
      }
      
      static void
      handler (int s)
      {
      }
      
      int
      main (void)
      {
        pthread_barrier_init (&b, NULL, 2);
      
        struct sockaddr_in sin;
        pthread_t th;
        if (pthread_create (&th, NULL, tf, (void *) pthread_self ()) != 0)
          {
            puts ("pthread_create failed");
            return 1;
          }
      
        int s = socket (AF_INET, SOCK_STREAM, 0);
        int reuse = 1;
        setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
        sin.sin_port = htons (PORT);
        bind (s, (struct sockaddr *) &sin, sizeof (sin));
        listen (s, SOMAXCONN);
      
        pthread_barrier_wait (&b);
      
        int s2 = paccept (s, NULL, 0, NULL, 0);
        if (s2 < 0)
          {
            puts ("paccept(0) failed");
            return 1;
          }
      
        int coe = fcntl (s2, F_GETFD);
        if (coe & FD_CLOEXEC)
          {
            puts ("paccept(0) set close-on-exec-flag");
            return 1;
          }
        close (s2);
      
        pthread_barrier_wait (&b);
      
        s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC);
        if (s2 < 0)
          {
            puts ("paccept(SOCK_CLOEXEC) failed");
            return 1;
          }
      
        coe = fcntl (s2, F_GETFD);
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (s2);
      
        pthread_barrier_wait (&b);
      
        struct sigaction sa;
        sa.sa_handler = handler;
        sa.sa_flags = 0;
        sigemptyset (&sa.sa_mask);
        sigaction (SIGUSR1, &sa, NULL);
      
        sigset_t ss;
        pthread_sigmask (SIG_SETMASK, NULL, &ss);
        sigaddset (&ss, SIGUSR1);
        pthread_sigmask (SIG_SETMASK, &ss, NULL);
      
        sigdelset (&ss, SIGUSR1);
        alarm (4);
        pthread_barrier_wait (&b);
      
        errno = 0 ;
        s2 = paccept (s, NULL, 0, &ss, 0);
        if (s2 != -1 || errno != EINTR)
          {
            puts ("paccept did not fail with EINTR");
            return 1;
          }
      
        close (s);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: make it compile]
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aaca0bdc
    • U
      flag parameters: socket and socketpair · a677a039
      Ulrich Drepper 提交于
      This patch adds support for flag values which are ORed to the type passwd
      to socket and socketpair.  The additional code is minimal.  The flag
      values in this implementation can and must match the O_* flags.  This
      avoids overhead in the conversion.
      
      The internal functions sock_alloc_fd and sock_map_fd get a new parameters
      and all callers are changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <netinet/in.h>
      #include <sys/socket.h>
      
      #define PORT 57392
      
      /* For Linux these must be the same.  */
      #define SOCK_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd;
        fd = socket (PF_INET, SOCK_STREAM, 0);
        if (fd == -1)
          {
            puts ("socket(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("socket(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = socket (PF_INET, SOCK_STREAM|SOCK_CLOEXEC, 0);
        if (fd == -1)
          {
            puts ("socket(SOCK_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("socket(SOCK_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        int fds[2];
        if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1)
          {
            puts ("socketpair(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            coe = fcntl (fds[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (coe & FD_CLOEXEC)
              {
                printf ("socketpair(0) set close-on-exec flag for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, fds) == -1)
          {
            puts ("socketpair(SOCK_CLOEXEC) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            coe = fcntl (fds[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((coe & FD_CLOEXEC) == 0)
              {
                printf ("socketpair(SOCK_CLOEXEC) does not set close-on-exec flag for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a677a039
  25. 20 7月, 2008 1 次提交
  26. 17 6月, 2008 1 次提交
  27. 23 4月, 2008 1 次提交
  28. 01 4月, 2008 1 次提交
  29. 26 3月, 2008 1 次提交
  30. 22 3月, 2008 1 次提交
  31. 15 2月, 2008 1 次提交