1. 14 8月, 2009 1 次提交
  2. 05 4月, 2009 1 次提交
    • E
      socket: use percpu_add() while updating sockets_in_use · 4e69489a
      Eric Dumazet 提交于
      sock_alloc() currently uses following code to update sockets_in_use
      
      get_cpu_var(sockets_in_use)++;
      put_cpu_var(sockets_in_use);
      
      This translates to :
      
      c0436274:       b8 01 00 00 00          mov    $0x1,%eax
      c0436279:       e8 42 40 df ff          call   c022a2c0 <add_preempt_count>
      c043627e:       bb 20 4f 6a c0          mov    $0xc06a4f20,%ebx
      c0436283:       e8 18 ca f0 ff          call   c0342ca0 <debug_smp_processor_id>
      c0436288:       03 1c 85 60 4a 65 c0    add    -0x3f9ab5a0(,%eax,4),%ebx
      c043628f:       ff 03                   incl   (%ebx)
      c0436291:       b8 01 00 00 00          mov    $0x1,%eax
      c0436296:       e8 75 3f df ff          call   c022a210 <sub_preempt_count>
      c043629b:       89 e0                   mov    %esp,%eax
      c043629d:       25 00 e0 ff ff          and    $0xffffe000,%eax
      c04362a2:       f6 40 08 08             testb  $0x8,0x8(%eax)
      c04362a6:       75 07                   jne    c04362af <sock_alloc+0x7f>
      c04362a8:       8d 46 d8                lea    -0x28(%esi),%eax
      c04362ab:       5b                      pop    %ebx
      c04362ac:       5e                      pop    %esi
      c04362ad:       c9                      leave
      c04362ae:       c3                      ret
      c04362af:       e8 cc 5d 09 00          call   c04cc080 <preempt_schedule>
      c04362b4:       8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
      c04362b8:       eb ee                   jmp    c04362a8 <sock_alloc+0x78>
      
      While percpu_add(sockets_in_use, 1) translates to a single instruction :
      
      c0436275:   64 83 05 20 5f 6a c0    addl   $0x1,%fs:0xc06a5f20
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e69489a
  3. 28 3月, 2009 2 次提交
  4. 16 3月, 2009 1 次提交
    • J
      Move FASYNC bit handling to f_op->fasync() · 76398425
      Jonathan Corbet 提交于
      Removing the BKL from FASYNC handling ran into the challenge of keeping the
      setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
      the underlying fasync() function.  Andi Kleen suggested moving the handling
      of that bit into fasync(); this patch does exactly that.  As a result, we
      have a couple of internal API changes: fasync() must now manage the FASYNC
      bit, and it will be called without the BKL held.
      
      As it happens, every fasync() implementation in the kernel with one
      exception calls fasync_helper().  So, if we make fasync_helper() set the
      FASYNC bit, we can avoid making any changes to the other fasync()
      functions - as long as those functions, themselves, have proper locking.
      Most fasync() implementations do nothing but call fasync_helper() - which
      has its own lock - so they are easily verified as correct.  The BKL had
      already been pushed down into the rest.
      
      The networking code has its own version of fasync_helper(), so that code
      has been augmented with explicit FASYNC bit handling.
      
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJonathan Corbet <corbet@lwn.net>
      76398425
  5. 16 2月, 2009 1 次提交
  6. 14 1月, 2009 3 次提交
  7. 05 1月, 2009 2 次提交
  8. 19 12月, 2008 1 次提交
    • W
      net: Fix module refcount leak in kernel_accept() · 1b08534e
      Wei Yongjun 提交于
      The kernel_accept() does not hold the module refcount of newsock->ops->owner,
      so we need __module_get(newsock->ops->owner) code after call kernel_accept()
      by hand.
      In sunrpc, the module refcount is missing to hold. So this cause kernel panic.
      
      Used following script to reproduct:
      
      while [ 1 ];
      do
          mount -t nfs4 192.168.0.19:/ /mnt
          touch /mnt/file
          umount /mnt
          lsmod | grep ipv6
      done
      
      This patch fixed the problem by add __module_get(newsock->ops->owner) to
      kernel_accept(). So we do not need to used __module_get(newsock->ops->owner)
      in every place when used kernel_accept().
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b08534e
  9. 20 11月, 2008 1 次提交
    • U
      reintroduce accept4 · de11defe
      Ulrich Drepper 提交于
      Introduce a new accept4() system call.  The addition of this system call
      matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(),
      inotify_init1(), epoll_create1(), pipe2()) which added new system calls
      that differed from analogous traditional system calls in adding a flags
      argument that can be used to access additional functionality.
      
      The accept4() system call is exactly the same as accept(), except that
      it adds a flags bit-mask argument.  Two flags are initially implemented.
      (Most of the new system calls in 2.6.27 also had both of these flags.)
      
      SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
      for the new file descriptor returned by accept4().  This is a useful
      security feature to avoid leaking information in a multithreaded
      program where one thread is doing an accept() at the same time as
      another thread is doing a fork() plus exec().  More details here:
      http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling",
      Ulrich Drepper).
      
      The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
      to be enabled on the new open file description created by accept4().
      (This flag is merely a convenience, saving the use of additional calls
      fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result.
      
      Here's a test program.  Works on x86-32.  Should work on x86-64, but
      I (mtk) don't have a system to hand to test with.
      
      It tests accept4() with each of the four possible combinations of
      SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
      that the appropriate flags are set on the file descriptor/open file
      description returned by accept4().
      
      I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
      and it passes according to my test program.
      
      /* test_accept4.c
      
        Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk
             <mtk.manpages@gmail.com>
      
        Licensed under the GNU GPLv2 or later.
      */
      #define _GNU_SOURCE
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      
      #define PORT_NUM 33333
      
      #define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
      
      /**********************************************************************/
      
      /* The following is what we need until glibc gets a wrapper for
        accept4() */
      
      /* Flags for socket(), socketpair(), accept4() */
      #ifndef SOCK_CLOEXEC
      #define SOCK_CLOEXEC    O_CLOEXEC
      #endif
      #ifndef SOCK_NONBLOCK
      #define SOCK_NONBLOCK   O_NONBLOCK
      #endif
      
      #ifdef __x86_64__
      #define SYS_accept4 288
      #elif __i386__
      #define USE_SOCKETCALL 1
      #define SYS_ACCEPT4 18
      #else
      #error "Sorry -- don't know the syscall # on this architecture"
      #endif
      
      static int
      accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
      {
         printf("Calling accept4(): flags = %x", flags);
         if (flags != 0) {
             printf(" (");
             if (flags & SOCK_CLOEXEC)
                 printf("SOCK_CLOEXEC");
             if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
                 printf(" ");
             if (flags & SOCK_NONBLOCK)
                 printf("SOCK_NONBLOCK");
             printf(")");
         }
         printf("\n");
      
      #if USE_SOCKETCALL
         long args[6];
      
         args[0] = fd;
         args[1] = (long) sockaddr;
         args[2] = (long) addrlen;
         args[3] = flags;
      
         return syscall(SYS_socketcall, SYS_ACCEPT4, args);
      #else
         return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
      #endif
      }
      
      /**********************************************************************/
      
      static int
      do_test(int lfd, struct sockaddr_in *conn_addr,
             int closeonexec_flag, int nonblock_flag)
      {
         int connfd, acceptfd;
         int fdf, flf, fdf_pass, flf_pass;
         struct sockaddr_in claddr;
         socklen_t addrlen;
      
         printf("=======================================\n");
      
         connfd = socket(AF_INET, SOCK_STREAM, 0);
         if (connfd == -1)
             die("socket");
         if (connect(connfd, (struct sockaddr *) conn_addr,
                     sizeof(struct sockaddr_in)) == -1)
             die("connect");
      
         addrlen = sizeof(struct sockaddr_in);
         acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
                            closeonexec_flag | nonblock_flag);
         if (acceptfd == -1) {
             perror("accept4()");
             close(connfd);
             return 0;
         }
      
         fdf = fcntl(acceptfd, F_GETFD);
         if (fdf == -1)
             die("fcntl:F_GETFD");
         fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
                    ((closeonexec_flag & SOCK_CLOEXEC) != 0);
         printf("Close-on-exec flag is %sset (%s); ",
                 (fdf & FD_CLOEXEC) ? "" : "not ",
                 fdf_pass ? "OK" : "failed");
      
         flf = fcntl(acceptfd, F_GETFL);
         if (flf == -1)
             die("fcntl:F_GETFD");
         flf_pass = ((flf & O_NONBLOCK) != 0) ==
                    ((nonblock_flag & SOCK_NONBLOCK) !=0);
         printf("nonblock flag is %sset (%s)\n",
                 (flf & O_NONBLOCK) ? "" : "not ",
                 flf_pass ? "OK" : "failed");
      
         close(acceptfd);
         close(connfd);
      
         printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
         return fdf_pass && flf_pass;
      }
      
      static int
      create_listening_socket(int port_num)
      {
         struct sockaddr_in svaddr;
         int lfd;
         int optval;
      
         memset(&svaddr, 0, sizeof(struct sockaddr_in));
         svaddr.sin_family = AF_INET;
         svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
         svaddr.sin_port = htons(port_num);
      
         lfd = socket(AF_INET, SOCK_STREAM, 0);
         if (lfd == -1)
             die("socket");
      
         optval = 1;
         if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
                        sizeof(optval)) == -1)
             die("setsockopt");
      
         if (bind(lfd, (struct sockaddr *) &svaddr,
                  sizeof(struct sockaddr_in)) == -1)
             die("bind");
      
         if (listen(lfd, 5) == -1)
             die("listen");
      
         return lfd;
      }
      
      int
      main(int argc, char *argv[])
      {
         struct sockaddr_in conn_addr;
         int lfd;
         int port_num;
         int passed;
      
         passed = 1;
      
         port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;
      
         memset(&conn_addr, 0, sizeof(struct sockaddr_in));
         conn_addr.sin_family = AF_INET;
         conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
         conn_addr.sin_port = htons(port_num);
      
         lfd = create_listening_socket(port_num);
      
         if (!do_test(lfd, &conn_addr, 0, 0))
             passed = 0;
         if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
             passed = 0;
         if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
             passed = 0;
         if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
             passed = 0;
      
         close(lfd);
      
         exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
      }
      
      [mtk.manpages@gmail.com: rewrote changelog, updated test program]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Tested-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Cc: <linux-api@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de11defe
  10. 14 11月, 2008 1 次提交
  11. 04 11月, 2008 1 次提交
  12. 02 11月, 2008 1 次提交
    • A
      saner FASYNC handling on file close · 233e70f4
      Al Viro 提交于
      As it is, all instances of ->release() for files that have ->fasync()
      need to remember to evict file from fasync lists; forgetting that
      creates a hole and we actually have a bunch that *does* forget.
      
      So let's keep our lives simple - let __fput() check FASYNC in
      file->f_flags and call ->fasync() there if it's been set.  And lose that
      crap in ->release() instances - leaving it there is still valid, but we
      don't have to bother anymore.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      233e70f4
  13. 17 10月, 2008 1 次提交
  14. 23 9月, 2008 1 次提交
    • M
      sys_paccept: disable paccept() until API design is resolved · 2d4c8266
      Michael Kerrisk 提交于
      The reasons for disabling paccept() are as follows:
      
      * The API is more complex than needed.  There is AFAICS no demonstrated
        use case that the sigset argument of this syscall serves that couldn't
        equally be served by the use of pselect/ppoll/epoll_pwait + traditional
        accept().  Roland seems to concur with this opinion
        (http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255).  I
        have (more than once) asked Ulrich to explain otherwise
        (http://thread.gmane.org/gmane.linux.kernel/723952/focus=731018), but he
        does not respond, so one is left to assume that he doesn't know of such
        a case.
      
      * The use of a sigset argument is not consistent with other I/O APIs
        that can block on a single file descriptor (e.g., read(), recv(),
        connect()).
      
      * The behavior of paccept() when interrupted by a signal is IMO strange:
        the kernel restarts the system call if SA_RESTART was set for the
        handler.  I think that it should not do this -- that it should behave
        consistently with paccept()/ppoll()/epoll_pwait(), which never restart,
        regardless of SA_RESTART.  The reasoning here is that the very purpose
        of paccept() is to wait for a connection or a signal, and that
        restarting in the latter case is probably never useful.  (Note: Roland
        disagrees on this point, believing that rather paccept() should be
        consistent with accept() in its behavior wrt EINTR
        (http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255).)
      
      I believe that instead, a simpler API, consistent with Ulrich's other
      recent additions, is preferable:
      
      accept4(int fd, struct sockaddr *sa, socklen_t *salen, ind flags);
      
      (This simpler API was originally proposed by Ulrich:
      http://thread.gmane.org/gmane.linux.network/92072)
      
      If this simpler API is added, then if we later decide that the sigset
      argument really is required, then a suitable bit in 'flags' could be added
      to indicate the presence of the sigset argument.
      
      At this point, I am hoping we either will get a counter-argument from
      Ulrich about why we really do need paccept()'s sigset argument, or that he
      will resubmit the original accept4() patch.
      Signed-off-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Alan Cox <alan@redhat.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2d4c8266
  15. 27 7月, 2008 1 次提交
  16. 25 7月, 2008 5 次提交
    • U
      flag parameters: check magic constants · e38b36f3
      Ulrich Drepper 提交于
      This patch adds test that ensure the boundary conditions for the various
      constants introduced in the previous patches is met.  No code is generated.
      
      [akpm@linux-foundation.org: fix alpha]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e38b36f3
    • U
      flag parameters: NONBLOCK in socket and socketpair · 77d27200
      Ulrich Drepper 提交于
      This patch introduces support for the SOCK_NONBLOCK flag in socket,
      socketpair, and  paccept.  To do this the internal function sock_attach_fd
      gets an additional parameter which it uses to set the appropriate flag for
      the file descriptor.
      
      Given that in modern, scalable programs almost all socket connections are
      non-blocking and the minimal additional cost for the new functionality
      I see no reason not to add this code.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <netinet/in.h>
      #include <sys/socket.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_paccept
      # ifdef __x86_64__
      #  define __NR_paccept 288
      # elif defined __i386__
      #  define SYS_PACCEPT 18
      #  define USE_SOCKETCALL 1
      # else
      #  error "need __NR_paccept"
      # endif
      #endif
      
      #ifdef USE_SOCKETCALL
      # define paccept(fd, addr, addrlen, mask, flags) \
        ({ long args[6] = { \
             (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
           syscall (__NR_socketcall, SYS_PACCEPT, args); })
      #else
      # define paccept(fd, addr, addrlen, mask, flags) \
        syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
      #endif
      
      #define PORT 57392
      
      #define SOCK_NONBLOCK O_NONBLOCK
      
      static pthread_barrier_t b;
      
      static void *
      tf (void *arg)
      {
        pthread_barrier_wait (&b);
        int s = socket (AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in sin;
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
        sin.sin_port = htons (PORT);
        connect (s, (const struct sockaddr *) &sin, sizeof (sin));
        close (s);
        pthread_barrier_wait (&b);
      
        pthread_barrier_wait (&b);
        s = socket (AF_INET, SOCK_STREAM, 0);
        sin.sin_port = htons (PORT);
        connect (s, (const struct sockaddr *) &sin, sizeof (sin));
        close (s);
        pthread_barrier_wait (&b);
      
        return NULL;
      }
      
      int
      main (void)
      {
        int fd;
        fd = socket (PF_INET, SOCK_STREAM, 0);
        if (fd == -1)
          {
            puts ("socket(0) failed");
            return 1;
          }
        int fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (fl & O_NONBLOCK)
          {
            puts ("socket(0) set non-blocking mode");
            return 1;
          }
        close (fd);
      
        fd = socket (PF_INET, SOCK_STREAM|SOCK_NONBLOCK, 0);
        if (fd == -1)
          {
            puts ("socket(SOCK_NONBLOCK) failed");
            return 1;
          }
        fl = fcntl (fd, F_GETFL);
        if (fl == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((fl & O_NONBLOCK) == 0)
          {
            puts ("socket(SOCK_NONBLOCK) does not set non-blocking mode");
            return 1;
          }
        close (fd);
      
        int fds[2];
        if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1)
          {
            puts ("socketpair(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (fl & O_NONBLOCK)
              {
                printf ("socketpair(0) set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_NONBLOCK, 0, fds) == -1)
          {
            puts ("socketpair(SOCK_NONBLOCK) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((fl & O_NONBLOCK) == 0)
              {
                printf ("socketpair(SOCK_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        pthread_barrier_init (&b, NULL, 2);
      
        struct sockaddr_in sin;
        pthread_t th;
        if (pthread_create (&th, NULL, tf, NULL) != 0)
          {
            puts ("pthread_create failed");
            return 1;
          }
      
        int s = socket (AF_INET, SOCK_STREAM, 0);
        int reuse = 1;
        setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
        sin.sin_port = htons (PORT);
        bind (s, (struct sockaddr *) &sin, sizeof (sin));
        listen (s, SOMAXCONN);
      
        pthread_barrier_wait (&b);
      
        int s2 = paccept (s, NULL, 0, NULL, 0);
        if (s2 < 0)
          {
            puts ("paccept(0) failed");
            return 1;
          }
      
        fl = fcntl (s2, F_GETFL);
        if (fl & O_NONBLOCK)
          {
            puts ("paccept(0) set non-blocking mode");
            return 1;
          }
        close (s2);
        close (s);
      
        pthread_barrier_wait (&b);
      
        s = socket (AF_INET, SOCK_STREAM, 0);
        sin.sin_port = htons (PORT);
        setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
        bind (s, (struct sockaddr *) &sin, sizeof (sin));
        listen (s, SOMAXCONN);
      
        pthread_barrier_wait (&b);
      
        s2 = paccept (s, NULL, 0, NULL, SOCK_NONBLOCK);
        if (s2 < 0)
          {
            puts ("paccept(SOCK_NONBLOCK) failed");
            return 1;
          }
      
        fl = fcntl (s2, F_GETFL);
        if ((fl & O_NONBLOCK) == 0)
          {
            puts ("paccept(SOCK_NONBLOCK) does not set non-blocking mode");
            return 1;
          }
        close (s2);
        close (s);
      
        pthread_barrier_wait (&b);
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      77d27200
    • U
      flag parameters: paccept w/out set_restore_sigmask · c019bbc6
      Ulrich Drepper 提交于
      Some platforms do not have support to restore the signal mask in the
      return path from a syscall.  For those platforms syscalls like pselect are
      not defined at all.  This is, I think, not a good choice for paccept()
      since paccept() adds more value on top of accept() than just the signal
      mask handling.
      
      Therefore this patch defines a scaled down version of the sys_paccept
      function for those platforms.  It returns -EINVAL in case the signal mask
      is non-NULL but behaves the same otherwise.
      
      Note that I explicitly included <linux/thread_info.h>.  I saw that it is
      currently included but indirectly two levels down.  There is too much risk
      in relying on this.  The header might change and then suddenly the
      function definition would change without anyone immediately noticing.
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c019bbc6
    • U
      flag parameters: paccept · aaca0bdc
      Ulrich Drepper 提交于
      This patch is by far the most complex in the series.  It adds a new syscall
      paccept.  This syscall differs from accept in that it adds (at the userlevel)
      two additional parameters:
      
      - a signal mask
      - a flags value
      
      The flags parameter can be used to set flag like SOCK_CLOEXEC.  This is
      imlpemented here as well.  Some people argued that this is a property which
      should be inherited from the file desriptor for the server but this is against
      POSIX.  Additionally, we really want the signal mask parameter as well
      (similar to pselect, ppoll, etc).  So an interface change in inevitable.
      
      The flag value is the same as for socket and socketpair.  I think diverging
      here will only create confusion.  Similar to the filesystem interfaces where
      the use of the O_* constants differs, it is acceptable here.
      
      The signal mask is handled as for pselect etc.  The mask is temporarily
      installed for the thread and removed before the call returns.  I modeled the
      code after pselect.  If there is a problem it's likely also in pselect.
      
      For architectures which use socketcall I maintained this interface instead of
      adding a system call.  The symmetry shouldn't be broken.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <errno.h>
      #include <fcntl.h>
      #include <pthread.h>
      #include <signal.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <netinet/in.h>
      #include <sys/socket.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_paccept
      # ifdef __x86_64__
      #  define __NR_paccept 288
      # elif defined __i386__
      #  define SYS_PACCEPT 18
      #  define USE_SOCKETCALL 1
      # else
      #  error "need __NR_paccept"
      # endif
      #endif
      
      #ifdef USE_SOCKETCALL
      # define paccept(fd, addr, addrlen, mask, flags) \
        ({ long args[6] = { \
             (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
           syscall (__NR_socketcall, SYS_PACCEPT, args); })
      #else
      # define paccept(fd, addr, addrlen, mask, flags) \
        syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
      #endif
      
      #define PORT 57392
      
      #define SOCK_CLOEXEC O_CLOEXEC
      
      static pthread_barrier_t b;
      
      static void *
      tf (void *arg)
      {
        pthread_barrier_wait (&b);
        int s = socket (AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in sin;
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
        sin.sin_port = htons (PORT);
        connect (s, (const struct sockaddr *) &sin, sizeof (sin));
        close (s);
      
        pthread_barrier_wait (&b);
        s = socket (AF_INET, SOCK_STREAM, 0);
        sin.sin_port = htons (PORT);
        connect (s, (const struct sockaddr *) &sin, sizeof (sin));
        close (s);
        pthread_barrier_wait (&b);
      
        pthread_barrier_wait (&b);
        sleep (2);
        pthread_kill ((pthread_t) arg, SIGUSR1);
      
        return NULL;
      }
      
      static void
      handler (int s)
      {
      }
      
      int
      main (void)
      {
        pthread_barrier_init (&b, NULL, 2);
      
        struct sockaddr_in sin;
        pthread_t th;
        if (pthread_create (&th, NULL, tf, (void *) pthread_self ()) != 0)
          {
            puts ("pthread_create failed");
            return 1;
          }
      
        int s = socket (AF_INET, SOCK_STREAM, 0);
        int reuse = 1;
        setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
        sin.sin_port = htons (PORT);
        bind (s, (struct sockaddr *) &sin, sizeof (sin));
        listen (s, SOMAXCONN);
      
        pthread_barrier_wait (&b);
      
        int s2 = paccept (s, NULL, 0, NULL, 0);
        if (s2 < 0)
          {
            puts ("paccept(0) failed");
            return 1;
          }
      
        int coe = fcntl (s2, F_GETFD);
        if (coe & FD_CLOEXEC)
          {
            puts ("paccept(0) set close-on-exec-flag");
            return 1;
          }
        close (s2);
      
        pthread_barrier_wait (&b);
      
        s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC);
        if (s2 < 0)
          {
            puts ("paccept(SOCK_CLOEXEC) failed");
            return 1;
          }
      
        coe = fcntl (s2, F_GETFD);
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (s2);
      
        pthread_barrier_wait (&b);
      
        struct sigaction sa;
        sa.sa_handler = handler;
        sa.sa_flags = 0;
        sigemptyset (&sa.sa_mask);
        sigaction (SIGUSR1, &sa, NULL);
      
        sigset_t ss;
        pthread_sigmask (SIG_SETMASK, NULL, &ss);
        sigaddset (&ss, SIGUSR1);
        pthread_sigmask (SIG_SETMASK, &ss, NULL);
      
        sigdelset (&ss, SIGUSR1);
        alarm (4);
        pthread_barrier_wait (&b);
      
        errno = 0 ;
        s2 = paccept (s, NULL, 0, &ss, 0);
        if (s2 != -1 || errno != EINTR)
          {
            puts ("paccept did not fail with EINTR");
            return 1;
          }
      
        close (s);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: make it compile]
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aaca0bdc
    • U
      flag parameters: socket and socketpair · a677a039
      Ulrich Drepper 提交于
      This patch adds support for flag values which are ORed to the type passwd
      to socket and socketpair.  The additional code is minimal.  The flag
      values in this implementation can and must match the O_* flags.  This
      avoids overhead in the conversion.
      
      The internal functions sock_alloc_fd and sock_map_fd get a new parameters
      and all callers are changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <netinet/in.h>
      #include <sys/socket.h>
      
      #define PORT 57392
      
      /* For Linux these must be the same.  */
      #define SOCK_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd;
        fd = socket (PF_INET, SOCK_STREAM, 0);
        if (fd == -1)
          {
            puts ("socket(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("socket(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = socket (PF_INET, SOCK_STREAM|SOCK_CLOEXEC, 0);
        if (fd == -1)
          {
            puts ("socket(SOCK_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("socket(SOCK_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        int fds[2];
        if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1)
          {
            puts ("socketpair(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            coe = fcntl (fds[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (coe & FD_CLOEXEC)
              {
                printf ("socketpair(0) set close-on-exec flag for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, fds) == -1)
          {
            puts ("socketpair(SOCK_CLOEXEC) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            coe = fcntl (fds[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((coe & FD_CLOEXEC) == 0)
              {
                printf ("socketpair(SOCK_CLOEXEC) does not set close-on-exec flag for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a677a039
  17. 20 7月, 2008 1 次提交
  18. 17 6月, 2008 1 次提交
  19. 23 4月, 2008 1 次提交
  20. 01 4月, 2008 1 次提交
  21. 26 3月, 2008 1 次提交
  22. 22 3月, 2008 1 次提交
  23. 15 2月, 2008 1 次提交
  24. 29 1月, 2008 4 次提交
  25. 13 11月, 2007 1 次提交
  26. 30 10月, 2007 1 次提交
  27. 17 10月, 2007 2 次提交
    • D
      r/o bind mounts: filesystem helpers for custom 'struct file's · ce8d2cdf
      Dave Hansen 提交于
      Why do we need r/o bind mounts?
      
      This feature allows a read-only view into a read-write filesystem.  In the
      process of doing that, it also provides infrastructure for keeping track of
      the number of writers to any given mount.
      
      This has a number of uses.  It allows chroots to have parts of filesystems
      writable.  It will be useful for containers in the future because users may
      have root inside a container, but should not be allowed to write to
      somefilesystems.  This also replaces patches that vserver has had out of the
      tree for several years.
      
      It allows security enhancement by making sure that parts of your filesystem
      read-only (such as when you don't trust your FTP server), when you don't want
      to have entire new filesystems mounted, or when you want atime selectively
      updated.  I've been using the following script to test that the feature is
      working as desired.  It takes a directory and makes a regular bind and a r/o
      bind mount of it.  It then performs some normal filesystem operations on the
      three directories, including ones that are expected to fail, like creating a
      file on the r/o mount.
      
      This patch:
      
      Some filesystems forego the vfs and may_open() and create their own 'struct
      file's.
      
      This patch creates a couple of helper functions which can be used by these
      filesystems, and will provide a unified place which the r/o bind mount code
      may patch.
      
      Also, rename an existing, static-scope init_file() to a less generic name.
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce8d2cdf
    • C
      Slab API: remove useless ctor parameter and reorder parameters · 4ba9b9d0
      Christoph Lameter 提交于
      Slab constructors currently have a flags parameter that is never used.  And
      the order of the arguments is opposite to other slab functions.  The object
      pointer is placed before the kmem_cache pointer.
      
      Convert
      
              ctor(void *object, struct kmem_cache *s, unsigned long flags)
      
      to
      
              ctor(struct kmem_cache *s, void *object)
      
      throughout the kernel
      
      [akpm@linux-foundation.org: coupla fixes]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ba9b9d0
  28. 11 10月, 2007 1 次提交