1. 28 7月, 2010 2 次提交
  2. 16 7月, 2010 1 次提交
  3. 13 3月, 2010 2 次提交
  4. 11 12月, 2009 1 次提交
  5. 13 10月, 2009 1 次提交
    • A
      net: Introduce recvmmsg socket syscall · a2e27255
      Arnaldo Carvalho de Melo 提交于
      Meaning receive multiple messages, reducing the number of syscalls and
      net stack entry/exit operations.
      
      Next patches will introduce mechanisms where protocols that want to
      optimize this operation will provide an unlocked_recvmsg operation.
      
      This takes into account comments made by:
      
      . Paul Moore: sock_recvmsg is called only for the first datagram,
        sock_recvmsg_nosec is used for the rest.
      
      . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
        works in the same fashion as the ppoll one.
      
        If the underlying protocol returns a datagram with MSG_OOB set, this
        will make recvmmsg return right away with as many datagrams (+ the OOB
        one) it has received so far.
      
      . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
        datagrams and then recvmsg returns an error, recvmmsg will return
        the successfully received datagrams, store the error and return it
        in the next call.
      
      This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
      where we will be able to acquire the lock only at batch start and end, not at
      every underlying recvmsg call.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2e27255
  6. 21 9月, 2009 1 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  7. 01 5月, 2009 1 次提交
  8. 03 4月, 2009 1 次提交
    • G
      preadv/pwritev: Add preadv and pwritev system calls. · f3554f4b
      Gerd Hoffmann 提交于
      This patch adds preadv and pwritev system calls.  These syscalls are a
      pretty straightforward combination of pread and readv (same for write).
      They are quite useful for doing vectored I/O in threaded applications.
      Using lseek+readv instead opens race windows you'll have to plug with
      locking.
      
      Other systems have such system calls too, for example NetBSD, check
      here: http://www.daemon-systems.org/man/preadv.2.html
      
      The application-visible interface provided by glibc should look like
      this to be compatible to the existing implementations in the *BSD family:
      
        ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
        ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);
      
      This prototype has one problem though: On 32bit archs is the (64bit)
      offset argument unaligned, which the syscall ABI of several archs doesn't
      allow to do.  At least s390 needs a wrapper in glibc to handle this.  As
      we'll need a wrappers in glibc anyway I've decided to push problem to
      glibc entriely and use a syscall prototype which works without
      arch-specific wrappers inside the kernel: The offset argument is
      explicitly splitted into two 32bit values.
      
      The patch sports the actual system call implementation and the windup in
      the x86 system call tables.  Other archs follow as separate patches.
      Signed-off-by: NGerd Hoffmann <kraxel@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <linux-api@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f3554f4b
  9. 11 2月, 2009 1 次提交
  10. 14 1月, 2009 1 次提交
  11. 08 12月, 2008 1 次提交
    • I
      performance counters: x86 support · 241771ef
      Ingo Molnar 提交于
      Implement performance counters for x86 Intel CPUs.
      
      It's simplified right now: the PERFMON CPU feature is assumed,
      which is available in Core2 and later Intel CPUs.
      
      The design is flexible to be extended to more CPU types as well.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      241771ef
  12. 25 7月, 2008 7 次提交
    • U
      flag parameters add-on: remove epoll_create size param · 9fe5ad9c
      Ulrich Drepper 提交于
      Remove the size parameter from the new epoll_create syscall and renames the
      syscall itself.  The updated test program follows.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_epoll_create2
      # ifdef __x86_64__
      #  define __NR_epoll_create2 291
      # elif defined __i386__
      #  define __NR_epoll_create2 329
      # else
      #  error "need __NR_epoll_create2"
      # endif
      #endif
      
      #define EPOLL_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd = syscall (__NR_epoll_create2, 0);
        if (fd == -1)
          {
            puts ("epoll_create2(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("epoll_create2(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_epoll_create2, EPOLL_CLOEXEC);
        if (fd == -1)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9fe5ad9c
    • U
      flag parameters: inotify_init · 4006553b
      Ulrich Drepper 提交于
      This patch introduces the new syscall inotify_init1 (note: the 1 stands for
      the one parameter the syscall takes, as opposed to no parameter before).  The
      values accepted for this parameter are function-specific and defined in the
      inotify.h header.  Here the values must match the O_* flags, though.  In this
      patch CLOEXEC support is introduced.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_inotify_init1
      # ifdef __x86_64__
      #  define __NR_inotify_init1 294
      # elif defined __i386__
      #  define __NR_inotify_init1 332
      # else
      #  error "need __NR_inotify_init1"
      # endif
      #endif
      
      #define IN_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd;
        fd = syscall (__NR_inotify_init1, 0);
        if (fd == -1)
          {
            puts ("inotify_init1(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("inotify_init1(0) set close-on-exit");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_inotify_init1, IN_CLOEXEC);
        if (fd == -1)
          {
            puts ("inotify_init1(IN_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("inotify_init1(O_CLOEXEC) does not set close-on-exit");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4006553b
    • U
      flag parameters: pipe · ed8cae8b
      Ulrich Drepper 提交于
      This patch introduces the new syscall pipe2 which is like pipe but it also
      takes an additional parameter which takes a flag value.  This patch implements
      the handling of O_CLOEXEC for the flag.  I did not add support for the new
      syscall for the architectures which have a special sys_pipe implementation.  I
      think the maintainers of those archs have the chance to go with the unified
      implementation but that's up to them.
      
      The implementation introduces do_pipe_flags.  I did that instead of changing
      all callers of do_pipe because some of the callers are written in assembler.
      I would probably screw up changing the assembly code.  To avoid breaking code
      do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
      changed over to do_pipe_flags the old do_pipe function can be removed.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd[2];
        if (syscall (__NR_pipe2, fd, 0) != 0)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (coe & FD_CLOEXEC)
              {
                printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
          {
            puts ("pipe2(O_CLOEXEC) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((coe & FD_CLOEXEC) == 0)
              {
                printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed8cae8b
    • U
      flag parameters: dup2 · 336dd1f7
      Ulrich Drepper 提交于
      This patch adds the new dup3 syscall.  It extends the old dup2 syscall by one
      parameter which is meant to hold a flag value.  Support for the O_CLOEXEC flag
      is added in this patch.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_dup3
      # ifdef __x86_64__
      #  define __NR_dup3 292
      # elif defined __i386__
      #  define __NR_dup3 330
      # else
      #  error "need __NR_dup3"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd = syscall (__NR_dup3, 1, 4, 0);
        if (fd == -1)
          {
            puts ("dup3(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("dup3(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_dup3, 1, 4, O_CLOEXEC);
        if (fd == -1)
          {
            puts ("dup3(O_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("dup3(O_CLOEXEC) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      336dd1f7
    • U
      flag parameters: epoll_create · a0998b50
      Ulrich Drepper 提交于
      This patch adds the new epoll_create2 syscall.  It extends the old epoll_create
      syscall by one parameter which is meant to hold a flag value.  In this
      patch the only flag support is EPOLL_CLOEXEC which causes the close-on-exec
      flag for the returned file descriptor to be set.
      
      A new name EPOLL_CLOEXEC is introduced which in this implementation must
      have the same value as O_CLOEXEC.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_epoll_create2
      # ifdef __x86_64__
      #  define __NR_epoll_create2 291
      # elif defined __i386__
      #  define __NR_epoll_create2 329
      # else
      #  error "need __NR_epoll_create2"
      # endif
      #endif
      
      #define EPOLL_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd = syscall (__NR_epoll_create2, 1, 0);
        if (fd == -1)
          {
            puts ("epoll_create2(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("epoll_create2(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_epoll_create2, 1, EPOLL_CLOEXEC);
        if (fd == -1)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0998b50
    • U
      flag parameters: eventfd · b087498e
      Ulrich Drepper 提交于
      This patch adds the new eventfd2 syscall.  It extends the old eventfd
      syscall by one parameter which is meant to hold a flag value.  In this
      patch the only flag support is EFD_CLOEXEC which causes the close-on-exec
      flag for the returned file descriptor to be set.
      
      A new name EFD_CLOEXEC is introduced which in this implementation must
      have the same value as O_CLOEXEC.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_eventfd2
      # ifdef __x86_64__
      #  define __NR_eventfd2 290
      # elif defined __i386__
      #  define __NR_eventfd2 328
      # else
      #  error "need __NR_eventfd2"
      # endif
      #endif
      
      #define EFD_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd = syscall (__NR_eventfd2, 1, 0);
        if (fd == -1)
          {
            puts ("eventfd2(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("eventfd2(0) sets close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_eventfd2, 1, EFD_CLOEXEC);
        if (fd == -1)
          {
            puts ("eventfd2(EFD_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("eventfd2(EFD_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b087498e
    • U
      flag parameters: signalfd · 9deb27ba
      Ulrich Drepper 提交于
      This patch adds the new signalfd4 syscall.  It extends the old signalfd
      syscall by one parameter which is meant to hold a flag value.  In this
      patch the only flag support is SFD_CLOEXEC which causes the close-on-exec
      flag for the returned file descriptor to be set.
      
      A new name SFD_CLOEXEC is introduced which in this implementation must
      have the same value as O_CLOEXEC.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <signal.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_signalfd4
      # ifdef __x86_64__
      #  define __NR_signalfd4 289
      # elif defined __i386__
      #  define __NR_signalfd4 327
      # else
      #  error "need __NR_signalfd4"
      # endif
      #endif
      
      #define SFD_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        sigset_t ss;
        sigemptyset (&ss);
        sigaddset (&ss, SIGUSR1);
        int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0);
        if (fd == -1)
          {
            puts ("signalfd4(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("signalfd4(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_CLOEXEC);
        if (fd == -1)
          {
            puts ("signalfd4(SFD_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("signalfd4(SFD_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9deb27ba
  13. 06 2月, 2008 1 次提交
  14. 11 10月, 2007 2 次提交
  15. 18 7月, 2007 1 次提交
    • A
      sys_fallocate() implementation on i386, x86_64 and powerpc · 97ac7350
      Amit Arora 提交于
      fallocate() is a new system call being proposed here which will allow
      applications to preallocate space to any file(s) in a file system.
      Each file system implementation that wants to use this feature will need
      to support an inode operation called ->fallocate().
      Applications can use this feature to avoid fragmentation to certain
      level and thus get faster access speed. With preallocation, applications
      also get a guarantee of space for particular file(s) - even if later the
      the system becomes full.
      
      Currently, glibc provides an interface called posix_fallocate() which
      can be used for similar cause. Though this has the advantage of working
      on all file systems, but it is quite slow (since it writes zeroes to
      each block that has to be preallocated). Without a doubt, file systems
      can do this more efficiently within the kernel, by implementing
      the proposed fallocate() system call. It is expected that
      posix_fallocate() will be modified to call this new system call first
      and incase the kernel/filesystem does not implement it, it should fall
      back to the current implementation of writing zeroes to the new blocks.
      ToDos:
      1. Implementation on other architectures (other than i386, x86_64,
         and ppc). Patches for s390(x) and ia64 are already available from
         previous posts, but it was decided that they should be added later
         once fallocate is in the mainline. Hence not including those patches
         in this take.
      2. Changes to glibc,
         a) to support fallocate() system call
         b) to make posix_fallocate() and posix_fallocate64() call fallocate()
      Signed-off-by: NAmit Arora <aarora@in.ibm.com>
      97ac7350
  16. 11 5月, 2007 3 次提交
  17. 09 5月, 2007 1 次提交
    • U
      utimensat implementation · 1c710c89
      Ulrich Drepper 提交于
      Implement utimensat(2) which is an extension to futimesat(2) in that it
      
      a) supports nano-second resolution for the timestamps
      b) allows to selectively ignore the atime/mtime value
      c) allows to selectively use the current time for either atime or mtime
      d) supports changing the atime/mtime of a symlink itself along the lines
         of the BSD lutimes(3) functions
      
      For this change the internally used do_utimes() functions was changed to
      accept a timespec time value and an additional flags parameter.
      
      Additionally the sys_utime function was changed to match compat_sys_utime
      which already use do_utimes instead of duplicating the work.
      
      Also, the completely missing futimensat() functionality is added.  We have
      such a function in glibc but we have to resort to using /proc/self/fd/* which
      not everybody likes (chroot etc).
      
      Test application (the syscall number will need per-arch editing):
      
      #include <errno.h>
      #include <fcntl.h>
      #include <time.h>
      #include <sys/time.h>
      #include <stddef.h>
      #include <syscall.h>
      
      #define __NR_utimensat 280
      
      #define UTIME_NOW       ((1l << 30) - 1l)
      #define UTIME_OMIT      ((1l << 30) - 2l)
      
      int
      main(void)
      {
        int status = 0;
      
        int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
        if (fd == -1)
          error (1, errno, "failed to create test file \"ttt\"");
      
        struct stat64 st1;
        if (fstat64 (fd, &st1) != 0)
          error (1, errno, "fstat failed");
      
        struct timespec t[2];
        t[0].tv_sec = 0;
        t[0].tv_nsec = 0;
        t[1].tv_sec = 0;
        t[1].tv_nsec = 0;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        struct stat64 st2;
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
          {
            puts ("atim not reset to zero");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("mtim not reset to zero");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        t[0] = st1.st_atim;
        t[1].tv_sec = 0;
        t[1].tv_nsec = UTIME_OMIT;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
            || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
          {
            puts ("atim not set");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("mtim changed from zero");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        t[0].tv_sec = 0;
        t[0].tv_nsec = UTIME_OMIT;
        t[1] = st1.st_mtim;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
            || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
          {
            puts ("mtim changed from original time");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
            || st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
          {
            puts ("mtim not set");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        sleep (2);
      
        t[0].tv_sec = 0;
        t[0].tv_nsec = UTIME_NOW;
        t[1].tv_sec = 0;
        t[1].tv_nsec = UTIME_NOW;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        struct timeval tv;
        gettimeofday(&tv,NULL);
      
        if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
            || st2.st_atim.tv_sec > tv.tv_sec)
          {
            puts ("atim not set to NOW");
            status = 1;
          }
        if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
            || st2.st_mtim.tv_sec > tv.tv_sec)
          {
            puts ("mtim not set to NOW");
            status = 1;
          }
      
        if (symlink ("ttt", "tttsym") != 0)
          error (1, errno, "cannot create symlink");
      
        t[0].tv_sec = 0;
        t[0].tv_nsec = 0;
        t[1].tv_sec = 0;
        t[1].tv_nsec = 0;
        if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0)
          error (1, errno, "utimensat failed");
      
        if (lstat64 ("tttsym", &st2) != 0)
          error (1, errno, "lstat failed");
      
        if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
          {
            puts ("symlink atim not reset to zero");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("symlink mtim not reset to zero");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        t[0].tv_sec = 1;
        t[0].tv_nsec = 0;
        t[1].tv_sec = 1;
        t[1].tv_nsec = 0;
        if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0)
          {
            puts ("atim not reset to one");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("mtim not reset to one");
            status = 1;
          }
      
        if (status == 0)
           puts ("all OK");
      
       out:
        close (fd);
        unlink ("ttt");
        unlink ("tttsym");
      
        return status;
      }
      
      [akpm@linux-foundation.org: add missing i386 syscall table entry]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@openvz.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c710c89
  18. 12 10月, 2006 1 次提交
    • D
      [PATCH] epoll_pwait() · b611967d
      Davide Libenzi 提交于
      Implement the epoll_pwait system call, that extend the event wait mechanism
      with the same logic ppoll and pselect do.  The definition of epoll_pwait
      is:
      
      int epoll_pwait(int epfd, struct epoll_event *events, int maxevents,
                       int timeout, const sigset_t *sigmask, size_t sigsetsize);
      
      The difference between the vanilla epoll_wait and epoll_pwait is that the
      latter allows the caller to specify a signal mask to be set while waiting
      for events.  Hence epoll_pwait will wait until either one monitored event,
      or an unmasked signal happen.  If sigmask is NULL, the epoll_pwait system
      call will act exactly like epoll_wait.  For the POSIX definition of
      pselect, information is available here:
      
      http://www.opengroup.org/onlinepubs/009695399/functions/select.htmlSigned-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b611967d
  19. 26 9月, 2006 1 次提交
    • A
      [PATCH] x86: Add portable getcpu call · 3cfc348b
      Andi Kleen 提交于
      For NUMA optimization and some other algorithms it is useful to have a fast
      to get the current CPU and node numbers in user space.
      
      x86-64 added a fast way to do this in a vsyscall. This adds a generic
      syscall for other architectures to make it a generic portable facility.
      
      I expect some of them will also implement it as a faster vsyscall.
      
      The cache is an optimization for the x86-64 vsyscall optimization. Since
      what the syscall returns is an approximation anyways and user space
      often wants very fast results it can be cached for some time.  The norma
      methods to get this information in user space are relatively slow
      
      The vsyscall is in a better position to manage the cache because it has direct
      access to a fast time stamp (jiffies). For the generic syscall optimization
      it doesn't help much, but enforce a valid argument to keep programs
      portable
      
      I only added an i386 syscall entry for now. Other architectures can follow
      as needed.
      
      AK: Also added some cleanups from Andrew Morton
      Signed-off-by: NAndi Kleen <ak@suse.de>
      3cfc348b
  20. 23 6月, 2006 1 次提交
  21. 27 5月, 2006 1 次提交
  22. 11 4月, 2006 1 次提交
    • J
      [PATCH] splice: add support for sys_tee() · 70524490
      Jens Axboe 提交于
      Basically an in-kernel implementation of tee, which uses splice and the
      pipe buffers as an intelligent way to pass data around by reference.
      
      Where the user space tee consumes the input and produces a stdout and
      file output, this syscall merely duplicates the data inside a pipe to
      another pipe. No data is copied, the output just grabs a reference to the
      input pipe data.
      Signed-off-by: NJens Axboe <axboe@suse.de>
      70524490
  23. 01 4月, 2006 1 次提交
    • A
      [PATCH] sys_sync_file_range() · f79e2abb
      Andrew Morton 提交于
      Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
      fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
      Reasons:
      
      - It's more flexible.  Things which would require two or three syscalls with
        fadvise() can be done in a single syscall.
      
      - Using fadvise() in this manner is something not covered by POSIX.
      
      The patch wires up the syscall for x86.
      
      The sycall is implemented in the new fs/sync.c.  The intention is that we can
      move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.
      
      Documentation for the syscall is in fs/sync.c.
      
      A test app (sync_file_range.c) is in
      http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.
      
      The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
      say NFS_DATA_SYNC or NFS_FILE_SYNC.  I can skip the ->fsync call for
      NFS_DATA_SYNC which is hopefully the more common."
      
      Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
      the queue is congested.  This is trivial to fix: add a new flag bit, set
      wbc->nonblocking.  But I'm not sure that we want to expose implementation
      details down to that level.
      
      Note: it's notable that we can sync an fd which wasn't opened for writing.
      Same with fsync() and fdatasync()).
      
      Note: the code takes some care to handle attempts to sync file contents
      outside the 16TB offset on 32-bit machines.  It makes such attempts appear to
      succeed, for best 32-bit/64-bit compatibility.  Perhaps it should make such
      requests fail...
      
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Neil Brown <neilb@cse.unsw.edu.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f79e2abb
  24. 31 3月, 2006 1 次提交
    • J
      [PATCH] Introduce sys_splice() system call · 5274f052
      Jens Axboe 提交于
      This adds support for the sys_splice system call. Using a pipe as a
      transport, it can connect to files or sockets (latter as output only).
      
      From the splice.c comments:
      
         "splice": joining two ropes together by interweaving their strands.
      
         This is the "extended pipe" functionality, where a pipe is used as
         an arbitrary in-memory buffer. Think of a pipe as a small kernel
         buffer that you can use to transfer data from one end to the other.
      
         The traditional unix read/write is extended with a "splice()" operation
         that transfers data buffers to or from a pipe buffer.
      
         Named by Larry McVoy, original implementation from Linus, extended by
         Jens to support splicing to files and fixing the initial implementation
         bugs.
      Signed-off-by: NJens Axboe <axboe@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5274f052
  25. 28 3月, 2006 1 次提交
  26. 12 2月, 2006 1 次提交
    • U
      [PATCH] fstatat64 support · cff2b760
      Ulrich Drepper 提交于
      The *at patches introduced fstatat and, due to inusfficient research, I
      used the newfstat functions generally as the guideline.  The result is that
      on 32-bit platforms we don't have all the information needed to implement
      fstatat64.
      
      This patch modifies the code to pass up 64-bit information if
      __ARCH_WANT_STAT64 is defined.  I renamed the syscall entry point to make
      this clear.  Other archs will continue to use the existing code.  On x86-64
      the compat code is implemented using a new sys32_ function.  this is what
      is done for the other stat syscalls as well.
      
      This patch might break some other archs (those which define
      __ARCH_WANT_STAT64 and which already wired up the syscall).  Yet others
      might need changes to accomodate the compatibility mode.  I really don't
      want to do that work because all this stat handling is a mess (more so in
      glibc, but the kernel is also affected).  It should be done by the arch
      maintainers.  I'll provide some stand-alone test shortly.  Those who are
      eager could compile glibc and run 'make check' (no installation needed).
      
      The patch below has been tested on x86 and x86-64.
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      cff2b760
  27. 08 2月, 2006 1 次提交
  28. 19 1月, 2006 2 次提交