1. 27 7月, 2008 1 次提交
  2. 25 7月, 2008 2 次提交
    • U
      flag parameters: NONBLOCK in pipe · be61a86d
      Ulrich Drepper 提交于
      This patch adds O_NONBLOCK support to pipe2.  It is minimally more involved
      than the patches for eventfd et.al but still trivial.  The interfaces of the
      create_write_pipe and create_read_pipe helper functions were changed and the
      one other caller as well.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fds[2];
        if (syscall (__NR_pipe2, fds, 0) == -1)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (fl & O_NONBLOCK)
              {
                printf ("pipe2(0) set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        if (syscall (__NR_pipe2, fds, O_NONBLOCK) == -1)
          {
            puts ("pipe2(O_NONBLOCK) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int fl = fcntl (fds[i], F_GETFL);
            if (fl == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((fl & O_NONBLOCK) == 0)
              {
                printf ("pipe2(O_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i);
                return 1;
              }
            close (fds[i]);
          }
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      be61a86d
    • U
      flag parameters: pipe · ed8cae8b
      Ulrich Drepper 提交于
      This patch introduces the new syscall pipe2 which is like pipe but it also
      takes an additional parameter which takes a flag value.  This patch implements
      the handling of O_CLOEXEC for the flag.  I did not add support for the new
      syscall for the architectures which have a special sys_pipe implementation.  I
      think the maintainers of those archs have the chance to go with the unified
      implementation but that's up to them.
      
      The implementation introduces do_pipe_flags.  I did that instead of changing
      all callers of do_pipe because some of the callers are written in assembler.
      I would probably screw up changing the assembly code.  To avoid breaking code
      do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
      changed over to do_pipe_flags the old do_pipe function can be removed.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd[2];
        if (syscall (__NR_pipe2, fd, 0) != 0)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (coe & FD_CLOEXEC)
              {
                printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
          {
            puts ("pipe2(O_CLOEXEC) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((coe & FD_CLOEXEC) == 0)
              {
                printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed8cae8b
  3. 23 6月, 2008 1 次提交
  4. 09 5月, 2008 1 次提交
  5. 04 5月, 2008 1 次提交
    • U
      unified (weak) sys_pipe implementation · d35c7b0e
      Ulrich Drepper 提交于
      This replaces the duplicated arch-specific versions of "sys_pipe()" with
      one unified implementation.  This removes almost 250 lines of duplicated
      code.
      
      It's marked __weak, so that *if* an architecture wants to override the
      default implementation it can do so by simply having its own replacement
      version, since many architectures use alternate calling conventions for
      the 'pipe()' system call for legacy reasons (ie traditional UNIX
      implementations often return the two file descriptors in registers)
      
      I still haven't changed the cris version even though Linus says the BKL
      isn't needed.  The arch maintainer can easily do it if there are really
      no obstacles.
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d35c7b0e
  6. 23 4月, 2008 1 次提交
  7. 19 3月, 2008 1 次提交
  8. 14 2月, 2008 1 次提交
  9. 09 2月, 2008 1 次提交
  10. 15 10月, 2007 2 次提交
  11. 27 7月, 2007 1 次提交
  12. 10 7月, 2007 2 次提交
  13. 09 5月, 2007 1 次提交
    • E
      VFS: delay the dentry name generation on sockets and pipes · c23fbb6b
      Eric Dumazet 提交于
      1) Introduces a new method in 'struct dentry_operations'.  This method
         called d_dname() might be called from d_path() to build a pathname for
         special filesystems.  It is called without locks.
      
         Future patches (if we succeed in having one common dentry for all
         pipes/sockets) may need to change prototype of this method, but we now
         use : char *d_dname(struct dentry *dentry, char *buffer, int buflen);
      
      2) Adds a dynamic_dname() helper function that eases d_dname() implementations
      
      3) Defines d_dname method for sockets : No more sprintf() at socket
         creation.  This is delayed up to the moment someone does an access to
         /proc/pid/fd/...
      
      4) Defines d_dname method for pipes : No more sprintf() at pipe
         creation.  This is delayed up to the moment someone does an access to
         /proc/pid/fd/...
      
      A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a
      *nice* speedup on my Pentium(M) 1.6 Ghz :
      
      3.090 s instead of 3.450 s
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Acked-by: NChristoph Hellwig <hch@infradead.org>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c23fbb6b
  14. 18 2月, 2007 1 次提交
    • A
      [PATCH] AUDIT_FD_PAIR · db349509
      Al Viro 提交于
      Provide an audit record of the descriptor pair returned by pipe() and
      socketpair().  Rewritten from the original posted to linux-audit by
      John D. Ramsdell <ramsdell@mitre.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      db349509
  15. 21 12月, 2006 1 次提交
  16. 14 12月, 2006 1 次提交
  17. 09 12月, 2006 1 次提交
  18. 08 12月, 2006 1 次提交
    • E
      [PATCH] don't insert pipe dentries into dentry_hashtable. · d18de5a2
      Eric Dumazet 提交于
      We currently insert pipe dentries into the global dentry hashtable.  This
      is suboptimal because there is currently no way these entries can be used
      for a lookup().  (/proc/xxx/fd/xxx uses a different mechanism).  Inserting
      them in dentry hashtable slows dcache lookups.
      
      To let __dpath() still work correctly (ie not adding a " (deleted)") after
      dentry name, we do :
      
       - Right after d_alloc(), pretend they are hashed by clearing the
         DCACHE_UNHASHED bit.
      
       - Call d_instantiate() instead of d_add() : dentry is not inserted in
         hash table.
      
      __dpath() & friends work as intended during dentry lifetime.
      
       - At dismantle time, once dput() must clear the dentry, setting again
         DCACHE_UNHASHED bit inside the custom d_delete() function provided by
         pipe code, so that dput() can just kill_it.
      
      This patch, combined with (avoid RCU for never hashed dentries) reduced
      time of { pipe(p); close(p[0]); close(p[1]);} on my UP machine (1.6GHz
      Pentium-M) from 3.23 us to 2.86 us (But this patch does not depend on other
      patches, only bench results)
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Acked-by: NDavid Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d18de5a2
  19. 01 10月, 2006 2 次提交
  20. 27 9月, 2006 1 次提交
  21. 23 6月, 2006 1 次提交
    • D
      [PATCH] VFS: Permit filesystem to override root dentry on mount · 454e2398
      David Howells 提交于
      Extend the get_sb() filesystem operation to take an extra argument that
      permits the VFS to pass in the target vfsmount that defines the mountpoint.
      
      The filesystem is then required to manually set the superblock and root dentry
      pointers.  For most filesystems, this should be done with simple_set_mnt()
      which will set the superblock pointer and then set the root dentry to the
      superblock's s_root (as per the old default behaviour).
      
      The get_sb() op now returns an integer as there's now no need to return the
      superblock pointer.
      
      This patch permits a superblock to be implicitly shared amongst several mount
      points, such as can be done with NFS to avoid potential inode aliasing.  In
      such a case, simple_set_mnt() would not be called, and instead the mnt_root
      and mnt_sb would be set directly.
      
      The patch also makes the following changes:
      
       (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
           pointer argument and return an integer, so most filesystems have to change
           very little.
      
       (*) If one of the convenience function is not used, then get_sb() should
           normally call simple_set_mnt() to instantiate the vfsmount. This will
           always return 0, and so can be tail-called from get_sb().
      
       (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
           dcache upon superblock destruction rather than shrink_dcache_anon().
      
           This is required because the superblock may now have multiple trees that
           aren't actually bound to s_root, but that still need to be cleaned up. The
           currently called functions assume that the whole tree is rooted at s_root,
           and that anonymous dentries are not the roots of trees which results in
           dentries being left unculled.
      
           However, with the way NFS superblock sharing are currently set to be
           implemented, these assumptions are violated: the root of the filesystem is
           simply a dummy dentry and inode (the real inode for '/' may well be
           inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
           with child trees.
      
           [*] Anonymous until discovered from another tree.
      
       (*) The documentation has been adjusted, including the additional bit of
           changing ext2_* into foo_* in the documentation.
      
      [akpm@osdl.org: convert ipath_fs, do other stuff]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      454e2398
  22. 02 5月, 2006 4 次提交
    • J
      [PATCH] vmsplice: restrict stealing a little more · 330ab716
      Jens Axboe 提交于
      Apply the same rules as the anon pipe pages, only allow stealing
      if no one else is using the page.
      Signed-off-by: NJens Axboe <axboe@suse.de>
      330ab716
    • J
      [PATCH] pipe: enable atomic copying of pipe data to/from user space · f6762b7a
      Jens Axboe 提交于
      The pipe ->map() method uses kmap() to virtually map the pages, which
      is both slow and has known scalability issues on SMP. This patch enables
      atomic copying of pipe pages, by pre-faulting data and using kmap_atomic()
      instead.
      
      lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing. Here
      are results from that on a UP machine with highmem (1.5GiB of RAM), running
      first a UP kernel, SMP kernel, and SMP kernel patched.
      
      Vanilla-UP:
      Pipe bandwidth: 1622.28 MB/sec
      Pipe bandwidth: 1610.59 MB/sec
      Pipe bandwidth: 1608.30 MB/sec
      Pipe latency: 7.3275 microseconds
      Pipe latency: 7.2995 microseconds
      Pipe latency: 7.3097 microseconds
      
      Vanilla-SMP:
      Pipe bandwidth: 1382.19 MB/sec
      Pipe bandwidth: 1317.27 MB/sec
      Pipe bandwidth: 1355.61 MB/sec
      Pipe latency: 9.6402 microseconds
      Pipe latency: 9.6696 microseconds
      Pipe latency: 9.6153 microseconds
      
      Patched-SMP:
      Pipe bandwidth: 1578.70 MB/sec
      Pipe bandwidth: 1579.95 MB/sec
      Pipe bandwidth: 1578.63 MB/sec
      Pipe latency: 9.1654 microseconds
      Pipe latency: 9.2266 microseconds
      Pipe latency: 9.1527 microseconds
      Signed-off-by: NJens Axboe <axboe@suse.de>
      f6762b7a
    • J
      [PATCH] pipe: introduce ->pin() buffer operation · f84d7519
      Jens Axboe 提交于
      The ->map() function is really expensive on highmem machines right now,
      since it has to use the slower kmap() instead of kmap_atomic(). Splice
      rarely needs to access the virtual address of a page, so it's a waste
      of time doing it.
      
      Introduce ->pin() to take over the responsibility of making sure the
      page data is valid. ->map() is then reduced to just kmap(). That way we
      can also share a most of the pipe buffer ops between pipe.c and splice.c
      Signed-off-by: NJens Axboe <axboe@suse.de>
      f84d7519
    • J
      [PATCH] splice: fix bugs in pipe_to_file() · 0568b409
      Jens Axboe 提交于
      Found by Oleg Nesterov <oleg@tv-sign.ru>, fixed by me.
      
      - Only allow full pages to go to the page cache.
      - Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
      - Remember to clear 'stolen' if add_to_page_cache() fails.
      
      And as a cleanup on that:
      
      - Make the bottom fall-through logic a little less convoluted. Also make
        the steal path hold an extra reference to the page, so we don't have
        to differentiate between stolen and non-stolen at the end.
      Signed-off-by: NJens Axboe <axboe@suse.de>
      0568b409
  23. 30 4月, 2006 1 次提交
  24. 11 4月, 2006 5 次提交
  25. 10 4月, 2006 1 次提交
    • I
      [PATCH] introduce a "kernel-internal pipe object" abstraction · 3a326a2c
      Ingo Molnar 提交于
      separate out the 'internal pipe object' abstraction, and make it
      usable to splice. This cleans up and fixes several aspects of the
      internal splice APIs and the pipe code:
      
       - pipes: the allocation and freeing of pipe_inode_info is now more symmetric
         and more streamlined with existing kernel practices.
      
       - splice: small micro-optimization: less pointer dereferencing in splice
         methods
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      
      Update XFS for the ->splice_read/->splice_write changes.
      Signed-off-by: NJens Axboe <axboe@suse.de>
      3a326a2c
  26. 03 4月, 2006 2 次提交
  27. 31 3月, 2006 2 次提交
    • J
      [PATCH] splice: add support for SPLICE_F_MOVE flag · 5abc97aa
      Jens Axboe 提交于
      This enables the caller to migrate pages from one address space page
      cache to another.  In buzz word marketing, you can do zero-copy file
      copies!
      Signed-off-by: NJens Axboe <axboe@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5abc97aa
    • J
      [PATCH] Introduce sys_splice() system call · 5274f052
      Jens Axboe 提交于
      This adds support for the sys_splice system call. Using a pipe as a
      transport, it can connect to files or sockets (latter as output only).
      
      From the splice.c comments:
      
         "splice": joining two ropes together by interweaving their strands.
      
         This is the "extended pipe" functionality, where a pipe is used as
         an arbitrary in-memory buffer. Think of a pipe as a small kernel
         buffer that you can use to transfer data from one end to the other.
      
         The traditional unix read/write is extended with a "splice()" operation
         that transfers data buffers to or from a pipe buffer.
      
         Named by Larry McVoy, original implementation from Linus, extended by
         Jens to support splicing to files and fixing the initial implementation
         bugs.
      Signed-off-by: NJens Axboe <axboe@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5274f052