1. 07 Jan, 2013 1 commit
    • tcp: fix MSG_SENDPAGE_NOTLAST logic · ae62ca7b
      Eric Dumazet committed
      commit 35f9c09f (tcp: tcp_sendpages() should call tcp_push() once)
      added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all
      frags but the last one for a splice() call.
      
      The condition used to set the flag in pipe_to_sendpage() relied on
      splice() user passing the exact number of bytes present in the pipe,
      or a smaller one.
      
      But some programs pass an arbitrarily high value, and the test fails.
      
      The effect of this bug is a lack of tcp_push() at the end of a
      splice(pipe -> socket) call, and possibly very slow or erratic TCP
      sessions.
      
      We should test both sd->total_len and the fact that another fragment
      is in the pipe (pipe->nrbufs > 1).
      
      Many thanks to Willy for providing a very clear bug report, bisection
      and test programs.
      Reported-by: Willy Tarreau <w@1wt.eu>
      Bisected-by: Willy Tarreau <w@1wt.eu>
      Tested-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae62ca7b
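The corrected condition from this commit message can be modeled in user space. This is a minimal sketch, not the kernel code: the `demo_` structs and flag values are hypothetical stand-ins, and only the two tests named in the message (`sd->total_len` and `pipe->nrbufs > 1`) are taken from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; the kernel's real constants differ. */
enum { DEMO_MSG_MORE = 0x1, DEMO_MSG_SENDPAGE_NOTLAST = 0x2 };

/* Hypothetical, simplified stand-in for the splice descriptor. */
struct demo_splice_desc {
    size_t len;        /* bytes in the fragment being sent now */
    size_t total_len;  /* bytes the caller asked splice() to move */
    int user_more;     /* nonzero if the caller passed SPLICE_F_MORE */
};

/* Hypothetical, simplified stand-in for the pipe state. */
struct demo_pipe {
    unsigned int nrbufs; /* buffers currently queued in the pipe */
};

/*
 * Decide which flags accompany one fragment.  NOTLAST is set only when
 * the request is not yet satisfied AND another fragment is really
 * queued behind this one, so an oversized total_len on the final
 * fragment no longer suppresses the push.
 */
static int demo_sendpage_flags(const struct demo_pipe *pipe,
                               const struct demo_splice_desc *sd)
{
    int flags = sd->user_more ? DEMO_MSG_MORE : 0;

    if (sd->len < sd->total_len && pipe->nrbufs > 1)
        flags |= DEMO_MSG_SENDPAGE_NOTLAST;
    return flags;
}
```

With a single buffer left in the pipe and a caller that asked for far more bytes than are present, the function returns no NOTLAST flag, which is the case the original test got wrong.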
  2. 12 Dec, 2012 1 commit
  3. 27 Sep, 2012 1 commit
  4. 31 Jul, 2012 1 commit
  5. 14 Jun, 2012 1 commit
    • splice: fix racy pipe->buffers uses · 047fe360
      Eric Dumazet committed
      Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
      by splice_shrink_spd() called from vmsplice_to_pipe()
      
      commit 35f3d14d (pipe: add support for shrinking and growing pipes)
      added the capability to adjust pipe->buffers.
      
      The problem is that some paths don't hold the pipe mutex and assume
      pipe->buffers doesn't change for their duration.
      
      Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
      use it in place of pipe->buffers where appropriate.
      
      splice_shrink_spd() loses its struct pipe_inode_info argument.
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Tom Herbert <therbert@google.com>
      Cc: stable <stable@vger.kernel.org> # 2.6.35
      Tested-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      047fe360
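The idea of recording the size once in the descriptor, instead of re-reading a value that can change, can be sketched in user space. Everything prefixed `demo_` is hypothetical, and `malloc`-family calls stand in for kernel allocation; only the `nr_pages_max` field name and the "shrink no longer takes the pipe" change come from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_DEF_BUFFERS 16  /* hypothetical size of the inline array */

/* Hypothetical pipe whose buffer count may be resized concurrently. */
struct demo_pipe {
    unsigned int buffers;
};

/* Hypothetical descriptor that remembers the size it was set up for. */
struct demo_spd {
    void **pages;
    unsigned int nr_pages_max;
};

/* Read the pipe size exactly once and record it in the descriptor. */
static int demo_grow_spd(const struct demo_pipe *pipe, struct demo_spd *spd,
                         void **inline_pages)
{
    unsigned int buffers = pipe->buffers;

    spd->nr_pages_max = buffers;
    if (buffers <= DEMO_DEF_BUFFERS) {
        spd->pages = inline_pages;      /* caller-provided array */
        return 0;
    }
    spd->pages = calloc(buffers, sizeof(*spd->pages));
    return spd->pages ? 0 : -1;
}

/*
 * Decide from the recorded size, not from the live pipe, whether the
 * array was heap-allocated.  Returns 1 if memory was freed, else 0.
 */
static int demo_shrink_spd(struct demo_spd *spd)
{
    if (spd->nr_pages_max <= DEMO_DEF_BUFFERS)
        return 0;
    free(spd->pages);
    spd->pages = NULL;
    return 1;
}
```

Because the shrink step never looks at the pipe, a resize between grow and shrink can neither free the inline array nor leak a heap allocation.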
  6. 02 Jun, 2012 1 commit
    • fs: introduce inode operation ->update_time · c3b2da31
      Josef Bacik committed
      Btrfs has to make sure we have space to allocate new blocks in order to modify
      the inode, so updating time can fail.  We've gotten around this by having our
      own file_update_time but this is kind of a pain, and Christoph has indicated he
      would like to make xfs do something different with atime updates.  So introduce
      ->update_time, where we will deal with i_version and a/m/c time updates and
      indicate which changes need to be made.  The normal version just does what it
      has always done, updates the time and marks the inode dirty, and then
      filesystems can choose to do something different.
      
      I've gone through all of the users of file_update_time and made them check for
      errors with the exception of the fault code since it's complicated and I wasn't
      quite sure what to do there, also Jan is going to be pushing the file time
      updates into page_mkwrite for those who have it so that should satisfy btrfs and
      make it not a big deal to check the file_update_time() return code in the
      generic fault path. Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      c3b2da31
  7. 20 Apr, 2012 1 commit
    • vmsplice: relax alignment requirements for SPLICE_F_GIFT · bd1a68b5
      Eric Dumazet committed
      It seems there is no fundamental reason to limit vmsplice()
      SPLICE_F_GIFT to page aligned chunks.
      
      All helpers are prepared to cope with offsets within a page.
      
      This limitation makes vmsplice() API very impractical in the zero-copy
      land.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Changli Gao <xiaosuo@gmail.com>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bd1a68b5
  8. 06 Apr, 2012 1 commit
    • tcp: tcp_sendpages() should call tcp_push() once · 35f9c09f
      Eric Dumazet committed
      commit 2f533844 (tcp: allow splice() to build full TSO packets) added
      a regression for splice() calls using SPLICE_F_MORE.
      
      We need to call tcp_push() at the end of the last page processed in
      tcp_sendpages(), or else transmits can be deferred and future sends
      stall.
      
      Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
      with different semantics.
      
      For all sendpage() providers, it's a transparent change. Only
      sock_sendpage() and tcp_sendpages() can differentiate the two different
      flags provided by pipe_to_sendpage().
      Reported-by: Tom Herbert <therbert@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: H.K. Jerry Chu <hkchu@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      35f9c09f
  9. 20 Mar, 2012 1 commit
  10. 29 Feb, 2012 1 commit
  11. 04 Jan, 2012 1 commit
  12. 26 Jul, 2011 1 commit
  13. 24 May, 2011 1 commit
  14. 17 Dec, 2010 1 commit
  15. 29 Nov, 2010 2 commits
    • Export 'get_pipe_info()' to other users · c66fb347
      Linus Torvalds committed
      And in particular, use it in 'pipe_fcntl()'.
      
      The other pipe functions do not need to use the 'careful' version, since
      they are only ever called for things that are already known to be pipes.
      
      The normal read/write/ioctl functions are called through the file
      operations structures, so if a file isn't a pipe, they'd never get
      called.  But pipe_fcntl() is special, and called directly from the
      generic fcntl code, and needs to use the same careful function that the
      splice code is using.
      
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c66fb347
    • Rename 'pipe_info()' to 'get_pipe_info()' · 71993e62
      Linus Torvalds committed
      .. and change it to take the 'file' pointer instead of an inode, since
      that's what all users want anyway.
      
      The renaming is preparatory to exporting it to other users.  The old
      'pipe_info()' name was too generic and is already used elsewhere, so
      before making the function public we need to use a more specific name.
      
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      71993e62
  16. 08 Aug, 2010 2 commits
  17. 30 Jun, 2010 2 commits
  18. 25 May, 2010 1 commit
  19. 22 May, 2010 1 commit
  20. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  21. 04 Nov, 2009 1 commit
  22. 14 Sep, 2009 1 commit
    • vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode · 148f948b
      Jan Kara committed
      Introduce a new function for generic inode syncing (vfs_fsync_range) and use
      it from the fsync() path. Also introduce a new helper for syncing after a sync
      write (generic_write_sync) using the generic function.
      
      Use these new helpers for syncing from generic VFS functions. This makes
      O_SYNC writes to block devices acquire i_mutex for syncing. If we really
      care about this, we can make block_fsync() drop the i_mutex and reacquire
      it before it returns.
      
      CC: Evgeniy Polyakov <zbr@ioremap.net>
      CC: ocfs2-devel@oss.oracle.com
      CC: Joel Becker <joel.becker@oracle.com>
      CC: Felix Blyakher <felixb@sgi.com>
      CC: xfs@oss.sgi.com
      CC: Anton Altaparmakov <aia21@cantab.net>
      CC: linux-ntfs-dev@lists.sourceforge.net
      CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      CC: linux-ext4@vger.kernel.org
      CC: tytso@mit.edu
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      148f948b
  23. 11 Sep, 2009 1 commit
  24. 19 May, 2009 1 commit
    • splice: fix kmaps in default_file_splice_write() · b2858d7d
      Miklos Szeredi committed
      Unfortunately multiple kmap() within a single thread are deadlockable,
      so writing out multiple buffers with writev() isn't possible.
      
      Change the implementation so that it does a separate write() for each
      buffer.  This actually simplifies the code a lot since the
      splice_from_pipe() helper can be used.
      
      This limitation is caused by HIGHMEM pages, and so only affects a
      subset of architectures and configurations.  In the future it may be
      worth implementing default_file_splice_write() in a more efficient way
      on configs that allow it.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      b2858d7d
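The "separate write() for each buffer" approach can be illustrated with ordinary POSIX calls. This is a user-space sketch under assumptions, not the kernel implementation: `demo_write_each` is a hypothetical helper, and `struct iovec` is used only as a convenient container for the buffers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write each buffer with its own write() call instead of handing the
 * whole array to writev().  Short writes are continued and EINTR is
 * retried.  Returns the total number of bytes written, or -1 if an
 * error occurred before anything was written.
 */
static ssize_t demo_write_each(int fd, const struct iovec *bufs, int nbufs)
{
    ssize_t total = 0;

    for (int i = 0; i < nbufs; i++) {
        const char *p = bufs[i].iov_base;
        size_t left = bufs[i].iov_len;

        while (left > 0) {
            ssize_t n = write(fd, p, left);

            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return total > 0 ? total : -1;
            }
            p += n;
            left -= (size_t)n;
            total += n;
        }
    }
    return total;
}
```

Only one buffer is in flight at a time, which mirrors the reason given in the commit message: mapping several buffers at once is what made the writev() variant deadlock-prone.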
  25. 14 May, 2009 1 commit
  26. 13 May, 2009 1 commit
  27. 11 May, 2009 3 commits
    • splice: implement default splice_write method · 0b0a47f5
      Miklos Szeredi committed
      If f_op->splice_write() is not implemented, fall back to a plain write.
      Use vfs_writev() to write from the pipe buffers.
      
      This will allow splice on all filesystems and file types.  This
      includes "direct_io" files in fuse which bypass the page cache.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      0b0a47f5
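The fallback rule in this commit (use the file's own splice_write if it has one, otherwise a generic default) is a plain function-pointer dispatch. The sketch below models only that selection step; the `demo_` names and the simplified signature are assumptions and do not match the kernel's `file_operations`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified signature for a splice_write-style method. */
typedef long (*demo_splice_write_fn)(const char *data, size_t len);

/* Hypothetical operations table; a NULL member means "not implemented". */
struct demo_file_ops {
    demo_splice_write_fn splice_write;
};

/* Stand-in for a filesystem-specific implementation. */
static long demo_fs_splice_write(const char *data, size_t len)
{
    (void)data;
    return (long)len;
}

/* Stand-in for the generic fallback that does a plain write. */
static long demo_default_splice_write(const char *data, size_t len)
{
    (void)data;
    return (long)len;
}

/* Pick the file's own method when present, else the default. */
static demo_splice_write_fn
demo_pick_splice_write(const struct demo_file_ops *ops)
{
    if (ops && ops->splice_write)
        return ops->splice_write;
    return demo_default_splice_write;
}
```

Because the default is always available, callers never have to reject a file for lacking the method, which is what lets splice work on every file type.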
    • splice: implement default splice_read method · 6818173b
      Miklos Szeredi committed
      If f_op->splice_read() is not implemented, fall back to a plain read.
      Use vfs_readv() to read into previously allocated pages.
      
      This will allow splice and functions using splice, such as the loop
      device, to work on all filesystems.  This includes "direct_io" files
      in fuse which bypass the page cache.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      6818173b
    • splice: implement pipe to pipe splicing · 7c77f0b3
      Miklos Szeredi committed
      Allow splice(2) to work when both the input and the output is a pipe.
      
      Based on the implementation of the tee(2) syscall, but instead of
      duplicating the buffer references move the buffers from the input pipe
      to the output pipe.
      
      Moving the whole buffer only succeeds if the full length of the buffer
      is spliced.  Otherwise duplicate the buffer, just like tee(2), set the
      length of the output buffer and advance the offset on the input
      buffer.
      
      Since splice is operating on two pipes, special care needs to be taken
      with locking to prevent an ABBA deadlock.  Again this is done
      similarly to the tee(2) syscall, first preparing the input and output
      pipes so there's data to consume and space for that data, and then
      doing the move operation while holding both locks.
      
      If other processes are doing I/O on the same pipes parallel to the
      splice, then by the time both inodes are locked there might be no
      buffers left to move, or no space to move them to.  In this case retry
      the whole operation, including the preparation phase.  This could lead
      to starvation, but I'm not sure if that's serious enough to worry
      about.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      7c77f0b3
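The move-versus-duplicate rule described in this commit message can be sketched for a single buffer. The `demo_buf` struct is a hypothetical simplification (a page identifier stands in for a reference-counted page), so this shows the bookkeeping only, not reference counting or locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified pipe buffer: a window into one page. */
struct demo_buf {
    int page_id;    /* stands in for the page reference */
    size_t offset;  /* start of valid data within the page */
    size_t len;     /* number of valid bytes */
};

/*
 * Transfer up to `want` bytes from the input buffer to the output
 * buffer and return the number of bytes transferred.
 *
 * Full length requested: the buffer moves, leaving the input empty.
 * Partial length: both buffers refer to the same page (as tee would
 * do); the output is trimmed to `want` bytes and the input window
 * advances past them.
 */
static size_t demo_splice_one(struct demo_buf *in, struct demo_buf *out,
                              size_t want)
{
    if (want >= in->len) {
        size_t moved = in->len;

        *out = *in;
        in->len = 0;
        return moved;
    }

    *out = *in;
    out->len = want;
    in->offset += want;
    in->len -= want;
    return want;
}
```

For example, splicing 1000 bytes out of a 4096-byte buffer leaves the input at offset 1000 with 3096 bytes, and a following request for at least 3096 bytes moves the remainder outright.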
  28. 17 Apr, 2009 1 commit
  29. 15 Apr, 2009 6 commits
  30. 07 Apr, 2009 1 commit
    • splice: fix deadlock in splicing to file · 7bfac9ec
      Miklos Szeredi committed
      There's a possible deadlock in generic_file_splice_write(),
      splice_from_pipe() and ocfs2_file_splice_write():
      
       - task A calls generic_file_splice_write()
       - this calls inode_double_lock(), which locks i_mutex on both
         pipe->inode and target inode
       - ordering depends on inode pointers, can happen that pipe->inode is
         locked first
       - __splice_from_pipe() needs more data, calls pipe_wait()
       - this releases lock on pipe->inode, goes to interruptible sleep
       - task B calls generic_file_splice_write(), similarly to the first
       - this locks pipe->inode, then tries to lock inode, but that is
         already held by task A
       - task A is interrupted, it tries to lock pipe->inode, but fails, as
         it is already held by task B
       - ABBA deadlock
      
      Fix this by explicitly ordering locks: the outer lock must be on
      target inode and the inner lock (which is later unlocked and relocked)
      must be on pipe->inode.  This is OK, pipe inodes and target inodes
      form two nonoverlapping sets, generic_file_splice_write() and friends
      are not called with a target which is a pipe.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7bfac9ec
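The explicit ordering in this fix (target inode as the outer lock, pipe inode as the inner lock that may be dropped and retaken) can be expressed as a rank rule. The following is a single-threaded model with hypothetical `demo_` names; it checks acquisition order only and performs no real locking.

```c
#include <assert.h>

/* Lower rank = outer lock.  The ranks are an illustrative assumption. */
enum { DEMO_RANK_TARGET = 1, DEMO_RANK_PIPE = 2 };

/* Bitmask of the ranks one task currently holds. */
struct demo_task {
    unsigned int held;
};

/* Refuse to take a lock while one of equal or higher rank is held. */
static int demo_lock(struct demo_task *t, int rank)
{
    if (t->held >> rank)
        return -1;
    t->held |= 1u << rank;
    return 0;
}

static void demo_unlock(struct demo_task *t, int rank)
{
    t->held &= ~(1u << rank);
}

/*
 * Model of a splice-to-file call: take the target lock first, then the
 * pipe lock.  When the call has to wait for data, only the inner pipe
 * lock is released and retaken, so the order is never inverted.
 */
static int demo_splice_to_file(struct demo_task *t, int must_wait)
{
    if (demo_lock(t, DEMO_RANK_TARGET) != 0)
        return -1;
    if (demo_lock(t, DEMO_RANK_PIPE) != 0) {
        demo_unlock(t, DEMO_RANK_TARGET);
        return -1;
    }
    if (must_wait) {
        demo_unlock(t, DEMO_RANK_PIPE);          /* sleep for data */
        if (demo_lock(t, DEMO_RANK_PIPE) != 0) { /* wake up, relock */
            demo_unlock(t, DEMO_RANK_TARGET);
            return -1;
        }
    }
    demo_unlock(t, DEMO_RANK_PIPE);
    demo_unlock(t, DEMO_RANK_TARGET);
    return 0;
}
```

In this model, taking the pipe lock first and then asking for the target lock is rejected, which corresponds to the ordering that allowed the ABBA deadlock in the scenario listed in the commit message.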