Commits · c725bfce7968009756ed2836a8cd7ba4dc163011 · bug2833 / cloud-kernel

24 Nov, 2015 · 1 commit

vfs: Make sendfile(2) killable even better · c725bfce

Committed by Jan Kara on Nov 23, 2015

Commit 296291cd (mm: make sendfile(2) killable) fixed an issue where
sendfile(2) was doing a lot of tiny writes into a filesystem and thus
was unkillable for a long time. However, sendfile(2) can be (mis)used to
issue lots of writes into an arbitrary file descriptor, such as eventfd or
similar special file descriptors, which never hit the standard filesystem
write path and thus are still unkillable. E.g. the following example
from Dmitry burns CPU for ~16s on my test system without any possibility
of being killed:

        int r1 = eventfd(0, 0);
        int r2 = memfd_create("", 0);
        unsigned long n = 1<<30;
        fallocate(r2, 0, 0, n);
        sendfile(r1, r2, 0, n);

There are actually quite a few tests for pending signals in sendfile
code; however, when data to write is always available, none of them
seems to trigger. So fix the problem by also adding a test for a pending
signal to splice_from_pipe_next() before the loop that waits for pipe
buffers to become available. This should fix all the lockup issues with
sendfile of the do-tons-of-tiny-writes nature.

CC: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c725bfce
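
The fix can be modelled in user space as a copy loop that checks for a pending signal before every chunk. With that check in place, a request that always has data available still notices a kill. The sketch below is a minimal one. It assumes a caller-supplied predicate in place of the kernel's signal_pending(current) and uses -1 in place of -ERESTARTSYS. copy_killable() and budget_exhausted() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for signal_pending(current). */
typedef bool (*pending_fn)(void *ctx);

/*
 * Move `total` bytes in `chunk`-sized steps, testing for a pending signal
 * before each step (the spot where the fix adds a check). Returns the bytes
 * moved so far, or -1 (standing in for -ERESTARTSYS) if interrupted before
 * any progress was made.
 */
long copy_killable(size_t total, size_t chunk, pending_fn pending, void *ctx)
{
	size_t done = 0;

	assert(chunk > 0);
	while (done < total) {
		if (pending(ctx))
			return done ? (long)done : -1;
		size_t n = total - done < chunk ? total - done : chunk;
		/* ... a real implementation would write n bytes here ... */
		done += n;
	}
	return (long)done;
}

/* Example predicate: reports a signal once *budget steps have been used. */
bool budget_exhausted(void *ctx)
{
	int *budget = ctx;

	return (*budget)-- <= 0;
}
```

The check runs before each step rather than only inside a "wait for data" branch. That placement is the point: an always-ready source presumably never reaches such a branch.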

07 Nov, 2015 · 1 commit

mm, fs: introduce mapping_gfp_constraint() · c62d2555

Committed by Michal Hocko on Nov 06, 2015

There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but are performed in the same
context.

Let's introduce a helper function which makes the restriction explicit and
easier to track.  This patch doesn't introduce any functional changes.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c62d2555
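
The helper boils down to intersecting the caller's mask with the mapping's mask. The sketch below reproduces that in user space with mock types. The flag values and the MOCK_ names are illustrative assumptions, not the kernel's real bit layout.

```c
#include <assert.h>

/* Mock gfp_t and flag bits; the real values live in <linux/gfp.h>. */
typedef unsigned int gfp_t;
#define MOCK_GFP_IO     0x1u
#define MOCK_GFP_FS     0x2u	/* reclaim may call back into the filesystem */
#define MOCK_GFP_NOFS   (MOCK_GFP_IO)
#define MOCK_GFP_KERNEL (MOCK_GFP_IO | MOCK_GFP_FS)

/* Only the field this sketch needs. */
struct address_space {
	gfp_t gfp_mask;
};

static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)
{
	return mapping->gfp_mask;
}

/* Restrict a generic mask to whatever the mapping allows. */
static inline gfp_t mapping_gfp_constraint(const struct address_space *mapping,
					   gfp_t gfp_mask)
{
	return mapping_gfp_mask(mapping) & gfp_mask;
}
```

Under these assumptions, a GFP_NOFS mapping strips the FS bit from a GFP_KERNEL request. That is the property that keeps direct reclaim from recursing into the filesystem.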

25 Jun, 2015 · 1 commit

mm: do not ignore mapping_gfp_mask in page cache allocation paths · 6afdb859

Committed by Michal Hocko on Jun 24, 2015

page_cache_read, do_generic_file_read, __generic_file_splice_read and
__ntfs_grab_cache_pages currently ignore mapping_gfp_mask when calling
add_to_page_cache_lru, which might cause recursion into the fs down in the
direct reclaim path if the mapping really relies on GFP_NOFS semantics.

This doesn't seem to be the case now because page_cache_read (page fault
path) doesn't seem to suffer from the reclaim recursion issues and
do_generic_file_read and __generic_file_splice_read also shouldn't be
called under fs locks which would deadlock in the reclaim path.  Anyway it
is better to obey the mapping gfp mask and prevent later breakage.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6afdb859

25 May, 2015 · 1 commit

net: af_unix: implement splice for stream af_unix sockets · 2b514574

Committed by Hannes Frederic Sowa on May 21, 2015

unix_stream_recvmsg is refactored to unix_stream_read_generic in this
patch and enhanced to deal with pipe splicing. The refactoring is
negligible; we mostly have to deal with a non-existing struct msghdr
argument.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b514574
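
From user space, the new splice support lets splice(2) move bytes from a connected AF_UNIX stream socket straight into a pipe, without a user-space bounce buffer. The sketch below is small and assumes a kernel that includes this change; on older kernels the call may fail. unix_sock_to_pipe() is a hypothetical wrapper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>		/* splice() */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Move up to `len` bytes from a connected AF_UNIX stream socket into the
 * write end of a pipe. Returns what splice(2) returns: bytes moved,
 * 0 at EOF, or -1 with errno set.
 */
ssize_t unix_sock_to_pipe(int sock, int pipe_wr, size_t len)
{
	return splice(sock, NULL, pipe_wr, NULL, len, 0);
}
```

A socketpair(AF_UNIX, SOCK_STREAM, 0, sv) plus pipe(2) is enough to try it. Write into sv[0], splice from sv[1], then read the bytes back from the pipe.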

06 May, 2015 · 1 commit

splice: sendfile() at once fails for big files · 0ff28d9f

Committed by Christophe Leroy on May 06, 2015

Using sendfile() with the small program below to get MD5 sums of some files,
it appears that big files (over 64 kbytes on a system with 4k pages) get a
wrong MD5 sum while small files get the correct sum.
This program uses sendfile() to send a file to an AF_ALG socket
for hashing.

/* md5sum2.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <linux/if_alg.h>

int main(int argc, char **argv)
{
	int sk = socket(AF_ALG, SOCK_SEQPACKET, 0);
	struct stat st;
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type = "hash",
		.salg_name = "md5",
	};
	int n;

	if (sk < 0 || bind(sk, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		perror("AF_ALG md5");
		exit(1);
	}

	for (n = 1; n < argc; n++) {
		unsigned char buf[4096];	/* unsigned: avoid sign extension in printf */
		off_t offset = 0;		/* sendfile() takes an off_t *, not an int * */
		ssize_t size;
		int fd, sko, i;

		fd = open(argv[n], O_RDONLY);
		if (fd < 0 || fstat(fd, &st) < 0) {
			perror(argv[n]);
			if (fd >= 0)
				close(fd);
			continue;
		}
		sko = accept(sk, NULL, NULL);
		if (sko < 0) {
			perror("accept");
			close(fd);
			continue;
		}
		/* Hash the whole file with a single sendfile() call. */
		if (sendfile(sko, fd, &offset, st.st_size) < 0)
			perror("sendfile");
		size = read(sko, buf, sizeof(buf));
		for (i = 0; i < size; i++)
			printf("%02x", buf[i]);
		printf("  %s\n", argv[n]);
		close(fd);
		close(sko);
	}
	close(sk);
	exit(0);
}

The test below uses official Linux patch files. The first result is
from a software-based md5sum; the second is from the program above.

root@vgoip:~# ls -l patch-3.6.*
-rw-r--r--    1 root     root         64011 Aug 24 12:01 patch-3.6.2.gz
-rw-r--r--    1 root     root         94131 Aug 24 12:01 patch-3.6.3.gz

root@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz

root@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
5fd77b24e68bb24dcc72d6e57c64790e  patch-3.6.3.gz

After investigation, it appears that sendfile() sends the files in blocks
of 64kbytes (16 times PAGE_SIZE). The problem is that at the end of each
block, the SPLICE_F_MORE flag is missing, therefore the hashing operation
is reset as if it was the end of the file.

This patch adds SPLICE_F_MORE to the flags when more data is pending.

With the patch applied, we get the correct sums:

root@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz

root@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>

0ff28d9f
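
The rule this fix enforces can be stated as a small function. Every chunk except the last carries SPLICE_F_MORE, and a flag the caller set itself is preserved. This is a user-space model under stated assumptions, not the patched kernel code. chunk_flags() is a hypothetical helper, and the 64 KiB chunk size comes from the report.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>		/* SPLICE_F_MORE */
#include <stddef.h>

/*
 * Flags for the chunk starting at offset `pos` of a `total`-byte transfer
 * moved in `chunk`-byte pieces. `caller_flags` is whatever the caller passed.
 */
unsigned int chunk_flags(size_t pos, size_t total, size_t chunk,
			 unsigned int caller_flags)
{
	assert(chunk > 0 && pos < total);
	if (total - pos > chunk)	/* more data follows this chunk */
		return caller_flags | SPLICE_F_MORE;
	return caller_flags;		/* last chunk: let the receiver finish */
}
```

With the sizes from the report, patch-3.6.3.gz (94131 bytes) gets SPLICE_F_MORE on its first 64 KiB chunk only. patch-3.6.2.gz (64011 bytes) fits in one chunk and never needs it, which matches the observation that only the larger file hashed incorrectly.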

16 Apr, 2015 · 1 commit

dax: unify ext2/4_{dax,}_file_operations · be64f884

Committed by Boaz Harrosh on Apr 15, 2015

The original dax patchset split the ext2/4_file_operations because of the
two NULL splice_read/splice_write in the dax case.

In the vfs if splice_read/splice_write are NULL we then call
default_splice_read/write.

What we do here is make generic_file_splice_read aware of IS_DAX() so the
original ext2/4_file_operations can be used as is.

For write it appears that iter_file_splice_write is just fine.  It uses
the regular f_op->write(file,..) or new_sync_write(file, ...).
Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

be64f884
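
The dispatch described above can be modelled with function pointers. The VFS falls back to the default when ->splice_read is NULL, and generic_file_splice_read() itself punts DAX files to the default path, so one file_operations table can serve both cases. The sketch is user-space only. The structures are mocks, is_dax stands in for IS_DAX(), and the functions return an enum instead of filling a pipe.

```c
#include <assert.h>
#include <stdbool.h>

/* Which path served the read; the real functions fill a pipe instead. */
enum splice_path { PATH_DEFAULT, PATH_PAGE_CACHE };

/* Mock structures carrying only what this sketch needs. */
struct file;
typedef enum splice_path (*splice_read_fn)(struct file *);
struct file_operations {
	splice_read_fn splice_read;
};
struct file {
	bool is_dax;				/* stands in for IS_DAX(inode) */
	const struct file_operations *f_op;
};

static enum splice_path default_file_splice_read(struct file *in)
{
	(void)in;
	return PATH_DEFAULT;
}

/* DAX files bypass the page cache, so hand them to the default path. */
static enum splice_path generic_file_splice_read(struct file *in)
{
	if (in->is_dax)
		return default_file_splice_read(in);
	return PATH_PAGE_CACHE;
}

/* VFS side: a NULL ->splice_read means "use the default". */
static enum splice_path do_splice_to(struct file *in)
{
	splice_read_fn fn = in->f_op->splice_read ? in->f_op->splice_read
						  : default_file_splice_read;
	return fn(in);
}

/* One shared table for DAX and non-DAX files, as after the unification. */
static const struct file_operations mock_ext2_file_operations = {
	.splice_read = generic_file_splice_read,
};
```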

12 Apr, 2015 · 1 commit

vmsplice_to_user(): switch to import_iovec() · 345995fa

Committed by Al Viro on Mar 21, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

345995fa

26 Mar, 2015 · 1 commit

fs: move struct kiocb to fs.h · e2e40f2c

Committed by Christoph Hellwig on Feb 22, 2015

struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e2e40f2c

29 Jan, 2015 · 2 commits

fs: add vfs_iter_{read,write} helpers · dbe4e192

Committed by Christoph Hellwig on Jan 25, 2015

Simple helpers that pass an arbitrary iov_iter to filesystems.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dbe4e192

new helper: iov_iter_bvec() · 05afcb77

Committed by Al Viro on Jan 23, 2015

similar to iov_iter_kvec(), for ITER_BVEC ones
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

05afcb77

24 Oct, 2014 · 1 commit

vfs: export do_splice_direct() to modules · 1c118596

Committed by Miklos Szeredi on Oct 24, 2014

Export do_splice_direct() to modules.  Needed by overlay filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

1c118596
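
In kernels of this era, sendfile(2) between two regular files is serviced by do_splice_direct(), the function exported here, and overlayfs reuses it for copy-up. The user-space sketch below drives that path. copy_with_sendfile() is a hypothetical helper. The loop guards against short transfers, which sendfile(2) is allowed to return.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to `count` bytes from the start of in_fd to out_fd's current
 * offset. Returns the number of bytes copied, or -1 with errno set.
 */
ssize_t copy_with_sendfile(int out_fd, int in_fd, size_t count)
{
	off_t off = 0;		/* read position; in_fd's own offset is untouched */
	size_t done = 0;

	while (done < count) {
		ssize_t n = sendfile(out_fd, in_fd, &off, count - done);

		if (n < 0)
			return -1;
		if (n == 0)	/* source ended early */
			break;
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

Two tmpfile(3) streams are enough to try it. Write into one, pass both fileno() values, then pread(2) the destination back.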

12 Jun, 2014 · 3 commits

kill generic_file_splice_write() · 5f073850

Committed by Al Viro on Apr 05, 2014

no callers left
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5f073850

fs/splice.c: remove unneeded exports · 96f9bc8f

Committed by Al Viro on Apr 05, 2014

ocfs2 was using a bunch of splice.c guts...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

96f9bc8f

->splice_write() via ->write_iter() · 8d020765

Committed by Al Viro on Apr 05, 2014

iter_file_splice_write() - a ->splice_write() instance that gathers the
pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
it to ->write_iter().  A bunch of simple cases converted to that...

[AV: fixed the braino spotted by Cyrill]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8d020765
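
The gathering idea has a direct user-space analogue. Several buffers are collected into one iovec array and handed to a single writev(2), instead of being written one at a time. This is an analogy only, not iter_file_splice_write() itself, which builds a bio_vec-based iov_iter over pipe pages. struct buf_ref and gather_write() are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_BUFS 16	/* 16 = default pipe capacity in buffers */

/* Hypothetical stand-in for a pipe buffer: a pointer/length pair. */
struct buf_ref {
	const void *base;
	size_t len;
};

/* Gather up to MAX_BUFS buffers and issue one vectored write. */
ssize_t gather_write(int fd, const struct buf_ref *bufs, int nr)
{
	struct iovec iov[MAX_BUFS];
	int i;

	assert(nr >= 0 && nr <= MAX_BUFS);
	for (i = 0; i < nr; i++) {
		iov[i].iov_base = (void *)bufs[i].base;
		iov[i].iov_len = bufs[i].len;
	}
	return writev(fd, iov, nr);
}
```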

28 May, 2014 · 1 commit

vfs: fix vmplice_to_user() · b6dd6f47

Committed by Miklos Szeredi on May 27, 2014

Commit 6130f531 "switch vmsplice_to_user() to copy_page_to_iter()" in
v3.15-rc1 broke vmsplice(2).

This patch fixes two bugs:

 - count is not initialized to a proper value, which resulted in no data
   being copied

 - if rw_copy_check_uvector() returns negative then the iov might be leaked.

Tested OK.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b6dd6f47
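
Calling vmsplice(2) on the read end of a pipe is the path that reaches vmsplice_to_user(), which copies pipe contents into the caller's iovecs. The sketch below is a minimal exercise of that path. pipe_read_vmsplice() is a hypothetical wrapper. The commit message says the broken version copied no data. On a fixed kernel the call should return the number of bytes that were queued.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>		/* vmsplice() */
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Copy queued pipe data into `buf`; returns bytes copied or -1. */
ssize_t pipe_read_vmsplice(int pipe_rd, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	return vmsplice(pipe_rd, &iov, 1, 0);
}
```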

07 May, 2014 · 1 commit

start adding the tag to iov_iter · 71d8e532

Committed by Al Viro on Mar 05, 2014

For now, just use the same thing we pass to ->direct_IO() - it's all
iovec-based at the moment.  Pass it explicitly to iov_iter_init() and
account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
uses.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

71d8e532

02 Apr, 2014 · 2 commits

switch vmsplice_to_user() to copy_page_to_iter() · 6130f531

Committed by Al Viro on Feb 03, 2014

I've switched the sanity checks on iovec to rw_copy_check_uvector();
we might need to do a local analog, if any behaviour differences are
not actually bugfixes here...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6130f531

pipe: kill ->map() and ->unmap() · fbb32750

Committed by Al Viro on Feb 02, 2014

all pipe_buffer_operations have the same instances of those...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fbb32750

23 Jan, 2014 · 1 commit

fuse: fix pipe_buf_operations · 28a625cb

Committed by Miklos Szeredi on Jan 22, 2014

Having this struct in module memory could cause an Oops if the module is
unloaded while the buffer still persists in a pipe.

Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal,
merge them into nosteal_pipe_buf_ops (this is the same as
default_pipe_buf_ops except stealing the page from the buffer is not
allowed).
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org

28a625cb

25 Oct, 2013 · 1 commit

file->f_op is never NULL... · 72c2d531

Committed by Al Viro on Sep 22, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

72c2d531

29 Jun, 2013 · 3 commits

splice: lift checks from do_splice_from() into callers · 18c67cb9

Committed by Al Viro on Jun 19, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

18c67cb9

lift file_*_write out of do_splice_direct() · 50cd2c57

Committed by Al Viro on May 23, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

50cd2c57

lift file_*_write out of do_splice_from() · 500368f7

Committed by Al Viro on May 23, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

500368f7

24 Jun, 2013 · 1 commit

fs: fix new splice.c kernel-doc warning · acdb37c3

Committed by Randy Dunlap on Jun 22, 2013

Fix new kernel-doc warning in fs/splice.c:

  Warning(fs/splice.c:1298): No description found for parameter 'opos'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

acdb37c3

20 Jun, 2013 · 1 commit

splice: don't pass the address of ->f_pos to methods · 7995bd28

Committed by Al Viro on Jun 20, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7995bd28

10 Apr, 2013 · 5 commits

get rid of alloc_pipe_info() argument · 7bee130e

Committed by Al Viro on Mar 21, 2013

not used anymore
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7bee130e

get rid of pipe->inode · 6447a3cf

Committed by Al Viro on Mar 21, 2013

it's used only as a flag to distinguish normal pipes/FIFOs from the
internal per-task one used by file-to-file splice.  And pipe->files
would work just as well for that purpose...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6447a3cf

lift sb_start_write out of ->splice_write() · 2dd8c9ad

Committed by Al Viro on Mar 20, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2dd8c9ad

lift sb_start_write into default_file_splice_write() · 17338fcc

Committed by Al Viro on Mar 20, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

17338fcc

lift sb_start_write() out of ->write() · 03d95eb2

Committed by Al Viro on Mar 20, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

03d95eb2

22 Mar, 2013 · 1 commit

Don't bother with redoing rw_verify_area() from default_file_splice_from() · 06ae43f3

Committed by Al Viro on Mar 20, 2013

default_file_splice_from() ends up calling vfs_write() (via a very convoluted
callchain). It's overkill: we have already done rw_verify_area() in the
caller, and by the time we call vfs_write() we are under set_fs(KERNEL_DS),
so access_ok() is also pointless. Add a new helper (__kernel_write()) and
use it instead of kernel_write() in there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

06ae43f3

04 Mar, 2013 · 1 commit

convert vmsplice to COMPAT_SYSCALL_DEFINE · 76b021d0

Committed by Al Viro on Mar 02, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

76b021d0

26 Feb, 2013 · 1 commit

export kernel_write(), convert open-coded instances · 7bb307e8

Committed by Al Viro on Feb 23, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7bb307e8

23 Feb, 2013 · 1 commit

new helper: file_inode(file) · 496ad9aa

Committed by Al Viro on Jan 23, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

496ad9aa

07 Jan, 2013 · 1 commit

tcp: fix MSG_SENDPAGE_NOTLAST logic · ae62ca7b

Committed by Eric Dumazet on Jan 06, 2013

commit 35f9c09f (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag, MSG_SENDPAGE_NOTLAST, meant to be set on all
frags but the last one for a splice() call.

The condition used to set the flag in pipe_to_sendpage() relied on
splice() user passing the exact number of bytes present in the pipe,
or a smaller one.

But some programs pass an arbitrary high value, and the test fails.

The effect of this bug is a lack of tcp_push() at the end of a
splice(pipe -> socket) call, and possibly very slow or erratic TCP
sessions.

We should test both sd->total_len and the fact that another fragment
is in the pipe (pipe->nrbufs > 1).

Many thanks to Willy for providing very clear bug report, bisection
and test programs.
Reported-by: Willy Tarreau <w@1wt.eu>
Bisected-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae62ca7b
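
The corrected condition can be read as a predicate over three values. The user-space sketch below assumes that len, total_len and nrbufs mean what sd->len, sd->total_len and pipe->nrbufs mean in the commit message. sendpage_notlast() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Should the fragment being sent be marked "not the last one"?
 * len       - bytes of the current pipe buffer being sent
 * total_len - bytes the splice() caller still asked for (may be huge)
 * nrbufs    - buffers queued in the pipe, including the current one
 */
bool sendpage_notlast(size_t len, size_t total_len, unsigned int nrbufs)
{
	/* Testing len < total_len alone is fooled by an oversized request. */
	return len < total_len && nrbufs > 1;
}
```

A caller that asks for 1 GiB while only one 4 KiB buffer is queued now gets the last fragment sent without the flag, so the final tcp_push() is no longer skipped.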

12 Dec, 2012 · 1 commit

writeback: remove nr_pages_dirtied arg from balance_dirty_pages_ratelimited_nr() · d0e1d66b

Committed by Namjae Jeon on Dec 11, 2012

There is no reason to pass the nr_pages_dirtied argument, because
nr_pages_dirtied value from the caller is unused in
balance_dirty_pages_ratelimited_nr().
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d0e1d66b

27 Sep, 2012 · 1 commit

switch simple cases of fget_light to fdget · 2903ff01

Committed by Al Viro on Aug 28, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2903ff01

31 Jul, 2012 · 1 commit

fs: Protect write paths by sb_start_write - sb_end_write · 14da9200

Committed by Jan Kara on Jun 12, 2012

There are several entry points which dirty pages in a filesystem.  mmap
(handled by block_page_mkwrite()), buffered write (handled by
__generic_file_aio_write()), splice write (generic_file_splice_write),
truncate, and fallocate (these can dirty last partial page - handled inside
each filesystem separately). Protect these places with sb_start_write() and
sb_end_write().

->page_mkwrite() calls are particularly complex since they are called with
mmap_sem held and thus we cannot use standard sb_start_write() due to lock
ordering constraints. We solve the problem by using a special freeze protection
sb_start_pagefault() which ranks below mmap_sem.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa <kamal@canonical.com>
Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Massimo Morana <massimo.morana@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

14da9200

14 Jun, 2012 · 1 commit

splice: fix racy pipe->buffers uses · 047fe360

Committed by Eric Dumazet on Jun 12, 2012

Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe()

commit 35f3d14d (pipe: add support for shrinking and growing pipes)
added capability to adjust pipe->buffers.

The problem is that some paths don't hold the pipe mutex and assume
pipe->buffers doesn't change for their duration.

Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.

splice_shrink_spd() loses its struct pipe_inode_info argument.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tom Herbert <therbert@google.com>
Cc: stable <stable@vger.kernel.org> # 2.6.35
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

047fe360
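
The pattern behind this fix is to read a resizable capacity once, record it, and make every later decision from the recorded copy. The user-space sketch below assumes, as in the kernel, that small descriptors use caller-provided (on-stack) arrays and only larger ones are heap-allocated. The signatures are simplified and are not the kernel's, and DEF_BUFFERS stands in for PIPE_DEF_BUFFERS.

```c
#include <assert.h>
#include <stdlib.h>

#define DEF_BUFFERS 16	/* stands in for PIPE_DEF_BUFFERS */

/* Mock: only the field that can change under us. */
struct pipe_inode_info {
	unsigned int buffers;
};

struct splice_pipe_desc {
	void **pages;
	unsigned int nr_pages_max;	/* capacity recorded at grow time */
};

/* Read pipe->buffers once; reuse the caller's DEF_BUFFERS-sized array if enough. */
int splice_grow_spd(const struct pipe_inode_info *pipe,
		    struct splice_pipe_desc *spd, void **stack_pages)
{
	/* Single snapshot; truly concurrent code would need an atomic load. */
	unsigned int buffers = pipe->buffers;

	spd->nr_pages_max = buffers;
	if (buffers <= DEF_BUFFERS) {
		spd->pages = stack_pages;
		return 0;
	}
	spd->pages = calloc(buffers, sizeof(*spd->pages));
	return spd->pages ? 0 : -1;
}

/* Decide from the snapshot; consulting pipe->buffers here could free the
 * caller's stack array if the pipe was resized in between. */
void splice_shrink_spd(struct splice_pipe_desc *spd)
{
	if (spd->nr_pages_max <= DEF_BUFFERS)
		return;
	free(spd->pages);
	spd->pages = NULL;
}
```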

02 Jun, 2012 · 1 commit

fs: introduce inode operation ->update_time · c3b2da31

Committed by Josef Bacik on Mar 26, 2012

Btrfs has to make sure we have space to allocate new blocks in order to modify
the inode, so updating time can fail.  We've gotten around this by having our
own file_update_time but this is kind of a pain, and Christoph has indicated he
would like to make xfs do something different with atime updates.  So introduce
->update_time, where we will deal with i_version and a/m/c time updates and
indicate which changes need to be made.  The normal version just does what it
has always done, updates the time and marks the inode dirty, and then
filesystems can choose to do something different.

I've gone through all of the users of file_update_time and made them check for
errors with the exception of the fault code since it's complicated and I wasn't
quite sure what to do there, also Jan is going to be pushing the file time
updates into page_mkwrite for those who have it so that should satisfy btrfs and
make it not a big deal to check the file_update_time() return code in the
generic fault path. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

c3b2da31

bug2833 / cloud-kernel: in sync with the fork source project
