- 12 Apr 2015, 14 commits
-
-
By Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
identical to import_single_range()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
We don't need req in either of those. We don't need nr_segs in the caller. We don't really need len in the caller either - iov_iter_count(&iter) will do.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
The only non-trivial detail is that we do it before rw_verify_area(), so we'd better cap the length ourselves in the aio_setup_single_rw() case (for the vectored case, rw_copy_check_uvector() will do that for us).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
Get it closer to matching {compat_,}rw_copy_check_uvector().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Andrew Elble
We have observed a BUG() crash in fs/attr.c:notify_change(). The crash occurs during an rsync into a filesystem that is exported via NFS.

1.) fs/attr.c:notify_change() modifies the caller's version of attr.
2.) 6de0ec00 ("VFS: make notify_change pass ATTR_KILL_S*ID to setattr operations") introduced a BUG() restriction such that "no function will ever call notify_change() with both ATTR_MODE and ATTR_KILL_S*ID set". Under some circumstances, though, it will have assisted in setting the caller's version of attr to this very combination.
3.) 27ac0ffe ("locks: break delegations on any attribute modification") introduced code to handle breaking delegations. This can result in notify_change() being re-called. attr _must_ be explicitly reset to avoid triggering the BUG() established in #2.
4.) The path that triggers this is via fs/open.c:chmod_common(). The combination of attr flags set here and in the first call to notify_change(), along with a later failed break_deleg_wait(), results in notify_change() being called again via retry_deleg without resetting attr.

The solution is to move retry_deleg in chmod_common() a bit further up, to ensure attr is completely reset. There are other places where this could seemingly occur, such as fs/utimes.c:utimes_common(), but the attr flags are not initially set in such a way as to trigger this.

Fixes: 27ac0ffe ("locks: break delegations on any attribute modification")
Reported-by: Eric Meddaugh <etmsys@rit.edu>
Tested-by: Eric Meddaugh <etmsys@rit.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
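For illustration, a minimal sketch of the fixed flow in chmod_common() (paraphrased from the changelog, assuming the usual locals there; the surrounding mnt_want_write()/mnt_drop_write() pairing is elided):

	retry_deleg:
		mutex_lock(&inode->i_mutex);
		error = security_path_chmod(path, mode);
		if (error)
			goto out_unlock;
		/* rebuilt on every pass, so a retry never re-enters
		 * notify_change() with the ATTR_KILL_S*ID bits it added */
		newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
		newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
		error = notify_change(path->dentry, &newattrs, &delegated_inode);
	out_unlock:
		mutex_unlock(&inode->i_mutex);
		if (delegated_inode) {
			error = break_deleg_wait(&delegated_inode);
			if (!error)
				goto retry_deleg;	/* attr fully reset above */
		}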
-
By J. Bruce Fields
On a distributed filesystem it's possible for lookup to discover that a directory it just found is already cached elsewhere in the directory hierarchy. The dcache won't let us keep the directory in both places, so we have to move the dentry to the new location from the place we previously had it cached. If the parent has changed, then this requires all the same locks as we'd need to do a cross-directory rename. But we're already in lookup holding one parent's i_mutex, so it's too late to acquire those locks in the right order.

The (unreliable) solution in __d_unalias is to trylock() the required locks and return -EBUSY if it fails. I see no particular reason for returning -EBUSY, and -ESTALE is already the result of some other lookup races on NFS. I think -ESTALE is the more helpful error return. It also allows us to take advantage of the logic Jeff Layton added in c6a94284 ("vfs: fix renameat to retry on ESTALE errors") and ancestors, which hopefully resolves some of these errors before they're returned to userspace.

I can reproduce these cases using NFS with:

	ssh root@$client '
		mount -olookupcache=pos '$server':'$export' /mnt/
		mkdir /mnt/TO
		mkdir /mnt/DIR
		touch /mnt/DIR/test.txt
		while true; do
			strace -e open cat /mnt/DIR/test.txt 2>&1 | grep EBUSY
		done '
	ssh root@$server '
		while true; do
			mv $export/DIR $export/TO/DIR
			mv $export/TO/DIR $export/DIR
		done '

It also helps to add some other concurrent use of the directory on the client (e.g., "ls /mnt/TO"). And you can replace the server-side mv's by client-side mv's that are repeatedly killed. (If the client is interrupted while waiting for the RENAME response then it's left with a dentry that has to go under one parent or the other, but it doesn't yet know which.)

Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Anton Altaparmakov
Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
For one thing, LOOKUP_DIRECTORY will be dealt with in do_last(). For another, name can be an empty string, but not NULL - no callers pass that, and it would oops immediately if they did.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
Just make const char iname[] the last member and compare name->name with name->iname instead of checking name->separate.

We need to make sure that an out-of-line name doesn't end up allocated adjacent to the struct filename referring to it; fortunately, that's easy to achieve - just allocate that struct filename with one byte in ->iname[], so that ->iname[0] will be inside the same object and thus have an address different from that of the out-of-line name.

[spotted by Boqun Feng <boqun.feng@gmail.com>]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
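For reference, a sketch of the resulting layout (the field set is abridged and illustrative; the point is the flexible array member and the pointer comparison that replaces ->separate):

	struct filename {
		const char		*name;	/* pointer to the actual string */
		const __user char	*uptr;	/* original userland pointer */
		int			refcnt;
		const char		iname[];	/* short names stored inline */
	};

	/* hypothetical helper, for clarity: replaces the old ->separate flag */
	static bool filename_is_inline(const struct filename *name)
	{
		return name->name == name->iname;
	}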
-
- 26 Mar 2015, 1 commit
-
-
By Christoph Hellwig
struct kiocb is now a generic I/O container, so move it to fs.h. Also do an #include diet for aio.h while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 25 Mar 2015, 4 commits
-
-
By Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
All callers were passing it the ->name of some struct filename.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
By Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 20 Mar 2015, 1 commit
-
-
By Christoph Hellwig
Due to a merge error when creating c5c707f9 ("nfsd: implement pNFS layout recalls"), we recursively call nfsd4_cb_layout_fail from itself, leading to stack overflows.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: c5c707f9 ("nfsd: implement pNFS layout recalls")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfsd/nfs4layouts.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 3c1bfa1..1028a06 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -587,8 +587,6 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)
 	rpc_ntop((struct sockaddr *)&clp->cl_addr, addr_str,
 		 sizeof(addr_str));
 
-	nfsd4_cb_layout_fail(ls);
-
 	printk(KERN_WARNING
 		"nfsd: client %s failed to respond to layout recall. "
 		"  Fencing..\n", addr_str);
--
1.9.1
-
- 19 Mar 2015, 1 commit
-
-
By Tom Van Braeckel
The misc subsystem (which is used for /dev/fuse) initializes private_data to point to the misc device when a driver has registered a custom open file operation, and initializes it to NULL when a custom open file operation has *not* been provided.

This subtle quirk is confusing, to the point where kernel code registers *empty* file open operations to have private_data point to the misc device structure. And it leads to bugs, where the addition or removal of a custom open file operation surprisingly changes the initial contents of a file's private_data structure.

So to simplify things in the misc subsystem, a patch [1] has been proposed to *always* set the private_data to point to the misc device, instead of only doing this when a custom open file operation has been registered. But before this patch can be applied, we need to modify drivers that assume a misc device file's private_data is initialized to NULL because they didn't register a custom open file operation, so that they no longer rely on this assumption.

FUSE uses private_data to store the fuse_conn and errors out if this is not initialized to NULL at mount time. Hence, we now set a file's private_data to NULL explicitly, to be independent of whatever value the misc subsystem initializes it to by default.

[1] https://lkml.org/lkml/2014/12/4/939

Reported-by: Giedrius Statkevicius <giedriuswork@gmail.com>
Reported-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Tom Van Braeckel <tomvanbraeckel@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
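The shape of the change is small (a sketch; it assumes a dedicated open hook, here called fuse_dev_open(), wired into the /dev/fuse file operations):

	static int fuse_dev_open(struct inode *inode, struct file *file)
	{
		/*
		 * The fuse device file's private_data will hold the fuse_conn
		 * once mounted; start from NULL explicitly instead of relying
		 * on whatever the misc subsystem happens to put there.
		 */
		file->private_data = NULL;
		return 0;
	}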
-
- 18 Mar 2015, 8 commits
-
-
By hujianyang
After the introduction of multi-lower-layer support, users can mount a read-only partition as the leftmost lowerdir instead of using it as the upperdir, and a read-only upperdir may cause an error like

	overlayfs: failed to create directory ./workdir/work

during mount. This patch checks the s_flags of the upper fs and returns an error if it is a read-only partition. The check of upper_mnt->mnt_sb->s_flags can be removed now.

This patch also removes the /* FIXME: workdir is not needed for a R/O mount */ comment from ovl_fill_super() because:

1) for the upper-fs r/o case: setting a r/o partition as upper is now prevented, so there is no need to care about workdir here.
2) for the "mount overlay -o ro" with a r/w upper fs case: users can remount overlayfs r/w, so workdir should not be omitted.

Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
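Roughly what the new check amounts to in ovl_fill_super() (a sketch; the error label and message wording are illustrative):

	/* a read-only filesystem cannot serve as upper; use it as a lower layer */
	if (upperpath.mnt->mnt_sb->s_flags & MS_RDONLY) {
		pr_err("overlayfs: upper fs is r/o, try multi-lower layers mount\n");
		err = -EINVAL;
		goto out_put_upperpath;
	}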
-
By hujianyang
The recent multi-lower-layer mount support allows upperdir and workdir to be omitted, which in turn allows overlayfs to be mounted with only one lowerdir directory. That makes no sense and carries potential risk. This patch checks the total number of lower directories to prevent mounting overlayfs with only one directory. Also, an error message is added to indicate when the number of lower directories exceeds the OVL_MAX_STACK limit.

Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
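The two checks, sketched (helper and field names follow the ovl_fill_super() code described in the changelog and are assumptions here; exact messages are illustrative):

	err = -EINVAL;
	stacklen = ovl_split_lowerdirs(lowertmp);
	if (stacklen > OVL_MAX_STACK) {
		pr_err("overlayfs: too many lower directories, limit is %d\n",
		       OVL_MAX_STACK);
		goto out_free_lowertmp;
	} else if (!ufs->config.upperdir && stacklen == 1) {
		/* with no upper layer, a single lower layer is pointless */
		pr_err("overlayfs: at least 2 lowerdir are needed while upperdir nonexistent\n");
		goto out_free_lowertmp;
	}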
-
By hujianyang
Overlayfs should print an error message if an incorrect mount option is caught, like other filesystems do. After this patch, improper option input is clearly reported.

Reported-by: Fabian Sturm <fabian.sturm@aduu.de>
Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
-
By Josef Bacik
We are keeping track of how many extents we need to reserve properly, based on the amount we want to write, but we were still incrementing outstanding_extents if we wrote less than what we requested. This isn't quite right, since we will be limited to our max extent size. So instead let's do something horrible! Keep track of how many outstanding_extents we reserved, and decrement each time we allocate an extent. If we use our entire reserve, make sure to jack up outstanding_extents on the inode so the accounting works out properly. Thanks,

Reported-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
-
By Josef Bacik
I introduced a regression wrt outstanding_extents accounting. These are tricky areas that aren't easily covered by xfstests, as we could change MAX_EXTENT_SIZE at any time. So add sanity tests to cover the various tricky conditions, in order to make sure we don't introduce regressions in the future. Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
-
By Josef Bacik
If we fail during our sanity tests we could get NULL derefs, because we unload the module before the dummy extent buffers are freed via RCU. So check for this case and just free the things directly. Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
-
By Josef Bacik
My fix "Btrfs: fix merge delalloc logic" only fixed half of the problems; it didn't fix the case where we have two large extents on either side and then join them together with a new small extent. We need to instead keep track of how many extents we have accounted for with each side of the new extent, and then see how many extents we need for the new large extent. If they match, then we know we need to keep our reservation; otherwise we need to drop our reservation. This shows up with a case like this:

	[BTRFS_MAX_EXTENT_SIZE+4K][4K HOLE][BTRFS_MAX_EXTENT_SIZE+4K]

Previously the logic would have said that the number of extents required for the new size (3) is larger than the number of extents required for the largest side (2), therefore we need to keep our reservation. But this isn't the case, since both sides require a reservation of 2, which leads to 4 for the whole range currently reserved, but we only need 3, so we need to drop one of the reservations. The same problem existed for splits: we'd think we only need 3 extents when creating the hole, but in reality we need 4. Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
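To make the arithmetic concrete, a worked sketch (assuming BTRFS_MAX_EXTENT_SIZE of 128M, its value in this era; the rounding-up division mirrors what the accounting does):

	#define SZ_4K			4096ULL
	#define BTRFS_MAX_EXTENT_SIZE	(128ULL * 1024 * 1024)

	/* outstanding extents needed for a contiguous range of 'size' bytes */
	static unsigned long long num_extents(unsigned long long size)
	{
		return (size + BTRFS_MAX_EXTENT_SIZE - 1) / BTRFS_MAX_EXTENT_SIZE;
	}

	/*
	 * For [MAX+4K][4K HOLE][MAX+4K]:
	 *   per side: num_extents(BTRFS_MAX_EXTENT_SIZE + SZ_4K) == 2, so 4
	 *             extents are reserved in total before the hole is filled;
	 *   merged:   num_extents(2 * BTRFS_MAX_EXTENT_SIZE + 3 * SZ_4K) == 3.
	 * One of the four reservations must therefore be dropped; comparing
	 * the merged count (3) only against the larger side (2) would have
	 * wrongly kept all four.
	 */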
-
By Kirill A. Shutemov
As pointed out by a recent post[1] on exploiting DRAM physical imperfections, /proc/PID/pagemap exposes sensitive information which can be used to mount attacks. This disallows anybody without CAP_SYS_ADMIN from reading the pagemap.

[1] http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html

[ Eventually we might want to do something more finegrained, but for now this is the simple model. - Linus ]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Seaborn <mseaborn@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
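The fix boils down to an open-time capability gate (a sketch of the added open handler; wiring it into the pagemap file_operations is omitted):

	static int pagemap_open(struct inode *inode, struct file *file)
	{
		/* do not disclose physical addresses to unprivileged
		   userspace (closes a rowhammer attack vector) */
		if (!capable(CAP_SYS_ADMIN))
			return -EPERM;
		return 0;
	}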
-
- 17 Mar 2015, 2 commits
-
-
By Josef Bacik
Writing the block group cache will modify the extent tree quite a bit, because it truncates the old space cache and pre-allocates new stuff. To try and cut down on the churn, let's do the setup dance first; then later on hopefully we can avoid looping with newly dirtied roots. Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
-
By NeilBrown
Kernfs supports two styles of read: direct_read and seqfile_read. The latter supports 'poll' correctly thanks to the update of '->event' in kernfs_seq_show. The former does not, as '->event' is never updated on a read. So add an appropriate update in kernfs_file_direct_read(). This was noticed because some 'md' sysfs attributes were recently changed to use direct reads.

Reported-by: Prakash Punnoor <prakash@punnoor.de>
Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Fixes: 750f199e
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
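The fix itself is essentially one line in kernfs_file_direct_read(), mirroring what kernfs_seq_show() already does (a sketch; the surrounding locking is elided):

	/* record the current event counter so a later poll() does not
	 * keep reporting this attribute as changed after it was read */
	of->event = atomic_read(&of->kn->attr.open->event);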
-
- 14 Mar 2015, 9 commits
-
-
By Jeff Layton
It's possible that "fl" won't point at a valid lock at this point, so use "victim" instead, which is either a valid lock or NULL.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
-
By Josef Bacik
Dave could hit this assert consistently when running btrfs/078. This is because when we update the block groups we could truncate the free space, which would try to delete the csums for that range and dirty the csum root. For this to happen we have to have already written out the csum root, so it's kind of hard to hit this case. This patch fixes this by changing the logic to only write the dirty block groups if the dirty_cowonly_roots list is empty. This will get us the same effect as before, since we add the extent root last, and will cover the case that we dirty some other root again but not the extent root. Thanks,

Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
By Josef Bacik
Direct IO can easily pass in a buffer that is greater than BTRFS_MAX_EXTENT_SIZE, so take this into account when reserving extents in the delalloc reservation code. Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
By Josef Bacik
My patch to properly count outstanding extents wrt MAX_EXTENT_SIZE introduced a regression when re-dirtying already dirty areas. We have logic in split to make sure we are taking the largest space into account, but didn't have it for merge, so it was sometimes making us think we were turning a tiny extent into a huge extent, when in reality we already had a huge extent and needed to use the other side in our logic. This fixes the regression that was reported by a user on the list. Thanks,

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
By Liu Bo
Case (oper1->seq > oper2->seq) should differ from case (oper1->seq < oper2->seq).

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
By Liu Bo
This problem is uncovered by a test case: http://patchwork.ozlabs.org/patch/244297

fsync() can report success when it actually doesn't. When we have several threads running fsync() at the same time, and in one fsync() we get a transaction abortion due to some problem (in the test case it's disk failures), other fsync()s may return successfully, which makes userspace programs think that data is now safely flushed to disk.

This happens because, after those fsync()s fail btrfs_sync_log() due to disk failures, they get to try btrfs_commit_transaction(), where they find that there is already a transaction being committed, so they just call wait_for_commit() and return. Note that we actually check "trans->aborted" in btrfs_end_transaction, but it's likely that the error has not yet been propagated at that point; only after wait_for_commit() are we sure whether the transaction was committed successfully.

This adds the necessary check, and the test now passes.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
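The added check, sketched against the join-an-existing-commit path in btrfs_commit_transaction() (placement approximate; field names per the changelog):

	wait_for_commit(root, cur_trans);
	/* the commit we waited on may itself have aborted; do not let
	 * this fsync() report success in that case */
	if (unlikely(cur_trans->aborted))
		ret = cur_trans->aborted;
	btrfs_put_transaction(cur_trans);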
-
By Fabian Frederick
This patch fixes a mips compilation warning:

	fs/btrfs/disk-io.c: In function 'btrfs_check_super_valid':
	fs/btrfs/disk-io.c:3927:21: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat]

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Chris Mason <clm@fb.com>
-
By Christoph Hellwig
Most callers in the kernel want to perform synchronous file I/O, but still have to bloat the stack with a full struct kiocb. Split out the parts needed in filesystem code from those in the aio code, and only allocate those needed to pass down arguments on the stack. The aio code embeds the generic iocb in the one it allocates and can easily get back to it by using container_of.

Also add a ->ki_complete method to struct kiocb; this is used to call into the aio code and thus removes the dependency on aio for filesystems implementing asynchronous operations. It will also allow other callers to substitute their own completion callback.

We also add a new ->ki_flags field to work around the nasty layering violation recently introduced in commit 5e33f6 ("usb: gadget: ffs: add eventfd notification about ffs events").

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
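Approximately the slimmed-down container that results (a sketch of the generic side; the aio code embeds this in its own, larger iocb and recovers it via container_of):

	struct kiocb {
		struct file	*ki_filp;
		loff_t		ki_pos;
		/* completion hook: aio (or any other caller) plugs in here,
		 * so filesystems no longer depend on aio internals */
		void		(*ki_complete)(struct kiocb *iocb, long ret, long ret2);
		void		*private;
		int		ki_flags;
	};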
-
By Christoph Hellwig
The AIO interface is fairly complex because it tries to allow filesystems to always work async and then wake up a synchronous caller through aio_complete. It turns out that basically no one was doing this, to avoid the complexity and context switches, and we've already fixed up the remaining users and can now get rid of this case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-