Commits · a8f3550cd228b6edc5d17fce1a9af8cc7004f185 · openeuler / Kernel

07 May, 2014 11 commits

bury __generic_file_aio_write() · a8f3550c

Committed by Al Viro on Apr 03, 2014

all users converted to __generic_file_write_iter() now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

blkdev_aio_write() - turn into blkdev_write_iter() · 1456c0a8

Committed by Al Viro on Apr 03, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

write_iter variants of {__,}generic_file_aio_write() · 8174202b

Committed by Al Viro on Apr 03, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new methods: ->read_iter() and ->write_iter() · 293bc982

Committed by Al Viro on Feb 11, 2014

Beginning to introduce those.  Just the callers for now, and it's
clumsier than it'll eventually become; once we finish converting
aio_read and aio_write instances, the things will get nicer.

For now, these guys are in parallel to ->aio_read() and ->aio_write();
they take iocb and iov_iter, with everything in iov_iter already
validated.  File offset is passed in iocb->ki_pos, iov/nr_segs -
in iov_iter.

Main concerns in that series are stack footprint and ability to
split the damn thing cleanly.

[fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

replace checking for ->read/->aio_read presence with check in ->f_mode · 7f7f25e8

Committed by Al Viro on Feb 11, 2014

Since we are about to introduce new methods (read_iter/write_iter), the
tests in a bunch of places would have to grow inconveniently. Check
once (at open() time) and store results in ->f_mode as FMODE_CAN_READ
and FMODE_CAN_WRITE resp. It might end up being a temporary measure -
once everything switches from ->aio_{read,write} to ->{read,write}_iter
it might make sense to return to open-coded checks. We'll see...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
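
The "check once at open() time" idea in this commit message can be illustrated with a small userspace sketch. Everything prefixed `toy_`/`TOY_` is hypothetical and invented for illustration: the flag values are not the kernel's, and the real check may take more into account (for example, how the file was opened) than this model does.

```c
#include <stddef.h>

/* Hypothetical flag values for this sketch; the kernel's own
 * FMODE_CAN_READ / FMODE_CAN_WRITE constants differ. */
#define TOY_FMODE_CAN_READ  0x1u
#define TOY_FMODE_CAN_WRITE 0x2u

struct toy_file;

/* Toy method table: any entry may be NULL if unsupported. */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t len);
    long (*aio_read)(struct toy_file *f, char *buf, size_t len);
    long (*write)(struct toy_file *f, const char *buf, size_t len);
    long (*aio_write)(struct toy_file *f, const char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_operations *f_op;
    unsigned int f_mode;
};

/* Example callback so that a read-only method table can be built. */
static long toy_null_read(struct toy_file *f, char *buf, size_t len)
{
    (void)f; (void)buf; (void)len;
    return 0;
}

/* Done once at open() time: record which directions have a method. */
static void toy_record_capabilities(struct toy_file *f)
{
    if (f->f_op->read || f->f_op->aio_read)
        f->f_mode |= TOY_FMODE_CAN_READ;
    if (f->f_op->write || f->f_op->aio_write)
        f->f_mode |= TOY_FMODE_CAN_WRITE;
}

/* Later callers test one bit instead of probing each method pointer. */
static int toy_can_read(const struct toy_file *f)
{
    return (f->f_mode & TOY_FMODE_CAN_READ) != 0;
}

static int toy_can_write(const struct toy_file *f)
{
    return (f->f_mode & TOY_FMODE_CAN_WRITE) != 0;
}
```

With this shape, adding a third method per direction only changes `toy_record_capabilities()`; the call sites that ask "can this file be read?" stay as they are.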

iov_iter_truncate() · 0c949334

Committed by Al Viro on Mar 22, 2014

Now It Can Be Done(tm) - we don't need to do iov_shorten() in
generic_file_direct_write() anymore, now that all ->direct_IO()
instances are converted to proper iov_iter methods and honour
iter->count and iter->iov_offset properly.

Get rid of count/ocount arguments of generic_file_direct_write(),
while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
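
As a rough illustration of what truncating an iterator means, here is a minimal userspace model. `struct toy_iter` and `toy_iter_truncate()` are hypothetical stand-ins that model only the remaining byte count, not the kernel's struct iov_iter.

```c
#include <stddef.h>

/* Simplified stand-in for an I/O iterator; only the field that
 * matters for truncation is modelled. */
struct toy_iter {
    size_t count;   /* bytes remaining in the iterator */
};

/* Cap the iterator at `count` bytes. It can only shrink the
 * iterator, never grow it. */
static void toy_iter_truncate(struct toy_iter *i, size_t count)
{
    if (i->count > count)
        i->count = count;
}
```

Because the segment array itself is left untouched, a consumer that honours the remaining count sees a shorter transfer without the caller having to shorten the iovec.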

new helper: generic_file_read_iter() · ed978a81

Committed by Al Viro on Mar 05, 2014

iov_iter-using variant of generic_file_aio_read(). Some callers
converted. Note that it's still not quite there for use as ->read_iter() -
we depend on having zero iter->iov_offset in O_DIRECT case. Fortunately,
that's true for all converted callers (and for generic_file_aio_read() itself).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch {__,}blockdev_direct_IO() to iov_iter · 31b14039

Committed by Al Viro on Mar 05, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

pass iov_iter to ->direct_IO() · d8d3d94b

Committed by Al Viro on Mar 04, 2014

unmodified, for now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill generic_segment_checks() · cb66a7a1

Committed by Al Viro on Mar 04, 2014

all callers of ->aio_read() and ->aio_write() have iov/nr_segs already
checked - generic_segment_checks() done after that is just an odd way
to spell iov_length().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
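
For readers unfamiliar with what "iov_length()" computes, the following userspace sketch sums the segment lengths of an iovec array. `toy_iov_length()` is a hypothetical name; it mirrors the idea only and assumes the segments were already validated, as the message says.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Total number of bytes described by an array of iovec segments.
 * Assumes the segments were validated by the caller beforehand. */
static size_t toy_iov_length(const struct iovec *iov, unsigned long nr_segs)
{
    size_t total = 0;

    for (unsigned long seg = 0; seg < nr_segs; seg++)
        total += iov[seg].iov_len;
    return total;
}
```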

generic_file_direct_write(): switch to iov_iter · f8579f86

Committed by Al Viro on Mar 03, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

24 Apr, 2014 1 commit

locks: rename FL_FILE_PVT and IS_FILE_PVT to use "*_OFDLCK" instead · cff2fce5

Committed by Jeff Layton on Apr 22, 2014

File-private locks have been re-christened as "open file description"
locks. Finish the symbol name cleanup in the internal implementation.
Signed-off-by: Jeff Layton <jlayton@redhat.com>

04 Apr, 2014 1 commit

mm + fs: store shadow entries in page cache · 91b0abe3

Committed by Johannes Weiner on Apr 03, 2014

Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page.  As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently.  At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.

Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate.  Reclaim will check
for this flag before installing shadow pages.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

02 Apr, 2014 11 commits

kill generic_file_buffered_write() · ccad2365

Committed by Al Viro on Feb 11, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

export generic_perform_write(), start getting rid of generic_file_buffer_write() · 3b93f911

Committed by Al Viro on Feb 11, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

generic_file_direct_write(): get rid of ppos argument · 5cb6c6c7

Committed by Al Viro on Feb 11, 2014

always equal to &iocb->ki_pos.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill the 5th argument of generic_file_buffered_write() · fcacafd2

Committed by Al Viro on Feb 09, 2014

same story - it's &iocb->ki_pos in all cases
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill the 4th argument of __generic_file_aio_write() · 41fc56d5

Committed by Al Viro on Feb 09, 2014

It's always equal to &iocb->ki_pos, where iocb is the value of the 1st
argument.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

introduce copy_page_to_iter, kill loop over iovec in generic_file_aio_read() · 6e58e79d

Committed by Al Viro on Feb 03, 2014

generic_file_aio_read() was looping over the target iovec, with loop over
(source) pages nested inside that.  Just set an iov_iter up and pass *that*
to do_generic_file_aio_read().  With copy_page_to_iter() doing all work
of mapping and copying a page to iovec and advancing iov_iter.

Switch shmem_file_aio_read() to the same and kill file_read_actor(), while
we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
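
The "copy into the iterator and advance it" pattern described above can be sketched in userspace as follows. `struct toy_iter` and `toy_copy_to_iter()` are hypothetical simplifications: the source is a plain buffer rather than a mapped page, and the sketch assumes `count` never exceeds the space left in the segments.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Toy iterator over an iovec array. Assumption: `count` is at most
 * the number of bytes still available in the remaining segments. */
struct toy_iter {
    const struct iovec *iov;  /* current segment */
    unsigned long nr_segs;    /* segments left, including current */
    size_t iov_offset;        /* bytes already used in current segment */
    size_t count;             /* total bytes the iterator may still take */
};

/* Copy up to `bytes` from `src` into the iterator and advance it.
 * Returns the number of bytes actually copied. */
static size_t toy_copy_to_iter(const char *src, size_t bytes,
                               struct toy_iter *i)
{
    size_t copied = 0;

    if (bytes > i->count)
        bytes = i->count;

    while (copied < bytes) {
        size_t room = i->iov->iov_len - i->iov_offset;
        size_t n = bytes - copied < room ? bytes - copied : room;

        memcpy((char *)i->iov->iov_base + i->iov_offset, src + copied, n);
        copied += n;
        i->iov_offset += n;
        i->count -= n;

        /* Current segment is full: move on to the next one. */
        if (i->iov_offset == i->iov->iov_len) {
            i->iov++;
            i->nr_segs--;
            i->iov_offset = 0;
        }
    }
    return copied;
}
```

The caller then loops only over its source pages; all bookkeeping about which destination segment is current lives inside the iterator.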

iov_iter: Move iov_iter to uio.h · 92236878

Committed by Kent Overstreet on Nov 27, 2013

Signed-off-by: Kent Overstreet <kmo@daterainc.com>

switch ->is_partially_uptodate() to saner arguments · c186afb4

Committed by Al Viro on Feb 02, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new helper: readlink_copy() · 5d826c84

Committed by Al Viro on Mar 14, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

mark struct file that had write access grabbed by open() · 83f936c7

Committed by Al Viro on Mar 14, 2014

new flag in ->f_mode - FMODE_WRITER. Set by do_dentry_open() in case
when it has grabbed write access, checked by __fput() to decide whether
it wants to drop the sucker. Allows to stop bothering with mnt_clone_write()
in alloc_file(), along with fewer special_file() checks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

get rid of DEBUG_WRITECOUNT · 4597e695

Committed by Al Viro on Mar 14, 2014

it only makes control flow in __fput() and friends more convoluted.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

01 Apr, 2014 1 commit

vfs: add renameat2 syscall · 520c8b16

Committed by Miklos Szeredi on Apr 01, 2014

Add new renameat2 syscall, which is the same as renameat with an added
flags argument.

Pass flags to vfs_rename() and to i_op->rename() as well.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>

31 Mar, 2014 5 commits

locks: fix locks_mandatory_locked to respect file-private locks · d7a06983

Committed by Jeff Layton on Mar 10, 2014

As Trond pointed out, you can currently deadlock yourself by setting a
file-private lock on a file that requires mandatory locking and then
trying to do I/O on it.

Avoid this problem by plumbing some knowledge of file-private locks into
the mandatory locking code. In order to do this, we must pass down
information about the struct file that's being used to
locks_verify_locked.
Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>

locks: pass the cmd value to fcntl_getlk/getlk64 · c1e62b8f

Committed by Jeff Layton on Feb 03, 2014

Once we introduce file private locks, we'll need to know what cmd value
was used, as that affects the ownership and whether a conflict would
arise.
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT" · c918d42a

Committed by Jeff Layton on Feb 03, 2014

In a later patch, we'll be adding a new type of lock that's owned by
the struct file instead of the files_struct. Those sorts of locks
will be flagged with a new FL_FILE_PVT flag.

Report these types of locks as "FLPVT" in /proc/locks to distinguish
them from "classic" POSIX locks.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: rename locks_remove_flock to locks_remove_file · 78ed8a13

Committed by Jeff Layton on Feb 03, 2014

This function currently removes leases in addition to flock locks and in
a later patch we'll have it deal with file-private locks too. Rename it
to locks_remove_file to indicate that it removes locks that are
associated with a particular struct file, and not just flock locks.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: close potential race between setlease and open · 24cbe784

Committed by Jeff Layton on Feb 03, 2014

As Al Viro points out, there is an unlikely, but possible race between
opening a file and setting a lease on it. generic_add_lease is done with
the i_lock held, but the inode->i_flock check in break_lease is
lockless. It's possible for another task doing an open to do the entire
pathwalk and call break_lease between the point where generic_add_lease
checks for a conflicting open and adds the lease to the list. If this
occurs, we can end up with a lease set on the file with a conflicting
open.

To guard against that, check again for a conflicting open after adding
the lease to the i_flock list. If the above race occurs, then we can
simply unwind the lease setting and return -EAGAIN.

Because we take dentry references and acquire write access on the file
before calling break_lease, we know that if the i_flock list is empty
when the open caller goes to check it then the necessary refcounts have
already been incremented. Thus the additional check for a conflicting
open will see that there is one and the setlease call will fail.

Cc: Bruce Fields <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@fieldses.org>

25 Mar, 2014 1 commit

ext4: atomically set inode->i_flags in ext4_set_inode_flags() · 5f16f322

Committed by Theodore Ts'o on Mar 24, 2014

Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.
Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
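
The difference between "clear, then set" and a single compare-and-swap update can be sketched with C11 atomics in userspace. The `TOY_` flag values and `toy_set_flags()` are hypothetical; the kernel's cmpxchg() and flag constants are not reproduced here.

```c
#include <stdatomic.h>

/* Hypothetical flag bits for this sketch. */
#define TOY_S_APPEND    0x1u
#define TOY_S_IMMUTABLE 0x2u
#define TOY_MANAGED     (TOY_S_APPEND | TOY_S_IMMUTABLE)

/* Replace the managed bits with `new_fl` in one atomic step, leaving
 * all other bits untouched. Observers see either the old word or the
 * new word, never an intermediate state with the bits cleared. */
static void toy_set_flags(_Atomic unsigned int *i_flags, unsigned int new_fl)
{
    unsigned int old = atomic_load(i_flags);
    unsigned int want;

    do {
        want = (old & ~TOY_MANAGED) | (new_fl & TOY_MANAGED);
        /* On failure, `old` is refreshed with the current value
         * and the loop recomputes `want`. */
    } while (!atomic_compare_exchange_weak(i_flags, &old, want));
}
```

A file that is immutable both before and after the update therefore never appears mutable to a concurrent reader of the flag word.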

10 Mar, 2014 2 commits

get rid of fget_light() · bd2a31d5

Committed by Al Viro on Mar 04, 2014

instead of returning the flags by reference, we can just have the
low-level primitive return those in lower bits of unsigned long,
with struct file * derived from the rest.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
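
Returning flags in the low bits of a pointer-sized integer relies on the pointed-to object being aligned, so that those bits are otherwise zero. A minimal userspace sketch, with hypothetical `toy_` names and `uintptr_t` used in place of unsigned long for portability:

```c
#include <stdint.h>

#define TOY_FDPUT_FPUT 1u  /* hypothetical flag: caller must drop a reference */
#define TOY_FLAG_MASK  3u  /* two low bits are reserved for flags */

/* Assumption: the `long` member gives this struct at least 4-byte
 * alignment, so the two low address bits are always zero. */
struct toy_file {
    long placeholder;
};

struct toy_fd {
    struct toy_file *file;
    unsigned int flags;
};

/* Low-level primitive: one word carrying both pointer and flags. */
static uintptr_t toy_pack(struct toy_file *file, unsigned int flags)
{
    return (uintptr_t)file | (flags & TOY_FLAG_MASK);
}

/* Split the word back into the pointer and the flag bits. */
static struct toy_fd toy_unpack(uintptr_t v)
{
    struct toy_fd fd;

    fd.file = (struct toy_file *)(v & ~(uintptr_t)TOY_FLAG_MASK);
    fd.flags = (unsigned int)(v & TOY_FLAG_MASK);
    return fd;
}
```

Compared with returning the flags through a pointer argument, the single return value can stay in a register.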

vfs: atomic f_pos accesses as per POSIX · 9c225f26

Committed by Linus Torvalds on Mar 03, 2014

Our write() system call has always been atomic in the sense that you get
the expected thread-safe contiguous write, but we haven't actually
guaranteed that concurrent writes are serialized wrt f_pos accesses, so
threads (or processes) that share a file descriptor and use "write()"
concurrently would quite likely overwrite each others data.

This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says:

 "2.9.7 Thread Interactions with Regular File Operations

  All of the following functions shall be atomic with respect to each
  other in the effects specified in POSIX.1-2008 when they operate on
  regular files or symbolic links: [...]"

and one of the effects is the file position update.

This unprotected file position behavior is not new behavior, and nobody
has ever cared.  Until now.  Yongzhi Pan reported unexpected behavior to
Michael Kerrisk that was due to this.

This resolves the issue with a f_pos-specific lock that is taken by
read/write/lseek on file descriptors that may be shared across threads
or processes.
Reported-by: Yongzhi Pan <panyongzhi@gmail.com>
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

08 Mar, 2014 1 commit

fs: move i_readcount · d984ea60

Committed by Mimi Zohar on Dec 11, 2013

On a 64-bit system, a hole exists in the 'inode' structure after
i_writecount.  This patch moves i_readcount to fill this hole.
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>

10 Feb, 2014 2 commits

direct-io: add flag to allow aio writes beyond i_size · 60392573

Committed by Christoph Hellwig on Feb 10, 2014

Some filesystems can handle direct I/O writes beyond i_size safely,
so allow them to opt into receiving them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

fix O_SYNC|O_APPEND syncing the wrong range on write() · d311d79d

Committed by Al Viro on Feb 09, 2014

It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
when sync_page_range() had been introduced; generic_file_write{,v}() correctly
synced
	pos_after_write - written .. pos_after_write - 1
but generic_file_aio_write() synced
	pos_before_write .. pos_before_write + written - 1
instead.  Which is not the same thing with O_APPEND, obviously.
A couple of years later correct variant had been killed off when
everything switched to use of generic_file_aio_write().

All users of generic_file_aio_write() are affected, and the same bug
has been copied into other instances of ->aio_write().

The fix is trivial; the only subtle point is that generic_write_sync()
ought to be inlined to avoid calculations useless for the majority of
calls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
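
The two range formulas in this message can be compared with a small worked example. The helper names and `struct toy_range` below are hypothetical. Take a 1000-byte file opened with O_APPEND, and a 100-byte write issued while the file position is 0: the data lands at 1000..1099, so the position after the write is 1100.

```c
/* Inclusive byte range to be synced. */
struct toy_range {
    long long start;
    long long end;
};

/* Range derived from the position before the write (the buggy form). */
static struct toy_range toy_range_from_pos_before(long long pos_before,
                                                  long long written)
{
    return (struct toy_range){ pos_before, pos_before + written - 1 };
}

/* Range derived from the position after the write (the correct form). */
static struct toy_range toy_range_from_pos_after(long long pos_after,
                                                 long long written)
{
    return (struct toy_range){ pos_after - written, pos_after - 1 };
}
```

For the example numbers, the first helper yields 0..99 and the second yields 1000..1099. Without O_APPEND the two coincide, which is consistent with the bug going unnoticed for so long.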

06 Feb, 2014 1 commit

execve: use 'struct filename *' for executable name passing · c4ad8f98

Committed by Linus Torvalds on Feb 05, 2014

This changes 'do_execve()' to get the executable name as a 'struct
filename', and to free it when it is done.  This is what the normal
users want, and it simplifies and streamlines their error handling.

The controlled lifetime of the executable name also fixes a
use-after-free problem with the trace_sched_process_exec tracepoint: the
lifetime of the passed-in string for kernel users was not at all
obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
the pathname allocation lifetime with the execve() having finished,
which in turn meant that the trace point that happened after
mm_release() of the old process VM ended up using already free'd memory.

To solve the kernel string lifetime issue, this simply introduces
"getname_kernel()" that works like the normal user-space getname()
function, except with the source coming from kernel memory.

As Oleg points out, this also means that we could drop the tcomm[] array
from 'struct linux_binprm', since the pathname lifetime now covers
setup_new_exec().  That would be a separate cleanup.
Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

26 Jan, 2014 1 commit

fs: add a set_acl inode operation · 893d46e4

Committed by Christoph Hellwig on Dec 20, 2013

This will allow moving all the Posix ACL handling into the VFS and clean
up tons of cruft in the filesystems.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

16 Nov, 2013 1 commit

consolidate simple ->d_delete() instances · b26d4cd3

Committed by Al Viro on Oct 25, 2013

Rename simple_delete_dentry() to always_delete_dentry() and export it.
Export simple_dentry_operations, while we are at it, and get rid of
their duplicates
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

09 Nov, 2013 1 commit

locks: break delegations on any attribute modification · 27ac0ffe

Committed by J. Bruce Fields on Sep 20, 2011

NFSv4 uses leases to guarantee that clients can cache metadata as well
as data.

Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

openeuler / Kernel: last synced successfully more than 1 year ago