Commits · a800bad36619ce47ac0222004635448e6c91ff72 · openeuler / Kernel

22 Jul, 2014 1 commit

Committed by Miklos Szeredi on Jul 22, 2014

Default s_time_gran is 1, don't overwrite that if userspace didn't
explicitly specify one.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@vger.kernel.org> # v3.15+

a800bad3

14 Jul, 2014 2 commits

fuse: replace count*size kzalloc by kcalloc · f2b3455e

Committed by Fabian Frederick on Jun 23, 2014

kcalloc manages count*sizeof overflow.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

f2b3455e
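
The overflow check that kcalloc performs can be illustrated in user space. This is a minimal sketch, not the kernel implementation: `checked_calloc` is a hypothetical name, and standard `calloc` stands in for the kernel allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical user-space analogue of kcalloc(): refuse any
 * count * size product that would wrap around SIZE_MAX instead of
 * silently allocating a buffer that is too small.
 */
static void *checked_calloc(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;            /* count * size would overflow */
    return calloc(count, size); /* zero-filled, like kzalloc/kcalloc */
}
```

An open-coded `count * size` passed to a zeroing allocator has no such guard, which is why the array form is preferred.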

fuse: release temporary page if fuse_writepage_locked() failed · 27f1b363

Committed by Maxim Patlasov on Jul 10, 2014

tmp_page to be freed if fuse_write_file_get() returns NULL.
Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

27f1b363

10 Jul, 2014 1 commit

fuse: restructure ->rename2() · 4237ba43

Committed by Miklos Szeredi on Jul 10, 2014

Make ->rename2() universal, i.e. able to handle zero flags.  This is to
make future change of the API easier.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

4237ba43

07 Jul, 2014 5 commits

fuse: avoid scheduling while atomic · c55a01d3

Committed by Miklos Szeredi on Jul 07, 2014

As reported by Richard Sharpe, an attempt to use fuse_notify_inval_entry()
triggers complaints about scheduling while atomic:

  BUG: scheduling while atomic: fuse.hf/13976/0x10000001

This happens because fuse_notify_inval_entry() attempts to allocate memory
with GFP_KERNEL, holding "struct fuse_copy_state" mapped by kmap_atomic().

Introduced by commit 58bda1da "fuse/dev: use atomic maps"

Fix by moving the map/unmap to just cover the actual memcpy operation.

Original patch from Maxim Patlasov <mpatlasov@parallels.com>
Reported-by: Richard Sharpe <realrichardsharpe@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@vger.kernel.org> # v3.15+

c55a01d3

fuse: handle large user and group ID · 233a01fa

Committed by Miklos Szeredi on Jul 07, 2014

If the number in "user_id=N" or "group_id=N" mount options was larger than
INT_MAX then fuse returned EINVAL.

Fix this to handle all valid uid/gid values.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org

233a01fa
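
To illustrate the class of bug, here is a minimal user-space sketch of parsing an option value as an unsigned 32-bit quantity rather than a signed int. `parse_uint_option` is a hypothetical helper, not the function the patch adds, and it does not check whether the resulting ID is otherwise valid.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal option value as unsigned int.
 * Returns 0 on success, -EINVAL on malformed or out-of-range input.
 * Values above INT_MAX are accepted as long as they fit in UINT_MAX.
 */
static int parse_uint_option(const char *s, unsigned int *out)
{
    char *end;
    unsigned long v;

    if (s == NULL || *s < '0' || *s > '9')
        return -EINVAL;         /* reject empty input, signs, whitespace */

    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > UINT_MAX)
        return -EINVAL;         /* overflow or trailing garbage */

    *out = (unsigned int)v;
    return 0;
}
```

With this approach "user_id=3000000000" parses successfully, whereas a signed-int parser would reject it.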

fuse: inode: drop cast · 7b3d8bf7

Committed by Himangi Saraogi on Jun 26, 2014

This patch removes the cast on data of type void * as it is not needed.
The following Coccinelle semantic patch was used for making the change:

@r@
expression x;
void* e;
type T;
identifier f;
@@

(
  *((T *)e)
|
  ((T *)x)[...]
|
  ((T *)x)->f
|
- (T *)
  e
)
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

7b3d8bf7

fuse: ignore entry-timeout on LOOKUP_REVAL · 154210cc

Committed by Anand Avati on Jun 26, 2014

The following test case demonstrates the bug:

  sh# mount -t glusterfs localhost:meta-test /mnt/one

  sh# mount -t glusterfs localhost:meta-test /mnt/two

  sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; echo stuff > /mnt/one/file
  bash: /mnt/one/file: Stale file handle

  sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; sleep 1; echo stuff > /mnt/one/file

On the second open() on /mnt/one, FUSE would have used the old
nodeid (file handle) trying to re-open it. Gluster is returning
-ESTALE. The ESTALE propagates back to namei.c:filename_lookup()
where lookup is re-attempted with LOOKUP_REVAL. The right
behavior now would be for FUSE to ignore the entry-timeout and
do the up-call revalidation. Instead FUSE is ignoring
LOOKUP_REVAL, succeeding the revalidation (because entry-timeout
has not passed), and open() is again retried on the old file
handle and finally the ESTALE is going back to the application.

Fix: if revalidation is happening with LOOKUP_REVAL, then ignore
entry-timeout and always do the up-call.
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org

154210cc
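
The decision described in this fix can be sketched as a single predicate. This is an illustration only: `SKETCH_LOOKUP_REVAL` and `need_upcall` are hypothetical names with a made-up flag value, and times are plain 64-bit tick counts rather than kernel jiffies.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LOOKUP_REVAL 0x0020u /* hypothetical flag value */

/*
 * Decide whether a cached directory entry must be revalidated by
 * asking the userspace server (the "up-call").
 */
static bool need_upcall(uint64_t now, uint64_t entry_expiry,
                        unsigned int flags)
{
    /* Forced revalidation: ignore the entry timeout entirely. */
    if (flags & SKETCH_LOOKUP_REVAL)
        return true;

    /* Otherwise ask the server only once the entry has expired. */
    return (int64_t)(entry_expiry - now) < 0;
}
```

Before the fix, only the second condition existed, so a retry with the forced-revalidation flag still trusted the unexpired cache entry.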

fuse: timeout comparison fix · 126b9d43

Committed by Miklos Szeredi on Jul 07, 2014

As suggested by checkpatch.pl, use time_before64() instead of direct
comparison of jiffies64 values.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@vger.kernel.org>

126b9d43
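
The reason a helper is preferred over a direct `<` comparison is counter wraparound. A minimal user-space sketch of the idea follows; `before64` is a hypothetical stand-in for the kernel macro and assumes the usual two's-complement conversion from unsigned to signed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wraparound-safe "a is earlier than b" for a free-running 64-bit
 * counter: compare the signed difference rather than the raw values.
 */
static bool before64(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) < 0;
}
```

For a counter value just below the wrap point and one just after it, the raw comparison gives the wrong answer while the signed difference still orders them correctly.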

05 Jun, 2014 2 commits

mm: non-atomically mark page accessed during page cache allocation where possible · 2457aec6

Committed by Mel Gorman on Jun 04, 2014

aops->write_begin may allocate a new page and make it visible only to have
mark_page_accessed called almost immediately after.  Once the page is
visible the atomic operations are necessary which is noticeable overhead
when writing to an in-memory filesystem like tmpfs but should also be
noticeable with fast storage.  The objective of the patch is to initialise
the accessed information with non-atomic operations before the page is
visible.

The bulk of filesystems directly or indirectly use
grab_cache_page_write_begin or find_or_create_page for the initial
allocation of a page cache page.  This patch adds an init_page_accessed()
helper which behaves like the first call to mark_page_accessed() but may
be called before the page is visible and can be done non-atomically.

The primary APIs of concern in this case are the following and are used
by most filesystems.

	find_get_page
	find_lock_page
	find_or_create_page
	grab_cache_page_nowait
	grab_cache_page_write_begin

All of them are very similar in detail, so the patch creates a core helper
pagecache_get_page() which takes a flags parameter that affects its
behavior such as whether the page should be marked accessed or not.  The
old API is preserved but is basically a thin wrapper around this core
function.

Each of the filesystems is then updated to avoid calling
mark_page_accessed when it is known that the VM interfaces have already
done the job.  There is a slight snag in that the timing of the
mark_page_accessed() has now changed so in rare cases it's possible a page
gets to the end of the LRU as PageReferenced whereas previously it might
have been repromoted.  This is expected to be rare but it's worth the
filesystem people thinking about it in case they see a problem with the
timing change.  It is also the case that some filesystems may be marking
pages accessed that previously did not but it makes sense that filesystems
have consistent behaviour in this regard.

The test case used to evaluate this is a simple dd of a large file done
multiple times with the file deleted on each iteration.  The size of the
file is 1/10th physical memory to avoid dirty page balancing.  In the
async case it will be possible that the workload completes without even
hitting the disk and will have variable results but highlight the impact
of mark_page_accessed for async IO.  The sync results are expected to be
more stable.  The exception is tmpfs where the normal case is for the "IO"
to not hit the disk.

The test machine was single socket and UMA to avoid any scheduling or NUMA
artifacts.  Throughput and wall times are presented for sync IO, only wall
times are shown for async as the granularity reported by dd and the
variability is unsuitable for comparison.  As async results were variable
due to writeback timings, I'm only reporting the maximum figures.  The sync
results were stable enough to make the mean and stddev uninteresting.

The performance results are reported based on a run with no profiling.
Profile data is based on a separate run with oprofile running.

async dd
                                    3.15.0-rc3            3.15.0-rc3
                                       vanilla           accessed-v2
ext3    Max      elapsed     13.9900 (  0.00%)     11.5900 ( 17.16%)
tmpfs   Max      elapsed      0.5100 (  0.00%)      0.4900 (  3.92%)
btrfs   Max      elapsed     12.8100 (  0.00%)     12.7800 (  0.23%)
ext4    Max      elapsed     18.6000 (  0.00%)     13.3400 ( 28.28%)
xfs     Max      elapsed     12.5600 (  0.00%)      2.0900 ( 83.36%)

The XFS figure is a bit strange as it managed to avoid a worst case by
sheer luck but the average figures looked reasonable.

        samples percentage
ext3       86107    0.9783  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext3       23833    0.2710  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext3        5036    0.0573  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
ext4       64566    0.8961  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
ext4        5322    0.0713  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
ext4        2869    0.0384  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs        62126    1.7675  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
xfs         1904    0.0554  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
xfs          103    0.0030  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
btrfs      10655    0.1338  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
btrfs       2020    0.0273  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
btrfs        587    0.0079  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed
tmpfs      59562    3.2628  vmlinux-3.15.0-rc4-vanilla        mark_page_accessed
tmpfs       1210    0.0696  vmlinux-3.15.0-rc4-accessed-v3r25 init_page_accessed
tmpfs         94    0.0054  vmlinux-3.15.0-rc4-accessed-v3r25 mark_page_accessed

[akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Prabhakar Lad <prabhakar.csengg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2457aec6

mm: page_alloc: convert hot/cold parameter and immediate callers to bool · b745bc85

Committed by Mel Gorman on Jun 04, 2014

cold is a bool, make it one.  Make the likely case the "if" part of the
block instead of the else as according to the optimisation manual this is
preferred.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b745bc85

02 Jun, 2014 1 commit

locks: ensure that fl_owner is always initialized properly in flock and lease codepaths · 130d1f95

Committed by Jeff Layton on May 09, 2014

Currently, the fl_owner isn't set for flock locks. Some filesystems use
byte-range locks to simulate flock locks and there is a common idiom in
those that does:

    fl->fl_owner = (fl_owner_t)filp;
    fl->fl_start = 0;
    fl->fl_end = OFFSET_MAX;

Since flock locks are generally "owned" by the open file description,
move this into the common flock lock setup code. The fl_start and fl_end
fields are already set appropriately, so remove the unneeded setting of
that in flock ops in those filesystems as well.

Finally, the lease code also sets the fl_owner as if they were owned by
the process and not the open file description. This is incorrect as
leases have the same ownership semantics as flock locks. Set them the
same way. The lease code doesn't actually use the fl_owner value for
anything, so this is more for consistency's sake than a bugfix.
Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (Staging portion)
Acked-by: J. Bruce Fields <bfields@fieldses.org>

130d1f95

07 May, 2014 13 commits

bio_vec-backed iov_iter · 62a8067a

Committed by Al Viro on Apr 04, 2014

New variant of iov_iter - ITER_BVEC in iter->type, backed with
bio_vec array instead of iovec one.  Primitives taught to deal
with such beasts, __swap_write() switched to using that kind
of iov_iter.

Note that bio_vec is just a <page, offset, length> triple - there's
nothing block-specific about it.  I've left the definition where it
was, but took it from under ifdef CONFIG_BLOCK.

Next target: ->splice_write()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

62a8067a

fuse: switch to ->write_iter() · 84c3d55c

Committed by Al Viro on Apr 03, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

84c3d55c

fuse_file_aio_read(): convert to ->read_iter() · 37c20f16

Committed by Al Viro on Apr 02, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

37c20f16

iov_iter_truncate() · 0c949334

Committed by Al Viro on Mar 22, 2014

Now It Can Be Done(tm) - we don't need to do iov_shorten() in
generic_file_direct_write() anymore, now that all ->direct_IO()
instances are converted to proper iov_iter methods and honour
iter->count and iter->iov_offset properly.

Get rid of count/ocount arguments of generic_file_direct_write(),
while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0c949334

new helper: iov_iter_npages() · f67da30c

Committed by Al Viro on Mar 19, 2014

counts the pages covered by iov_iter, up to given limit.
do_block_direct_io() and fuse_iter_npages() switched to
it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f67da30c
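
As a rough illustration of what "counting pages covered, up to a limit" means, here is a user-space sketch over a plain array of address ranges. The `struct seg` type, the `count_pages` name and the fixed 4096-byte page size are assumptions for the example; this is not the kernel helper, and it assumes `base + len` does not overflow.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for the example */

/* Hypothetical stand-in for one iovec-like segment. */
struct seg {
    uintptr_t base; /* start address of the segment */
    size_t len;     /* length in bytes */
};

/*
 * Sum the number of pages each segment touches, stopping as soon as
 * the running total reaches maxpages.
 */
static size_t count_pages(const struct seg *segs, size_t nsegs,
                          size_t maxpages)
{
    size_t npages = 0;

    for (size_t i = 0; i < nsegs; i++) {
        uintptr_t first, last;

        if (segs[i].len == 0)
            continue; /* empty segments cover no pages */

        first = segs[i].base / SKETCH_PAGE_SIZE;
        last = (segs[i].base + segs[i].len - 1) / SKETCH_PAGE_SIZE;
        npages += (size_t)(last - first + 1);

        if (npages >= maxpages)
            return maxpages; /* cap at the caller's limit */
    }
    return npages;
}
```

A 200-byte segment starting at offset 4000 straddles a page boundary and therefore counts as two pages, which is the case a naive `len / PAGE_SIZE` estimate gets wrong.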

fuse: switch to iov_iter_get_pages() · c9c37e2e

Committed by Al Viro on Mar 16, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c9c37e2e

fuse: pull iov_iter initializations up · d22a943f

Committed by Al Viro on Mar 16, 2014

... to fuse_direct_{read,write}().  ->direct_IO() path uses the
iov_iter passed by the caller instead.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d22a943f

start adding the tag to iov_iter · 71d8e532

Committed by Al Viro on Mar 05, 2014

For now, just use the same thing we pass to ->direct_IO() - it's all
iovec-based at the moment.  Pass it explicitly to iov_iter_init() and
account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
uses.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

71d8e532

fuse_file_aio_write(): merge initializations of iov_iter · 23faa7b8

Committed by Al Viro on Mar 05, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

23faa7b8

get rid of pointless iov_length() in ->direct_IO() · a6cbcd4a

Committed by Al Viro on Mar 04, 2014

all callers have iov_length(iter->iov, iter->nr_segs) == iov_iter_count(iter)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a6cbcd4a

pass iov_iter to ->direct_IO() · d8d3d94b

Committed by Al Viro on Mar 04, 2014

unmodified, for now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d8d3d94b

kill generic_segment_checks() · cb66a7a1

Committed by Al Viro on Mar 04, 2014

all callers of ->aio_read() and ->aio_write() have iov/nr_segs already
checked - generic_segment_checks() done after that is just an odd way
to spell iov_length().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cb66a7a1

generic_file_direct_write(): switch to iov_iter · f8579f86

Committed by Al Viro on Mar 03, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f8579f86

28 Apr, 2014 15 commits

fuse: add renameat2 support · 1560c974

Committed by Miklos Szeredi on Apr 28, 2014

Support RENAME_EXCHANGE and RENAME_NOREPLACE flags on the userspace ABI.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

1560c974

fuse: clear MS_I_VERSION · 4ace1f85

Committed by Miklos Szeredi on Apr 28, 2014

Fuse doesn't support i_version (yet).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

4ace1f85

fuse: clear FUSE_I_CTIME_DIRTY flag on setattr · 3ad22c62

Committed by Maxim Patlasov on Apr 28, 2014

The patch addresses two use-cases when the flag may be safely cleared:

1. fuse_do_setattr() is called with ATTR_CTIME flag set in attr->ia_valid.
In this case attr->ia_ctime bears actual value. In-kernel fuse must send it
to the userspace server and then assign the value to inode->i_ctime.

2. fuse_do_setattr() is called with ATTR_SIZE flag set in attr->ia_valid,
whereas ATTR_CTIME is not set (truncate(2)).
In this case in-kernel fuse must send "now" to the userspace server and then
assign the value to inode->i_ctime.

In both cases we could clear I_DIRTY_SYNC, but that needs more thought.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

3ad22c62

fuse: trust kernel i_ctime only · 31f3267b

Committed by Maxim Patlasov on Apr 28, 2014

Let the kernel maintain i_ctime locally: update i_ctime explicitly on
truncate, fallocate, open(O_TRUNC), setxattr, removexattr, link, rename,
unlink.

The inode flag I_DIRTY_SYNC serves as indication that local i_ctime should
be flushed to the server eventually.  The patch sets the flag and updates
i_ctime in the course of the operations listed above.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

31f3267b

fuse: remove .update_time · 8b47e73e

Committed by Miklos Szeredi on Apr 28, 2014

This implements updating ctime as well as mtime on file_update_time().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

8b47e73e

fuse: allow ctime flushing to userspace · ab9e13f7

Committed by Maxim Patlasov on Apr 28, 2014

The patch extends fuse_setattr_in, and extends the flush procedure
(fuse_flush_times()) called on ->write_inode() to send the ctime as well as
mtime.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

ab9e13f7

fuse: fuse: add time_gran to INIT_OUT · e27c9d38

Committed by Miklos Szeredi on Apr 28, 2014

Allow userspace fs to specify time granularity.

This is needed because with writeback_cache mode the kernel is responsible
for generating mtime and ctime, but if the underlying filesystem doesn't
support nanosecond granularity then the cache will contain a different
value from the one stored on the filesystem resulting in a change of times
after a cache flush.

Make the default granularity 1s.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

e27c9d38
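
The mismatch described in this commit message comes from the cache holding a finer-grained timestamp than the backing filesystem can store. A minimal sketch of truncating a timestamp to a given granularity, assuming a hypothetical helper name `trunc_to_gran` and a granularity expressed in nanoseconds between 1 and 1000000000:

```c
#include <time.h>

/*
 * Round a timestamp down to a multiple of gran_ns nanoseconds.
 * A granularity of 1 leaves the value untouched; a granularity of
 * one second (1000000000) clears tv_nsec entirely.
 */
static struct timespec trunc_to_gran(struct timespec t, long gran_ns)
{
    if (gran_ns > 1)
        t.tv_nsec -= t.tv_nsec % gran_ns;
    return t;
}
```

If the cached value is truncated the same way the server will eventually store it, the times no longer appear to change after a cache flush.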

fuse: add .write_inode · 1e18bda8

Committed by Miklos Szeredi on Apr 28, 2014

...and flush mtime from this.  This allows us to use the kernel
infrastructure for writing out dirty metadata (mtime at this point, but
ctime in the next patches and also maybe atime).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

1e18bda8

fuse: clean up fsync · 22401e7b

Committed by Miklos Szeredi on Apr 28, 2014

Don't need to start I/O twice (once without i_mutex and once within).

Also make sure that even if the userspace filesystem doesn't support FSYNC
we do all the steps other than sending the message.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

22401e7b

fuse: fuse: fallocate: use file_update_time() · 93d2269d

Committed by Miklos Szeredi on Apr 28, 2014

in preparation for getting rid of FUSE_I_MTIME_DIRTY.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

93d2269d

fuse: update mtime on open(O_TRUNC) in atomic_o_trunc mode · 75caeecd

Committed by Maxim Patlasov on Apr 28, 2014

If fc->atomic_o_trunc is set, fuse does nothing in
fuse_do_setattr() while handling open(O_TRUNC). Hence, i_mtime must be
updated explicitly in fuse_finish_open(). The patch also adds extra locking
encompassing the open(O_TRUNC) operation to avoid races between the
truncation and updating i_mtime.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

75caeecd

fuse: update mtime on truncate(2) · 009dd694

Committed by Maxim Patlasov on Apr 28, 2014

When handling truncate(2), VFS doesn't set the ATTR_MTIME bit in the iattr
structure; only the ATTR_SIZE bit is set. In-kernel fuse must handle the
case by setting the mtime fields of struct fuse_setattr_in to "now" and
setting the FATTR_MTIME bit even though ATTR_MTIME was not set.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

009dd694

fuse: do not use uninitialized i_mode · d31433c8

Committed by Maxim Patlasov on Apr 28, 2014

When inode is in I_NEW state, inode->i_mode is not initialized yet. Do not
use it before fuse_init_inode() is called.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

d31433c8

fuse: fix mtime update error in fsync · aeb4eb6b

Committed by Miklos Szeredi on Apr 28, 2014

Bad case of shadowing.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

aeb4eb6b

fuse: check fallocate mode · 4adb8302

Committed by Miklos Szeredi on Apr 28, 2014

Don't allow new fallocate modes until we figure out what (if anything) that
takes.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

4adb8302
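
Rejecting unknown mode bits is typically a one-line mask test. The sketch below is an assumption-laden illustration: the `SKETCH_*` constants, their values, the choice of which modes count as supported, and the `check_fallocate_mode` name are all hypothetical and not taken from the patch.

```c
#include <errno.h>

/* Hypothetical mode bits that this sketch treats as supported. */
#define SKETCH_KEEP_SIZE  0x01
#define SKETCH_PUNCH_HOLE 0x02

/*
 * Accept only the mode bits known to be handled; anything else is
 * refused up front instead of being passed through unchecked.
 */
static int check_fallocate_mode(int mode)
{
    if (mode & ~(SKETCH_KEEP_SIZE | SKETCH_PUNCH_HOLE))
        return -EOPNOTSUPP; /* unknown or unsupported mode bit */
    return 0;
}
```

Using an allow-list means that modes introduced later are rejected by default until support is added deliberately.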

openeuler / Kernel synced successfully more than 1 year ago