提交 · cb59670870d90ff8bc31f5f2efc407c6fe4938c0 · openeuler / Kernel

06 1月, 2015 2 次提交

fuse: add memory barrier to INIT · 9759bd51

由 Miklos Szeredi 提交于 1月 06, 2015

Theoretically we need to order setting of various fields in fc with
fc->initialized.

No known bug reports related to this yet.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

9759bd51

fuse: fix LOOKUP vs INIT compat handling · 21f62174

由 Miklos Szeredi 提交于 1月 06, 2015

Analysis from Marc:

 "Commit 7078187a ("fuse: introduce fuse_simple_request() helper")
  from the above pull request triggers some EIO errors for me in some tests
  that rely on fuse

  Looking at the code changes and a bit of debugging info I think there's a
  general problem here that fuse_get_req checks and possibly waits for
  fc->initialized, and this was always called first.  But this commit
  changes the ordering and in many places fc->minor is now possibly used
  before fuse_get_req, and we can't be sure that fc has been initialized.
  In my case fuse_lookup_init sets req->out.args[0].size to the wrong size
  because fc->minor at that point is still 0, leading to the EIO error."

Fix by moving the compat adjustments into fuse_simple_request() to after
fuse_get_req().

This is also more readable than the original, since now compatibility is
handled in a single function instead of cluttering each operation.
Reported-by: NMarc Dionne <marc.c.dionne@gmail.com>
Tested-by: NMarc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Fixes: 7078187a ("fuse: introduce fuse_simple_request() helper")

21f62174

12 12月, 2014 3 次提交

fuse: introduce fuse_simple_request() helper · 7078187a

由 Miklos Szeredi 提交于 12月 12, 2014

The following pattern is repeated many times:

	req = fuse_get_req_nopages(fc);
	/* Initialize req->(in|out).args */
	fuse_request_send(fc, req);
	err = req->out.h.error;
	fuse_put_request(req);

Create a new replacement helper:

	/* Initialize args */
	err = fuse_simple_request(fc, &args);

In addition to reducing the code size, this will ease moving from the
complex arg-based to a simpler page-based I/O on the fuse device.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

7078187a

fuse: flush requests on umount · 580640ba

由 Miklos Szeredi 提交于 12月 12, 2014

Use fuse_abort_conn() instead of fuse_conn_kill() in fuse_put_super().
This flushes and aborts requests still on any queues.  But since we've
already reset fc->connected, those requests would not be useful anyway and
would be flushed when the fuse device is closed.

Next patches will rely on requests being flushed before the superblock is
destroyed.

Use fuse_abort_conn() in cuse_process_init_reply() too, since it makes no
difference there, and we can get rid of fuse_conn_kill().
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

580640ba

fuse: don't wake up reserved req in fuse_conn_kill() · 0c4dd4ba

由 Miklos Szeredi 提交于 12月 12, 2014

Waking up reserved_req_waitq from fuse_conn_kill() doesn't make sense since
we aren't chaging ff->reserved_req here, which is what this waitqueue
signals.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

0c4dd4ba

22 7月, 2014 2 次提交

fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT · d7afaec0

由 Andrew Gallagher 提交于 7月 22, 2014

Here some additional changes to set a capability flag so that clients can
detect when it's appropriate to return -ENOSYS from open.

This amends the following commit introduced in 3.14:

  7678ac50  fuse: support clients that don't implement 'open'

However we can only add the flag to 3.15 and later since there was no
protocol version update in 3.14.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Cc: <stable@vger.kernel.org> # v3.15+

d7afaec0

fuse: s_time_gran fix · a800bad3

由 Miklos Szeredi 提交于 7月 22, 2014

Default s_time_gran is 1, don't overwrite that if userspace didn't
explicitly specify one.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Cc: <stable@vger.kernel.org> # v3.15+

a800bad3

07 7月, 2014 2 次提交

fuse: handle large user and group ID · 233a01fa

由 Miklos Szeredi 提交于 7月 07, 2014

If the number in "user_id=N" or "group_id=N" mount options was larger than
INT_MAX then fuse returned EINVAL.

Fix this to handle all valid uid/gid values.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org

233a01fa

fuse: inode: drop cast · 7b3d8bf7

由 Himangi Saraogi 提交于 6月 26, 2014

This patch removes the cast on data of type void * as it is not needed.
The following Coccinelle semantic patch was used for making the change:

@r@
expression x;
void* e;
type T;
identifier f;
@@

(
  *((T *)e)
|
  ((T *)x)[...]
|
  ((T *)x)->f
|
- (T *)
  e
)
Signed-off-by: NHimangi Saraogi <himangi774@gmail.com>
Acked-by: NJulia Lawall <julia.lawall@lip6.fr>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

7b3d8bf7

28 4月, 2014 5 次提交

fuse: clear MS_I_VERSION · 4ace1f85

由 Miklos Szeredi 提交于 4月 28, 2014

Fuse doesn't support i_version (yet).
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

4ace1f85

fuse: trust kernel i_ctime only · 31f3267b

由 Maxim Patlasov 提交于 4月 28, 2014

Let the kernel maintain i_ctime locally: update i_ctime explicitly on
truncate, fallocate, open(O_TRUNC), setxattr, removexattr, link, rename,
unlink.

The inode flag I_DIRTY_SYNC serves as indication that local i_ctime should
be flushed to the server eventually.  The patch sets the flag and updates
i_ctime in course of operations listed above.
Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

31f3267b

fuse: fuse: add time_gran to INIT_OUT · e27c9d38

由 Miklos Szeredi 提交于 4月 28, 2014

Allow userspace fs to specify time granularity.

This is needed because with writeback_cache mode the kernel is responsible
for generating mtime and ctime, but if the underlying filesystem doesn't
support nanosecond granularity then the cache will contain a different
value from the one stored on the filesystem resulting in a change of times
after a cache flush.

Make the default granularity 1s.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

e27c9d38

fuse: add .write_inode · 1e18bda8

由 Miklos Szeredi 提交于 4月 28, 2014

...and flush mtime from this.  This allows us to use the kernel
infrastructure for writing out dirty metadata (mtime at this point, but
ctime in the next patches and also maybe atime).
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

1e18bda8

fuse: do not use uninitialized i_mode · d31433c8

由 Maxim Patlasov 提交于 4月 28, 2014

When inode is in I_NEW state, inode->i_mode is not initialized yet. Do not
use it before fuse_init_inode() is called.
Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

d31433c8

04 4月, 2014 1 次提交

mm + fs: store shadow entries in page cache · 91b0abe3

由 Johannes Weiner 提交于 4月 03, 2014

Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page.  As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently.  At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.

Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate.  Reclaim will check
for this flag before installing shadow pages.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NMinchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

91b0abe3

02 4月, 2014 3 次提交

fuse: Turn writeback cache on · 4d99ff8f

由 Pavel Emelyanov 提交于 10月 10, 2013

Introduce a bit kernel and userspace exchange between each-other on
the init stage and turn writeback on if the userspace want this and
mount option 'allow_wbcache' is present (controlled by fusermount).

Also add each writable file into per-inode write list and call the
generic_file_aio_write to make use of the Linux page cache engine.
Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

4d99ff8f

fuse: Trust kernel i_mtime only · b0aa7606

由 Maxim Patlasov 提交于 12月 26, 2013

Let the kernel maintain i_mtime locally:
 - clear S_NOCMTIME
 - implement i_op->update_time()
 - flush mtime on fsync and last close
 - update i_mtime explicitly on truncate and fallocate

Fuse inode flag FUSE_I_MTIME_DIRTY serves as indication that local i_mtime
should be flushed to the server eventually.
Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

b0aa7606

fuse: Trust kernel i_size only · 8373200b

由 Pavel Emelyanov 提交于 10月 10, 2013

Make fuse think that when writeback is on the inode's i_size is always
up-to-date and not update it with the value received from the userspace.
This is done because the page cache code may update i_size without letting
the FS know.

This assumption implies fixing the previously introduced short-read helper --
when a short read occurs the 'hole' is filled with zeroes.

fuse_file_fallocate() is also fixed because now we should keep i_size up to
date, so it must be updated if FUSE_FALLOCATE request succeeded.
Signed-off-by: NMaxim V. Patlasov <MPatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

8373200b

13 3月, 2014 1 次提交

fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d

由 Theodore Ts'o 提交于 3月 13, 2014

Previously, the no-op "mount -o mount /dev/xxx" operation when the
file system is already mounted read-write causes an implied,
unconditional syncfs().  This seems pretty stupid, and it's certainly
documented or guaraunteed to do this, nor is it particularly useful,
except in the case where the file system was mounted rw and is getting
remounted read-only.

However, it's possible that there might be some file systems that are
actually depending on this behavior.  In most file systems, it's
probably fine to only call sync_filesystem() when transitioning from
read-write to read-only, and there are some file systems where this is
not needed at all (for example, for a pseudo-filesystem or something
like romfs).
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Jan Kara <jack@suse.cz>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Anders Larsen <al@alarsen.net>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: xfs@oss.sgi.com
Cc: linux-btrfs@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: codalist@coda.cs.cmu.edu
Cc: linux-ext4@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: fuse-devel@lists.sourceforge.net
Cc: cluster-devel@redhat.com
Cc: linux-mtd@lists.infradead.org
Cc: jfs-discussion@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: linux-nilfs@vger.kernel.org
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Cc: reiserfs-devel@vger.kernel.org

02b9984d

25 10月, 2013 2 次提交

fuse: rcu-delay freeing fuse_conn · dd3e2c55

由 Al Viro 提交于 10月 03, 2013

makes ->permission() and ->d_revalidate() safety in RCU mode independent
from vfsmount_lock.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

dd3e2c55

vfs: introduce d_instantiate_no_diralias() · b70a80e7

由 Miklos Szeredi 提交于 10月 01, 2013

...which just returns -EBUSY if a directory alias would be created.

This is to be used by fuse mkdir to make sure that a buggy or malicious
userspace filesystem doesn't do anything nasty.  Previously fuse used a
private mutex for this purpose, which can now go away.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

b70a80e7

13 9月, 2013 1 次提交

truncate: drop 'oldsize' truncate_pagecache() parameter · 7caef267

由 Kirill A. Shutemov 提交于 9月 12, 2013

truncate_pagecache() doesn't care about old size since commit
cedabed4 ("vfs: Fix vmtruncate() regression").  Let's drop it.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7caef267

12 9月, 2013 1 次提交

mm/page-writeback.c: add strictlimit feature · 5a537485

由 Maxim Patlasov 提交于 9月 11, 2013

The feature prevents mistrusted filesystems (ie: FUSE mounts created by
unprivileged users) to grow a large number of dirty pages before
throttling.  For such filesystems balance_dirty_pages always check bdi
counters against bdi limits.  I.e.  even if global "nr_dirty" is under
"freerun", it's not allowed to skip bdi checks.  The only use case for now
is fuse: it sets bdi max_ratio to 1% by default and system administrators
are supposed to expect that this limit won't be exceeded.

The feature is on if a BDI is marked by BDI_CAP_STRICTLIMIT flag.  A
filesystem may set the flag when it initializes its BDI.

The problematic scenario comes from the fact that nobody pays attention to
the NR_WRITEBACK_TEMP counter (i.e.  number of pages under fuse
writeback).  The implementation of fuse writeback releases original page
(by calling end_page_writeback) almost immediately.  A fuse request queued
for real processing bears a copy of original page.  Hence, if userspace
fuse daemon doesn't finalize write requests in timely manner, an
aggressive mmap writer can pollute virtually all memory by those temporary
fuse page copies.  They are carefully accounted in NR_WRITEBACK_TEMP, but
nobody cares.

To make further explanations shorter, let me use "NR_WRITEBACK_TEMP
problem" as a shortcut for "a possibility of uncontrolled grow of amount
of RAM consumed by temporary pages allocated by kernel fuse to process
writeback".

The problem was very easy to reproduce.  There is a trivial example
filesystem implementation in fuse userspace distribution: fusexmp_fh.c.  I
added "sleep(1);" to the write methods, then recompiled and mounted it.
Then created a huge file on the mount point and run a simple program which
mmap-ed the file to a memory region, then wrote a data to the region.  An
hour later I observed almost all RAM consumed by fuse writeback.  Since
then some unrelated changes in kernel fuse made it more difficult to
reproduce, but it is still possible now.

Putting this theoretical happens-in-the-lab thing aside, there is another
thing that really hurts real world (FUSE) users.  This is write-through
page cache policy FUSE currently uses.  I.e.  handling write(2), kernel
fuse populates page cache and flushes user data to the server
synchronously.  This is excessively suboptimal.  Pavel Emelyanov's patches
("writeback cache policy") solve the problem, but they also make resolving
NR_WRITEBACK_TEMP problem absolutely necessary.  Otherwise, simply copying
a huge file to a fuse mount would result in memory starvation.  Miklos,
the maintainer of FUSE, believes strictlimit feature the way to go.

And eventually putting FUSE topics aside, there is one more use-case for
strictlimit feature.  Using a slow USB stick (mass storage) in a machine
with huge amount of RAM installed is a well-known pain.  Let's make simple
computations.  Assuming 64GB of RAM installed, existing implementation of
balance_dirty_pages will start throttling only after 9.6GB of RAM becomes
dirty (freerun == 15% of total RAM).  So, the command "cp 9GB_file
/media/my-usb-storage/" may return in a few seconds, but subsequent
"umount /media/my-usb-storage/" will take more than two hours if effective
throughput of the storage is, to say, 1MB/sec.

After inclusion of strictlimit feature, it will be trivial to add a knob
(e.g.  /sys/devices/virtual/bdi/x:y/strictlimit) to enable it on demand.
Manually or via udev rule.  May be I'm wrong, but it seems to be quite a
natural desire to limit the amount of dirty memory for some devices we are
not fully trust (in the sense of sustainable throughput).

[akpm@linux-foundation.org: fix warning in page-writeback.c]
Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5a537485

03 9月, 2013 1 次提交

fuse: hotfix truncate_pagecache() issue · 06a7c3c2

由 Maxim Patlasov 提交于 8月 30, 2013

The way how fuse calls truncate_pagecache() from fuse_change_attributes()
is completely wrong. Because, w/o i_mutex held, we never sure whether
'oldsize' and 'attr->size' are valid by the time of execution of
truncate_pagecache(inode, oldsize, attr->size). In fact, as soon as we
released fc->lock in the middle of fuse_change_attributes(), we completely
loose control of actions which may happen with given inode until we reach
truncate_pagecache. The list of potentially dangerous actions includes
mmap-ed reads and writes, ftruncate(2) and write(2) extending file size.

The typical outcome of doing truncate_pagecache() with outdated arguments
is data corruption from user point of view. This is (in some sense)
acceptable in cases when the issue is triggered by a change of the file on
the server (i.e. externally wrt fuse operation), but it is absolutely
intolerable in scenarios when a single fuse client modifies a file without
any external intervention. A real life case I discovered by fsx-linux
looked like this:

1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends
FUSE_SETATTR to the server synchronously, but before getting fc->lock ...
2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP
to the server synchronously, then calls fuse_change_attributes(). The
latter updates i_size, releases fc->lock, but before comparing oldsize vs
attr->size..
3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and
updating attributes and i_size, but now oldsize is equal to
outarg.attr.size because i_size has just been updated (step 2). Hence,
fuse_do_setattr() returns w/o calling truncate_pagecache().
4. As soon as ftruncate(2) completes, the user extends file size by
write(2) making a hole in the middle of file, then reads data from the hole
either by read(2) or mmap-ed read. The user expects to get zero data from
the hole, but gets stale data because truncate_pagecache() is not executed
yet.

The scenario above illustrates one side of the problem: not truncating the
page cache even though we should. Another side corresponds to truncating
page cache too late, when the state of inode changed significantly.
Theoretically, the following is possible:

1. As in the previous scenario fuse_dentry_revalidate() discovered that
i_size changed (due to our own fuse_do_setattr()) and is going to call
truncate_pagecache() for some 'new_size' it believes valid right now. But
by the time that particular truncate_pagecache() is called ...
2. fuse_do_setattr() returns (either having called truncate_pagecache() or
not -- it doesn't matter).
3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
4. mmap-ed write makes a page in the extended region dirty.

The result will be the lost of data user wrote on the fourth step.

The patch is a hotfix resolving the issue in a simplistic way: let's skip
dangerous i_size update and truncate_pagecache if an operation changing
file size is in progress. This simplistic approach looks correct for the
cases w/o external changes. And to handle them properly, more sophisticated
and intrusive techniques (e.g. NFS-like one) would be required. I'd like to
postpone it until the issue is well discussed on the mailing list(s).

Changed in v2:
- improved patch description to cover both sides of the issue.
Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org

06a7c3c2

04 7月, 2013 1 次提交

mm: use totalram_pages instead of num_physpages at runtime · 0ed5fd13

由 Jiang Liu 提交于 7月 03, 2013

The global variable num_physpages is scheduled to be removed, so use
totalram_pages instead of num_physpages at runtime.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0ed5fd13

03 6月, 2013 1 次提交

fuse: fix readdirplus Oops in fuse_dentry_revalidate · 28420dad

由 Miklos Szeredi 提交于 6月 03, 2013

Fix bug introduced by commit 4582a4ab "FUSE: Adapt readdirplus to application
usage patterns".

We need to check for a positive dentry; negative dentries are not added by
readdirplus.  Secondly we need to advise the use of readdirplus on the *parent*,
otherwise the whole thing is useless.  Thirdly all this is only relevant if
"readdirplus_auto" mode is selected by the filesystem.

We advise the use of readdirplus only if the dentry was still valid.  If we had
to redo the lookup then there was no use in doing the -plus version.
Reported-by: NBernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
CC: Feng Shuo <steve.shuo.feng@gmail.com>
CC: stable@vger.kernel.org

28420dad

01 5月, 2013 1 次提交

fuse: add flag to turn on async direct IO · 60b9df7a

由 Miklos Szeredi 提交于 5月 01, 2013

Without async DIO write requests to a single file were always serialized.
With async DIO that's no longer the case.

So don't turn on async DIO by default for fear of breaking backward
compatibility.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

60b9df7a

17 4月, 2013 3 次提交

fuse: skip blocking on allocations of synchronous requests · 0aada884

由 Maxim Patlasov 提交于 3月 21, 2013

A task may have at most one synchronous request allocated. So these
requests need not be otherwise limited.

The patch re-works fuse_get_req() to follow this idea.
Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

0aada884

fuse: add flag fc->initialized · 796523fb

由 Maxim Patlasov 提交于 3月 21, 2013

Existing flag fc->blocked is used to suspend request allocation both in case
of many background request submitted and period of time before init_reply
arrives from userspace. Next patch will skip blocking allocations of
synchronous request (disregarding fc->blocked). This is mostly OK, but
we still need to suspend allocations if init_reply is not arrived yet. The
patch introduces flag fc->initialized which will serve this purpose.
Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

796523fb

fuse: make request allocations for background processing explicit · 8b41e671

由 Maxim Patlasov 提交于 3月 21, 2013

There are two types of processing requests in FUSE: synchronous (via
fuse_request_send()) and asynchronous (via adding to fc->bg_queue).

Fortunately, the type of processing is always known in advance, at the time
of request allocation. This preparatory patch utilizes this fact making
fuse_get_req() aware about the type. Next patches will use it.
Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

8b41e671

04 3月, 2013 1 次提交

fs: Limit sys_mount to only request filesystem modules. · 7f78e035

由 Eric W. Biederman 提交于 3月 02, 2013

Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Acked-by: NKees Cook <keescook@chromium.org>
Reported-by: NKees Cook <keescook@google.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

7f78e035

26 2月, 2013 1 次提交

fs: encode_fh: return FILEID_INVALID if invalid fid_type · 94e07a75

由 Namjae Jeon 提交于 2月 17, 2013

This patch is a follow up on below patch:

[PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type
commit: 216b6cbdSigned-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: NVivek Trivedi <t.vivek@samsung.com>
Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
Acked-by: NSage Weil <sage@inktank.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

94e07a75

07 2月, 2013 1 次提交

fuse: allow control of adaptive readdirplus use · 634734b6

由 Eric Wong 提交于 2月 06, 2013

For some filesystems (e.g. GlusterFS), the cost of performing a
normal readdir and readdirplus are identical.  Since adaptively
using readdirplus has no benefit for those systems, give
users/filesystems the option to control adaptive readdirplus use.

v2 of this patch incorporates Miklos's suggestion to simplify the code,
as well as improving consistency of macro names and documentation.
Signed-off-by: NEric Wong <normalperson@yhbt.net>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

634734b6

01 2月, 2013 2 次提交

FUSE: Adapt readdirplus to application usage patterns · 4582a4ab

由 Feng Shuo 提交于 1月 15, 2013

Use the same adaptive readdirplus mechanism as NFS:

http://permalink.gmane.org/gmane.linux.nfs/49299

If the user space implementation wants to disable readdirplus
temporarily, it could just return ENOTSUPP. Then kernel will
recall it with readdir.
Signed-off-by: NFeng Shuo <steve.shuo.feng@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

4582a4ab

Do not use RCU for current process credentials · c2132c1b

由 Anatol Pomozov 提交于 1月 14, 2013

Commit c69e8d9c added rcu lock to fuse/dir.c It was assuming
that 'task' is some other process but in fact this parameter always
equals to 'current'. Inline this parameter to make it more readable
and remove RCU lock as it is not needed when access current process
credentials.
Signed-off-by: NAnatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

c2132c1b

24 1月, 2013 3 次提交

fuse: categorize fuse_get_req() · b111c8c0

由 Maxim Patlasov 提交于 10月 26, 2012

The patch categorizes all fuse_get_req() invocations into two categories:
 - fuse_get_req_nopages(fc) - when caller doesn't care about req->pages
 - fuse_get_req(fc, n) - when caller need n page pointers (n > 0)

Adding fuse_get_req_nopages() helps to avoid numerous fuse_get_req(fc, 0)
scattered over code. Now it's clear from the first glance when a caller need
fuse_req with page pointers.

The patch doesn't make any logic changes. In multi-page case, it silly
allocates array of FUSE_MAX_PAGES_PER_REQ page pointers. This will be amended
by future patches.
Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

b111c8c0

fuse: general infrastructure for pages[] of variable size · 4250c066

由 Maxim Patlasov 提交于 10月 26, 2012

The patch removes inline array of FUSE_MAX_PAGES_PER_REQ page pointers from
fuse_req. Instead of that, req->pages may now point either to small inline
array or to an array allocated dynamically.

This essentially means that all callers of fuse_request_alloc[_nofs] should
pass the number of pages needed explicitly.

The patch doesn't make any logic changes.
Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

4250c066

fuse: implement NFS-like readdirplus support · 0b05b183

由 Anand V. Avati 提交于 8月 19, 2012

This patch implements readdirplus support in FUSE, similar to NFS.
The payload returned in the readdirplus call contains
'fuse_entry_out' structure thereby providing all the necessary inputs
for 'faking' a lookup() operation on the spot.

If the dentry and inode already existed (for e.g. in a re-run of ls -l)
then just the inode attributes timeout and dentry timeout are refreshed.

With a simple client->network->server implementation of a FUSE based
filesystem, the following performance observations were made:

Test: Performing a filesystem crawl over 20,000 files with

sh# time ls -lR /mnt

Without readdirplus:
Run 1: 18.1s
Run 2: 16.0s
Run 3: 16.2s

With readdirplus:
Run 1: 4.1s
Run 2: 3.8s
Run 3: 3.8s

The performance improvement is significant as it avoided 20,000 upcalls
calls (lookup). Cache consistency is no worse than what already is.
Signed-off-by: NAnand V. Avati <avati@redhat.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>

0b05b183

15 11月, 2012 1 次提交

userns: Support fuse interacting with multiple user namespaces · 499dcf20

由 Eric W. Biederman 提交于 2月 07, 2012

Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data.

The connection between between a fuse filesystem and a fuse daemon is
established when a fuse filesystem is mounted and provided with a file
descriptor the fuse daemon created by opening /dev/fuse.

For now restrict the communication of uids and gids between the fuse
filesystem and the fuse daemon to the initial user namespace.  Enforce
this by verifying the file descriptor passed to the mount of fuse was
opened in the initial user namespace.  Ensuring the mount happens in
the initial user namespace is not necessary as mounts from non-initial
user namespaces are not yet allowed.

In fuse_req_init_context convert the currrent fsuid and fsgid into the
initial user namespace for the request that will be sent to the fuse
daemon.

In fuse_fill_attr convert the uid and gid passed from the fuse daemon
from the initial user namespace into kuids and kgids.

In iattr_to_fattr called from fuse_setattr convert kuids and kgids
into the uids and gids in the initial user namespace before passing
them to the fuse filesystem.

In fuse_change_attributes_common called from fuse_dentry_revalidate,
fuse_permission, fuse_geattr, and fuse_setattr, and fuse_iget convert
the uid and gid from the fuse daemon into a kuid and a kgid to store
on the fuse inode.

By default fuse mounts are restricted to task whose uid, suid, and
euid matches the fuse user_id and whose gid, sgid, and egid matches
the fuse group id.  Convert the user_id and group_id mount options
into kuids and kgids at mount time, and use uid_eq and gid_eq to
compare the in fuse_allow_task.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

499dcf20

03 10月, 2012 1 次提交

fs: push rcu_barrier() from deactivate_locked_super() to filesystems · 8c0a8537

由 Kirill A. Shutemov 提交于 9月 26, 2012

There's no reason to call rcu_barrier() on every
deactivate_locked_super().  We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.

Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths.  E.g.  on my machine exit_group() of a last process in IPC
namespace takes 0.07538s.  rcu_barrier() takes 0.05188s of that time.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

8c0a8537

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功