Commits · 5d069dbe8aaf2a197142558b6fb2978189ba3454 · openeuler / Kernel

10 Dec, 2020 · 1 commit

Committed by Miklos Szeredi on Dec 10, 2020

Jan Kara's analysis of the syzbot report (edited):

  The reproducer opens a directory on a FUSE filesystem, then attaches a
  dnotify mark to the open directory.  After that a fuse_do_getattr() call
  finds that attributes returned by the server are inconsistent, and calls
  make_bad_inode() which, among other things does:

          inode->i_mode = S_IFREG;

  This then confuses dnotify which doesn't tear down its structures
  properly and eventually crashes.

Avoid calling make_bad_inode() on a live inode: switch to a private flag on
the fuse inode.  Also add the test to the ops that bad_inode_ops would
have caught.

This bug goes back to the initial merge of fuse in 2.6.14...

Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>

5d069dbe

12 Nov, 2020 · 5 commits

fuse: add a flag FUSE_OPEN_KILL_SUIDGID for open() request · 643a666a

Committed by Vivek Goyal on Oct 09, 2020

With FUSE_HANDLE_KILLPRIV_V2 support, the server will need to kill suid/sgid/
security.capability on open(O_TRUNC), if the server supports
FUSE_ATOMIC_O_TRUNC.

But the server needs to kill suid/sgid only if the caller does not have
CAP_FSETID. Since the server does not have this information, the client
needs to send it to the server.

So add a flag FUSE_OPEN_KILL_SUIDGID to the fuse_open_in request, which tells
the server to kill suid/sgid (only if group execute is set).

This flag is added to the FUSE_OPEN request, as well as the FUSE_CREATE
request if the create was non-exclusive, since that might result in an
existing file being opened/truncated.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

643a666a

fuse: don't send ATTR_MODE to kill suid/sgid for handle_killpriv_v2 · 8981bdfd

Committed by Vivek Goyal on Oct 09, 2020

If client does a write() on a suid/sgid file, VFS will first call
fuse_setattr() with ATTR_KILL_S[UG]ID set. This requires sending setattr
to file server with ATTR_MODE set to kill suid/sgid. But to do that client
needs to know latest mode otherwise it is racy.

To reduce the race window, the current code first calls fuse_do_getattr() to
get the latest ->i_mode, then resets the suid/sgid bits and sends the rest to
the server with setattr(ATTR_MODE). This does not eliminate the race, but it
narrows the race window significantly.

With fc->handle_killpriv_v2 enabled, it should be possible to remove this
race completely. Do not kill suid/sgid with ATTR_MODE at all. It will be
killed by server when WRITE request is sent to server soon. This is
similar to the fc->handle_killpriv logic. V2 is just a more refined version
of the protocol. Hence this patch does not send ATTR_MODE to kill suid/sgid if
fc->handle_killpriv_v2 is enabled.

This creates an issue if fc->writeback_cache is enabled. In that case
WRITE can be cached in guest and server might not see WRITE request and
hence will not kill suid/sgid. Miklos suggested that in such cases, we
should fall back to a writethrough WRITE instead and that will generate
WRITE request and kill suid/sgid. This patch implements that too.

But this relies on client seeing the suid/sgid set. If another client sets
suid/sgid and this client does not see it immediately, then we will not
fall back to writethrough WRITE. So this is one limitation with both
fc->handle_killpriv_v2 and fc->writeback_cache enabled. Both the options
are not fully compatible. But might be good enough for many use cases.

Note: This patch is not checking whether security.capability is set or not
when falling back to writethrough path. If suid/sgid is not set and
only security.capability is set, that will be taken care of by
file_remove_privs() call in ->writeback_cache path.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

8981bdfd

fuse: set FUSE_WRITE_KILL_SUIDGID in cached write path · b8667395

Committed by Vivek Goyal on Oct 09, 2020

With HANDLE_KILLPRIV_V2, server will need to kill suid/sgid if caller does
not have CAP_FSETID.  We already have a flag FUSE_WRITE_KILL_SUIDGID in
WRITE request and we already set it in direct I/O path.

To make it work in cached write path also, start setting
FUSE_WRITE_KILL_SUIDGID in this path too.

Set it only if fc->handle_killpriv_v2 is set.  Otherwise the client is
responsible for killing suid/sgid.

In case of direct I/O we set FUSE_WRITE_KILL_SUIDGID unconditionally
because we don't call file_remove_privs() in that path (with cache=none
option).
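A sketch of how the write flags could be computed, assuming a caller_has_fsetid input in place of the kernel's capability check. The helper name is hypothetical; the bit value (1u << 2) reflects my reading of the FUSE uapi header and should be checked against include/uapi/linux/fuse.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define FUSE_WRITE_KILL_SUIDGID (1u << 2)  /* assumed uapi value */

/* Hypothetical: flags for a WRITE request on either write path. */
static uint32_t write_flags(bool direct_io, bool handle_killpriv_v2,
			    bool caller_has_fsetid)
{
	uint32_t flags = 0;

	if (caller_has_fsetid)
		return flags;  /* nothing needs to be killed */
	/* Direct I/O skips file_remove_privs(), so always ask the server.
	 * Cached writes ask only with killpriv v2; otherwise the client
	 * already cleared the bits itself. */
	if (direct_io || handle_killpriv_v2)
		flags |= FUSE_WRITE_KILL_SUIDGID;
	return flags;
}
```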
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

b8667395

fuse: rename FUSE_WRITE_KILL_PRIV to FUSE_WRITE_KILL_SUIDGID · 10c52c84

Committed by Miklos Szeredi on Nov 11, 2020

Kernel has:
ATTR_KILL_PRIV -> clear "security.capability"
ATTR_KILL_SUID -> clear S_ISUID
ATTR_KILL_SGID -> clear S_ISGID if executable

Fuse has:
FUSE_WRITE_KILL_PRIV -> clear S_ISUID and S_ISGID if executable

So FUSE_WRITE_KILL_PRIV implies the complement of ATTR_KILL_PRIV, which is
somewhat confusing.  Also PRIV implies all privileges, including
"security.capability".

Change the name to FUSE_WRITE_KILL_SUIDGID and make FUSE_WRITE_KILL_PRIV an
alias to preserve API compatibility.
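The rename-plus-alias pattern can be sketched in header form. The values below reflect my reading of the FUSE uapi header and are shown for illustration only; check them against include/uapi/linux/fuse.h before relying on them.

```c
/* WRITE request flags (illustrative copy of the uapi definitions). */
#define FUSE_WRITE_CACHE        (1 << 0)
#define FUSE_WRITE_LOCKOWNER    (1 << 1)
#define FUSE_WRITE_KILL_SUIDGID (1 << 2)

/* Obsolete alias: the flag only ever implied killing suid/sgid. */
#define FUSE_WRITE_KILL_PRIV    FUSE_WRITE_KILL_SUIDGID
```

Existing servers that test FUSE_WRITE_KILL_PRIV keep compiling and see the same bit.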
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

10c52c84

fuse: launder page should wait for page writeback · 3993382b

Committed by Miklos Szeredi on Nov 11, 2020

Qian Cai reports that the WARNING in tree_insert() can be triggered by a
fuzzer with the following call chain:

invalidate_inode_pages2_range()
   fuse_launder_page()
      fuse_writepage_locked()
         tree_insert()

The reason is that another write for the same page is already queued.

The simplest fix is to wait until the pending write is completed and only
after that queue the new write.

Since this case is very rare, the additional wait should not be a problem.
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

3993382b

18 Sep, 2020 · 2 commits

fuse: split fuse_mount off of fuse_conn · fcee216b

Committed by Max Reitz on May 06, 2020

We want to allow submounts for the same fuse_conn, but with different
superblocks so that each of the submounts has its own device ID.  To do
so, we need to split all mount-specific information off of fuse_conn
into a new fuse_mount structure, so that multiple mounts can share a
single fuse_conn.

We need to take care only to perform connection-level actions once (i.e.
when the fuse_conn and thus the first fuse_mount are established, or
when the last fuse_mount and thus the fuse_conn are destroyed).  For
example, fuse_sb_destroy() must not invoke fuse_send_destroy() until the
last superblock is released.

To do so, we keep track of which fuse_mount is the root mount and
perform all fuse_conn-level actions only when this fuse_mount is
involved.
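A toy model of the split, assuming one shared connection object and several mounts that point to it. All names are hypothetical; the point is that connection-level teardown (the analogue of fuse_send_destroy()) runs only when the root mount is released, while submounts merely drop their share.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical connection state shared by all mounts. */
struct toy_conn {
	int nmounts;       /* mounts currently using this connection */
	int destroy_sent;  /* connection-level teardowns performed */
};

/* Hypothetical per-superblock state. */
struct toy_mount {
	struct toy_conn *fc;  /* shared connection */
	bool is_root;         /* root mount drives connection-level actions */
};

static struct toy_mount *toy_mount_new(struct toy_conn *fc, bool is_root)
{
	struct toy_mount *fm = malloc(sizeof(*fm));

	if (!fm)
		return NULL;
	fm->fc = fc;
	fm->is_root = is_root;
	fc->nmounts++;
	return fm;
}

static void toy_mount_release(struct toy_mount *fm)
{
	struct toy_conn *fc = fm->fc;

	/* Connection-level action happens only for the root mount. */
	if (fm->is_root)
		fc->destroy_sent++;
	fc->nmounts--;
	free(fm);
}
```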
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

fcee216b

fuse: fix the ->direct_IO() treatment of iov_iter · 933a3752

Committed by Al Viro on Sep 17, 2020

The callers rely upon having any iov_iter_truncate() done inside
->direct_IO() countered by iov_iter_reexpand().
Reported-by: Qian Cai <cai@redhat.com>
Tested-by: Qian Cai <cai@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

933a3752

10 Sep, 2020 · 3 commits

virtiofs: serialize truncate/punch_hole and dax fault path · 6ae330ca

Committed by Vivek Goyal on Aug 19, 2020

Currently in fuse we don't seem to have any lock that can serialize the fault
path with the truncate/punch_hole path. With dax support I need one for the
following reasons.

1. Dax requirement

  DAX fault code relies on the inode size being stable for the duration of
  the fault and needs to be serialized with truncate/punch_hole, as the code
  explicitly notes:

  static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
                               const struct iomap_ops *ops)
        /*
         * Check whether offset isn't beyond end of file now. Caller is
         * supposed to hold locks serializing us with truncate / punch hole so
         * this is a reliable test.
         */
        max_pgoff = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);

2. Make sure there are no users of pages being truncated/punch_hole

  get_user_pages() might take references to page and then do some DMA
  to said pages. Filesystem might truncate those pages without knowing
  that a DMA is in progress or some I/O is in progress. So use
  dax_layout_busy_page() to make sure there are no such references
  and I/O is not in progress on said pages before moving ahead with
  truncation.

3. Limitation of kvm page fault error reporting

  If we are truncating file on host first and then removing mappings in
  guest later (truncate page cache etc.), then this could lead to a
  problem with KVM. Say a mapping is in place in guest and truncation
  happens on host. Now if guest accesses that mapping, then host will
  take a fault and kvm will either exit to qemu or spin infinitely.

  IOW, before we do truncation on host, we need to make sure that guest
  inode does not have any mapping in that region or whole file.

4. virtiofs memory range reclaim

 Soon I will introduce the notion of being able to reclaim dax memory
 ranges from a fuse dax inode. There also I need to make sure that
 no I/O or fault is going on in the reclaimed range and nobody is using
 it so that range can be reclaimed without issues.

Currently if we take inode lock, that serializes read/write. But it does
not do anything for faults. So I add another semaphore fuse_inode->i_mmap_sem
for this purpose.  It can be used to serialize with faults.

For now, I am taking this semaphore only in the dax fault path and
not in the regular fault path, because the existing code does not have one.
Maybe the existing code can benefit from it as well to take care of some
races, but we can fix that later if need be. For now, I am focusing
only on the DAX path, which is new.

Also added logic to take fuse_inode->i_mmap_sem in
truncate/punch_hole/open(O_TRUNC) path to make sure file truncation and
fuse dax fault are mutually exclusive and avoid all the above problems.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

6ae330ca

virtiofs: add DAX mmap support · 2a9a609a

Committed by Stefan Hajnoczi on Aug 19, 2020

Add DAX mmap() support.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

2a9a609a

virtiofs: implement dax read/write operations · c2d0ad00

Committed by Vivek Goyal on Aug 19, 2020

This patch implements basic DAX support. mmap() is not implemented
yet and will come in later patches. This patch implements
read/write.

We make use of an interval tree to keep track of per-inode dax mappings.

Do not use dax for file extending writes, instead just send WRITE message
to daemon (like we do for direct I/O path). This will keep write and
i_size change atomic w.r.t crash.
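A sketch of the write-path choice, with hypothetical names: writes that stay within i_size can go through the DAX mapping, while extending writes fall back to a WRITE message so the daemon updates data and size together. The interval tree of mappings is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical: may a write of len bytes at pos use the DAX mapping? */
static bool use_dax_for_write(off_t pos, size_t len, off_t i_size)
{
	/* Extending writes go to the daemon as a WRITE so that the data and
	 * the new i_size change atomically with respect to a crash. */
	return pos + (off_t)len <= i_size;
}
```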
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

c2d0ad00

17 Jul, 2020 · 1 commit

treewide: Remove uninitialized_var() usage · 3f649ab7

Committed by Kees Cook on Jun 03, 2020

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>

3f649ab7

15 Jul, 2020 · 1 commit

fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS · 31070f6c

Committed by Chirantan Ekbote on Jul 14, 2020

The ioctl encoding for this parameter is a long but the documentation says
it should be an int and the kernel drivers expect it to be an int.  If the
fuse driver treats this as a long it might end up scribbling over the stack
of a userspace process that only allocated enough space for an int.

This was previously discussed in [1] and a patch for fuse was proposed in
[2].  From what I can tell the patch in [2] was nacked in favor of adding
new, "fixed" ioctls and using those from userspace.  However there is still
no "fixed" version of these ioctls and the fact is that it's sometimes
infeasible to change all userspace to use the new one.

Handling the ioctls specially in the fuse driver seems like the most
pragmatic way for fuse servers to support them without causing crashes in
userspace applications that call them.
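The mismatch can be observed from userspace: the argument size encoded in FS_IOC_GETFLAGS comes from `long`, while callers pass an `int`. A minimal sketch, assuming a Linux system with the uapi headers installed; real_arg_size() is a hypothetical helper illustrating "treat these two commands as int-sized", not the driver's code.

```c
#include <linux/fs.h>  /* FS_IOC_{GET,SET}FLAGS; pulls in <linux/ioctl.h> */
#include <stddef.h>

/* Size the ioctl number claims for its argument. */
static size_t encoded_arg_size(unsigned long cmd)
{
	return _IOC_SIZE(cmd);
}

/* Hypothetical: size a driver should actually copy for the argument. */
static size_t real_arg_size(unsigned long cmd)
{
	if (cmd == FS_IOC_GETFLAGS || cmd == FS_IOC_SETFLAGS)
		return sizeof(int);  /* documented and used as an int */
	return _IOC_SIZE(cmd);
}
```

On LP64 targets such as x86_64 the two sizes are 8 and 4 bytes, so copying the encoded size back would write 4 bytes past the caller's int.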

[1]: https://lore.kernel.org/linux-fsdevel/20131126200559.GH20559@hall.aurel32.net/T/
[2]: https://sourceforge.net/p/fuse/mailman/message/31771759/
Signed-off-by: Chirantan Ekbote <chirantan@chromium.org>
Fixes: 59efec7b ("fuse: implement ioctl support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

31070f6c

14 Jul, 2020 · 4 commits

fuse: don't ignore errors from fuse_writepages_fill() · 7779b047

Committed by Vasily Averin on Jun 25, 2020

fuse_writepages() ignores some errors taken from fuse_writepages_fill(). I
believe it is a bug: if .writepages is called with WB_SYNC_ALL it should
either guarantee that all data was successfully saved or return an error.
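A sketch of the rule, assuming a hypothetical per-page fill callback: when any page fails, the first error is remembered and returned instead of being dropped. The two demo callbacks exist only to exercise the function.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical: add page `index` to the pending batch; <0 on failure. */
typedef int (*fill_fn)(int index, void *data);

static int toy_writepages(int npages, fill_fn fill, void *data)
{
	int err = 0;

	for (int i = 0; i < npages; i++) {
		int ret = fill(i, data);

		if (ret < 0 && err == 0)
			err = ret;  /* keep the first error for the caller */
	}
	/* ...the batch built so far would be sent here... */
	return err;
}

/* Demo callbacks. */
static int always_ok(int index, void *data)
{
	(void)index;
	(void)data;
	return 0;
}

static int fail_on_page_two(int index, void *data)
{
	(void)data;
	return index == 2 ? -ENOMEM : 0;
}
```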

Fixes: 26d614df ("fuse: Implement writepages callback")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

7779b047

fuse: clean up condition for writepage sending · 6ddf3af9

Committed by Miklos Szeredi on Jul 14, 2020

fuse_writepages_fill uses following construction:

if (wpa && ap->num_pages &&
    (A || B || C)) {
        action;
} else if (wpa && D) {
        if (E) {
                the same action;
        }
}

 - ap->num_pages check is always true and can be removed

 - "if" and "else if" calls the same action and can be merged.

Move checking A, B, C, D, E conditions to a helper, add comments.
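The merge can be checked mechanically. This sketch keeps A..E as opaque booleans, as in the text above, writes both the original and the merged form, and relies on ap->num_pages always being true when wpa is set. Both function names are hypothetical.

```c
#include <stdbool.h>

/* Original shape: two branches that end in the same action. */
static bool need_send_original(bool wpa, bool num_pages, bool a, bool b,
			       bool c, bool d, bool e)
{
	if (wpa && num_pages && (a || b || c))
		return true;
	else if (wpa && d)
		return e;
	return false;
}

/* Merged shape: num_pages is dropped because it is always true here. */
static bool need_send_merged(bool wpa, bool a, bool b, bool c, bool d,
			     bool e)
{
	if (!wpa)
		return false;
	return a || b || c || (d && e);
}
```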
Original-patch-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

6ddf3af9

fuse: fix warning in tree_insert() and clean up writepage insertion · c146024e

Committed by Miklos Szeredi on Jul 14, 2020

fuse_writepages_fill() calls tree_insert() with ap->num_pages = 0 which
triggers the following warning:

 WARNING: CPU: 1 PID: 17211 at fs/fuse/file.c:1728 tree_insert+0xab/0xc0 [fuse]
 RIP: 0010:tree_insert+0xab/0xc0 [fuse]
 Call Trace:
  fuse_writepages_fill+0x5da/0x6a0 [fuse]
  write_cache_pages+0x171/0x470
  fuse_writepages+0x8a/0x100 [fuse]
  do_writepages+0x43/0xe0

Fix up the warning and clean up the code around rb-tree insertion:

 - Rename tree_insert() to fuse_insert_writeback() and make it return the
   conflicting entry in case of failure

 - Re-add tree_insert() as a wrapper around fuse_insert_writeback()

 - Rename fuse_writepage_in_flight() to fuse_writepage_add() and reverse
   the meaning of the return value to mean

    + "true" in case the writepage entry was successfully added

    + "false" in case it was in-flight queued on an existing writepage
       entry's auxiliary list or the existing writepage entry's temporary
       page updated

   Switch from fuse_find_writeback() + tree_insert() to
   fuse_insert_writeback()

 - Move setting orig_pages to before inserting/updating the entry; this may
   result in the orig_pages value being discarded later in case of an
   in-flight request

 - In case of a new writepage entry use fuse_writepage_add()
   unconditionally, only set data->wpa if the entry was added.
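A sketch of the new insert contract, assuming entries that cover page ranges [first, last]: insertion either succeeds and returns NULL, or returns the entry it conflicts with so the caller can act on it. The kernel keeps these entries in an rb-tree; a sorted singly linked list is used here only to keep the example short, and all names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical writeback entry covering pages [first, last]. */
struct toy_wb {
	unsigned long first, last;
	struct toy_wb *next;
};

/* Insert wb into the sorted, non-overlapping list; on overlap leave the
 * list unchanged and return the conflicting entry, else return NULL. */
static struct toy_wb *toy_insert_writeback(struct toy_wb **head,
					    struct toy_wb *wb)
{
	struct toy_wb **p = head;

	while (*p) {
		struct toy_wb *cur = *p;

		if (wb->last < cur->first)
			break;       /* wb lies entirely before cur */
		if (wb->first <= cur->last)
			return cur;  /* ranges overlap */
		p = &cur->next;      /* wb lies entirely after cur */
	}
	wb->next = *p;
	*p = wb;
	return NULL;
}

/* Wrapper keeping the old contract: a conflict is reported as an error. */
static int toy_tree_insert(struct toy_wb **head, struct toy_wb *wb)
{
	return toy_insert_writeback(head, wb) ? -1 : 0;
}
```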

Fixes: 6b2fb799 ("fuse: optimize writepages search")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Original-patch-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

c146024e

fuse: move rb_erase() before tree_insert() · 69a6487a

Committed by Miklos Szeredi on Jul 14, 2020

In fuse_writepage_end() the old writepages entry needs to be removed from
the rbtree before inserting the new one, otherwise tree_insert() would
fail. This is a very rare codepath and no reproducer exists.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

69a6487a

03 Jun, 2020 · 1 commit

fuse: convert from readpages to readahead · 76a0294e

Committed by Matthew Wilcox (Oracle) on Jun 01, 2020

Implement the new readahead operation in fuse by using __readahead_batch()
to fill the array of pages in fuse_args_pages directly.  This lets us
inline fuse_readpages_fill() into fuse_readahead().

[willy@infradead.org: build fix]
  Link: http://lkml.kernel.org/r/20200415025938.GB5820@bombadil.infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-25-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

76a0294e

20 May, 2020 · 2 commits

fuse: copy_file_range should truncate cache · 9b46418c

Committed by Miklos Szeredi on May 20, 2020

After the copy operation completes the cache is not up-to-date.  Truncate
all pages in the interval that has successfully been copied.

Truncating completely copied dirty pages is okay, since the data has been
overwritten anyway.  Truncating partially copied dirty pages is not okay;
add a comment for now.
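A sketch of the range arithmetic, assuming 4 KiB pages and hypothetical helper names: the copied byte interval is widened to page boundaries before the cache is dropped, which is why pages at either edge may be only partially overwritten, the case the commit leaves a comment about.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL  /* assumption: 4 KiB pages */

/* Inclusive byte range of page cache to drop after `copied` bytes were
 * written at pos_out, widened to whole pages. */
static void copy_truncate_range(uint64_t pos_out, uint64_t copied,
				uint64_t *start, uint64_t *end)
{
	uint64_t mask = ~(TOY_PAGE_SIZE - 1);

	*start = pos_out & mask;                                    /* round down */
	*end = ((pos_out + copied + TOY_PAGE_SIZE - 1) & mask) - 1; /* round up */
}

/* True when the first or last page is only partially overwritten, so
 * dropping it could discard dirty data outside the copied interval. */
static int has_partial_pages(uint64_t pos_out, uint64_t copied)
{
	return (pos_out % TOY_PAGE_SIZE) != 0 ||
	       ((pos_out + copied) % TOY_PAGE_SIZE) != 0;
}
```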

Fixes: 88bc7d50 ("fuse: add support for copy_file_range()")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

9b46418c

fuse: fix copy_file_range cache issues · 2c4656df

Committed by Miklos Szeredi on May 20, 2020

a) Dirty cache needs to be written back not just in the writeback_cache
case, since the dirty pages may come from memory maps.

b) The fuse_writeback_range() helper takes an inclusive interval, so the
end position needs to be pos+len-1 instead of pos+len.
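The off-by-one in (b) can be shown in a few lines. inclusive_end() and bytes_in() are hypothetical helpers, assuming a callee that, like fuse_writeback_range(), treats its end argument as the last byte to include.

```c
#include <stdint.h>

/* Last byte of the half-open interval [pos, pos + len). */
static int64_t inclusive_end(int64_t pos, int64_t len)
{
	return pos + len - 1;
}

/* Number of bytes an inclusive [start, end] interval covers. */
static int64_t bytes_in(int64_t start, int64_t end)
{
	return end - start + 1;
}
```

Passing pos + len as the inclusive end covers one byte too many; for a 4096-byte request at offset 0 that extra byte falls on the next page.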

Fixes: 88bc7d50 ("fuse: add support for copy_file_range()")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

2c4656df

19 May, 2020 · 3 commits

fuse: optimize writepages search · 6b2fb799

Committed by Maxim Patlasov on Sep 19, 2019

Re-work fi->writepages, replacing list with rb-tree.  This improves
performance because kernel fuse iterates through fi->writepages for each
writeback page and typical number of entries is about 800 (for 100MB of
fuse writeback).

Before patch:

10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 41.3473 s, 260 MB/s

 2  1      0 57445400  40416 6323676    0    0    33 374743 8633 19210  1  8 88  3  0

  29.86%  [kernel]               [k] _raw_spin_lock
  26.62%  [fuse]                 [k] fuse_page_is_writeback

After patch:

10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 21.4954 s, 500 MB/s

 2  9      0 53676040  31744 10265984    0    0    64 854790 10956 48387  1  6 88  6  0

  23.55%  [kernel]             [k] copy_user_enhanced_fast_string
   9.87%  [kernel]             [k] __memcpy
   3.10%  [kernel]             [k] _raw_spin_lock
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

6b2fb799

fuse: always flush dirty data on close(2) · 614c026e

Committed by Miklos Szeredi on May 19, 2020

We want cached data to be synced with the userspace filesystem on close(), for
example to allow getting correct st_blocks value. Do this regardless of
whether the userspace filesystem implements a FLUSH method or not.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

614c026e

fuse: invalidate inode attr in writeback cache mode · cf576c58

Committed by Eryu Guan on May 12, 2020

Under writeback mode, inode->i_blocks is not updated, making utilities such as
du read st_blocks as 0.

For example, when using virtiofs (cache=always & nondax mode) with
writeback_cache enabled, if you write a new file and check its disk usage
with du, du reports 0 usage.

  # uname -r
  5.6.0-rc6+
  # mount -t virtiofs virtiofs /mnt/virtiofs
  # rm -f /mnt/virtiofs/testfile

  # create new file and do extend write
  # xfs_io -fc "pwrite 0 4k" /mnt/virtiofs/testfile
  wrote 4096/4096 bytes at offset 0
  4 KiB, 1 ops; 0.0001 sec (28.103 MiB/sec and 7194.2446 ops/sec)
  # du -k /mnt/virtiofs/testfile
  0               <==== disk usage is 0
  # stat -c %s,%b /mnt/virtiofs/testfile
  4096,0          <==== i_size is correct, but st_blocks is 0

Fix it by invalidating attr in fuse_flush(), so we get up-to-date attr
from server on next getattr.
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

cf576c58

20 Apr, 2020 · 1 commit

virtiofs: schedule blocking async replies in separate worker · bb737bbe

Committed by Vivek Goyal on Apr 20, 2020

In virtiofs (unlike in regular fuse) processing of async replies is
serialized.  This can result in a deadlock in rare corner cases when
there's a circular dependency between the completion of two or more async
replies.

Such a deadlock can be reproduced with xfstests:generic/503 if TEST_DIR ==
SCRATCH_MNT (which is a misconfiguration):

 - Process A is waiting for page lock in worker thread context and blocked
   (virtio_fs_requests_done_work()).
 - Process B is holding page lock and waiting for pending writes to
   finish (fuse_wait_on_page_writeback()).
 - Write requests are waiting in virtqueue and can't complete because
   worker thread is blocked on page lock (process A).

Fix this by creating a unique work_struct for each async reply that can
block (O_DIRECT read).

Fixes: a62a8ef9 ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

bb737bbe

06 Feb, 2020 · 3 commits

fuse: use true,false for bool variable · cabdb4fa

Committed by zhengbin on Jan 14, 2020

Fixes coccicheck warning:

fs/fuse/readdir.c:335:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1398:2-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1400:2-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:454:1-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:455:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:497:2-17: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:504:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:511:2-22: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:518:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:522:2-26: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:526:2-18: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:1000:1-20: WARNING: Assignment of 0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

cabdb4fa

fuse: don't overflow LLONG_MAX with end offset · 2f139829

Committed by Miklos Szeredi on Feb 06, 2020

Handle the special case of fuse_readpages() wanting to read the last page
of the largest possible file and overflowing the end offset in the process.

This is basically to unbreak xfstests:generic/525 and prevent filesystems
from doing bad things with an overflowing offset.
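A sketch of one way to avoid the overflow, using a hypothetical helper that trims the request so the exclusive end pos + count stays representable as a long long. The single byte at offset LLONG_MAX is given up in exchange.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical: shrink `count` so that pos + count <= LLONG_MAX. */
static size_t clamp_read_count(long long pos, size_t count)
{
	/* Compute the headroom in unsigned arithmetic: no signed overflow. */
	unsigned long long room =
		(unsigned long long)LLONG_MAX - (unsigned long long)pos;

	if (count > room)
		count = (size_t)room;
	return count;
}
```

For the last page of the largest possible file (pos = LLONG_MAX - 4095), a 4096-byte read is trimmed to 4095 bytes.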
Reported-by: Xiao Yang <ice_yangxiao@163.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

2f139829

fix up iter on short count in fuse_direct_io() · f658adee

Committed by Miklos Szeredi on Feb 06, 2020

fuse_direct_io() can end up advancing the iterator by more than the amount
of data read or written.  This case is handled by the generic code if going
through ->direct_IO(), but not in the FOPEN_DIRECT_IO case.

Fix by reverting the extra bytes from the iterator in case of error or a
short count.
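A toy model of the fix, assuming a simplified iterator that only tracks consumed and remaining bytes: when the transfer is shorter than what the iterator was advanced by, the excess is handed back. In the kernel the corresponding helper is iov_iter_revert(); the struct and functions below are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for an iov_iter. */
struct toy_iter {
	size_t consumed;  /* bytes already advanced over */
	size_t count;     /* bytes remaining */
};

static void toy_advance(struct toy_iter *it, size_t n)
{
	it->consumed += n;
	it->count -= n;
}

static void toy_revert(struct toy_iter *it, size_t n)
{
	it->consumed -= n;
	it->count += n;
}

/* The iterator was advanced by `submitted` bytes, but only `done` bytes
 * were transferred (short count or error): give the excess back. */
static void fixup_short_count(struct toy_iter *it, size_t submitted,
			      size_t done)
{
	if (done < submitted)
		toy_revert(it, submitted - done);
}
```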

To test: install lxcfs, then the following testcase
  int fd = open("/var/lib/lxcfs/proc/uptime", O_RDONLY);
  sendfile(1, fd, NULL, 16777216);
  sendfile(1, fd, NULL, 16777216);
will spew WARN_ON() in iov_iter_pipe().
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 3c3db095 ("fuse: use iov_iter based generic splice helpers")
Cc: <stable@vger.kernel.org> # v5.1
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

f658adee

16 Jan, 2020 · 1 commit

fuse: fix fuse_send_readpages() in the syncronous read case · 7df1e988

Committed by Miklos Szeredi on Jan 16, 2020

Buffered read in fuse normally goes via:

 -> generic_file_buffered_read()
   -> fuse_readpages()
     -> fuse_send_readpages()
       ->fuse_simple_request() [called since v5.4]

In the case of a read request, fuse_simple_request() will return a
non-negative bytecount on success or a negative error value.  A positive
bytecount was taken to be an error and the PG_error flag set on the page.
This resulted in generic_file_buffered_read() falling back to ->readpage(),
which would repeat the read request and succeed.  Because of the repeated
read succeeding the bug was not detected with regression tests or other use
cases.

The FTP module in GVFS however fails the second read due to the
non-seekable nature of FTP downloads.

Fix by checking and ignoring positive return value from
fuse_simple_request().
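The corrected interpretation of the return value, as a hypothetical helper: a read reply yields a non-negative byte count or a negative errno, so only negative values are errors.

```c
#include <errno.h>      /* negative errno values, e.g. -EIO */
#include <sys/types.h>  /* ssize_t */

/* Hypothetical: map a read result (bytes >= 0, or -errno) to an error
 * code where 0 means success. A positive byte count is not an error. */
static int read_result_to_err(ssize_t res)
{
	return res < 0 ? (int)res : 0;
}
```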
Reported-by: Ondrej Holy <oholy@redhat.com>
Link: https://gitlab.gnome.org/GNOME/gvfs/issues/441
Fixes: 134831e3 ("fuse: convert readpages to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

7df1e988

27 Nov, 2019 · 1 commit

fuse: fix leak of fuse_io_priv · f1ebdeff

Committed by Miklos Szeredi on Nov 25, 2019

exit_aio() is sometimes stuck in wait_for_completion() after aio is issued
with direct IO and the task receives a signal.

The reason is failure to call ->ki_complete() due to a leaked reference to
fuse_io_priv.  This happens in fuse_async_req_send() if
fuse_simple_background() returns an error (e.g. -EINTR).

In this case the error value is propagated via io->err, so return success
to not confuse callers.

This issue is tracked as a virtio-fs issue:
https://gitlab.com/virtio-fs/qemu/issues/14
Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Fixes: 45ac96ed ("fuse: convert direct_io to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

f1ebdeff

12 Nov, 2019 · 1 commit

fuse: verify write return · 8aab336b

Committed by Miklos Szeredi on Nov 12, 2019

Make sure filesystem is not returning a bogus number of bytes written.
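A sketch of the sanity check, with a hypothetical helper name: a short write is legitimate, but a server claiming more bytes than were sent is treated as an I/O error. Using -EIO here is an assumption about the error code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical: validate the byte count reported in a WRITE reply. */
static long check_written(size_t requested, size_t reported)
{
	if (reported > requested)
		return -EIO;  /* bogus reply: more than was sent */
	return (long)reported;
}
```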

Fixes: ea9b9907 ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org> # v2.6.26
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

8aab336b

23 Oct, 2019 · 2 commits

fuse: redundant get_fuse_inode() calls in fuse_writepages_fill() · 091d1a72

Committed by Vasily Averin on Aug 19, 2019

Currently fuse_writepages_fill() calls get_fuse_inode() a few times with
the same argument.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

091d1a72

fuse: truncate pending writes on O_TRUNC · e4648309

Committed by Miklos Szeredi on Oct 23, 2019

Make sure cached writes are not reordered around open(..., O_TRUNC), with
the obvious wrong results.

Fixes: 4d99ff8f ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

e4648309

24 Sep, 2019 · 2 commits

fuse: kmemcg account fs data · dc69e98c

Committed by Khazhismel Kumykov on Sep 17, 2019

Account per-file, dentry, and inode data.

Blockdev/superblock and temporary per-request data were left alone, as
these usually aren't accounted.
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

dc69e98c

fuse: fix missing unlock_page in fuse_writepage() · d5880c7a

Committed by Vasily Averin on Sep 13, 2019

unlock_page() was missing in case of an already in-flight write against the
same page.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Fixes: ff17be08 ("fuse: writepage: skip already in flight")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

d5880c7a

10 Sep, 2019 · 6 commits

fuse: simplify request allocation · 7213394c

Committed by Miklos Szeredi on Sep 10, 2019

Page arrays are not allocated together with the request anymore.  Get rid
of the dead code.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

7213394c

fuse: convert release to simple api · 4cb54866

Committed by Miklos Szeredi on Sep 10, 2019

Since we cannot reserve the request structure up-front, make sure that the
request allocation doesn't fail using __GFP_NOFAIL.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

4cb54866

fuse: convert writepages to simple api · 33826ebb

Committed by Miklos Szeredi on Sep 10, 2019

Derive fuse_writepage_args from fuse_io_args.

Sending the request is tricky since it was done with fi->lock held, hence
we must either use atomic allocation or release the lock.  Both are
possible so try atomic first and if it fails, release the lock and do the
regular allocation with GFP_NOFS and __GFP_NOFAIL.  Both flags are
necessary for correct operation.
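The allocation strategy described above can be sketched in userspace, assuming two hypothetical allocator callbacks in place of the kernel's atomic and GFP_NOFS | __GFP_NOFAIL allocations, and a pthread mutex standing in for fi->lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical allocator type; only the fallback one may block. */
typedef void *(*alloc_fn)(size_t size);

/* Called with *lock held; returns with *lock held. *dropped_lock tells
 * the caller whether state protected by the lock may have changed. */
static void *alloc_under_lock(pthread_mutex_t *lock, size_t size,
			      alloc_fn try_atomic, alloc_fn may_sleep,
			      bool *dropped_lock)
{
	void *p = try_atomic(size);  /* must not sleep, may fail */

	*dropped_lock = false;
	if (p)
		return p;
	/* Fallback: release the lock so the allocation is allowed to block. */
	pthread_mutex_unlock(lock);
	p = may_sleep(size);
	pthread_mutex_lock(lock);
	*dropped_lock = true;
	return p;
}

/* Demo allocators. */
static void *always_fail(size_t size)
{
	(void)size;
	return NULL;
}

static void *plain_malloc(size_t size)
{
	return malloc(size);
}
```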

Move the page realloc function from dev.c to file.c and convert to using
fuse_writepage_args.

The last caller of fuse_write_fill() is gone, so get rid of it.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

33826ebb

fuse: convert readdir to simple api · 43f5098e

Committed by Miklos Szeredi on Sep 10, 2019

The old fuse_read_fill() helper can be deleted, now that the last user is
gone.

The fuse_io_args struct is moved to fuse_i.h so it can be shared between
readdir/read code.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

43f5098e

fuse: convert readpages to simple api · 134831e3

Committed by Miklos Szeredi on Sep 10, 2019

Need to extend fuse_io_args with 'attr_ver' and 'ff' members, that take the
functionality of the same named members in fuse_req.

fuse_short_read() can now take struct fuse_args_pages.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

134831e3

fuse: convert direct_io to simple api · 45ac96ed

Committed by Miklos Szeredi on Sep 10, 2019

Change of semantics in fuse_async_req_send/fuse_send_(read|write): these
can now return error, in which case the 'end' callback isn't called, so the
fuse_io_args object needs to be freed.

Added verification that the return value is sane (less than or equal to the
requested read/write size).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

45ac96ed

openeuler / Kernel: synced successfully nearly 2 years ago