1. 10 9月, 2019 12 次提交
    • M
      fuse: add simple background helper · 12597287
      Miklos Szeredi 提交于
      Create a helper named fuse_simple_background() that is similar to
      fuse_simple_request().  Unlike the latter, it returns immediately and calls
      the supplied 'end' callback when the reply is received.
      
      The supplied 'args' pointer is stored in 'fuse_req' which allows the
      callback to interpret the output arguments decoded from the reply.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      12597287
    • M
      fuse: move page alloc · 4c4f03f7
      Miklos Szeredi 提交于
      fuse_req_pages_alloc() is moved to file.c, since its internal use by the
      device code will eventually be removed.
      
      Rename to fuse_pages_alloc() to signify that it's not only usable for
      fuse_req page array.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      4c4f03f7
    • M
      fuse: add pages to fuse_args · 68583165
      Miklos Szeredi 提交于
      Derive fuse_args_pages from fuse_args. This is used to handle requests
      which use pages for input or output.  The related flags are added to
      fuse_args.
      
      New FR_ALLOC_PAGES flags is added to indicate whether the page arrays in
      fuse_req need to be freed by fuse_put_request() or not.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      68583165
    • M
      fuse: convert destroy to simple api · 1ccd1ea2
      Miklos Szeredi 提交于
      We can use the "force" flag to make sure the DESTROY request is always sent
      to userspace.  So no need to keep it allocated during the lifetime of the
      filesystem.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      1ccd1ea2
    • M
      fuse: add nocreds to fuse_args · e413754b
      Miklos Szeredi 提交于
      In some cases it makes no sense to set pid/uid/gid fields in the request
      header.  Allow fuse_simple_background() to omit these.  This is only
      required in the "force" case, so for now just WARN if set otherwise.
      
      Fold fuse_get_req_nofail_nopages() into its only caller.  Comment is
      obsolete anyway.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      e413754b
    • M
      fuse: convert fuse_force_forget() to simple api · 3545fe21
      Miklos Szeredi 提交于
      Move this function to the readdir.c where its only caller resides.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      3545fe21
    • M
      fuse: add noreply to fuse_args · 454a7613
      Miklos Szeredi 提交于
      This will be used by fuse_force_forget().
      
      We can expand fuse_request_send() into fuse_simple_request().  The
      FR_WAITING bit has already been set, no need to check.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      454a7613
    • M
      fuse: convert flush to simple api · c500ebaa
      Miklos Szeredi 提交于
      Add 'force' to fuse_args and use fuse_get_req_nofail_nopages() to allocate
      the request in that case.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      c500ebaa
    • M
      fuse: simplify 'nofail' request · 40ac7ab2
      Miklos Szeredi 提交于
      Instead of complex games with a reserved request, just use __GFP_NOFAIL.
      
      Both calers (flush, readdir) guarantee that connection was already
      initialized, so no need to wait for fc->initialized.
      
      Also remove unneeded clearing of FR_BACKGROUND flag.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      40ac7ab2
    • M
      fuse: rearrange and resize fuse_args fields · 1f4e9d03
      Miklos Szeredi 提交于
      This makes the structure better packed.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      1f4e9d03
    • M
      fuse: flatten 'struct fuse_args' · d5b48543
      Miklos Szeredi 提交于
      ...to make future expansion simpler.  The hiearachical structure is a
      historical thing that does not serve any practical purpose.
      
      The generated code is excatly the same before and after the patch.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      d5b48543
    • E
      fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock · 76e43c8c
      Eric Biggers 提交于
      When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
      and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.
      
      This may have to wait for fuse_iqueue::waitq.lock to be released by one
      of many places that take it with IRQs enabled.  Since the IRQ handler
      may take kioctx::ctx_lock, lockdep reports that a deadlock is possible.
      
      Fix it by protecting the state of struct fuse_iqueue with a separate
      spinlock, and only accessing fuse_iqueue::waitq using the versions of
      the waitqueue functions which do IRQ-safe locking internally.
      
      Reproducer:
      
      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <sys/mount.h>
      	#include <sys/stat.h>
      	#include <sys/syscall.h>
      	#include <unistd.h>
      	#include <linux/aio_abi.h>
      
      	int main()
      	{
      		char opts[128];
      		int fd = open("/dev/fuse", O_RDWR);
      		aio_context_t ctx = 0;
      		struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd };
      		struct iocb *cbp = &cb;
      
      		sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd);
      		mkdir("mnt", 0700);
      		mount("foo",  "mnt", "fuse", 0, opts);
      		syscall(__NR_io_setup, 1, &ctx);
      		syscall(__NR_io_submit, ctx, 1, &cbp);
      	}
      
      Beginning of lockdep output:
      
      	=====================================================
      	WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      	5.3.0-rc5 #9 Not tainted
      	-----------------------------------------------------
      	syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      	000000003590ceda (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
      	000000003590ceda (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline]
      	000000003590ceda (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825
      
      	and this task is already holding:
      	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
      	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
      	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825
      	which would create a new lock dependency:
      	 (&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}
      
      	but this new dependency connects a SOFTIRQ-irq-safe lock:
      	 (&(&ctx->ctx_lock)->rlock){..-.}
      
      	[...]
      
      Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com
      Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com
      Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
      Cc: <stable@vger.kernel.org> # v4.19+
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      76e43c8c
  2. 24 4月, 2019 2 次提交
    • K
      fuse: allow filesystems to have precise control over data cache · ad2ba64d
      Kirill Smelkov 提交于
      On networked filesystems file data can be changed externally.  FUSE
      provides notification messages for filesystem to inform kernel that
      metadata or data region of a file needs to be invalidated in local page
      cache. That provides the basis for filesystem implementations to invalidate
      kernel cache explicitly based on observed filesystem-specific events.
      
      FUSE has also "automatic" invalidation mode(*) when the kernel
      automatically invalidates data cache of a file if it sees mtime change.  It
      also automatically invalidates whole data cache of a file if it sees file
      size being changed.
      
      The automatic mode has corresponding capability - FUSE_AUTO_INVAL_DATA.
      However, due to probably historical reason, that capability controls only
      whether mtime change should be resulting in automatic invalidation or
      not. A change in file size always results in invalidating whole data cache
      of a file irregardless of whether FUSE_AUTO_INVAL_DATA was negotiated(+).
      
      The filesystem I write[1] represents data arrays stored in networked
      database as local files suitable for mmap. It is read-only filesystem -
      changes to data are committed externally via database interfaces and the
      filesystem only glues data into contiguous file streams suitable for mmap
      and traditional array processing. The files are big - starting from
      hundreds gigabytes and more. The files change regularly, and frequently by
      data being appended to their end. The size of files thus changes
      frequently.
      
      If a file was accessed locally and some part of its data got into page
      cache, we want that data to stay cached unless there is memory pressure, or
      unless corresponding part of the file was actually changed. However current
      FUSE behaviour - when it sees file size change - is to invalidate the whole
      file. The data cache of the file is thus completely lost even on small size
      change, and despite that the filesystem server is careful to accurately
      translate database changes into FUSE invalidation messages to kernel.
      
      Let's fix it: if a filesystem, through new FUSE_EXPLICIT_INVAL_DATA
      capability, indicates to kernel that it is fully responsible for data cache
      invalidation, then the kernel won't invalidate files data cache on size
      change and only truncate that cache to new size in case the size decreased.
      
      (*) see 72d0d248 "fuse: add FUSE_AUTO_INVAL_DATA init flag",
      eed2179e "fuse: invalidate inode mapping if mtime changes"
      
      (+) in writeback mode the kernel does not invalidate data cache on file
      size change, but neither it allows the filesystem to set the size due to
      external event (see 8373200b "fuse: Trust kernel i_size only")
      
      [1] https://lab.nexedi.com/kirr/wendelin.core/blob/a50f1d9f/wcfs/wcfs.go#L20Signed-off-by: NKirill Smelkov <kirr@nexedi.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      ad2ba64d
    • K
      fuse: convert printk -> pr_* · f2294482
      Kirill Smelkov 提交于
      Functions, like pr_err, are a more modern variant of printing compared to
      printk. They could be used to denoise sources by using needed level in
      the print function name, and by automatically inserting per-driver /
      function / ... print prefix as defined by pr_fmt macro. pr_* are also
      said to be used in Documentation/process/coding-style.rst and more
      recent code - for example overlayfs - uses them instead of printk.
      
      Convert CUSE and FUSE to use the new pr_* functions.
      
      CUSE output stays completely unchanged, while FUSE output is amended a
      bit for "trying to steal weird page" warning - the second line now comes
      also with "fuse:" prefix. I hope it is ok.
      Suggested-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NKirill Smelkov <kirr@nexedi.com>
      Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      f2294482
  3. 13 2月, 2019 7 次提交
  4. 12 12月, 2018 1 次提交
  5. 03 12月, 2018 1 次提交
  6. 15 10月, 2018 2 次提交
    • D
      fuse: enable caching of symlinks · 5571f1e6
      Dan Schatzberg 提交于
      FUSE file reads are cached in the page cache, but symlink reads are
      not. This patch enables FUSE READLINK operations to be cached which
      can improve performance of some FUSE workloads.
      
      In particular, I'm working on a FUSE filesystem for access to source
      code and discovered that about a 10% improvement to build times is
      achieved with this patch (there are a lot of symlinks in the source
      tree).
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      5571f1e6
    • M
      fuse: allow fine grained attr cache invaldation · 2f1e8196
      Miklos Szeredi 提交于
      This patch adds the infrastructure for more fine grained attribute
      invalidation.  Currently only 'atime' is invalidated separately.
      
      The use of this infrastructure is extended to the statx(2) interface, which
      for now means that if only 'atime' is invalid and STATX_ATIME is not
      specified in the mask argument, then no GETATTR request will be generated.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      2f1e8196
  7. 01 10月, 2018 8 次提交
    • M
      fuse: realloc page array · e52a8250
      Miklos Szeredi 提交于
      Writeback caching currently allocates requests with the maximum number of
      possible pages, while the actual number of pages per request depends on a
      couple of factors that cannot be determined when the request is allocated
      (whether page is already under writeback, whether page is contiguous with
      previous pages already added to a request).
      
      This patch allows such requests to start with no page allocation (all pages
      inline) and grow the page array on demand.
      
      If the max_pages tunable remains the default value, then this will mean
      just one allocation that is the same size as before.  If the tunable is
      larger, then this adds at most 3 additional memory allocations (which is
      generously compensated by the improved performance from the larger
      request).
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      e52a8250
    • C
      fuse: add max_pages to init_out · 5da784cc
      Constantine Shulyupin 提交于
      Replace FUSE_MAX_PAGES_PER_REQ with the configurable parameter max_pages to
      improve performance.
      
      Old RFC with detailed description of the problem and many fixes by Mitsuo
      Hayasaka (mitsuo.hayasaka.hu@hitachi.com):
       - https://lkml.org/lkml/2012/7/5/136
      
      We've encountered performance degradation and fixed it on a big and complex
      virtual environment.
      
      Environment to reproduce degradation and improvement:
      
      1. Add lag to user mode FUSE
      Add nanosleep(&(struct timespec){ 0, 1000 }, NULL); to xmp_write_buf in
      passthrough_fh.c
      
      2. patch UM fuse with configurable max_pages parameter. The patch will be
      provided latter.
      
      3. run test script and perform test on tmpfs
      fuse_test()
      {
      
             cd /tmp
             mkdir -p fusemnt
             passthrough_fh -o max_pages=$1 /tmp/fusemnt
             grep fuse /proc/self/mounts
             dd conv=fdatasync oflag=dsync if=/dev/zero of=fusemnt/tmp/tmp \
      		count=1K bs=1M 2>&1 | grep -v records
             rm fusemnt/tmp/tmp
             killall passthrough_fh
      }
      
      Test results:
      
      passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
      	rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0
      1073741824 bytes (1.1 GB) copied, 1.73867 s, 618 MB/s
      
      passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
      	rw,nosuid,nodev,relatime,user_id=0,group_id=0,max_pages=256 0 0
      1073741824 bytes (1.1 GB) copied, 1.15643 s, 928 MB/s
      
      Obviously with bigger lag the difference between 'before' and 'after'
      will be more significant.
      
      Mitsuo Hayasaka, in 2012 (https://lkml.org/lkml/2012/7/5/136),
      observed improvement from 400-550 to 520-740.
      Signed-off-by: NConstantine Shulyupin <const@MakeLinux.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      5da784cc
    • M
      fuse: reduce size of struct fuse_inode · ab2257e9
      Miklos Szeredi 提交于
      Do this by grouping fields used for cached writes and putting them into a
      union with fileds used for cached readdir (with obviously no overlap, since
      we don't have hybrid objects).
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      ab2257e9
    • M
      fuse: use iversion for readdir cache verification · 261aaba7
      Miklos Szeredi 提交于
      Use the internal iversion counter to make sure modifications of the
      directory through this filesystem are not missed by the mtime check (due to
      mtime granularity).
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      261aaba7
    • M
      fuse: use mtime for readdir cache verification · 7118883b
      Miklos Szeredi 提交于
      Store the modification time of the directory in the cache, obtained before
      starting to fill the cache.
      
      When reading the cache, verify that the directory hasn't changed, by
      checking if current modification time is the same as the one stored in the
      cache.
      
      This only needs to be done when the current file position is at the
      beginning of the directory, as mandated by POSIX.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      7118883b
    • M
      fuse: add readdir cache version · 3494927e
      Miklos Szeredi 提交于
      Allow the cache to be invalidated when page(s) have gone missing.  In this
      case increment the version of the cache and reset to an empty state.
      
      Add a version number to the directory stream in struct fuse_file as well,
      indicating the version of the cache it's supposed to be reading.  If the
      cache version doesn't match the stream's version, then reset the stream to
      the beginning of the cache.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      3494927e
    • M
      fuse: allow using readdir cache · 5d7bc7e8
      Miklos Szeredi 提交于
      The cache is only used if it's completed, not while it's still being
      filled; this constraint could be lifted later, if it turns out to be
      useful.
      
      Introduce state in struct fuse_file that indicates the position within the
      cache.  After a seek, reset the position to the beginning of the cache and
      search the cache for the current position.  If the current position is not
      found in the cache, then fall back to uncached readdir.
      
      It can also happen that page(s) disappear from the cache, in which case we
      must also fall back to uncached readdir.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      5d7bc7e8
    • M
      fuse: allow caching readdir · 69e34551
      Miklos Szeredi 提交于
      This patch just adds the cache filling functions, which are invoked if
      FOPEN_CACHE_DIR flag is set in the OPENDIR reply.
      
      Cache reading and cache invalidation are added by subsequent patches.
      
      The directory cache uses the page cache.  Directory entries are packed into
      a page in the same format as in the READDIR reply.  A page only contains
      whole entries, the space at the end of the page is cleared.  The page is
      locked while being modified.
      
      Multiple parallel readdirs on the same directory can fill the cache; the
      only constraint is that continuity must be maintained (d_off of last entry
      points to position of current entry).
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      69e34551
  8. 28 9月, 2018 6 次提交
  9. 26 7月, 2018 1 次提交