1. 07 5月, 2014 1 次提交
  2. 28 4月, 2014 7 次提交
  3. 08 4月, 2014 1 次提交
  4. 02 4月, 2014 13 次提交
  5. 26 1月, 2014 1 次提交
    • S
      Fix race when checking i_size on direct i/o read · 9fe55eea
      Steven Whitehouse 提交于
      So far I've had one ACK for this, and no other comments. So I think it
      is probably time to send this via some suitable tree. I'm guessing that
      the vfs tree would be the most appropriate route, but not sure that
      there is one at the moment (don't see anything recent at kernel.org)
      so in that case I think -mm is the "back up plan". Al, please let me
      know if you will take this?
      
      Steve.
      
      ---------------------
      
      Following on from the "Re: [PATCH v3] vfs: fix a bug when we do some dio
      reads with append dio writes" thread on linux-fsdevel, this patch is my
      current version of the fix proposed as option (b) in that thread.
      
      Removing the i_size test from the direct i/o read path at vfs level
      means that filesystems now have to deal with requests which are beyond
      i_size themselves. These I've divided into three sets:
      
       a) Those with "no op" ->direct_IO (9p, cifs, ceph)
      These are obviously not going to be an issue
      
       b) Those with "home brew" ->direct_IO (nfs, fuse)
      I've been told that NFS should not have any problem with the larger
      i_size, however I've added an extra test to FUSE to duplicate the
      original behaviour just to be on the safe side.
      
       c) Those using __blockdev_direct_IO()
      These call through to ->get_block() which should deal with the EOF
      condition correctly. I've verified that with GFS2 and I believe that
      Zheng has verified it for ext4. I've also run the test on XFS and it
      passes both before and after this change.
      
      The part of the patch in filemap.c looks a lot larger than it really is
      - there are only two lines of real change. The rest is just indentation
      of the contained code.
      
      There remains a test of i_size though, which was added for btrfs. It
      doesn't cause the other filesystems a problem as the test is performed
      after ->direct_IO has been called. It is possible that there is a race
      that does matter to btrfs, however this patch doesn't change that, so
      its still an overall improvement.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Reported-by: NZheng Liu <gnehzuil.liu@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Acked-by: NMiklos Szeredi <miklos@szeredi.hu>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9fe55eea
  6. 23 1月, 2014 2 次提交
    • A
      fuse: support clients that don't implement 'open' · 7678ac50
      Andrew Gallagher 提交于
      open/release operations require userspace transitions to keep track
      of the open count and to perform any FS-specific setup.  However,
      for some purely read-only FSs which don't need to perform any setup
      at open/release time, we can avoid the performance overhead of
      calling into userspace for open/release calls.
      
      This patch adds the necessary support to the fuse kernel modules to prevent
      open/release operations from hitting in userspace. When the client returns
      ENOSYS, we avoid sending the subsequent release to userspace, and also
      remember this so that future opens also don't trigger a userspace
      operation.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      7678ac50
    • A
      fuse: don't invalidate attrs when not using atime · 451418fc
      Andrew Gallagher 提交于
      Various read operations (e.g. readlink, readdir) invalidate the cached
      attrs for atime changes.  This patch adds a new function
      'fuse_invalidate_atime', which checks for a read-only super block and
      avoids the attr invalidation in that case.
      Signed-off-by: NAndrew Gallagher <andrewjcg@fb.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      451418fc
  7. 05 11月, 2013 4 次提交
  8. 01 10月, 2013 9 次提交
    • M
      fuse: writepage: skip already in flight · ff17be08
      Miklos Szeredi 提交于
      If ->writepage() tries to write back a page whose copy is still in flight,
      then just skip by calling redirty_page_for_writepage().
      
      This is OK, since now ->writepage() should never be called for data
      integrity sync.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      ff17be08
    • M
      fuse: writepages: handle same page rewrites · 8b284dc4
      Miklos Szeredi 提交于
      As Maxim Patlasov pointed out, it's possible to get a dirty page while it's
      copy is still under writeback, despite fuse_page_mkwrite() doing its thing
      (direct IO).
      
      This could result in two concurrent write request for the same offset, with
      data corruption if they get mixed up.
      
      To prevent this, fuse needs to check and delay such writes.  This
      implementation does this by:
      
       1. check if page is still under writeout, if so create a new, single page
          secondary request for it
      
       2. chain this secondary request onto the in-flight request
      
       2/a. if a seconday request for the same offset was already chained to the
          in-flight request, then just copy the contents of the page and discard
          the new secondary request.  This makes sure that for each page will
          have at most two requests associated with it
      
       3. when the in-flight request finished, send off all secondary requests
          chained onto it
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      8b284dc4
    • M
      fuse: writepages: fix aggregation · 1e112a48
      Miklos Szeredi 提交于
      Checking against tmp-page indexes is not very useful, and results in one
      (or rarely two) page requests.  Which is not much of an improvement...
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      1e112a48
    • M
      fuse: fix race in fuse_writepages() · 2d033eaa
      Maxim Patlasov 提交于
      The patch fixes a race between ftruncate(2), mmap-ed write and write(2):
      
      1) An user makes a page dirty via mmap-ed write.
      2) The user performs shrinking truncate(2) intended to purge the page.
      3) Before fuse_do_setattr calls truncate_pagecache, the page goes to
         writeback. fuse_writepages_fill attaches a new page to FUSE_WRITE request,
         then releases the original page by end_page_writeback and unlock it.
      4) fuse_do_setattr completes and successfully returns. Since now, i_mutex
         is free.
      5) Ordinary write(2) extends i_size back to cover the page. Note that
         fuse_send_write_pages do wait for fuse writeback, but for another
         page->index.
      6) fuse_writepages_fill attaches more pages to the request (if any), then
         fuse_writepages_send is eventually called. It is supposed to crop
         inarg->size of the request, but it doesn't because i_size has already been
         extended back.
      
      Moving end_page_writeback behind fuse_writepages_send guarantees that
      __fuse_release_nowrite (called from fuse_do_setattr) will crop inarg->size
      of the request before write(2) gets the chance to extend i_size.
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      2d033eaa
    • P
      fuse: Implement writepages callback · 26d614df
      Pavel Emelyanov 提交于
      The .writepages one is required to make each writeback request carry more than
      one page on it. The patch enables optimized behaviour unconditionally,
      i.e. mmap-ed writes will benefit from the patch even if fc->writeback_cache=0.
      
      [SzM: simplify, add comments]
      Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      26d614df
    • M
      fuse: don't BUG on no write file · 72523425
      Miklos Szeredi 提交于
      Don't bug if there's no writable files found for page writeback.  If ever
      this is triggered, a WARN_ON helps debugging it much better then a BUG_ON.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      72523425
    • M
      fuse: lock page in mkwrite · cca24370
      Miklos Szeredi 提交于
      Lock the page in fuse_page_mkwrite() to protect against a race with
      fuse_writepage() where the page is redirtied before the actual writeback
      begins.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      cca24370
    • P
      fuse: Prepare to handle multiple pages in writeback · 385b1268
      Pavel Emelyanov 提交于
      The .writepages callback will issue writeback requests with more than one
      page aboard. Make existing end/check code be aware of this.
      Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      385b1268
    • P
      fuse: Getting file for writeback helper · adcadfa8
      Pavel Emelyanov 提交于
      There will be a .writepageS callback implementation which will need to
      get a fuse_file out of a fuse_inode, thus make a helper for this.
      Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      adcadfa8
  9. 18 9月, 2013 2 次提交
    • M
      fuse: fix fallocate vs. ftruncate race · 0ab08f57
      Maxim Patlasov 提交于
      A former patch introducing FUSE_I_SIZE_UNSTABLE flag provided detailed
      description of races between ftruncate and anyone who can extend i_size:
      
      > 1. As in the previous scenario fuse_dentry_revalidate() discovered that i_size
      > changed (due to our own fuse_do_setattr()) and is going to call
      > truncate_pagecache() for some  'new_size' it believes valid right now. But by
      > the time that particular truncate_pagecache() is called ...
      > 2. fuse_do_setattr() returns (either having called truncate_pagecache() or
      > not -- it doesn't matter).
      > 3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
      > 4. mmap-ed write makes a page in the extended region dirty.
      
      This patch adds necessary bits to fuse_file_fallocate() to protect from that
      race.
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
      0ab08f57
    • M
      fuse: wait for writeback in fuse_file_fallocate() · bde52788
      Maxim Patlasov 提交于
      The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):
      
      1) An user makes a page dirty via mmap-ed write.
      2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
         and <offset, size> covering the page.
      3) Before truncate_pagecache_range call from fuse_file_fallocate,
         the page goes to write-back. The page is fully processed by fuse_writepage
         (including end_page_writeback on the page), but fuse_flush_writepages did
         nothing because fi->writectr < 0.
      4) truncate_pagecache_range is called and fuse_file_fallocate is finishing
         by calling fuse_release_nowrite. The latter triggers processing queued
         write-back request which will write stale data to the hole soon.
      
      Changed in v2 (thanks to Brian for suggestion):
       - Do not truncate page cache until FUSE_FALLOCATE succeeded. Otherwise,
         we can end up in returning -ENOTSUPP while user data is already punched
         from page cache. Use filemap_write_and_wait_range() instead.
      Changed in v3 (thanks to Miklos for suggestion):
       - fuse_wait_on_writeback() is prone to livelocks; use fuse_set_nowrite()
         instead. So far as we need a dirty-page barrier only, fuse_sync_writes()
         should be enough.
       - rebased to for-linus branch of fuse.git
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
      bde52788