1. 01 10月, 2013 8 次提交
    • M
      fuse: writepages: handle same page rewrites · 8b284dc4
      Miklos Szeredi 提交于
      As Maxim Patlasov pointed out, it's possible to get a dirty page while it's
      copy is still under writeback, despite fuse_page_mkwrite() doing its thing
      (direct IO).
      
      This could result in two concurrent write request for the same offset, with
      data corruption if they get mixed up.
      
      To prevent this, fuse needs to check and delay such writes.  This
      implementation does this by:
      
       1. check if page is still under writeout, if so create a new, single page
          secondary request for it
      
       2. chain this secondary request onto the in-flight request
      
       2/a. if a seconday request for the same offset was already chained to the
          in-flight request, then just copy the contents of the page and discard
          the new secondary request.  This makes sure that for each page will
          have at most two requests associated with it
      
       3. when the in-flight request finished, send off all secondary requests
          chained onto it
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      8b284dc4
    • M
      fuse: writepages: fix aggregation · 1e112a48
      Miklos Szeredi 提交于
      Checking against tmp-page indexes is not very useful, and results in one
      (or rarely two) page requests.  Which is not much of an improvement...
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      1e112a48
    • M
      fuse: fix race in fuse_writepages() · 2d033eaa
      Maxim Patlasov 提交于
      The patch fixes a race between ftruncate(2), mmap-ed write and write(2):
      
      1) An user makes a page dirty via mmap-ed write.
      2) The user performs shrinking truncate(2) intended to purge the page.
      3) Before fuse_do_setattr calls truncate_pagecache, the page goes to
         writeback. fuse_writepages_fill attaches a new page to FUSE_WRITE request,
         then releases the original page by end_page_writeback and unlock it.
      4) fuse_do_setattr completes and successfully returns. Since now, i_mutex
         is free.
      5) Ordinary write(2) extends i_size back to cover the page. Note that
         fuse_send_write_pages do wait for fuse writeback, but for another
         page->index.
      6) fuse_writepages_fill attaches more pages to the request (if any), then
         fuse_writepages_send is eventually called. It is supposed to crop
         inarg->size of the request, but it doesn't because i_size has already been
         extended back.
      
      Moving end_page_writeback behind fuse_writepages_send guarantees that
      __fuse_release_nowrite (called from fuse_do_setattr) will crop inarg->size
      of the request before write(2) gets the chance to extend i_size.
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      2d033eaa
    • P
      fuse: Implement writepages callback · 26d614df
      Pavel Emelyanov 提交于
      The .writepages one is required to make each writeback request carry more than
      one page on it. The patch enables optimized behaviour unconditionally,
      i.e. mmap-ed writes will benefit from the patch even if fc->writeback_cache=0.
      
      [SzM: simplify, add comments]
      Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      26d614df
    • M
      fuse: don't BUG on no write file · 72523425
      Miklos Szeredi 提交于
      Don't bug if there's no writable files found for page writeback.  If ever
      this is triggered, a WARN_ON helps debugging it much better then a BUG_ON.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      72523425
    • M
      fuse: lock page in mkwrite · cca24370
      Miklos Szeredi 提交于
      Lock the page in fuse_page_mkwrite() to protect against a race with
      fuse_writepage() where the page is redirtied before the actual writeback
      begins.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      cca24370
    • P
      fuse: Prepare to handle multiple pages in writeback · 385b1268
      Pavel Emelyanov 提交于
      The .writepages callback will issue writeback requests with more than one
      page aboard. Make existing end/check code be aware of this.
      Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      385b1268
    • P
      fuse: Getting file for writeback helper · adcadfa8
      Pavel Emelyanov 提交于
      There will be a .writepageS callback implementation which will need to
      get a fuse_file out of a fuse_inode, thus make a helper for this.
      Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      adcadfa8
  2. 18 9月, 2013 2 次提交
    • M
      fuse: fix fallocate vs. ftruncate race · 0ab08f57
      Maxim Patlasov 提交于
      A former patch introducing FUSE_I_SIZE_UNSTABLE flag provided detailed
      description of races between ftruncate and anyone who can extend i_size:
      
      > 1. As in the previous scenario fuse_dentry_revalidate() discovered that i_size
      > changed (due to our own fuse_do_setattr()) and is going to call
      > truncate_pagecache() for some  'new_size' it believes valid right now. But by
      > the time that particular truncate_pagecache() is called ...
      > 2. fuse_do_setattr() returns (either having called truncate_pagecache() or
      > not -- it doesn't matter).
      > 3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
      > 4. mmap-ed write makes a page in the extended region dirty.
      
      This patch adds necessary bits to fuse_file_fallocate() to protect from that
      race.
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
      0ab08f57
    • M
      fuse: wait for writeback in fuse_file_fallocate() · bde52788
      Maxim Patlasov 提交于
      The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):
      
      1) An user makes a page dirty via mmap-ed write.
      2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
         and <offset, size> covering the page.
      3) Before truncate_pagecache_range call from fuse_file_fallocate,
         the page goes to write-back. The page is fully processed by fuse_writepage
         (including end_page_writeback on the page), but fuse_flush_writepages did
         nothing because fi->writectr < 0.
      4) truncate_pagecache_range is called and fuse_file_fallocate is finishing
         by calling fuse_release_nowrite. The latter triggers processing queued
         write-back request which will write stale data to the hole soon.
      
      Changed in v2 (thanks to Brian for suggestion):
       - Do not truncate page cache until FUSE_FALLOCATE succeeded. Otherwise,
         we can end up in returning -ENOTSUPP while user data is already punched
         from page cache. Use filemap_write_and_wait_range() instead.
      Changed in v3 (thanks to Miklos for suggestion):
       - fuse_wait_on_writeback() is prone to livelocks; use fuse_set_nowrite()
         instead. So far as we need a dirty-page barrier only, fuse_sync_writes()
         should be enough.
       - rebased to for-linus branch of fuse.git
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
      bde52788
  3. 03 9月, 2013 2 次提交
    • M
      fuse: hotfix truncate_pagecache() issue · 06a7c3c2
      Maxim Patlasov 提交于
      The way how fuse calls truncate_pagecache() from fuse_change_attributes()
      is completely wrong. Because, w/o i_mutex held, we never sure whether
      'oldsize' and 'attr->size' are valid by the time of execution of
      truncate_pagecache(inode, oldsize, attr->size). In fact, as soon as we
      released fc->lock in the middle of fuse_change_attributes(), we completely
      loose control of actions which may happen with given inode until we reach
      truncate_pagecache. The list of potentially dangerous actions includes
      mmap-ed reads and writes, ftruncate(2) and write(2) extending file size.
      
      The typical outcome of doing truncate_pagecache() with outdated arguments
      is data corruption from user point of view. This is (in some sense)
      acceptable in cases when the issue is triggered by a change of the file on
      the server (i.e. externally wrt fuse operation), but it is absolutely
      intolerable in scenarios when a single fuse client modifies a file without
      any external intervention. A real life case I discovered by fsx-linux
      looked like this:
      
      1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends
      FUSE_SETATTR to the server synchronously, but before getting fc->lock ...
      2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP
      to the server synchronously, then calls fuse_change_attributes(). The
      latter updates i_size, releases fc->lock, but before comparing oldsize vs
      attr->size..
      3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and
      updating attributes and i_size, but now oldsize is equal to
      outarg.attr.size because i_size has just been updated (step 2). Hence,
      fuse_do_setattr() returns w/o calling truncate_pagecache().
      4. As soon as ftruncate(2) completes, the user extends file size by
      write(2) making a hole in the middle of file, then reads data from the hole
      either by read(2) or mmap-ed read. The user expects to get zero data from
      the hole, but gets stale data because truncate_pagecache() is not executed
      yet.
      
      The scenario above illustrates one side of the problem: not truncating the
      page cache even though we should. Another side corresponds to truncating
      page cache too late, when the state of inode changed significantly.
      Theoretically, the following is possible:
      
      1. As in the previous scenario fuse_dentry_revalidate() discovered that
      i_size changed (due to our own fuse_do_setattr()) and is going to call
      truncate_pagecache() for some 'new_size' it believes valid right now. But
      by the time that particular truncate_pagecache() is called ...
      2. fuse_do_setattr() returns (either having called truncate_pagecache() or
      not -- it doesn't matter).
      3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
      4. mmap-ed write makes a page in the extended region dirty.
      
      The result will be the lost of data user wrote on the fourth step.
      
      The patch is a hotfix resolving the issue in a simplistic way: let's skip
      dangerous i_size update and truncate_pagecache if an operation changing
      file size is in progress. This simplistic approach looks correct for the
      cases w/o external changes. And to handle them properly, more sophisticated
      and intrusive techniques (e.g. NFS-like one) would be required. I'd like to
      postpone it until the issue is well discussed on the mailing list(s).
      
      Changed in v2:
       - improved patch description to cover both sides of the issue.
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
      06a7c3c2
    • M
      fuse: postpone end_page_writeback() in fuse_writepage_locked() · 4a4ac4eb
      Maxim Patlasov 提交于
      The patch fixes a race between ftruncate(2), mmap-ed write and write(2):
      
      1) An user makes a page dirty via mmap-ed write.
      2) The user performs shrinking truncate(2) intended to purge the page.
      3) Before fuse_do_setattr calls truncate_pagecache, the page goes to
         writeback. fuse_writepage_locked fills FUSE_WRITE request and releases
         the original page by end_page_writeback.
      4) fuse_do_setattr() completes and successfully returns. Since now, i_mutex
         is free.
      5) Ordinary write(2) extends i_size back to cover the page. Note that
         fuse_send_write_pages do wait for fuse writeback, but for another
         page->index.
      6) fuse_writepage_locked proceeds by queueing FUSE_WRITE request.
         fuse_send_writepage is supposed to crop inarg->size of the request,
         but it doesn't because i_size has already been extended back.
      
      Moving end_page_writeback to the end of fuse_writepage_locked fixes the
      race because now the fact that truncate_pagecache is successfully returned
      infers that fuse_writepage_locked has already called end_page_writeback.
      And this, in turn, infers that fuse_flush_writepages has already called
      fuse_send_writepage, and the latter used valid (shrunk) i_size. write(2)
      could not extend it because of i_mutex held by ftruncate(2).
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
      4a4ac4eb
  4. 29 6月, 2013 1 次提交
  5. 18 6月, 2013 1 次提交
    • M
      fuse: hold i_mutex in fuse_file_fallocate() · 14c14414
      Maxim Patlasov 提交于
      Changing size of a file on server and local update (fuse_write_update_size)
      should be always protected by inode->i_mutex. Otherwise a race like this is
      possible:
      
      1. Process 'A' calls fallocate(2) to extend file (~FALLOC_FL_KEEP_SIZE).
      fuse_file_fallocate() sends FUSE_FALLOCATE request to the server.
      2. Process 'B' calls ftruncate(2) shrinking the file. fuse_do_setattr()
      sends shrinking FUSE_SETATTR request to the server and updates local i_size
      by i_size_write(inode, outarg.attr.size).
      3. Process 'A' resumes execution of fuse_file_fallocate() and calls
      fuse_write_update_size(inode, offset + length). But 'offset + length' was
      obsoleted by ftruncate from previous step.
      
      Changed in v2 (thanks Brian and Anand for suggestions):
       - made relation between mutex_lock() and fuse_set_nowrite(inode) more
         explicit and clear.
       - updated patch description to use ftruncate(2) in example
      Signed-off-by: NMaxim V. Patlasov <MPatlasov@parallels.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      14c14414
  6. 03 6月, 2013 2 次提交
    • M
      fuse: fix alignment in short read optimization for async_dio · e5c5f05d
      Maxim Patlasov 提交于
      The bug was introduced with async_dio feature: trying to optimize short reads,
      we cut number-of-bytes-to-read to i_size boundary. Hence the following example:
      
      	truncate --size=300 /mnt/file
      	dd if=/mnt/file of=/dev/null iflag=direct
      
      led to FUSE_READ request of 300 bytes size. This turned out to be problem
      for userspace fuse implementations who rely on assumption that kernel fuse
      does not change alignment of request from client FS.
      
      The patch turns off the optimization if async_dio is disabled. And, if it's
      enabled, the patch fixes adjustment of number-of-bytes-to-read to preserve
      alignment.
      
      Note, that we cannot throw out short read optimization entirely because
      otherwise a direct read of a huge size issued on a tiny file would generate
      a huge amount of fuse requests and most of them would be ACKed by userspace
      with zero bytes read.
      Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      e5c5f05d
    • B
      fuse: return -EIOCBQUEUED from fuse_direct_IO() for all async requests · c9ecf989
      Brian Foster 提交于
      If request submission fails for an async request (i.e.,
      get_user_pages() returns -ERESTARTSYS), we currently skip the
      -EIOCBQUEUED return and drop into wait_for_sync_kiocb() forever.
      
      Avoid this by always returning -EIOCBQUEUED for async requests. If
      an error occurs, the error is passed into fuse_aio_complete(),
      returned via aio_complete() and thus propagated to userspace via
      io_getevents().
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NMaxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      c9ecf989
  7. 20 5月, 2013 2 次提交
    • B
      fuse: update inode size and invalidate attributes on fallocate · bee6c307
      Brian Foster 提交于
      An fallocate request without FALLOC_FL_KEEP_SIZE set can extend the
      size of a file. Update the inode size after a successful fallocate.
      
      Also invalidate the inode attributes after a successful fallocate
      to ensure we pick up the latest attribute values (i.e., i_blocks).
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      bee6c307
    • B
      fuse: truncate pagecache range on hole punch · 3634a632
      Brian Foster 提交于
      fuse supports hole punch via the fallocate() FALLOC_FL_PUNCH_HOLE
      interface. When a hole punch is passed through, the page cache
      is not cleared and thus allows reading stale data from the cache.
      
      This is easily demonstrable (using FOPEN_KEEP_CACHE) by reading a
      smallish random data file into cache, punching a hole and creating
      a copy of the file. Drop caches or remount and observe that the
      original file no longer matches the file copied after the hole
      punch. The original file contains a zeroed range and the latter
      file contains stale data.
      
      Protect against writepage requests in progress and punch out the
      associated page cache range after a successful client fs hole
      punch.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      3634a632
  8. 15 5月, 2013 1 次提交
  9. 08 5月, 2013 1 次提交
  10. 01 5月, 2013 1 次提交
  11. 18 4月, 2013 6 次提交
    • M
      fuse: truncate file if async dio failed · efb9fa9e
      Maxim Patlasov 提交于
      The patch improves error handling in fuse_direct_IO(): if we successfully
      submitted several fuse requests on behalf of synchronous direct write
      extending file and some of them failed, let's try to do our best to clean-up.
      
      Changed in v2: reuse fuse_do_setattr(). Thanks to Brian for suggestion.
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      efb9fa9e
    • M
      fuse: optimize short direct reads · 439ee5f0
      Maxim Patlasov 提交于
      If user requested direct read beyond EOF, we can skip sending fuse requests
      for positions beyond EOF because userspace would ACK them with zero bytes read
      anyway. We can trust to i_size in fuse_direct_IO for such cases because it's
      called from fuse_file_aio_read() and the latter updates fuse attributes
      including i_size.
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      439ee5f0
    • M
      fuse: enable asynchronous processing direct IO · bcba24cc
      Maxim Patlasov 提交于
      In case of synchronous DIO request (i.e. read(2) or write(2) for a file
      opened with O_DIRECT), the patch submits fuse requests asynchronously, but
      waits for their completions before return from fuse_direct_IO().
      
      In case of asynchronous DIO request (i.e. libaio io_submit() or a file opened
      with O_DIRECT), the patch submits fuse requests asynchronously and return
      -EIOCBQUEUED immediately.
      
      The only special case is async DIO extending file. Here the patch falls back
      to old behaviour because we can't return -EIOCBQUEUED and update i_size later,
      without i_mutex hold. And we have no method to wait on real async I/O
      requests.
      
      The patch also clean __fuse_direct_write() up: it's better to update i_size
      in its callers. Thanks Brian for suggestion.
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      bcba24cc
    • M
      fuse: make fuse_direct_io() aware about AIO · 36cf66ed
      Maxim Patlasov 提交于
      The patch implements passing "struct fuse_io_priv *io" down the stack up to
      fuse_send_read/write where it is used to submit request asynchronously.
      io->async==0 designates synchronous processing.
      
      Non-trivial part of the patch is changes in fuse_direct_io(): resources
      like fuse requests and user pages cannot be released immediately in async
      case.
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      36cf66ed
    • M
      fuse: add support of async IO · 01e9d11a
      Maxim Patlasov 提交于
      The patch implements a framework to process an IO request asynchronously. The
      idea is to associate several fuse requests with a single kiocb by means of
      fuse_io_priv structure. The structure plays the same role for FUSE as 'struct
      dio' for direct-io.c.
      
      The framework is supposed to be used like this:
       - someone (who wants to process an IO asynchronously) allocates fuse_io_priv
         and initializes it setting 'async' field to non-zero value.
       - as soon as fuse request is filled, it can be submitted (in non-blocking way)
         by fuse_async_req_send()
       - when all submitted requests are ACKed by userspace, io->reqs drops to zero
         triggering aio_complete()
      
      In case of IO initiated by libaio, aio_complete() will finish processing the
      same way as in case of dio_complete() calling aio_complete(). But the
      framework may be also used for internal FUSE use when initial IO request
      was synchronous (from user perspective), but it's beneficial to process it
      asynchronously. Then the caller should wait on kiocb explicitly and
      aio_complete() will wake the caller up.
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      01e9d11a
    • M
      fuse: move fuse_release_user_pages() up · 187c5c36
      Maxim Patlasov 提交于
      fuse_release_user_pages() will be indirectly used by fuse_send_read/write
      in future patches.
      Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      187c5c36
  12. 17 4月, 2013 1 次提交
  13. 10 4月, 2013 1 次提交
  14. 28 2月, 2013 1 次提交
  15. 04 2月, 2013 1 次提交
  16. 01 2月, 2013 1 次提交
  17. 24 1月, 2013 8 次提交