1. 24 Nov, 2013 6 commits
  2. 01 Oct, 2013 2 commits
  3. 28 Sep, 2013 1 commit
    • D
      FS-Cache: Provide the ability to enable/disable cookies · 94d30ae9
      Committed by David Howells
      Provide the ability to enable and disable fscache cookies.  A disabled cookie
      will reject or ignore further requests to:
      
      	Acquire a child cookie
      	Invalidate and update backing objects
      	Check the consistency of a backing object
      	Allocate storage for backing page
      	Read backing pages
      	Write to backing pages
      
      but still allows:
      
      	Checks/waits on the completion of already in-progress objects
      	Uncaching of pages
      	Relinquishment of cookies
      
      Two new operations are provided:
      
       (1) Disable a cookie:
      
      	void fscache_disable_cookie(struct fscache_cookie *cookie,
      				    bool invalidate);
      
           If the cookie is not already disabled, this locks the cookie against other
           dis/enablement ops, marks the cookie as being disabled, discards or
           invalidates any backing objects and waits for cessation of activity on any
           associated object.
      
           This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
           but it reinitialises the cookie such that it can be reenabled.
      
           All possible failures are handled internally.  The caller should consider
           calling fscache_uncache_all_inode_pages() afterwards to make sure all page
           markings are cleared up.
      
       (2) Enable a cookie:
      
      	void fscache_enable_cookie(struct fscache_cookie *cookie,
      				   bool (*can_enable)(void *data),
      				   void *data)
      
           If the cookie is not already enabled, this locks the cookie against other
           dis/enablement ops, invokes can_enable() and, if the cookie is not an
           index cookie, will begin the procedure of acquiring backing objects.
      
           The optional can_enable() function is passed the data argument and returns
           a ruling as to whether or not enablement should actually be permitted to
           begin.
      
           All possible failures are handled internally.  The cookie will only be
           marked as enabled if provisional backing objects are allocated.
      
      A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
      is then contingent on i_writecount <= 0.  can_enable() checks for a race
      between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
      handling and allows us to get rid of open(O_RDONLY) accidentally introducing
      caching to an inode that's open for writing already.
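
The enable/disable flow described above can be sketched in user space. This is only a simplified model, not the kernel implementation: `model_cookie`, `model_inode`, `model_enable_cookie()`, `model_disable_cookie()` and `model_can_enable()` are hypothetical names, the enablement lock is omitted, and "acquiring backing objects" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct fscache_cookie. */
struct model_cookie {
    bool enabled;
    bool is_index;          /* index cookies have no backing objects */
    int  objects_acquired;  /* counts simulated backing-object acquisitions */
};

/* Hypothetical stand-in for the inode state a netfs such as NFS would consult. */
struct model_inode {
    int i_writecount;       /* > 0 means the file is open for writing */
};

/* can_enable() callback: permit caching only if nobody has the file open for writing. */
static bool model_can_enable(void *data)
{
    const struct model_inode *inode = data;

    return inode->i_writecount <= 0;
}

/* Enable the cookie unless it is already enabled or the optional callback vetoes it. */
static void model_enable_cookie(struct model_cookie *cookie,
                                bool (*can_enable)(void *data), void *data)
{
    assert(cookie != NULL);
    if (cookie->enabled)
        return;
    if (can_enable && !can_enable(data))
        return;
    if (!cookie->is_index)
        cookie->objects_acquired++;   /* stands in for starting object acquisition */
    cookie->enabled = true;
}

/* Disable the cookie and drop its simulated backing objects so it can be re-enabled. */
static void model_disable_cookie(struct model_cookie *cookie, bool invalidate)
{
    assert(cookie != NULL);
    if (!cookie->enabled)
        return;
    (void)invalidate;                 /* the model discards objects either way */
    cookie->objects_acquired = 0;
    cookie->enabled = false;
}
```

In this model, a cookie for an inode with `i_writecount > 0` stays disabled when `model_can_enable` is passed as the callback, while passing a NULL callback enables it unconditionally.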
      
      One operation has its API modified:
      
       (3) Acquire a cookie.
      
      	struct fscache_cookie *fscache_acquire_cookie(
      		struct fscache_cookie *parent,
      		const struct fscache_cookie_def *def,
      		void *netfs_data,
      		bool enable);
      
           This now has an additional argument that indicates whether the requested
           cookie should be enabled by default.  It doesn't need the can_enable()
           function because the caller must prevent multiple calls for the same netfs
           object and it doesn't need to take the enablement lock because no one else
           can get at the cookie before this returns.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
  4. 26 Sep, 2013 1 commit
    • M
      ceph: hung on ceph fscache invalidate in some cases · ffc79664
      Committed by Milosz Tanski
      On my ceph client cluster I'm seeing hung kernel tasks in the invalidate page
      code path in some cases. This is because we don't check whether the page is
      marked as cached before calling fscache_wait_on_page_write().
      
      This is the log from the hang:
      
      INFO: task XXXXXX:12034 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       ...
      Call Trace:
      [<ffffffff81568d09>] schedule+0x29/0x70
      [<ffffffffa01d4cbd>] __fscache_wait_on_page_write+0x6d/0xb0 [fscache]
      [<ffffffff81083520>] ? add_wait_queue+0x60/0x60
      [<ffffffffa029a3e9>] ceph_invalidate_fscache_page+0x29/0x50 [ceph]
      [<ffffffffa027df00>] ceph_invalidatepage+0x70/0x190 [ceph]
      [<ffffffff8112656f>] ? delete_from_page_cache+0x5f/0x70
      [<ffffffff81133cab>] truncate_inode_page+0x8b/0x90
      [<ffffffff81133ded>] truncate_inode_pages_range.part.12+0x13d/0x620
      [<ffffffff8113431d>] truncate_inode_pages_range+0x4d/0x60
      [<ffffffff811343b5>] truncate_inode_pages+0x15/0x20
      [<ffffffff8119bbf6>] evict+0x1a6/0x1b0
      [<ffffffff8119c3f3>] iput+0x103/0x190
       ...
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
  5. 07 Sep, 2013 8 commits
  6. 28 Aug, 2013 5 commits
    • S
      ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem · 7d6e1f54
      Committed by Sha Zhengju
      We are about to add memcg dirty page accounting around
      __set_page_dirty_{buffers,nobuffers} in the vfs layer, so we had better use the vfs
      interface to avoid exporting those details to filesystems.
      
      Since vfs set_page_dirty() should be called under the page lock, we no longer need
      elaborate code to handle races here, and two WARN_ON()s are added to detect such exceptions.
      Many thanks to Sage and Yan Zheng for their coaching!
      
      I tested it in a two-server ceph environment, where one server is the client and the
      other is mds/osd/mon, and ran the following fsx tests from xfstests:
      
        ./fsx   1MB -N 50000 -p 10000 -l 1048576
        ./fsx  10MB -N 50000 -p 10000 -l 10485760
        ./fsx 100MB -N 50000 -p 10000 -l 104857600
      
      fsx performs many mmap-read/mmap-write/truncate operations, and the tests completed
      successfully without triggering any WARN_ON().
      Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • M
      ceph: allow sync_read/write return partial successed size of read/write. · ee7289bf
      Committed by majianpeng
      sync_read/write may perform operations across multiple stripes. If one of them
      hits an error, we return the size completed successfully so far rather than an
      error value. There is one exception: if a write operation hits -EOLDSNAPC, we
      retry the whole write.
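
The return-value policy described here can be illustrated with a small user-space model. It is a sketch under stated assumptions, not the ceph client code: `model_sync_write()`, `model_write_chunk()` and `struct model_backend` are hypothetical, and `MODEL_EIO` / `MODEL_EOLDSNAPC` are placeholder error numbers chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EIO        5   /* placeholder error number for the model */
#define MODEL_EOLDSNAPC 85   /* placeholder for the "old snap context" error */

/* Writes one chunk; returns bytes written or a negative error. */
typedef long (*chunk_fn)(void *ctx, size_t off, size_t len);

/*
 * Write len bytes at off, one stripe-bounded chunk at a time.
 * On error, return the bytes already written if there are any, otherwise
 * the error itself.  On -MODEL_EOLDSNAPC, restart the whole write.
 */
static long model_sync_write(chunk_fn write_chunk, void *ctx,
                             size_t off, size_t len, size_t stripe)
{
    size_t done;

    assert(stripe > 0);
retry:
    done = 0;
    while (done < len) {
        size_t pos = off + done;
        size_t chunk = stripe - (pos % stripe);  /* stay within one stripe */
        long ret;

        if (chunk > len - done)
            chunk = len - done;
        ret = write_chunk(ctx, pos, chunk);
        if (ret == -MODEL_EOLDSNAPC)
            goto retry;                          /* redo the whole write */
        if (ret < 0)
            return done ? (long)done : ret;      /* partial success wins */
        done += (size_t)ret;
        if ((size_t)ret < chunk)
            break;                               /* short write: stop here */
    }
    return (long)done;
}

/* Example backend: reports a stale snap context once, fails at or past fail_at. */
struct model_backend {
    size_t fail_at;
    int stale_once;
    int calls;
};

static long model_write_chunk(void *ctx, size_t off, size_t len)
{
    struct model_backend *b = ctx;

    b->calls++;
    if (b->stale_once) {
        b->stale_once = 0;
        return -MODEL_EOLDSNAPC;
    }
    if (off >= b->fail_at)
        return -MODEL_EIO;
    return (long)len;
}
```

With a stripe of 4 bytes and a backend that fails at offset 8, a 10-byte write returns 8 in this model; if the very first chunk fails, the error itself is returned.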
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
    • M
      ceph: fix bugs about handling short-read for sync read mode. · 02ae66d8
      Committed by majianpeng
      cephfs . show_layout
      >layout.data_pool:     0
      >layout.object_size:   4194304
      >layout.stripe_unit:   4194304
      >layout.stripe_count:  1
      
      TestA:
      >dd if=/dev/urandom of=test bs=1M count=2 oflag=direct
      >dd if=/dev/urandom of=test bs=1M count=2 seek=4  oflag=direct
      >dd if=test of=/dev/null bs=6M count=1 iflag=direct
      The messages from func striped_read are:
      ceph:           file.c:350  : striped_read 0~6291456 (read 0) got 2097152 HITSTRIPE SHORT
      ceph:           file.c:350  : striped_read 2097152~4194304 (read 2097152) got 0 HITSTRIPE SHORT
      ceph:           file.c:381  : zero tail 4194304
      ceph:           file.c:390  : striped_read returns 6291456
      The hole in the file spans 2M--4M, but the read actually zeroes the last 4M, including
      the final 2M area, which isn't a hole.
      With this patch, the messages are:
      ceph:           file.c:350  : striped_read 0~6291456 (read 0) got 2097152 HITSTRIPE SHORT
      ceph:           file.c:358  :  zero gap 2097152 to 4194304
      ceph:           file.c:350  : striped_read 4194304~2097152 (read 4194304) got 2097152
      ceph:           file.c:384  : striped_read returns 6291456
      
      TestB:
      >echo majianpeng > test
      >dd if=test of=/dev/null bs=2M count=1 iflag=direct
      The messages are:
      ceph:           file.c:350  : striped_read 0~6291456 (read 0) got 11 HITSTRIPE SHORT
      ceph:           file.c:350  : striped_read 11~6291445 (read 11) got 0 HITSTRIPE SHORT
      ceph:           file.c:390  : striped_read returns 11
      In this case it issued one more striped_read, which is pointless.
      With this patch, the messages are:
      ceph:           file.c:350  : striped_read 0~6291456 (read 0) got 11 HITSTRIPE SHORT
      ceph:           file.c:384  : striped_read returns 11
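
The intended behaviour in TestA and TestB can be modelled in user space roughly as follows. This is a simplified sketch and not the kernel's striped_read(): it assumes stripe_count 1, reads one object at a time, and all names (`model_file`, `model_read_object()`, `model_striped_read()`) are hypothetical. Stored data is marked with 0xAA so holes are visible as zeroes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct model_file {
    size_t obj_size;        /* bytes per object (stripe_count 1 assumed) */
    size_t nr_objs;
    const size_t *obj_len;  /* bytes actually stored at the start of each object */
    size_t i_size;          /* logical file size */
};

/* Read from a single object; returns how many bytes it could supply (possibly 0). */
static size_t model_read_object(const struct model_file *f, size_t pos,
                                size_t want, unsigned char *buf)
{
    size_t idx = pos / f->obj_size;
    size_t in_obj = pos % f->obj_size;
    size_t stored = idx < f->nr_objs ? f->obj_len[idx] : 0;
    size_t got = in_obj < stored ? stored - in_obj : 0;

    if (got > want)
        got = want;
    memset(buf, 0xAA, got);
    return got;
}

/* Read len bytes at off, zero-filling only the gap left by a short read. */
static size_t model_striped_read(const struct model_file *f, size_t off,
                                 size_t len, unsigned char *buf)
{
    size_t pos = off, end = off + len;

    assert(f->obj_size > 0);
    while (pos < end) {
        size_t stripe_end = (pos / f->obj_size + 1) * f->obj_size;
        size_t want = (stripe_end < end ? stripe_end : end) - pos;
        size_t got = model_read_object(f, pos, want, buf + (pos - off));
        size_t gap_end;

        pos += got;
        if (got == want)
            continue;           /* full read: move on to the next stripe */
        if (pos >= f->i_size)
            break;              /* short read at EOF: no extra read needed */
        /* Short read inside the file: the rest of this stripe is a hole. */
        gap_end = pos + (want - got);
        if (gap_end > f->i_size)
            gap_end = f->i_size;
        memset(buf + (pos - off), 0, gap_end - pos);
        pos = gap_end;
    }
    return pos - off;
}
```

Scaling TestA down to one byte per MiB (object size 4, two objects holding 2 bytes each, file size 6), a read of 6 bytes returns 6 with only bytes 2 and 3 zeroed. A file shorter than one object, as in TestB, returns after a single object read.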
      
      Big thanks to Yan Zheng for the patch.
      Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
    • L
      ceph: remove useless variable revoked_rdcache · e9075743
      Committed by Li Wang
      Cleanup in handle_cap_grant().
      Signed-off-by: Li Wang <liwang@ubuntukylin.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • S
      ceph: fix fallocate division · b314a90d
      Committed by Sage Weil
      We need to use do_div to divide by a 64-bit value.
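
For context: in kernel code, a plain `/` or `%` on 64-bit operands can fail to link on 32-bit architectures, because the compiler emits a call to a library helper the kernel does not provide. The kernel's do_div() macro divides a 64-bit value in place by a 32-bit divisor and returns the remainder. The sketch below is a user-space model of that contract only; `model_do_div()` takes a pointer rather than an lvalue, and `model_round_down()` is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space model of do_div(): divide the 64-bit value *n in place
 * by a 32-bit divisor and return the remainder.
 */
static uint32_t model_do_div(uint64_t *n, uint32_t base)
{
    uint32_t rem;

    assert(base != 0);
    rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}

/* Hypothetical helper: round a byte offset down to a multiple of object_size. */
static uint64_t model_round_down(uint64_t offset, uint32_t object_size)
{
    uint64_t quotient = offset;
    uint32_t rem = model_do_div(&quotient, object_size);

    return offset - rem;
}
```

Note that the quotient replaces the original value, so callers that still need the dividend must copy it first, as `model_round_down()` does.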
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
  7. 16 Aug, 2013 4 commits
    • L
      ceph: punch hole support · ad7a60de
      Committed by Li Wang
      This patch implements fallocate and punch hole support for Ceph kernel client.
      Signed-off-by: Li Wang <liwang@ubuntukylin.com>
      Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
    • Y
      ceph: fix request max size · 3871cbb9
      Committed by Yan, Zheng
      ceph_check_caps() requests a new max size only when there is an Fw cap.
      If we call check_max_size() while there is no Fw cap, it updates
      i_wanted_max_size and calls ceph_check_caps(), but ceph_check_caps()
      does nothing. Later, when the Fw cap is issued, we call check_max_size()
      again. But i_wanted_max_size is equal to 'endoff' at this time, so
      check_max_size() doesn't call ceph_check_caps() and we end up
      waiting for the new max size forever.
      
      The fix is to duplicate ceph_check_caps()'s "request max size" code in
      check_max_size(), and make try_get_cap_refs() wait for the Fw cap
      before retrying the request for a new max size.
      
      This patch also removes the "endoff > (inode->i_size << 1)" check
      in check_max_size(). It's useless because there is no corresponding
      logic in ceph_check_caps().
      Reviewed-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
    • Y
      ceph: introduce i_truncate_mutex · b0d7c223
      Committed by Yan, Zheng
      I encountered the deadlock below when running fsstress:
      
      wmtruncate work      truncate                 MDS
      ---------------  ------------------  --------------------------
                         lock i_mutex
                                            <- truncate file
      lock i_mutex (blocked)
                                            <- revoking Fcb (filelock to MIX)
                         send request ->
                                               handle request (xlock filelock)
      
      At the initial time, there are some dirty pages in the page cache.
      When the kclient receives the truncate message, it reduces inode size
      and creates some 'out of i_size' dirty pages. wmtruncate work can't
      truncate these dirty pages because it's blocked by the i_mutex. Later
      when the kclient receives the cap message that revokes Fcb caps, it
      can't flush all dirty pages because writepages() only flushes dirty
      pages within the inode size.
      
      When the MDS handles the 'truncate' request from kclient, it waits
      for the filelock to become stable. But the filelock is stuck in
      unstable state because it can't finish revoking kclient's Fcb caps.
      
      The truncate pagecache locking has already caused lots of trouble
      for us. I think it's time to simplify it by introducing a new mutex.
      We use the new mutex to prevent concurrent truncate_inode_pages().
      There is no need to worry about race between buffered write and
      truncate_inode_pages(), because our "get caps" mechanism prevents
      them from concurrent execution.
      Reviewed-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
    • M
      ceph: cleanup the logic in ceph_invalidatepage · b150f5c1
      Committed by Milosz Tanski
      The invalidatepage code bails if it encounters a non-zero page offset. The
      current logic that does this is non-obvious, with multiple if statements.
      
      This should be logically and functionally equivalent.
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
  8. 10 Aug, 2013 12 commits
  9. 05 Jul, 2013 1 commit