1. 07 Sep, 2013 1 commit
  2. 28 Aug, 2013 1 commit
  3. 16 Aug, 2013 2 commits
    • Y
      ceph: fix request max size · 3871cbb9
      Yan, Zheng committed
      ceph_check_caps() requests a new max size only when the Fw cap is
      held. If we call check_max_size() while there is no Fw cap, it updates
      i_wanted_max_size and calls ceph_check_caps(), but ceph_check_caps()
      does nothing. Later, when the Fw cap is issued, we call check_max_size()
      again. But i_wanted_max_size is already equal to 'endoff' by then, so
      check_max_size() doesn't call ceph_check_caps() and we end up
      waiting for the new max size forever.
      
      The fix is to duplicate ceph_check_caps()'s "request max size" code in
      check_max_size(), and to make try_get_cap_refs() wait for the Fw cap
      before retrying the request for a new max size.
      
      This patch also removes the "endoff > (inode->i_size << 1)" check
      in check_max_size(). It's useless because there is no corresponding
      logic in ceph_check_caps().
      Reviewed-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
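The stall described in "ceph: fix request max size" can be reproduced in a small toy model. This is a sketch in Python, not kernel code: the class `InodeModel`, the `requests_sent` list, and the `_old`/`_new` function names are hypothetical, and the conditions are simplified to the two facts the commit message states (a request is only sent when Fw is held, and the old check_max_size() only acted when the wanted size changed).

```python
class InodeModel:
    """Hypothetical stand-in for the max-size fields of a ceph inode."""
    def __init__(self, max_size, has_fw):
        self.max_size = max_size
        self.wanted_max_size = 0
        self.has_fw = has_fw
        self.requests_sent = []  # max sizes requested from the MDS

def check_caps(inode):
    # Model of ceph_check_caps(): asks for a new max size only with Fw held.
    if inode.has_fw and inode.wanted_max_size > inode.max_size:
        inode.requests_sent.append(inode.wanted_max_size)

def check_max_size_old(inode, endoff):
    # Old behaviour: only act when the wanted size actually grows.
    if endoff > inode.wanted_max_size:
        inode.wanted_max_size = endoff
        check_caps(inode)

def check_max_size_new(inode, endoff):
    if endoff > inode.wanted_max_size:
        inode.wanted_max_size = endoff
    # Duplicated "request max size" condition, evaluated on every call.
    if inode.has_fw and inode.wanted_max_size > inode.max_size:
        check_caps(inode)

def run(check_max_size):
    inode = InodeModel(max_size=100, has_fw=False)
    check_max_size(inode, 200)   # no Fw cap yet: nothing can be sent
    inode.has_fw = True          # the MDS issues Fw later
    check_max_size(inode, 200)   # the writer retries with the same endoff
    return inode.requests_sent
```

In this model, `run(check_max_size_old)` never sends a request, which corresponds to waiting forever, while `run(check_max_size_new)` sends one request for the wanted size.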
    • Y
      ceph: introduce i_truncate_mutex · b0d7c223
      Yan, Zheng committed
      I encountered the deadlock below when running fsstress:
      
      wmtruncate work      truncate                 MDS
      ---------------  ------------------  --------------------------
                         lock i_mutex
                                            <- truncate file
      lock i_mutex (blocked)
                                            <- revoking Fcb (filelock to MIX)
                         send request ->
                                               handle request (xlock filelock)
      
      At the initial time, there are some dirty pages in the page cache.
      When the kclient receives the truncate message, it reduces inode size
      and creates some 'out of i_size' dirty pages. wmtruncate work can't
      truncate these dirty pages because it's blocked by the i_mutex. Later
      when the kclient receives the cap message that revokes Fcb caps, it
      can't flush all dirty pages because writepages() only flushes dirty
      pages within the inode size.
      
      When the MDS handles the 'truncate' request from kclient, it waits
      for the filelock to become stable. But the filelock is stuck in
      unstable state because it can't finish revoking kclient's Fcb caps.
      
      The truncate pagecache locking has already caused lots of trouble
      for us. I think it's time to simplify it by introducing a new mutex.
      We use the new mutex to prevent concurrent truncate_inode_pages().
      There is no need to worry about race between buffered write and
      truncate_inode_pages(), because our "get caps" mechanism prevents
      them from concurrent execution.
      Reviewed-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
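The core of the deadlock in "ceph: introduce i_truncate_mutex" is that dirty pages beyond the reduced inode size can be neither flushed nor dropped. A minimal sketch of that state follows, in Python rather than kernel C. The page size, the set-of-page-indices representation, and `can_release_fcb()` are assumptions made for the model; the two function names only echo the kernel functions named in the commit message and do not reproduce their real signatures.

```python
PAGE_SIZE = 4096  # assumed page size for the model

def writepages(dirty, i_size):
    """Flush dirty pages that start inside i_size; return the pages left dirty."""
    return {idx for idx in dirty if idx * PAGE_SIZE >= i_size}

def truncate_inode_pages(dirty, i_size):
    """Drop cached pages that start at or beyond i_size; return the pages kept."""
    return {idx for idx in dirty if idx * PAGE_SIZE < i_size}

def can_release_fcb(dirty):
    """Model assumption: the Fcb revoke completes only when nothing is dirty."""
    return not dirty

dirty_pages = {0, 1, 2, 3}   # dirty pages present before the truncate message
new_size = 2 * PAGE_SIZE     # inode size after the truncate message

# Truncate work blocked: flushing alone leaves the out-of-size pages dirty.
stuck = writepages(dirty_pages, new_size)

# Truncate work allowed to run first: the out-of-size pages are dropped,
# and the remaining pages can then be flushed.
flushed = writepages(truncate_inode_pages(dirty_pages, new_size), new_size)
```

Here `stuck` still contains pages 2 and 3, so the revoke cannot finish, while `flushed` is empty. Letting the page cache truncation run under its own mutex, instead of waiting on i_mutex, is what allows the second ordering.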
  4. 10 Aug, 2013 1 commit
  5. 04 Jul, 2013 7 commits
  6. 02 May, 2013 5 commits
  7. 12 Feb, 2013 2 commits
    • E
      ceph: Convert kuids and kgids before printing them. · bd2bae6a
      Eric W. Biederman committed
      Before printing kuid and kgid values, convert them into
      the initial user namespace.
      
      Cc: Sage Weil <sage@inktank.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • E
      ceph: Translate between uid and gids in cap messages and kuids and kgids · 05cb11c1
      Eric W. Biederman committed
      - Make the uid and gid arguments of send_cap_msg() used to compose
        ceph_mds_caps messages of type kuid_t and kgid_t.
      
      - Pass inode->i_uid and inode->i_gid in __send_cap to send_cap_msg()
        through variables of type kuid_t and kgid_t.
      
      - Modify struct ceph_cap_snap to store uids and gids in types kuid_t
        and kgid_t.  This allows capturing inode->i_uid and inode->i_gid in
        ceph_queue_cap_snap() without loss and passing them to
        __ceph_flush_snaps() where they are removed from struct
        ceph_cap_snap and passed to send_cap_msg().
      
      - In handle_cap_grant(), translate the uids and gids in the initial
        user namespace stored in struct ceph_mds_caps into kuids and kgids
        before setting inode->i_uid and inode->i_gid.
      
      Cc: Sage Weil <sage@inktank.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
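The translation described in the two kuid/kgid commits can be illustrated with a toy model. This is a Python sketch, not the kernel implementation: `Kuid`, `UserNamespace`, and the single-range mapping are hypothetical simplifications, and `make_kuid()` / `from_kuid()` here are stand-ins named after the kernel helpers rather than their real definitions. The assumption is that ids on the wire are always expressed in the initial user namespace.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Kuid:
    """Stand-in for kuid_t: an id in the kernel's global id space."""
    val: int

@dataclass(frozen=True)
class UserNamespace:
    """Hypothetical single-range map: ns ids [first, first + count)
    correspond to kernel ids starting at lower_first."""
    first: int
    lower_first: int
    count: int

INIT_USER_NS = UserNamespace(first=0, lower_first=0, count=2**32 - 1)
INVALID_KUID = Kuid(-1)

def make_kuid(ns, uid):
    """Map a uid as seen inside ns to a kernel-internal Kuid."""
    if ns.first <= uid < ns.first + ns.count:
        return Kuid(ns.lower_first + (uid - ns.first))
    return INVALID_KUID

def from_kuid(ns, kuid):
    """Map a Kuid back to the uid seen inside ns, or -1 if unmapped."""
    offset = kuid.val - ns.lower_first
    if 0 <= offset < ns.count:
        return ns.first + offset
    return -1

# Receiving a cap grant: wire uid -> kuid before storing it in the inode.
i_uid = make_kuid(INIT_USER_NS, 1000)

# Sending a cap message: kuid -> wire uid in the initial namespace.
wire_uid = from_kuid(INIT_USER_NS, i_uid)

# A made-up container namespace whose root maps to kernel id 100000.
container = UserNamespace(first=0, lower_first=100000, count=65536)
container_root = make_kuid(container, 0)
```

In this model, `from_kuid(INIT_USER_NS, container_root)` yields 100000, which is the value that would be printed or sent, regardless of what the id looks like inside the container.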
  8. 18 Jan, 2013 4 commits
  9. 13 Dec, 2012 3 commits
  10. 04 Nov, 2012 1 commit
  11. 02 Oct, 2012 1 commit
  12. 03 Feb, 2012 1 commit
    • A
      ceph: create a new session lock to avoid lock inversion · d8fb02ab
      Alex Elder committed
      Lockdep was reporting a possible circular lock dependency in
      dentry_lease_is_valid().  That function needs to sample the
      session's s_cap_gen and s_cap_ttl fields coherently, but needs
      to do so while holding a dentry lock.  The s_cap_lock field was
      being used to protect the two fields, but that can't be taken while
      holding a lock on a dentry within the session.
      
      In most cases, the s_cap_gen and s_cap_ttl fields only get operated
      on separately.  But in three cases they need to be updated together.
      Implement a new lock to protect the spots where updating both fields
      atomically is required.
      Signed-off-by: Alex Elder <elder@dreamhost.com>
      Reviewed-by: Sage Weil <sage@newdream.net>
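The idea behind "ceph: create a new session lock to avoid lock inversion" is a small, dedicated lock that covers only the two fields that must be read or written together. A rough Python sketch of that pattern follows. `SessionModel`, `gen_ttl_lock`, `invalidate()`, and the simplified validity test are all illustrative assumptions, not the kernel's definitions; the point is only that the reader takes a narrow lock that has no ordering relationship with the dentry lock it may already hold.

```python
import threading

class SessionModel:
    """Hypothetical session with a lock guarding only cap_gen and cap_ttl."""
    def __init__(self, ttl):
        self.gen_ttl_lock = threading.Lock()
        self.cap_gen = 0
        self.cap_ttl = ttl

    def invalidate(self, now):
        # One of the spots where both fields must change together.
        with self.gen_ttl_lock:
            self.cap_gen += 1
            self.cap_ttl = now - 1

    def sample(self):
        # Coherent snapshot of both fields, safe to call under other locks.
        with self.gen_ttl_lock:
            return self.cap_gen, self.cap_ttl

def lease_is_valid(session, lease_gen, now):
    """Simplified check: the lease generation matches and the ttl has not passed."""
    gen, ttl = session.sample()
    return gen == lease_gen and now < ttl

session = SessionModel(ttl=100)
valid_before = lease_is_valid(session, lease_gen=0, now=50)
session.invalidate(now=60)
valid_after = lease_is_valid(session, lease_gen=0, now=50)
```

Because `sample()` returns both values from a single critical section, a reader can never observe a new generation paired with an old ttl, or the reverse.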
  13. 04 Jan, 2012 1 commit
  14. 08 Dec, 2011 1 commit
    • S
      ceph: use i_ceph_lock instead of i_lock · be655596
      Sage Weil committed
      We have been using i_lock to protect all kinds of data structures in the
      ceph_inode_info struct, including lists of inodes that we need to iterate
      over while avoiding races with inode destruction.  That requires grabbing
      a reference to the inode while holding the list lock, but igrab() now
      takes i_lock to check the inode flags.
      
      Changing the list lock ordering would be a painful process.
      
      However, using a ceph-specific i_ceph_lock in the ceph inode instead of
      i_lock is a simple mechanical change and avoids the ordering constraints
      imposed by igrab().
      Reported-by: Amon Ott <a.ott@m-privacy.de>
      Signed-off-by: Sage Weil <sage@newdream.net>
  15. 06 Nov, 2011 1 commit
  16. 02 Nov, 2011 1 commit
  17. 26 Oct, 2011 1 commit
  18. 21 Jul, 2011 1 commit
    • J
      fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik committed
      Btrfs needs to be able to control how filemap_write_and_wait_range() is called
      in fsync to make it less of a painful operation, so push down taking i_mutex and
      the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
      file systems can drop taking the i_mutex altogether it seems, like ext3 and
      ocfs2.  For correctness' sake I just pushed everything down in all cases to make
      sure that we keep the current behavior the same for everybody, and then each
      individual fs maintainer can make up their mind about what to do from there.
      Thanks,
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  19. 08 Jun, 2011 1 commit
  20. 25 May, 2011 1 commit
    • S
      ceph: fix cap flush race reentrancy · db354052
      Sage Weil committed
      In e9964c10 we changed cap flushing to do a delicate dance because some
      inodes on the cap_dirty list could be in a migrating state (got EXPORT but
      not IMPORT) in which we couldn't actually flush and move from
      dirty->flushing, breaking the while (!empty) { process first } loop
      structure.  It worked for a single sync thread, but was not reentrant and
      triggered infinite loops when multiple syncers came along.
      
      Instead, move inodes with dirty caps to a separate cap_dirty_migrating list
      when in the limbo export-but-no-import state, allowing us to go back to
      the simple loop structure (which was reentrant).  This is cleaner and more
      robust.
      
      Audited the cap_dirty users and this looks fine:
      list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we
      have dirty caps (which list we're on is irrelevant) and list_del_init()
      calls still do the right thing.
      Signed-off-by: Sage Weil <sage@newdream.net>
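The two-list arrangement from "ceph: fix cap flush race reentrancy" can be sketched as a toy model. This is Python, not kernel code: `MdsClientModel` and its method names are hypothetical, the lists hold plain inode numbers instead of linked list nodes, and moving an inode back to `cap_dirty` when the import arrives is an assumption of the model rather than something the commit message states.

```python
from collections import deque

class MdsClientModel:
    """Hypothetical model of the cap_dirty and cap_dirty_migrating lists."""
    def __init__(self):
        self.cap_dirty = deque()
        self.cap_dirty_migrating = deque()
        self.flushing = []

    def mark_dirty(self, ino):
        self.cap_dirty.append(ino)

    def export_received(self, ino):
        # EXPORT seen but no IMPORT yet: park the inode off the main list.
        if ino in self.cap_dirty:
            self.cap_dirty.remove(ino)
            self.cap_dirty_migrating.append(ino)

    def import_received(self, ino):
        # Assumption: the inode becomes flushable again once imported.
        if ino in self.cap_dirty_migrating:
            self.cap_dirty_migrating.remove(ino)
            self.cap_dirty.append(ino)

    def is_dirty(self, ino):
        # Being on either list means "has dirty caps".
        return ino in self.cap_dirty or ino in self.cap_dirty_migrating

    def flush_dirty(self):
        # The simple loop: every entry on cap_dirty can make progress,
        # so the loop always terminates, whoever calls it.
        while self.cap_dirty:
            self.flushing.append(self.cap_dirty.popleft())

mdsc = MdsClientModel()
for ino in (1, 2, 3):
    mdsc.mark_dirty(ino)
mdsc.export_received(2)
mdsc.flush_dirty()
first_pass = list(mdsc.flushing)
still_dirty = mdsc.is_dirty(2)
mdsc.import_received(2)
mdsc.flush_dirty()
```

Since unflushable inodes never sit at the head of `cap_dirty`, each call to `flush_dirty()` removes one entry per iteration, which is the property that makes the simple loop safe for several syncers in this model.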
  21. 20 May, 2011 1 commit
  22. 12 May, 2011 1 commit
  23. 05 May, 2011 1 commit