1. 28 July 2016 (9 commits)
  2. 01 June 2016 (1 commit)
    • ceph: improve fscache revalidation · f7f7e7a0
      Committed by Yan, Zheng
      There are several issues in the fscache revalidation code.
      - In ceph_revalidate_work(), fscache_invalidate() is called when
        fscache_check_consistency() returns 0. This is completely wrong,
        because 0 means the cache is valid.
      - handle_cap_grant() calls ceph_queue_revalidate() if the client
        already has CAP_FILE_CACHE. This code is confusing. The client
        should revalidate the cache each time it is granted CAP_FILE_CACHE
        anew.
      - In handle_cap_grant(), fscache_invalidate() is called if the MDS
        revokes CAP_FILE_CACHE. This is inconsistent with the case where
        the inode gets evicted: in the latter case the cache is not
        discarded, and the client may use it when the inode is reloaded.
      
      This patch moves the fscache revalidation into ceph_get_caps().
      The client revalidates the cache after it gets CAP_FILE_CACHE.
      i_rdcache_gen should remain constant while CAP_FILE_CACHE is
      used. If i_fscache_gen is not equal to i_rdcache_gen, the client
      needs to check the cache's consistency.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      f7f7e7a0
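      A minimal userspace sketch of the generation check described above. The
      struct and field names mirror the commit message (i_rdcache_gen,
      i_fscache_gen), but this is illustrative, not the actual kernel code:

          #include <stdbool.h>
          #include <stdio.h>

          /* Illustrative model: only the two generation counters matter here. */
          struct ci_model {
              unsigned i_rdcache_gen;  /* bumped when the page cache is invalidated */
              unsigned i_fscache_gen;  /* generation the fscache data was last validated for */
          };

          /* After CAP_FILE_CACHE is granted: revalidate only if the generations differ. */
          static bool need_fscache_revalidate(const struct ci_model *ci)
          {
              return ci->i_fscache_gen != ci->i_rdcache_gen;
          }

          int main(void)
          {
              struct ci_model ci = { .i_rdcache_gen = 3, .i_fscache_gen = 2 };

              if (need_fscache_revalidate(&ci)) {
                  /* the consistency check would run here; on success, record
                   * which generation we validated against */
                  ci.i_fscache_gen = ci.i_rdcache_gen;
              }
              printf("fscache validated for gen %u\n", ci.i_fscache_gen);
              return 0;
          }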
  3. 26 May 2016 (6 commits)
  4. 24 April 2016 (2 commits)
  5. 11 April 2016 (1 commit)
  6. 26 March 2016 (4 commits)
    • ceph: kill ceph_get_dentry_parent_inode() · 641235d8
      Committed by Yan, Zheng
      Use the VFS helper dget_parent() instead.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      641235d8
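      For reference, the usual pattern with the generic VFS helper looks
      roughly like this (an illustrative kernel-style fragment, not the
      actual ceph hunk):

          /* dget_parent() returns the parent dentry with a reference held,
           * so the caller must drop it with dput() when done. */
          struct dentry *parent = dget_parent(dentry);
          struct inode *dir = d_inode(parent);

          /* ... use dir ... */

          dput(parent);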
    • ceph: fix security xattr deadlock · 315f2408
      Committed by Yan, Zheng
      When security is enabled, the security module can call the filesystem's
      getxattr/setxattr callbacks during d_instantiate(). For cephfs,
      d_instantiate() is usually called by the MDS dispatch thread while
      handling an MDS reply. If the MDS reply does not include xattrs and the
      corresponding caps, getxattr/setxattr needs to send a new request to
      the MDS and wait for the reply. This puts the MDS dispatch thread to
      sleep, so nobody handles later MDS replies.
      
      The fix is to make sure the lookup/atomic_open reply includes xattrs
      and the corresponding caps, so getxattr can be served from the cached
      xattrs. This requires some modification to both the MDS and the request
      message (the client tells the MDS what caps it wants; the MDS encodes
      the proper caps in the reply).
      
      The Smack security module may call setxattr during d_instantiate().
      Unlike getxattr, we can't force the MDS to issue CEPH_CAP_XATTR_EXCL
      to us, so just make setxattr return an error when called from the MDS
      dispatch thread.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      315f2408
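      A toy userspace model of the setxattr guard described in the last
      paragraph. The in_mds_dispatch flag and the -EBUSY error are
      illustrative assumptions; the real client detects the dispatch context
      differently:

          #include <errno.h>
          #include <stdbool.h>
          #include <stdio.h>

          /* Illustrative: pretend this flag is set while the MDS dispatch
           * thread is instantiating a dentry from an MDS reply. */
          static _Thread_local bool in_mds_dispatch;

          /* Model of setxattr: refuse to block when called from the dispatch
           * thread, since waiting for another MDS reply there would deadlock. */
          static int model_setxattr(const char *name, const char *value)
          {
              if (in_mds_dispatch)
                  return -EBUSY;  /* don't sleep waiting for the MDS here */

              /* normal path: may send a request to the MDS and wait */
              printf("set %s=%s\n", name, value);
              return 0;
          }

          int main(void)
          {
              in_mds_dispatch = true;
              printf("from dispatch: %d\n", model_setxattr("security.SMACK64", "x"));
              in_mds_dispatch = false;
              printf("normal path:   %d\n", model_setxattr("security.SMACK64", "x"));
              return 0;
          }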
    • ceph: kill ceph_empty_snapc · 34b759b4
      Committed by Ilya Dryomov
      ceph_empty_snapc->num_snaps == 0 at all times.  Passing such a snapc to
      ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is
      equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only
      for sizing the request message.
      
      Further, in all four cases the subsequent ceph_osdc_build_request() is
      passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps
      and making ceph_empty_snapc entirely useless.  The two cases where it
      actually mattered were removed in commits 86056090 ("ceph: avoid
      sending unnessesary FLUSHSNAP message") and 23078637 ("ceph: fix
      queuing inode to mdsdir's snaprealm").
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NYan, Zheng <zyan@redhat.com>
      34b759b4
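      A small model of why an always-empty snap context and NULL are
      interchangeable for request sizing. The struct below is a simplified
      stand-in for the real ceph_snap_context and message-sizing code:

          #include <stddef.h>
          #include <stdio.h>

          /* Simplified stand-in for struct ceph_snap_context. */
          struct snapc_model {
              unsigned num_snaps;
              unsigned long long snaps[];  /* one id per snapshot */
          };

          /* Message sizing only cares how many snap ids must be encoded. */
          static size_t snap_payload_size(const struct snapc_model *snapc)
          {
              unsigned n = snapc ? snapc->num_snaps : 0;

              return n * sizeof(unsigned long long);
          }

          int main(void)
          {
              static const struct snapc_model empty = { .num_snaps = 0 };

              /* NULL and an empty snapc size the request identically. */
              printf("%zu %zu\n", snap_payload_size(NULL), snap_payload_size(&empty));
              return 0;
          }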
    • ceph: don't enable rbytes mount option by default · 133e9156
      Committed by Yan, Zheng
      When the rbytes mount option is enabled, the directory size reported is
      the recursive size. The recursive size is not updated instantly, which
      can cause the directory size to change between successive stat(1) calls.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      133e9156
  7. 05 March 2016 (1 commit)
  8. 03 November 2015 (1 commit)
  9. 31 July 2015 (1 commit)
    • ceph: always re-send cap flushes when MDS recovers · fc927cd3
      Committed by Yan, Zheng
      Commit e548e9b9 makes the kclient re-send each cap flush only once
      during MDS failover. If the kclient sends a cap flush after the MDS
      enters the reconnect stage but before the MDS recovers, the kclient
      will skip re-sending the same cap flush when the MDS recovers.
      
      This causes problems for newly created inodes. The MDS handles cap
      flushes before replaying unsafe requests, so it's possible that the MDS
      finds the corresponding inode missing when handling a cap flush. The
      fix is to revert to the old behaviour: always re-send when the MDS
      recovers.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      fc927cd3
  10. 25 June 2015 (9 commits)
    • ceph: rework dcache readdir · fdd4e158
      Committed by Yan, Zheng
      Previously our dcache readdir code relied on child dentries in the
      directory dentry's d_subdirs list being sorted by dentry offset in
      descending order. When adding dentries to the dcache, if a dentry
      already existed, our readdir code moved it to the head of the directory
      dentry's d_subdirs list. This design relies on dcache internals.
      Al Viro suggested using ncpfs's approach: keep an array of pointers
      to dentries in the page cache of the directory inode. The validity of
      those pointers is represented by the directory inode's complete and
      ordered flags. When a dentry gets pruned, we clear the directory
      inode's complete flag in the d_prune() callback. Before moving a
      dentry to another directory, we clear the ordered flag for both the
      old and the new directory.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      fdd4e158
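      A compact userspace model of the scheme described above: an array of
      cached entries per directory plus complete/ordered validity flags. The
      names are illustrative; the real code keeps dentry pointers in the
      directory inode's page cache:

          #include <stdbool.h>
          #include <stdio.h>

          #define MAX_ENTRIES 4

          /* Illustrative stand-in for a directory inode caching its children. */
          struct dir_model {
              const char *entries[MAX_ENTRIES];  /* stands in for cached dentry pointers */
              unsigned nr;
              bool complete;  /* cleared (from d_prune()) when any child is pruned */
              bool ordered;   /* cleared when a child moves to another directory */
          };

          /* Serving readdir from the cached array is only safe while both flags hold. */
          static bool can_readdir_from_cache(const struct dir_model *dir)
          {
              return dir->complete && dir->ordered;
          }

          int main(void)
          {
              struct dir_model d = { .entries = { "a", "b" }, .nr = 2,
                                     .complete = true, .ordered = true };

              printf("cached readdir ok: %d\n", can_readdir_from_cache(&d));
              d.complete = false;          /* a child was pruned */
              printf("after prune:       %d\n", can_readdir_from_cache(&d));
              return 0;
          }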
    • f66fd9f0
    • ceph: re-send flushing caps (which are revoked) in reconnect stage · e548e9b9
      Committed by Yan, Zheng
      If flushing caps were revoked, we should re-send the cap flush during
      the client reconnect stage. This guarantees that the MDS processes the
      cap flush message before issuing the flushing caps to another client.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      e548e9b9
    • ceph: track pending caps flushing globally · 8310b089
      Committed by Yan, Zheng
      This way we know the TID of the oldest pending cap flush. A later patch
      will send this information to the MDS, so that the MDS can trim its
      completed cap flush list.
      
      Tracking pending cap flushes globally also simplifies the syncfs code.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      8310b089
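      A rough sketch of the global-tracking idea: pending cap flushes are
      kept in TID order, so the oldest in-flight TID is simply the first
      entry. The data structure and names are illustrative (the kernel uses
      its own list/tree machinery):

          #include <stdio.h>

          #define MAX_FLUSHES 8

          /* Illustrative global table of pending cap flushes, ordered by TID. */
          static unsigned long long pending_tids[MAX_FLUSHES];
          static unsigned nr_pending;
          static unsigned long long last_tid;

          static unsigned long long start_cap_flush(void)
          {
              unsigned long long tid = ++last_tid;  /* TIDs only ever grow */

              pending_tids[nr_pending++] = tid;
              return tid;
          }

          /* The oldest pending TID: what the client would report to the MDS so
           * it can trim its completed-flush list. 0 means nothing is in flight. */
          static unsigned long long oldest_flush_tid(void)
          {
              return nr_pending ? pending_tids[0] : 0;
          }

          int main(void)
          {
              start_cap_flush();
              start_cap_flush();
              printf("oldest pending flush tid: %llu\n", oldest_flush_tid());
              return 0;
          }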
    • ceph: track pending caps flushing accurately · 553adfd9
      Committed by Yan, Zheng
      Previously we did not track an accurate TID for flushing caps. When the
      MDS fails over, we have no choice but to re-send all flushing caps with
      a new TID. This can cause problems because the MDS may have already
      flushed some caps and issued the same caps to another client. The
      re-sent cap flush has a new TID, which makes the MDS unable to detect
      whether it has already processed the cap flush.
      
      This patch adds code to track pending cap flushes accurately. When
      re-sending a cap flush is needed, we use its original flush TID.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      553adfd9
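      A minimal sketch of the "keep the original TID on re-send" point: each
      pending flush remembers the TID it was first sent with, and failover
      re-sends that same TID instead of allocating a new one. Purely
      illustrative:

          #include <stdio.h>

          struct cap_flush_model {
              unsigned long long tid;  /* assigned once, when the flush is first queued */
              int caps;                /* which caps are being flushed (opaque here) */
          };

          /* On MDS recovery, re-send every pending flush with its original TID so
           * the MDS can recognize flushes it already processed before failing over. */
          static void resend_after_failover(const struct cap_flush_model *pending, int n)
          {
              for (int i = 0; i < n; i++)
                  printf("re-send flush tid=%llu caps=0x%x\n",
                         pending[i].tid, pending[i].caps);
          }

          int main(void)
          {
              struct cap_flush_model pending[] = { { 101, 0x1 }, { 102, 0x5 } };

              resend_after_failover(pending, 2);
              return 0;
          }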
    • ceph: don't pre-allocate space for cap release messages · 745a8e3b
      Committed by Yan, Zheng
      Previously we pre-allocated a cap release message for each cap. This
      wastes a lot of memory when there are large numbers of caps. This patch
      makes the code not pre-allocate the cap release messages. Instead, we
      add the corresponding ceph_cap struct to a list when releasing a cap.
      Later, when flushing cap releases is needed, we allocate the cap
      release messages dynamically.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      745a8e3b
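      A simplified model of the allocation change: released caps go onto a
      list, and the release messages are only built when the releases are
      actually flushed. Names and structure are illustrative, not the real
      per-session code:

          #include <stdio.h>
          #include <stdlib.h>

          /* Illustrative: a released cap waiting to be reported to the MDS. */
          struct cap_model {
              unsigned long long ino;
              struct cap_model *next;
          };

          static struct cap_model *release_list;  /* per-session list in the real code */

          static void queue_cap_release(unsigned long long ino)
          {
              struct cap_model *cap = malloc(sizeof(*cap));  /* no message allocated yet */

              cap->ino = ino;
              cap->next = release_list;
              release_list = cap;
          }

          /* Only now, when we actually flush, do we build the release message(s). */
          static void flush_cap_releases(void)
          {
              while (release_list) {
                  struct cap_model *cap = release_list;

                  release_list = cap->next;
                  printf("encode release for ino %llu\n", cap->ino);
                  free(cap);
              }
          }

          int main(void)
          {
              queue_cap_release(0x1000);
              queue_cap_release(0x1001);
              flush_cap_releases();
              return 0;
          }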
    • ceph: avoid sending unnecessary FLUSHSNAP message · 86056090
      Committed by Yan, Zheng
      When a snap notification contains no new snapshot, we can avoid sending
      a FLUSHSNAP message to the MDS. But we still need to create a cap_snap
      in some cases because it's required by the write path and the page
      writeback path.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      86056090
    • ceph: use empty snap context for uninline_data and get_pool_perm · 7b06a826
      Committed by Yan, Zheng
      The cached_context in ceph_snap_realm is directly accessed by
      uninline_data() and get_pool_perm(). This is racy in theory. Both
      uninline_data() and get_pool_perm() do not modify existing objects;
      they only create new objects. So we can pass the empty snap context
      to them. Unlike the cached_context in ceph_snap_realm, we do not need
      to protect the empty snap context.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      7b06a826
    • ceph: check OSD caps before read/write · 10183a69
      Committed by Yan, Zheng
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      10183a69
  11. 20 April 2015 (2 commits)
  12. 19 February 2015 (2 commits)
  13. 18 December 2014 (1 commit)
    • ceph: flush inline version · e20d258d
      Committed by Yan, Zheng
      After converting inline data to normal data, the client needs to flush
      the new i_inline_version (CEPH_INLINE_NONE) to the MDS. This commit
      makes cap messages (sent to the MDS) contain inline_version and
      inline_data. The client always converts inline data to normal data
      before writing data, so the inline data length part is always zero.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      e20d258d
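      A tiny sketch of the cap-message payload change described above: the
      message carries an inline_version plus an inline_data blob whose length
      is always zero once the data has been converted. Field names follow the
      commit message; the layout itself is illustrative:

          #include <stdint.h>
          #include <stdio.h>

          /* "no inline data" marker, as used by the kernel client */
          #define CEPH_INLINE_NONE ((uint64_t)-1)

          /* Illustrative cap-message tail: inline state flushed to the MDS. */
          struct cap_msg_inline {
              uint64_t inline_version;
              uint32_t inline_len;  /* always 0: data was converted before any write */
          };

          int main(void)
          {
              struct cap_msg_inline tail = {
                  .inline_version = CEPH_INLINE_NONE,
                  .inline_len = 0,
              };

              printf("flush inline_version=%llu len=%u\n",
                     (unsigned long long)tail.inline_version, tail.inline_len);
              return 0;
          }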