1. 26 May, 2016 1 commit
  2. 26 March, 2016 4 commits
    • Y
      ceph: kill ceph_get_dentry_parent_inode() · 641235d8
      Committed by Yan, Zheng
      Use the VFS helper dget_parent() instead.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      641235d8
    • Y
      ceph: fix security xattr deadlock · 315f2408
      Committed by Yan, Zheng
      When security is enabled, the security module can call the filesystem's
      getxattr/setxattr callbacks during d_instantiate(). For cephfs,
      d_instantiate() is usually called by the MDS dispatch thread while
      handling an MDS reply. If the MDS reply does not include xattrs and
      the corresponding caps, getxattr/setxattr needs to send a new request
      to the MDS and wait for the reply. This puts the MDS dispatch thread
      to sleep, so nobody handles later MDS replies.

      The fix is to make sure lookup/atomic_open replies include xattrs and
      the corresponding caps, so getxattr can be served from cached xattrs.
      This requires some modification to both the MDS and the request message.
      (The client tells the MDS what caps it wants; the MDS encodes the proper
      caps in the reply.)

      The Smack security module may call setxattr during d_instantiate().
      Unlike getxattr, we can't force the MDS to issue CEPH_CAP_XATTR_EXCL
      to us, so just make setxattr return an error when called by the MDS
      dispatch thread.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      315f2408
    • I
      ceph: kill ceph_empty_snapc · 34b759b4
      Committed by Ilya Dryomov
      ceph_empty_snapc->num_snaps == 0 at all times.  Passing such a snapc to
      ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is
      equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only
      for sizing the request message.
      
      Further, in all four cases the subsequent ceph_osdc_build_request() is
      passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps
      and making ceph_empty_snapc entirely useless.  The two cases where it
      actually mattered were removed in commits 86056090 ("ceph: avoid
      sending unnessesary FLUSHSNAP message") and 23078637 ("ceph: fix
      queuing inode to mdsdir's snaprealm").
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Yan, Zheng <zyan@redhat.com>
      34b759b4
    • Y
      ceph: don't enable rbytes mount option by default · 133e9156
      Committed by Yan, Zheng
      When the rbytes mount option is enabled, a directory's size is its
      recursive size. The recursive size is not updated instantly, which can
      cause the directory size to change between successive stat(1) calls.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      133e9156
  3. 05 March, 2016 1 commit
  4. 03 November, 2015 1 commit
  5. 31 July, 2015 1 commit
    • Y
      ceph: always re-send cap flushes when MDS recovers · fc927cd3
      Committed by Yan, Zheng
      Commit e548e9b9 makes the kclient re-send a cap flush only once
      during MDS failover. If the kclient sends a cap flush after the MDS
      enters the reconnect stage but before the MDS recovers, the kclient
      will skip re-sending the same cap flush when the MDS recovers.

      This causes a problem for newly created inodes. The MDS handles cap
      flushes before replaying unsafe requests, so it's possible that the MDS
      finds the corresponding inode missing when handling the cap flush. The
      fix is to revert to the old behaviour: always re-send when the MDS recovers.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      fc927cd3
  6. 25 June, 2015 9 commits
    • Y
      ceph: rework dcache readdir · fdd4e158
      Committed by Yan, Zheng
      Previously our dcache readdir code relied on the child dentries in the
      directory dentry's d_subdir list being sorted by dentry offset in
      descending order. When adding dentries to the dcache, if a dentry
      already exists, our readdir code moves it to the head of the directory
      dentry's d_subdir list. This design relies on dcache internals.
      Al Viro suggests using ncpfs's approach: keeping an array of pointers
      to dentries in the page cache of the directory inode. The validity of
      those pointers is represented by the directory inode's complete and
      ordered flags. When a dentry gets pruned, we clear the directory inode's
      complete flag in the d_prune() callback. Before moving a dentry to
      another directory, we clear the ordered flag for both the old and new
      directories.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      fdd4e158
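The flag-guarded pointer array described in this commit can be illustrated with a minimal user-space sketch. Everything here is hypothetical and simplified: `struct dir_cache`, the `DIR_COMPLETE`/`DIR_ORDERED` names, and the helper functions are stand-ins invented for illustration, not the kernel's actual types or API. The point is only that cached entries are trusted while both flags are set, a prune clears "complete", and a move clears "ordered" on both directories.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag names standing in for the directory inode's
 * "complete" and "ordered" flags mentioned in the commit message. */
#define DIR_COMPLETE 0x1u
#define DIR_ORDERED  0x2u
#define CACHE_SLOTS  8

/* Stand-in for the per-directory array of dentry pointers. */
struct dir_cache {
	unsigned int flags;
	size_t count;
	const char *entries[CACHE_SLOTS];
};

/* The cached array may be used only while both flags are set. */
static int dir_cache_valid(const struct dir_cache *d)
{
	unsigned int need = DIR_COMPLETE | DIR_ORDERED;

	return (d->flags & need) == need;
}

/* Analogue of the d_prune() callback: an entry went away, so the
 * cached listing can no longer be assumed complete. */
static void dir_entry_pruned(struct dir_cache *d)
{
	d->flags &= ~DIR_COMPLETE;
}

/* Analogue of a rename across directories: ordering is no longer
 * trustworthy in either the old or the new directory. */
static void dir_entry_moved(struct dir_cache *from, struct dir_cache *to)
{
	from->flags &= ~DIR_ORDERED;
	to->flags &= ~DIR_ORDERED;
}

/* Return the cached entry, or NULL if the caller must fall back to
 * asking the server. */
static const char *dir_cache_get(const struct dir_cache *d, size_t idx)
{
	if (!dir_cache_valid(d) || idx >= d->count)
		return NULL;
	return d->entries[idx];
}
```

In this sketch a NULL return models "fall back to a real readdir request"; how the kernel client actually falls back is not shown in the commit message.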
    • Y
      f66fd9f0
    • Y
      ceph: re-send flushing caps (which are revoked) in reconnect stage · e548e9b9
      Committed by Yan, Zheng
      If flushing caps were revoked, we should re-send the cap flush in the
      client reconnect stage. This guarantees that the MDS processes the cap
      flush message before issuing the flushing caps to another client.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      e548e9b9
    • Y
      ceph: track pending caps flushing globally · 8310b089
      Committed by Yan, Zheng
      This lets us know the TID of the oldest pending caps flush. A later
      patch will send this information to the MDS, so that the MDS can trim
      its list of completed caps flushes.

      Tracking pending caps flushes globally also simplifies the syncfs code.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      8310b089
    • Y
      ceph: track pending caps flushing accurately · 553adfd9
      Committed by Yan, Zheng
      Previously we did not track an accurate TID for flushing caps. When
      the MDS fails over, we have no choice but to re-send all flushing caps
      with a new TID. This can cause problems because the MDS may have already
      flushed some caps and issued the same caps to another client.
      The re-sent cap flush has a new TID, which makes the MDS unable to
      detect whether it has already processed the cap flush.

      This patch adds code to track pending caps flushes accurately.
      When a cap flush needs to be re-sent, we use its original flush
      TID.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      553adfd9
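The idea of re-sending a flush under its original TID can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel client's code: `struct flush_tracker`, the fixed-size table, and all function names are hypothetical. Each flush records the TID it was first sent with, a re-send reports those same TIDs, and the oldest pending TID can be queried.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16

/* One in-flight cap flush, remembered with the TID it was sent under. */
struct cap_flush {
	uint64_t tid;
	int caps;
	int in_use;
};

/* Hypothetical tracker; zero-initialize before use. */
struct flush_tracker {
	uint64_t last_tid;
	struct cap_flush pending[MAX_PENDING];
};

/* Record a new flush and return its TID, or 0 if the table is full. */
static uint64_t flush_start(struct flush_tracker *t, int caps)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!t->pending[i].in_use) {
			t->pending[i].tid = ++t->last_tid;
			t->pending[i].caps = caps;
			t->pending[i].in_use = 1;
			return t->pending[i].tid;
		}
	}
	return 0;
}

/* The server acknowledged a flush: forget it. Returns 1 if found. */
static int flush_ack(struct flush_tracker *t, uint64_t tid)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (t->pending[i].in_use && t->pending[i].tid == tid) {
			t->pending[i].in_use = 0;
			return 1;
		}
	}
	return 0;
}

/* On recovery, collect the TIDs to re-send. The TIDs are the original
 * ones, so the receiver can recognize flushes it already processed. */
static size_t flush_resend(const struct flush_tracker *t,
			   uint64_t *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < MAX_PENDING && n < max; i++) {
		if (t->pending[i].in_use)
			out[n++] = t->pending[i].tid;
	}
	return n;
}

/* TID of the oldest pending flush, or 0 if nothing is pending. */
static uint64_t flush_oldest_tid(const struct flush_tracker *t)
{
	uint64_t oldest = 0;

	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (t->pending[i].in_use &&
		    (oldest == 0 || t->pending[i].tid < oldest))
			oldest = t->pending[i].tid;
	}
	return oldest;
}
```

A fixed array keeps the sketch short; a real implementation would more likely use an ordered structure keyed by TID.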
    • Y
      ceph: don't pre-allocate space for cap release messages · 745a8e3b
      Committed by Yan, Zheng
      Previously we pre-allocated cap release messages for each cap. This
      wastes a lot of memory when there is a large number of caps. This patch
      makes the code not pre-allocate the cap release messages. Instead,
      we add the corresponding ceph_cap struct to a list when releasing a
      cap. Later, when cap releases need to be flushed, we allocate the cap
      release messages dynamically.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      745a8e3b
    • Y
      ceph: avoid sending unnessesary FLUSHSNAP message · 86056090
      Committed by Yan, Zheng
      When a snap notification contains no new snapshot, we can avoid
      sending a FLUSHSNAP message to the MDS. But we still need to create
      a cap_snap in some cases because it's required by the write path and
      the page writeback path.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      86056090
    • Y
      ceph: use empty snap context for uninline_data and get_pool_perm · 7b06a826
      Committed by Yan, Zheng
      The cached_context in ceph_snap_realm is directly accessed by
      uninline_data() and get_pool_perm(). This is racy in theory.
      Neither uninline_data() nor get_pool_perm() modifies existing
      objects; they only create new objects. So we can pass the empty
      snap context to them. Unlike cached_context in ceph_snap_realm,
      we do not need to protect the empty snap context.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      7b06a826
    • Y
      ceph: check OSD caps before read/write · 10183a69
      Committed by Yan, Zheng
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      10183a69
  7. 20 April, 2015 2 commits
  8. 19 February, 2015 2 commits
  9. 18 December, 2014 6 commits
  10. 15 October, 2014 3 commits
    • J
      ceph: additional debugfs output · 14ed9703
      Committed by John Spray
      MDS session state and client global ID are
      useful instrumentation when testing.
      Signed-off-by: John Spray <john.spray@redhat.com>
      14ed9703
    • Y
      ceph: include the initial ACL in create/mkdir/mknod MDS requests · b1ee94aa
      Committed by Yan, Zheng
      The current code sets a new file/directory's initial ACL in a non-atomic
      manner.
      The client first sends a request to the MDS to create the new
      file/directory, then sets the initial ACL after the new file/directory
      is successfully created.

      The fix is to include the initial ACL in create/mkdir/mknod MDS requests,
      so the MDS can handle creating the file/directory and setting the
      initial ACL in one request.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
      b1ee94aa
    • Y
      ceph: request xattrs if xattr_version is zero · 508b32d8
      Committed by Yan, Zheng
      The following sequence of events can happen.
        - Client releases an inode, queues cap release message.
        - A 'lookup' reply brings the same inode back, but the reply
          doesn't contain xattrs because MDS didn't receive the cap release
          message and thought client already has up-to-date xattrs.

      The fix is to force sending a getattr request to the MDS if xattrs_version
      is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so the MDS knows
      the client does not have xattrs.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      508b32d8
  11. 06 June, 2014 1 commit
  12. 29 April, 2014 1 commit
    • Y
      ceph: clear directory's completeness when creating file · 0a8a70f9
      Committed by Yan, Zheng
      When creating a file, ceph_set_dentry_offset() puts the new dentry
      at the end of the directory's d_subdirs, then sets the dentry's offset
      based on the directory's max offset. The offset does not reflect the
      real position of the dentry in the directory. A later readdir reply from
      the MDS may change the dentry's position/offset. This inconsistency
      can cause missing/duplicate entries in the readdir result if readdir
      is partly satisfied by dcache_readdir().

      The fix is to clear the directory's completeness after creating/renaming
      a file. This prevents later readdir calls from using dcache_readdir().

      Fixes: http://tracker.ceph.com/issues/8025
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      0a8a70f9
  13. 05 April, 2014 1 commit
    • Y
      ceph: use fl->fl_file as owner identifier of flock and posix lock · eb13e832
      Committed by Yan, Zheng
      flock and posix locks should use fl->fl_file instead of the process ID
      as the owner identifier. (A posix lock uses fl->fl_owner. fl->fl_owner
      is usually equal to fl->fl_file, but it can also be a customized
      value.) The process ID of the lock holder is only used for the F_GETLK
      fcntl(2).

      The fix is to rename the 'pid' fields of struct ceph_mds_request_args
      and struct ceph_filelock to 'owner', and rename the 'pid_namespace'
      fields to 'pid'. fl->fl_file is assigned to the 'owner' field of lock
      messages. We also set the most significant bit of the 'owner' field.
      The MDS can use that bit to distinguish between old and new clients.

      The MDS counterpart of this patch modifies the flock code to not
      take the 'pid_namespace' into consideration when checking for
      conflicting locks.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      eb13e832
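The "set the most significant bit of the owner field" step can be shown with a small user-space sketch. The flag name `LOCK_OWNER_NEW_CLIENT` and both helpers are assumptions made for illustration; the commit message only states that the top bit is set and that the MDS can test it. An arbitrary pointer stands in for fl->fl_file or fl->fl_owner.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the marker bit; the commit message only says
 * that the most significant bit of the 64-bit owner field is set. */
#define LOCK_OWNER_NEW_CLIENT (1ULL << 63)

/* Build the owner value from an opaque owner pointer (a stand-in for
 * fl->fl_file or fl->fl_owner) and tag it as coming from a new client. */
static uint64_t make_lock_owner(const void *owner)
{
	return (uint64_t)(uintptr_t)owner | LOCK_OWNER_NEW_CLIENT;
}

/* What a server could do to tell tagged owners apart from old-style
 * values that carry a plain process ID. */
static int owner_is_new_style(uint64_t owner)
{
	return (owner & LOCK_OWNER_NEW_CLIENT) != 0;
}
```

A plain process ID such as 1234 never has bit 63 set, so it is classified as old-style, while the same owner pointer always maps to the same tagged value.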
  14. 03 April, 2014 1 commit
    • Y
      ceph: fix ceph_dir_llseek() · f0494206
      Committed by Yan, Zheng
      Comparing the offset with inode->i_sb->s_maxbytes doesn't make sense for
      a directory. For a fragmented directory, the offset (frag_t, off) can be
      larger than inode->i_sb->s_maxbytes.

      At the very beginning of ceph_dir_llseek(), the local variable old_offset
      is initialized to the parameter offset. This doesn't make sense either;
      old_offset should be ceph_make_fpos(fi->frag, fi->next_offset).
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
      f0494206
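To see why a (frag, off) directory position can exceed a file-size limit, here is a minimal sketch that packs the fragment into the high 32 bits and the offset into the low 32 bits. That bit layout is an assumption for illustration, as are `pos_t`, `make_fpos()`, and the 1 TiB `EXAMPLE_MAXBYTES` limit; the commit message only says the offset is composed of a frag_t and an offset.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t pos_t;	/* stand-in for the kernel's loff_t */

/* Hypothetical size limit used only for this example (1 TiB). */
#define EXAMPLE_MAXBYTES ((pos_t)1 << 40)

/* Pack a fragment id and an in-fragment offset into one position.
 * Assumed layout: frag in the high 32 bits, off in the low 32 bits. */
static pos_t make_fpos(uint32_t frag, uint32_t off)
{
	return (pos_t)(((uint64_t)frag << 32) | (uint64_t)off);
}

/* Recover the fragment id from a packed position. */
static uint32_t fpos_frag(pos_t pos)
{
	return (uint32_t)((uint64_t)pos >> 32);
}

/* Recover the in-fragment offset from a packed position. */
static uint32_t fpos_off(pos_t pos)
{
	return (uint32_t)((uint64_t)pos & 0xffffffffu);
}
```

With this layout, any nonzero fragment id of 0x100 or more already pushes the packed position past the example limit, even though the directory itself is small, which is why a byte-size bound is the wrong check for such positions.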
  15. 18 February, 2014 1 commit
  16. 31 January, 2014 1 commit
  17. 30 January, 2014 1 commit
  18. 29 January, 2014 1 commit
    • L
      ceph: Fix up after semantic merge conflict · 4db658ea
      Committed by Linus Torvalds
      The previous ceph-client merge resulted in ceph not even building,
      because there was a merge conflict that wasn't visible as an actual data
      conflict: commit 7221fe4c ("ceph: add acl for cephfs") added support
      for POSIX ACL's into Ceph, but unluckily we also had the VFS tree change
      a lot of the POSIX ACL helper functions to be much more helpful to
      filesystems (see for example commits 2aeccbe9 "fs: add generic
      xattr_acl handlers", 5bf3258f "fs: make posix_acl_chmod more useful"
      and 37bc1539 "fs: make posix_acl_create more useful")
      
      The reason this conflict wasn't obvious was many-fold: because it was a
      semantic conflict rather than a data conflict, it wasn't visible in the
      git merge as a conflict.  And because the VFS tree hadn't been in
      linux-next, people hadn't become aware of it that way.  And because I
      was at jury duty this morning, I was using my laptop and as a result not
      doing constant "allmodconfig" builds.
      
      Anyway, this fixes the build and generally removes a fair chunk of the
      Ceph POSIX ACL support code, since the improved helpers seem to match
      really well for Ceph too.  But I don't actually have any way to *test*
      the end result, and I was really hoping for some ACK's for this.  Oh,
      well.
      
      Not compiling certainly doesn't make things easier to test, so I'm
      committing this without the acks after having waited for four hours...
      Plus it's what I would have done for the merge had I noticed the
      semantic conflict..
      Reported-by: Dave Jones <davej@redhat.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Guangliang Zhao <lucienchao@gmail.com>
      Cc: Li Wang <li.wang@ubuntykylin.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4db658ea
  19. 21 January, 2014 2 commits