1. 26 March 2016, 2 commits
    • ceph: kill ceph_empty_snapc · 34b759b4
      Committed by Ilya Dryomov
      ceph_empty_snapc->num_snaps == 0 at all times.  Passing such a snapc to
      ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is
      equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only
      for sizing the request message.
      
      Further, in all four cases the subsequent ceph_osdc_build_request() is
      passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps
      and making ceph_empty_snapc entirely useless.  The two cases where it
      actually mattered were removed in commits 86056090 ("ceph: avoid
      sending unnessesary FLUSHSNAP message") and 23078637 ("ceph: fix
      queuing inode to mdsdir's snaprealm").
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Yan, Zheng <zyan@redhat.com>
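
      The equivalence described above (an always-empty snapc sizes a request
      message the same way as a NULL snapc) can be illustrated with a minimal
      user-space sketch; the struct and function below are illustrative
      stand-ins, not the kernel's ceph_osdc_alloc_request():

        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* Simplified stand-in for a snap context: a seq plus a list of snap ids. */
        struct snap_context {
            uint64_t seq;
            uint32_t num_snaps;
            uint64_t snaps[];
        };

        /* Bytes the snapc contributes to the request message: seq, count, snap ids. */
        static size_t snapc_encoded_size(const struct snap_context *snapc)
        {
            uint32_t n = snapc ? snapc->num_snaps : 0;

            return sizeof(uint64_t) + sizeof(uint32_t) + n * sizeof(uint64_t);
        }

        int main(void)
        {
            /* ceph_empty_snapc always had num_snaps == 0, like this one. */
            struct snap_context *empty = calloc(1, sizeof(*empty));

            /* An always-empty snapc and NULL size the message identically. */
            printf("empty: %zu bytes, NULL: %zu bytes\n",
                   snapc_encoded_size(empty), snapc_encoded_size(NULL));
            free(empty);
            return 0;
        }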
    • ceph: don't enable rbytes mount option by default · 133e9156
      Committed by Yan, Zheng
      When the rbytes mount option is enabled, a directory's size is its
      recursive size. The recursive size is not updated instantly, which can
      cause the directory size to change between successive stat(1) calls.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
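
      One way to observe the behaviour described above is to stat the same
      CephFS directory twice while data is being written somewhere in its
      subtree: with rbytes enabled, st_size reports the recursive size and may
      differ between the two calls. A small sketch, assuming a CephFS mount at
      a placeholder path:

        #include <stdio.h>
        #include <sys/stat.h>
        #include <unistd.h>

        int main(void)
        {
            /* Placeholder: any directory on a CephFS mount with rbytes enabled. */
            const char *dir = "/mnt/cephfs/some-dir";
            struct stat st1, st2;

            if (stat(dir, &st1) != 0)
                return 1;
            sleep(5);                /* give the MDS time to propagate rstats */
            if (stat(dir, &st2) != 0)
                return 1;

            /* With rbytes, st_size is the recursive size, so it may change even
             * though this directory itself was never modified directly. */
            printf("size before: %lld, size after: %lld\n",
                   (long long)st1.st_size, (long long)st2.st_size);
            return 0;
        }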
  2. 05 March 2016, 1 commit
  3. 03 November 2015, 1 commit
  4. 31 July 2015, 1 commit
    • ceph: always re-send cap flushes when MDS recovers · fc927cd3
      Committed by Yan, Zheng
      Commit e548e9b9 made the kclient re-send a cap flush only once during
      MDS failover. If the kclient sends a cap flush after the MDS enters the
      reconnect stage but before the MDS recovers, the kclient will skip
      re-sending that cap flush once the MDS has recovered.

      This causes problems for newly created inodes. The MDS handles cap
      flushes before replaying unsafe requests, so it is possible that the MDS
      finds the corresponding inode missing when handling a cap flush. The fix
      is to revert to the old behaviour: always re-send cap flushes when the
      MDS recovers.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
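
      A rough user-space model of the behaviour this commit restores: every
      pending cap flush stays on a list, and when the MDS recovers the client
      walks the list and re-sends each flush, regardless of when it was first
      sent. The names are illustrative and this is not the kernel's cap-flush
      code:

        #include <stdint.h>
        #include <stdio.h>

        /* One pending cap flush, remembered until the MDS acknowledges it. */
        struct cap_flush {
            uint64_t tid;      /* transaction id of the original flush */
            uint64_t ino;      /* inode the dirty caps belong to */
            int      sent;     /* already sent to the current MDS incarnation? */
        };

        static void send_cap_flush(const struct cap_flush *cf)
        {
            printf("sending cap flush ino=0x%llx tid=%llu\n",
                   (unsigned long long)cf->ino, (unsigned long long)cf->tid);
        }

        /* On MDS recovery, re-send every pending flush (the pre-e548e9b9
         * behaviour this commit reverts to), so a flush issued during the
         * reconnect window is not lost. */
        static void mds_recovered(struct cap_flush *flushes, int n)
        {
            for (int i = 0; i < n; i++) {
                flushes[i].sent = 0;      /* forget any "already sent" state */
                send_cap_flush(&flushes[i]);
                flushes[i].sent = 1;
            }
        }

        int main(void)
        {
            struct cap_flush pending[] = {
                { .tid = 101, .ino = 0x1000, .sent = 1 },
                { .tid = 102, .ino = 0x1001, .sent = 0 }, /* sent during reconnect */
            };

            mds_recovered(pending, 2);
            return 0;
        }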
  5. 25 June 2015, 9 commits
    • ceph: rework dcache readdir · fdd4e158
      Committed by Yan, Zheng
      Previously our dcache readdir code relied on the child dentries in a
      directory dentry's d_subdirs list being sorted by dentry offset in
      descending order. When adding dentries to the dcache, if a dentry
      already exists, our readdir code moves it to the head of the directory
      dentry's d_subdirs list. This design relies on dcache internals.
      Al Viro suggested using ncpfs's approach: keep an array of pointers to
      dentries in the page cache of the directory inode. The validity of those
      pointers is indicated by the directory inode's complete and ordered
      flags. When a dentry gets pruned, we clear the directory inode's
      complete flag in the d_prune() callback. Before moving a dentry to
      another directory, we clear the ordered flag for both the old and the
      new directory.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
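
      A much-simplified model of the scheme described above: the directory
      keeps an array of pointers to its child entries plus complete and
      ordered flags that say whether that cached view may serve a readdir.
      Pruning an entry clears complete; moving an entry clears ordered on both
      directories. Structure and function names here are illustrative only:

        #include <stdbool.h>
        #include <stddef.h>

        #define MAX_ENTRIES 128

        struct dir_cache {
            const char *entries[MAX_ENTRIES]; /* stand-in for cached dentry pointers */
            size_t      nr;
            bool        complete;  /* cache holds every child of the directory */
            bool        ordered;   /* cached order still matches readdir order */
        };

        /* readdir may be served from the cache only while both flags hold */
        static bool can_use_cached_readdir(const struct dir_cache *d)
        {
            return d->complete && d->ordered;
        }

        /* d_prune-style callback: a cached entry went away, the view is incomplete */
        static void on_dentry_pruned(struct dir_cache *d)
        {
            d->complete = false;
        }

        /* rename/move: the ordering of both the old and new directory is suspect */
        static void on_dentry_moved(struct dir_cache *olddir, struct dir_cache *newdir)
        {
            olddir->ordered = false;
            newdir->ordered = false;
        }

        int main(void)
        {
            struct dir_cache a = { .complete = true, .ordered = true };
            struct dir_cache b = { .complete = true, .ordered = true };

            on_dentry_moved(&a, &b);   /* e.g. rename a/x -> b/x */
            on_dentry_pruned(&b);      /* e.g. memory pressure pruned an entry of b */

            /* both directories now fall back to a normal MDS readdir */
            return can_use_cached_readdir(&a) || can_use_cached_readdir(&b);
        }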
    • f66fd9f0
    • ceph: re-send flushing caps (which are revoked) in reconnect stage · e548e9b9
      Committed by Yan, Zheng
      If flushing caps were revoked, we should re-send the cap flush during
      the client reconnect stage. This guarantees that the MDS processes the
      cap flush message before issuing the flushing caps to another client.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: track pending caps flushing globally · 8310b089
      Committed by Yan, Zheng
      This lets us know the TID of the oldest pending cap flush. A later patch
      will send this information to the MDS, so that the MDS can trim its
      completed caps flush list.

      Tracking pending cap flushes globally also simplifies the syncfs code.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
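
      A sketch of the idea of tracking every pending cap flush in one global,
      TID-ordered structure so the oldest pending flush TID is cheap to
      report. A sorted array stands in for the kernel's data structure and the
      names are illustrative:

        #include <stdint.h>
        #include <stdio.h>

        #define MAX_PENDING 16

        /* Global, TID-ordered record of pending cap flushes (TIDs are allocated
         * monotonically, so appending keeps the array sorted). */
        static uint64_t pending_tids[MAX_PENDING];
        static int nr_pending;
        static uint64_t next_tid = 1;

        static uint64_t start_cap_flush(void)
        {
            uint64_t tid = next_tid++;

            pending_tids[nr_pending++] = tid;
            return tid;
        }

        static void cap_flush_acked(uint64_t tid)
        {
            for (int i = 0; i < nr_pending; i++) {
                if (pending_tids[i] != tid)
                    continue;
                for (int j = i; j < nr_pending - 1; j++)
                    pending_tids[j] = pending_tids[j + 1];
                nr_pending--;
                return;
            }
        }

        /* What could be reported to the MDS so it can trim its completed-flush
         * list: everything older than this TID has been acknowledged. */
        static uint64_t oldest_pending_flush_tid(void)
        {
            return nr_pending ? pending_tids[0] : next_tid;
        }

        int main(void)
        {
            uint64_t a = start_cap_flush();
            uint64_t b = start_cap_flush();

            cap_flush_acked(a);
            printf("oldest pending flush tid: %llu (flush %llu still pending)\n",
                   (unsigned long long)oldest_pending_flush_tid(),
                   (unsigned long long)b);
            return 0;
        }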
    • ceph: track pending caps flushing accurately · 553adfd9
      Committed by Yan, Zheng
      Previously we did not track an accurate TID for flushing caps. When the
      MDS fails over, we have no choice but to re-send all flushing caps with
      a new TID. This can cause problems because the MDS may have already
      flushed some caps and issued the same caps to another client. The
      re-sent cap flush has a new TID, which makes the MDS unable to detect
      whether it has already processed the cap flush.

      This patch adds code to track pending cap flushes accurately. When a cap
      flush needs to be re-sent, we use its original flush TID.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: don't pre-allocate space for cap release messages · 745a8e3b
      Committed by Yan, Zheng
      Previously we pre-allocated a cap release message for each cap. This
      wastes lots of memory when there are a large number of caps. This patch
      makes the code stop pre-allocating cap release messages. Instead, we add
      the corresponding ceph_cap struct to a list when releasing a cap. Later,
      when a flush of cap releases is needed, we allocate the cap release
      messages dynamically.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
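
      The change above replaces per-cap pre-allocated messages with a list
      that is drained into dynamically allocated messages when cap releases
      are flushed. A compact user-space sketch of that pattern, with
      illustrative names and an arbitrary batch size:

        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define RELEASES_PER_MSG 4   /* arbitrary batch size for this sketch */

        /* A released cap waiting to be reported to the MDS. */
        struct cap_release {
            uint64_t ino;
            struct cap_release *next;
        };

        static struct cap_release *release_list;  /* grows as caps are released */

        /* Releasing a cap just records it on a list; no message is allocated yet. */
        static void queue_cap_release(uint64_t ino)
        {
            struct cap_release *r = malloc(sizeof(*r));

            r->ino = ino;
            r->next = release_list;
            release_list = r;
        }

        /* When a flush is needed, allocate messages on demand and fill them. */
        static void flush_cap_releases(void)
        {
            while (release_list) {
                uint64_t *msg = malloc(RELEASES_PER_MSG * sizeof(*msg));
                int n = 0;

                while (release_list && n < RELEASES_PER_MSG) {
                    struct cap_release *r = release_list;

                    release_list = r->next;
                    msg[n++] = r->ino;
                    free(r);
                }
                printf("sending cap release message with %d entries\n", n);
                free(msg);   /* stands in for handing the message to the MDS */
            }
        }

        int main(void)
        {
            for (uint64_t ino = 0x1000; ino < 0x1006; ino++)
                queue_cap_release(ino);
            flush_cap_releases();
            return 0;
        }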
    • ceph: avoid sending unnessesary FLUSHSNAP message · 86056090
      Committed by Yan, Zheng
      When a snap notification contains no new snapshot, we can avoid sending
      a FLUSHSNAP message to the MDS. But we still need to create a cap_snap
      in some cases, because it is required by the write path and the page
      writeback path.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: use empty snap context for uninline_data and get_pool_perm · 7b06a826
      Committed by Yan, Zheng
      The cached_context in ceph_snap_realm is directly accessed by
      uninline_data() and get_pool_perm(). This is racy in theory. Neither
      uninline_data() nor get_pool_perm() modifies an existing object; they
      only create new objects. So we can pass the empty snap context to them.
      Unlike the cached_context in ceph_snap_realm, the empty snap context
      does not need to be protected.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: check OSD caps before read/write · 10183a69
      Committed by Yan, Zheng
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
  6. 20 April 2015, 2 commits
  7. 19 February 2015, 2 commits
  8. 18 December 2014, 6 commits
  9. 15 October 2014, 3 commits
    • ceph: additional debugfs output · 14ed9703
      Committed by John Spray
      MDS session state and the client global ID are useful instrumentation
      when testing.
      Signed-off-by: John Spray <john.spray@redhat.com>
    • ceph: include the initial ACL in create/mkdir/mknod MDS requests · b1ee94aa
      Committed by Yan, Zheng
      The current code sets a new file/directory's initial ACL in a non-atomic
      manner: the client first sends a request to the MDS to create the new
      file/directory, then sets the initial ACL after the new file/directory
      has been created successfully.

      The fix is to include the initial ACL in create/mkdir/mknod MDS
      requests, so the MDS can handle creating the file/directory and setting
      the initial ACL in one request.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
    • ceph: request xattrs if xattr_version is zero · 508b32d8
      Committed by Yan, Zheng
      The following sequence of events can happen:
        - The client releases an inode and queues a cap release message.
        - A 'lookup' reply brings the same inode back, but the reply doesn't
          contain xattrs because the MDS didn't receive the cap release
          message and thought the client already had up-to-date xattrs.

      The fix is to force sending a getattr request to the MDS if
      xattr_version is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so
      the MDS knows the client does not have the xattrs.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
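
      A minimal sketch of the check described above: if the inode's
      xattr_version is still zero, force a getattr with the xattr bit set in
      the mask so the MDS includes the xattrs in its reply. CEPH_STAT_CAP_XATTR
      is named in the commit message; its value and everything else here are
      illustrative, not the kernel's ceph_do_getattr():

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Illustrative mask bit; the real value lives in the ceph headers. */
        #define CEPH_STAT_CAP_XATTR (1u << 0)

        struct inode_info {
            uint64_t xattr_version;   /* 0 means we never received the xattrs */
        };

        static void send_getattr(uint32_t mask)
        {
            printf("sending getattr, mask=0x%x\n", mask);
        }

        /* Before relying on cached xattrs, make sure we actually have them. */
        static bool ensure_xattrs(struct inode_info *ci)
        {
            if (ci->xattr_version == 0) {
                /* force the MDS to send the xattrs back */
                send_getattr(CEPH_STAT_CAP_XATTR);
                return false;   /* the real code would wait for the reply */
            }
            return true;
        }

        int main(void)
        {
            struct inode_info ci = { .xattr_version = 0 };

            ensure_xattrs(&ci);
            return 0;
        }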
  10. 06 June 2014, 1 commit
  11. 29 April 2014, 1 commit
    • ceph: clear directory's completeness when creating file · 0a8a70f9
      Committed by Yan, Zheng
      When creating a file, ceph_set_dentry_offset() puts the new dentry at
      the end of the directory's d_subdirs, then sets the dentry's offset
      based on the directory's max offset. The offset does not reflect the
      real position of the dentry in the directory. A later readdir reply from
      the MDS may change the dentry's position/offset. This inconsistency can
      cause missing/duplicate entries in the readdir result if readdir is
      partly satisfied by dcache_readdir().

      The fix is to clear the directory's completeness after creating/renaming
      a file. This prevents later readdirs from using dcache_readdir().

      Fixes: http://tracker.ceph.com/issues/8025
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
  12. 05 April 2014, 1 commit
    • ceph: use fl->fl_file as owner identifier of flock and posix lock · eb13e832
      Committed by Yan, Zheng
      flock and posix locks should use fl->fl_file instead of the process ID
      as the owner identifier. (A posix lock uses fl->fl_owner; fl->fl_owner
      is usually equal to fl->fl_file, but it can also be a customized value.)
      The process ID of whoever holds the lock is only needed for the F_GETLK
      fcntl(2).

      The fix is to rename the 'pid' fields of struct ceph_mds_request_args
      and struct ceph_filelock to 'owner', and to rename the 'pid_namespace'
      fields to 'pid'. fl->fl_file is assigned to the 'owner' field of lock
      messages. We also set the most significant bit of the 'owner' field; the
      MDS can use that bit to distinguish between old and new clients.

      The MDS counterpart of this patch modifies the flock code so that it
      does not take the 'pid_namespace' into consideration when checking for
      conflicting locks.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
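
      A sketch of the owner encoding described above: derive the wire 'owner'
      value from the file pointer rather than the pid, and set the most
      significant bit so the MDS can tell new clients from old ones. The exact
      derivation in the kernel may differ; this only illustrates the bit
      layout:

        #include <stdint.h>
        #include <stdio.h>

        /* MSB of the 64-bit owner field marks a client using the new format. */
        #define CEPH_LOCK_OWNER_NEW_CLIENT (1ULL << 63)

        struct file;   /* opaque stand-in for the VFS struct file */

        /* Build the owner identifier carried in lock messages from fl->fl_file. */
        static uint64_t lock_owner_from_file(const struct file *fl_file)
        {
            return (uint64_t)(uintptr_t)fl_file | CEPH_LOCK_OWNER_NEW_CLIENT;
        }

        int main(void)
        {
            int dummy;
            uint64_t owner = lock_owner_from_file((const struct file *)&dummy);

            printf("owner=0x%016llx (new-client bit set: %d)\n",
                   (unsigned long long)owner,
                   !!(owner & CEPH_LOCK_OWNER_NEW_CLIENT));
            return 0;
        }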
  13. 03 April 2014, 1 commit
    • ceph: fix ceph_dir_llseek() · f0494206
      Committed by Yan, Zheng
      Comparing the offset with inode->i_sb->s_maxbytes doesn't make sense for
      a directory. For a fragmented directory, the offset (frag_t, off) can be
      larger than inode->i_sb->s_maxbytes.

      At the very beginning of ceph_dir_llseek(), the local variable
      old_offset is initialized to the offset parameter. This doesn't make
      sense either; old_offset should be ceph_make_fpos(fi->frag,
      fi->next_offset).
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
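
      The directory file position mentioned above packs the fragment and the
      offset within the fragment into a single loff_t, which is why it can
      exceed s_maxbytes. A sketch of that packing, assuming the simple
      frag-in-the-high-32-bits layout of this era (the kernel's
      ceph_make_fpos() may differ in detail):

        #include <stdint.h>
        #include <stdio.h>

        /* Pack (frag, off) into a single 64-bit directory file position. */
        static int64_t make_fpos(uint32_t frag, uint32_t off)
        {
            return ((int64_t)frag << 32) | (int64_t)off;
        }

        static uint32_t fpos_frag(int64_t fpos) { return (uint32_t)(fpos >> 32); }
        static uint32_t fpos_off(int64_t fpos)  { return (uint32_t)fpos; }

        int main(void)
        {
            /* For a fragmented directory a non-zero frag pushes the position far
             * beyond any plausible s_maxbytes, so comparing a directory offset
             * against s_maxbytes in llseek is meaningless. */
            int64_t fpos = make_fpos(0x2, 42);

            printf("fpos=%lld frag=0x%x off=%u\n",
                   (long long)fpos, fpos_frag(fpos), fpos_off(fpos));
            return 0;
        }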
  14. 18 February 2014, 1 commit
  15. 31 January 2014, 1 commit
  16. 30 January 2014, 1 commit
  17. 29 January 2014, 1 commit
    • ceph: Fix up after semantic merge conflict · 4db658ea
      Committed by Linus Torvalds
      The previous ceph-client merge resulted in ceph not even building,
      because there was a merge conflict that wasn't visible as an actual data
      conflict: commit 7221fe4c ("ceph: add acl for cephfs") added support
      for POSIX ACL's into Ceph, but unluckily we also had the VFS tree change
      a lot of the POSIX ACL helper functions to be much more helpful to
      filesystems (see for example commits 2aeccbe9 "fs: add generic
      xattr_acl handlers", 5bf3258f "fs: make posix_acl_chmod more useful"
      and 37bc1539 "fs: make posix_acl_create more useful")
      
      The reason this conflict wasn't obvious was many-fold: because it was a
      semantic conflict rather than a data conflict, it wasn't visible in the
      git merge as a conflict.  And because the VFS tree hadn't been in
      linux-next, people hadn't become aware of it that way.  And because I
      was at jury duty this morning, I was using my laptop and as a result not
      doing constant "allmodconfig" builds.
      
      Anyway, this fixes the build and generally removes a fair chunk of the
      Ceph POSIX ACL support code, since the improved helpers seem to match
      really well for Ceph too.  But I don't actually have any way to *test*
      the end result, and I was really hoping for some ACK's for this.  Oh,
      well.
      
      Not compiling certainly doesn't make things easier to test, so I'm
      committing this without the acks after having waited for four hours...
      Plus it's what I would have done for the merge had I noticed the
      semantic conflict..
      Reported-by: Dave Jones <davej@redhat.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Guangliang Zhao <lucienchao@gmail.com>
      Cc: Li Wang <li.wang@ubuntykylin.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 21 January 2014, 3 commits
    • ceph: add imported caps when handling cap export message · 11df2dfb
      Committed by Yan, Zheng
      The version 3 cap export message includes information about the imported
      caps. It allows us to add the imported caps if the corresponding cap
      import message still hasn't been received.

      This allows us to handle the situation where the importer MDS crashes
      and the cap import message is missing.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
    • ceph: check inode caps in ceph_d_revalidate · 9215aeea
      Committed by Yan, Zheng
      Some inodes in a readdir reply may have no caps. A getattr MDS request
      for these inodes can return -ESTALE. The fix is to consider a dentry
      that links to an inode with no caps as invalid. An invalid dentry causes
      a lookup request to be sent to the MDS, and the MDS will send caps back.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
    • ceph: fix cache revoke race · 9563f88c
      Committed by Yan, Zheng
      Handle the following sequence of events:

      - The non-auth MDS revokes the Fc cap; invalidate work is queued.
      - The auth MDS issues the Fc cap through a request reply; i_rdcache_gen
        gets increased.
      - The invalidate work runs and finds i_rdcache_revoking != i_rdcache_gen,
        so it does nothing.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
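
      The race above hinges on comparing the generation captured when the
      revoke queued the invalidate work against the current generation when
      that work finally runs. A simplified model of the check, with
      illustrative names rather than the kernel's ceph_invalidate_work():

        #include <stdint.h>
        #include <stdio.h>

        struct inode_info {
            uint32_t rdcache_gen;       /* bumped whenever Fc (cache) caps are issued */
            uint32_t rdcache_revoking;  /* gen captured when a revoke queued the work */
        };

        /* Deferred invalidate work: only invalidate if no new Fc grant arrived
         * since the revoke was queued. */
        static void invalidate_work(struct inode_info *ci)
        {
            if (ci->rdcache_revoking != ci->rdcache_gen) {
                /* A new cap grant bumped the gen; skipping here is exactly the
                 * sequence the commit above describes. */
                printf("gen changed (%u != %u), doing nothing\n",
                       ci->rdcache_revoking, ci->rdcache_gen);
                return;
            }
            printf("invalidating page cache\n");
        }

        int main(void)
        {
            struct inode_info ci = { .rdcache_gen = 1 };

            ci.rdcache_revoking = ci.rdcache_gen;  /* non-auth MDS revokes Fc */
            ci.rdcache_gen++;                      /* auth MDS issues Fc again */
            invalidate_work(&ci);                  /* finds the gens differ, skips */
            return 0;
        }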
  19. 01 January 2014, 1 commit
  20. 14 December 2013, 1 commit