1. 07 Sep 2017, 2 commits
  2. 07 Jul 2017, 1 commit
  3. 04 May 2017, 2 commits
    • ceph: fix file open flags on ppc64 · f775ff7d
      Authored by Alexander Graf
      The file open flags (O_foo) are platform specific and should never go
      out to an interface that is not local to the system.
      
      Unfortunately these flags have leaked out onto the wire in the cephfs
      implementation. That led to bogus flags being transmitted on ppc64.
      
      This patch converts the kernel view of flags to the ceph view of file
      open flags.
      
      Fixes: 124e68e7 ("ceph: file operations")
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • ceph: make seeky readdir more efficient · 79162547
      Authored by Yan, Zheng
      The current cephfs client uses a string to indicate the start position
      of readdir; the string is the last entry of the previous readdir reply.
      This approach does not work for seeky readdir because we cannot easily
      convert the new position to a string. For seeky readdir, the MDS needs
      to return dentries from the beginning, and the client keeps retrying
      if the reply does not contain the dentry it wants.
      
      In the current version of ceph, the MDS sorts CDentry objects in its
      cache in hash order, and the client uses the dentry hash to compose
      the dir position. For seeky readdir, if the client passes the hash
      part of the dir position to the MDS, the MDS can avoid replying with
      useless dentries.
      Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  4. 13 Dec 2016, 1 commit
  5. 03 Oct 2016, 1 commit
  6. 25 Aug 2016, 1 commit
  7. 28 Jul 2016, 4 commits
  8. 26 May 2016, 4 commits
    • ceph: using hash value to compose dentry offset · f3c4ebe6
      Authored by Yan, Zheng
      If the MDS sorts dentries in a dirfrag in hash order, we use the hash
      value to compose the dentry offset. The dentry offset is:
      
        (0xff << 52) | ((24-bit hash) << 28) |
        (nth entry with that hash, i.e. the hash-collision index)
      
      This offset is stable across directory fragmentation. This also means
      there is no need to reset the readdir offset if the directory gets
      fragmented in the middle of readdir.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: define 'end/complete' in readdir reply as bit flags · 956d39d6
      Authored by Yan, Zheng
      Set a flag in the readdir request indicating that the client interprets
      'end/complete' as bit flags, so that the MDS can reply with additional
      flags in the readdir reply.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • 737cc81e
    • libceph, rbd: ceph_osd_linger_request, watch/notify v2 · 922dab61
      Authored by Ilya Dryomov
      This adds support and switches rbd to a new, more reliable version of
      watch/notify protocol.  As with the OSD client update, this is mostly
      about getting the right structures linked into the right places so that
      reconnects are properly sent when needed.  watch/notify v2 also
      requires sending regular pings to the OSDs - send_linger_ping().
      
      A major change from the old watch/notify implementation is the
      introduction of ceph_osd_linger_request - linger requests no longer
      piggyback on ceph_osd_request.  ceph_osd_event has been merged into
      ceph_osd_linger_request.
      
      All the details are now hidden within libceph, the interface consists
      of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack().
      ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep
      the lifetime management simple.
      
      ceph_osdc_notify_ack() accepts an optional data payload, which is
      relayed back to the notifier.
      
      Portions of this patch are loosely based on work by Douglas Fuller
      <dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  9. 26 Mar 2016, 2 commits
    • ceph: fix security xattr deadlock · 315f2408
      Authored by Yan, Zheng
      When security is enabled, the security module can call the filesystem's
      getxattr/setxattr callbacks during d_instantiate(). For cephfs,
      d_instantiate() is usually called by the MDS' dispatch thread while
      handling an MDS reply. If the MDS reply does not include xattrs and
      the corresponding caps, getxattr/setxattr needs to send a new request
      to the MDS and wait for the reply. This puts the MDS' dispatch thread
      to sleep, so nobody handles later MDS replies.
      
      The fix is to make sure the lookup/atomic_open reply includes xattrs
      and the corresponding caps, so getxattr can be served from cached
      xattrs. This requires some modification to both the MDS and the
      request message. (The client tells the MDS what caps it wants; the
      MDS encodes the proper caps in the reply.)
      
      The Smack security module may call setxattr during d_instantiate().
      Unlike getxattr, we can't force the MDS to issue CEPH_CAP_XATTR_EXCL
      to us, so just make setxattr return an error when called by the MDS'
      dispatch thread.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • libceph: revamp subs code, switch to SUBSCRIBE2 protocol · 82dcabad
      Authored by Ilya Dryomov
      It is currently hard-coded in the mon_client that mdsmap and monmap
      subs are continuous, while osdmap sub is always "onetime".  To better
      handle full clusters/pools in the osd_client, we need to be able to
      issue continuous osdmap subs.  Revamp subs code to allow us to specify
      for each sub whether it should be continuous or not.
      
      Although not strictly required for the above, switch to SUBSCRIBE2
      protocol while at it, eliminating the ambiguity between a request for
      "every map since X" and a request for "just the latest" when we don't
      have a map yet (i.e. have epoch 0).  SUBSCRIBE2 feature bit is now
      required - it's been supported since pre-argonaut (2010).
      
      Move the "got mdsmap" call to the end of ceph_mdsc_handle_map() -
      calling it before we validate the epoch and successfully install the
      new map can mess up mon_client sub state.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  10. 22 Apr 2015, 1 commit
  11. 19 Feb 2015, 2 commits
  12. 18 Dec 2014, 3 commits
    • ceph: use getattr request to fetch inline data · 01deead0
      Authored by Yan, Zheng
      Add a new parameter 'locked_page' to ceph_do_getattr(). If the getattr
      reply contains inline data, it is copied to that page.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: add inline data to pagecache · 31c542a1
      Authored by Yan, Zheng
      Request replies and cap messages can contain inline data. Add the
      inline data to the page cache if the Fc cap is held.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: fix file lock interruption · 9280be24
      Authored by Yan, Zheng
      When a lock operation is interrupted, the current code sends an unlock
      request to the MDS to undo the lock operation. This method does not
      work as expected because the unlock request can drop locks that have
      already been acquired.
      
      The fix is to use the newly introduced CEPH_LOCK_FCNTL_INTR and
      CEPH_LOCK_FLOCK_INTR requests to interrupt a blocked file lock
      request. These requests do not drop locks that have already been
      acquired; they only interrupt the blocked file lock request.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
  13. 06 Jun 2014, 1 commit
  14. 05 Apr 2014, 1 commit
    • ceph: use fl->fl_file as owner identifier of flock and posix lock · eb13e832
      Authored by Yan, Zheng
      flock and posix locks should use fl->fl_file instead of the process ID
      as the owner identifier. (A posix lock uses fl->fl_owner; fl->fl_owner
      is usually equal to fl->fl_file, but it can also be a customized
      value.) The process ID of who holds the lock is only needed for the
      F_GETLK fcntl(2).
      
      The fix is to rename the 'pid' fields of struct ceph_mds_request_args
      and struct ceph_filelock to 'owner', and rename the 'pid_namespace'
      fields to 'pid'. Assign fl->fl_file to the 'owner' field of lock
      messages. We also set the most significant bit of the 'owner' field,
      which the MDS can use to distinguish between old and new clients.
      
      The MDS counterpart of this patch modifies the flock code to not take
      the 'pid_namespace' into consideration when checking for conflicting
      locks.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
  15. 03 Apr 2014, 1 commit
  16. 18 Feb 2014, 1 commit
  17. 28 Jan 2014, 1 commit
  18. 21 Jan 2014, 2 commits
  19. 19 Feb 2013, 1 commit
    • libceph: update ceph_fs.h · dd6f5e10
      Authored by Alex Elder
      Update most of "include/linux/ceph/ceph_fs.h" to match its user
      space counterpart in "src/include/ceph_fs.h" in the ceph tree.
      
      Everything that has changed is either:
          - added definitions (therefore no real effect on existing code)
          - deleted unused symbols
          - added or revised comments
      
      There were some differences between the struct definitions for
      ceph_mon_subscribe_item and the open field of ceph_mds_request_args;
      those differences remain.
      
      This and the next commit resolve:
          http://tracker.ceph.com/issues/4165
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
  20. 03 Oct 2012, 1 commit
  21. 31 Jul 2012, 1 commit
  22. 08 May 2012, 1 commit
  23. 25 May 2011, 1 commit
  24. 22 Mar 2011, 1 commit
  25. 13 Jan 2011, 1 commit
    • ceph: add dir_layout to inode · 6c0f3af7
      Authored by Sage Weil
      Add a ceph_dir_layout to the inode, and calculate dentry hash values
      based on the parent directory's specified dir_hash function. This is
      needed because the old default Linux dcache hash function is extremely
      weak and leads to a poor distribution of files among dir fragments.
      Signed-off-by: Sage Weil <sage@newdream.net>
  26. 21 Oct 2010, 2 commits
    • 571dba52
    • ceph: factor out libceph from Ceph file system · 3d14c5d2
      Authored by Yehuda Sadeh
      This factors out protocol and low-level storage parts of ceph into a
      separate libceph module living in net/ceph and include/linux/ceph.  This
      is mostly a matter of moving files around.  However, a few key pieces
      of the interface change as well:
      
       - ceph_client becomes ceph_fs_client and ceph_client, where the latter
         captures the mon and osd clients, and the fs_client gets the mds client
         and file system specific pieces.
        - Mount option parsing and debugfs setup are correspondingly broken
          into two pieces.
       - The mon client gets a generic handler callback for otherwise unknown
         messages (mds map, in this case).
       - The basic supported/required feature bits can be expanded (and are by
         ceph_fs_client).
      
      No functional change, aside from some subtle error handling cases that got
      cleaned up in the refactoring process.
      Signed-off-by: Sage Weil <sage@newdream.net>