1. 13 Jan, 2011 2 commits
    • ceph: drop redundant r_mds field · 4af25fdd
      Committed by Sage Weil
      The r_mds field is redundant, since we can find the same information at
      r_session->s_mds, and when r_session is NULL then r_mds is meaningless.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS · 14303d20
      Committed by Sage Weil
      This implements the DIRLAYOUTHASH protocol feature, which passes the dir
      layout over the wire from the MDS.  This gives the client knowledge
      of the correct hash function to use for mapping dentries among dir
      fragments.
      
      Note that if this feature is _not_ present on the client but is on the
      MDS, the client may misdirect requests.  This will result in a forward
      and degrade performance.  It may also result in inaccurate NFS filehandle
      generation, which will prevent fh resolution when the inode is not present
      in the client cache and the parent directories have been fragmented.
      Signed-off-by: Sage Weil <sage@newdream.net>
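A minimal sketch of why the client has to agree with the MDS on the directory's hash function. The two hash functions, the use of the top hash bits to pick a fragment, and the fragment-to-MDS table are all hypothetical stand-ins chosen for illustration; the commit message does not describe the real ones.

```python
import zlib

# Hypothetical stand-ins for two dentry-name hash functions.
def hash_a(name):
    return zlib.crc32(name.encode())

def hash_b(name):
    return zlib.adler32(name.encode())

HASHES = {"a": hash_a, "b": hash_b}

def pick_frag(name, layout_hash, frag_bits):
    """Pick a fragment from the top frag_bits of the 32-bit name hash."""
    if frag_bits == 0:
        return 0  # unfragmented directory: everything lives in one fragment
    return HASHES[layout_hash](name) >> (32 - frag_bits)

def count_misdirected(names, client_hash, mds_hash, frag_bits, frag_to_mds):
    """Count names the client would send to a different MDS than the owner."""
    return sum(
        frag_to_mds[pick_frag(n, client_hash, frag_bits)]
        != frag_to_mds[pick_frag(n, mds_hash, frag_bits)]
        for n in names
    )
```

When both sides use the same hash, `count_misdirected` is always zero; any nonzero count corresponds to requests that the wrong MDS would have to forward.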
  2. 02 Dec, 2010 1 commit
  3. 08 Nov, 2010 1 commit
    • ceph: fix uid/gid on resent mds requests · cb4276cc
      Committed by Sage Weil
      MDS requests can be rebuilt and resent in non-process context, but were
      filling in uid/gid from current_fsuid/gid.  Put that information in the
      request struct on request setup.
      
      This fixes incorrect (and root) uid/gid getting set for requests that
      are forwarded between MDSs, usually due to metadata migrations.
      Signed-off-by: Sage Weil <sage@newdream.net>
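The fix described above can be sketched as capturing the caller's credentials once, at request setup, and never consulting the current context again. The `Creds` and `MdsRequest` types and both function names are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Creds:
    uid: int
    gid: int

@dataclass
class MdsRequest:
    op: str
    creds: Creds  # captured at setup, in the caller's context

def create_request(op, caller_creds):
    """Runs in process context: record who is asking."""
    return MdsRequest(op=op, creds=caller_creds)

def build_message(req, context_creds):
    """May run later from a worker context (e.g. on resend).

    context_creds is whoever happens to be running now; it is ignored
    on purpose, so a rebuilt message carries the original uid/gid.
    """
    return {"op": req.op, "uid": req.creds.uid, "gid": req.creds.gid}
```

Rebuilding the message from a root-owned worker context then still yields the original caller's uid/gid.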
  4. 21 Oct, 2010 1 commit
    • ceph: factor out libceph from Ceph file system · 3d14c5d2
      Committed by Yehuda Sadeh
      This factors out protocol and low-level storage parts of ceph into a
      separate libceph module living in net/ceph and include/linux/ceph.  This
      is mostly a matter of moving files around.  However, a few key pieces
      of the interface change as well:
      
       - ceph_client becomes ceph_fs_client and ceph_client, where the latter
         captures the mon and osd clients, and the fs_client gets the mds client
         and file system specific pieces.
       - Mount option parsing and debugfs setup is correspondingly broken into
         two pieces.
       - The mon client gets a generic handler callback for otherwise unknown
         messages (mds map, in this case).
       - The basic supported/required feature bits can be expanded (and are by
         ceph_fs_client).
      
      No functional change, aside from some subtle error handling cases that got
      cleaned up in the refactoring process.
      Signed-off-by: Sage Weil <sage@newdream.net>
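A rough sketch of the split described above: a generic client that handles mon/osd messages and delegates anything it does not recognize to a callback, and a file system client that plugs in MDS map handling and extends the feature bits. The class names, message type strings, and feature bit values are illustrative assumptions, not the kernel's actual definitions.

```python
BASE_FEATURES = 0x1  # hypothetical basic feature bit

class CephClient:
    """Generic part: knows only mon and osd message types."""
    KNOWN = {"monmap", "osdmap"}

    def __init__(self, extra_handler=None, supported_features=BASE_FEATURES):
        self.extra_handler = extra_handler
        self.supported_features = supported_features
        self.handled = []

    def dispatch(self, msg_type, payload):
        if msg_type in self.KNOWN:
            self.handled.append(msg_type)
            return True
        # Otherwise unknown message: hand it to the generic callback, if any.
        if self.extra_handler is not None:
            return self.extra_handler(msg_type, payload)
        return False

class CephFsClient:
    """File system part: owns the MDS-specific pieces."""
    FS_FEATURES = 0x2  # hypothetical fs-only feature bit

    def __init__(self):
        self.mdsmap = None
        self.client = CephClient(
            extra_handler=self.handle_extra,
            supported_features=BASE_FEATURES | self.FS_FEATURES,
        )

    def handle_extra(self, msg_type, payload):
        if msg_type == "mdsmap":
            self.mdsmap = payload
            return True
        return False
```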
  5. 23 Aug, 2010 1 commit
    • ceph: fix multiple mds session shutdown · f3c60c59
      Committed by Sage Weil
      The use of a completion when waiting for session shutdown during umount is
      inappropriate, given the complexity of the condition.  For multiple MDS's,
      this resulted in the umount thread spinning, often preventing the session
      close message from being processed.
      
      Switch to a waitqueue and define a condition helper.  This cleans things
      up nicely.
      Signed-off-by: Sage Weil <sage@newdream.net>
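The waitqueue-plus-condition-helper pattern can be illustrated in user space with Python's `threading.Condition`, whose `wait_for` re-checks a predicate each time it is woken. This is only an analogy to the kernel change; the `MdsClient` class and its method names are hypothetical.

```python
import threading

class MdsClient:
    def __init__(self, session_ids):
        self.open_sessions = set(session_ids)
        self.cond = threading.Condition()

    def _all_closed(self):
        # Condition helper: evaluated with self.cond held.
        return not self.open_sessions

    def handle_session_close(self, sid):
        """Called when a session close message is processed."""
        with self.cond:
            self.open_sessions.discard(sid)
            self.cond.notify_all()

    def wait_for_shutdown(self, timeout):
        """Sleep (rather than spin) until every session has closed."""
        with self.cond:
            return self.cond.wait_for(self._all_closed, timeout=timeout)
```

Because the waiter sleeps between wakeups, the thread that processes close messages is never starved, and the condition covers any number of sessions.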
  6. 02 Aug, 2010 4 commits
  7. 17 Jul, 2010 1 commit
  8. 11 Jun, 2010 2 commits
  9. 18 May, 2010 3 commits
    • ceph: use common helper for aborted dir request invalidation · 167c9e35
      Committed by Sage Weil
      We invalidate I_COMPLETE and dentry leases in two places: on aborted mds
      request and on request replay.  Use common helper to avoid duplicate code.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: fix race between aborted requests and fill_trace · b4556396
      Committed by Sage Weil
      When we abort requests we need to prevent fill_trace et al from doing
      anything that relies on locks held by the VFS caller.  This fixes a race
      between the reply handler and the abort code, ensuring that we continue
      holding the dir mutex until the reply handler completes.
      Signed-off-by: Sage Weil <sage@newdream.net>
    • ceph: clean up mds reply, error handling · e1518c7c
      Committed by Sage Weil
      We would occasionally BUG out in the reply handler because r_reply was
      nonzero, due to a race with ceph_mdsc_do_request temporarily setting
      r_reply to an ERR_PTR value.  This is unnecessary, messy, and also wrong
      in the EIO case.
      
      Clean up by consistently using r_err for errors and r_reply for messages.
      Also fix the abort logic to trigger consistently for all errors that return
      to the caller early (e.g., EIO from timeout case).  If an abort races with
      a reply, use the result from the reply.
      
      Also fix locking for r_err, r_reply update in the reply handler.
      Signed-off-by: Sage Weil <sage@newdream.net>
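A simplified sketch of the convention this commit describes: `r_reply` only ever holds a message, `r_err` only ever holds an error code, both are updated under a lock, and a reply that has already arrived wins over a late abort. The field names follow the commit message; the `Request` class, the three functions, and the use of a plain lock are assumptions made for illustration.

```python
import errno
import threading

class Request:
    def __init__(self):
        self.lock = threading.Lock()
        self.r_reply = None     # only ever a reply message
        self.r_err = 0          # only ever an errno-style code
        self.r_aborted = False

def handle_reply(req, msg):
    with req.lock:
        req.r_reply = msg

def abort(req, err):
    """Caller is returning early (e.g. timeout); record why."""
    with req.lock:
        if req.r_reply is None:  # a reply that already arrived wins
            req.r_err = err
            req.r_aborted = True

def result(req):
    with req.lock:
        if req.r_reply is not None:
            return ("reply", req.r_reply)
        return ("error", req.r_err)
```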
  10. 18 Feb, 2010 1 commit
    • ceph: fix iterate_caps removal race · 7c1332b8
      Committed by Sage Weil
      We need to be able to iterate over all caps on a session with a
      possibly slow callback on each cap.  To allow this, we used to
      prevent cap reordering while we were iterating.  However, we were
      not safe from races with removal: removing the 'next' cap would
      make the next pointer from list_for_each_entry_safe be invalid,
      and cause a lock up or similar badness.
      
      Instead, we keep an iterator pointer in the session pointing to
      the current cap.  As before, we avoid reordering.  For removal,
      if the cap isn't the current cap we are iterating over, we are
      fine.  If it is, we clear cap->ci (to mark the cap as pending
      removal) but leave it in the session list.  In iterate_caps, we
      can safely finish removal and get the next cap pointer.
      
      While we're at it, clean up put_cap to not take a cap reservation
      context, as it was never used.
      Signed-off-by: Sage Weil <sage@newdream.net>
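The iterator scheme described above can be sketched with a plain list. Locking is omitted and a Python list stands in for the kernel's linked list, so this shows only the deferred-removal logic; the `Cap` and `Session` classes and the `cap_iterator` field name are illustrative assumptions.

```python
class Cap:
    def __init__(self, name):
        self.name = name
        self.ci = object()  # owning inode; None marks "pending removal"

class Session:
    def __init__(self, caps):
        self.caps = list(caps)
        self.cap_iterator = None  # cap currently being visited, if any

def remove_cap(session, cap):
    if session.cap_iterator is cap:
        cap.ci = None             # defer: leave it in the session list
    else:
        session.caps.remove(cap)  # not the current cap: safe to unlink now

def iterate_caps(session, callback):
    i = 0
    while i < len(session.caps):
        cap = session.caps[i]
        session.cap_iterator = cap
        callback(session, cap)    # may remove any cap, including this one
        session.cap_iterator = None
        # The current cap is never unlinked by remove_cap, so it is still
        # in the list; re-find it in case earlier entries were removed.
        i = session.caps.index(cap)
        if cap.ci is None:
            session.caps.pop(i)   # finish the deferred removal
        else:
            i += 1
```

Because the position is re-derived from the current cap after each callback, removing the previous, current, or next cap during the callback neither skips nor revisits an entry.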
  11. 17 Feb, 2010 2 commits
  12. 26 Jan, 2010 1 commit
    • ceph: properly handle aborted mds requests · 5b1daecd
      Committed by Sage Weil
      Previously, if the MDS request was interrupted, we would unregister the
      request and ignore any reply.  This could cause the caps or other cache
      state to become out of sync.  (For instance, aborting dbench and doing
      rm -r on clients would complain about a non-empty directory because the
      client didn't realize its aborted file create request completed.)
      
      Even if we don't unregister, we still can't process the reply normally because
      we are no longer holding the caller's locks (like the dir i_mutex).
      
      So, mark aborted operations with r_aborted, and in the reply handler, be
      sure to process all the caps.  Do not process the namespace changes,
      though, since we no longer will hold the dir i_mutex.  The dentry lease
      state can also be ignored as it's more forgiving.
      Signed-off-by: Sage Weil <sage@newdream.net>
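A small sketch of the reply handling this commit describes: capabilities in the reply are always applied, while namespace changes are skipped for aborted requests because the caller's directory lock is no longer held. The dictionary shapes and the `handle_reply` signature are hypothetical.

```python
def handle_reply(req, reply, cache):
    # Caps are always processed so cache state stays in sync with the MDS.
    for ino, caps in reply["caps"].items():
        cache["caps"][ino] = caps

    if req["r_aborted"]:
        # The caller gave up and released its locks (e.g. the dir i_mutex),
        # so namespace changes and dentry leases are not applied.
        return

    for name, ino in reply["dentries"].items():
        cache["dentries"][name] = ino
```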
  13. 24 Dec, 2009 1 commit
  14. 08 Dec, 2009 1 commit
  15. 19 Nov, 2009 2 commits
  16. 13 Nov, 2009 1 commit
  17. 11 Nov, 2009 1 commit
  18. 10 Nov, 2009 1 commit
    • ceph: do not confuse stale and dead (unreconnected) caps · 685f9a5d
      Committed by Sage Weil
      We were using the cap_gen to track both stale caps (caps that timed out
      due to temporarily losing touch with the mds) and dead caps that did not
      reconnect after an MDS failure.  Introduce a recon_gen counter to track
      reconnections to restarted MDSs and kill dead caps based on that instead.
      
      Rename gen to cap_gen while we're at it to make it more clear which is
      which.
      Signed-off-by: Sage Weil <sage@newdream.net>
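The distinction between stale and dead caps can be sketched with two independent generation counters. Where the counters live and how they are compared are assumptions made for illustration; only the names `cap_gen` and `recon_gen` come from the commit message.

```python
class Session:
    def __init__(self):
        self.cap_gen = 0    # bumped when the session goes stale
        self.recon_gen = 0  # bumped when the MDS restarts

    def went_stale(self):
        self.cap_gen += 1

    def mds_restarted(self):
        self.recon_gen += 1


class Cap:
    def __init__(self, session):
        self.session = session
        self.cap_gen = session.cap_gen
        self.recon_gen = session.recon_gen

    def renew(self):
        """Cap was reissued after a temporary loss of contact."""
        self.cap_gen = self.session.cap_gen

    def reconnect(self):
        """Cap was reconnected to the restarted MDS."""
        self.recon_gen = self.session.recon_gen

    def is_stale(self):
        return self.cap_gen < self.session.cap_gen

    def is_dead(self):
        return self.recon_gen < self.session.recon_gen
```

With separate counters, a cap that merely timed out is stale but recoverable, while a cap that missed reconnection after an MDS restart is dead and can be dropped.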
  19. 07 Oct, 2009 1 commit
    • ceph: MDS client · 2f2dc053
      Committed by Sage Weil
      The MDS (metadata server) client is responsible for submitting
      requests to the MDS cluster and parsing the response.  We decide which
      MDS to submit each request to based on cached information about the
      current partition of the directory hierarchy across the cluster.  A
      stateful session is opened with each MDS before we submit requests to
      it, and a mutex is used to control the ordering of messages within
      each session.
      
      An MDS request may generate two responses.  The first indicates the
      operation was a success and returns any result.  A second reply is
      sent when the operation commits to disk.  Note that locking on the MDS
      ensures that the results of updates are visible only to the updating
      client before the operation commits.  Requests are linked to the
      containing directory so that an fsync will wait for them to commit.
      
      If an MDS fails and/or recovers, we resubmit requests as needed.  We
      also reconnect existing capabilities to a recovering MDS to
      reestablish that shared session state.  Old dentry leases are
      invalidated.
      Signed-off-by: Sage Weil <sage@newdream.net>
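The two-reply scheme and its link to fsync can be sketched as a per-directory table of requests that have not yet committed. The `Directory` class, the `safe` flag, and the method names are hypothetical; the sketch tracks only which requests an fsync would still have to wait for.

```python
class Directory:
    def __init__(self):
        # tid -> result; the result is None until the first reply arrives.
        self.uncommitted = {}

    def submit(self, tid):
        """Link a new request to this directory."""
        self.uncommitted[tid] = None

    def handle_reply(self, tid, result, safe):
        if tid not in self.uncommitted:
            return  # already committed or unknown
        if safe:
            # Second reply: the operation has committed to disk.
            del self.uncommitted[tid]
        else:
            # First reply: the operation succeeded but is not yet durable.
            self.uncommitted[tid] = result

    def fsync_pending(self):
        """Requests an fsync on this directory would still wait for."""
        return sorted(self.uncommitted)
```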