Commits · ff3d0046625c1b37df37beb8477135d44dae2823 · openanolis / cloud-kernel

12 Feb, 2013 · 1 commit

ceph: Convert struct ceph_mds_request to use kuid_t and kgid_t · ff3d0046

Committed by Eric W. Biederman on Jan 31, 2013

Hold the uid and gid for a pending ceph mds request using the types
kuid_t and kgid_t.  When a request message is finally created convert
the kuid_t and kgid_t values into uids and gids in the initial user
namespace.

Cc: Sage Weil <sage@inktank.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

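A minimal userspace sketch of the pattern described above. Everything here is an assumption for illustration: kuid_sketch_t/kgid_sketch_t, the request and message structs, and the helper names are hypothetical, and the identity mapping in uid_to_init_ns()/gid_to_init_ns() stands in for the kernel's from_kuid()/from_kgid() against init_user_ns. The ids stay typed while the request is pending and are converted to plain initial-namespace ids only when the message is built.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for kuid_t/kgid_t: wrapping the raw id in a
 * struct makes the compiler reject code that mixes a namespaced id
 * with a plain integer. */
typedef struct { uint32_t val; } kuid_sketch_t;
typedef struct { uint32_t val; } kgid_sketch_t;

/* Hypothetical pending request: ids are captured once, at setup. */
struct mds_request_sketch {
	kuid_sketch_t r_uid;
	kgid_sketch_t r_gid;
};

/* Hypothetical wire message: plain ids in the initial user namespace. */
struct mds_request_msg_sketch {
	uint32_t caller_uid;
	uint32_t caller_gid;
};

/* Assumption: identity mapping into the initial namespace, standing in
 * for from_kuid(&init_user_ns, ...) and from_kgid(&init_user_ns, ...). */
static uint32_t uid_to_init_ns(kuid_sketch_t uid) { return uid.val; }
static uint32_t gid_to_init_ns(kgid_sketch_t gid) { return gid.val; }

static void setup_request(struct mds_request_sketch *req,
			  kuid_sketch_t fsuid, kgid_sketch_t fsgid)
{
	req->r_uid = fsuid;	/* kept as a typed id, not a raw number */
	req->r_gid = fsgid;
}

/* Convert only when the message is built, which may happen later (for
 * example on resend) and outside the original caller's context. */
static void build_request_msg(const struct mds_request_sketch *req,
			      struct mds_request_msg_sketch *msg)
{
	msg->caller_uid = uid_to_init_ns(req->r_uid);
	msg->caller_gid = gid_to_init_ns(req->r_gid);
}
```

Because the conversion reads only the stored fields, a request that is rebuilt later still carries the ids of the process that created it.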

17 May, 2012 · 1 commit

ceph: define ceph_auth_handshake type · 6c4a1915

Committed by Alex Elder on May 16, 2012

The definitions for the ceph_mds_session and ceph_osd both contain
five fields related only to "authorizers."  Encapsulate those fields
into their own struct type, allowing for better isolation in some
upcoming patches.

Fix the #includes in "linux/ceph/osd_client.h" to spell out their
complete canonical paths.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

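An illustrative sketch of the encapsulation. The five field names follow libceph's struct ceph_auth_handshake as I understand it; the trimmed-down session and OSD structs, their s_auth/o_auth member names, and reset_handshake() are assumptions. The point is that authorizer-handling code can take the embedded struct without caring whether an MDS session or an OSD owns it.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque authorizer handle; its contents do not matter here. */
struct ceph_authorizer;

/* The five authorizer-related fields, grouped into one type
 * (field names are illustrative). */
struct ceph_auth_handshake {
	struct ceph_authorizer *authorizer;
	void *authorizer_buf;
	size_t authorizer_buf_len;
	void *authorizer_reply_buf;
	size_t authorizer_reply_buf_len;
};

/* Hypothetical, trimmed-down containers: each embeds one handshake
 * instead of repeating the five fields. */
struct mds_session_sketch {
	int s_mds;
	struct ceph_auth_handshake s_auth;
};

struct osd_sketch {
	int o_osd;
	struct ceph_auth_handshake o_auth;
};

/* Hypothetical helper that works for either container. */
static void reset_handshake(struct ceph_auth_handshake *auth)
{
	auth->authorizer = NULL;
	auth->authorizer_buf = NULL;
	auth->authorizer_buf_len = 0;
	auth->authorizer_reply_buf = NULL;
	auth->authorizer_reply_buf_len = 0;
}
```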

03 Feb, 2012 · 1 commit

ceph: create a new session lock to avoid lock inversion · d8fb02ab

Committed by Alex Elder on Jan 12, 2012

Lockdep was reporting a possible circular lock dependency in
dentry_lease_is_valid().  That function needs to sample the
session's s_cap_gen and s_cap_ttl fields coherently, but needs
to do so while holding a dentry lock.  The s_cap_lock field was
being used to protect the two fields, but that can't be taken while
holding a lock on a dentry within the session.

In most cases, the s_cap_gen and s_cap_ttl fields only get operated
on separately.  But in three cases they need to be updated together.
Implement a new lock to protect the spots where updating both fields
atomically is required.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

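A minimal sketch of a dedicated lock for the gen/ttl pair, assuming a C11 atomic_flag spinlock as a stand-in for a kernel spinlock. The struct, the lock name s_gen_ttl_lock, and the helpers mark_caps_stale() and lease_is_valid() are hypothetical. Readers take only this small lock to sample both fields together, and the rare paths that change both update them in one critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal spinlock on C11 atomic_flag, standing in for a kernel spinlock. */
typedef struct { atomic_flag flag; } spinlock_sketch_t;

static void sketch_spin_init(spinlock_sketch_t *l) { atomic_flag_clear(&l->flag); }

static void sketch_spin_lock(spinlock_sketch_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		;	/* busy-wait until the holder releases it */
}

static void sketch_spin_unlock(spinlock_sketch_t *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Hypothetical session: s_gen_ttl_lock guards only the gen/ttl pair. */
struct session_sketch {
	spinlock_sketch_t s_gen_ttl_lock;
	unsigned int s_cap_gen;
	unsigned long s_cap_ttl;	/* expiry time, in abstract ticks */
};

/* One of the paths that must update both fields together: caps went
 * stale, so bump the generation and expire the ttl atomically. */
static void mark_caps_stale(struct session_sketch *s, unsigned long now)
{
	sketch_spin_lock(&s->s_gen_ttl_lock);
	s->s_cap_gen++;
	s->s_cap_ttl = now;	/* a ttl of "now" is already expired */
	sketch_spin_unlock(&s->s_gen_ttl_lock);
}

/* Reader: sample the pair coherently, then decide outside the lock.
 * A lease taken at generation `gen` is valid only if the generation
 * has not changed and the ttl has not passed. */
static bool lease_is_valid(struct session_sketch *s, unsigned int gen,
			   unsigned long now)
{
	unsigned int cur_gen;
	unsigned long ttl;

	sketch_spin_lock(&s->s_gen_ttl_lock);
	cur_gen = s->s_cap_gen;
	ttl = s->s_cap_ttl;
	sketch_spin_unlock(&s->s_gen_ttl_lock);
	return gen == cur_gen && now < ttl;
}
```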

08 Dec, 2011 · 1 commit

ceph: use i_ceph_lock instead of i_lock · be655596

Committed by Sage Weil on Nov 30, 2011

We have been using i_lock to protect all kinds of data structures in the
ceph_inode_info struct, including lists of inodes that we need to iterate
over while avoiding races with inode destruction.  That requires grabbing
a reference to the inode while holding the list lock, but igrab() now
takes i_lock to check the inode flags.

Changing the list lock ordering would be a painful process.

However, using a ceph-specific i_ceph_lock in the ceph inode instead of
i_lock is a simple mechanical change and avoids the ordering constraints
imposed by igrab().
Reported-by: Amon Ott <a.ott@m-privacy.de>
Signed-off-by: Sage Weil <sage@newdream.net>


27 Jul, 2011 · 2 commits

ceph: explicitly reference rename old_dentry parent dir in request · 41b02e1f

Committed by Sage Weil on Jul 26, 2011

We carry a pin on the parent directory for the rename source and dest
dentries.  For the source it's r_locked_dir; we need to explicitly
reference the old_dentry parent as well, since the dentry's d_parent may
change between when the request was created and pinned and when it is
freed.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>


ceph: ignore lease mask · 2f90b852

Committed by Sage Weil on Jul 26, 2011

The lease mask is no longer used (and it changed a while back).  Instead,
use a non-zero duration to indicate that there is a lease being issued.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>


25 May, 2011 · 1 commit

ceph: fix cap flush race reentrancy · db354052

Committed by Sage Weil on May 24, 2011

In e9964c10 we changed cap flushing to do a delicate dance because some
inodes on the cap_dirty list could be in a migrating state (got EXPORT but
not IMPORT) in which we couldn't actually flush and move from
dirty->flushing, breaking the while (!empty) { process first } loop
structure.  It worked for a single sync thread, but was not reentrant and
triggered infinite loops when multiple syncers came along.

Instead, move inodes with dirty caps to a separate cap_dirty_migrating list
when in the limbo export-but-no-import state, allowing us to go back to
the simple loop structure (which was reentrant).  This is cleaner and more
robust.

Audited the cap_dirty users and this looks fine:
list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we
have dirty caps (which list we're on is irrelevant) and list_del_init()
calls still do the right thing.
Signed-off-by: Sage Weil <sage@newdream.net>

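A simplified, single-threaded sketch of the restored loop structure, using a minimal list_head stand-in with locking omitted. The inode struct and the handle_export(), handle_import(), and flush_dirty_caps() helpers are hypothetical. Inodes in the export-but-no-import state are parked on cap_dirty_migrating, so every inode the loop sees at the head of cap_dirty can leave it, and the plain "while (!empty) process first" loop always makes progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static void list_move_tail(struct list_head *e, struct list_head *h)
{
	list_del_init(e);
	list_add_tail(e, h);
}

/* Hypothetical inode: i_dirty_item sits on cap_dirty or on
 * cap_dirty_migrating while caps are dirty, and is self-linked otherwise. */
struct inode_sketch {
	struct list_head i_dirty_item;
	bool flushing;
};

#define inode_of(p) \
	((struct inode_sketch *)((char *)(p) - offsetof(struct inode_sketch, i_dirty_item)))

/* Which list the inode is on is irrelevant to this check. */
static bool has_dirty_caps(const struct inode_sketch *ci)
{
	return !list_empty(&ci->i_dirty_item);
}

/* EXPORT seen but no IMPORT yet: park the inode where flushers never look. */
static void handle_export(struct inode_sketch *ci, struct list_head *cap_dirty_migrating)
{
	if (has_dirty_caps(ci))
		list_move_tail(&ci->i_dirty_item, cap_dirty_migrating);
}

/* Matching IMPORT: the inode can be flushed again. */
static void handle_import(struct inode_sketch *ci, struct list_head *cap_dirty)
{
	if (has_dirty_caps(ci))
		list_move_tail(&ci->i_dirty_item, cap_dirty);
}

/* Every inode on cap_dirty is flushable, so each pass removes the head. */
static int flush_dirty_caps(struct list_head *cap_dirty)
{
	int flushed = 0;

	while (!list_empty(cap_dirty)) {
		struct inode_sketch *ci = inode_of(cap_dirty->next);

		list_del_init(&ci->i_dirty_item);	/* dirty -> flushing */
		ci->flushing = true;
		flushed++;
	}
	return flushed;
}
```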

13 Jan, 2011 · 2 commits

ceph: drop redundant r_mds field · 4af25fdd

Committed by Sage Weil on Nov 02, 2010

The r_mds field is redundant, since we can find the same information at
r_session->s_mds, and when r_session is NULL then r_mds is meaningless.
Signed-off-by: Sage Weil <sage@newdream.net>


ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS · 14303d20

Committed by Sage Weil on Dec 14, 2010

This implements the DIRLAYOUTHASH protocol feature, which passes the dir
layout over the wire from the MDS. This gives the client knowledge
of the correct hash function to use for mapping dentries among dir
fragments.

Note that if this feature is _not_ present on the client but is on the
MDS, the client may misdirect requests. This will result in a forward
and degrade performance. It may also result in inaccurate NFS filehandle
generation, which will prevent fh resolution when the inode is not present
in the client cache and the parent directories have been fragmented.
Signed-off-by: Sage Weil <sage@newdream.net>


02 Dec, 2010 · 1 commit

ceph: Handle file locks in replies from the MDS. · 25933abd

Committed by Herb Shiu on Dec 01, 2010

Previously the kernel client incorrectly assumed everything was a directory.
Signed-off-by: Herb Shiu <herb_shiu@tcloudcomputing.com>
Acked-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>


08 Nov, 2010 · 1 commit

ceph: fix uid/gid on resent mds requests · cb4276cc

Committed by Sage Weil on Nov 08, 2010

MDS requests can be rebuilt and resent in non-process context, but were
filling in uid/gid from current_fsuid/gid.  Put that information in the
request struct on request setup.

This fixes incorrect (and root) uid/gid getting set for requests that
are forwarded between MDSs, usually due to metadata migrations.
Signed-off-by: Sage Weil <sage@newdream.net>


21 Oct, 2010 · 1 commit

ceph: factor out libceph from Ceph file system · 3d14c5d2

Committed by Yehuda Sadeh on Apr 06, 2010

This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph.  This
is mostly a matter of moving files around.  However, a few key pieces
of the interface change as well:

 - ceph_client becomes ceph_fs_client and ceph_client, where the latter
   captures the mon and osd clients, and the fs_client gets the mds client
   and file system specific pieces.
 - Mount option parsing and debugfs setup is correspondingly broken into
   two pieces.
 - The mon client gets a generic handler callback for otherwise unknown
   messages (mds map, in this case).
 - The basic supported/required feature bits can be expanded (and are by
   ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.
Signed-off-by: Sage Weil <sage@newdream.net>


23 Aug, 2010 · 1 commit

ceph: fix multiple mds session shutdown · f3c60c59

Committed by Sage Weil on Aug 11, 2010

The use of a completion when waiting for session shutdown during umount is
inappropriate, given the complexity of the condition.  For multiple MDS's,
this resulted in the umount thread spinning, often preventing the session
close message from being processed in some cases.

Switch to a waitqueue and define a condition helper.  This cleans things
up nicely.
Signed-off-by: Sage Weil <sage@newdream.net>

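A userspace sketch of the same idea, assuming a POSIX condition variable in place of the kernel wait queue. The struct, MAX_MDS_SKETCH, handle_session_closed(), and wait_for_sessions_closed() are hypothetical; done_closing_sessions() plays the role of the condition helper. Instead of waiting for a single completion event, the umount path sleeps and re-checks a predicate over all sessions every time a close reply arrives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_MDS_SKETCH 4

/* Hypothetical MDS client state. "Every session is closed" is a
 * predicate over several sessions, not a single event. */
struct mdsc_sketch {
	pthread_mutex_t lock;
	pthread_cond_t session_close_wq;
	bool session_open[MAX_MDS_SKETCH];
};

/* Condition helper: true once no session remains open. Caller holds lock. */
static bool done_closing_sessions(const struct mdsc_sketch *mdsc)
{
	for (int i = 0; i < MAX_MDS_SKETCH; i++)
		if (mdsc->session_open[i])
			return false;
	return true;
}

/* Called when one MDS acknowledges the session close. */
static void handle_session_closed(struct mdsc_sketch *mdsc, int mds)
{
	pthread_mutex_lock(&mdsc->lock);
	mdsc->session_open[mds] = false;
	pthread_cond_broadcast(&mdsc->session_close_wq);	/* waiters re-check */
	pthread_mutex_unlock(&mdsc->lock);
}

/* umount path: sleep (rather than spin) until the predicate holds. */
static void wait_for_sessions_closed(struct mdsc_sketch *mdsc)
{
	pthread_mutex_lock(&mdsc->lock);
	while (!done_closing_sessions(mdsc))
		pthread_cond_wait(&mdsc->session_close_wq, &mdsc->lock);
	pthread_mutex_unlock(&mdsc->lock);
}
```

The while loop around pthread_cond_wait() also absorbs spurious wakeups, which is why the predicate lives in a helper rather than in the signaling code.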

02 Aug, 2010 · 4 commits

ceph: handle ESTALE properly; on receipt send to authority if it wasn't · e55b71f8

Committed by Greg Farnum on Jun 22, 2010

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: connect to export targets on cap export · 154f42c2

Committed by Sage Weil on Jun 21, 2010

When we get a cap EXPORT message, make sure we are connected to all export
targets to ensure we can handle the matching IMPORT.
Signed-off-by: Sage Weil <sage@newdream.net>


ceph: do caps accounting per mds_client · 37151668

Committed by Yehuda Sadeh on Jun 17, 2010

Caps related accounting is now being done per mds client instead
of just being global. This prepares ground work for a later revision
of the caps preallocated reservation list.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>


ceph: drop unused argument · ee6b272b

Committed by Sage Weil on Jun 10, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

17 Jul, 2010 · 1 commit

ceph: do not include cap/dentry releases in replayed messages · e979cf50

Committed by Sage Weil on Jul 15, 2010

Strip the cap and dentry releases from replayed messages.  They can
cause the shared state to get out of sync because they were generated
(with the request message) earlier, and no longer reflect the current
client state.
Signed-off-by: Sage Weil <sage@newdream.net>


11 Jun, 2010 · 2 commits

ceph: try to send partial cap release on cap message on missing inode · 2b2300d6

Committed by Sage Weil on Jun 09, 2010

If we have enough memory to allocate a new cap release message, do so, so
that we can send a partial release message immediately.  This keeps us from
making the MDS wait when the cap release it needs is in a partially full
release message.

If we fail because of ENOMEM, oh well, they'll just have to wait a bit
longer.
Signed-off-by: Sage Weil <sage@newdream.net>


ceph: release cap on import if we don't have the inode · 3d7ded4d

Committed by Sage Weil on Jun 09, 2010

If we get an IMPORT that gives us a cap, but we don't have the inode, queue
a release (and try to send it immediately) so that the MDS doesn't get
stuck waiting for us.
Signed-off-by: Sage Weil <sage@newdream.net>


18 May, 2010 · 3 commits

ceph: use common helper for aborted dir request invalidation · 167c9e35

Committed by Sage Weil on May 14, 2010

We invalidate I_COMPLETE and dentry leases in two places: on aborted mds
request and on request replay. Use common helper to avoid duplicate code.
Signed-off-by: Sage Weil <sage@newdream.net>


ceph: fix race between aborted requests and fill_trace · b4556396

Committed by Sage Weil on May 13, 2010

When we abort requests we need to prevent fill_trace et al from doing
anything that relies on locks held by the VFS caller. This fixes a race
between the reply handler and the abort code, ensuring that we continue
holding the dir mutex until the reply handler completes.
Signed-off-by: Sage Weil <sage@newdream.net>


ceph: clean up mds reply, error handling · e1518c7c

Committed by Sage Weil on May 13, 2010

We would occasionally BUG out in the reply handler because r_reply was
nonzero, due to a race with ceph_mdsc_do_request temporarily setting
r_reply to an ERR_PTR value.  This is unnecessary, messy, and also wrong
in the EIO case.

Clean up by consistently using r_err for errors and r_reply for messages.
Also fix the abort logic to trigger consistently for all errors that return
to the caller early (e.g., EIO from timeout case).  If an abort races with
a reply, use the result from the reply.

Also fix locking for r_err, r_reply update in the reply handler.
Signed-off-by: Sage Weil <sage@newdream.net>


18 Feb, 2010 · 1 commit

ceph: fix iterate_caps removal race · 7c1332b8

Committed by Sage Weil on Feb 16, 2010

We need to be able to iterate over all caps on a session with a
possibly slow callback on each cap.  To allow this, we used to
prevent cap reordering while we were iterating.  However, we were
not safe from races with removal: removing the 'next' cap would
make the next pointer from list_for_each_entry_safe be invalid,
and cause a lock up or similar badness.

Instead, we keep an iterator pointer in the session pointing to
the current cap.  As before, we avoid reordering.  For removal,
if the cap isn't the current cap we are iterating over, we are
fine.  If it is, we clear cap->ci (to mark the cap as pending
removal) but leave it in the session list.  In iterate_caps, we
can safely finish removal and get the next cap pointer.

While we're at it, clean up put_cap to not take a cap reservation
context, as it was never used.
Signed-off-by: Sage Weil <sage@newdream.net>

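A simplified, single-threaded sketch of the iterator scheme, using a minimal list_head stand-in with the session locking left out. The cap and session structs, add_cap(), remove_cap(), iterate_caps(), and the example callback drop_self_and_next() are hypothetical. The iterator reads the next pointer only after the callback returns, and removing the cap the iterator is parked on only clears ci, leaving the unlink and free to the iterator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Hypothetical cap: ci == NULL marks a removal that is still pending. */
struct cap_sketch {
	struct list_head session_caps;
	void *ci;
	int id;
};

struct session_sketch {
	struct list_head s_caps;
	struct cap_sketch *s_cap_iterator;	/* cap the iterator is parked on */
};

#define cap_of(p) \
	((struct cap_sketch *)((char *)(p) - offsetof(struct cap_sketch, session_caps)))

static struct cap_sketch *add_cap(struct session_sketch *s, int id)
{
	struct cap_sketch *cap = malloc(sizeof(*cap));

	if (!cap)
		return NULL;
	cap->ci = cap;		/* any non-NULL value means "live" */
	cap->id = id;
	list_add_tail(&cap->session_caps, &s->s_caps);
	return cap;
}

/* Unlink and free at once, unless the iterator is parked on this cap. */
static void remove_cap(struct session_sketch *s, struct cap_sketch *cap)
{
	cap->ci = NULL;
	if (s->s_cap_iterator == cap)
		return;		/* iterate_caps() finishes the removal */
	list_del(&cap->session_caps);
	free(cap);
}

/* Visit every cap; cb may remove any cap, including the one after it. */
static int iterate_caps(struct session_sketch *s,
			void (*cb)(struct session_sketch *, struct cap_sketch *))
{
	struct list_head *p = s->s_caps.next;
	int visited = 0;

	while (p != &s->s_caps) {
		struct cap_sketch *cap = cap_of(p);

		s->s_cap_iterator = cap;
		cb(s, cap);
		visited++;
		/* cap is still linked, so its next pointer is current even
		 * if cb just removed the cap that used to follow it */
		p = cap->session_caps.next;
		s->s_cap_iterator = NULL;
		if (!cap->ci) {	/* deferred removal of the current cap */
			list_del(&cap->session_caps);
			free(cap);
		}
	}
	return visited;
}

/* Example callback: cap 1 removes the cap after it, then itself. */
static void drop_self_and_next(struct session_sketch *s, struct cap_sketch *cap)
{
	if (cap->id != 1)
		return;
	struct list_head *n = cap->session_caps.next;
	if (n != &s->s_caps)
		remove_cap(s, cap_of(n));
	remove_cap(s, cap);
}
```

With caps 0 through 3, the callback on cap 1 frees cap 2 immediately and defers its own removal, so the iterator visits caps 0, 1 and 3 and leaves 0 and 3 on the list.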

17 Feb, 2010 · 2 commits

ceph: use rbtree for snap_realms · a105f00c

Committed by Sage Weil on Feb 15, 2010

Switch from radix tree to rbtree for snap realms.  This is much more
appropriate given that realm keys are few and far between.
Signed-off-by: Sage Weil <sage@newdream.net>


ceph: use rbtree for mds requests · 44ca18f2

Committed by Sage Weil on Feb 15, 2010

The rbtree is a more appropriate data structure than a radix_tree.  It
avoids extra memory usage and simplifies the code.

It also fixes a bug where the debugfs 'mdsc' file wasn't including the
most recent mds request.
Signed-off-by: Sage Weil <sage@newdream.net>


26 Jan, 2010 · 1 commit

ceph: properly handle aborted mds requests · 5b1daecd

Committed by Sage Weil on Jan 25, 2010

Previously, if the MDS request was interrupted, we would unregister the
request and ignore any reply. This could cause the caps or other cache
state to become out of sync. (For instance, aborting dbench and doing
rm -r on clients would complain about a non-empty directory because the
client didn't realize its aborted file create request completed.)

Even if we don't unregister, we still can't process the reply normally because
we are no longer holding the caller's locks (like the dir i_mutex).

So, mark aborted operations with r_aborted, and in the reply handler, be
sure to process all the caps. Do not process the namespace changes,
though, since we no longer will hold the dir i_mutex. The dentry lease
state can also be ignored as it's more forgiving.
Signed-off-by: Sage Weil <sage@newdream.net>

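A minimal sketch of the reply-handling policy, with hypothetical structs that reduce cap state and namespace state to counters; handle_reply() and all field names other than r_aborted are assumptions. Cap state from the reply is always applied so the client and MDS stay in sync, while namespace changes are skipped for an aborted request because the caller no longer holds the dir i_mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reply: what the MDS reports back. */
struct reply_sketch {
	int caps_granted;	/* capability state carried by the reply */
	int new_dentries;	/* namespace changes, e.g. a created file */
};

/* Hypothetical request plus the client-side state it updates. */
struct request_sketch {
	bool r_aborted;		/* caller gave up, e.g. was interrupted */
	int client_caps;	/* client-side cap bookkeeping */
	int client_dentries;	/* client-side namespace cache */
};

static void handle_reply(struct request_sketch *req, const struct reply_sketch *rep)
{
	/* Always process caps, even if nobody waits for the result. */
	req->client_caps += rep->caps_granted;

	/* Namespace updates rely on the caller's dir i_mutex, which an
	 * aborted caller has already dropped, so skip them. */
	if (req->r_aborted)
		return;
	req->client_dentries += rep->new_dentries;
}
```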

24 Dec, 2009 · 1 commit

ceph: do not touch_caps while iterating over caps list · 5dacf091

Committed by Sage Weil on Dec 21, 2009

To avoid confusing iterate_session_caps(), flag the session while we are
iterating so that __touch_cap does not rearrange items on the list.

All other modifiers of session->s_caps do so under the protection of
s_mutex.
Signed-off-by: Sage Weil <sage@newdream.net>


08 Dec, 2009 · 1 commit

ceph: use kref for struct ceph_mds_request · 153c8e6b

Committed by Sage Weil on Dec 07, 2009

Signed-off-by: Sage Weil <sage@newdream.net>

19 Nov, 2009 · 2 commits

ceph: negotiate authentication protocol; implement AUTH_NONE protocol · 4e7a5dcd

Committed by Sage Weil on Nov 18, 2009

When we open a monitor session, we send an initial AUTH message listing
the auth protocols we support, our entity name, and (possibly) a previously
assigned global_id.  The monitor chooses a protocol and responds with an
initial message.

Initially implement AUTH_NONE, a dummy protocol that provides no security,
but works within the new framework.  It generates 'authorizers' that are
used when connecting to (mds, osd) services that simply state our entity
name and global_id.

This is a wire protocol change.
Signed-off-by: Sage Weil <sage@newdream.net>

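A sketch of the negotiation under stated assumptions: the protocol ids, struct layouts, choose_protocol(), and make_none_authorizer() are hypothetical, and picking the first protocol in the monitor's preference order that the client also offers is an assumed policy, since the message only says the monitor chooses. The AUTH_NONE authorizer carries nothing but the entity name and global_id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protocol ids, for illustration only. */
enum { AUTH_SKETCH_UNKNOWN = 0, AUTH_SKETCH_NONE = 1, AUTH_SKETCH_CEPHX = 2 };

/* Initial AUTH message: supported protocols, our entity name, and a
 * previously assigned global_id (0 if we have none yet). */
struct auth_request_sketch {
	const uint32_t *protocols;
	size_t num_protocols;
	const char *entity_name;
	uint64_t global_id;
};

/* Monitor side (assumed policy): first protocol in the monitor's
 * preference order that the client also supports. */
static uint32_t choose_protocol(const struct auth_request_sketch *req,
				const uint32_t *mon_prefs, size_t num_prefs)
{
	for (size_t i = 0; i < num_prefs; i++)
		for (size_t j = 0; j < req->num_protocols; j++)
			if (mon_prefs[i] == req->protocols[j])
				return mon_prefs[i];
	return AUTH_SKETCH_UNKNOWN;
}

/* AUTH_NONE authorizer: no secret, just a statement of identity. */
struct none_authorizer_sketch {
	const char *entity_name;
	uint64_t global_id;
};

static struct none_authorizer_sketch
make_none_authorizer(const struct auth_request_sketch *req, uint64_t assigned_global_id)
{
	struct none_authorizer_sketch a = { req->entity_name, assigned_global_id };
	return a;
}
```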

ceph: handle errors during osd client init · 5f44f142

Committed by Sage Weil on Nov 18, 2009

Unwind initializing if we get ENOMEM during client initialization.
Signed-off-by: Sage Weil <sage@newdream.net>


13 Nov, 2009 · 1 commit

ceph: build cleanly without CONFIG_DEBUG_FS · 039934b8

Committed by Sage Weil on Nov 12, 2009

Signed-off-by: Sage Weil <sage@newdream.net>

11 Nov, 2009 · 1 commit

ceph: remove recon_gen logic · cdac8303

Committed by Sage Weil on Nov 10, 2009

We don't get an explicit affirmative confirmation that our caps reconnect,
nor do we necessarily want to pay that cost.  So, take all this code out
for now.
Signed-off-by: Sage Weil <sage@newdream.net>


10 Nov, 2009 · 1 commit

ceph: do not confuse stale and dead (unreconnected) caps · 685f9a5d

Committed by Sage Weil on Nov 09, 2009

We were using the cap_gen to track both stale caps (caps that timed out
due to temporarily losing touch with the mds) and dead caps that did not
reconnect after an MDS failure.  Introduce a recon_gen counter to track
reconnections to restarted MDSs and kill dead caps based on that instead.

Rename gen to cap_gen while we're at it to make it more clear which is
which.
Signed-off-by: Sage Weil <sage@newdream.net>


07 Oct, 2009 · 1 commit

ceph: MDS client · 2f2dc053

Committed by Sage Weil on Oct 06, 2009

The MDS (metadata server) client is responsible for submitting
requests to the MDS cluster and parsing the response.  We decide which
MDS to submit each request to based on cached information about the
current partition of the directory hierarchy across the cluster.  A
stateful session is opened with each MDS before we submit requests to
it, and a mutex is used to control the ordering of messages within
each session.

An MDS request may generate two responses.  The first indicates the
operation was a success and returns any result.  A second reply is
sent when the operation commits to disk.  Note that locking on the MDS
ensures that the results of updates are visible only to the updating
client before the operation commits.  Requests are linked to the
containing directory so that an fsync will wait for them to commit.

If an MDS fails and/or recovers, we resubmit requests as needed.  We
also reconnect existing capabilities to a recovering MDS to
reestablish that shared session state.  Old dentry leases are
invalidated.
Signed-off-by: Sage Weil <sage@newdream.net>

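A minimal sketch of the two-reply bookkeeping, with hypothetical state names, structs, and helpers. The first ("unsafe") reply delivers the result and keeps the request linked to its directory, the second ("safe") reply on commit unlinks it, and a directory fsync has to wait while any linked request remains. Accepting a safe reply that arrives without an earlier unsafe one is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request states for the two-reply protocol. */
enum req_state_sketch { REQ_SENT, REQ_UNSAFE, REQ_SAFE };

struct mds_req_sketch {
	enum req_state_sketch state;
	int result;
	bool on_dir_unsafe_list;	/* linked to the containing directory */
};

/* First reply: the operation succeeded and the caller can use the
 * result, but it is not yet on disk, so stay linked to the directory. */
static void handle_unsafe_reply(struct mds_req_sketch *req, int result)
{
	req->result = result;
	req->state = REQ_UNSAFE;
	req->on_dir_unsafe_list = true;
}

/* Second reply: the operation committed. Assumption: a safe reply may
 * also arrive on its own, in which case it carries the result. */
static void handle_safe_reply(struct mds_req_sketch *req, int result)
{
	if (req->state == REQ_SENT)
		req->result = result;
	req->state = REQ_SAFE;
	req->on_dir_unsafe_list = false;
}

/* True while a directory fsync would still have to wait. */
static bool dir_fsync_must_wait(const struct mds_req_sketch *reqs, int n)
{
	for (int i = 0; i < n; i++)
		if (reqs[i].on_dir_unsafe_list)
			return true;
	return false;
}
```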

openanolis / cloud-kernel · synced successfully more than a year ago