Commits · cdf5f61ed1d64d50eb9cf10071ab40836f5f9f91 · openanolis / cloud-kernel

12 May, 2010 2 commits

ceph: resubmit requests on pg mapping change (not just primary change) · d85b7056

Authored by Sage Weil on May 10, 2010

OSD requests need to be resubmitted on any pg mapping change, not just when
the pg primary changes. Resending only when the primary changes results in
occasional 'hung' requests during osd cluster recovery or rebalancing.
Signed-off-by: Sage Weil <sage@newdream.net>
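
The difference between the two resend criteria can be sketched in user-space C. This is only an illustration, not the kernel code: `struct pg_mapping`, `primary_changed()` and `mapping_changed()` are hypothetical names, and the model assumes a pg maps to an ordered list of OSD ids whose first entry is the primary.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ACTING 8

/* Hypothetical model: ordered OSD ids serving a pg; acting[0] is the primary. */
struct pg_mapping {
    int num;
    int acting[MAX_ACTING];
};

/* Old criterion: resend only if the primary differs. */
static bool primary_changed(const struct pg_mapping *a,
                            const struct pg_mapping *b)
{
    assert(a->num > 0 && b->num > 0);
    return a->acting[0] != b->acting[0];
}

/* New criterion: resend if any member of the mapping differs. */
static bool mapping_changed(const struct pg_mapping *a,
                            const struct pg_mapping *b)
{
    if (a->num != b->num)
        return true;
    return memcmp(a->acting, b->acting,
                  (size_t)a->num * sizeof(a->acting[0])) != 0;
}
```

With mappings `{1, 2, 3}` and `{1, 4, 3}`, the old check reports no change while the new one does, which is the kind of case that would leave a request waiting during recovery or rebalancing.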

ceph: unregister osd request on failure · 0ceed5db

Authored by Sage Weil on May 11, 2010

The osd request wasn't being unregistered when the osd returned a failure
code, even though the result was returned to the caller. This would cause
it to eventually time out, and then crash the kernel when it tried to
resend the request using a stale page vector.
Signed-off-by: Sage Weil <sage@newdream.net>

23 March, 2010 3 commits

ceph: avoid reopening osd connections when address hasn't changed · 87b315a5

Authored by Sage Weil on March 22, 2010

We get a fault callback on _every_ tcp connection fault.  Normally, we
want to reopen the connection when that happens.  If the address we have
is bad, however, and connection attempts always result in a connection
refused or similar error, explicitly closing and reopening the msgr
connection just prevents the messenger's backoff logic from kicking in.
The result can be a console full of

[ 3974.417106] ceph: osd11 10.3.14.138:6800 connection failed
[ 3974.423295] ceph: osd11 10.3.14.138:6800 connection failed
[ 3974.429709] ceph: osd11 10.3.14.138:6800 connection failed

Instead, if we get a fault, and have outstanding requests, but the osd
address hasn't changed and the connection never successfully connected in
the first place, do nothing to the osd connection.  The messenger layer
will back off and retry periodically, because we never connected and thus
the lossy bit is not set.

In that case, touch each request's r_stamp so that handle_timeout can tell the
request is still alive and kicking.
Signed-off-by: Sage Weil <sage@newdream.net>
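
The decision described above can be sketched as follows. This is a simplified user-space model, not the actual patch: `on_fault()` and its parameters are hypothetical, and it only covers the case of an osd that has outstanding requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct request {
    unsigned long r_stamp;   /* last time the request was sent or seen retrying */
};

/*
 * Hypothetical model of the fault path for an osd with outstanding requests.
 * Returns true when the connection should be closed and reopened.
 */
static bool on_fault(bool addr_changed, bool ever_connected,
                     struct request *reqs, size_t n, unsigned long now)
{
    assert(n > 0);
    if (addr_changed || ever_connected)
        return true;
    /* Leave the connection to the messenger's backoff; mark requests alive. */
    for (size_t i = 0; i < n; i++)
        reqs[i].r_stamp = now;
    return false;
}
```

Refreshing `r_stamp` in the "do nothing" branch is what keeps a timeout scan from treating these requests as stale while the messenger keeps retrying.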

ceph: rename r_sent_stamp r_stamp · 3dd72fc0

Authored by Sage Weil on March 22, 2010

Make variable name slightly more generic, since it will (soon)
reflect either the time the request was sent OR the time it was
last determined to be still retrying.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix null pointer deref of r_osd in debug output · 12eadc19

Authored by Sage Weil on March 15, 2010

This causes an oops when debug output is enabled and we kick
an osd request with no current r_osd (sometime after an osd
failure).  Check the pointer before dereferencing.
Signed-off-by: Sage Weil <sage@newdream.net>

05 March, 2010 1 commit

ceph: reset osd after relevant messages timed out · 422d2cb8

Authored by Yehuda Sadeh on February 26, 2010

This simplifies the process of timing out messages. We
keep an LRU of the messages currently in flight. If a
timeout has passed, we reset the osd connection so that the
messages will be retransmitted.  This is a failsafe in case
we hit some sort of problem sending a message out to the OSD.
Normally, we'll get notification via an updated osdmap if
there are problems.

If a request is older than the keepalive timeout, send a
keepalive to ensure we detect any breaks in the TCP connection.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
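
A minimal sketch of the per-request check, assuming the keepalive timeout is no longer than the reset timeout (the message does not state this). `check_request()` and the `ACT_*` names are hypothetical and not taken from the patch.

```c
#include <assert.h>

enum timeout_action { ACT_NONE, ACT_KEEPALIVE, ACT_RESET };

/*
 * Hypothetical helper: decide what to do for one in-flight request.
 * `stamp` is when the request was sent; all values share one time unit.
 */
static enum timeout_action check_request(unsigned long now,
                                         unsigned long stamp,
                                         unsigned long keepalive,
                                         unsigned long timeout)
{
    unsigned long age = now - stamp;

    assert(keepalive <= timeout);
    if (age >= timeout)
        return ACT_RESET;       /* reset the osd connection, retransmit */
    if (age >= keepalive)
        return ACT_KEEPALIVE;   /* probe the TCP connection */
    return ACT_NONE;
}
```

Because the requests sit on an LRU ordered by send time, a scan along these lines could stop at the first request that is younger than the keepalive timeout.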

02 March, 2010 2 commits

ceph: set osd request message front length correctly · 6f863e71

Authored by Sage Weil on March 01, 2010

We didn't set the front length correctly.  When messages used
the message pool we ended up with the conservative max (4 KB), and
the rest of the time the slightly less conservative estimate.  Even
though the OSD ignores the extra data, set it to the right value to avoid
sending extra data over the network.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: use single osd op reply msg · c16e7869

Authored by Sage Weil on March 01, 2010

Use a single ceph_msg for the osd reply, even when we are getting multiple
replies.
Signed-off-by: Sage Weil <sage@newdream.net>

27 February, 2010 1 commit

ceph: remove fragile __map_osds optimization · c99eb1c7

Authored by Sage Weil on February 26, 2010

We used to try to avoid freeing and then reallocating the osd
struct.  This is a bit fragile due to potential interactions with
other references (beyond o_requests), and may be the cause of
this crash:

[120633.442358] BUG: unable to handle kernel NULL pointer dereference at (null)
[120633.443292] IP: [<ffffffff812549b6>] rb_erase+0x11d/0x277
[120633.443292] PGD f7ff3067 PUD f7f53067 PMD 0
[120633.443292] Oops: 0000 [#1] PREEMPT SMP
[120633.443292] last sysfs file: /sys/kernel/uevent_seqnum
[120633.443292] CPU 1
[120633.443292] Modules linked in: ceph fan ac battery psmouse ehci_hcd ide_pci_generic ohci_hcd thermal processor button
[120633.443292] Pid: 3023, comm: ceph-msgr/1 Not tainted 2.6.32-rc2 #12 H8SSL
[120633.443292] RIP: 0010:[<ffffffff812549b6>]  [<ffffffff812549b6>] rb_erase+0x11d/0x277
[120633.443292] RSP: 0018:ffff8800f7b13a50  EFLAGS: 00010246
[120633.443292] RAX: ffff880022907819 RBX: ffff880022907818 RCX: 0000000000000000
[120633.443292] RDX: ffff8800f7b13a80 RSI: ffff8800f587eb48 RDI: 0000000000000000
[120633.443292] RBP: ffff8800f7b13a60 R08: 0000000000000000 R09: 0000000000000004
[120633.443292] R10: 0000000000000000 R11: ffff8800c4441000 R12: ffff8800f587eb48
[120633.443292] R13: ffff8800f58eaa00 R14: ffff8800f413c000 R15: 0000000000000001
[120633.443292] FS:  00007fbef6e226e0(0000) GS:ffff880009200000(0000) knlGS:0000000000000000
[120633.443292] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[120633.443292] CR2: 0000000000000000 CR3: 00000000f7c53000 CR4: 00000000000006e0
[120633.443292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[120633.443292] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[120633.443292] Process ceph-msgr/1 (pid: 3023, threadinfo ffff8800f7b12000, task ffff8800f5858b40)
[120633.443292] Stack:
[120633.443292]  ffff8800f413c000 ffff8800f587e9c0 ffff8800f7b13a80 ffffffffa0098a86
[120633.443292] <0> 00000000000006f1 0000000000000000 ffff8800f7b13af0 ffffffffa009959b
[120633.443292] <0> ffff8800f413c000 ffff880022a68400 ffff880022a68400 ffff8800f587e9c0
[120633.443292] Call Trace:
[120633.443292]  [<ffffffffa0098a86>] __remove_osd+0x4d/0xbc [ceph]
[120633.443292]  [<ffffffffa009959b>] __map_osds+0x199/0x4fa [ceph]
[120633.443292]  [<ffffffffa00999f4>] ? __send_request+0xf8/0x186 [ceph]
[120633.443292]  [<ffffffffa0099beb>] kick_requests+0x169/0x3cb [ceph]
[120633.443292]  [<ffffffffa009a8c1>] ceph_osdc_handle_map+0x370/0x522 [ceph]

Since we're probably screwed anyway if a small kmalloc is
failing, don't bother with trying to be clever here.
Signed-off-by: Sage Weil <sage@newdream.net>

24 February, 2010 1 commit

ceph: fix up unexpected message handling · 5b3a4db3

Authored by Sage Weil on February 19, 2010

Fix skipping of unexpected message types from osd, mon.

Clean up pr_info and debug output.
Signed-off-by: Sage Weil <sage@newdream.net>

16 February, 2010 1 commit

ceph: reset osd connections after fault · 153a008b

Authored by Sage Weil on February 15, 2010

A single osd connection fault (e.g. tcp disconnect) wasn't
reopening the connection, which causes all current and future
requests for that osd to hang.
Signed-off-by: Sage Weil <sage@newdream.net>

12 February, 2010 1 commit

ceph: put unused osd connections on lru · f5a2041b

Authored by Yehuda Sadeh on February 03, 2010

Instead of removing an osd connection immediately when its
requests list is empty, put the osd connection on an lru.
It is removed only if that osd has not been used for more
than a specified time.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

11 February, 2010 1 commit

ceph: allow renewal of auth credentials · 9bd2e6f8

Authored by Sage Weil on February 02, 2010

Add infrastructure to allow the mon_client to periodically renew its auth
credentials.  Also add a messenger callback that will force such a renewal
if a peer rejects our authenticator.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

03 February, 2010 1 commit

ceph: always send truncation info with read and write osd ops · 0c948992

Authored by Yehuda Sadeh on February 01, 2010

This fixes a bug where read/write ops arrive at the osd after
a subsequent truncation request.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

26 January, 2010 3 commits

ceph: keep reserved replies on the request structure · 0d59ab81

Authored by Yehuda Sadeh on January 13, 2010

This handles all data preallocation and revocation in one
place, with no need for a special case for the reserved pages.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

ceph: alloc message data pages and check if tid exists · 0547a9b3

Authored by Yehuda Sadeh on January 11, 2010

This is now done in the same callback that is also responsible for
allocating the 'front' part of the message. If we get a message
for which we have no corresponding tid, mark it for skipping.

Move the mutex unlock/lock from the osd alloc_msg callback
to the calling function in the messenger.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

ceph: allocate middle of message before stating to read · 2450418c

Authored by Yehuda Sadeh on January 08, 2010

Both front and middle parts of the message are now
allocated in ceph_alloc_msg().
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

15 January, 2010 2 commits

ceph: display pgid in debugfs osd request dump · 7740a42f

Authored by Sage Weil on January 08, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

ceph: remove unused erank field · 103e2d3a

Authored by Sage Weil on January 07, 2010

The ceph_entity_addr erank field is obsolete; remove it.  Get rid of
trivial addr comparison helpers while we're at it.
Signed-off-by: Sage Weil <sage@newdream.net>

24 December, 2009 3 commits

ceph: include transaction id in ceph_msg_header (protocol change) · 6df058c0

Authored by Sage Weil on December 22, 2009

Many (most?) message types include a transaction id.  By including it in
the fixed size header, we always have it available even when we are unable
to allocate memory for the (larger, variable sized) message body.  This
will allow us to error out the appropriate request instead of (silently)
dropping the reply.
Signed-off-by: Sage Weil <sage@newdream.net>
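
To illustrate why having the tid in the fixed-size header helps, here is a user-space sketch. The two-field `struct hdr` is a simplified stand-in for the real ceph_msg_header, and `lookup()` and `fail_on_enomem()` are hypothetical helpers, not functions from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical header: always read, needs no allocation. */
struct hdr {
    uint64_t tid;
    uint32_t front_len;
};

struct request {
    uint64_t tid;
    int result;
    int done;
};

static struct request *lookup(struct request *reqs, size_t n, uint64_t tid)
{
    for (size_t i = 0; i < n; i++)
        if (reqs[i].tid == tid)
            return &reqs[i];
    return NULL;
}

/* Called when the message body could not be allocated: fail the request
 * named by the header instead of silently dropping the reply. */
static int fail_on_enomem(struct request *reqs, size_t n, const struct hdr *h)
{
    struct request *r;

    assert(h != NULL);
    r = lookup(reqs, n, h->tid);
    if (!r)
        return -ENOENT;
    r->result = -ENOMEM;
    r->done = 1;
    return 0;
}
```

Without the tid in the header, the receiver in this model would have no way to tell which request the undecodable reply belonged to.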

ceph: control access to page vector for incoming data · 350b1c32

Authored by Sage Weil on December 22, 2009

When we issue an OSD read, we specify a vector of pages that the data is to
be read into. The request may be sent multiple times, to multiple OSDs, if
the osdmap changes, which means we can get more than one reply.

Only read data into the page vector if the reply is coming from the
OSD we last sent the request to. Keep track of which connection is using
the vector by taking a reference. If another connection was already
using the vector before and a new reply comes in on the right connection,
revoke the pages from the other connection.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: unregister canceled/timed out osd requests · 529cfcc4

Authored by Sage Weil on December 22, 2009

Canceled or timed out osd requests were getting left in the request list
and never deallocated (until umount).  Unregister if they are canceled
(control-c) or time out.
Signed-off-by: Sage Weil <sage@newdream.net>

22 December, 2009 3 commits

ceph: fix error paths for corrupt osdmap messages · 30dc6381

Authored by Sage Weil on December 21, 2009

Both osdmap_decode() and osdmap_apply_incremental() should never return
NULL.
Signed-off-by: Sage Weil <sage@newdream.net>
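
A common kernel convention for functions that "never return NULL" is to encode the errno in the returned pointer, as the kernel's `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()` helpers do. Whether this commit relies on exactly that convention is an assumption here. The user-space sketch below re-implements the idea with hypothetical lowercase helpers and a made-up `decode_map()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* User-space stand-ins for the kernel's error-pointer helpers. */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct map {
    int epoch;
};

/* Hypothetical decoder: every failure path returns an error pointer. */
static struct map *decode_map(const unsigned char *buf, size_t len)
{
    struct map *m;

    assert(buf != NULL);
    if (len < 1)
        return err_ptr(-EINVAL);   /* corrupt or truncated input */
    m = malloc(sizeof(*m));
    if (!m)
        return err_ptr(-ENOMEM);
    m->epoch = buf[0];
    return m;
}
```

A caller that only tests the result with `is_err()` would treat a stray NULL as a valid map and dereference it, which is why mixing the two conventions on an error path is dangerous.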

ceph: hex dump corrupt server data to KERN_DEBUG · 9ec7cab1

Authored by Sage Weil on December 14, 2009

Also, print fsid using standard format, NOT hex dump.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix msgpool reservation leak · 93c20d98

Authored by Yehuda Sadeh on December 15, 2009

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

08 December, 2009 1 commit

ceph: use kref for ceph_osd_request · 415e49a9

Authored by Sage Weil on December 07, 2009

Signed-off-by: Sage Weil <sage@newdream.net>

04 December, 2009 1 commit

ceph: whitespace cleanup · 50b885b9

Authored by Sage Weil on December 01, 2009

Signed-off-by: Sage Weil <sage@newdream.net>

02 December, 2009 1 commit

ceph: plug leak of request_mutex · 34b43a56

Authored by Sage Weil on December 01, 2009

Fix leak of osd client request_mutex on receiving dup ack.
Signed-off-by: Sage Weil <sage@newdream.net>

22 November, 2009 1 commit

fs/ceph: Move a dereference below a NULL test · 32c895e7

Authored by Julia Lawall on November 21, 2009

If the NULL test is necessary, then the dereference should be moved below
the NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/).

// <smpl>
@@
type T;
expression E;
identifier i,fld;
statement S;
@@

- T i = E->fld;
+ T i;
  ... when != E
      when != i
  if (E == NULL) S
+ i = E->fld;
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>
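
In plain C, the transformation expressed by the semantic patch looks roughly like this. The types and the function `request_osd_id()` are hypothetical and chosen only to show the shape of the change.

```c
#include <assert.h>
#include <stddef.h>

struct osd {
    int id;
};

struct request {
    struct osd *r_osd;
};

/*
 * Before the change, the declaration read
 *     const struct osd *o = req->r_osd;
 * which dereferences req ahead of the NULL test below.
 * After the change, only the declaration stays at the top and the
 * dereference happens once req is known to be non-NULL.
 */
static int request_osd_id(const struct request *req)
{
    const struct osd *o;

    if (req == NULL)
        return -1;
    o = req->r_osd;
    return o ? o->id : -1;
}
```

The early dereference is more than a style problem: a compiler may assume `req` is non-NULL after it and drop the later test entirely.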

21 November, 2009 1 commit

ceph: fix debugfs entry, simplify fsid checks · 0743304d

Authored by Sage Weil on November 18, 2009

We may first learn our fsid from any of the mon, osd, or mds maps
(whichever the monitor sends first).  Consolidate checks in a single
helper.  Initialize the client debugfs entry then, since we need the
fsid (and global_id) for the directory name.

Also remove dead mount code.
Signed-off-by: Sage Weil <sage@newdream.net>

19 November, 2009 3 commits

ceph: negotiate authentication protocol; implement AUTH_NONE protocol · 4e7a5dcd

Authored by Sage Weil on November 18, 2009

When we open a monitor session, we send an initial AUTH message listing
the auth protocols we support, our entity name, and (possibly) a previously
assigned global_id.  The monitor chooses a protocol and responds with an
initial message.

Initially implement AUTH_NONE, a dummy protocol that provides no security,
but works within the new framework.  It generates 'authorizers' that are
used when connecting to (mds, osd) services that simply state our entity
name and global_id.

This is a wire protocol change.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: handle errors during osd client init · 5f44f142

Authored by Sage Weil on November 18, 2009

Unwind initialization if we get ENOMEM during client initialization.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: remove bad calls to ceph_con_shutdown · 42ce56e5

Authored by Sage Weil on November 18, 2009

We want to ceph_con_close when we're done with the connection, before
the ref count reaches 0.  Once it does, do not call ceph_con_shutdown,
as that takes the con mutex and may sleep, and besides that is
unnecessary.
Signed-off-by: Sage Weil <sage@newdream.net>

05 November, 2009 1 commit

ceph: fix endian conversions for ceph_pg · 51042122

Authored by Sage Weil on November 04, 2009

The endian conversions don't quite work with the old union ceph_pg.  Just
make it a regular struct, and make each field __le.  This is simpler and it
has the added bonus of actually working.
Signed-off-by: Sage Weil <sage@newdream.net>
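
The idea of a plain struct whose fields are each little-endian can be sketched portably as below. The field names and widths in `struct pg_wire` are illustrative assumptions, and the byte-array fields stand in for the kernel's `__le16`/`__le32` types, which are not available in user space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wire layout: every field is stored little-endian. */
struct pg_wire {
    uint8_t preferred[2];
    uint8_t ps[2];
    uint8_t pool[4];
};

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)((v >> (8 * i)) & 0xff);
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Converting each field on its own gives the same bytes on little-endian and big-endian hosts. One plausible reading of the original problem, which is not confirmed by the message, is that byte-swapping a union as a single wide integer moves bytes across field boundaries on big-endian machines.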

28 October, 2009 1 commit

ceph: allocate and parse mount args before client instance · 6b805185

Authored by Sage Weil on October 27, 2009

This simplifies much of the error handling during mount.  It also means
that we have the mount args before client creation, and we can initialize
based on those options.
Signed-off-by: Sage Weil <sage@newdream.net>

16 October, 2009 1 commit

ceph: warn on allocation from msgpool with larger front_len · 8f3bc053

Authored by Sage Weil on October 14, 2009

Pass the front_len we need when pulling a message off a msgpool,
and WARN if it is greater than the pool's size.  Then try to
allocate a new message (to continue without failing).
Signed-off-by: Sage Weil <sage@newdream.net>
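
The behaviour can be modelled in user space as follows. All names here (`struct msgpool`, `msgpool_get()`, `new_msg()`) are hypothetical, the warning is a plain `fprintf` rather than the kernel's `WARN`, and the handling of an empty pool is left out because the message does not describe it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct msg {
    size_t front_max;   /* size of the front buffer */
    int from_pool;      /* 1 if preallocated by the pool */
};

struct msgpool {
    size_t front_len;   /* front size of every pooled message */
    struct msg *slots[4];
    int nfree;
};

static struct msg *new_msg(size_t front_len, int from_pool)
{
    struct msg *m = malloc(sizeof(*m));

    if (m) {
        m->front_max = front_len;
        m->from_pool = from_pool;
    }
    return m;
}

/* Take a message able to hold front_len bytes of front data. */
static struct msg *msgpool_get(struct msgpool *pool, size_t front_len)
{
    assert(pool != NULL);
    if (front_len > pool->front_len) {
        /* Request does not fit a pooled message: warn, then fall back. */
        fprintf(stderr, "msgpool_get: front_len %zu > pool size %zu\n",
                front_len, pool->front_len);
        return new_msg(front_len, 0);
    }
    if (pool->nfree > 0)
        return pool->slots[--pool->nfree];
    return NULL;   /* empty pool: out of scope for this sketch */
}
```

The fallback allocation keeps the caller working, at the cost of losing the guarantee that a pooled message provides under memory pressure.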

15 October, 2009 1 commit

ceph: convert encode/decode macros to inlines · c89136ea

Authored by Sage Weil on October 14, 2009

This avoids the fugly pass by reference and makes the code a bit easier
to read.
Signed-off-by: Sage Weil <sage@newdream.net>
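
The readability difference can be shown with a small user-space comparison. `DECODE_16_MACRO` and `decode_16()` are hypothetical stand-ins written for this note, not the kernel's actual helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Macro style: v is assigned although the call site looks like
 * it passes v by value. */
#define DECODE_16_MACRO(p, v)                               \
    do {                                                    \
        (v) = (uint16_t)((*(p))[0] | ((*(p))[1] << 8));     \
        *(p) += 2;                                          \
    } while (0)

/* Inline style: the decoded value is returned, and only the cursor
 * is passed by pointer. */
static inline uint16_t decode_16(const uint8_t **p)
{
    uint16_t v = (uint16_t)((*p)[0] | ((*p)[1] << 8));

    *p += 2;
    return v;
}
```

At the call site, `x = decode_16(&p);` makes the assignment explicit and gets normal type checking, whereas `DECODE_16_MACRO(&p, x);` hides the write to `x`.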

10 October, 2009 3 commits

ceph: cancel osd requests before resending them · 266673db

Authored by Sage Weil on October 09, 2009

This ensures we don't submit the same request twice if we are kicking a
specific osd (as with an osd_reset), or when we hit a transient error and
resend.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: reset osd session on fault, not peer_reset · 81b024e7

Authored by Sage Weil on October 09, 2009

The peer_reset just takes longer (until we reconnect and discover the osd
dropped the session... which it will).
Signed-off-by: NSage Weil <sage@newdream.net>

81b024e7

ceph: revoke osd request message on request completion · 0ba6478d

由 Sage Weil 提交于 10月 08, 2009

If an osd has failed or returned and a request has been sent twice, it's
possible to get a reply and unregister the request while the request
message is queued for delivery. Since the message references the caller's
page vector, we need to revoke it before completing.
Signed-off-by: Sage Weil <sage@newdream.net>

openanolis / cloud-kernel synced successfully about 1 year ago
