Commits · 68b4476b0bc13fef18266b4140309a30e86739d2 · openanolis / cloud-kernel

21 Oct, 2010 2 commits

ceph: messenger and osdc changes for rbd · 68b4476b

Authored by Yehuda Sadeh on Apr 06, 2010

Allow the messenger to send and receive data in a bio. This avoids
copying the data into pages or some other buffer when doing I/O for
an rbd block device.

OSD ops can now carry trailing variable-sized data, and osd op
encoding is more modular.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

68b4476b

ceph: refactor osdc requests creation functions · 3499e8a5

Authored by Yehuda Sadeh on Apr 06, 2010

The osd request creation functions are now decoupled from the
vino parameter, allowing clients that use the osd to pass
arbitrary object names that are not necessarily
vino based. Also, calc_raw_layout now takes a snap id.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

3499e8a5

07 Oct, 2010 1 commit

ceph: avoid null deref in osd request error path · 6bc18876

Authored by Sage Weil on Sep 27, 2010

If we interrupt an osd request, we call __cancel_request, but it wasn't
verifying that req->r_osd was non-NULL before dereferencing it. This could
cause a crash if osds were flapping and we aborted a request on said osd.
Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

6bc18876

23 Aug, 2010 1 commit

ceph: fix osd request lru adjustment when sending request · 07a27e22

Authored by Henry C Chang on Aug 22, 2010

Fix argument order.  We want to move the item to the end of the list, not
change the position of the head.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>

07a27e22
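The argument-order mistake described above is easy to reproduce with any list helper that takes both an entry and a list head. The commit message does not name the helper involved, so the sketch below is a minimal user-space illustration under that assumption; `list_node`, `list_insert_tail` and `list_move_to_tail` are hypothetical names modeled loosely on the kernel's `list_head` conventions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list; an empty list points at itself. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

/* Detach n from whatever list it is currently on. */
static void list_unlink(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Insert n just before head, i.e. at the tail of the list. */
static void list_insert_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/*
 * Move entry n to the tail of the list anchored at head.
 * The entry comes first and the head second. Swapping the two would
 * reposition the head among the entries instead of moving the entry.
 */
static void list_move_to_tail(struct list_node *n, struct list_node *head)
{
    assert(n != head);
    list_unlink(n);
    list_insert_tail(n, head);
}
```

With three entries a, b, c queued in that order, `list_move_to_tail(&a, &head)` leaves the order b, c, a, which is the behavior an LRU wants when a request is (re)sent.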

04 Aug, 2010 1 commit

ceph: whitespace cleanup · 213c99ee

Authored by Sage Weil on Aug 03, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

213c99ee

02 Aug, 2010 1 commit

ceph: only set num_pages in calc_layout · 796d6955

Authored by Sage Weil on Jun 10, 2010

Setting it elsewhere is unnecessary and more fragile.
Signed-off-by: Sage Weil <sage@newdream.net>

796d6955

28 Jul, 2010 1 commit

ceph: use complete_all and wake_up_all · 03066f23

Authored by Yehuda Sadeh on Jul 27, 2010

This fixes an issue triggered by running concurrent syncs. One of the syncs
would go through while the other would just hang indefinitely. In any case, we
never actually want to wake a single waiter, so the *_all functions should
be used.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

03066f23
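The hang described above follows from the difference between releasing one waiter and releasing all of them. The sketch below is a single-threaded toy model of that difference, not the kernel's completion implementation; `completion_model` and the `model_*` helpers are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Toy model: 'done' counts how many waiters may still pass. */
struct completion_model {
    unsigned int done;
};

/* Release exactly one waiter. */
static void model_complete(struct completion_model *c)
{
    assert(c != NULL);
    if (c->done != UINT_MAX)
        c->done++;
}

/* Release every current and future waiter. */
static void model_complete_all(struct completion_model *c)
{
    assert(c != NULL);
    c->done = UINT_MAX;
}

/*
 * Returns 1 if a waiter would be allowed through, 0 if it would block.
 * A one-shot release is consumed; a release-all is never consumed.
 */
static int model_try_wait(struct completion_model *c)
{
    assert(c != NULL);
    if (c->done == 0)
        return 0;
    if (c->done != UINT_MAX)
        c->done--;
    return 1;
}
```

In this model, two callers waiting on the same event after a single `model_complete` see one success and one block, which mirrors the "one sync goes through, the other hangs" symptom. After `model_complete_all`, both pass.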

14 Jun, 2010 1 commit

ceph: fix map handler error path · 4a32f93d

Authored by Sage Weil on Jun 13, 2010

Don't leak the message if we receive an unexpected message type.
Signed-off-by: Sage Weil <sage@newdream.net>

4a32f93d

30 May, 2010 1 commit

ceph: fix leak of osd authorizer · 79494d1b

Authored by Sage Weil on May 27, 2010

Release the ceph_authorizer when releasing osd state.
Signed-off-by: Sage Weil <sage@newdream.net>

79494d1b

22 May, 2010 1 commit

ceph: Storage class should be before const qualifier · 9e32789f

Authored by Tobias Klauser on May 20, 2010

The C99 specification states in section 6.11.5:

The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Sage Weil <sage@newdream.net>

9e32789f
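As a small illustration of the rule quoted above, the declaration below puts the storage-class specifier first. The table and the `op_name` helper are hypothetical and are not taken from the ceph sources.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Obsolescent ordering (qualifier before storage class):
 *     const static char *const op_names[] = { ... };
 * Preferred ordering (storage class first):
 */
static const char *const op_names[] = { "read", "write", "stat" };

/* Look up a name by index; the index must be in range. */
static const char *op_name(size_t i)
{
    assert(i < sizeof(op_names) / sizeof(op_names[0]));
    return op_names[i];
}
```

Both orderings compile to the same thing; the change is purely about following the recommended declaration style.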

18 May, 2010 8 commits

ceph: all allocation functions should get gfp_mask · 34d23762

Authored by Yehuda Sadeh on Apr 06, 2010

This is essential because the rados block device will need
to run in different contexts that require allocation flags
other than GFP_NOFS.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

34d23762

ceph: name msgpools; useful error messages · 4f48280e

Authored by Sage Weil on Apr 24, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

4f48280e

ceph: osdtimeout=0 for no timeout · f26e681d

Authored by Sage Weil on Apr 21, 2010

Allow the osd reset timeout to be disabled.
Signed-off-by: Sage Weil <sage@newdream.net>

f26e681d

ceph: wake up mount thread when getting osdmap · c473ad92

Authored by Yehuda Sadeh on Apr 13, 2010

Now that the mount thread waits for the osdmap, it needs
to be awakened.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

c473ad92

ceph: simplify ceph_msg_new · bb257664

Authored by Sage Weil on Apr 01, 2010

We only need to pass in front_len.  Callers can attach any other payload
pieces (middle, data) as they see fit.
Signed-off-by: Sage Weil <sage@newdream.net>

bb257664

ceph: make ceph_msg_new return NULL on failure; clean up, fix callers · a79832f2

Authored by Sage Weil on Apr 01, 2010

Returning ERR_PTR(-ENOMEM) is useless extra work. Return NULL on failure
instead, and fix up the callers (about half of which were wrong anyway).
Signed-off-by: Sage Weil <sage@newdream.net>

a79832f2
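Why error-pointer returns invite wrong callers can be shown with a user-space sketch. The helpers below are stand-ins written for this example (`err_ptr`, `is_err`, `msg_model` and both constructors are hypothetical, and the `fail` parameter only exists to force the failure path); they imitate the idea of encoding an errno in a pointer rather than reproducing the kernel macros.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long err)
{
    assert(err < 0 && err >= -MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* True if p carries an encoded errno rather than a real address. */
static inline int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct msg_model {
    size_t front_len;
};

/* Old convention: failure is reported as an encoded error pointer. */
static struct msg_model *msg_new_errptr(size_t front_len, int fail)
{
    struct msg_model *m = fail ? NULL : malloc(sizeof(*m));

    if (!m)
        return err_ptr(-ENOMEM);
    m->front_len = front_len;
    return m;
}

/* New convention: failure is simply NULL. */
static struct msg_model *msg_new_null(size_t front_len, int fail)
{
    struct msg_model *m = fail ? NULL : malloc(sizeof(*m));

    if (!m)
        return NULL;
    m->front_len = front_len;
    return m;
}
```

A caller that writes `if (!msg)` after `msg_new_errptr` never notices the failure, because the encoded error is a non-NULL pointer. When the only possible failure is running out of memory, returning NULL makes the obvious check the correct one.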

ceph: fix theoretically possible double-put on connection · 6f46cb29

Authored by Sage Weil on Mar 24, 2010

This would only trigger if we bailed out before resetting r_con_filling_msg
because the server reply was corrupt (oversized).
Signed-off-by: Sage Weil <sage@newdream.net>

6f46cb29

ceph: simplify page setup for incoming data · 21b667f6

Authored by Sage Weil on Mar 04, 2010

Drop the largely useless helper __prepare_pages(), and simplify sanity checks.
Signed-off-by: Sage Weil <sage@newdream.net>
21b667f6

12 May, 2010 2 commits

ceph: resubmit requests on pg mapping change (not just primary change) · d85b7056

Authored by Sage Weil on May 10, 2010

OSD requests need to be resubmitted on any pg mapping change, not just when
the pg primary changes. Resending only when the primary changes results in
occasional 'hung' requests during osd cluster recovery or rebalancing.
Signed-off-by: Sage Weil <sage@newdream.net>

d85b7056
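The distinction between "the primary changed" and "the mapping changed" can be sketched as two comparison functions. This is an illustration only: `pg_mapping` and both helpers are hypothetical, and the sketch assumes the first OSD in the list is the primary.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PG_OSDS 8

/* Ordered set of OSDs a placement group maps to. */
struct pg_mapping {
    size_t num;
    int osds[MAX_PG_OSDS]; /* assumption: osds[0] is the primary */
};

static int pg_primary(const struct pg_mapping *m)
{
    return m->num ? m->osds[0] : -1;
}

/* Narrow check: only notices a different primary. */
static int primary_changed(const struct pg_mapping *a,
                           const struct pg_mapping *b)
{
    return pg_primary(a) != pg_primary(b);
}

/* Broad check: notices any difference in membership or order. */
static int mapping_changed(const struct pg_mapping *a,
                           const struct pg_mapping *b)
{
    size_t i;

    assert(a->num <= MAX_PG_OSDS && b->num <= MAX_PG_OSDS);
    if (a->num != b->num)
        return 1;
    for (i = 0; i < a->num; i++)
        if (a->osds[i] != b->osds[i])
            return 1;
    return 0;
}
```

A mapping that goes from {1, 2, 3} to {1, 4, 3} keeps the same primary, so a resend decision based on `primary_changed` would skip it, while `mapping_changed` flags it for resubmission.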

ceph: unregister osd request on failure · 0ceed5db

Authored by Sage Weil on May 11, 2010

The osd request wasn't being unregistered when the osd returned a failure
code, even though the result was returned to the caller. This would cause
it to eventually time out, and then crash the kernel when it tried to
resend the request using a stale page vector.
Signed-off-by: Sage Weil <sage@newdream.net>

0ceed5db

23 Mar, 2010 3 commits

ceph: avoid reopening osd connections when address hasn't changed · 87b315a5

Authored by Sage Weil on Mar 22, 2010

We get a fault callback on _every_ tcp connection fault.  Normally, we
want to reopen the connection when that happens.  If the address we have
is bad, however, and connection attempts always result in a connection
refused or similar error, explicitly closing and reopening the msgr
connection just prevents the messenger's backoff logic from kicking in.
The result can be a console full of

[ 3974.417106] ceph: osd11 10.3.14.138:6800 connection failed
[ 3974.423295] ceph: osd11 10.3.14.138:6800 connection failed
[ 3974.429709] ceph: osd11 10.3.14.138:6800 connection failed

Instead, if we get a fault, and have outstanding requests, but the osd
address hasn't changed and the connection never successfully connected in
the first place, do nothing to the osd connection.  The messenger layer
will back off and retry periodically, because we never connected and thus
the lossy bit is not set.

In that case, touch each request's r_stamp so that handle_timeout can tell
the request is still alive and kicking.
Signed-off-by: Sage Weil <sage@newdream.net>

87b315a5

ceph: rename r_sent_stamp r_stamp · 3dd72fc0

Authored by Sage Weil on Mar 22, 2010

Make variable name slightly more generic, since it will (soon)
reflect either the time the request was sent OR the time it was
last determined to be still retrying.
Signed-off-by: Sage Weil <sage@newdream.net>

3dd72fc0

ceph: fix null pointer deref of r_osd in debug output · 12eadc19

Authored by Sage Weil on Mar 15, 2010

This causes an oops when debug output is enabled and we kick
an osd request with no current r_osd (sometime after an osd
failure).  Check the pointer before dereferencing.
Signed-off-by: Sage Weil <sage@newdream.net>

12eadc19

05 Mar, 2010 1 commit

ceph: reset osd after relevant messages timed out · 422d2cb8

Authored by Yehuda Sadeh on Feb 26, 2010

This simplifies the process of timing out messages. We
keep an lru of the messages that are currently in flight. If a
timeout has passed, we reset the osd connection, so that
messages will be retransmitted.  This is a failsafe in case
we hit some sort of problem sending out messages to the OSD.
Normally, we'll get notification via an updated osdmap if
there are problems.

If a request is older than the keepalive timeout, send a
keepalive to ensure we detect any breaks in the TCP connection.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

422d2cb8

02 Mar, 2010 2 commits

ceph: set osd request message front length correctly · 6f863e71

Authored by Sage Weil on Mar 01, 2010

We didn't set the front length correctly.  When messages used
the message pool we ended up with the conservative max (4 KB), and
the rest of the time the slightly less conservative estimate.  Even
though the OSD ignores the extra data, set it to the right value to avoid
sending extra data over the network.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

6f863e71

ceph: use single osd op reply msg · c16e7869

Authored by Sage Weil on Mar 01, 2010

Use a single ceph_msg for the osd reply, even when we are getting multiple
replies.
Signed-off-by: Sage Weil <sage@newdream.net>

c16e7869

27 Feb, 2010 1 commit

ceph: remove fragile __map_osds optimization · c99eb1c7

Authored by Sage Weil on Feb 26, 2010

We used to try to avoid freeing and then reallocating the osd
struct.  This is a bit fragile due to potential interactions with
other references (beyond o_requests), and may be the cause of
this crash:

[120633.442358] BUG: unable to handle kernel NULL pointer dereference at (null)
[120633.443292] IP: [<ffffffff812549b6>] rb_erase+0x11d/0x277
[120633.443292] PGD f7ff3067 PUD f7f53067 PMD 0
[120633.443292] Oops: 0000 [#1] PREEMPT SMP
[120633.443292] last sysfs file: /sys/kernel/uevent_seqnum
[120633.443292] CPU 1
[120633.443292] Modules linked in: ceph fan ac battery psmouse ehci_hcd ide_pci_generic ohci_hcd thermal processor button
[120633.443292] Pid: 3023, comm: ceph-msgr/1 Not tainted 2.6.32-rc2 #12 H8SSL
[120633.443292] RIP: 0010:[<ffffffff812549b6>]  [<ffffffff812549b6>] rb_erase+0x11d/0x277
[120633.443292] RSP: 0018:ffff8800f7b13a50  EFLAGS: 00010246
[120633.443292] RAX: ffff880022907819 RBX: ffff880022907818 RCX: 0000000000000000
[120633.443292] RDX: ffff8800f7b13a80 RSI: ffff8800f587eb48 RDI: 0000000000000000
[120633.443292] RBP: ffff8800f7b13a60 R08: 0000000000000000 R09: 0000000000000004
[120633.443292] R10: 0000000000000000 R11: ffff8800c4441000 R12: ffff8800f587eb48
[120633.443292] R13: ffff8800f58eaa00 R14: ffff8800f413c000 R15: 0000000000000001
[120633.443292] FS:  00007fbef6e226e0(0000) GS:ffff880009200000(0000) knlGS:0000000000000000
[120633.443292] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[120633.443292] CR2: 0000000000000000 CR3: 00000000f7c53000 CR4: 00000000000006e0
[120633.443292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[120633.443292] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[120633.443292] Process ceph-msgr/1 (pid: 3023, threadinfo ffff8800f7b12000, task ffff8800f5858b40)
[120633.443292] Stack:
[120633.443292]  ffff8800f413c000 ffff8800f587e9c0 ffff8800f7b13a80 ffffffffa0098a86
[120633.443292] <0> 00000000000006f1 0000000000000000 ffff8800f7b13af0 ffffffffa009959b
[120633.443292] <0> ffff8800f413c000 ffff880022a68400 ffff880022a68400 ffff8800f587e9c0
[120633.443292] Call Trace:
[120633.443292]  [<ffffffffa0098a86>] __remove_osd+0x4d/0xbc [ceph]
[120633.443292]  [<ffffffffa009959b>] __map_osds+0x199/0x4fa [ceph]
[120633.443292]  [<ffffffffa00999f4>] ? __send_request+0xf8/0x186 [ceph]
[120633.443292]  [<ffffffffa0099beb>] kick_requests+0x169/0x3cb [ceph]
[120633.443292]  [<ffffffffa009a8c1>] ceph_osdc_handle_map+0x370/0x522 [ceph]

Since we're probably screwed anyway if a small kmalloc is
failing, don't bother with trying to be clever here.
Signed-off-by: Sage Weil <sage@newdream.net>

c99eb1c7

24 Feb, 2010 1 commit

ceph: fix up unexpected message handling · 5b3a4db3

Authored by Sage Weil on Feb 19, 2010

Fix skipping of unexpected message types from osd, mon.

Clean up pr_info and debug output.
Signed-off-by: Sage Weil <sage@newdream.net>

5b3a4db3

16 Feb, 2010 1 commit

ceph: reset osd connections after fault · 153a008b

Authored by Sage Weil on Feb 15, 2010

A single osd connection fault (e.g. tcp disconnect) wasn't
reopening the connection, which caused all current and future
requests for that osd to hang.
Signed-off-by: Sage Weil <sage@newdream.net>

153a008b

12 Feb, 2010 1 commit

ceph: put unused osd connections on lru · f5a2041b

Authored by Yehuda Sadeh on Feb 03, 2010

Instead of removing an osd connection immediately when its
request list is empty, put the osd connection on an lru.
The connection is removed only if that osd has not been used
for more than a specified time.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

f5a2041b
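The idle-timeout policy described above can be sketched in a few lines. For brevity this illustration scans an array instead of maintaining a real LRU list, and every name in it (`osd_slot`, `osd_became_idle`, `osd_in_use`, `reap_idle_osds`) is hypothetical rather than taken from the ceph sources. Times are plain integers in arbitrary units.

```c
#include <assert.h>
#include <stddef.h>

struct osd_slot {
    int id;
    int idle;          /* 1 when no requests are outstanding */
    long idle_expires; /* after this time an idle osd may be dropped */
    int removed;       /* 1 once the connection has been torn down */
};

/* Called when the osd's request list becomes empty. */
static void osd_became_idle(struct osd_slot *o, long now, long idle_ttl)
{
    assert(idle_ttl >= 0);
    o->idle = 1;
    o->idle_expires = now + idle_ttl;
}

/* Called when a new request is queued for the osd. */
static void osd_in_use(struct osd_slot *o)
{
    o->idle = 0;
}

/* Drop osds that have been idle for longer than their ttl. */
static size_t reap_idle_osds(struct osd_slot *osds, size_t n, long now)
{
    size_t i, reaped = 0;

    for (i = 0; i < n; i++) {
        if (osds[i].removed || !osds[i].idle)
            continue;
        if (now > osds[i].idle_expires) {
            osds[i].removed = 1;
            reaped++;
        }
    }
    return reaped;
}
```

An osd that picks up a new request before its ttl runs out is never reaped, so a briefly idle connection avoids a teardown and reconnect.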

11 Feb, 2010 1 commit

ceph: allow renewal of auth credentials · 9bd2e6f8

Authored by Sage Weil on Feb 02, 2010

Add infrastructure to allow the mon_client to periodically renew its auth
credentials.  Also add a messenger callback that will force such a renewal
if a peer rejects our authenticator.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

9bd2e6f8

03 Feb, 2010 1 commit

ceph: always send truncation info with read and write osd ops · 0c948992

Authored by Yehuda Sadeh on Feb 01, 2010

This fixes a bug where the read/write ops arrive at the osd after
a subsequent truncation request.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

0c948992

26 Jan, 2010 3 commits

ceph: keep reserved replies on the request structure · 0d59ab81

Authored by Yehuda Sadeh on Jan 13, 2010

This includes handling all data preallocation and revocation
in one place, without needing a special case for
the reserved pages.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

0d59ab81

ceph: alloc message data pages and check if tid exists · 0547a9b3

Authored by Yehuda Sadeh on Jan 11, 2010

This is now done in the same callback that is also responsible for
allocating the 'front' part of the message. If we get a message
for which we have no corresponding tid, mark it for skipping.

Move the mutex unlock/lock from the osd alloc_msg callback
to the calling function in the messenger.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

0547a9b3

ceph: allocate middle of message before starting to read · 2450418c

Authored by Yehuda Sadeh on Jan 08, 2010

Both the front and middle parts of the message are now
allocated in ceph_alloc_msg().
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

2450418c

15 Jan, 2010 2 commits

ceph: display pgid in debugfs osd request dump · 7740a42f

Authored by Sage Weil on Jan 08, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

7740a42f

ceph: remove unused erank field · 103e2d3a

Authored by Sage Weil on Jan 07, 2010

The ceph_entity_addr erank field is obsolete; remove it.  Get rid of
trivial addr comparison helpers while we're at it.
Signed-off-by: Sage Weil <sage@newdream.net>

103e2d3a

24 Dec, 2009 3 commits

ceph: include transaction id in ceph_msg_header (protocol change) · 6df058c0

Authored by Sage Weil on Dec 22, 2009

Many (most?) message types include a transaction id.  By including it in
the fixed size header, we always have it available even when we are unable
to allocate memory for the (larger, variable sized) message body.  This
will allow us to error out the appropriate request instead of (silently)
dropping the reply.
Signed-off-by: Sage Weil <sage@newdream.net>

6df058c0
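The benefit of carrying the transaction id in the fixed-size header can be sketched as follows. The structures and the `body_alloc_ok` flag are hypothetical stand-ins, not the real `ceph_msg_header` layout; the point is only that the header alone is enough to find the matching request and fail it explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-size part of a message; always readable without allocation. */
struct msg_header_model {
    uint64_t tid;       /* transaction id of the request this answers */
    uint32_t front_len; /* size of the variable-sized body to allocate */
};

struct pending_req {
    uint64_t tid;
    int done;
    int result;
};

static struct pending_req *find_req(struct pending_req *reqs, size_t n,
                                    uint64_t tid)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (reqs[i].tid == tid)
            return &reqs[i];
    return NULL;
}

/*
 * Handle an incoming reply given only its header.
 * body_alloc_ok simulates whether memory for the body was available.
 * Returns 0 if a request was completed, -ENOENT if none matches the tid.
 */
static int handle_reply_header(const struct msg_header_model *hdr,
                               int body_alloc_ok,
                               struct pending_req *reqs, size_t n)
{
    struct pending_req *req;

    assert(hdr != NULL);
    req = find_req(reqs, n, hdr->tid);
    if (!req)
        return -ENOENT;
    req->done = 1;
    req->result = body_alloc_ok ? 0 : -ENOMEM;
    return 0;
}
```

Without the tid in the header, the allocation-failure branch would have no way to tell which request the reply belonged to, and the reply would have to be dropped silently.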

ceph: control access to page vector for incoming data · 350b1c32

Authored by Sage Weil on Dec 22, 2009

When we issue an OSD read, we specify a vector of pages that the data is to
be read into. The request may be sent multiple times, to multiple OSDs, if
the osdmap changes, which means we can get more than one reply.

Only read data into the page vector if the reply is coming from the
OSD we last sent the request to. Keep track of which connection is using
the vector by taking a reference. If another connection was already
using the vector before and a new reply comes in on the right connection,
revoke the pages from the other connection.
Signed-off-by: Sage Weil <sage@newdream.net>

350b1c32
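The ownership rule for the page vector can be sketched as a small state machine. This is an illustration under simplifying assumptions: reference counting is omitted, and `conn_model`, `read_req` and `claim_pages` are hypothetical names rather than the functions the commit touches.

```c
#include <assert.h>
#include <stddef.h>

struct conn_model {
    int id;
    int filling; /* 1 while this connection reads into the page vector */
};

struct read_req {
    struct conn_model *sent_to; /* connection we last sent the request on */
    struct conn_model *filling; /* connection currently using the pages */
};

/*
 * Decide whether a reply arriving on con may read into the request's pages.
 * Returns 1 if con now owns the page vector, 0 if the reply must be skipped.
 */
static int claim_pages(struct read_req *req, struct conn_model *con)
{
    assert(req != NULL && con != NULL);

    /* Replies from an OSD we no longer target are stale. */
    if (con != req->sent_to)
        return 0;

    /* Revoke the pages from any other connection still using them. */
    if (req->filling && req->filling != con)
        req->filling->filling = 0;

    req->filling = con;
    con->filling = 1;
    return 1;
}
```

If the request is first sent on connection A and later resent on connection B, a late reply on A is skipped, and the first reply on B takes the pages away from A before reading.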

ceph: unregister canceled/timed out osd requests · 529cfcc4

Authored by Sage Weil on Dec 22, 2009

Canceled or timed out osd requests were getting left in the request list
and never deallocated (until umount).  Unregister if they are canceled
(control-c) or time out.
Signed-off-by: Sage Weil <sage@newdream.net>

529cfcc4

openanolis / cloud-kernel · synced successfully more than 1 year ago