Commits · 17c1cc1d9293a568a00545469078e29555cc7f39 · openeuler / Kernel

02 May, 2013 2 commits

libceph: define ceph_decode_pgid() only once · ef4859d6

Authored by Alex Elder on Apr 01, 2013

There are two basically identical definitions of __decode_pgid()
in libceph, one in "net/ceph/osdmap.c" and the other in
"net/ceph/osd_client.c".  Get rid of both, and instead define
a single inline version in "include/linux/ceph/osdmap.h".
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

ef4859d6

libceph: rename ceph_calc_object_layout() · 41766f87

Authored by Alex Elder on Mar 01, 2013

The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object id.

Currently that function takes a file layout parameter, but the only
thing used out of that is its pool number.

Change the function so it takes a pool number rather than the full
file layout structure.  Only update the ceph_pg if the pool is found
in the osd map.  Get rid of a few useless lines of code from the
function while there.

Since the function now very clearly just fills in the ceph_pg
structure it's provided, rename it ceph_calc_ceph_pg().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

41766f87

12 Mar, 2013 1 commit

libceph: fix decoding of pgids · d6c0dd6b

Authored by Sage Weil on Mar 06, 2013

In 4f6a7e5e we effectively dropped support
for the legacy encoding for the OSDMap and incremental.  However, we didn't
fix the decoding for the pgid.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

d6c0dd6b

27 Feb, 2013 5 commits

libceph: add support for HASHPSPOOL pool flag · 83ca14fd

Authored by Sage Weil on Feb 26, 2013

The legacy behavior adds the pgid seed and pool together as the input for
CRUSH.  That is problematic because each pool's PGs end up mapping to the
same OSDs: 1.5 == 2.4 == 3.3 == ...

Instead, if the HASHPSPOOL flag is set, we hash the ps and pool together and
feed that into CRUSH.  This ensures that two adjacent pools will map to
an independent pseudorandom set of OSDs.

Advertise our support for this via a protocol feature flag.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

83ca14fd
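
The collision described in this message can be reproduced with a few lines of C. The sketch below is a user-space illustration, not the kernel code: `demo_legacy_input()` mirrors the legacy behavior as described (seed plus pool), while `demo_hashed_input()` uses a simple multiplicative mix as a stand-in for whatever hash libceph actually applies. Both function names are hypothetical.

```c
#include <stdint.h>

/* Legacy behavior as described above: seed and pool are simply added. */
static uint32_t demo_legacy_input(uint32_t ps, uint32_t pool)
{
	return ps + pool;
}

/*
 * Stand-in for the HASHPSPOOL behavior: mix ps and pool so that adjacent
 * pools do not line up.  The multiplier is an arbitrary odd constant chosen
 * for this demo; it is NOT the hash used by libceph.
 */
static uint32_t demo_hashed_input(uint32_t ps, uint32_t pool)
{
	return (ps * 2654435761u) ^ pool;
}
```

With pgids written as pool.ps, `demo_legacy_input(5, 1)`, `demo_legacy_input(4, 2)` and `demo_legacy_input(3, 3)` all produce the same CRUSH input, which is the 1.5 == 2.4 == 3.3 collision; the mixed inputs for the same three pgids differ.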

libceph: calculate placement based on the internal data types · 2169aea6

Authored by Sage Weil on Feb 25, 2013

Instead of using the old ceph_object_layout struct, update our internal
ceph_calc_object_layout method to use the ceph_pg type.  This allows us to
pass the full 32-bit precision of the pgid.seed to the callers.  It also
allows some callers to avoid reaching into the request structures for the
struct ceph_object_layout fields.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

2169aea6

ceph: update support for PGID64, PGPOOL3, OSDENC protocol features · 4f6a7e5e

Authored by Sage Weil on Feb 23, 2013

Support (and require) the PGID64, PGPOOL3, and OSDENC protocol features.
These have been present in ceph.git since v0.42, Feb 2012.  Require these
features to simplify support; nobody is running older userspace.

Note that the new request and reply encoding is still not in place, so the new
code is not yet functional.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

4f6a7e5e

libceph: decode into cpu-native ceph_pg type · 5b191d99

Authored by Sage Weil on Feb 23, 2013

Always decode data into our cpu-native ceph_pg type that has the correct
field widths.  Limit any remaining uses of ceph_pg_v1 to dealing with the
legacy protocol.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

5b191d99

libceph: rename ceph_pg -> ceph_pg_v1 · 12979354

Authored by Sage Weil on Jan 08, 2013

Rename the old version of this type to distinguish it from the new version.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

12979354

26 Jan, 2013 1 commit

libceph: fix undefined behavior when using snprintf() · 1ec3911d

Authored by Cong Ding on Jan 25, 2013

The variable "str" is used as both the source and destination in
function snprintf(), which is undefined behavior based on C11. The
original description in C11 is:
	"If copying takes place between objects that
	overlap, the behavior is undefined."

Also, the purpose of ceph_osdmap_state_str() is to return the osdmap
state, so it should return "doesn't exist" when none of the conditions
is satisfied. This patch fixes that as well.

[elder@inktank.com: shortened the commit message]
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>

1ec3911d
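
The fix pattern can be sketched in user space: each branch formats a string literal into the caller's buffer, so the buffer never appears as its own source argument, and the final `else` covers the case where no condition holds. The flag values and the `demo_` names are assumptions made for this example, not copied from the kernel headers.

```c
#include <stdio.h>

#define DEMO_OSD_EXISTS (1 << 0)	/* assumed bit value, for illustration */
#define DEMO_OSD_UP     (1 << 1)	/* assumed bit value, for illustration */

/* Write a human-readable OSD state into str without ever reading from str. */
static char *demo_osdmap_state_str(char *str, size_t len, int state)
{
	if (!len)
		return str;

	if ((state & DEMO_OSD_EXISTS) && (state & DEMO_OSD_UP))
		snprintf(str, len, "exists, up");
	else if (state & DEMO_OSD_EXISTS)
		snprintf(str, len, "exists");
	else if (state & DEMO_OSD_UP)
		snprintf(str, len, "up");
	else
		snprintf(str, len, "doesn't exist");

	return str;
}
```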

18 Jan, 2013 2 commits

libceph: pass length to ceph_calc_file_object_mapping() · e8afad65

Authored by Alex Elder on Nov 14, 2012

ceph_calc_file_object_mapping() takes (among other things) a "file"
offset and length, and based on the layout, determines the object
number ("bno") backing the affected portion of the file's data and
the offset into that object where the desired range begins.  It also
computes the size that should be used for the request--either the
amount requested or something less if that would exceed the end of
the object.

This patch changes the input length parameter in this function so it
is used only for input.  That is, the argument will be passed by
value rather than by address, so the value provided won't get
updated by the function.

The value would only get updated if the length would surpass the
current object, and in that case the value it got updated to would
be exactly that returned in *oxlen.

Only one of the two callers is affected by this change.  Update
ceph_calc_raw_layout() so it records any updated value.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

e8afad65
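
A simplified sketch of the by-value length convention follows. It assumes a layout with no striping (each object simply holds `object_size` consecutive bytes of the file), which is narrower than what ceph_calc_file_object_mapping() handles. The function name and parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Map a file extent (off, len) to an object number, an offset within that
 * object, and a clamped length.  len is passed by value, so the caller's
 * copy is never modified; the possibly shorter length comes back in *oxlen.
 * Returns -1 for an invalid (zero) object size.
 */
static int demo_file_to_object(uint64_t object_size, uint64_t off, uint64_t len,
			       uint64_t *bno, uint64_t *oxoff, uint64_t *oxlen)
{
	uint64_t room;

	if (object_size == 0)
		return -1;

	*bno = off / object_size;	/* which object backs this offset */
	*oxoff = off % object_size;	/* where the range starts inside it */
	room = object_size - *oxoff;	/* bytes left before the object ends */
	*oxlen = len < room ? len : room;
	return 0;
}
```

A caller that relied on the old in-place update records the result itself, for example with `len = oxlen;` after the call.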

libceph: for chooseleaf rules, retry CRUSH map descent from root if leaf is failed · 1604f488

Authored by Jim Schutt on Nov 30, 2012

Add libceph support for a new CRUSH tunable recently added to Ceph servers.

Consider the CRUSH rule
step chooseleaf firstn 0 type <node_type>

This rule means that <n> replicas will be chosen in a manner such that
each chosen leaf's branch will contain a unique instance of <node_type>.

When an object is re-replicated after a leaf failure, if the CRUSH map uses
a chooseleaf rule the remapped replica ends up under the <node_type> bucket
that held the failed leaf. This causes uneven data distribution across the
storage cluster, to the point that when all the leaves but one fail under a
particular <node_type> bucket, that remaining leaf holds all the data from
its failed peers.

This behavior also limits the number of peers that can participate in the
re-replication of the data held by the failed leaf, which increases the
time required to re-replicate after a failure.

For a chooseleaf CRUSH rule, the tree descent has two steps: call them the
inner and outer descents.

If the tree descent down to <node_type> is the outer descent, and the descent
from <node_type> down to a leaf is the inner descent, the issue is that a
down leaf is detected on the inner descent, so only the inner descent is
retried.

In order to disperse re-replicated data as widely as possible across a
storage cluster after a failure, we want to retry the outer descent. So,
fix up crush_choose() to allow the inner descent to return immediately on
choosing a failed leaf. Wire this up as a new CRUSH tunable.

Note that after this change, for a chooseleaf rule, if the primary OSD
in a placement group has failed, choosing a replacement may result in
one of the other OSDs in the PG colliding with the new primary. That
OSD's data for that PG then needs to be moved as well. This
seems unavoidable but should be relatively rare.

This corresponds to ceph.git commit 88f218181a9e6d2292e2697fc93797d0f6d6e5dc.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Sage Weil <sage@inktank.com>

1604f488

01 Nov, 2012 1 commit

libceph: define ceph_pg_pool_name_by_id() · 72afc71f

Authored by Alex Elder on Oct 30, 2012

Define and export function ceph_pg_pool_name_by_id() to supply
the name of a pg pool whose id is given.  This will be used by
the next patch.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

72afc71f

30 Oct, 2012 1 commit

libceph: fix osdmap decode error paths · 0ed7285e

Authored by Sage Weil on Oct 29, 2012

Ensure that we set the err value correctly so that we do not pass a 0
value to ERR_PTR and confuse the calling code.  (In particular,
osd_client.c handle_map() will BUG(!newmap)).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

0ed7285e
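
Why a zero error code is harmful can be seen with a user-space imitation of the kernel's error-pointer helpers. The `demo_` functions below are simplified stand-ins written for this example (the 4095 limit is assumed to mirror the kernel's MAX_ERRNO), not the kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095	/* assumed to match the kernel's MAX_ERRNO */

/* Encode a negative errno value in a pointer. */
static void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the top DEMO_MAX_ERRNO values of the address space. */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}
```

`demo_err_ptr(0)` is simply NULL, so `demo_is_err()` reports no error and the caller goes on to use a null map. A decode path therefore has to set a real negative code before returning through its error label.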

02 Oct, 2012 1 commit

libceph: check for invalid mapping · d63b77f4

Authored by Sage Weil on Sep 24, 2012

If we encounter an invalid (e.g., zeroed) mapping, return an error
and avoid a divide by zero.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

d63b77f4

31 Jul, 2012 1 commit

libceph: support crush tunables · 546f04ef

Authored by Sage Weil on Jul 30, 2012

The server side recently added support for tuning some magic
crush variables. Decode these variables if they are present, or use the
default values if they are not present.

Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5.
Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

546f04ef

07 Jun, 2012 3 commits

libceph: fix overflow in osdmap_apply_incremental() · a5506049

Authored by Xi Wang on Jun 06, 2012

On 32-bit systems, a large `pglen' would overflow `pglen*sizeof(u32)'
and bypass the check ceph_decode_need(p, end, pglen*sizeof(u32), bad).
It would also overflow the subsequent kmalloc() size, leading to
out-of-bounds write.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>

a5506049
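
This kind of overflow can be avoided by bounding the count before multiplying. The sketch below models a 32-bit size type explicitly with `uint32_t` so that it behaves the same on any host. The helper name is hypothetical and this is not the code from the patch.

```c
#include <stdint.h>

/*
 * Compute count * elem_size in 32-bit arithmetic.
 * Returns 0 and stores the product on success, -1 if it would wrap.
 */
static int demo_mul_size32(uint32_t count, uint32_t elem_size, uint32_t *out)
{
	if (elem_size != 0 && count > UINT32_MAX / elem_size)
		return -1;	/* product does not fit in 32 bits */
	*out = count * elem_size;
	return 0;
}
```

With `count = 0x40000001` and 4-byte elements, the unchecked 32-bit product wraps to 4, which is how a huge count can slip past a length check and then under-allocate. The helper rejects that input instead.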

libceph: fix overflow in osdmap_decode() · e91a9b63

Authored by Xi Wang on Jun 06, 2012

On 32-bit systems, a large `n' would overflow `n * sizeof(u32)' and bypass
the check ceph_decode_need(p, end, n * sizeof(u32), bad).  It would also
overflow the subsequent kmalloc() size, leading to out-of-bounds write.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>

e91a9b63

libceph: fix overflow in __decode_pool_names() · ad3b904c

Authored by Xi Wang on Jun 06, 2012

`len' is read from network and thus needs validation.  Otherwise a
large `len' would cause out-of-bounds access via the memcpy() call.
In addition, len = 0xffffffff would overflow the kmalloc() size,
leading to out-of-bounds write.

This patch adds a check of `len' via ceph_decode_need().  Also use
kstrndup rather than kmalloc/memcpy.

[elder@inktank.com: added -ENOMEM return for null kstrndup() result]
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>

ad3b904c
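
A user-space sketch of the same idea: validate a network-supplied length against the remaining buffer before copying. The function name, the little-endian 32-bit length prefix, and the use of malloc() are assumptions for this example; the patch itself relies on ceph_decode_need() and kstrndup().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decode a string encoded as a 32-bit little-endian length followed by that
 * many bytes.  On success returns a NUL-terminated copy (caller frees) and
 * advances *p; returns NULL if the buffer is too short or allocation fails.
 */
static char *demo_decode_string(const unsigned char **p, const unsigned char *end)
{
	const unsigned char *cur = *p;
	uint32_t len;
	char *s;

	if ((size_t)(end - cur) < 4)
		return NULL;
	len = (uint32_t)cur[0] | ((uint32_t)cur[1] << 8) |
	      ((uint32_t)cur[2] << 16) | ((uint32_t)cur[3] << 24);
	cur += 4;

	/* Check len against the bytes remaining before any size arithmetic. */
	if (len > (size_t)(end - cur))
		return NULL;

	s = malloc((size_t)len + 1);
	if (!s)
		return NULL;
	memcpy(s, cur, len);
	s[len] = '\0';
	*p = cur + len;
	return s;
}
```

Because the length is compared with the bytes actually present, a prefix of 0xffffffff is rejected before it can reach the allocation or the copy.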

22 May, 2012 1 commit

libceph: fix pg_temp updates · 6bd9adbd

Authored by Sage Weil on May 21, 2012

Usually, we are adding pg_temp entries or removing them. Occasionally they
update. In that case, osdmap_apply_incremental() was failing because the
rbtree entry already exists.

Fix by removing the existing entry before inserting a new one.

Fixes http://tracker.newdream.net/issues/2446
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

6bd9adbd

08 May, 2012 4 commits

crush: warn on do_rule failure · 8b393269

Authored by Sage Weil on May 07, 2012

If we get an error code from crush_do_rule(), print an error to the
console.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

8b393269

crush: remove parent maps · fc7c3ae5

Authored by Sage Weil on May 07, 2012

These were used for the ill-fated forcefeed feature.  Remove them.

Reflects ceph.git commit ebdf80edfecfbd5a842b71fbe5732857994380c1.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

fc7c3ae5

crush: remove forcefeed functionality · 41ebcc09

Authored by Sage Weil on May 07, 2012

Remove forcefeed functionality from CRUSH.  This is an ugly misfeature that
is mostly useless and unused.  Remove it.

Reflects ceph.git commit ed974b5000f2851207d860a651809af4a1867942.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

Conflicts:

	net/ceph/crush/mapper.c

41ebcc09

ceph: drop support for preferred_osd pgs · 3469ac1a

Authored by Sage Weil on May 07, 2012

This was an ill-conceived feature that has been removed from Ceph.  Do
this gracefully:

 - reject attempts to specify a preferred_osd via the ioctl
 - stop exposing this information via virtual xattrs
 - always fill in -1 for requests, in case we talk to an older server
 - don't calculate preferred_osd placements/pgids
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

3469ac1a

16 Apr, 2012 1 commit

net: cleanup unsigned to unsigned int · 95c96174

Authored by Eric Dumazet on Apr 15, 2012

Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95c96174

22 Mar, 2012 1 commit

libceph: fix overflow check in crush_decode() · 64486697

Authored by Xi Wang on Feb 16, 2012

The existing overflow check (n > ULONG_MAX / b) didn't work, because
n = ULONG_MAX / b would both bypass the check and still overflow the
allocation size a + n * b.

The correct check should be (n > (ULONG_MAX - a) / b).
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

64486697
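
The difference between the two checks is easy to exercise in isolation. In this sketch `a` is a fixed header size, `b` a non-zero per-element size, and `n` the element count, matching the symbols in the message. The function names are hypothetical.

```c
#include <limits.h>

/* Old check: misses the case where n * b fits but a + n * b wraps. */
static int demo_old_check_rejects(unsigned long a, unsigned long n, unsigned long b)
{
	(void)a;	/* the old check ignored the fixed part entirely */
	return n > ULONG_MAX / b;
}

/* Corrected check: true exactly when a + n * b would exceed ULONG_MAX. */
static int demo_new_check_rejects(unsigned long a, unsigned long n, unsigned long b)
{
	return n > (ULONG_MAX - a) / b;
}
```

For `a = 16`, `b = 8` and `n = ULONG_MAX / 8`, the old check lets the value through even though `16 + n * 8` wraps around to a tiny allocation size; the corrected check rejects it.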

29 Sep, 2011 2 commits

libceph: fix pg_temp mapping update · 8adc8b3d

Authored by Sage Weil on Sep 28, 2011

The incremental map updates have a record for each pg_temp mapping that is
to be added/updated (len > 0) or removed (len == 0).  The old code was
written as if the updates were a complete enumeration; that was just wrong.
Update the code to remove 0-length entries and drop the rbtree traversal.

This avoids misdirected (and hung) requests that manifest as server
errors like

[WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
Signed-off-by: Sage Weil <sage@newdream.net>

8adc8b3d
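
The record semantics described here (a list of changes, not a full enumeration) can be sketched with a small fixed-size table standing in for the rbtree. All names and sizes below are hypothetical and chosen only for this example.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ENTRIES 8
#define DEMO_MAX_OSDS    4

struct demo_pg_temp {
	int      used;
	uint64_t pgid;
	uint32_t len;
	int32_t  osds[DEMO_MAX_OSDS];
};

/* Linear search; the kernel uses an rbtree keyed by pgid instead. */
static struct demo_pg_temp *demo_find(struct demo_pg_temp *tab, uint64_t pgid)
{
	int i;

	for (i = 0; i < DEMO_MAX_ENTRIES; i++)
		if (tab[i].used && tab[i].pgid == pgid)
			return &tab[i];
	return NULL;
}

/*
 * Apply one incremental record: len == 0 removes the mapping for pgid,
 * len > 0 adds it or replaces an existing one.  Entries not named by a
 * record are left alone.  Returns 0 on success, -1 if the table is full
 * or len is too large.
 */
static int demo_apply_pg_temp(struct demo_pg_temp *tab, uint64_t pgid,
			      const int32_t *osds, uint32_t len)
{
	struct demo_pg_temp *e = demo_find(tab, pgid);
	int i;

	if (len > DEMO_MAX_OSDS)
		return -1;
	if (e)
		e->used = 0;	/* drop any existing entry first */
	if (len == 0)
		return 0;	/* removal record: nothing to insert */

	for (i = 0; i < DEMO_MAX_ENTRIES; i++) {
		if (!tab[i].used) {
			tab[i].used = 1;
			tab[i].pgid = pgid;
			tab[i].len = len;
			memcpy(tab[i].osds, osds, len * sizeof(*osds));
			return 0;
		}
	}
	return -1;
}
```

Dropping an existing entry before inserting also handles the case where a record updates a mapping that is already present.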

libceph: fix pg_temp mapping calculation · 782e182e

Authored by Sage Weil on Sep 28, 2011

We need to apply the modulo pg_num calculation before looking up a pgid in
the pg_temp mapping rbtree.  This fixes pg_temp mappings, and fixes
(some) misdirected requests that result in messages like

[WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11

on the server and make the client stall without getting a reply (at
least until the pg_temp mapping goes away, but that can take a long, long
time).

Reorder calc_pg_raw() a bit to make more sense.
Signed-off-by: Sage Weil <sage@newdream.net>

782e182e
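
The ordering issue can be illustrated as follows: the raw seed must be folded into the range [0, pg_num) before it is used as a lookup key, otherwise two seeds that name the same placement group look like different keys. The folding helper below is modeled on a "stable mod" and is an assumption for this sketch; a plain `x % pg_num` would make the same point. Function names are hypothetical.

```c
#include <stdint.h>

/*
 * Fold x into [0, b), where bmask is one less than the smallest power of
 * two that is >= b.
 */
static uint32_t demo_stable_mod(uint32_t x, uint32_t b, uint32_t bmask)
{
	if ((x & bmask) < b)
		return x & bmask;
	return x & (bmask >> 1);
}

/* Normalize the raw seed first; only the result is a valid lookup key. */
static uint32_t demo_lookup_key(uint32_t raw_seed, uint32_t pg_num,
				uint32_t pg_num_mask)
{
	return demo_stable_mod(raw_seed, pg_num, pg_num_mask);
}
```

With `pg_num = 12` and mask 15, raw seeds 5 and 13 fold to the same key, so a pg_temp entry stored under the normalized seed is found for both.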

25 May, 2011 1 commit

libceph: handle new osdmap down/state change encoding · 7662d8ff

Authored by Sage Weil on May 03, 2011

Old incrementals encode a 0 value (nearly always) when an osd goes down.
Change that to allow any state bit(s) to be flipped. Special case 0 to
mean flip the CEPH_OSD_UP bit to mimic the old behavior.
Signed-off-by: Sage Weil <sage@newdream.net>

7662d8ff
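
The rule described above amounts to an XOR with one special case. A minimal sketch, with assumed bit values and hypothetical names:

```c
#include <stdint.h>

#define DEMO_STATE_EXISTS (1u << 0)	/* assumed bit value */
#define DEMO_STATE_UP     (1u << 1)	/* assumed bit value */

/*
 * Apply one state-change record from an incremental map.  A value of 0 is
 * the legacy encoding and is treated as "flip the UP bit"; any other value
 * is a mask of state bits to flip.
 */
static uint8_t demo_apply_state_change(uint8_t state, uint8_t xorstate)
{
	if (xorstate == 0)
		xorstate = DEMO_STATE_UP;
	return state ^ xorstate;
}
```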

20 May, 2011 1 commit

libceph: fix osdmap timestamp assignment · 31456665

Authored by Sage Weil on May 12, 2011

Signed-off-by: Sage Weil <sage@newdream.net>

31456665

13 Jan, 2011 1 commit

ceph: Always free allocated memory in osdmap_decode() · b0aee351

Authored by Jesper Juhl on Dec 24, 2010

Always free memory allocated to 'pi' in
net/ceph/osdmap.c::osdmap_decode().
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Sage Weil <sage@newdream.net>

b0aee351

21 Oct, 2010 2 commits

ceph: factor out libceph from Ceph file system · 3d14c5d2

Authored by Yehuda Sadeh on Apr 06, 2010

This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph.  This
is mostly a matter of moving files around.  However, a few key pieces
of the interface change as well:

 - ceph_client becomes ceph_fs_client and ceph_client, where the latter
   captures the mon and osd clients, and the fs_client gets the mds client
   and file system specific pieces.
 - Mount option parsing and debugfs setup is correspondingly broken into
   two pieces.
 - The mon client gets a generic handler callback for otherwise unknown
   messages (mds map, in this case).
 - The basic supported/required feature bits can be expanded (and are by
   ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.
Signed-off-by: Sage Weil <sage@newdream.net>

3d14c5d2

ceph: lookup pool in osdmap by name · 7669a2c9

Authored by Yehuda Sadeh on May 17, 2010

Implement a pool lookup by name.  This will be used by rbd.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

7669a2c9

04 Aug, 2010 1 commit

ceph: whitespace cleanup · 213c99ee

Authored by Sage Weil on Aug 03, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

213c99ee

03 Aug, 2010 1 commit

ceph: fix decoding of pool snap info · 73a7e693

Authored by Sage Weil on Aug 02, 2010

The pool info contains a vector for snap_info_t, not snap ids.  This fixes
the broken decoding, which would declare the update corrupt when a pool
snapshot was created.
Signed-off-by: Sage Weil <sage@newdream.net>

73a7e693

02 Aug, 2010 2 commits

ceph: print useful error message when crush rule not found · effcb9ed

Authored by Sage Weil on Jul 09, 2010

Include the crush_ruleset in the error message.
Signed-off-by: Sage Weil <sage@newdream.net>

effcb9ed

ceph: code cleanup · cd84db6e

Authored by Yehuda Sadeh on Jun 11, 2010

Mainly fixing minor issues reported by sparse.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

cd84db6e

24 Jul, 2010 1 commit

ceph: fix pg_mapping leak on pg_temp updates · bc4fdca8

Authored by Sage Weil on Jul 20, 2010

Free the ceph_pg_mapping structs when they are removed from the pg_temp
rbtree.  Also fix a leak in the __insert_pg_mapping() error path.
Signed-off-by: Sage Weil <sage@newdream.net>

bc4fdca8

08 Jul, 2010 1 commit

ceph: add kfree() to error path · b0bbb0be

Authored by Dan Carpenter on Jul 08, 2010

We leak a "pi" on this error path.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

b0bbb0be

18 Jun, 2010 1 commit

ceph: fix crush map update decoding · cebc5be6

Authored by Sage Weil on Jun 17, 2010

If the incremental osdmap has a new crush map, advance the position after
decoding so that we can parse the rest of the osdmap properly.
Signed-off-by: Sage Weil <sage@newdream.net>

cebc5be6

30 May, 2010 1 commit

fs/ceph: Use ERR_CAST · 7e34bc52

Authored by Julia Lawall on May 22, 2010

Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes the
purpose of the operation clearer; the latter otherwise looks like a
no-op.

In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of
the returned value is the same as the type of the enclosing function.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
type T;
T x;
identifier f;
@@

T f (...) { <+...
- ERR_PTR(PTR_ERR(x))
+ x
 ...+> }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>

7e34bc52
