- 22 Oct 2018, 11 commits
-
Committed by Luis Henriques:
Add support for performing remote object copies using the 'copy-from' operation.
[ Add COPY_FROM to get_num_data_items(). ]
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
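For orientation, a rough sketch of what a caller of the new operation might look like. The helper name ceph_osdc_copy_from() comes from this series, but the parameter list below is reconstructed from memory and should be treated as an assumption, not the authoritative prototype:

    /*
     * Hedged sketch: copy one object to another entirely on the OSDs,
     * without the data travelling through the client.  Parameter names
     * and order are assumptions; see net/ceph/osd_client.c for the
     * real prototype.
     */
    static int copy_object_sketch(struct ceph_osd_client *osdc,
                                  u64 src_snapid, u64 src_version,
                                  struct ceph_object_id *src_oid,
                                  struct ceph_object_locator *src_oloc,
                                  struct ceph_object_id *dst_oid,
                                  struct ceph_object_locator *dst_oloc)
    {
            /* fadvise flags let the OSD avoid cache pollution on either side */
            return ceph_osdc_copy_from(osdc, src_snapid, src_version,
                                       src_oid, src_oloc, 0 /* src fadvise */,
                                       dst_oid, dst_oloc, 0 /* dst fadvise */,
                                       0 /* copy-from flags */);
    }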
-
Committed by Ilya Dryomov:
setup_request_data() adds message data items to both request and reply messages, but only checks the request num_data_items before proceeding with the loop. This is wrong because if an op doesn't have any request data items but has a reply data item (e.g. read), a duplicate data item gets added to the message on every resend attempt. This went unnoticed for years, but now that message data items are preallocated, it promptly crashes in ceph_msg_data_add(). Amend the signature to make it clear that setup_request_data() operates on both request and reply messages. Also, remove the data_len assert -- we have another one in prepare_write_message().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
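A minimal sketch of the shape of the fix, assuming the num_data_items counters introduced by the preallocation patch; the field names are assumptions, not verbatim kernel code:

    /*
     * Sketch: bail out if data items were already added on a previous
     * send attempt -- check BOTH messages, since a read op contributes
     * a reply data item while adding nothing to the request message.
     */
    static void setup_request_data(struct ceph_osd_request *req)
    {
            struct ceph_msg *request_msg = req->r_request;  /* assumed fields */
            struct ceph_msg *reply_msg = req->r_reply;

            if (request_msg->num_data_items || reply_msg->num_data_items)
                    return;  /* already set up -- this is a resend */

            /* ... walk req->r_ops, adding request and reply data items ... */
    }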
-
Committed by Ilya Dryomov:
Currently message data items are allocated with ceph_msg_data_create() in setup_request_data() inside send_request(). send_request() has never been allowed to fail, so each allocation is followed by a BUG_ON:

  data = ceph_msg_data_create(...);
  BUG_ON(!data);

It's been this way since support for multiple message data items was added in commit 6644ed7b ("libceph: make message data be a pointer") in 3.10. There is no reason to delay the allocation of message data items until the last possible moment, and we certainly don't need a linked list of them as they are only ever appended to the end and never erased. Make ceph_msg_new2() take max_data_items and adapt the rest of the code.
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
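For illustration, the extended constructor and how the old entry point can delegate to it; this is a sketch from memory, not a verbatim copy of the patch:

    /* allocate a message with room for a fixed number of data items */
    struct ceph_msg *ceph_msg_new2(int type, int front_len, int max_data_items,
                                   gfp_t flags, bool can_fail);

    /* the old entry point then simply wraps it with zero data items */
    struct ceph_msg *ceph_msg_new(int type, int front_len, gfp_t flags,
                                  bool can_fail)
    {
            return ceph_msg_new2(type, front_len, 0, flags, can_fail);
    }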
-
Committed by Ilya Dryomov:
The current requirement is that ceph_osdc_alloc_messages() should be called after oid and oloc are known. In preparation for preallocating message data items, move ceph_osdc_alloc_messages() further down, so that it is called when the OSD op codes are known.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Ilya Dryomov:
The ceph_osdc_alloc_messages() call will be moved out of alloc_linger_request() in the next commit, which means that ceph_osdc_watch() will need to call ceph_osdc_alloc_messages() twice. Add a helper for that.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Ilya Dryomov:
Register lingers directly in linger_submit(). This avoids allocating memory for the notify pagelist while holding osdc->lock and simplifies both callers of linger_submit().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Ilya Dryomov:
ceph_msgpool_get() can fall back to ceph_msg_new() when it is asked for a message whose front portion is larger than pool->front_len. However the caller always passes 0, effectively disabling that code path. The allocation goes to the message pool and returns a message with a front that is smaller than requested, setting us up for a crash. One example of this is a directory with a large number of snapshots. If its snap context doesn't fit, we oops in encode_request_partial().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
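A minimal sketch of the fallback path being re-enabled, assuming callers now pass the real front length; this is illustrative, not the exact msgpool.c code (the real function also trims the front to the requested length):

    /* sketch: hand out a pooled message unless its front is too small */
    struct ceph_msg *ceph_msgpool_get(struct ceph_msgpool *pool, int front_len)
    {
            if (front_len > pool->front_len) {
                    /* pool messages are too small -- allocate a one-off */
                    return ceph_msg_new(pool->type, front_len, GFP_NOFS, false);
            }

            /* otherwise take a preallocated message from the mempool */
            return mempool_alloc(pool->pool, GFP_NOFS);
    }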
-
Committed by Ilya Dryomov:
Any uninitialized or unknown ops will be caught by the default clause anyway.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Ilya Dryomov:
Because send_mds_reconnect() wants to send a message with a pagelist and pass the ownership to the messenger, ceph_msg_data_add_pagelist() consumes a ref which is then put in ceph_msg_data_destroy(). This makes managing pagelists in the OSD client (where they are wrapped in ceph_osd_data) unnecessarily hard because the handoff only happens in ceph_osdc_start_request() instead of when the pagelist is passed to ceph_osd_data_pagelist_init(). I counted several memory leaks on various error paths. Fix up ceph_msg_data_add_pagelist() and carry a pagelist ref in ceph_osd_data.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
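A sketch of the resulting ownership rule: each holder takes its own reference and drops it independently, instead of one side silently consuming the caller's ref. Field and helper names are assumptions patterned on the messenger code:

    /* sketch: the messenger takes its own ref instead of consuming one */
    void ceph_msg_data_add_pagelist(struct ceph_msg *msg,
                                    struct ceph_pagelist *pagelist)
    {
            struct ceph_msg_data *data;

            data = ceph_msg_data_create(CEPH_MSG_DATA_PAGELIST);
            BUG_ON(!data);
            refcount_inc(&pagelist->refcnt);   /* messenger's own reference */
            data->pagelist = pagelist;

            list_add_tail(&data->links, &msg->data);
            msg->data_length += pagelist->length;
    }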
-
Committed by Ilya Dryomov:
struct ceph_pagelist cannot be embedded into anything else because it has its own refcount. Merge allocation and initialization together.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
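A sketch of the merged constructor, with field initialization reconstructed from the pagelist structure; the field names are assumptions:

    struct ceph_pagelist *ceph_pagelist_alloc(gfp_t gfp_flags)
    {
            struct ceph_pagelist *pl;

            pl = kmalloc(sizeof(*pl), gfp_flags);
            if (!pl)
                    return NULL;

            INIT_LIST_HEAD(&pl->head);
            pl->mapped_tail = NULL;
            pl->length = 0;
            pl->room = 0;
            INIT_LIST_HEAD(&pl->free_list);
            pl->num_pages_free = 0;
            refcount_set(&pl->refcnt, 1);   /* caller owns the initial ref */

            return pl;
    }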
-
Committed by Ilya Dryomov:
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 28 Sep 2018, 1 commit
-
Committed by Kees Cook:
In the quest to remove all stack VLA usage from the kernel[1], this replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(), which uses a fixed stack size.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Sage Weil <sage@redhat.com>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
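The conversion pattern looks roughly like this (a sketch; the caller is assumed to have allocated the tfm with crypto_alloc_sync_skcipher() and set a key):

    #include <crypto/skcipher.h>

    /*
     * SKCIPHER_REQUEST_ON_STACK(req, tfm) sized the request from the
     * backend tfm's reqsize, i.e. a stack VLA.  A sync tfm guarantees a
     * bounded reqsize, so SYNC_SKCIPHER_REQUEST_ON_STACK() is fixed-size.
     */
    static int encrypt_sketch(struct crypto_sync_skcipher *tfm,
                              struct scatterlist *sg, unsigned int len,
                              u8 *iv)
    {
            int ret;

            SYNC_SKCIPHER_REQUEST_ON_STACK(req, tfm);   /* fixed stack size */

            skcipher_request_set_sync_tfm(req, tfm);
            skcipher_request_set_callback(req, 0, NULL, NULL);
            skcipher_request_set_crypt(req, sg, sg, len, iv);
            ret = crypto_skcipher_encrypt(req);
            skcipher_request_zero(req);   /* wipe key material off the stack */
            return ret;
    }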
-
- 13 Aug 2018, 2 commits
-
Committed by YueHaibing:
Fixes the following sparse warnings:

  net/ceph/crush/mapper.c:517:76: warning: Using plain integer as NULL pointer
  net/ceph/crush/mapper.c:728:68: warning: Using plain integer as NULL pointer

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
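This class of fix is mechanical; an illustrative before/after (not the actual mapper.c hunks):

    /* before: sparse warns "Using plain integer as NULL pointer" */
    static int *find_entry_old(void)
    {
            return 0;
    }

    /* after: pointer returns use NULL, not the integer literal 0 */
    static int *find_entry(void)
    {
            return NULL;
    }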
-
Committed by YueHaibing:
request_key() never returns NULL, so there is no need for a non-NULL check.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
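request_key() reports failure via ERR_PTR(), so the idiomatic check looks like this (an illustrative sketch, not the exact ceph_common.c hunk):

    #include <linux/key.h>
    #include <linux/err.h>

    static int get_secret_sketch(const char *name)
    {
            struct key *ukey;

            ukey = request_key(&key_type_ceph, name, NULL);
            if (IS_ERR(ukey))               /* never NULL, only ERR_PTR() */
                    return PTR_ERR(ukey);

            /* ... use ukey ..., then drop the reference */
            key_put(ukey);
            return 0;
    }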
-
- 03 Aug 2018, 14 commits
-
Committed by Ilya Dryomov:
Allow for extending ceph_x_authorize_reply in the future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
-
Committed by Ilya Dryomov:
Avoid scribbling over memory if the received reply/challenge is larger than the buffer supplied with the authorizer.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
-
Committed by Ilya Dryomov:
Derive the signature from the entire buffer (both AES cipher blocks) instead of using just the first half of the first block, leaving out data_crc entirely. This addresses CVE-2018-1129.
Link: http://tracker.ceph.com/issues/24837
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
-
Committed by Ilya Dryomov:
When a client authenticates with a service, an authorizer is sent with a nonce to the service (ceph_x_authorize_[ab]) and the service responds with a mutation of that nonce (ceph_x_authorize_reply). This lets the client verify the service is who it says it is, but it doesn't protect against a replay: someone can trivially capture the exchange and reuse the same authorizer to authenticate themselves.

Allow the service to reject an initial authorizer with a random challenge (ceph_x_authorize_challenge). The client then has to respond with an updated authorizer proving it is able to decrypt the service's challenge and that the new authorizer was produced for this specific connection instance. The accepting side requires this challenge and response unconditionally if the client side advertises the CEPHX_V2 feature bit. This addresses CVE-2018-1128.

Link: http://tracker.ceph.com/issues/24836
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
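For orientation, the wire structures involved look roughly like this. This is a sketch from memory of the auth_x protocol headers; treat the field names and layout as assumptions:

    /* the server's random challenge, encrypted with the shared key */
    struct ceph_x_authorize_challenge {
            struct ceph_x_encrypt_header header;
            __le64 server_challenge;
    } __packed;

    /*
     * the updated authorizer proves the client could decrypt it: it
     * carries a keyed function of the client nonce and server challenge
     */
    struct ceph_x_authorize_b {
            __u8 struct_v;
            __le64 nonce;
            __u8 have_challenge;                /* CEPHX_V2 additions */
            __le64 server_challenge_plus_one;
    } __packed;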
-
Committed by Ilya Dryomov:
Will be used for encrypting both the initial and updated authorizers.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
-
Committed by Ilya Dryomov:
Will be used for decrypting the server challenge, which is only preceded by ceph_x_encrypt_header. Drop the struct_v check to allow for extending ceph_x_encrypt_header in the future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
-
Committed by Ilya Dryomov:
Will be used for sending ceph_msg_connect with an updated authorizer, after the server challenges the initial authorizer.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
-
Committed by Ilya Dryomov:
We already copy authorizer_reply_buf and authorizer_reply_buf_len into ceph_connection. Factoring out __prepare_write_connect() requires two more: authorizer_buf and authorizer_buf_len. Store the pointer to the handshake in con->auth rather than piling on.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
-
Committed by Stephen Hemminger:
Remove blank lines at end of file and trailing whitespace.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Arnd Bergmann:
The request mtime field is used all over ceph, and is currently represented as a 'timespec' structure in Linux. This changes it to timespec64 to allow times beyond 2038, modifying all users at the same time.
[ Remove the now-redundant ts variable in writepage_nounlock(). ]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Arnd Bergmann:
ceph_con_keepalive_expired() is the last user of timespec_add() and some of the last uses of ktime_get_real_ts(). Replacing this with timespec64-based interfaces lets us remove that deprecated API.

I'm introducing new ceph_encode_timespec64()/ceph_decode_timespec64() here that take timespec64 structures and convert to/from ceph_timespec, which is defined to have an unsigned 32-bit tv_sec member. This extends the range of valid times to year 2106, avoiding the year 2038 overflow.

The ceph file system portion still uses the old functions for inode timestamps; this will be done separately after the VFS layer is converted.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
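A sketch of the new conversion helpers described above, assuming the ceph_timespec wire layout with unsigned 32-bit seconds; this is close to the real decode.h helpers but reproduced from memory:

    /* wire format: unsigned 32-bit seconds, so valid through year 2106 */
    static inline void ceph_decode_timespec64(struct timespec64 *ts,
                                              const struct ceph_timespec *tv)
    {
            /*
             * Zero-extend rather than sign-extend, deliberately trading
             * pre-1970 times for a post-2038 range.
             */
            ts->tv_sec = (time64_t)le32_to_cpu(tv->tv_sec);
            ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
    }

    static inline void ceph_encode_timespec64(struct ceph_timespec *tv,
                                              const struct timespec64 *ts)
    {
            tv->tv_sec = cpu_to_le32((u32)ts->tv_sec);
            tv->tv_nsec = cpu_to_le32((u32)ts->tv_nsec);
    }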
-
Committed by Ilya Dryomov:
Don't mention "mount" -- in the rbd case it is "mapping".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Chengguang Xu:
There is no reason to continue option parsing after detecting a bad option.
[ Return match_int() errors from ceph_parse_options() to match the behaviour of parse_rbd_opts_token() and parse_fsopt_token(). ]
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Ilya Dryomov:
The wire format dictates that payload_len fits into 4 bytes. Take u32 instead of size_t to reflect that. All callers pass a small integer, so no changes required.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 13 Jun 2018, 1 commit
-
Committed by Kees Cook:
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of:

  kmalloc(a * b, gfp)

with:

  kmalloc_array(a * b, gfp)

as well as handling cases of:

  kmalloc(a * b * c, gfp)

with:

  kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

  kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

  kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@
(
  kmalloc(
-       (sizeof(TYPE)) * E
+       sizeof(TYPE) * E
  , ...)
|
  kmalloc(
-       (sizeof(THING)) * E
+       sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@
(
  kmalloc(
-       sizeof(u8) * (COUNT)
+       COUNT
  , ...)
|
  kmalloc(
-       sizeof(__u8) * (COUNT)
+       COUNT
  , ...)
|
  kmalloc(
-       sizeof(char) * (COUNT)
+       COUNT
  , ...)
|
  kmalloc(
-       sizeof(unsigned char) * (COUNT)
+       COUNT
  , ...)
|
  kmalloc(
-       sizeof(u8) * COUNT
+       COUNT
  , ...)
|
  kmalloc(
-       sizeof(__u8) * COUNT
+       COUNT
  , ...)
|
  kmalloc(
-       sizeof(char) * COUNT
+       COUNT
  , ...)
|
  kmalloc(
-       sizeof(unsigned char) * COUNT
+       COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@
(
- kmalloc
+ kmalloc_array
  (
-       sizeof(TYPE) * (COUNT_ID)
+       COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       sizeof(TYPE) * COUNT_ID
+       COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       sizeof(TYPE) * (COUNT_CONST)
+       COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       sizeof(TYPE) * COUNT_CONST
+       COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       sizeof(THING) * (COUNT_ID)
+       COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       sizeof(THING) * COUNT_ID
+       COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       sizeof(THING) * (COUNT_CONST)
+       COUNT_CONST, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       sizeof(THING) * COUNT_CONST
+       COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@
- kmalloc
+ kmalloc_array
  (
-       SIZE * COUNT
+       COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@
(
  kmalloc(
-       sizeof(TYPE) * (COUNT) * (STRIDE)
+       array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-       sizeof(TYPE) * (COUNT) * STRIDE
+       array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-       sizeof(TYPE) * COUNT * (STRIDE)
+       array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-       sizeof(TYPE) * COUNT * STRIDE
+       array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-       sizeof(THING) * (COUNT) * (STRIDE)
+       array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-       sizeof(THING) * (COUNT) * STRIDE
+       array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-       sizeof(THING) * COUNT * (STRIDE)
+       array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-       sizeof(THING) * COUNT * STRIDE
+       array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@
(
  kmalloc(
-       sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+       array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-       sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+       array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-       sizeof(THING1) * sizeof(THING2) * COUNT
+       array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-       sizeof(THING1) * sizeof(THING2) * (COUNT)
+       array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-       sizeof(TYPE1) * sizeof(THING2) * COUNT
+       array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kmalloc(
-       sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+       array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@
(
  kmalloc(
-       (COUNT) * STRIDE * SIZE
+       array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-       COUNT * (STRIDE) * SIZE
+       array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-       COUNT * STRIDE * (SIZE)
+       array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-       (COUNT) * (STRIDE) * SIZE
+       array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-       COUNT * (STRIDE) * (SIZE)
+       array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-       (COUNT) * STRIDE * (SIZE)
+       array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-       (COUNT) * (STRIDE) * (SIZE)
+       array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-       COUNT * STRIDE * SIZE
+       array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@
(
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(
-       (E1) * E2 * E3
+       array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-       (E1) * (E2) * E3
+       array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-       (E1) * (E2) * (E3)
+       array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-       E1 * E2 * E3
+       array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@
(
  kmalloc(sizeof(THING) * C2, ...)
|
  kmalloc(sizeof(TYPE) * C2, ...)
|
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
  (
-       sizeof(TYPE) * (E2)
+       E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       sizeof(TYPE) * E2
+       E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       sizeof(THING) * (E2)
+       E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       sizeof(THING) * E2
+       E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       (E1) * E2
+       E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       (E1) * (E2)
+       E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-       E1 * E2
+       E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
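The net effect on a call site is small but meaningful: the multiplication moves into a helper that checks for overflow. An illustrative before/after:

    /* before: count * size can overflow silently, under-allocating */
    items = kmalloc(count * sizeof(*items), GFP_KERNEL);

    /* after: kmalloc_array() returns NULL if count * size overflows */
    items = kmalloc_array(count, sizeof(*items), GFP_KERNEL);

    /* three factors use array3_size(), which saturates on overflow */
    grid = kmalloc(array3_size(rows, cols, sizeof(*grid)), GFP_KERNEL);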
-
- 07 Jun 2018, 1 commit
-
Committed by Kees Cook:
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example:

  struct foo {
      int stuff;
      void *entry[];
  };

  instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper:

  instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This patch makes the changes for kmalloc()-family (and kvmalloc()-family) uses. It was done via automatic conversion with manual review for the "CHECKME" non-standard cases noted below, using the following Coccinelle script:

// pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
//              sizeof *pkey_cache->table, GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@
- alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@
- alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// Same pattern, but can't trivially locate the trailing element name,
// or variable name.
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
expression SOMETHING, COUNT, ELEMENT;
@@
- alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
+ alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)

Signed-off-by: Kees Cook <keescook@chromium.org>
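An illustrative use of the helper with a flexible-array struct, mirroring the example in the commit message:

    #include <linux/overflow.h>
    #include <linux/slab.h>

    struct foo {
            int stuff;
            void *entry[];          /* flexible array member */
    };

    static struct foo *alloc_foo(size_t count)
    {
            struct foo *instance;

            /*
             * struct_size() computes sizeof(*instance) +
             * count * sizeof(instance->entry[0]), checking for overflow;
             * instance appears only in sizeof/typeof context here.
             */
            instance = kmalloc(struct_size(instance, entry, count),
                               GFP_KERNEL);
            return instance;
    }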
-
- 05 Jun 2018, 10 commits
-
Committed by Ilya Dryomov:
calc_target() isn't supposed to fail with anything but POOL_DNE, in which case we report that the pool doesn't exist and fail the request with -ENOENT. Doing this for -ENOMEM is at the very least confusing and also harmful -- as the preceding requests complete, a short-lived locator string allocation is likely to succeed after a wait. (We used to call ceph_object_locator_to_pg() for a pi lookup. In theory that could fail with -ENOENT, hence the "ret != -ENOENT" warning being removed.)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Ilya Dryomov:
The intent behind making it a per-request setting was that it would be set for writes, but not for reads. As it is, the flag is set for all fs/ceph requests except for the pool perm check stat request (technically a read). ceph_osdc_abort_on_full() skips reads since the previous commit, and I don't see a use case for marking individual requests.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
-
Committed by Ilya Dryomov:
Don't consider reads for aborting and use ->base_oloc instead of ->target_oloc, as done in __submit_request(). Strictly speaking, we shouldn't be aborting FULL_TRY/FULL_FORCE writes either. But there is an inconsistency in FULL_TRY/FULL_FORCE handling on the OSD side [1], so given that neither of these is used in the kernel client, leave it for when the OSD behaviour is sorted out.

[1] http://tracker.ceph.com/issues/24339

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
-
Committed by Ilya Dryomov:
Sending a map check after complete_request() was called is not only useless, but can lead to a use-after-free, as the req->r_kref decrement in __complete_request() races with the map check code.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
-
Committed by Ilya Dryomov:
The "FULL or reached pool quota" warning is there to explain paused requests. No need to emit it if pausing isn't going to occur.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
-
Committed by Ilya Dryomov:
Scanning the trees just to see if there is anything to abort is unnecessary -- all that is needed here is to update the epoch barrier first, before we start aborting. Simplify and do the update inside the loop, before calling abort_request() for the first time. The switch to for_each_request() also fixes a bug: homeless requests weren't even considered for aborting.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
-
Committed by Ilya Dryomov:
In the common case, req->r_callback is called by handle_reply() on the ceph-msgr worker thread without any locks. If handle_reply() fails, it is called with both osd->lock and osdc->lock. In the map check case, it is called with just osdc->lock, but held for write. Finally, if the request is aborted because of -ENOSPC or by ceph_osdc_abort_requests(), it is called directly on the submitter's thread, again with both locks.

req->r_callback on the submitter's thread is relatively new (introduced in 4.12) and ripe for deadlocks -- e.g. a writeback worker thread waiting on itself:

  inode_wait_for_writeback+0x26/0x40
  evict+0xb5/0x1a0
  iput+0x1d2/0x220
  ceph_put_wrbuffer_cap_refs+0xe0/0x2c0 [ceph]
  writepages_finish+0x2d3/0x410 [ceph]
  __complete_request+0x26/0x60 [libceph]
  complete_request+0x2e/0x70 [libceph]
  __submit_request+0x256/0x330 [libceph]
  submit_request+0x2b/0x30 [libceph]
  ceph_osdc_start_request+0x25/0x40 [libceph]
  ceph_writepages_start+0xdfe/0x1320 [ceph]
  do_writepages+0x1f/0x70
  __writeback_single_inode+0x45/0x330
  writeback_sb_inodes+0x26a/0x600
  __writeback_inodes_wb+0x92/0xc0
  wb_writeback+0x274/0x330
  wb_workfn+0x2d5/0x3b0

Defer __complete_request() to a workqueue in all failure cases so it's never on the same thread as ceph_osdc_start_request() and always called with no locks held.

Link: http://tracker.ceph.com/issues/23978
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
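A sketch of the deferral, assuming a dedicated completion workqueue on the osd_client and a work struct embedded in the request; the field names are assumptions patterned on the commit description:

    /*
     * Runs on the completion workqueue -- never on the submitter's
     * thread, and with no osdc/osd locks held.
     */
    static void complete_request_workfn(struct work_struct *work)
    {
            struct ceph_osd_request *req =
                container_of(work, struct ceph_osd_request, r_complete_work);

            __complete_request(req);        /* invokes req->r_callback */
    }

    /* failure paths queue the completion instead of calling back inline */
    static void complete_request(struct ceph_osd_request *req, int err)
    {
            req->r_result = err;
            INIT_WORK(&req->r_complete_work, complete_request_workfn);
            queue_work(req->r_osdc->completion_wq, &req->r_complete_work);
    }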
-
Committed by Ilya Dryomov:
Move the req->r_completion wake-up and req->r_kref decrement into __complete_request().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
-
Committed by Ilya Dryomov:
destroy_workqueue() drains the workqueue before proceeding with destruction.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Ilya Dryomov:
This will be used by the filesystem for "umount -f".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-