Commits · c300156bc734796e251fa31b07dff2af2f572889 · openeuler / Kernel

03 Aug, 2018 2 commits

rbd: pass rbd_spec into parse_rbd_opts_token() · c300156b

Authored by Ilya Dryomov on Jul 03, 2018

In preparation for _pool_ns client option, make rbd_spec available
inside parse_rbd_opts_token().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: amend "bad option arg" error message · 2f56b6ba

Authored by Ilya Dryomov on Jun 27, 2018

Don't mention "mount" -- in the rbd case it is "mapping".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

05 Jun, 2018 2 commits

rbd: flush rbd_dev->watch_dwork after watch is unregistered · 23edca86

Authored by Dongsheng Yang on Jun 04, 2018

There is a problem if we are going to unmap an rbd device and the
watch_dwork is going to queue delayed work for watch:

unmap Thread                    watch Thread                  timer
do_rbd_remove
  cancel_tasks_sync(rbd_dev)
                                queue_delayed_work for watch
  destroy_workqueue(rbd_dev->task_wq)
    drain_workqueue(wq)
    destroy other resources in wq
                                                              call_timer_fn
                                                                __queue_work()

Then the delayed work escapes both cancel_tasks_sync() and
destroy_workqueue(), and we get a use-after-free call trace:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  Modules linked in:
  CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           OE     4.17.0-rc6+ #13
  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
  RIP: 0010:__queue_work+0x6a/0x3b0
  RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
  RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000
  RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00
  R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970
  R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007
  FS:  0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0
  Call Trace:
   <IRQ>
   ? __queue_work+0x3b0/0x3b0
   call_timer_fn+0x2d/0x130
   run_timer_softirq+0x16e/0x430
   ? tick_sched_timer+0x37/0x70
   __do_softirq+0xd2/0x280
   irq_exit+0xd5/0xe0
   smp_apic_timer_interrupt+0x6c/0x130
   apic_timer_interrupt+0xf/0x20

[ Move rbd_dev->watch_dwork cancellation so that rbd_reregister_watch()
  either bails out early because the watch is UNREGISTERED at that point
  or just gets cancelled. ]

Cc: stable@vger.kernel.org
Fixes: 99d16943 ("rbd: retry watch re-registration periodically")
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph, rbd: add error handling for osd_req_op_cls_init() · fe943d50

Authored by Chengguang Xu on Apr 12, 2018

Add proper error handling for osd_req_op_cls_init() to replace the
BUG_ON statement that fires when memory allocation fails.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

25 May, 2018 1 commit

block drivers/block: Use octal not symbolic permissions · 5657a819

Authored by Joe Perches on May 24, 2018

Convert the S_<FOO> symbolic permissions to their octal equivalents as
using octal and not symbolic permissions is preferred by many as more
readable.

see: https://lkml.org/lkml/2016/8/2/1945

Done with automated conversion via:
$ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...>

Miscellanea:

o Wrapped modified multi-line calls to a single line where appropriate
o Realign modified multi-line calls to open parenthesis

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
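
To illustrate the equivalence this conversion relies on, here is a minimal user-space sketch. It uses the POSIX S_I* bits from <sys/stat.h> rather than the kernel's own combined macros, and the PERM_SYMBOLIC, PERM_OCTAL and perms_match() names are hypothetical, introduced only for this example.

```c
#include <assert.h>
#include <sys/stat.h>

/* Symbolic spelling: read permission for user, group and other. */
#define PERM_SYMBOLIC (S_IRUSR | S_IRGRP | S_IROTH)

/* Octal spelling of the same mode bits. */
#define PERM_OCTAL 0444

/* Compile-time check that both spellings denote the same mode. */
static_assert(PERM_SYMBOLIC == PERM_OCTAL, "symbolic and octal modes differ");

/* Returns 1 when both spellings denote the same mode bits. */
int perms_match(void)
{
    return PERM_SYMBOLIC == PERM_OCTAL;
}
```

The octal form shows the three permission triplets at a glance, which is the readability argument made in the commit message.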

10 May, 2018 1 commit

libceph: add osd_req_op_extent_osd_data_bvecs() · 0010f705

Authored by Ilya Dryomov on May 04, 2018

... and store num_bvecs for client code's convenience.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>

16 Apr, 2018 5 commits

rbd: notrim map option · d9360540

Authored by Ilya Dryomov on Mar 23, 2018

Add an option to turn off discard and write zeroes offload support to
avoid deprovisioning a fully provisioned image.  When enabled, discard
requests will fail with -EOPNOTSUPP, write zeroes requests will fall
back to manually zeroing.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Hitoshi Kamei <hitoshi.kamei.xm@hitachi.com>

rbd: adjust queue limits for "fancy" striping · 420efbdf

Authored by Ilya Dryomov on Apr 16, 2018

In order to take full advantage of merging in ceph_file_to_extents(),
allow object set sized I/Os.  If the layout is not "fancy", an object
set consists of just one object.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: avoid Wreturn-type warnings · c6244b3b

Authored by Arnd Bergmann on Apr 04, 2018

In some configurations gcc cannot see that rbd_assert(0) leads to an
unreachable code path:

drivers/block/rbd.c: In function 'rbd_img_is_write':
drivers/block/rbd.c:1397:1: error: control reaches end of non-void function [-Werror=return-type]
drivers/block/rbd.c: In function '__rbd_obj_handle_request':
drivers/block/rbd.c:2499:1: error: control reaches end of non-void function [-Werror=return-type]
drivers/block/rbd.c: In function 'rbd_obj_handle_write':
drivers/block/rbd.c:2471:1: error: control reaches end of non-void function [-Werror=return-type]

As the rbd_assert() here carries no extra information beyond the verbose
BUG(), we can simply use BUG() directly in its place. This is reliably
detected as not returning on any architecture, since it doesn't depend
on the unlikely() comparison that confused gcc.

Fixes: 3da691bf ("rbd: new request handling code")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
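
The effect described in this commit can be reproduced in user space. The sketch below is an assumption-laden stand-in, not the rbd code: fatal_bug() plays the role of BUG(), and enum op_type and op_is_write() are hypothetical names. Because fatal_bug() is declared _Noreturn, the compiler can tell that the default branch never falls off the end of the function, so -Wreturn-type has nothing to report.

```c
#include <stdlib.h>

/* Hypothetical operation types, loosely modelled on read/write/discard. */
enum op_type { OP_READ, OP_WRITE, OP_DISCARD };

/* Stand-in for BUG(): _Noreturn tells the compiler control stops here. */
static _Noreturn void fatal_bug(void)
{
    abort();
}

/* Returns 1 for operations that modify data, 0 for reads. */
int op_is_write(enum op_type t)
{
    switch (t) {
    case OP_READ:
        return 0;
    case OP_WRITE:
    case OP_DISCARD:
        return 1;
    default:
        /* Unconditionally noreturn, so no "control reaches end" warning. */
        fatal_bug();
    }
}
```

An assertion macro that wraps the abort in a conditional gives the compiler less to work with, which is presumably why the unconditional form is the more robust choice here.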

rbd: support timeout in rbd_wait_state_locked() · 34f55d0b

Authored by Dongsheng Yang on Mar 26, 2018

Currently, rbd_wait_state_locked() will wait forever if we
can't get our state locked. Example:

rbd map --exclusive test1  --> /dev/rbd0
rbd map test1  --> /dev/rbd1
dd if=/dev/zero of=/dev/rbd1 bs=1M count=1 --> IO blocked

To avoid this problem, this patch introduces a timeout in
rbd_wait_state_locked(). rbd_wait_state_locked() then returns
an error when the timeout is reached.

This patch allows the user to set lock_timeout when mapping an rbd image.

Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: refactor rbd_wait_state_locked() · 2f18d466

Authored by Ilya Dryomov on Apr 04, 2018

In preparation for lock_timeout option, make rbd_wait_state_locked()
return error codes.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

02 Apr, 2018 29 commits

rbd: remove VLA usage · 08a79102

Authored by Kyle Spiers on Mar 17, 2018

As part of the effort to remove VLAs from the kernel[1], this moves
the literal values into the stack array calculation instead of using a
variable for the sizing. The resulting size can be found from
sizeof(buf).

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Kyle Spiers <kyle@spiers.me>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
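
The pattern described above can be sketched in plain C as follows. This is an illustration under assumptions, not the rbd code: encode_header() and its field layout are hypothetical. The point is that the array bound is a constant expression made of literals and sizeof terms, so the buffer is a fixed-size array rather than a VLA, and its size is recovered with sizeof(buf).

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical encoder: packs two 32-bit values and one 64-bit value
 * into a stack buffer, then copies the result to the caller.
 * Returns the number of bytes written, or 0 if out is too small.
 */
size_t encode_header(uint8_t *out, size_t out_len)
{
    /* Constant-expression bound: a fixed-size array, not a VLA. */
    uint8_t buf[2 * sizeof(uint32_t) + sizeof(uint64_t)];
    uint32_t a = 1, b = 2;
    uint64_t c = 3;

    memcpy(buf, &a, sizeof(a));
    memcpy(buf + sizeof(a), &b, sizeof(b));
    memcpy(buf + sizeof(a) + sizeof(b), &c, sizeof(c));

    /* No separate size variable is needed: sizeof(buf) gives the length. */
    if (out_len < sizeof(buf))
        return 0;
    memcpy(out, buf, sizeof(buf));
    return sizeof(buf);
}
```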

rbd: fix spelling mistake: "reregisteration" -> "reregistration" · f6870cc9

Authored by Colin Ian King on Mar 19, 2018

Trivial fix to spelling mistake in rbd_warn message text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: get the latest osdmap when using an existing client · dd435855

Authored by Ilya Dryomov on Feb 22, 2018

Currently we request the latest osdmap only if ceph_pg_poolid_by_name()
fails with -ENOENT.  This is effective with newly created pools, but we
also want to avoid attempting to map from pools that were recently
deleted and report "pool does not exist" instead.  (Such an attempt
eventually fails in the OSD client after map check code kicks in, but
the error message is confusing.)

Request the latest osdmap unconditionally after bumping a ref on an
existing client in rbd_client_find().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: move rbd_get_client() below rbd_put_client() · 5feb0d8d

Authored by Ilya Dryomov on Feb 22, 2018

... to avoid a forward declaration in the next commit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: remove redundant declaration of rbd_spec_put() · 0a4a1e68

Authored by Ilya Dryomov on Feb 12, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: allow "fancy" striping · b1331852

Authored by Ilya Dryomov on Feb 07, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jason Dillaman <dillaman@redhat.com>

rbd: introduce OWN_BVECS data type · afb97888

Authored by Ilya Dryomov on Feb 06, 2018

If the layout is "fancy", we need to be able to rearrange the provided
bio_vecs in stripe unit chunks to make it possible for the messenger to
read/write directly from/to the provided data buffer, without employing
a temporary data buffer for assembling the result.

Higher level bio_vec arrays are generally immutable, so this requires
copying into a private array. Only the bio_vecs themselves are shuffled
around, not the actual data. OWN_BVECS doesn't own any pages.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: remove rbd_parent_request_{create,destroy}() · e93aca0a

Authored by Ilya Dryomov on Feb 06, 2018

rbd_parent_request_create() takes a ref on obj_req for child_img_req.
There is no point in doing that because child_img_req is created on
behalf of obj_req -- obj_req is the initiator and can't be completed
before child_img_req.

Open-code the rest of rbd_parent_request_create() and remove it along
with rbd_parent_request_destroy().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: get rid of img_req->{offset,length} · dfd9875f

Authored by Ilya Dryomov on Feb 06, 2018

These are set, but no longer used.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: remove rbd_img_request_fill() and helpers · 0420c5dd

Authored by Ilya Dryomov on Feb 06, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: switch to common striping framework · 5a237819

Authored by Ilya Dryomov on Feb 06, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: create+truncate for whole-object layered discards · 2bb1e56e

Authored by Ilya Dryomov on Feb 06, 2018

A whole-object layered discard is implemented as a truncate rather
than a delete: a dummy object is needed to prevent the CoW machinery
from kicking in.  However, a truncate on a non-existent object is
a no-op.  If the object doesn't exist in HEAD, a discard request is
effectively ignored, which violates our "discard zeroes data" promise
and breaks REQ_OP_WRITE_ZEROES implementation.

A non-exclusive create on an existing object is also a no-op, so the
fix is to do a compound create+truncate instead.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: move to obj_req->img_extents · 86bd7998

Authored by Ilya Dryomov on Feb 06, 2018

In preparation for rbd "fancy" striping, replace obj_req->img_offset
with obj_req->img_extents. A single starting offset isn't sufficient
because we want only one OSD request per object and will merge adjacent
object extents in ceph_file_to_extents(). The final object extent may
map into multiple different byte ranges in the image.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
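
To see why a single starting offset is not enough, it helps to work through the striping arithmetic. The sketch below is a stand-alone illustration that assumes the usual Ceph striping scheme (stripe unit su, stripe count sc, object size os); struct obj_pos and map_offset() are hypothetical names and this is not the libceph implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which object a byte lands in, and where. */
struct obj_pos {
    uint64_t objno; /* object number */
    uint64_t off;   /* byte offset inside that object */
};

/*
 * Map an image byte offset to an object position.
 * su = stripe unit, sc = stripe count, os = object size (multiple of su).
 */
struct obj_pos map_offset(uint64_t img_off, uint64_t su, uint64_t sc,
                          uint64_t os)
{
    assert(su > 0 && sc > 0 && os >= su && os % su == 0);

    uint64_t su_per_obj = os / su;             /* stripe units per object */
    uint64_t blockno = img_off / su;           /* stripe unit index overall */
    uint64_t stripeno = blockno / sc;          /* which stripe (row) */
    uint64_t stripepos = blockno % sc;         /* object within the object set */
    uint64_t objsetno = stripeno / su_per_obj; /* which object set */

    struct obj_pos p;
    p.objno = objsetno * sc + stripepos;
    p.off = (stripeno % su_per_obj) * su + img_off % su;
    return p;
}
```

With su = 4, sc = 2 and os = 8, image offsets 0 and 8 both land in object 0, at object offsets 0 and 4. The two ranges are adjacent inside the object, so they can be merged into one object extent, yet they are not adjacent in the image. With a non-fancy layout (su equal to os, sc equal to 1) each object covers one contiguous image range.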

rbd: incorporate ceph_object_extent · 43df3d35

Authored by Ilya Dryomov on Feb 02, 2018

obj_req->object_no -> obj_req->ex.oe_objno
obj_req->offset -> obj_req->ex.oe_off
obj_req->length -> obj_req->ex.oe_len

... and use ex for linking object requests to image requests.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: store data_type in img_req instead of obj_req · ecc633ca

Authored by Ilya Dryomov on Feb 01, 2018

All object requests are associated with an image request now -- avoid
duplicating the same info in each object request.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: remove obj_req->flags field · 0be2d60e

Authored by Ilya Dryomov on Feb 01, 2018

There are no standalone (!IMG_DATA) object requests anymore.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: remove old request completion code · 15961b44

Authored by Ilya Dryomov on Feb 01, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: new request completion code · 7114edac

Authored by Ilya Dryomov on Feb 01, 2018

Do away with partial request completions and all the associated
complexity.  Individual object requests no longer need to be completed
in order -- when the last one becomes ready, we complete the entire
higher level request all at once.

This also wraps up the conversion to a state machine model and
eliminates the recursion described in commit 6d69bb53 ("rbd:
prevent kernel stack blow up on rbd map").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
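
The "complete everything when the last object request finishes" idea can be sketched with a simple pending counter. This is a single-threaded illustration under assumptions, not the rbd code: struct img_req, img_req_init() and obj_req_done() are hypothetical names, and a real driver would need locking or atomics around the counter.

```c
#include <assert.h>

/* Hypothetical image request tracking its outstanding object requests. */
struct img_req {
    unsigned int pending; /* object requests not yet finished */
    int result;           /* first error seen, or 0 */
    int completed;        /* set once, when the whole request completes */
};

void img_req_init(struct img_req *img, unsigned int nr_obj_reqs)
{
    img->pending = nr_obj_reqs;
    img->result = 0;
    img->completed = 0;
}

/*
 * Called once per object request, in any order.
 * Returns 1 if this call completed the image request, 0 otherwise.
 */
int obj_req_done(struct img_req *img, int obj_result)
{
    assert(img->pending > 0);

    /* Remember only the first error; later ones are dropped. */
    if (obj_result < 0 && img->result == 0)
        img->result = obj_result;

    /* Not the last one yet: nothing is completed partially. */
    if (--img->pending > 0)
        return 0;

    img->completed = 1;
    return 1;
}
```

Because no ordering is imposed on the calls, there is no partial-completion bookkeeping: the higher level request is finished exactly once, by whichever object request happens to be last.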

rbd: update rbd_img_request_submit() signature · efbd1a11

Authored by Ilya Dryomov on Jan 30, 2018

It should be void now.  Also, object requests are unlinked only in
image request destructor, which can't run before rbd_img_request_put(),
so no need for _safe.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: add img_req->op_type field · 9bb0248d

Authored by Ilya Dryomov on Jan 30, 2018

Store op_type in its own field instead of packing it into flags.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: simplify rbd_osd_req_create() · a162b308

Authored by Ilya Dryomov on Jan 30, 2018

No need to pass rbd_dev and op_type to rbd_osd_req_create(): there are
no standalone (!IMG_DATA) object requests anymore and osd_req->r_flags
can be set in rbd_osd_req_format_{read,write}().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: remove old request handling code · 51c3509e

Authored by Ilya Dryomov on Jan 29, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: new request handling code · 3da691bf

Authored by Ilya Dryomov on Jan 29, 2018

The notable changes are:

- instead of explicitly stat'ing the object to see if it exists before
  issuing the write, send the write optimistically along with the stat
  in a single OSD request
- zero copyup optimization
- all object requests are associated with an image request and have
  a valid ->img_request pointer; there are no standalone (!IMG_DATA)
  object requests anymore
- code is structured as a state machine (vs a bunch of callbacks with
  implicit state)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: move from raw pages to bvec data descriptors · 7e07efb1