Commits · 9bb0248d9eb9438b991ba538e30eedb493cf1fb4 · openeuler / Kernel

02 Apr, 2018: 11 commits

rbd: add img_req->op_type field · 9bb0248d

Authored by Ilya Dryomov on Jan 30, 2018

Store op_type in its own field instead of packing it into flags.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
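
The op_type change is a plain data-structure refactoring. Here is a rough userspace sketch of the difference between deriving the operation from flag bits and storing it directly. All `sketch_*` names, flag bits and enum values are hypothetical and are not rbd.c's actual definitions:

```c
/* Hypothetical operation types for an image request. */
enum sketch_op_type {
	SKETCH_OP_WRITE = 1,
	SKETCH_OP_READ,
	SKETCH_OP_DISCARD,
};

/* Hypothetical flag bits sharing one word with unrelated state. */
#define SKETCH_REQ_WRITE   (1u << 0)
#define SKETCH_REQ_DISCARD (1u << 1)
#define SKETCH_REQ_LAYERED (1u << 2) /* unrelated to the op type */

/* Before: the op type is implied by a combination of flag bits. */
struct sketch_img_req_old {
	unsigned long flags;
};

static enum sketch_op_type sketch_old_op_type(const struct sketch_img_req_old *req)
{
	if (req->flags & SKETCH_REQ_WRITE)
		return SKETCH_OP_WRITE;
	if (req->flags & SKETCH_REQ_DISCARD)
		return SKETCH_OP_DISCARD;
	return SKETCH_OP_READ;
}

/* After: the op type is a field of its own; flags keep only other state. */
struct sketch_img_req_new {
	enum sketch_op_type op_type;
	unsigned long flags;
};
```

With a dedicated field, readers no longer need a decoding helper, and the flag word only carries genuinely boolean state.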

rbd: simplify rbd_osd_req_create() · a162b308

Authored by Ilya Dryomov on Jan 30, 2018

No need to pass rbd_dev and op_type to rbd_osd_req_create(): there are
no standalone (!IMG_DATA) object requests anymore and osd_req->r_flags
can be set in rbd_osd_req_format_{read,write}().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: remove old request handling code · 51c3509e

Authored by Ilya Dryomov on Jan 29, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: new request handling code · 3da691bf

Authored by Ilya Dryomov on Jan 29, 2018

The notable changes are:

- instead of explicitly stat'ing the object to see if it exists before
  issuing the write, send the write optimistically along with the stat
  in a single OSD request
- zero copyup optimization
- all object requests are associated with an image request and have
  a valid ->img_request pointer; there are no standalone (!IMG_DATA)
  object requests anymore
- code is structured as a state machine (vs a bunch of callbacks with
  implicit state)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
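
Here is a minimal sketch of the write side of such a state machine. It assumes a guarded write (stat plus write in one OSD request) that fails with -ENOENT when the object does not exist yet. It also assumes the parent data has already been read by the time the copyup is issued. The state names only loosely echo the rbd.c of this period, and every `sketch_*` identifier is hypothetical:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for one object write to a cloned (layered) image. */
enum sketch_write_state {
	SKETCH_WRITE_FLAT = 1,  /* no parent data involved: plain write */
	SKETCH_WRITE_GUARD,     /* stat + write sent together in one OSD request */
	SKETCH_WRITE_COPYUP,    /* object was missing: copyup parent data + write */
	SKETCH_WRITE_DONE,
};

/* Returns true if every byte of buf is zero. */
static bool sketch_is_zero(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return false;
	return true;
}

/*
 * Advance the state machine after an OSD reply with result code "result".
 * On a transition to COPYUP, *copyup_len is set to the number of parent
 * bytes to send along with the copyup.
 */
static enum sketch_write_state
sketch_handle_write(enum sketch_write_state state, int result,
		    const unsigned char *parent, size_t parent_len,
		    size_t *copyup_len)
{
	switch (state) {
	case SKETCH_WRITE_GUARD:
		if (result == -ENOENT) {
			/* Optimistic write failed: the object doesn't exist. */
			/* Zero copyup: all-zero parent data is not sent. */
			*copyup_len = sketch_is_zero(parent, parent_len) ?
				      0 : parent_len;
			return SKETCH_WRITE_COPYUP;
		}
		/* Success, or an error the caller propagates as-is. */
		return SKETCH_WRITE_DONE;
	case SKETCH_WRITE_FLAT:
	case SKETCH_WRITE_COPYUP:
	default:
		return SKETCH_WRITE_DONE;
	}
}
```

Because the state lives in the object request, the common case (object exists) costs a single round trip, and the copyup path is an explicit transition rather than an implicit callback chain.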

rbd: move from raw pages to bvec data descriptors · 7e07efb1

Authored by Ilya Dryomov on Jan 20, 2018

In preparation for rbd "fancy" striping which requires bio_vec arrays,
wire up BVECS data type and kill off PAGES data type. There is nothing
wrong with using page vectors for copyup requests -- it's just less
iterator boilerplate code to write for the new striping framework.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: get rid of img_req->copyup_pages · f9dcbc44

Authored by Ilya Dryomov on Jan 20, 2018

The initiating object request is the proper owner -- save a bit of
space.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: don't (ab)use obj_req->pages for stat requests · 06fbb699

Authored by Ilya Dryomov on Jan 20, 2018

obj_req->pages is for provided data buffers.  stat requests are
internal and should be NODATA.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: remove bio cloning helpers · df6ba701

Authored by Ilya Dryomov on Jan 20, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

libceph, rbd: new bio handling code (aka don't clone bios) · 5359a17d

Authored by Ilya Dryomov on Jan 20, 2018

The reason we clone bios is to be able to give each object request
(and consequently each ceph_osd_data/ceph_msg_data item) its own
pointer to a (list of) bio(s).  The messenger then initializes its
cursor with cloned bio's ->bi_iter, so it knows where to start reading
from/writing to.  That's all the cloned bios are used for: to determine
each object request's starting position in the provided data buffer.

Introduce ceph_bio_iter to do exactly that -- store position within bio
list (i.e. pointer to bio) + position within that bio (i.e. bvec_iter).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
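
The idea behind ceph_bio_iter is a pointer to the current bio plus a position within it. It can be sketched in userspace with a linked chain of buffers standing in for bios. `sketch_buf`, `sketch_iter` and `sketch_iter_advance()` are hypothetical; they only illustrate the concept and are not libceph's actual helpers:

```c
#include <stddef.h>

/* Hypothetical stand-in for a bio: one buffer in a singly linked chain. */
struct sketch_buf {
	size_t len;
	struct sketch_buf *next;
};

/* Position = which buffer + byte offset inside it (cf. bio + bvec_iter). */
struct sketch_iter {
	struct sketch_buf *buf;
	size_t off;
};

/* Move the position forward by "bytes", crossing buffer boundaries. */
static void sketch_iter_advance(struct sketch_iter *it, size_t bytes)
{
	while (bytes && it->buf) {
		size_t left = it->buf->len - it->off;

		if (bytes < left) {
			it->off += bytes;
			return;
		}
		/* Consume the rest of this buffer and step to the next one. */
		bytes -= left;
		it->buf = it->buf->next;
		it->off = 0;
	}
}
```

Each object request can take a copy of the iterator at its starting point. That copy is all the information the cloned bios used to carry, so nothing has to be cloned.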

rbd: start enums at 1 instead of 0 · a1fbb5e7

Authored by Ilya Dryomov on Jan 16, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: set max_segment_size to UINT_MAX · 24f1df60

Authored by Ilya Dryomov on Jan 12, 2018

Commit 21acdf45 ("rbd: set max_segments to USHRT_MAX") removed the
limit on max_segments.  Remove the limit on max_segment_size as well.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

30 Jan, 2018: 1 commit

rbd: whitelist RBD_FEATURE_OPERATIONS feature bit · e573427a

Authored by Ilya Dryomov on Jan 16, 2018

This feature bit restricts older clients from performing certain
maintenance operations against an image (e.g. clone, snap create).
krbd does not perform maintenance operations.

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>

29 Jan, 2018: 3 commits

rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full() · d98f153f

Authored by Ilya Dryomov on Jan 18, 2018

If rbd_img_request_submit() fails, parent_request->obj_request is
NULLed out, triggering an assert in rbd_obj_request_put():

  rbd_img_request_put(parent_request)
    rbd_parent_request_destroy
      rbd_obj_request_put(NULL)

Just remove it -- parent_request->obj_request will be put in
rbd_parent_request_destroy().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: use kmem_cache_zalloc() in rbd_img_request_create() · a0c5895b

Authored by Ilya Dryomov on Jan 22, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: obj_request->completion is unused · 2e584bce

Authored by Ilya Dryomov on Jan 15, 2018

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

10 Jan, 2018: 2 commits

rbd: set max_segments to USHRT_MAX · 21acdf45

Authored by Ilya Dryomov on Dec 21, 2017

Commit d3834fef ("rbd: bump queue_max_segments") bumped
max_segments (unsigned short) to max_hw_sectors (unsigned int).
max_hw_sectors is set to the number of 512-byte sectors in an object
and overflows unsigned short for 32M (largest possible) objects, making
the block layer resort to handing us single segment (i.e. single page
or even smaller) bios in that case.

Cc: stable@vger.kernel.org
Fixes: d3834fef ("rbd: bump queue_max_segments")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
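
The overflow is easy to reproduce with plain arithmetic. The sketch below assumes 512-byte sectors. It also assumes a setter that takes an unsigned short and falls back to 1 when passed 0, which is how I understand the block layer's blk_queue_max_segments() to behave. The `sketch_*` helpers are hypothetical:

```c
#include <limits.h>

#define SKETCH_SECTOR_SHIFT 9 /* 512-byte sectors */

/* Hypothetical setter: unsigned short argument, 0 is bumped up to 1. */
static unsigned short sketch_set_max_segments(unsigned short max_segments)
{
	return max_segments ? max_segments : 1;
}

/* Old scheme: max_segments derived from the object size in sectors. */
static unsigned short sketch_old_max_segments(unsigned int obj_bytes)
{
	unsigned int sectors = obj_bytes >> SKETCH_SECTOR_SHIFT;

	/* Conversion to unsigned short keeps only the low 16 bits. */
	return sketch_set_max_segments((unsigned short)sectors);
}
```

A 4M object gives 8192 sectors, which fits. A 32M object gives 65536 sectors, which wraps to 0 and ends up as a single segment. Setting USHRT_MAX directly avoids the wrap.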

rbd: reacquire lock should update lock owner client id · edd8ca80

Authored by Florian Margaine on Dec 13, 2017

Otherwise, future operations on this RBD using exclusive-lock are
going to require the lock from a non-existent client id.

Cc: stable@vger.kernel.org
Fixes: 14bb211d ("rbd: support updating the lock cookie without releasing the lock")
Link: http://tracker.ceph.com/issues/19929
Signed-off-by: Florian Margaine <florian@platform.sh>
[idryomov@gmail.com: rbd_set_owner_cid() call, __rbd_lock() helper]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

13 Nov, 2017: 4 commits

rbd: default to single-major device number scheme · 3cfa3b16

Authored by Ilya Dryomov on Nov 13, 2017

It's been 3.5 years, let's turn it on by default.  Support in rbd(8)
utility goes back to pre-firefly, "rbd map" has been loading the module
with single_major=Y ever since.  However, if the module is already
loaded (whether by hand or at boot time), we end up with single_major=N.
Also, some people don't install rbd(8) and use the sysfs interface
directly.

(With single-major=N, a major number is consumed for every mapping,
imposing a limit of ~240 rbd images per host.  single-major=Y allows
mapping thousands of rbd images on a single machine.)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
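
The difference between ~240 and thousands of mappings comes down to minor-number arithmetic. This sketch assumes the kernel's 20-bit minor space (MINORBITS). It also assumes each single-major mapping reserves a block of 16 minors for the device and its partitions (a 4-bit shift). That shift is my assumption about rbd.c's layout, and the helper names are hypothetical:

```c
/* Assumptions: 20-bit minor space, 16 minors reserved per mapping. */
#define SKETCH_MINORBITS  20
#define SKETCH_PART_SHIFT 4

/* single_major=Y: all mappings share one major; dev_id picks the minors. */
static unsigned int sketch_dev_id_to_minor(unsigned int dev_id)
{
	return dev_id << SKETCH_PART_SHIFT;
}

/* Any minor in a mapping's block (device or partition) maps back to it. */
static unsigned int sketch_dev_id_from_minor(unsigned int minor)
{
	return minor >> SKETCH_PART_SHIFT;
}

/* Upper bound on mappings that fit under a single major number. */
static unsigned int sketch_max_single_major_mappings(void)
{
	return (1u << SKETCH_MINORBITS) >> SKETCH_PART_SHIFT;
}
```

With single_major=N, the limit is instead the number of free major numbers, because each mapping consumes one.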

rbd: set discard_alignment to zero · 7c084289

Authored by David Disseldorp on Nov 02, 2017

RBD devices are currently incorrectly initialised with the block queue
discard_alignment set to the underlying RADOS object size.

As per Documentation/ABI/testing/sysfs-block:
  The discard_alignment parameter indicates how many bytes the beginning
  of the device is offset from the internal allocation unit's natural
  alignment.

Correcting the discard_alignment parameter from the RADOS object size to
zero (the blk_set_default_limits() default) has no effect on how discard
requests are propagated through the block layer - @alignment in
__blkdev_issue_discard() remains zero. However, it does fix the UNMAP
granularity alignment value advertised to SCSI initiators via the Block
Limits VPD.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: get rid of rbd_mapping::read_only · 9568c93e

Authored by Ilya Dryomov on Oct 12, 2017

It is redundant -- rw/ro state is stored in hd_struct and managed by
the block layer.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

rbd: fix and simplify rbd_ioctl_set_ro() · 1de797bb

Authored by Ilya Dryomov on Oct 12, 2017

->open_count/-EBUSY check is bogus and wrong: when an open device is
set read-only, blkdev_write_iter() refuses further writes with -EPERM.
This is standard behaviour and all other block devices allow this.

set_disk_ro() call is also problematic: we affect the entire device
when called on a single partition.

All rbd_ioctl_set_ro() needs to do is refuse ro -> rw transition for
mapped snapshots.  Everything else can be handled by generic code.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
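
After this change, the rbd-specific part of the ioctl reduces to one rule. Here is a hypothetical sketch of that check. The function name and the choice of -EROFS are assumptions:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical policy check: a mapped snapshot must never become writable.
 * Everything else, including making an open device read-only, is left to
 * the generic block layer (no open_count/-EBUSY check here).
 */
static int sketch_check_set_ro(bool mapped_snapshot, bool want_ro)
{
	if (mapped_snapshot && !want_ro)
		return -EROFS; /* errno choice is an assumption */
	return 0;
}
```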

09 Nov, 2017: 1 commit

rbd: use GFP_NOIO for parent stat and data requests · 1e37f2f8

Authored by Ilya Dryomov on Nov 06, 2017

rbd_img_obj_exists_submit() and rbd_img_obj_parent_read_full() are on
the writeback path for cloned images -- we attempt a stat on the parent
object to see if it exists and potentially read it in to call copyup.
GFP_NOIO should be used instead of GFP_KERNEL here.

Cc: stable@vger.kernel.org
Link: http://tracker.ceph.com/issues/22014
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>

07 Sep, 2017: 1 commit

rbd: silence bogus uninitialized use warning in rbd_acquire_lock() · 37f13252

Authored by Kefeng Wang on Jul 13, 2017

  drivers/block/rbd.c: In function 'rbd_acquire_lock':
  drivers/block/rbd.c:3602:44: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]

Silence the warning; it was found when building an old kernel (3.10)
with OBS (openSUSE Build Service).

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

19 Jun, 2017: 1 commit

rbd: use bio_clone_fast() instead of bio_clone() · f856dc36

Authored by NeilBrown on Jun 18, 2017

bio_clone() makes a copy of the bi_io_vec, but rbd never changes that,
so there is no need for a copy.
bio_clone_fast() can be used instead, which avoids making the copy.

This requires that we provide a bio_set.  bio_clone() uses fs_bio_set,
but it isn't, in general, safe to use the same bio_set at different
levels of the stack, as that can lead to deadlocks.  As filesystems
use fs_bio_set, block devices shouldn't.

As rbd never stacks, it is safe to have a single global bio_set for
all rbd devices to use.  So allocate that when the module is
initialised, and use it with bio_clone_fast().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

09 Jun, 2017: 2 commits

blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653

Authored by Christoph Hellwig on Jun 03, 2017

Use the same values for request completion errors as for the return
value from ->queue_rq.  BLK_STS_RESOURCE is special-cased to cause
a requeue, and all the others are completed as-is.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: introduce new block status code type · 2a842aca

Authored by Christoph Hellwig on Jun 03, 2017

Currently we use normal Linux errno values in the block layer, and while
we accept any error, a few have overloaded magic meanings. This patch
instead introduces a new blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning. Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run: those drivers that have an
errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return something that, strictly speaking, isn't correct
for file system operations, but that's left as an exercise for later.

For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low-hanging
fruit to improve it.

blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
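
Here is a userspace sketch of the sparse __bitwise pattern described above. It also applies the ->queue_rq convention that only a resource status triggers a requeue. The `sketch_*` type, codes, numeric values and errno mapping are hypothetical. Only the bitwise/force annotation technique mirrors the kernel's:

```c
#include <errno.h>
#include <stdbool.h>

/* Under sparse (__CHECKER__), "bitwise" makes the typedef a distinct type. */
#ifdef __CHECKER__
#define SKETCH_BITWISE __attribute__((bitwise))
#define SKETCH_FORCE   __attribute__((force))
#else
#define SKETCH_BITWISE
#define SKETCH_FORCE
#endif

typedef unsigned char SKETCH_BITWISE sketch_status_t;

/* Hypothetical codes; values are illustrative, not the kernel's. */
#define SKETCH_STS_OK       ((SKETCH_FORCE sketch_status_t)0)
#define SKETCH_STS_NOTSUPP  ((SKETCH_FORCE sketch_status_t)1)
#define SKETCH_STS_RESOURCE ((SKETCH_FORCE sketch_status_t)2)
#define SKETCH_STS_IOERR    ((SKETCH_FORCE sketch_status_t)3)

/* Convert a status code back to an errno at the boundary. */
static int sketch_status_to_errno(sketch_status_t status)
{
	if (status == SKETCH_STS_OK)
		return 0;
	if (status == SKETCH_STS_NOTSUPP)
		return -EOPNOTSUPP;
	if (status == SKETCH_STS_RESOURCE)
		return -ENOMEM;
	return -EIO;
}

/* Dispatch side: only the "resource" status asks for a requeue. */
static bool sketch_should_requeue(sketch_status_t status)
{
	return status == SKETCH_STS_RESOURCE;
}
```

With a plain compiler the annotations expand to nothing. When sparse is run, it defines __CHECKER__ and warns wherever a sketch_status_t is mixed with a plain integer or errno without a __force cast.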

29 May, 2017: 1 commit

rbd: implement REQ_OP_WRITE_ZEROES · 6ac56951

Authored by Ilya Dryomov on May 22, 2017

Commit 93c1defe ("rbd: remove the discard_zeroes_data flag")
explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
following commit 48920ff2 ("block: remove the discard_zeroes_data
flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.

rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
release either some or all blocks depending on whether the zeroing
request is rbd_obj_bytes() aligned.  This is how we currently implement
discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
now.  Caveats:

- REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
  current implementations - nvme and loop

- there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
  is hence less helpful than blk_bio_discard_split(), but this can (and
  should) be fixed on the rbd side

In the future we will split these into two code paths to respect
REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
released on discard.

Fixes: 93c1defe ("rbd: remove the discard_zeroes_data flag")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
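
Whether some or all blocks are released depends on where an extent falls inside an object. Here is a rough sketch that assumes a fixed object size and ignores layered (cloned) images. The delete/truncate/zero split follows my understanding of rbd.c's discard path of this period. Treat it as an assumption, along with all `sketch_*` names:

```c
#include <stdint.h>

/* Hypothetical per-object op choice for a discard/zeroout extent. */
enum sketch_zero_op {
	SKETCH_ZOP_DELETE = 1, /* whole object: release all of its blocks */
	SKETCH_ZOP_TRUNCATE,   /* extent runs to the end of the object */
	SKETCH_ZOP_ZERO,       /* interior range: CEPH_OSD_OP_ZERO-style */
};

/* Map an image byte offset to (object number, offset within object). */
static void sketch_map_offset(uint64_t img_off, uint64_t obj_size,
			      uint64_t *objno, uint64_t *obj_off)
{
	*objno = img_off / obj_size;
	*obj_off = img_off % obj_size;
}

/* obj_off/len describe an extent that lies within a single object. */
static enum sketch_zero_op sketch_pick_op(uint64_t obj_off, uint64_t len,
					  uint64_t obj_size)
{
	if (obj_off == 0 && len == obj_size)
		return SKETCH_ZOP_DELETE;
	if (obj_off + len == obj_size)
		return SKETCH_ZOP_TRUNCATE;
	return SKETCH_ZOP_ZERO;
}
```

Only object-aligned extents release every block. Partial extents either release the tail or merely zero a range inside the object.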

09 May, 2017: 1 commit

fs: ceph: CURRENT_TIME with ktime_get_real_ts() · 1134e091

Authored by Deepa Dinamani on May 08, 2017

CURRENT_TIME is not y2038 safe.  The macro will be deleted and all the
references to it will be replaced by ktime_get_* apis.

struct timespec is also not y2038 safe.  Retain timespec for timestamp
representation here as ceph uses it internally everywhere.  These
references will be changed to use struct timespec64 in a separate patch.

The current_fs_time() api is being changed to use vfs struct inode* as
an argument instead of struct super_block*.

Set the new mds client request r_stamp field using ktime_get_real_ts()
instead of using current_fs_time().

Also, since r_stamp is used as mtime on the server, use timespec_trunc()
to truncate the timestamp, using the right granularity from the
superblock.

This api will be transitioned to be y2038 safe along with vfs.

Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
M:	Ilya Dryomov <idryomov@gmail.com>
M:	"Yan, Zheng" <zyan@redhat.com>
M:	Sage Weil <sage@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
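
timespec_trunc() rounds tv_nsec down to the filesystem's time granularity. Here is a userspace mirror of that logic. The `sketch_` name is hypothetical. The real helper lived in the kernel and warned on invalid granularities, which this sketch simply ignores:

```c
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/*
 * Round t.tv_nsec down to a multiple of "gran" (nanoseconds), e.g. the
 * superblock's time granularity. The 1 ns and 1 s cases avoid a division.
 */
static struct timespec sketch_timespec_trunc(struct timespec t, unsigned int gran)
{
	if (gran == 1) {
		/* nanosecond granularity: nothing to do */
	} else if (gran == SKETCH_NSEC_PER_SEC) {
		t.tv_nsec = 0;
	} else if (gran > 1 && gran < SKETCH_NSEC_PER_SEC) {
		t.tv_nsec -= t.tv_nsec % gran;
	}
	/* Any other granularity is invalid; t is returned unchanged. */
	return t;
}
```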

04 May, 2017: 10 commits

rbd: exclusive map option · e010dd0a