提交 · b2cd1df66037e7c4697c7e40496bf7e4a5e16a2d · openeuler / Kernel

13 11月, 2017 4 次提交

rbd: default to single-major device number scheme · 3cfa3b16

由 Ilya Dryomov 提交于 11月 13, 2017

It's been 3.5 years, let's turn it on by default.  Support in rbd(8)
utility goes back to pre-firefly, "rbd map" has been loading the module
with single_major=Y ever since.  However, if the module is already
loaded (whether by hand or at boot time), we end up with single_major=N.
Also, some people don't install rbd(8) and use the sysfs interface
directly.

(With single-major=N, a major number is consumed for every mapping,
imposing a limit of ~240 rbd images per host.  single-major=Y allows
mapping thousands of rbd images on a single machine.)
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>

3cfa3b16

rbd: set discard_alignment to zero · 7c084289

由 David Disseldorp 提交于 11月 02, 2017

RBD devices are currently incorrectly initialised with the block queue
discard_alignment set to the underlying RADOS object size.

As per Documentation/ABI/testing/sysfs-block:
  The discard_alignment parameter indicates how many bytes the beginning
  of the device is offset from the internal allocation unit's natural
  alignment.

Correcting the discard_alignment parameter from the RADOS object size to
zero (the blk_set_default_limits() default) has no effect on how discard
requests are propagated through the block layer - @alignment in
__blkdev_issue_discard() remains zero. However, it does fix the UNMAP
granularity alignment value advertised to SCSI initiators via the Block
Limits VPD.
Signed-off-by: NDavid Disseldorp <ddiss@suse.de>
Reviewed-by: NIlya Dryomov <idryomov@gmail.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

7c084289

rbd: get rid of rbd_mapping::read_only · 9568c93e

由 Ilya Dryomov 提交于 10月 12, 2017

It is redundant -- rw/ro state is stored in hd_struct and managed by
the block layer.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

9568c93e

rbd: fix and simplify rbd_ioctl_set_ro() · 1de797bb

由 Ilya Dryomov 提交于 10月 12, 2017

->open_count/-EBUSY check is bogus and wrong: when an open device is
set read-only, blkdev_write_iter() refuses further writes with -EPERM.
This is standard behaviour and all other block devices allow this.

set_disk_ro() call is also problematic: we affect the entire device
when called on a single partition.

All rbd_ioctl_set_ro() needs to do is refuse ro -> rw transition for
mapped snapshots.  Everything else can be handled by generic code.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

1de797bb

09 11月, 2017 1 次提交

rbd: use GFP_NOIO for parent stat and data requests · 1e37f2f8

由 Ilya Dryomov 提交于 11月 06, 2017

rbd_img_obj_exists_submit() and rbd_img_obj_parent_read_full() are on
the writeback path for cloned images -- we attempt a stat on the parent
object to see if it exists and potentially read it in to call copyup.
GFP_NOIO should be used instead of GFP_KERNEL here.

Cc: stable@vger.kernel.org
Link: http://tracker.ceph.com/issues/22014Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NDavid Disseldorp <ddiss@suse.de>

1e37f2f8

07 9月, 2017 1 次提交

rbd: silence bogus uninitialized use warning in rbd_acquire_lock() · 37f13252

由 Kefeng Wang 提交于 7月 13, 2017

  drivers/block/rbd.c: In function 'rbd_acquire_lock':
  drivers/block/rbd.c:3602:44: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]

Silence the warning, found it when built old kernel(3.10) with
OBS(opensuse build service).
Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

37f13252

19 6月, 2017 1 次提交

rbd: use bio_clone_fast() instead of bio_clone() · f856dc36

由 NeilBrown 提交于 6月 18, 2017

bio_clone() makes a copy of the bi_io_vec, but rbd never changes that,
so there is no need for a copy.
bio_clone_fast() can be used instead, which avoids making the copy.

This requires that we provide a bio_set.  bio_clone() uses fs_bio_set,
but it isn't, in general, safe to use the same bio_set at different
levels of the stack, as that can lead to deadlocks.  As filesystems
use fs_bio_set, block devices shouldn't.

As rbd never stacks, it is safe to have a single global bio_set for
all rbd devices to use.  So allocate that when the module is
initialised, and use it with bio_clone_fast().
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f856dc36

09 6月, 2017 2 次提交

blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653

由 Christoph Hellwig 提交于 6月 03, 2017

Use the same values for use for request completion errors as the return
value from ->queue_rq.  BLK_STS_RESOURCE is special cased to cause
a requeue, and all the others are completed as-is.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

fc17b653

block: introduce new block status code type · 2a842aca

由 Christoph Hellwig 提交于 6月 03, 2017

Currently we use nornal Linux errno values in the block layer, and while
we accept any error a few have overloaded magic meanings. This patch
instead introduces a new blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning. Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run - those drivers that have a
errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return somethings that strictly speaking isn't correct
for file system operations, but that's left as an exercise for later.

For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low hanging
fruite to improve it.

blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

2a842aca

29 5月, 2017 1 次提交

rbd: implement REQ_OP_WRITE_ZEROES · 6ac56951

由 Ilya Dryomov 提交于 5月 22, 2017

Commit 93c1defe ("rbd: remove the discard_zeroes_data flag")
explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
following commit 48920ff2 ("block: remove the discard_zeroes_data
flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.

rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
release either some or all blocks depending on whether the zeroing
request is rbd_obj_bytes() aligned.  This is how we currently implement
discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
now.  Caveats:

- REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
  current implementations - nvme and loop

- there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
  is hence less helpful than blk_bio_discard_split(), but this can (and
  should) be fixed on the rbd side

In the future we will split these into two code paths to respect
REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
released on discard.

Fixes: 93c1defe ("rbd: remove the discard_zeroes_data flag")
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

6ac56951

09 5月, 2017 1 次提交

fs: ceph: CURRENT_TIME with ktime_get_real_ts() · 1134e091

由 Deepa Dinamani 提交于 5月 08, 2017

CURRENT_TIME is not y2038 safe.  The macro will be deleted and all the
references to it will be replaced by ktime_get_* apis.

struct timespec is also not y2038 safe.  Retain timespec for timestamp
representation here as ceph uses it internally everywhere.  These
references will be changed to use struct timespec64 in a separate patch.

The current_fs_time() api is being changed to use vfs struct inode* as
an argument instead of struct super_block*.

Set the new mds client request r_stamp field using ktime_get_real_ts()
instead of using current_fs_time().

Also, since r_stamp is used as mtime on the server, use timespec_trunc()
to truncate the timestamp, using the right granularity from the
superblock.

This api will be transitioned to be y2038 safe along with vfs.

Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.comSigned-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: NArnd Bergmann <arnd@arndb.de>
M:	Ilya Dryomov <idryomov@gmail.com>
M:	"Yan, Zheng" <zyan@redhat.com>
M:	Sage Weil <sage@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1134e091

04 5月, 2017 10 次提交

rbd: exclusive map option · e010dd0a

由 Ilya Dryomov 提交于 4月 13, 2017

Support disabling automatic exclusive lock transfers to allow users
to be in charge of which node should own the lock while being able to
reuse exclusive lock's built-in blacklist/break-lock functionality.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>

e010dd0a

rbd: return ResponseMessage result from rbd_handle_request_lock() · 3b77faa0

由 Ilya Dryomov 提交于 4月 13, 2017

Right now it's just 0, but "no automatic exclusive lock transfers" mode
code will need -EROFS.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>

3b77faa0

rbd: kill rbd_is_lock_supported() · f9bebd58

由 Ilya Dryomov 提交于 4月 13, 2017

Currently the exclusive lock is acquired only if the mapping is
writable, i.e. an image HEAD mapped in rw mode.  This means that we
don't acquire the lock for executing a read from a snapshot or an image
HEAD mapped in ro mode, even if lock_on_read is set.  This is somewhat
weird and inconsistent with "no automatic exclusive lock transfers"
mode, where the lock is acquired unconditionally.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>

f9bebd58

rbd: support updating the lock cookie without releasing the lock · 14bb211d

由 Ilya Dryomov 提交于 4月 13, 2017

As we no longer release the lock before potentially raising BLACKLISTED
in rbd_reregister_watch(), the "either locked or blacklisted" assert in
rbd_queue_workfn() needs to go: we can be both locked and blacklisted
at that point now.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>

14bb211d

rbd: store lock cookie · cbbfb0ff

由 Ilya Dryomov 提交于 4月 13, 2017

In preparation for supporting set_cookie method (or rather set_cookie
fallback for older OSDs), store the lock cookie on lock and use it on
unlock instead of recalculating from rbd_dev->watch_cookie.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>

cbbfb0ff

rbd: ignore unlock errors · bbead745

由 Ilya Dryomov 提交于 4月 13, 2017

Currently the lock_state is set to UNLOCKED (preventing further I/O),
but RELEASED_LOCK notification isn't sent.  Be consistent with userspace
and treat ceph_cls_unlock() errors as the image is unlocked.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>

bbead745

rbd: fix error handling around rbd_init_disk() · 5769ed0c

由 Ilya Dryomov 提交于 4月 13, 2017

add_disk() takes an extra reference on disk->queue, which is put in
put_disk() -> disk_release().  Avoiding blk_cleanup_queue() (which also
puts the queue) until add_disk() sets GENHD_FL_UP works for the queue
itself, but leaks various queue internals.  Conditioning tag_set freeing
on GENHD_FL_UP is wrong too: all error paths after rbd_init_disk() leak
the tag_set.

Move the final "announce" steps out of rbd_dev_device_setup() so that
it can be unwound like any other function.  Leave "announce" steps to
do_rbd_add/remove().
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>

5769ed0c

rbd: move rbd_unregister_watch() call into rbd_dev_image_release() · fd22aef8

由 Ilya Dryomov 提交于 4月 13, 2017

rbd_dev->disk tear down vs rbd_watch_cb() race shouldn't be a problem
anymore thanks to EXISTS and REMOVING checks in rbd_dev_update_size().
A similar race could occur on "rbd map", see commit 811c6688
("rbd: fix rbd map vs notify races").
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>

fd22aef8

rbd: move rbd_dev_destroy() call out of rbd_dev_image_release() · 8b679ec5

由 Ilya Dryomov 提交于 4月 13, 2017

... to simplify error handling in do_rbd_add().
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>

8b679ec5

libceph, ceph: always advertise all supported features · 74da4a0f

由 Ilya Dryomov 提交于 3月 03, 2017

No reason to hide CephFS-specific features in the rbd case.  Recent
feature bits mix RADOS and CephFS-specific stuff together anyway.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

74da4a0f

02 5月, 2017 1 次提交

blk-mq: update ->init_request and ->exit_request prototypes · d6296d39

由 Christoph Hellwig 提交于 5月 01, 2017

Remove the request_idx parameter, which can't be used safely now that we
support I/O schedulers with blk-mq.  Except for a superflous check in
mtip32xx it was unused anyway.

Also pass the tag_set instead of just the driver data - this allows drivers
to avoid some code duplication in a follow on cleanup.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

d6296d39

09 4月, 2017 1 次提交

rbd: remove the discard_zeroes_data flag · 93c1defe

由 Christoph Hellwig 提交于 4月 05, 2017

rbd only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writings of zeroes.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

93c1defe

31 3月, 2017 1 次提交

blk-mq: constify struct blk_mq_ops · f363b089

由 Eric Biggers 提交于 3月 30, 2017

Constify all instances of blk_mq_ops, as they are never modified.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f363b089

07 3月, 2017 1 次提交

rbd: supported_features bus attribute · 8767b293

由 Ilya Dryomov 提交于 3月 02, 2017

... so that userspace can generate meaningful error messages and spell
out unsupported features that need to be disabled.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

8767b293

25 2月, 2017 1 次提交

libceph, rbd, ceph: WRITE | ONDISK -> WRITE · 54ea0046

由 Ilya Dryomov 提交于 2月 11, 2017

CEPH_OSD_FLAG_ONDISK is set in account_request().
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NSage Weil <sage@redhat.com>

54ea0046

20 2月, 2017 14 次提交

rbd: constify device_type structure · b9942bc9

由 Bhumika Goyal 提交于 2月 11, 2017

Declare device_type structure as const as it is only stored in the
type field of a device structure. This field is of type const, so add
const to the declaration of device_type structure.

File size before:
   text	   data	    bss	    dec	    hex	filename
  61546	  11610	    208	  73364	  11e94	drivers/block/rbd.o

File size after:
   text	   data	    bss	    dec	    hex	filename
  61610	  11578	    208	  73396	  11eb4	drivers/block/rbd.o
Signed-off-by: NBhumika Goyal <bhumirks@gmail.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

b9942bc9

I
rbd: kill obj_request->object_name and rbd_segment_name_cache · 6c696d85
由 Ilya Dryomov 提交于 1月 25, 2017
```
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJason Dillaman <dillaman@redhat.com>
```
6c696d85

rbd: store and use obj_request->object_no · a90bb0c1