Commits · 70db4f3629b3476cf506be869ef9d15688d2d44a · openanolis / cloud-kernel

30 Oct, 2014 2 commits

rbd: Fix error recovery in rbd_obj_read_sync() · a8d42056

Committed by Jan Kara on Oct 22, 2014

When we fail to allocate the page vector in rbd_obj_read_sync(), we
basically ignore the problem and continue, which results in an oops
later.  Fix the problem by returning a proper error.

CC: Yehuda Sadeh <yehuda@inktank.com>
CC: Sage Weil <sage@inktank.com>
CC: ceph-devel@vger.kernel.org
CC: stable@vger.kernel.org
Coverity-id: 1226882
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>

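The error-recovery pattern described in a8d42056 can be illustrated with a minimal userspace sketch. It is not the kernel code: `alloc_page_vector()` and `read_sync()` are hypothetical stand-ins, and the allocator signals failure with NULL purely for simplicity.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page-vector allocator; returns NULL on failure. */
static void **alloc_page_vector(size_t page_count, int simulate_failure)
{
    if (simulate_failure)
        return NULL;
    return calloc(page_count, sizeof(void *));
}

/* Returns 0 on success or a negative errno; never continues past a failed allocation. */
static int read_sync(size_t page_count, int simulate_failure)
{
    void **pages = alloc_page_vector(page_count, simulate_failure);

    if (!pages)
        return -ENOMEM; /* propagate the error instead of carrying on */

    /* ... the read would be issued against 'pages' here ... */
    free(pages);
    return 0;
}
```

The point is only that the failure is reported to the caller at the allocation site, so later code never dereferences an invalid vector.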

rbd: use a single workqueue for all devices · f5ee37bd

Committed by Ilya Dryomov on Oct 09, 2014

Using one queue per device doesn't make much sense given that our
workfn processes "devices" and not "requests".  Switch to a single
workqueue for all devices.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

15 Oct, 2014 14 commits

rbd: rbd workqueues need a resque worker · 792c3a91

Committed by Ilya Dryomov on Oct 10, 2014

Need to use WQ_MEM_RECLAIM for our workqueues to prevent I/O lockups
under memory pressure - we sit on the memory reclaim path.

Cc: stable@vger.kernel.org # 3.17, needs backporting for 3.16
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Tested-by: Micha Krause <micha@krausam.de>
Reviewed-by: Sage Weil <sage@redhat.com>

rbd: set the remaining discard properties to enable support · b76f8239

Committed by Josh Durgin on Apr 07, 2014

max_discard_sectors must be set for the queue to support discard.
The operations that implement discard for rbd zero the data, so report that.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: use helpers to handle discard for layered images correctly · d3246fb0

Committed by Josh Durgin on Apr 07, 2014

Only allocate two osd ops for discard requests, since the
preallocation hint is only added for regular writes.  Use
rbd_img_obj_request_fill() to recreate the original write or discard
osd operations, isolating that logic to one place, and change the
assert in rbd_osd_req_create_copyup() to accept discard requests as
well.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: extract a method for adding object operations · 3b434a2a

Committed by Josh Durgin on Apr 04, 2014

rbd_img_request_fill() creates a ceph_osd_request and has logic for
adding the appropriate osd ops to it based on the request type and
image properties.

For layered images, the original rbd_obj_request is resent with a
copyup operation in front, using a new ceph_osd_request. The logic for
adding the original operations should be the same as when first
sending them, so move it to a helper function.

op_type only needs to be checked once, so create a helper for that as
well and call it outside the loop in rbd_img_request_fill().

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: make discard trigger copy-on-write · 1c220881

Committed by Josh Durgin on Apr 04, 2014

Discard requests are a form of write, so they should go through the
same process as plain write requests and trigger copy-on-write for
layered images.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: tolerate -ENOENT for discard operations · d0265de7

Committed by Josh Durgin on Apr 07, 2014

Discard may try to delete an object that does not exist from a
non-layered image.  If this occurs, the image already has no data in
that range, so change the result to success.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

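The result handling described in d0265de7 amounts to a small mapping from one error code to success. A minimal sketch, assuming a non-layered image; `normalize_result()` is a hypothetical name, not a function from the driver:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: normalise the result of an object operation. */
static int normalize_result(int result, bool is_discard)
{
    /* A missing object already holds no data, so discarding it has succeeded. */
    if (is_discard && result == -ENOENT)
        return 0;
    return result;
}
```

Other errors, and -ENOENT for non-discard operations, pass through unchanged.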

rbd: fix snapshot context reference count for discards · bef95455

Committed by Josh Durgin on Apr 04, 2014

Discards take a reference to the snapshot context of an image when
they are created.  This reference needs to be cleaned up when the
request is done just as it is for regular writes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: read image size for discard check safely · 3c5df893

Committed by Josh Durgin on Apr 04, 2014

In rbd_img_request_fill() the image size is only checked to determine
whether we can truncate an object instead of zeroing it for discard
requests. Take rbd_dev->header_rwsem while reading the image size, and
move this read into the discard check, so that non-discard ops don't
need to take the semaphore in this function.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: initial discard bits from Guangliang Zhao · 90e98c52

Committed by Guangliang Zhao on Apr 01, 2014

This patch adds discard support to the rbd driver.

The driver handles a discard in one of three ways:
1. Objects that are completely contained within the discard range
   are removed.
2. Objects that are partly contained within the discard range and
   align with their boundary are truncated.
3. All other objects are zeroed.

A discard request from blkdev_issue_discard() has both REQ_WRITE and
REQ_DISCARD set and carries no data, so we must check REQ_DISCARD
first when determining the request type.

This resolves:
	http://tracker.ceph.com/issues/190

[ Ilya Dryomov: This is incomplete and somewhat buggy, see follow up
  commits by Josh Durgin for refinements and fixes which weren't
  folded in to preserve authorship. ]

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

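The two decisions described in 90e98c52 (picking the request type, then picking the per-object discard action) can be sketched in userspace C. Everything named here is hypothetical: `FLAG_WRITE` and `FLAG_DISCARD` merely stand in for REQ_WRITE and REQ_DISCARD, and the truncate rule reads "align with their boundary" as "the discarded range runs to the end of the object", which is an assumption.

```c
#include <stdint.h>

enum obj_op { OP_READ, OP_WRITE, OP_DISCARD };
enum discard_action { ACT_REMOVE, ACT_TRUNCATE, ACT_ZERO };

/* Hypothetical flag bits standing in for REQ_WRITE and REQ_DISCARD. */
#define FLAG_WRITE   (1u << 0)
#define FLAG_DISCARD (1u << 1)

/* A discard carries both flags, so the discard flag must be tested first. */
static enum obj_op request_op_type(unsigned int flags)
{
    if (flags & FLAG_DISCARD)
        return OP_DISCARD;
    if (flags & FLAG_WRITE)
        return OP_WRITE;
    return OP_READ;
}

/* offset/length describe the discarded range inside one object of object_size bytes. */
static enum discard_action classify_discard(uint64_t offset, uint64_t length,
                                            uint64_t object_size)
{
    if (offset == 0 && length == object_size)
        return ACT_REMOVE;   /* the whole object is discarded */
    if (offset + length == object_size)
        return ACT_TRUNCATE; /* the range runs to the end of the object */
    return ACT_ZERO;         /* anything else is zeroed in place */
}
```

Testing the write flag first would misclassify every discard as a plain write, which is why the order of the checks matters.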

rbd: extend the operation type · 6d2940c8

Committed by Guangliang Zhao on Mar 13, 2014

The driver can currently handle only read and write operations;
extend the operation type for the coming discard support.

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: skip the copyup when an entire object writing · c622d226

Committed by Guangliang Zhao on Apr 01, 2014

A layered write needs to copy up the parent's content, but an
entire-object write would overwrite it anyway, so skip the copyup.

Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

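The condition introduced by c622d226 can be written as a small predicate. This is a simplified sketch with a hypothetical name and parameters; the real driver weighs further conditions that are deliberately left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: does a write to one object of a layered image
 * need the parent's data copied up first?
 */
static bool write_needs_copyup(bool is_layered, uint64_t offset,
                               uint64_t length, uint64_t object_size)
{
    if (!is_layered)
        return false; /* no parent, nothing to copy up */
    if (offset == 0 && length == object_size)
        return false; /* the write replaces the whole object anyway */
    return true;
}
```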

rbd: add img_obj_request_simple() helper · 70d045f6

Committed by Ilya Dryomov on Sep 12, 2014

To clarify the conditions and make it easier to add new ones.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

rbd: access snapshot context and mapping size safely · 4e752f0a

Committed by Josh Durgin on Apr 08, 2014

These fields may both change while the image is mapped if a snapshot
is created or deleted or the image is resized. They are guarded by
rbd_dev->header_rwsem, so hold that while reading them, and store a
local copy to refer to outside of the critical section. The local copy
will stay consistent since the snapshot context is reference counted,
and the mapping size is just a u64. This prevents torn loads from
giving us inconsistent values.

Move reading header.snapc into the caller of rbd_img_request_create()
so that we only need to take the semaphore once. The read-only caller,
rbd_parent_request_create() can just pass NULL for snapc, since the
snapshot context is only relevant for writes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

rbd: do not return -ERANGE on auth failures · 7dd440c9

Committed by Ilya Dryomov on Sep 11, 2014

Trying to map an image out of a pool for which we don't have an 'x'
permission bit fails with -ERANGE from ceph_extract_encoded_string()
due to an unsigned vs signed bug.  Fix it and get rid of the -EINVAL
sink, thus propagating rbd::get_id cls method errors.  (I've seen
a bunch of unexplained -ERANGE reports, I bet this is it).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

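The commit message for 7dd440c9 does not show the faulty expression, so the following is only an illustration of the general class of bug it names: a signed return value compared against an unsigned size. The function names are hypothetical, and the explicit cast spells out what C's implicit conversion would do silently.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy: a negative errno converted to size_t becomes a huge positive value. */
static bool has_payload_buggy(int ret)
{
    return (size_t)ret > sizeof(uint32_t);
}

/* Fixed: rule out negative (error) values before the size comparison. */
static bool has_payload_fixed(int ret)
{
    return ret >= 0 && (size_t)ret > sizeof(uint32_t);
}
```

With the buggy check, an error return is mistaken for a large reply length, so a later decoding step fails with an unrelated error and the original one is lost.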

10 Sep, 2014 2 commits

rbd: fix error return code in rbd_dev_device_setup() · 255939e7

Committed by Wei Yongjun on Aug 13, 2014

Fix to return -ENOMEM from the workqueue alloc error handling
case instead of 0, as done elsewhere in this function.

Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>

rbd: avoid format-security warning inside alloc_workqueue() · 58d1362b

Committed by Ilya Dryomov on Aug 12, 2014

drivers/block/rbd.c: In function ‘rbd_dev_device_setup’:
drivers/block/rbd.c:5090:19: warning: format not a string literal and no format arguments [-Wformat-security]

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

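The -Wformat-security warning quoted in 58d1362b fires whenever a non-literal string is passed where a printf-style format is expected. The usual remedy is to pass the string as an argument to a literal "%s". A userspace sketch using snprintf(); `make_queue_name()` is a hypothetical stand-in, not the driver's code:

```c
#include <stdio.h>

/* Hypothetical stand-in for a function that takes a printf-style name format. */
static int make_queue_name(char *out, size_t out_len, const char *name)
{
    /*
     * Passing 'name' as data behind a literal "%s" keeps any '%' in it inert.
     * snprintf(out, out_len, name) would instead interpret it as a format.
     */
    return snprintf(out, out_len, "%s", name);
}
```

A name such as "rbd%d" is then copied verbatim rather than triggering a read of a missing argument.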

07 Aug, 2014 3 commits

rbd: remove extra newlines from rbd_warn() messages · 9584d508

Committed by Ilya Dryomov on Jul 11, 2014

rbd_warn() string should be a single line - rbd_warn() appends \n.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

rbd: allocate img_request with GFP_NOIO instead GFP_ATOMIC · 7a716aac

Committed by Ilya Dryomov on Aug 05, 2014

Now that rbd_img_request_create() is called from work functions, no
need to use GFP_ATOMIC.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: rework rbd_request_fn() · bc1ecc65

Committed by Ilya Dryomov on Aug 04, 2014

While it was never a good idea to sleep in request_fn(), commit
34c6bc2c ("locking/mutexes: Add extra reschedule point") made it
a *bad* idea.  mutex_lock() since 3.15 may reschedule *before* putting
task on the mutex wait queue, which for tasks in !TASK_RUNNING state
means block forever.  request_fn() may be called with !TASK_RUNNING on
the way to schedule() in io_schedule().

Offload request handling to a workqueue, one per rbd device, to avoid
calling blocking primitives from rbd_request_fn().

Fixes: http://tracker.ceph.com/issues/8818

Cc: stable@vger.kernel.org # 3.16, needs backporting for 3.15
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Tested-by: Eric Eastman <eric0e@aol.com>
Tested-by: Greg Wilson <greg.wilson@keepertech.com>
Reviewed-by: Alex Elder <elder@linaro.org>

25 Jul, 2014 8 commits

rbd: take snap_id into account when reading in parent info · 4d9b67cd

Committed by Ilya Dryomov on Jul 24, 2014

If we are mapping a snapshot, we must read in the parent_overlap value
of that snapshot instead of that of the base image.  Not doing so may
in particular result in us returning zeros instead of user data:

    # cat overlap-snap.sh
    #!/bin/bash
    rbd create --size 10 --image-format 2 foo
    FOO_DEV=$(rbd map foo)
    dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null
    echo "Base image"
    dd if=$FOO_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd
    rbd snap create foo@snap
    rbd snap protect foo@snap
    rbd clone foo@snap bar
    rbd snap create bar@snap
    BAR_DEV=$(rbd map bar@snap)
    echo "Snapshot"
    dd if=$BAR_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd
    rbd resize --allow-shrink --size 4 bar
    echo "Snapshot after base image resize"
    dd if=$BAR_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd

    # ./overlap-snap.sh
    Base image
    0000000: e781 e33b d34b 2225 6034 2845 a2e3 36ed  ...;.K"%`4(E..6.
    Snapshot
    0000000: e781 e33b d34b 2225 6034 2845 a2e3 36ed  ...;.K"%`4(E..6.
    Resizing image: 100% complete...done.
    Snapshot after base image resize
    0000000: e781 e33b d34b 2225 0000 0000 0000 0000  ...;.K"%........

Even though bar@snap is taken with the old bar parent_overlap (8M),
reads from bar@snap beyond the new bar parent_overlap (4M) return
zeroes.  Fix it.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

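The xxd output in 4d9b67cd follows from simple arithmetic on the overlap value. In the simplified model below, which assumes the clone holds no data of its own in the range being read, bytes inside the parent overlap come from the parent and bytes beyond it read back as zeros. `bytes_from_parent()` is a hypothetical helper, not a driver function.

```c
#include <stdint.h>

/*
 * How many bytes of the read [offset, offset + length) fall inside the
 * parent overlap. The remainder reads back as zeros in this model.
 */
static uint64_t bytes_from_parent(uint64_t offset, uint64_t length,
                                  uint64_t parent_overlap)
{
    if (offset >= parent_overlap)
        return 0;
    if (length > parent_overlap - offset)
        return parent_overlap - offset;
    return length;
}
```

For the 16-byte read at offset 4M - 8, an 8M overlap yields all 16 bytes from the parent, while a 4M overlap yields only the first 8, which matches the eight trailing zero bytes in the last dump.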

rbd: do not read in parent info before snap context · e8f59b59

Committed by Ilya Dryomov on Jul 24, 2014

Currently rbd_dev_v2_header_info() reads in parent info before the snap
context is read in. This is wrong, because we may need to look at the
parent_overlap value of the snapshot instead of that of the base
image, for example when mapping a snapshot - see next commit. (When
mapping a snapshot, all we got is its name and we need the snap context
to translate that name into an id to know which parent info to look
for.)

The approach taken here is to make sure rbd_dev_v2_parent_info() is
called after the snap context has been read in. The other approach
would be to add a parent_overlap field to struct rbd_mapping and
maintain it the same way rbd_mapping::size is maintained. The reason
I chose the first approach is that the value of keeping around both
base image values and the actual mapping values is unclear to me.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>

rbd: update mapping size only on refresh · 5ff1108c