Commits · fe5478e0f6694312ad17dea7083296c1aea0a049 · openeuler / Kernel

20 Feb, 2017 · 36 commits

rbd: do away with obj_request in rbd_obj_read_sync() · fe5478e0

Committed by Ilya Dryomov on Jan 25, 2017

rbd_obj_request machinery is completely unnecessary here; all that's
being done is fetching a metadata object - no striping, cloning, etc.
More importantly, rbd_osd_req_create() grabs pool id from layout and
that is becoming a data pool id.

Kill offset argument - all metadata objects are small and read in full.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>

rbd: initialize rbd_dev->header_oloc early · 431a02cd

Committed by Ilya Dryomov on Jan 25, 2017

No reason to delay it until image_id is known.  This will be required
by some rbd_obj_method_sync() callers, after rbd_obj_method_sync() is
changed to take oloc.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>

rbd: kill rbd_image_header::{crypt_type,comp_type} · 24dca799

Committed by Ilya Dryomov on Jan 25, 2017

Image format 1 is deprecated and format 2 doesn't have these.  Also,
__rbd_dev_create() takes care of zeroing (or otherwise initializing)
format 2 specific fields.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>

rbd: use kstrndup() in rbd_header_from_disk() · 848d796c

Committed by Ilya Dryomov on Jan 25, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>

libceph: bump CEPH_PG_MAX_SIZE to 32 · 083a51fb

Committed by Ilya Dryomov on Feb 09, 2017

... to accommodate potentially very wide EC pools.  This increases the
size of a typical rbd ceph_osd_request by ~12% (from 1040 to 1168 bytes),
but I'd rather go future proof here.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

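To see where a 128-byte increase (1168 - 1040) could come from, here is a minimal sketch. It assumes the previous limit was 16, that an OSD set stores one int per OSD plus two int fields, and that each request embeds two such sets. The struct and function names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* The new limit comes from the commit title; the old limit of 16 is an assumption. */
#define OLD_PG_MAX_SIZE 16
#define NEW_PG_MAX_SIZE 32

/* Hypothetical stand-ins for a per-request OSD set, before and after the bump. */
struct toy_osds_old { int osds[OLD_PG_MAX_SIZE]; int size; int primary; };
struct toy_osds_new { int osds[NEW_PG_MAX_SIZE]; int size; int primary; };

/* Extra bytes a request would need if it embeds `nsets` OSD sets. */
static size_t extra_request_bytes(size_t nsets)
{
    return nsets * (sizeof(struct toy_osds_new) - sizeof(struct toy_osds_old));
}
```

With 4-byte ints and two sets per request, `extra_request_bytes(2)` is 128, and 128 / 1040 is roughly 12.3%, consistent with the "~12%" quoted in the message.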

libceph: don't go through with the mapping if the PG is too wide · ef9324bb

Committed by Ilya Dryomov on Feb 08, 2017

With EC overwrites maturing, the kernel client will be getting exposed
to potentially very wide EC pools. While "min(pi->size, X)" works fine
when the cluster is stable and happy, truncating OSD sets interferes
with resend logic (ceph_is_new_interval(), etc). Abort the mapping if
the pool is too wide, assigning the request to the homeless session.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

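The difference between truncating an OSD set and aborting the mapping can be sketched as follows. This is an illustration only: `toy_pool`, `TOY_PG_MAX_SIZE`, and both helpers are hypothetical names, not libceph functions.

```c
#include <stdbool.h>

#define TOY_PG_MAX_SIZE 32  /* hypothetical cap on OSDs tracked per PG */

struct toy_pool { unsigned int size; };  /* stand-in for the pool info ("pi") */

/* Truncating approach: silently clamp the width, like min(pi->size, X). */
static unsigned int truncated_width(const struct toy_pool *pi)
{
    return pi->size < TOY_PG_MAX_SIZE ? pi->size : TOY_PG_MAX_SIZE;
}

/* Aborting approach: refuse to map a pool that is too wide, so the caller
 * can park the request instead of working with a truncated OSD set. */
static bool mapping_allowed(const struct toy_pool *pi)
{
    return pi->size <= TOY_PG_MAX_SIZE;
}
```

For a pool of width 40, the first helper would hand back a 32-entry view that no longer matches the cluster's real OSD set, while the second simply reports that the mapping should not go ahead.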

crush: merge working data and scratch · 743efcff

Committed by Ilya Dryomov on Jan 31, 2017

Much like Arlo Guthrie, I decided that one big pile is better than two
little piles.

Reflects ceph.git commit 95c2df6c7e0b22d2ea9d91db500cf8b9441c73ba.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

crush: remove mutable part of CRUSH map · 66a0e2d5

Committed by Ilya Dryomov on Jan 31, 2017

Then add it to the working state. It would be very nice if we didn't
have to take a lock to calculate a crush placement. By moving the
permutation array into the working data, we can treat the CRUSH map as
immutable.

Reflects ceph.git commit cbcd039651c0569551cb90d26ce27e1432671f2a.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

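The idea of keeping the map read-only and moving the permutation cache into caller-owned working data can be sketched as below. Everything here is a toy: the struct names, `toy_choose()`, and the rotation used as a "permutation" are assumptions for illustration and are not the CRUSH algorithm.

```c
#define NITEMS 4

/* Read-only after construction: safe to share between callers without a lock. */
struct toy_bucket { int items[NITEMS]; };

/* Mutable scratch state, owned by whoever performs the calculation. */
struct toy_work {
    unsigned int perm_x;        /* input the cached permutation was built for */
    unsigned int perm_n;        /* number of valid entries in perm[], 0 = none */
    unsigned int perm[NITEMS];
};

/* Pick the r-th item for input x; only the workspace is ever written. */
static int toy_choose(const struct toy_bucket *b, struct toy_work *w,
                      unsigned int x, unsigned int r)
{
    unsigned int i;

    if (w->perm_n == 0 || w->perm_x != x) {
        /* Rebuild the cached permutation (a simple rotation in this toy). */
        for (i = 0; i < NITEMS; i++)
            w->perm[i] = (i + x) % NITEMS;
        w->perm_x = x;
        w->perm_n = NITEMS;
    }
    return b->items[w->perm[r % NITEMS]];
}
```

Because `toy_choose()` takes the bucket as `const` and writes only to the workspace, two callers with separate `toy_work` instances can run concurrently against the same bucket.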

libceph: add osdmap_set_crush() helper · 1b6a78b5

Committed by Ilya Dryomov on Jan 31, 2017

Simplify osdmap_decode() and osdmap_apply_incremental() a bit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: remove unneeded stddef.h include · 19def166

Committed by Stafford Horne on Feb 05, 2017

This was causing a build failure for openrisc when using musl and
gcc 5.4.0 since the file is not available in the toolchain.

It doesn't seem this is needed, and removing it does not cause any build
warnings for me.
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: do a LOOKUP in d_revalidate instead of GETATTR · 5eb9f604

Committed by Jeff Layton on Jan 30, 2017

In commit c3f4688a (ceph: don't set req->r_locked_dir in
ceph_d_revalidate), we changed the code to do a GETATTR instead of a
LOOKUP as the parent info isn't strictly necessary to revalidate the
dentry. What we missed there though is that in order to update the lease
on the dentry after revalidating it, we _do_ need parent info.

Change ceph_d_revalidate back to doing a LOOKUP instead of a GETATTR so
that we can get the parent info in order to update the lease from
ceph_fill_trace. Note that we set req->r_parent here, but we cannot set
the CEPH_MDS_R_PARENT_LOCKED flag as we can't guarantee that it is.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: call update_dentry_lease even when r_locked dir is not set · cdde7c43

Committed by Jeff Layton on Jan 27, 2017

We don't really require that the parent be locked in order to update the
lease on a dentry. Lease info is protected by the d_lock. In the event
that the parent is not locked in ceph_fill_trace, and we have both
parent and target info, go ahead and update the dentry lease.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: vet the target and parent inodes before updating dentry lease · f5d55f03

Committed by Jeff Layton on Jan 27, 2017

In a later patch, we're going to need to allow ceph_fill_trace to
update the dentry's lease when the parent is not locked. This is
potentially racy though -- by the time we get around to processing the
trace, the parent may have already changed.

Change update_dentry_lease to take a ceph_vino pointer and use that to
ensure that the dentry's parent still matches it before updating the
lease.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

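The "check that the parent still matches before touching the lease" pattern can be sketched like this. The types and the helper are hypothetical stand-ins (a `toy_vino` with an inode number and a snapshot id, a `toy_dentry` that remembers its parent), and locking is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode identifier such as ceph_vino. */
struct toy_vino { uint64_t ino; uint64_t snap; };

/* Hypothetical dentry: remembers its current parent and a lease expiry. */
struct toy_dentry {
    struct toy_vino parent;
    unsigned long lease_expiry;
};

static bool toy_vino_eq(const struct toy_vino *a, const struct toy_vino *b)
{
    return a->ino == b->ino && a->snap == b->snap;
}

/* Update the lease only if the dentry's parent is still the one the reply
 * was generated for. Returns false when the parent changed in the meantime. */
static bool toy_update_lease(struct toy_dentry *d,
                             const struct toy_vino *expected_parent,
                             unsigned long expiry)
{
    if (!toy_vino_eq(&d->parent, expected_parent))
        return false;   /* stale trace: leave the lease untouched */
    d->lease_expiry = expiry;
    return true;
}
```

Passing the identifier by pointer lets the caller describe the parent it saw when the request was made, so a rename that happened while the reply was in flight is detected instead of silently extending the wrong lease.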

ceph: don't update_dentry_lease unless we actually got one · 80d025ff

Committed by Jeff Layton on Jan 26, 2017

This if block updates the dentry lease even in the case where
the MDS didn't grant one.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: add a new flag to indicate whether parent is locked · 3dd69aab

Committed by Jeff Layton on Jan 31, 2017

struct ceph_mds_request has an r_locked_dir pointer, which is set to
indicate the parent inode and that its i_rwsem is locked.  In some
critical places, we need to be able to indicate the parent inode to the
request handling code, even when its i_rwsem may not be locked.

Most of the code that operates on r_locked_dir doesn't require that the
i_rwsem be locked. We only really need it to handle manipulation of the
dcache. The rest (filling of the inode, updating dentry leases, etc.)
already has its own locking.

Add a new r_req_flags bit that indicates whether the parent is locked
when doing the request, and rename the pointer to "r_parent". For now,
all the places that set r_parent also set this flag, but that will
change in a later patch.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: convert bools in ceph_mds_request to a new r_req_flags field · bc2de10d

Committed by Jeff Layton on Feb 01, 2017

Currently, we have a bunch of bool flags in struct ceph_mds_request. We
need more flags though, but each bool takes (at least) a byte. Those
add up over time.

Merge all of the existing bools in this struct into a single unsigned
long, and use the set/test/clear_bit macros to manipulate them. These
are atomic operations, but that is required here to prevent
load/modify/store races. The existing flags are protected by different
locks, so we can't rely on them for that purpose.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

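A userspace analog of packing several bools into one flags word, manipulated with atomic read-modify-write operations, might look like the sketch below. It uses C11 `<stdatomic.h>` rather than the kernel's set_bit()/test_bit()/clear_bit(), and the flag names and `toy_request` struct are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; each former bool becomes one bit. */
enum { TOY_R_ABORTED = 0, TOY_R_GOT_RESULT = 1, TOY_R_PARENT_LOCKED = 2 };

struct toy_request {
    atomic_ulong flags;     /* replaces several separate bool members */
};

/* Atomic OR: concurrent updates to *different* bits cannot lose each other. */
static void toy_set_flag(struct toy_request *req, int bit)
{
    atomic_fetch_or(&req->flags, 1UL << bit);
}

static void toy_clear_flag(struct toy_request *req, int bit)
{
    atomic_fetch_and(&req->flags, ~(1UL << bit));
}

static bool toy_test_flag(struct toy_request *req, int bit)
{
    return (atomic_load(&req->flags) & (1UL << bit)) != 0;
}
```

A plain `flags |= mask` compiles to a separate load, modify, and store. If two threads holding different locks did that on the same word, one update could overwrite the other, which is the race the commit message refers to.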

ceph: drop session argument to ceph_fill_trace · f5a03b08

Committed by Jeff Layton on Jan 31, 2017

Just get it from r_session since that's what's always passed in.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: remove "Debugging hook" from ceph_fill_trace · 6fffaef9

Committed by Jeff Layton on Jan 31, 2017

Keeping around commented out code is just asking for it to bitrot and
makes viewing the code under cscope more confusing.  If
we really need this, then we can revert this patch and put it under a
Kconfig option.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: avoid calling ceph_renew_caps() infinitely · c1944fed

Committed by Yan, Zheng on Jan 29, 2017

__ceph_caps_mds_wanted() ignores caps from a stale session, so its
return value can stay the same across ceph_renew_caps(). This causes
try_get_cap_refs() to keep calling ceph_renew_caps(). The fix is to
ignore the session validity check for the try_get_cap_refs() case. If
the session is stale, just let the caps requester sleep.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: make sure flushing inode in proper session's cap_flushing list · 00f06cba

Committed by Yan, Zheng on Jan 24, 2017

When a flushing inode's auth cap changes, we need to move it into the
new auth cap session's cap_flushing list.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: update readpages osd request according to size of pages · d641df81

Committed by Yan, Zheng on Jan 19, 2017

add_to_page_cache_lru() can fail, so the actual number of pages to read
can be smaller than the initial size of the osd request. We need to
update the osd request size in that case.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

ceph: fix bogus endianness change in ceph_ioctl_set_layout · 24c149ad

Committed by Jeff Layton on Jan 12, 2017

sparse says:

    fs/ceph/ioctl.c:100:28: warning: cast to restricted __le64

preferred_osd is a __s64 so we don't need to do any conversion. Also,
just remove the cast in ceph_ioctl_get_layout as it's not needed.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: include linux/sched.h into crypto.c directly · 7fea24c6

Committed by Ilya Dryomov on Jan 16, 2017

Currently crypto.c gets linux/sched.h indirectly through linux/slab.h
from linux/kasan.h. Include it directly for memalloc_noio_*() inlines.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: use BUG() instead of BUG_ON(1) · d24cdcd3

Committed by Arnd Bergmann on Jan 16, 2017

I ran into this compile warning, which is the result of BUG_ON(1)
not always leading to the compiler treating the code path as
unreachable:

    include/linux/ceph/osdmap.h: In function 'ceph_can_shift_osds':
    include/linux/ceph/osdmap.h:62:1: error: control reaches end of non-void function [-Werror=return-type]

Using BUG() here avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

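The shape of the fix can be illustrated in userspace, where `abort()` plays the role of an unconditional, never-returning BUG(). The macro, enum, and function names below are hypothetical stand-ins and not the kernel's definitions.

```c
#include <stdlib.h>

/* Userspace stand-in: abort() is declared as never returning, so the
 * compiler knows control cannot continue past this point. */
#define TOY_BUG() abort()

/* Hypothetical pool types for illustration. */
enum toy_pool_type { TOY_POOL_REPLICATED = 1, TOY_POOL_ERASURE = 3 };

/* Non-void function whose default branch never returns a value. Because
 * TOY_BUG() is unconditional and non-returning, falling off the end of the
 * function is provably impossible and no return-type warning is needed. */
static int toy_can_shift_osds(enum toy_pool_type type)
{
    switch (type) {
    case TOY_POOL_REPLICATED:
        return 1;
    case TOY_POOL_ERASURE:
        return 0;
    default:
        TOY_BUG();
    }
}
```

A conditional form such as a BUG_ON(1)-style macro depends on how the macro expands in a given configuration. If the check can be compiled down to something that returns, the compiler has to assume the function may reach its closing brace.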

ceph: avoid updating mds_wanted too frequently · eb65b919

Committed by Yan, Zheng on Jan 12, 2017

User space may open and close a single file frequently. It's not good
to send a clientcaps message to the MDS for each open/close syscall.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: set io_pages bdi hint · 7c94ba27

Committed by Andreas Gerstmayr on Jan 10, 2017

This patch sets the io_pages bdi hint based on the rsize mount option.
Without this patch large buffered reads (request size > max readahead)
are processed sequentially in chunks of the readahead size (i.e. read
requests are sent out up to the readahead size, then the
do_generic_file_read() function waits until the first page is received).

With this patch read requests are sent out at once up to the size
specified in the rsize mount option (default: 64 MB).
Signed-off-by: Andreas Gerstmayr <andreas.gerstmayr@catalysts.cc>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

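One plausible way to turn a byte-sized option such as rsize into a page-count hint is to round up to whole pages. The helper name is hypothetical and the page size is passed in explicitly; the exact expression used by the patch is not shown in this log.

```c
/* Convert a read size in bytes to a number of pages, rounding up so that a
 * partially filled last page still counts. page_size must be non-zero. */
static unsigned long toy_rsize_to_io_pages(unsigned long rsize,
                                           unsigned long page_size)
{
    return (rsize + page_size - 1) / page_size;
}
```

Assuming 4096-byte pages, the 64 MB default quoted in the message corresponds to 16384 pages.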

ceph: fix spelling mistake: "enabing" -> "enabling" · 0fbc5360

Committed by Colin Ian King on Dec 29, 2016

Trivial fix for a spelling mistake in a debug message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: cleanup ACCESS_ONCE -> READ_ONCE · 52953d55

Committed by Seraphime Kirkovski on Dec 26, 2016

This removes the uses of ACCESS_ONCE in favor of READ_ONCE.
Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>

ceph: pass parent inode info to ceph_encode_dentry_release if we have it · ca6c8ae0