Commits · 8e48cf00c48fdefb01f70db81f31438cd0c29dcc · openeuler / Kernel

07 Jul, 2017 5 commits

libceph: new pi->last_force_request_resend · 8e48cf00

Committed by Ilya Dryomov on Jun 05, 2017

The old (v15) pi->last_force_request_resend has been repurposed to
make pre-RESEND_ON_SPLIT clients that don't check for PG splits but do
obey pi->last_force_request_resend resend on splits.  See ceph.git
commit 189ca7ec6420 ("mon/OSDMonitor: make pre-luminous clients resend
ops on split").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

8e48cf00

libceph: fold [l]req->last_force_resend into ceph_osd_request_target · dc93e0e2

Committed by Ilya Dryomov on Jun 05, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

dc93e0e2

libceph: support SERVER_JEWEL feature bits · 220abf5a

Committed by Ilya Dryomov on Jun 05, 2017

Only MON_STATEFUL_SUB, really.  MON_ROUTE_OSDMAP and
OSDSUBOP_NO_SNAPCONTEXT are irrelevant.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

220abf5a

libceph: handle non-empty dest in ceph_{oloc,oid}_copy() · ca35ffea

Committed by Ilya Dryomov on Jun 05, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ca35ffea

libceph: remove ceph_sanitize_features() workaround · dcbbd97c

Committed by Ilya Dryomov on Jun 05, 2017

Reflects ceph.git commit ff1959282826ae6acd7134e1b1ede74ffd1cc04a.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

dcbbd97c

25 May, 2017 1 commit

libceph: cleanup old messages according to reconnect seq · 0a2ad541

Committed by Yan, Zheng on May 05, 2017

When reopening a connection, use 'reconnect seq' to clean up
messages that have already been received by the peer.

Link: http://tracker.ceph.com/issues/18690
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0a2ad541

24 May, 2017 6 commits

libceph: NULL deref on crush_decode() error path · 293dffaa

Committed by Dan Carpenter on May 23, 2017

If there is not enough space then ceph_decode_32_safe() does a goto bad.
We need to return an error code in that situation.  The current code
returns ERR_PTR(0) which is NULL.  The callers are not expecting that
and it results in a NULL dereference.

Fixes: f24e9980 ("ceph: OSD client")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

293dffaa
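The failure mode described in the crush_decode() fix above can be reproduced in a small userspace sketch. Everything here is hypothetical: `toy_err_ptr()`, `toy_ptr_err()` and `toy_is_err()` are stand-ins for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), `struct toy_map` is not the real CRUSH map, and the choice of -EINVAL is an assumption made for illustration rather than a statement about the actual patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel error-pointer helpers (hypothetical names). */
#define TOY_MAX_ERRNO 4095

static void *toy_err_ptr(long error) { return (void *)(intptr_t)error; }
static long toy_ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static int toy_is_err(const void *ptr)
{
	/* Error pointers occupy the top TOY_MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_map { uint32_t magic; };	/* hypothetical, not struct crush_map */

/*
 * Decode a 4-byte header. With fixed == 0 the short-buffer path jumps to
 * "bad" while err is still 0, so the caller receives toy_err_ptr(0), i.e.
 * NULL. With fixed != 0 an error code is set before the jump.
 */
static struct toy_map *toy_decode(const unsigned char *p, size_t len, int fixed)
{
	struct toy_map *m = NULL;
	int err = 0;

	if (len < sizeof(m->magic)) {
		if (fixed)
			err = -EINVAL;	/* assumed error code */
		goto bad;
	}
	m = calloc(1, sizeof(*m));
	if (!m) {
		err = -ENOMEM;
		goto bad;
	}
	memcpy(&m->magic, p, sizeof(m->magic));
	return m;
bad:
	free(m);
	return toy_err_ptr(err);
}
```

A caller that only checks `toy_is_err()` treats the NULL from the unfixed path as success and dereferences it, which is the NULL dereference the commit message describes.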

libceph: fix error handling in process_one_ticket() · b51456a6

Committed by Ilya Dryomov on May 19, 2017

Don't leak key internals after new_session_key is populated.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

b51456a6

libceph: validate blob_struct_v in process_one_ticket() · d18a1247

Committed by Ilya Dryomov on May 19, 2017

None of these are validated in userspace, but since we do validate
reply_struct_v in ceph_x_proc_ticket_reply(), tkt_struct_v (first) and
CephXServiceTicket struct_v (second) in process_one_ticket(), validate
CephXTicketBlob struct_v as well.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

d18a1247

libceph: drop version variable from ceph_monmap_decode() · f3b4e55d

Committed by Ilya Dryomov on May 19, 2017

It's set but not used: CEPH_FEATURE_MONNAMES feature bit isn't
advertised, which guarantees a v1 MonMap.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

f3b4e55d

libceph: make ceph_msg_data_advance() return void · 1759f7b0

Committed by Ilya Dryomov on May 19, 2017

Both callers ignore the returned bool.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

1759f7b0

libceph: use kbasename() and kill ceph_file_part() · 6f4dbd14

Committed by Ilya Dryomov on May 19, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
6f4dbd14

09 May, 2017 2 commits

fs: ceph: CURRENT_TIME with ktime_get_real_ts() · 1134e091

Committed by Deepa Dinamani on May 08, 2017

CURRENT_TIME is not y2038 safe.  The macro will be deleted and all the
references to it will be replaced by ktime_get_* apis.

struct timespec is also not y2038 safe.  Retain timespec for timestamp
representation here as ceph uses it internally everywhere.  These
references will be changed to use struct timespec64 in a separate patch.

The current_fs_time() api is being changed to use vfs struct inode* as
an argument instead of struct super_block*.

Set the new mds client request r_stamp field using ktime_get_real_ts()
instead of using current_fs_time().

Also, since r_stamp is used as mtime on the server, use timespec_trunc()
to truncate the timestamp, using the right granularity from the
superblock.

This api will be transitioned to be y2038 safe along with vfs.

Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
M:	Ilya Dryomov <idryomov@gmail.com>
M:	"Yan, Zheng" <zyan@redhat.com>
M:	Sage Weil <sage@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1134e091

mm, vmalloc: use __GFP_HIGHMEM implicitly · 19809c2d

Committed by Michal Hocko on May 08, 2017

__vmalloc* allows users to provide gfp flags for the underlying
allocation.  This API is quite popular:

  $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
  77

The only problem is that many people are not aware that they really want
to give __GFP_HIGHMEM along with other flags because there is really no
reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages
which are mapped to the kernel vmalloc space.  About half of users don't
use this flag, though.  This signals that we make the API unnecessarily
too complex.

This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
be mapped to the vmalloc space.  Current users which add __GFP_HIGHMEM
are simplified and drop the flag.

Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Cristopher Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19809c2d

04 May, 2017 10 commits

rbd: support updating the lock cookie without releasing the lock · 14bb211d

Committed by Ilya Dryomov on Apr 13, 2017

As we no longer release the lock before potentially raising BLACKLISTED
in rbd_reregister_watch(), the "either locked or blacklisted" assert in
rbd_queue_workfn() needs to go: we can be both locked and blacklisted
at that point now.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>

14bb211d

libceph: add an epoch_barrier field to struct ceph_osd_client · 58eb7932

Committed by Jeff Layton on Apr 18, 2017

Cephfs can get cap update requests that contain a new epoch barrier in
them. When that happens we want to pause all OSD traffic until the right
map epoch arrives.

Add an epoch_barrier field to ceph_osd_client that is protected by the
osdc->lock rwsem. When the barrier is set, and the current OSD map
epoch is below that, pause the request target when submitting the
request or when revisiting it. Add a way for upper layers (cephfs)
to update the epoch_barrier as well.

If we get a new map, compare the new epoch against the barrier before
kicking requests and request another map if the map epoch is still lower
than the one we want.

If we get a map with a full pool, or at quota condition, then set the
barrier to the current epoch value.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

58eb7932
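The barrier logic in the epoch_barrier commit above can be sketched in a few lines of plain C. This is a simplified model under stated assumptions: `struct toy_osdc` and the `toy_*` functions are hypothetical names, the osdc->lock rwsem is omitted, and the rule that the barrier only ever moves forward is an assumption of the sketch, not something the commit message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, lock-free model of the two epochs involved. */
struct toy_osdc {
	uint32_t map_epoch;	/* epoch of the OSD map currently held */
	uint32_t epoch_barrier;	/* 0 means no barrier has been set */
};

/* Requests are paused while the current map is older than the barrier. */
static bool toy_target_should_pause(const struct toy_osdc *osdc)
{
	return osdc->map_epoch < osdc->epoch_barrier;
}

/*
 * Called by an upper layer (cephfs in the commit message) to raise the
 * barrier. Returns true if a newer map has to be requested.
 * Assumption: the barrier never moves backwards.
 */
static bool toy_update_epoch_barrier(struct toy_osdc *osdc, uint32_t eb)
{
	if (eb > osdc->epoch_barrier)
		osdc->epoch_barrier = eb;
	return toy_target_should_pause(osdc);
}

/*
 * Called when a new map arrives. Returns true if requests may be kicked,
 * false if the map is still too old and another one must be requested.
 */
static bool toy_handle_new_map(struct toy_osdc *osdc, uint32_t new_epoch)
{
	osdc->map_epoch = new_epoch;
	return !toy_target_should_pause(osdc);
}
```

With a map at epoch 10 and a barrier raised to 12, requests stay paused when epoch 11 arrives and are only kicked once epoch 12 is seen.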

libceph: abort already submitted but abortable requests when map or pool goes full · fc36d0a4

Committed by Jeff Layton on Apr 04, 2017

When a Ceph volume hits capacity, a flag is set in the OSD map to
indicate that, and a new map is sprayed around the cluster. With cephfs
we want it to shut down any abortable requests that are in progress with
an -ENOSPC error as they'd just hang otherwise.

Add a new ceph_osdc_abort_on_full helper function to handle this. It
will first check whether there is an out-of-space condition in the
cluster and then walk the tree and abort any request that has
r_abort_on_full set with a -ENOSPC error. Call this new function
directly whenever we get a new OSD map.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

fc36d0a4
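A minimal sketch of the abort-on-full walk described above, assuming a flat array in place of the request tree. `struct toy_request` and `toy_abort_on_full()` are hypothetical; only the `abort_on_full` flag mirrors the r_abort_on_full field named in the commit message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-flight request record. */
struct toy_request {
	bool abort_on_full;	/* caller asked for -ENOSPC instead of a hang */
	bool done;		/* request already completed or aborted */
	int result;		/* completion code, 0 until completed */
};

/*
 * If the cluster reports an out-of-space condition, complete every pending
 * request that opted in with -ENOSPC. Requests that did not opt in are left
 * alone and keep waiting. Returns the number of requests aborted.
 */
static size_t toy_abort_on_full(struct toy_request *reqs, size_t n,
				bool cluster_full)
{
	size_t aborted = 0;

	if (!cluster_full)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (reqs[i].done || !reqs[i].abort_on_full)
			continue;
		reqs[i].result = -ENOSPC;
		reqs[i].done = true;
		aborted++;
	}
	return aborted;
}
```

The early return keeps the common case cheap: when the new map shows no full condition, no request is touched.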

libceph: allow requests to return immediately on full conditions if caller wishes · a1f4020a

Committed by Jeff Layton on Apr 04, 2017

Usually, when the osd map is flagged as full or the pool is at quota,
write requests just hang. This is not what we want for cephfs, where
it would be better to simply report -ENOSPC back to userland instead
of stalling.

If the caller knows that it will want an immediate error return instead
of blocking on a full or at-quota error condition then allow it to set a
flag to request that behavior.

Set that flag in ceph_osdc_new_request (since ceph.ko is the only caller),
and on any other write request from ceph.ko.

A later patch will deal with requests that were submitted before the new
map showing the full condition came in.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a1f4020a

libceph: remove req->r_replay_version · aa26d662

Committed by Jeff Layton on Apr 04, 2017

Nothing uses this anymore with the removal of the ack vs. commit code.
Remove the field and just encode zeroes into place in the request
encoding.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

aa26d662

libceph: convert ceph_pagelist.refcnt from atomic_t to refcount_t · 0e1a5ee6

Committed by Elena Reshetova on Mar 17, 2017

The refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0e1a5ee6

libceph: convert ceph_osd.o_ref from atomic_t to refcount_t · 02113a0f

Committed by Elena Reshetova on Mar 17, 2017

The refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

02113a0f

libceph: convert ceph_snap_context.nref from atomic_t to refcount_t · 06dfa963

Committed by Elena Reshetova on Mar 17, 2017

The refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

06dfa963
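The overflow concern shared by the three refcount_t conversions above can be illustrated with a saturating counter. This is a deliberately simplified, non-atomic sketch: `struct toy_refcount` and its helpers are hypothetical and are not the kernel's refcount_t implementation. The idea it shows is that a counter which sticks at its maximum leaks the object instead of wrapping to zero and freeing it while references remain.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, non-atomic reference counter that saturates at UINT_MAX. */
struct toy_refcount {
	unsigned int refs;
};

/* Take a reference. Once saturated, the counter stays saturated. */
static void toy_refcount_inc(struct toy_refcount *r)
{
	if (r->refs != UINT_MAX)
		r->refs++;
}

/*
 * Drop a reference. Returns true only when the last reference is gone and
 * the object may be freed. A saturated counter never reaches zero (the
 * object is leaked on purpose), and a counter already at zero is not
 * decremented further.
 */
static bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
	if (r->refs == UINT_MAX || r->refs == 0)
		return false;
	return --r->refs == 0;
}
```

A plain unsigned counter incremented past UINT_MAX would wrap to 0, so the next put could free an object that still has users; the saturating variant trades that use-after-free for a memory leak.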

libceph: supported_features module parameter · d6a3408a

Committed by Ilya Dryomov on Mar 03, 2017

Add a readonly, exported to sysfs module parameter so that userspace
can generate meaningful error messages.  It's a bit funky, but there is
no other libceph-specific place.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

d6a3408a

libceph, ceph: always advertise all supported features · 74da4a0f

Committed by Ilya Dryomov on Mar 03, 2017

No reason to hide CephFS-specific features in the rbd case.  Recent
feature bits mix RADOS and CephFS-specific stuff together anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

74da4a0f

23 Mar, 2017 1 commit

libceph: force GFP_NOIO for socket allocations · 633ee407

Committed by Ilya Dryomov on Mar 21, 2017

sock_alloc_inode() allocates socket+inode and socket_wq with
GFP_KERNEL, which is not allowed on the writeback path:

    Workqueue: ceph-msgr con_work [libceph]
    ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
    0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
    ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
    Call Trace:
    [<ffffffff816dd629>] schedule+0x29/0x70
    [<ffffffff816e066d>] schedule_timeout+0x1bd/0x200
    [<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120
    [<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70
    [<ffffffff816deb5f>] wait_for_completion+0xbf/0x180
    [<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390
    [<ffffffff81086335>] flush_work+0x165/0x250
    [<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0
    [<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs]
    [<ffffffff816d6b42>] ? __slab_free+0xee/0x234
    [<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
    [<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30
    [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs]
    [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
    [<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40
    [<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
    [<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
    [<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
    [<ffffffff811c0c18>] super_cache_scan+0x178/0x180
    [<ffffffff8115912e>] shrink_slab_node+0x14e/0x340
    [<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450
    [<ffffffff8115af70>] shrink_slab+0x100/0x140
    [<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490
    [<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0
    [<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be
    [<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40
    [<ffffffff8119743e>] alloc_pages_current+0x9e/0x110
    [<ffffffff811a0ac5>] new_slab+0x2c5/0x390
    [<ffffffff816d71c4>] __slab_alloc+0x33b/0x459
    [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
    [<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0
    [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
    [<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0
    [<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0
    [<ffffffff811d8566>] alloc_inode+0x26/0xa0
    [<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70
    [<ffffffff815b933e>] sock_alloc+0x1e/0x80
    [<ffffffff815ba855>] __sock_create+0x95/0x220
    [<ffffffff815baa04>] sock_create_kern+0x24/0x30
    [<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph]
    [<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd]
    [<ffffffff81084c19>] process_one_work+0x159/0x4f0
    [<ffffffff8108561b>] worker_thread+0x11b/0x530
    [<ffffffff81085500>] ? create_worker+0x1d0/0x1d0
    [<ffffffff8108b6f9>] kthread+0xc9/0xe0
    [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
    [<ffffffff816e1b98>] ret_from_fork+0x58/0x90
    [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90

Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.

Cc: stable@vger.kernel.org # 3.10+, needs backporting
Link: http://tracker.ceph.com/issues/19309
Reported-by: Sergey Jerusalimov <wintchester@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

633ee407
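The save/restore pattern that the GFP_NOIO commit above relies on can be modelled in userspace. All names below are hypothetical stand-ins: `toy_noio_save()` and `toy_noio_restore()` imitate the shape of memalloc_noio_save() and memalloc_noio_restore(), a global variable replaces the per-task flags, and `toy_sock_create()` stands in for a callee that always asks for the equivalent of GFP_KERNEL.

```c
/* Hypothetical allocation-mask bits and task flag. */
#define TOY_GFP_IO	0x1u
#define TOY_GFP_FS	0x2u
#define TOY_GFP_KERNEL	(TOY_GFP_IO | TOY_GFP_FS)
#define TOY_PF_NOIO	0x1u

static unsigned int toy_task_flags;	/* stands in for per-task flags */

/* Set the NOIO flag and return its previous state so scopes can nest. */
static unsigned int toy_noio_save(void)
{
	unsigned int old = toy_task_flags & TOY_PF_NOIO;

	toy_task_flags |= TOY_PF_NOIO;
	return old;
}

/* Put the NOIO flag back to the state returned by toy_noio_save(). */
static void toy_noio_restore(unsigned int old)
{
	toy_task_flags = (toy_task_flags & ~TOY_PF_NOIO) | old;
}

/* The allocator strips IO/FS permission while the task flag is set. */
static unsigned int toy_effective_gfp(unsigned int gfp)
{
	if (toy_task_flags & TOY_PF_NOIO)
		gfp &= ~(TOY_GFP_IO | TOY_GFP_FS);
	return gfp;
}

/* A callee we cannot change: it always requests TOY_GFP_KERNEL. */
static unsigned int toy_sock_create(void)
{
	return toy_effective_gfp(TOY_GFP_KERNEL);
}

/* Wrap the callee in a NOIO scope, as the commit does for socket creation. */
static unsigned int toy_connect_noio(void)
{
	unsigned int noio_flag = toy_noio_save();
	unsigned int used = toy_sock_create();

	toy_noio_restore(noio_flag);
	return used;
}
```

Returning the previous flag state from the save call is what makes nesting safe: an inner scope restores the flag to "set" rather than clearing it out from under an outer scope.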

07 Mar, 2017 3 commits

libceph: osd_request_timeout option · 7cc5e38f

Committed by Ilya Dryomov on Feb 12, 2017

osd_request_timeout specifies how many seconds to wait for a response
from OSDs before returning -ETIMEDOUT from an OSD request.  0 (default)
means no limit.

osd_request_timeout is osdkeepalive-precise -- in-flight requests are
swept through every osdkeepalive seconds.  With ack vs commit behaviour
gone, abort_request() is really simple.

This is based on a patch from Artur Molchanov <artur.molchanov@synesis.ru>.

Tested-by: Artur Molchanov <artur.molchanov@synesis.ru>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

7cc5e38f
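The periodic sweep that the osd_request_timeout commit above describes might look roughly like the following. This is a sketch under assumptions: `struct toy_osd_req` and `toy_sweep_timeouts()` are hypothetical, time is counted in whole seconds rather than jiffies, and a flat array replaces the per-OSD request trees.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-flight request with its submission time in seconds. */
struct toy_osd_req {
	unsigned long start;
	bool done;
	int result;
};

/*
 * Called once per keepalive tick. A timeout of 0 means "no limit", matching
 * the default in the commit message. Returns the number of requests that
 * were completed with -ETIMEDOUT.
 */
static size_t toy_sweep_timeouts(struct toy_osd_req *reqs, size_t n,
				 unsigned long now, unsigned long timeout)
{
	size_t expired = 0;

	if (timeout == 0)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (reqs[i].done || now - reqs[i].start < timeout)
			continue;
		reqs[i].result = -ETIMEDOUT;
		reqs[i].done = true;
		expired++;
	}
	return expired;
}
```

Because expiry is only checked on each sweep, a request can wait up to one keepalive interval longer than the configured timeout, which is what "osdkeepalive-precise" refers to.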

libceph: don't set weight to IN when OSD is destroyed · b581a585

Committed by Ilya Dryomov on Mar 01, 2017

Since ceph.git commit 4e28f9e63644 ("osd/OSDMap: clear osd_info,
osd_xinfo on osd deletion"), weight is set to IN when OSD is deleted.
This changes the result of applying an incremental for clients, not
just OSDs. Because CRUSH computations are obviously affected,
pre-4e28f9e63644 servers disagree with post-4e28f9e63644 clients on
object placement, resulting in misdirected requests.

Mirrors ceph.git commit a6009d1039a55e2c77f431662b3d6cc5a8e8e63f.

Fixes: 930c5328 ("libceph: apply new_state before new_up_client on incrementals")
Link: http://tracker.ceph.com/issues/19122
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

b581a585

libceph: fix crush_decode() for older maps · 9afd30db

Committed by Ilya Dryomov on Feb 28, 2017

Older (shorter) CRUSH maps too need to be finalized.

Fixes: 66a0e2d5 ("crush: remove mutable part of CRUSH map")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

9afd30db

02 Mar, 2017 1 commit

sched/headers: Prepare to move the memalloc_noio_*() APIs to <linux/sched/mm.h> · 5b3cc15a

Committed by Ingo Molnar on Feb 02, 2017

Update the .c files that depend on these APIs.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

5b3cc15a

25 Feb, 2017 2 commits

libceph, rbd, ceph: WRITE | ONDISK -> WRITE · 54ea0046

Committed by Ilya Dryomov on Feb 11, 2017

CEPH_OSD_FLAG_ONDISK is set in account_request().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

54ea0046

libceph: get rid of ack vs commit · b18b9550

Committed by Ilya Dryomov on Feb 11, 2017

- CEPH_OSD_FLAG_ACK shouldn't be set anymore, so assert on it
- remove support for handling ack replies (OSDs will send ack replies
  only if clients request them)
- drop the "do lingering callbacks under osd->lock" logic from
  handle_reply() -- lreq->lock is sufficient in all three cases

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

b18b9550

24 Feb, 2017 2 commits

crush: fix dprintk compilation · 7ba0487c

Committed by Ilya Dryomov on Feb 16, 2017

The syntax error was not noticed because dprintk is a macro
and the code is discarded by default.

Reflects ceph.git commit f29b840c64a933b2cb13e3da6f3d785effd73a57.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

7ba0487c

crush: do is_out test only if we do not collide · 98ba6af7

Committed by Ilya Dryomov on Feb 16, 2017

The is_out() test may require an additional hashing operation, so we
should skip it whenever possible.

Reflects ceph.git commit db107cc7f15cf2481894add325dc93e33479f529.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

98ba6af7
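The ordering change in the is_out commit above amounts to short-circuit evaluation: check the cheap collision test first and only pay for the potentially expensive test when the candidate has not already been rejected. The sketch below is hypothetical throughout; `toy_is_out()` merely counts its calls and treats negative items as out, and it does not reproduce the real CRUSH hashing.

```c
#include <stdbool.h>
#include <stddef.h>

static unsigned int toy_is_out_calls;	/* counts the "expensive" checks */

/* Stand-in for the costly test; negative items count as out (assumption). */
static bool toy_is_out(int item)
{
	toy_is_out_calls++;
	return item < 0;
}

/* Cheap test: has this item already been chosen? */
static bool toy_collides(const int *out, size_t outpos, int item)
{
	for (size_t i = 0; i < outpos; i++)
		if (out[i] == item)
			return true;
	return false;
}

/*
 * Reject a candidate if it collides or is out. Because || short-circuits,
 * toy_is_out() is never called for a colliding item.
 */
static bool toy_reject(const int *out, size_t outpos, int item)
{
	return toy_collides(out, outpos, item) || toy_is_out(item);
}
```

The result is unchanged compared with evaluating both tests unconditionally; only the amount of work done for colliding candidates differs.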

20 Feb, 2017 7 commits

libceph: pass reply buffer length through ceph_osdc_call() · 2544a020

Committed by Ilya Dryomov on Jan 25, 2017

To spare checking for "this reply fits into a page, but does it fit
into my buffer?" in some callers, osd_req_op_cls_response_data_pages()
needs to know how big it is.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>

2544a020

libceph: don't go through with the mapping if the PG is too wide · ef9324bb

Committed by Ilya Dryomov on Feb 08, 2017

With EC overwrites maturing, the kernel client will be getting exposed
to potentially very wide EC pools. While "min(pi->size, X)" works fine
when the cluster is stable and happy, truncating OSD sets interferes
with resend logic (ceph_is_new_interval(), etc). Abort the mapping if
the pool is too wide, assigning the request to the homeless session.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: NSage Weil <sage@redhat.com>

ef9324bb

crush: merge working data and scratch · 743efcff

Committed by Ilya Dryomov on Jan 31, 2017

Much like Arlo Guthrie, I decided that one big pile is better than two
little piles.

Reflects ceph.git commit 95c2df6c7e0b22d2ea9d91db500cf8b9441c73ba.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

743efcff

crush: remove mutable part of CRUSH map · 66a0e2d5

Committed by Ilya Dryomov on Jan 31, 2017

Then add it to the working state. It would be very nice if we didn't
have to take a lock to calculate a crush placement. By moving the
permutation array into the working data, we can treat the CRUSH map as
immutable.

Reflects ceph.git commit cbcd039651c0569551cb90d26ce27e1432671f2a.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

66a0e2d5

libceph: add osdmap_set_crush() helper · 1b6a78b5

Committed by Ilya Dryomov on Jan 31, 2017

Simplify osdmap_decode() and osdmap_apply_incremental() a bit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

1b6a78b5

libceph: remove unneeded stddef.h include · 19def166

Committed by Stafford Horne on Feb 05, 2017

This was causing a build failure for openrisc when using musl and
gcc 5.4.0 since the file is not available in the toolchain.

It doesn't seem to be needed, and removing it does not cause any build
warnings for me.

Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

19def166

ceph: update readpages osd request according to size of pages · d641df81

Committed by Yan, Zheng on Jan 19, 2017

add_to_page_cache_lru() can fail, so the number of pages actually read
can be smaller than the initial size of the OSD request. We need to
update the OSD request size in that case.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

d641df81

openeuler / Kernel synced successfully more than 1 year ago
