提交 · e37fea58f771c2674709099a09ddafd058fef634 · openeuler / Kernel

23 3月, 2017 1 次提交

libceph: force GFP_NOIO for socket allocations · 633ee407

由 Ilya Dryomov 提交于 3月 21, 2017

sock_alloc_inode() allocates socket+inode and socket_wq with
GFP_KERNEL, which is not allowed on the writeback path:

    Workqueue: ceph-msgr con_work [libceph]
    ffff8810871cb018 0000000000000046 0000000000000000 ffff881085d40000
    0000000000012b00 ffff881025cad428 ffff8810871cbfd8 0000000000012b00
    ffff880102fc1000 ffff881085d40000 ffff8810871cb038 ffff8810871cb148
    Call Trace:
    [<ffffffff816dd629>] schedule+0x29/0x70
    [<ffffffff816e066d>] schedule_timeout+0x1bd/0x200
    [<ffffffff81093ffc>] ? ttwu_do_wakeup+0x2c/0x120
    [<ffffffff81094266>] ? ttwu_do_activate.constprop.135+0x66/0x70
    [<ffffffff816deb5f>] wait_for_completion+0xbf/0x180
    [<ffffffff81097cd0>] ? try_to_wake_up+0x390/0x390
    [<ffffffff81086335>] flush_work+0x165/0x250
    [<ffffffff81082940>] ? worker_detach_from_pool+0xd0/0xd0
    [<ffffffffa03b65b1>] xlog_cil_force_lsn+0x81/0x200 [xfs]
    [<ffffffff816d6b42>] ? __slab_free+0xee/0x234
    [<ffffffffa03b4b1d>] _xfs_log_force_lsn+0x4d/0x2c0 [xfs]
    [<ffffffff811adc1e>] ? lookup_page_cgroup_used+0xe/0x30
    [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa03b4dcf>] xfs_log_force_lsn+0x3f/0xf0 [xfs]
    [<ffffffffa039a723>] ? xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa03a62c6>] xfs_iunpin_wait+0xc6/0x1a0 [xfs]
    [<ffffffff810aa250>] ? wake_atomic_t_function+0x40/0x40
    [<ffffffffa039a723>] xfs_reclaim_inode+0xa3/0x330 [xfs]
    [<ffffffffa039ac07>] xfs_reclaim_inodes_ag+0x257/0x3d0 [xfs]
    [<ffffffffa039bb13>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
    [<ffffffffa03ab745>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
    [<ffffffff811c0c18>] super_cache_scan+0x178/0x180
    [<ffffffff8115912e>] shrink_slab_node+0x14e/0x340
    [<ffffffff811afc3b>] ? mem_cgroup_iter+0x16b/0x450
    [<ffffffff8115af70>] shrink_slab+0x100/0x140
    [<ffffffff8115e425>] do_try_to_free_pages+0x335/0x490
    [<ffffffff8115e7f9>] try_to_free_pages+0xb9/0x1f0
    [<ffffffff816d56e4>] ? __alloc_pages_direct_compact+0x69/0x1be
    [<ffffffff81150cba>] __alloc_pages_nodemask+0x69a/0xb40
    [<ffffffff8119743e>] alloc_pages_current+0x9e/0x110
    [<ffffffff811a0ac5>] new_slab+0x2c5/0x390
    [<ffffffff816d71c4>] __slab_alloc+0x33b/0x459
    [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
    [<ffffffff8164bda1>] ? inet_sendmsg+0x71/0xc0
    [<ffffffff815b906d>] ? sock_alloc_inode+0x2d/0xd0
    [<ffffffff811a21f2>] kmem_cache_alloc+0x1a2/0x1b0
    [<ffffffff815b906d>] sock_alloc_inode+0x2d/0xd0
    [<ffffffff811d8566>] alloc_inode+0x26/0xa0
    [<ffffffff811da04a>] new_inode_pseudo+0x1a/0x70
    [<ffffffff815b933e>] sock_alloc+0x1e/0x80
    [<ffffffff815ba855>] __sock_create+0x95/0x220
    [<ffffffff815baa04>] sock_create_kern+0x24/0x30
    [<ffffffffa04794d9>] con_work+0xef9/0x2050 [libceph]
    [<ffffffffa04aa9ec>] ? rbd_img_request_submit+0x4c/0x60 [rbd]
    [<ffffffff81084c19>] process_one_work+0x159/0x4f0
    [<ffffffff8108561b>] worker_thread+0x11b/0x530
    [<ffffffff81085500>] ? create_worker+0x1d0/0x1d0
    [<ffffffff8108b6f9>] kthread+0xc9/0xe0
    [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90
    [<ffffffff816e1b98>] ret_from_fork+0x58/0x90
    [<ffffffff8108b630>] ? flush_kthread_worker+0x90/0x90

Use memalloc_noio_{save,restore}() to temporarily force GFP_NOIO here.

Cc: stable@vger.kernel.org # 3.10+, needs backporting
Link: http://tracker.ceph.com/issues/19309Reported-by: NSergey Jerusalimov <wintchester@gmail.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>

633ee407

14 1月, 2017 1 次提交

locking/atomic, kref: Add kref_read() · 2c935bc5

由 Peter Zijlstra 提交于 11月 14, 2016

Since we need to change the implementation, stop exposing internals.

Provide kref_read() to read the current reference count; typically
used for debug messages.

Kills two anti-patterns:

	atomic_read(&kref->refcount)
	kref->refcount.counter
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

2c935bc5

27 12月, 2016 2 次提交
- A
  ceph_tcp_sendpage(): use ITER_BVEC sendmsg · 61ff6e9b
  由 Al Viro 提交于 1月 09, 2016
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  61ff6e9b
- A
  ceph: switch to sock_recvmsg() · 100803a8
  由 Al Viro 提交于 11月 15, 2015
```
... and use ITER_BVEC instead of playing with kmap()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  100803a8
13 12月, 2016 3 次提交

libceph: no need to drop con->mutex for ->get_authorizer() · b3bbd3f2

由 Ilya Dryomov 提交于 12月 02, 2016

->get_authorizer(), ->verify_authorizer_reply(), ->sign_message() and
->check_message_signature() shouldn't be doing anything with or on the
connection (like closing it or sending messages).
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

b3bbd3f2

libceph: drop len argument of *verify_authorizer_reply() · 0dde5848

由 Ilya Dryomov 提交于 12月 02, 2016

The length of the reply is protocol-dependent - for cephx it's
ceph_x_authorize_reply.  Nothing sensible can be passed from the
messenger layer anyway.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

0dde5848

libceph: verify authorize reply on connect · 5c056fdc

由 Ilya Dryomov 提交于 12月 02, 2016

After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
the client gets back a ceph_x_authorize_reply, which it is supposed to
verify to ensure the authenticity and protect against replay attacks.
The code for doing this is there (ceph_x_verify_authorizer_reply(),
ceph_auth_verify_authorizer_reply() + plumbing), but it is never
invoked by the the messenger.

AFAICT this goes back to 2009, when ceph authentication protocols
support was added to the kernel client in 4e7a5dcd ("ceph:
negotiate authentication protocol; implement AUTH_NONE protocol").

The second param of ceph_connection_operations::verify_authorizer_reply
is unused all the way down.  Pass 0 to facilitate backporting, and kill
it in the next commit.

Cc: stable@vger.kernel.org
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

5c056fdc

05 4月, 2016 1 次提交

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

由 Kirill A. Shutemov 提交于 4月 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09cbfeaf

26 3月, 2016 2 次提交

libceph: use KMEM_CACHE macro · 5ee61e95

由 Geliang Tang 提交于 3月 13, 2016

Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.
Signed-off-by: NGeliang Tang <geliangtang@163.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

5ee61e95

libceph: use sizeof_footer() more · 89f08173

由 Ilya Dryomov 提交于 2月 20, 2016

Don't open-code sizeof_footer() in read_partial_message() and
ceph_msg_revoke().  Also, after switching to sizeof_footer(), it's now
possible to use con_out_kvec_add() in prepare_write_message_footer().
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

89f08173

25 2月, 2016 2 次提交

libceph: use the right footer size when skipping a message · dbc0d3ca

由 Ilya Dryomov 提交于 2月 19, 2016

ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13.
Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated.

Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

dbc0d3ca

libceph: don't bail early from try_read() when skipping a message · e7a88e82

由 Ilya Dryomov 提交于 2月 17, 2016

The contract between try_read() and try_write() is that when called
each processes as much data as possible.  When instructed by osd_client
to skip a message, try_read() is violating this contract by returning
after receiving and discarding a single message instead of checking for
more.  try_write() then gets a chance to write out more requests,
generating more replies/skips for try_read() to handle, forcing the
messenger into a starvation loop.

Cc: stable@vger.kernel.org # 3.10+
Reported-by: NVarada Kari <Varada.Kari@sandisk.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Tested-by: NVarada Kari <Varada.Kari@sandisk.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

e7a88e82

22 1月, 2016 4 次提交

libceph: clear messenger auth_retry flag if we fault · f6330cc1

由 Ilya Dryomov 提交于 1月 13, 2016

Commit 20e55c4c ("libceph: clear messenger auth_retry flag when we
authenticate") got us only half way there.  We clear the flag if the
second attempt succeeds, but it also needs to be cleared if that
attempt fails, to allow for the exponential backoff to kick in.
Otherwise, if ->should_authenticate() thinks our keys are valid, we
will busy loop, incrementing auth_retry to no avail:

    process_connect ffff880079a63830 got BADAUTHORIZER attempt 1
    process_connect ffff880079a63830 got BADAUTHORIZER attempt 2
    process_connect ffff880079a63830 got BADAUTHORIZER attempt 3
    process_connect ffff880079a63830 got BADAUTHORIZER attempt 4
    process_connect ffff880079a63830 got BADAUTHORIZER attempt 5
    ...
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

f6330cc1

libceph: fix ceph_msg_revoke() · 67645d76

由 Ilya Dryomov 提交于 12月 28, 2015

There are a number of problems with revoking a "was sending" message:

(1) We never make any attempt to revoke data - only kvecs contibute to
con->out_skip.  However, once the header (envelope) is written to the
socket, our peer learns data_len and sets itself to expect at least
data_len bytes to follow front or front+middle.  If ceph_msg_revoke()
is called while the messenger is sending message's data portion,
anything we send after that call is counted by the OSD towards the now
revoked message's data portion.  The effects vary, the most common one
is the eventual hang - higher layers get stuck waiting for the reply to
the message that was sent out after ceph_msg_revoke() returned and
treated by the OSD as a bunch of data bytes.  This is what Matt ran
into.

(2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs
is wrong.  If ceph_msg_revoke() is called before the tag is sent out or
while the messenger is sending the header, we will get a connection
reset, either due to a bad tag (0 is not a valid tag) or a bad header
CRC, which kind of defeats the purpose of revoke.  Currently the kernel
client refuses to work with header CRCs disabled, but that will likely
change in the future, making this even worse.

(3) con->out_skip is not reset on connection reset, leading to one or
more spurious connection resets if we happen to get a real one between
con->out_skip is set in ceph_msg_revoke() and before it's cleared in
write_partial_skip().

Fixing (1) and (3) is trivial.  The idea behind fixing (2) is to never
zero the tag or the header, i.e. send out tag+header regardless of when
ceph_msg_revoke() is called.  That way the header is always correct, no
unnecessary resets are induced and revoke stands ready for disabled
CRCs.  Since ceph_msg_revoke() rips out con->out_msg, introduce a new
"message out temp" and copy the header into it before sending.

Cc: stable@vger.kernel.org # 4.0+
Reported-by: NMatt Conner <matt.conner@keepertech.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Tested-by: NMatt Conner <matt.conner@keepertech.com>
Reviewed-by: NSage Weil <sage@redhat.com>

67645d76

libceph: use list_for_each_entry_safe · 10bcee14

由 Geliang Tang 提交于 12月 18, 2015

Use list_for_each_entry_safe() instead of list_for_each_safe() to
simplify the code.
Signed-off-by: NGeliang Tang <geliangtang@163.com>
[idryomov@gmail.com: nuke call to list_splice_init() as well]
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

10bcee14

libceph: use list_next_entry instead of list_entry_next · 17ddc49b

由 Geliang Tang 提交于 11月 16, 2015

list_next_entry has been defined in list.h, so I replace list_entry_next
with it.
Signed-off-by: NGeliang Tang <geliangtang@163.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

17ddc49b

03 11月, 2015 5 次提交

libceph: clear msg->con in ceph_msg_release() only · 583d0fef

由 Ilya Dryomov 提交于 11月 02, 2015

The following bit in ceph_msg_revoke_incoming() is unsafe:

    struct ceph_connection *con = msg->con;
    if (!con)
            return;
    mutex_lock(&con->mutex);
    <more msg->con use>

There is nothing preventing con from getting destroyed right after
msg->con test.  One easy way to reproduce this is to disable message
signing only on the server side and try to map an image.  The system
will go into a

    libceph: read_partial_message ffff880073f0ab68 signature check failed
    libceph: osd0 192.168.255.155:6801 bad crc/signature
    libceph: read_partial_message ffff880073f0ab68 signature check failed
    libceph: osd0 192.168.255.155:6801 bad crc/signature

loop which has to be interrupted with Ctrl-C.  Hit Ctrl-C and you are
likely to end up with a random GP fault if the reset handler executes
"within" ceph_msg_revoke_incoming():

                     <yet another reply w/o a signature>
                                   ...
          <Ctrl-C>
    rbd_obj_request_end
      ceph_osdc_cancel_request
        __unregister_request
          ceph_osdc_put_request
            ceph_msg_revoke_incoming
                                   ...
                                osd_reset
                                  __kick_osd_requests
                                    __reset_osd
                                      remove_osd
                                        ceph_con_close
                                          reset_connection
                                            <clear con->in_msg->con>
                                            <put con ref>
                                              put_osd
                                                <free osd/con>
              <msg->con use> <-- !!!

If ceph_msg_revoke_incoming() executes "before" the reset handler,
osd/con will be leaked because ceph_msg_revoke_incoming() clears
con->in_msg but doesn't put con ref, while reset_connection() only puts
con ref if con->in_msg != NULL.

The current msg->con scheme was introduced by commits 38941f80
("libceph: have messages point to their connection") and 92ce034b
("libceph: have messages take a connection reference"), which defined
when messages get associated with a connection and when that
association goes away.  Part of the problem is that this association is
supposed to go away in much too many places; closing this race entirely
requires either a rework of the existing or an addition of a new layer
of synchronization.

In lieu of that, we can make it *much* less likely to hit by
disassociating messages only on their destruction and resend through
a different connection.  This makes the code simpler and is probably
a good thing to do regardless - this patch adds a msg_con_set() helper
which is is called from only three places: ceph_con_send() and
ceph_con_in_msg_alloc() to set msg->con and ceph_msg_release() to clear
it.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

583d0fef

libceph: add nocephx_sign_messages option · a51983e4

由 Ilya Dryomov 提交于 10月 28, 2015

Support for message signing was merged into 3.19, along with
nocephx_require_signatures option.  But, all that option does is allow
the kernel client to talk to clusters that don't support MSG_AUTH
feature bit.  That's pretty useless, given that it's been supported
since bobtail.

Meanwhile, if one disables message signing on the server side with
"cephx sign messages = false", it becomes impossible to use the kernel
client since it expects messages to be signed if MSG_AUTH was
negotiated.  Add nocephx_sign_messages option to support this use case.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

a51983e4

libceph: stop duplicating client fields in messenger · 859bff51

由 Ilya Dryomov 提交于 10月 28, 2015

supported_features and required_features serve no purpose at all, while
nocrc and tcp_nodelay belong to ceph_options::flags.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

859bff51

libceph: msg signing callouts don't need con argument · 79dbd1ba

由 Ilya Dryomov 提交于 10月 26, 2015

We can use msg->con instead - at the point we sign an outgoing message
or check the signature on the incoming one, msg->con is always set.  We
wouldn't know how to sign a message without an associated session (i.e.
msg->con == NULL) and being able to sign a message using an explicitly
provided authorizer is of no use.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

79dbd1ba

libceph: use local variable cursor instead of &msg->cursor · 343128ce

由 Shraddha Barke 提交于 10月 19, 2015

Use local variable cursor in place of &msg->cursor in
read_partial_msg_data() and write_partial_msg_data().
Signed-off-by: NShraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

343128ce

18 9月, 2015 1 次提交

libceph: don't access invalid memory in keepalive2 path · 7f61f545

由 Ilya Dryomov 提交于 9月 14, 2015

This

    struct ceph_timespec ceph_ts;
    ...
    con_out_kvec_add(con, sizeof(ceph_ts), &ceph_ts);

wraps ceph_ts into a kvec and adds it to con->out_kvec array, yet
ceph_ts becomes invalid on return from prepare_write_keepalive().  As
a result, we send out bogus keepalive2 stamps.  Fix this by encoding
into a ceph_timespec member, similar to how acks are read and written.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NYan, Zheng <zyan@redhat.com>

7f61f545

09 9月, 2015 4 次提交

libceph: check data_len in ->alloc_msg() · d15f9d69

由 Ilya Dryomov 提交于 9月 02, 2015

Only ->alloc_msg() should check data_len of the incoming message
against the preallocated ceph_msg, doing it in the messenger is not
right.  The contract is that either ->alloc_msg() returns a ceph_msg
which will fit all of the portions of the incoming message, or it
returns NULL and possibly sets skip, signaling whether NULL is due to
an -ENOMEM.  ->alloc_msg() should be the only place where we make the
skip/no-skip decision.

I stumbled upon this while looking at con/osd ref counting.  Right now,
if we get a non-extent message with a larger data portion than we are
prepared for, ->alloc_msg() returns a ceph_msg, and then, when we skip
it in the messenger, we don't put the con/osd ref acquired in
ceph_con_in_msg_alloc() (which is normally put in process_message()),
so this also fixes a memory leak.

An existing BUG_ON in ceph_msg_data_cursor_init() ensures we don't
corrupt random memory should a buggy ->alloc_msg() return an unfit
ceph_msg.

While at it, I changed the "unknown tid" dout() to a pr_warn() to make
sure all skips are seen and unified format strings.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NAlex Elder <elder@linaro.org>

d15f9d69

Y
libceph: use keepalive2 to verify the mon session is alive · 8b9558aa
由 Yan, Zheng 提交于 9月 01, 2015
```
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
```
8b9558aa

libceph: rename con_work() to ceph_con_workfn() · 68931622

由 Ilya Dryomov 提交于 7月 03, 2015

Even though it's static, con_work(), being a work func, shows up in
various stacktraces a lot.  Prefix it with ceph_.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

68931622

libceph: Avoid holding the zero page on ceph_msgr_slab_init errors · d920ff6f

由 Benoît Canet 提交于 6月 25, 2015

ceph_msgr_slab_init may fail due to a temporary ENOMEM.

Delay a bit the initialization of zero_page in ceph_msgr_init and
reorder its cleanup in _ceph_msgr_exit so it's done in reverse
order of setup.

BUG_ON() will not suffer to be postponed in case it is triggered.
Signed-off-by: NBenoît Canet <benoit.canet@nodalink.com>
Reviewed-by: NAlex Elder <elder@linaro.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

d920ff6f

10 7月, 2015 2 次提交

libceph: treat sockaddr_storage with uninitialized family as blank · c44bd69c

由 Ilya Dryomov 提交于 7月 09, 2015

addr_is_blank() should return true if family is neither AF_INET nor
AF_INET6.  This is what its counterpart entity_addr_t::is_blank_ip() is
doing and it is the right thing to do: in process_banner() we check if
our address is blank and if it is "learn" it from our peer.  As it is,
we never learn our address and always send out a blank one.  This goes
way back to ceph.git commit dd732cbfc1c9 ("use sockaddr_storage; and
some ipv6 support groundwork") from 2009.

While at at, do not open-code ipv6_addr_any() and use INADDR_ANY
constant instead of 0.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

c44bd69c

libceph: enable ceph in a non-default network namespace · 757856d2

由 Ilya Dryomov 提交于 6月 25, 2015

Grab a reference on a network namespace of the 'rbd map' (in case of
rbd) or 'mount' (in case of ceph) process and use that to open sockets
instead of always using init_net and bailing if network namespace is
anything but init_net.  Be careful to not share struct ceph_client
instances between different namespaces and don't add any code in the
!CONFIG_NET_NS case.

This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

757856d2

30 6月, 2015 1 次提交

libceph: Fix ceph_tcp_sendpage()'s more boolean usage · c2cfa194

由 Benoît Canet 提交于 6月 25, 2015

From struct ceph_msg_data_cursor in include/linux/ceph/messenger.h:

bool    last_piece;     /* current is last piece */

In ceph_msg_data_next():

*last_piece = cursor->last_piece;

A call to ceph_msg_data_next() is followed by:

ret = ceph_tcp_sendpage(con->sock, page, page_offset,
                        length, last_piece);

while ceph_tcp_sendpage() is:

static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
                             int offset, size_t size, bool more)

The logic is inverted: correct it.
Signed-off-by: NBenoît Canet <benoit.canet@nodalink.com>
Reviewed-by: NAlex Elder <elder@linaro.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

c2cfa194

25 6月, 2015 1 次提交

libceph: Remove spurious kunmap() of the zero page · 6ba8edc0

由 Benoît Canet 提交于 6月 24, 2015

ceph_tcp_sendpage already does the work of mapping/unmapping
the zero page if needed.
Signed-off-by: NBenoît Canet <benoit.canet@nodalink.com>
Reviewed-by: NAlex Elder <elder@linaro.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

6ba8edc0

11 5月, 2015 1 次提交

net: Add a struct net parameter to sock_create_kern · eeb1bd5c

由 Eric W. Biederman 提交于 5月 08, 2015

This is long overdue, and is part of cleaning up how we allocate kernel
sockets that don't reference count struct net.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eeb1bd5c

20 4月, 2015 1 次提交

libceph: don't overwrite specific con error msgs · 67c64eb7

由 Ilya Dryomov 提交于 3月 23, 2015

- specific con->error_msg messages (e.g. "protocol version mismatch")
  end up getting overwritten by a catch-all "socket error on read
  / write", introduced in commit 3a140a0d ("libceph: report socket
  read/write error message")
- "bad message sequence # for incoming message" loses to "bad crc" due
  to the fact that -EBADMSG is used for both

Fix it, and tidy up con->error_msg assignments and pr_errs while at it.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

67c64eb7

08 4月, 2015 1 次提交

Revert "libceph: use memalloc flags for net IO" · 6d7fdb0a

由 Ilya Dryomov 提交于 4月 02, 2015

This reverts commit 89baaa57.

Dirty page throttling should be sufficient for us in the general case
so there is no need to use __GFP_MEMALLOC - it would be needed only in
the swap-over-rbd case, which we currently don't support.  (It would
probably take approximately the commit that is being reverted to add
that support, but we would also need the "swap" option to distinguish
from the general case and make sure swap ceph_client-s aren't shared
with anything else.)  See ceph-devel threads [1] and [2] for the
details of why enabling pfmemalloc reserves for all cases is a bad
thing.

On top of potential system lockups related to drained emergency
reserves, this turned out to cause ceph lockups in case peers are on
the same host and communicating via loopback due to sk_filter()
dropping pfmemalloc skbs on the receiving side because the receiving
loopback socket is not tagged with SOCK_MEMALLOC.

[1] "SOCK_MEMALLOC vs loopback"
    http://www.spinics.net/lists/ceph-devel/msg22998.html
[2] "[PATCH] libceph: don't set memalloc flags in loopback case"
    http://www.spinics.net/lists/ceph-devel/msg23392.html

Conflicts:
	net/ceph/messenger.c [ context: tcp_nodelay option ]

Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Sage Weil <sage@redhat.com>
Cc: stable@vger.kernel.org # 3.18+, needs backporting
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Acked-by: NMike Christie <michaelc@cs.wisc.edu>
Acked-by: NMel Gorman <mgorman@suse.de>

6d7fdb0a

19 2月, 2015 1 次提交

libceph: tcp_nodelay support · ba988f87

由 Chaitanya Huilgol 提交于 1月 23, 2015

TCP_NODELAY socket option set on connection sockets,
disables Nagle’s algorithm and improves latency characteristics.
tcp_nodelay(default)/notcp_nodelay option flags provided to
enable/disable setting the socket option.
Signed-off-by: NChaitanya Huilgol <chaitanya.huilgol@sandisk.com>
[idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments]
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>

ba988f87

18 12月, 2014 2 次提交

Y
libceph: message signature support · 33d07337
由 Yan, Zheng 提交于 11月 04, 2014
```
Signed-off-by: NYan, Zheng <zyan@redhat.com>
```
33d07337

libceph: nuke ceph_kvfree() · 4965fc38

由 Ilya Dryomov 提交于 10月 23, 2014

Use kvfree() from linux/mm.h instead, which is identical.  Also fix the
ceph_buffer comment: we will allocate with kmalloc() up to 32k - the
value of PAGE_ALLOC_COSTLY_ORDER, but that really is just an
implementation detail so don't mention it at all.
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>

4965fc38

30 10月, 2014 1 次提交

libceph: use memalloc flags for net IO · 89baaa57

由 Mike Christie 提交于 10月 16, 2014

This patch has ceph's lib code use the memalloc flags.

If the VM layer needs to write data out to free up memory to handle new
allocation requests, the block layer must be able to make forward progress.
To handle that requirement we use structs like mempools to reserve memory for
objects like bios and requests.

The problem is when we send/receive block layer requests over the network
layer, net skb allocations can fail and the system can lock up.
To solve this, the memalloc related flags were added. NBD, iSCSI
and NFS uses these flags to tell the network/vm layer that it should
use memory reserves to fullfill allcation requests for structs like
skbs.

I am running ceph in a bunch of VMs in my laptop, so this patch was
not tested very harshly.
Signed-off-by: NMike Christie <michaelc@cs.wisc.edu>
Reviewed-by: NIlya Dryomov <idryomov@redhat.com>

89baaa57

15 10月, 2014 3 次提交

libceph: ceph-msgr workqueue needs a resque worker · f9865f06

由 Ilya Dryomov 提交于 10月 10, 2014

Commit f363e45f ("net/ceph: make ceph_msgr_wq non-reentrant")
effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq.  This is
wrong - libceph is very much a memory reclaim path, so restore it.

Cc: stable@vger.kernel.org # needs backporting for < 3.12
Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
Tested-by: NMicha Krause <micha@krausam.de>
Reviewed-by: NSage Weil <sage@redhat.com>

f9865f06

libceph: reference counting pagelist · e4339d28

由 Yan, Zheng 提交于 9月 16, 2014

this allow pagelist to present data that may be sent multiple times.
Signed-off-by: NYan, Zheng <zyan@redhat.com>
Reviewed-by: NSage Weil <sage@redhat.com>

e4339d28

libceph: Convert pr_warning to pr_warn · b9a67899

由 Joe Perches 提交于 9月 09, 2014

Use the more common pr_warn.

Other miscellanea:

o Coalesce formats
o Realign arguments
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>

b9a67899

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功