提交 · 51f1de79ad8ed3555fd01ae8fd432691d397684b · openanolis / cloud-kernel

26 1月, 2018 3 次提交

net/smc: replace sock_put worker by socket refcounting · 51f1de79

由 Ursula Braun 提交于 1月 26, 2018

Proper socket refcounting makes the sock_put worker obsolete.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51f1de79

net/smc: smc_poll improvements · 8dce2786

由 Ursula Braun 提交于 1月 26, 2018

Increase the socket refcount during poll wait.
Take the socket lock before checking socket state.
For a listening socket return a mask independent of state SMC_ACTIVE and
cover errors or closed state as well.
Get rid of the accept_q loop in smc_accept_poll().
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8dce2786

net/smc: do not reuse a linkgroup with setup problems · 610db66f

由 Ursula Braun 提交于 1月 25, 2018

Once a linkgroup is created successfully, it stays alive for a
certain time to service more connections potentially created.
If one of the initialization steps for a new linkgroup fails,
the linkgroup should not be reused by other connections following.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

610db66f

24 1月, 2018 2 次提交

net/smc: simplify function smc_clcsock_accept() · 35a6b178

由 Ursula Braun 提交于 1月 24, 2018

Cleanup to avoid duplicate code in smc_clcsock_accept().
No functional change.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35a6b178

net/smc: use local struct sock variables consistently · 3163c507

由 Ursula Braun 提交于 1月 24, 2018

Cleanup to consistently exploit the local struct sock definitions.
No functional change.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3163c507

08 12月, 2017 2 次提交

smc: support variable CLC proposal messages · e7b7a64a

由 Ursula Braun 提交于 12月 07, 2017

According to RFC7609 [1] the CLC proposal message contains an area of
unknown length for future growth. Additionally it may contain up to
8 IPv6 prefixes. The current version of the SMC-code does not
understand CLC proposal messages using these variable length fields and,
thus, is incompatible with SMC implementations in other operating
systems.

This patch makes sure, SMC understands incoming CLC proposals
* with arbitrary length values for future growth
* with up to 8 IPv6 prefixes

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: NHans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e7b7a64a

smc: improve smc_clc_send_decline() error handling · 0c9f1515

由 Ursula Braun 提交于 12月 07, 2017

Let smc_clc_send_decline() return with an error, if the amount
sent is smaller than the length of an smc decline message.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c9f1515

26 10月, 2017 2 次提交

smc: add SMC rendezvous protocol · c5c1cc9c

由 Ursula Braun 提交于 10月 25, 2017

The SMC protocol [1] uses a rendezvous protocol to negotiate SMC
capability between peers. The current Linux implementation does not yet
use this rendezvous protocol and, thus, is not compliant to RFC7609 and
incompatible with other SMC implementations like in zOS.
This patch adds support for the SMC rendezvous protocol. It uses a new
TCP experimental option. With this option, SMC capabilities are
exchanged between the peers during the TCP three way handshake.

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5c1cc9c

smc: fix mutex unlocks during link group creation · 145686ba

由 Ursula Braun 提交于 10月 25, 2017

Link group creation is synchronized with the smc_create_lgr_pending
lock. In smc_listen_work() this mutex is sometimes unlocked, even
though it has not been locked before. This issue will surface in
presence of the SMC rendezvous code.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

145686ba

22 9月, 2017 2 次提交

net/smc: terminate link group if out-of-sync is received · bfbedfd3

由 Ursula Braun 提交于 9月 21, 2017

An out-of-sync condition can just be detected by the client.
If the server receives a CLC DECLINE message indicating an out-of-sync
condition for the link groups, the server must clean up the out-of-sync
link group.
There is no need for an extra third parameter in smc_clc_send_decline().
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bfbedfd3

net/smc: take RCU read lock for routing cache lookup · 731b0085

由 Ursula Braun 提交于 9月 21, 2017

smc_netinfo_by_tcpsk() looks up the routing cache. Such a lookup requires
protection by an RCU read lock.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

731b0085

30 7月, 2017 4 次提交

net/smc: synchronize buffer usage with device · 10428dd8

由 Ursula Braun 提交于 7月 28, 2017

Usage of send buffer "sndbuf" is synced
(a) before filling sndbuf for cpu access
(b) after filling sndbuf for device access

Usage of receive buffer "RMB" is synced
(a) before reading RMB content for cpu access
(b) after reading RMB content for device access
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10428dd8

net/smc: common functions for RMBs and send buffers · 3e034725

由 Ursula Braun 提交于 7月 28, 2017

Creation and deletion of SMC receive and send buffers shares a high
amount of common code . This patch introduces common functions to get
rid of duplicate code.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3e034725

net/smc: register RMB-related memory region · 652a1e41

由 Ursula Braun 提交于 7月 28, 2017

A memory region created for a new RMB must be registered explicitly,
before the peer can make use of it for remote DMA transfer.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

652a1e41

net/smc: serialize connection creation in all cases · 977bb324

由 Ursula Braun 提交于 7月 28, 2017

If a link group for a new server connection exists already, the mutex
serializing the determination of link groups is given up early.
The coming registration of memory regions benefits from the serialization
as well, if the mutex is held till connection creation is finished.
This patch postpones the unlocking of the link group creation mutex.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

977bb324

19 4月, 2017 1 次提交

mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU · 5f0d5a3a

由 Paul E. McKenney 提交于 1月 18, 2017

A group of Linux kernel hackers reported chasing a bug that resulted
from their assumption that SLAB_DESTROY_BY_RCU provided an existence
guarantee, that is, that no block from such a slab would be reallocated
during an RCU read-side critical section.  Of course, that is not the
case.  Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
slab of blocks.

However, there is a phrase for this, namely "type safety".  This commit
therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
to avoid future instances of this sort of confusion.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
[ paulmck: Add comments mentioning the old name, as requested by Eric
  Dumazet, in order to help people familiar with the old name find
  the new one. ]
Acked-by: NDavid Rientjes <rientjes@google.com>

5f0d5a3a

12 4月, 2017 3 次提交

net/smc: destruct non-accepted sockets · 288c8390

由 Ursula Braun 提交于 4月 10, 2017

Make sure sockets never accepted are removed cleanly.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

288c8390

net/smc: remove duplicate unhash · f5227cd9

由 Ursula Braun 提交于 4月 10, 2017

unhash is already called in sock_put_work. Remove the second call.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5227cd9

net/smc: no socket state changes in tasklet context · 46c28dbd

由 Ursula Braun 提交于 4月 10, 2017

Several state changes occur during SMC socket closing. Currently
state changes triggered locally occur in process context with
lock_sock() taken while state changes triggered by peer occur in
tasklet context with bh_lock_sock() taken. bh_lock_sock() does not
wait till a lock_sock(() task in process context is finished. This
may lead to races in socket state transitions resulting in dangling
SMC-sockets, or it may lead to duplicate SMC socket freeing.
This patch introduces a closing worker to run all state changes under
lock_sock().
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
Reported-by: NDave Jones <davej@codemonkey.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46c28dbd

10 3月, 2017 1 次提交

net: Work around lockdep limitation in sockets that use sockets · cdfbabfb

由 David Howells 提交于 3月 09, 2017

Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_rcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cdfbabfb

03 3月, 2017 1 次提交

sched/headers: Move task_struct::signal and task_struct::sighand types and... · c3edc401

由 Ingo Molnar 提交于 2月 02, 2017

sched/headers: Move task_struct::signal and task_struct::sighand types and accessors into <linux/sched/signal.h>

task_struct::signal and task_struct::sighand are pointers, which would normally make it
straightforward to not define those types in sched.h.

That is not so, because the types are accompanied by a myriad of APIs (macros and inline
functions) that dereference them.

Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>.

With this change sched.h does not know about 'struct signal' and 'struct sighand' anymore,
trying to put accessors into sched.h as a test fails the following way:

  ./include/linux/sched.h: In function ‘test_signal_types’:
  ./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’
                    ^

This reduces the size and complexity of sched.h significantly.

Update all headers and .c code that relied on getting the signal handling
functionality from <linux/sched.h> to include <linux/sched/signal.h>.

The list of affected files in the preparatory patch was partly generated by
grepping for the APIs, and partly by doing coverage build testing, both
all[yes|mod|def|no]config builds on 64-bit and 32-bit x86, and an array of
cross-architecture builds.

Nevertheless some (trivial) build breakage is still expected related to rare
Kconfig combinations and in-flight patches to various kernel code, but most
of it should be handled by this patch.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

c3edc401

10 1月, 2017 13 次提交

smc: netlink interface for SMC sockets · f16a7dd5

由 Ursula Braun 提交于 1月 09, 2017

Support for SMC socket monitoring via netlink sockets of protocol
NETLINK_SOCK_DIAG.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f16a7dd5

smc: socket closing and linkgroup cleanup · b38d7324

由 Ursula Braun 提交于 1月 09, 2017

smc_shutdown() and smc_release() handling
delayed linkgroup cleanup for linkgroups without connections
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b38d7324

smc: receive data from RMBE · 952310cc

由 Ursula Braun 提交于 1月 09, 2017

move RMBE data into user space buffer and update managing cursors
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

952310cc

smc: send data (through RDMA) · e6727f39

由 Ursula Braun 提交于 1月 09, 2017

copy data to kernel send buffer, and trigger RDMA write
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6727f39

smc: connection data control (CDC) · 5f08318f

由 Ursula Braun 提交于 1月 09, 2017

send and receive CDC messages (via IB message send and CQE)
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f08318f

smc: link layer control (LLC) · 9bf9abea

由 Ursula Braun 提交于 1月 09, 2017

send and receive LLC messages CONFIRM_LINK (via IB message send and CQE)
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bf9abea

smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR · bd4ad577

由 Ursula Braun 提交于 1月 09, 2017

Prepare the link for RDMA transport:
Create a queue pair (QP) and move it into the state Ready-To-Receive (RTR).
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd4ad577

smc: remote memory buffers (RMBs) · cd6851f3

由 Ursula Braun 提交于 1月 09, 2017

* allocate data RMB memory for sending and receiving
* size depends on the maximum socket send and receive buffers
* allocated RMBs are kept during life time of the owning link group
* map the allocated RMBs to DMA
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd6851f3

smc: connection and link group creation · 0cfdd8f9

由 Ursula Braun 提交于 1月 09, 2017

* create smc_connection for SMC-sockets
* determine suitable link group for a connection
* create a new link group if necessary
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0cfdd8f9

smc: CLC handshake (incl. preparation steps) · a046d57d

由 Ursula Braun 提交于 1月 09, 2017

* CLC (Connection Layer Control) handshake
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a046d57d

smc: establish pnet table management · 6812baab

由 Thomas Richter 提交于 1月 09, 2017

Connection creation with SMC-R starts through an internal
TCP-connection. The Ethernet interface for this TCP-connection is not
restricted to the Ethernet interface of a RoCE device. Any existing
Ethernet interface belonging to the same physical net can be used, as
long as there is a defined relation between the Ethernet interface and
some RoCE devices. This relation is defined with the help of an
identification string called "Physical Net ID" or short "pnet ID".
Information about defined pnet IDs and their related Ethernet
interfaces and RoCE devices is stored in the SMC-R pnet table.

A pnet table entry consists of the identifying pnet ID and the
associated network and IB device.
This patch adds pnet table configuration support using the
generic netlink message interface referring to network and IB device
by their names. Commands exist to add, delete, and display pnet table
entries, and to flush or display the entire pnet table.

There are cross-checks to verify whether the ethernet interfaces
or infiniband devices really exist in the system. If either device
is not available, the pnet ID entry is not created.
Loss of network devices and IB devices is also monitored;
a pnet ID entry is removed when an associated network or
IB device is removed.
Signed-off-by: NThomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6812baab

smc: introduce SMC as an IB-client · a4cf0443

由 Ursula Braun 提交于 1月 09, 2017

* create a list of SMC IB-devices
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a4cf0443

smc: establish new socket family · ac713874

由 Ursula Braun 提交于 1月 09, 2017

* enable smc module loading and unloading
 * register new socket family
 * basic smc socket creation and deletion
 * use backing TCP socket to run CLC (Connection Layer Control)
   handshake of SMC protocol
 * Setup for infiniband traffic is implemented in follow-on patches.
   For now fallback to TCP socket is always used.
Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: NUtz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac713874

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功