Commits · 10428dd8354cc1c74ee806df45c2227c1f9d7b0c · openanolis / cloud-kernel

30 Jul, 2017 10 commits

net/smc: synchronize buffer usage with device · 10428dd8

Committed by Ursula Braun on Jul 28, 2017

Usage of the send buffer "sndbuf" is synced
(a) before filling sndbuf, for CPU access
(b) after filling sndbuf, for device access

Usage of the receive buffer "RMB" is synced
(a) before reading RMB content, for CPU access
(b) after reading RMB content, for device access

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
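
The (a)/(b) ordering for the send buffer can be pictured with a small user-space model. This is only a sketch under stated assumptions: every name below (`model_buf`, `model_sync_for_cpu()`, `model_send()` and so on) is hypothetical and does not come from the SMC code. In the kernel, the ownership hand-over is done by DMA sync helpers such as `dma_sync_single_for_cpu()` and `dma_sync_single_for_device()`; here a plain `owner` field stands in for it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names come from the SMC code. */
enum buf_owner { OWNER_DEVICE, OWNER_CPU };

struct model_buf {
	unsigned char data[64];
	enum buf_owner owner;	/* who may touch data right now */
};

/* Stand-in for a "sync for CPU access" step. */
static void model_sync_for_cpu(struct model_buf *b)
{
	b->owner = OWNER_CPU;
}

/* Stand-in for a "sync for device access" step. */
static void model_sync_for_device(struct model_buf *b)
{
	b->owner = OWNER_DEVICE;
}

/* CPU-side write; refuses to run unless the CPU currently owns the buffer. */
static bool model_cpu_fill(struct model_buf *b, const void *src, size_t len)
{
	if (b->owner != OWNER_CPU || len > sizeof(b->data))
		return false;
	memcpy(b->data, src, len);
	return true;
}

/* Send path: (a) sync for CPU, fill, (b) sync for device. */
static bool model_send(struct model_buf *sndbuf, const void *src, size_t len)
{
	bool ok;

	model_sync_for_cpu(sndbuf);		/* (a) before filling */
	ok = model_cpu_fill(sndbuf, src, len);
	model_sync_for_device(sndbuf);		/* (b) after filling */
	return ok;
}
```

In this model a fill attempted while the device owns the buffer is rejected, which is the mistake the sync calls are meant to rule out. The receive side would mirror it: sync for CPU, read the RMB, sync for device.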

net/smc: cleanup function __smc_buf_create() · b33982c3

Committed by Ursula Braun on Jul 28, 2017

Split function __smc_buf_create() for better readability.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: common functions for RMBs and send buffers · 3e034725

Committed by Ursula Braun on Jul 28, 2017

Creation and deletion of SMC receive and send buffers share a large
amount of common code. This patch introduces common functions to get
rid of duplicate code.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: introduce sg-logic for send buffers · 9d8fb617

Committed by Ursula Braun on Jul 28, 2017

SMC send buffers are processed the same way as RMBs. Since RMBs have
been converted to sg-logic, do the same for send buffers.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: remove Kconfig warning · d5b361b0

Committed by Ursula Braun on Jul 28, 2017

Separate memory regions are now created and registered for separate
RMBs. The unsafe_global_rkey of the protection domain is no longer
used. Thus the warning about exposing memory can be removed.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: register RMB-related memory region · 652a1e41

Committed by Ursula Braun on Jul 28, 2017

A memory region created for a new RMB must be registered explicitly
before the peer can make use of it for remote DMA transfer.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: use separate memory regions for RMBs · 897e1c24

Committed by Ursula Braun on Jul 28, 2017

SMC currently uses the unsafe_global_rkey of the protection domain,
which exposes all memory for remote reads and writes once a connection
is established. This patch introduces separate memory regions with
separate rkeys for every RMB. Now the unsafe_global_rkey of the
protection domain is no longer needed.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
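
The difference between one global rkey and one rkey per RMB can be sketched with a conceptual bounds check. This is not the verbs API and not the SMC implementation: `struct model_mr` and `model_remote_access_ok()` are hypothetical names, and on real hardware the check is performed by the adapter. The sketch only shows the idea that a peer holding an rkey reaches the one region registered under it and nothing else.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one registered memory region. */
struct model_mr {
	uint32_t rkey;
	uintptr_t base;
	size_t len;
};

/* Returns true only if [addr, addr + len) lies inside the region named by rkey. */
static bool model_remote_access_ok(const struct model_mr *mrs, size_t n_mrs,
				   uint32_t rkey, uintptr_t addr, size_t len)
{
	for (size_t i = 0; i < n_mrs; i++) {
		const struct model_mr *mr = &mrs[i];

		if (mr->rkey != rkey)
			continue;
		if (addr < mr->base || len > mr->len)
			return false;
		/* offset + len <= mr->len, written so it cannot overflow */
		return addr - mr->base <= mr->len - len;
	}
	return false;	/* unknown rkey: no access at all */
}
```

With a single global key, the equivalent check would accept any address; with per-RMB regions, an access that starts in one RMB and runs past its end, or that names another RMB's memory, is refused.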

net/smc: introduce sg-logic for RMBs · a3fe3d01

Committed by Ursula Braun on Jul 28, 2017

The follow-on patch makes use of ib_map_mr_sg() when introducing
separate memory regions for RMBs. This function is based on
scatterlists; thus this patch introduces scatterlists for RMBs.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: shorten local bufsize variables · c45abf31

Committed by Ursula Braun on Jul 28, 2017

Initiate the coming rework of SMC buffer handling with this
small code cleanup. No functional changes here.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: serialize connection creation in all cases · 977bb324

Committed by Ursula Braun on Jul 28, 2017

If a link group for a new server connection already exists, the mutex
serializing the determination of link groups is given up early.
The coming registration of memory regions benefits from the serialization
as well if the mutex is held until connection creation is finished.
This patch postpones the unlocking of the link group creation mutex.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 May, 2017 2 commits

net/smc: Add warning about remote memory exposure · 19a0f7e3

Committed by Christoph Hellwig on May 16, 2017

The driver explicitly bypasses APIs to register all memory once a
connection is made, and thus allows remote access to memory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY · 263eec9b

Committed by Ursula Braun on May 15, 2017

Currently, SMC enables remote access to physical memory when a user
has successfully configured and established an SMC connection until ten
minutes after the last SMC connection is closed. Because this is considered
a security risk, drivers are supposed to use IB_PD_UNSAFE_GLOBAL_RKEY in
such a case.

This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY.
This improves user awareness, but does not remove the security risk itself.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 May, 2017 2 commits

IB/core: Define 'ib' and 'roce' rdma_ah_attr types · 44c58487

Committed by Dasaratharaman Chandramouli on Apr 29, 2017

rdma_ah_attr can now be either ib or roce, allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when it's first
created. This ensures that calls such as modify_ah
don't modify the type of the address handle attribute.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Use rdma_ah_attr accessor functions · d8966fcd

Committed by Dasaratharaman Chandramouli on Apr 29, 2017

Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

19 Apr, 2017 1 commit

mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU · 5f0d5a3a

Committed by Paul E. McKenney on Jan 18, 2017

A group of Linux kernel hackers reported chasing a bug that resulted
from their assumption that SLAB_DESTROY_BY_RCU provided an existence
guarantee, that is, that no block from such a slab would be reallocated
during an RCU read-side critical section.  Of course, that is not the
case.  Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
slab of blocks.

However, there is a phrase for this, namely "type safety".  This commit
therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
to avoid future instances of this sort of confusion.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
[ paulmck: Add comments mentioning the old name, as requested by Eric
  Dumazet, in order to help people familiar with the old name find
  the new one. ]
Acked-by: David Rientjes <rientjes@google.com>
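
Because a type-safe slab guarantees only that the memory remains an object of the same type, a reader that finds an object has to pin it and then confirm it is still the object it was looking for. The single-threaded model below sketches that revalidation step. All names (`model_obj`, `model_get()`, `model_validate()`) are hypothetical; kernel code would typically use an atomic primitive such as `atomic_inc_not_zero()` for the pinning step, which this sketch replaces with a plain integer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object living in a type-stable pool: the slot may be reused
 * for another key at any time, but it always stays a struct model_obj. */
struct model_obj {
	int key;
	int refcnt;	/* 0 means the slot is currently free */
};

/* Try to take a reference; fails on a free slot. */
static bool model_get(struct model_obj *o)
{
	if (o->refcnt == 0)
		return false;
	o->refcnt++;
	return true;
}

static void model_put(struct model_obj *o)
{
	o->refcnt--;
}

/* Validate a pointer found earlier: pin it, then recheck identity. */
static struct model_obj *model_validate(struct model_obj *found, int key)
{
	if (!model_get(found))
		return NULL;		/* freed since lookup */
	if (found->key != key) {
		model_put(found);	/* reused for a different object */
		return NULL;
	}
	return found;
}
```

A caller that skips the key recheck is making the assumption the commit message describes: treating type safety as if it were an existence guarantee.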

12 Apr, 2017 10 commits

net/smc: do not use IB_SEND_INLINE together with mapped data · 2c9c1682

Committed by Ursula Braun on Apr 10, 2017

smc specifies IB_SEND_INLINE for IB_WR_SEND ib_post_send calls, but
provides a mapped buffer to be sent. This is inconsistent, since
IB_SEND_INLINE works without a mapped buffer. The problem was not
detected in the past because tests had been limited to ConnectX-3 cards
from Mellanox, whose mlx4 driver just ignored the IB_SEND_INLINE flag.
For now, the IB_SEND_INLINE flag is removed.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: destruct non-accepted sockets · 288c8390

Committed by Ursula Braun on Apr 10, 2017

Make sure sockets that were never accepted are removed cleanly.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: remove duplicate unhash · f5227cd9

Committed by Ursula Braun on Apr 10, 2017

unhash is already called in sock_put_work. Remove the second call.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: guarantee ConnClosed send after shutdown SHUT_WR · a98bf8c0

Committed by Ursula Braun on Apr 10, 2017

State SMC_CLOSED should be reached only if ConnClosed has been sent to
the peer. If ConnClosed is received from the peer, a socket with
shutdown SHUT_WR done switches erroneously to state SMC_CLOSED, which
means the peer socket is dangling. The local SMC socket is supposed to
switch to state APPFINCLOSEWAIT to make sure smc_close_final() is called
during socket close.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: no socket state changes in tasklet context · 46c28dbd

Committed by Ursula Braun on Apr 10, 2017

Several state changes occur during SMC socket closing. Currently,
state changes triggered locally occur in process context with
lock_sock() taken, while state changes triggered by the peer occur in
tasklet context with bh_lock_sock() taken. bh_lock_sock() does not
wait until a lock_sock() task in process context is finished. This
may lead to races in socket state transitions resulting in dangling
SMC sockets, or it may lead to duplicate SMC socket freeing.
This patch introduces a closing worker to run all state changes under
lock_sock().

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: always call the POLL_IN part of sk_wake_async · 90e9517e

Committed by Ursula Braun on Apr 10, 2017

Wake up reading file descriptors for a closing socket as well; otherwise
some socket applications may stall.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: guarantee reset of write_blocked for heavy workload · 90cacb2e

Committed by Ursula Braun on Apr 10, 2017

If the peer indicates write_blocked, the cursor state of the received data
should be sent to the peer immediately (in smc_tx_consumer_update()).
Afterwards the write_blocked indicator is cleared.

If there is no free slot for another write request, sending is postponed
to worker smc_tx_work, and the write_blocked indicator is not cleared.
Therefore another clearing check is needed in smc_tx_work().

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: return active RoCE port only · 5da7e4d3

Committed by Ursula Braun on Apr 10, 2017

SMC requires an active ib port on the RoCE device.
smc_pnet_find_roce_resource() determines the matching RoCE device port
according to the configured PNET table. Do not return the RoCE device
port that was found if it is not flagged active.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: remove useless smc_ib_devices_list check · 249633a4

Committed by Ursula Braun on Apr 10, 2017

The global event handler is created only if the ib_device has already
been used by at least one link group. It is guaranteed that the
corresponding entry exists in the smc_ib_devices list. Get rid of this
superfluous check.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: get rid of old comment · 3c22e8f3

Committed by Ursula Braun on Apr 10, 2017

This patch removes an outdated comment.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Mar, 2017 1 commit

drivers: add explicit interrupt.h includes · 282ccf6e

Committed by Florian Westphal on Mar 29, 2017

These files all use functions declared in interrupt.h, but currently rely
on implicit inclusion of this file (via netns/xfrm.h).

That won't work anymore when the flow cache is removed, so include that
header where needed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Mar, 2017 1 commit

net: Work around lockdep limitation in sockets that use sockets · cdfbabfb

Committed by David Howells on Mar 09, 2017

Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set is
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_tcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
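
The three ordering rules in lockdep's theory form a loop only because both AF_INET sockets share one lock class. A miniature dependency graph makes this visible. This is a sketch and not lockdep's implementation: the graph type, the functions and the class identifiers (`SK_INET_KERN`, `SK_INET_USER` and the rest) are hypothetical stand-ins for the lock keys the commit message describes.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical class identifiers standing in for lock keys. */
enum { MMAP_SEM, SK_RXRPC, SK_INET, SK_INET_KERN, SK_INET_USER };

/* Hypothetical miniature of a lock-class dependency graph. */
struct class_graph {
	bool before[MAX_CLASSES][MAX_CLASSES];	/* before[a][b]: a taken before b */
};

static void graph_init(struct class_graph *g)
{
	memset(g, 0, sizeof(*g));
}

static void graph_add_order(struct class_graph *g, int first, int second)
{
	g->before[first][second] = true;
}

/* Depth-first search: state 0 = unvisited, 1 = on stack, 2 = done. */
static bool visit(const struct class_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < MAX_CLASSES; next++) {
		if (!g->before[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: circular dependency */
		if (state[next] == 0 && visit(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

static bool graph_has_cycle(const struct class_graph *g)
{
	int state[MAX_CLASSES] = { 0 };

	for (int n = 0; n < MAX_CLASSES; n++)
		if (state[n] == 0 && visit(g, n, state))
			return true;
	return false;
}
```

Adding the orders MMAP_SEM before SK_RXRPC, SK_RXRPC before SK_INET and SK_INET before MMAP_SEM produces a cycle. Recording rule (2) against SK_INET_KERN and rule (3) against SK_INET_USER instead leaves the graph acyclic, which is the effect of doubling the lock keys.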

03 Mar, 2017 1 commit

sched/headers: Move task_struct::signal and task_struct::sighand types and... · c3edc401

Committed by Ingo Molnar on Feb 02, 2017

sched/headers: Move task_struct::signal and task_struct::sighand types and accessors into <linux/sched/signal.h>

task_struct::signal and task_struct::sighand are pointers, which would normally make it
straightforward to not define those types in sched.h.

That is not so, because the types are accompanied by a myriad of APIs (macros and inline
functions) that dereference them.

Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>.

With this change sched.h does not know about 'struct signal' and 'struct sighand' anymore;
trying to put accessors into sched.h as a test fails the following way:

  ./include/linux/sched.h: In function ‘test_signal_types’:
  ./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’
                    ^

This reduces the size and complexity of sched.h significantly.

Update all headers and .c code that relied on getting the signal handling
functionality from <linux/sched.h> to include <linux/sched/signal.h>.

The list of affected files in the preparatory patch was partly generated by
grepping for the APIs, and partly by doing coverage build testing, both
all[yes|mod|def|no]config builds on 64-bit and 32-bit x86, and an array of
cross-architecture builds.

Nevertheless some (trivial) build breakage is still expected related to rare
Kconfig combinations and in-flight patches to various kernel code, but most
of it should be handled by this patch.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

31 Jan, 2017 1 commit

smc: some potential use after free bugs · cdaf25df

Committed by Dan Carpenter on Jan 30, 2017

Say we got really unlucky and these failed on the last iteration, then
it could lead to a use after free bug.

Fixes: cd6851f3 ("smc: remote memory buffers (RMBs)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Jan, 2017 1 commit

smc: ETH_ALEN as memcpy length for mac addresses · 143c0171

Committed by Ursula Braun on Jan 12, 2017

When creating an SMC connection, there is a CLC (connection layer control)
handshake to prepare for RDMA traffic. The corresponding code is part of
commit 0cfdd8f9 ("smc: connection and link group creation").
MAC addresses to be exchanged in the handshake are copied with a wrong
length of 12 instead of 6 bytes. The code that follows overwrites the
excess bytes, but the correct length should nevertheless be used for
the preceding MAC address copy. Use ETH_ALEN for the memcpy length with
MAC addresses.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: 0cfdd8f9 ("smc: connection and link group creation")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
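
A length of 12 where 6 is meant can arise, for example, when `sizeof` is applied to a whole table of addresses instead of to one entry. The sketch below illustrates that pitfall and the ETH_ALEN form of the copy. It is an assumption-laden illustration, not the SMC code: `struct model_dev`, `struct model_msg`, `MODEL_MAX_PORTS` and `model_copy_mac()` are hypothetical, and the commit message does not say how the wrong length came about.

```c
#include <string.h>

#ifndef ETH_ALEN
#define ETH_ALEN 6	/* octets in one Ethernet MAC address */
#endif

#define MODEL_MAX_PORTS 2	/* hypothetical port count */

/* Hypothetical device record holding one MAC address per port. */
struct model_dev {
	unsigned char mac[MODEL_MAX_PORTS][ETH_ALEN];
};

/* Hypothetical handshake message: the MAC is followed by another field. */
struct model_msg {
	unsigned char mac[ETH_ALEN];
	unsigned char next_field[ETH_ALEN];
};

/* Copy exactly one address. Using sizeof(dev->mac) here would copy
 * MODEL_MAX_PORTS * ETH_ALEN = 12 bytes and spill past msg->mac. */
static void model_copy_mac(struct model_msg *msg, const struct model_dev *dev,
			   int port)
{
	/* port is 1-based in this model */
	memcpy(msg->mac, dev->mac[port - 1], ETH_ALEN);
}
```

With the ETH_ALEN length the field after the MAC address is left untouched, so correctness no longer depends on later code overwriting the excess bytes.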

10 Jan, 2017 10 commits

smc: netlink interface for SMC sockets · f16a7dd5