Commits · 2153bd1e3d3dbf6a3403572084ef6ed31c53c5f0 · openeuler / Kernel

15 Nov, 2021 1 commit

net/smc: Transfer remaining wait queue entries during fallback · 2153bd1e

Committed by Wen Gu on Nov 13, 2021

The SMC fallback is currently incomplete. Some wait queue entries
may remain in smc socket->wq, and these should be moved to
clcsocket->wq during the fallback.

For example, in nginx/wrk benchmark, this issue causes an
all-zeros test result:

server: nginx -g 'daemon off;'
client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html

  Running 5s test @ http://11.200.15.93/index.html
    1 threads and 1 connections
    Thread Stats   Avg      Stdev     Max   ± Stdev
      Latency     0.00us    0.00us   0.00us    -nan%
      Req/Sec     0.00      0.00     0.00      -nan%
    0 requests in 5.00s, 0.00B read
  Requests/sec:      0.00
  Transfer/sec:       0.00B

The reason for this all-zeros result is that when wrk used SMC
to replace TCP, it added an eppoll_entry to smc socket->wq
and expected to be notified when epoll events such as EPOLL_IN/
EPOLL_OUT occurred on the smc socket.

However, once a fallback occurs, wrk switches to using clcsocket.
From then on it is clcsocket->wq, not smc socket->wq, that is
woken up. The eppoll_entry left in smc socket->wq no longer
works, and wrk stops the test.

This patch fixes the issue by moving the remaining wait queue
entries from smc socket->wq to clcsocket->wq during the fallback.
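
The idea of handing waiters over can be illustrated with a minimal userspace sketch. It assumes a simplified singly linked wait queue; struct wq, struct wq_entry, wq_add(), wq_len() and wq_transfer() are hypothetical names chosen for illustration and are neither the kernel's wait queue API nor the code of this patch:

```c
#include <stddef.h>

/* Hypothetical, simplified wait queue: a singly linked list of waiters. */
struct wq_entry {
	int id;			/* stands in for an eppoll_entry */
	struct wq_entry *next;
};

struct wq {
	struct wq_entry *head;
};

/* Add a waiter at the front of a queue. */
static void wq_add(struct wq *q, struct wq_entry *e)
{
	e->next = q->head;
	q->head = e;
}

/* Count the waiters currently queued. */
static int wq_len(const struct wq *q)
{
	int n = 0;

	for (const struct wq_entry *e = q->head; e; e = e->next)
		n++;
	return n;
}

/*
 * Move every remaining waiter from the SMC socket's queue to the
 * clcsocket's queue, so that later wakeups on the clcsocket reach them.
 */
static void wq_transfer(struct wq *smc_wq, struct wq *clc_wq)
{
	while (smc_wq->head) {
		struct wq_entry *e = smc_wq->head;

		smc_wq->head = e->next;
		wq_add(clc_wq, e);
	}
}
```

After wq_transfer() the source queue is empty and every waiter is reachable from the destination queue, which is the property the fallback needs.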

Link: https://www.spinics.net/lists/netdev/msg779769.html
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Nov, 2021 1 commit

net/smc: fix sk_refcnt underflow on linkdown and fallback · e5d5aadc

Committed by Dust Li on Nov 10, 2021

We got the following WARNING when running an ab/nginx
test with RDMA link flapping (up-down-up).
The reason is that when an smc_sock fallback and a linkdown
happen simultaneously, we may get the following situation:

__smc_lgr_terminate()
 --> smc_conn_kill()
    --> smc_close_active_abort()
           smc_sock->sk_state = SMC_CLOSED
           sock_put(smc_sock)

smc_sock was set to SMC_CLOSED and sock_put() was called
when the link group was terminated. But later the application
calls close() on the socket, and then we get:

__smc_release():
    if (smc_sock->fallback)
        smc_sock->sk_state = SMC_CLOSED
        sock_put(smc_sock)

Again we set the smc_sock to CLOSED, though it is already
in the CLOSED state, and put the refcnt a second time, so the
following warning happens:

refcount_t: underflow; use-after-free.
WARNING: CPU: 5 PID: 860 at lib/refcount.c:28 refcount_warn_saturate+0x8d/0xf0
Modules linked in:
CPU: 5 PID: 860 Comm: nginx Not tainted 5.10.46+ #403
Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
RIP: 0010:refcount_warn_saturate+0x8d/0xf0
Code: 05 5c 1e b5 01 01 e8 52 25 bc ff 0f 0b c3 80 3d 4f 1e b5 01 00 75 ad 48

RSP: 0018:ffffc90000527e50 EFLAGS: 00010286
RAX: 0000000000000026 RBX: ffff8881300df2c0 RCX: 0000000000000027
RDX: 0000000000000000 RSI: ffff88813bd58040 RDI: ffff88813bd58048
RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000001
R10: ffff8881300df2c0 R11: ffffc90000527c78 R12: ffff8881300df340
R13: ffff8881300df930 R14: ffff88810b3dad80 R15: ffff8881300df4f8
FS:  00007f739de8fb80(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000a01b008 CR3: 0000000111b64003 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 smc_release+0x353/0x3f0
 __sock_release+0x3d/0xb0
 sock_close+0x11/0x20
 __fput+0x93/0x230
 task_work_run+0x65/0xa0
 exit_to_user_mode_prepare+0xf9/0x100
 syscall_exit_to_user_mode+0x27/0x190
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This patch adds a check in __smc_release() to make
sure we won't do an extra sock_put() and set the
socket to CLOSED when it is already in the CLOSED state.
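
The guard can be sketched in userspace as follows. This is a simplified model, not the kernel code: struct sketch_sock, sketch_put(), sketch_abort() and sketch_release() are hypothetical names, and the model tracks only the single reference that both paths try to drop:

```c
#include <stdbool.h>

enum sketch_state { SK_ACTIVE, SK_CLOSED };

/* Hypothetical stand-in for a socket with a reference count. */
struct sketch_sock {
	enum sketch_state state;
	int refcnt;
	bool fallback;
};

/* Drop one reference; returns false if that would underflow. */
static bool sketch_put(struct sketch_sock *sk)
{
	if (sk->refcnt <= 0)
		return false;
	sk->refcnt--;
	return true;
}

/* Link group termination path: close the socket and drop the reference. */
static void sketch_abort(struct sketch_sock *sk)
{
	if (sk->state != SK_CLOSED) {
		sk->state = SK_CLOSED;
		sketch_put(sk);
	}
}

/*
 * Release path for a fallback socket: close and put only when the
 * socket has not already been closed by the abort path.
 */
static void sketch_release(struct sketch_sock *sk)
{
	if (sk->fallback && sk->state != SK_CLOSED) {
		sk->state = SK_CLOSED;
		sketch_put(sk);
	}
}
```

Whichever path runs first drops the reference; the second one sees the CLOSED state and does nothing, so the count never goes below zero.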

Fixes: 51f1de79 ("net/smc: replace sock_put worker by socket refcounting")
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Nov, 2021 1 commit

net/smc: Introduce tracepoint for fallback · 48262608

Committed by Tony Lu on Nov 01, 2021

This introduces a tracepoint for SMC fallback to TCP, so that we can track
which connections fall back and why, and map the clcsock pointers to
/proc/net/tcp to find more details about the TCP connections. Compared with
kprobes or other dynamic tracing, tracepoints are stable and easy to use.
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Oct, 2021 1 commit

net/smc: Correct spelling mistake to TCPF_SYN_RECV · f3a3a0fe

Committed by Wen Gu on Oct 28, 2021

TCPF_SYN_RECV should be used here instead of TCP_SYN_RECV.
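
The difference between the two names is that TCP_* values are state numbers while TCPF_* values are one-bit-per-state flags. A small sketch, using SKETCH_-prefixed stand-ins (hypothetical names) whose values mirror the kernel's numbering for these two states, shows why passing the state number where a mask is expected goes wrong:

```c
/* State numbers, mirroring the kernel's values for these two states. */
enum { SKETCH_TCP_ESTABLISHED = 1, SKETCH_TCP_SYN_RECV = 3 };

/* Flag form: one bit per state, in the style of the TCPF_* definitions. */
#define SKETCH_TCPF_ESTABLISHED	(1 << SKETCH_TCP_ESTABLISHED)
#define SKETCH_TCPF_SYN_RECV	(1 << SKETCH_TCP_SYN_RECV)

/* Returns nonzero if the given state is contained in the bit mask. */
static int state_in_mask(int state, int mask)
{
	return ((1 << state) & mask) != 0;
}
```

With the state number 3 used as a mask, a socket in SYN_RECV does not match, while a socket in ESTABLISHED (bit 1) matches by accident; the flag form matches only SYN_RECV.
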
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Oct, 2021 4 commits

net/smc: extend LLC layer for SMC-Rv2 · b4ba4652

Committed by Karsten Graul on Oct 16, 2021

Add support for large v2 LLC control messages in smc_llc.c.
The new large work request buffer makes it possible to combine
control messages that previously had to be spread over several
packets into a single packet.
Add handling of the new v2 LLC messages.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: add listen processing for SMC-Rv2 · e49300a6

Committed by Karsten Graul on Oct 16, 2021

Implement the server side of the SMC-Rv2 processing. Process incoming
CLC messages, find eligible devices and check for a valid route to the
remote peer.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: add SMC-Rv2 connection establishment · e5c4744c

Committed by Karsten Graul on Oct 16, 2021

Send a CLC proposal message; the remote side processes this type of
message and determines the target GID. Check for a valid route to this
GID, and complete the connection establishment.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: prepare for SMC-Rv2 connection · 42042dbb

Committed by Karsten Graul on Oct 16, 2021

Prepare the connection establishment with SMC-Rv2. Detect eligible
RoCE cards and indicate all supported SMC modes for the connection.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Sep, 2021 2 commits

net/smc: keep static copy of system EID · 11a26c59

Committed by Karsten Graul on Sep 14, 2021

The system EID is retrieved using a registered ISM device each time
it is needed. This adds unnecessary complexity in all places where
the system EID is needed but no ISM device is at hand.
Simplify the code and save the system EID in a static variable in
smc_ism.c.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: add support for user defined EIDs · fa086662

Committed by Karsten Graul on Sep 14, 2021

SMC-Dv2 allows users to define EIDs, which makes it possible to create
separate name spaces so that users can cluster their SMC-Dv2 connections.
Add support for user defined EIDs and extend the generic netlink
interface so users can add, remove and dump EIDs.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Aug, 2021 1 commit

net/smc: Correct smc link connection counter in case of smc client · 64513d26

Committed by Guvenc Gulce on Aug 09, 2021

SMC clients may be assigned to a different link after the initial
connection between two peers was established. In such a case,
the connection counter was not correctly set.

Update the connection counter correctly when an SMC client connection
is assigned to a different SMC link.

Fixes: 07d51580 ("net/smc: Add connection counters for links")
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Tested-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Jun, 2021 1 commit

net: sock: introduce sk_error_report · e3ae2365

Committed by Alexander Aring on Jun 27, 2021

This patch introduces a function wrapper to call the sk_error_report
callback. This prepares for adding additional handling whenever
sk_error_report is called, for example to trace socket errors.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Jun, 2021 2 commits

net/smc: Make SMC statistics network namespace aware · 194730a9

Committed by Guvenc Gulce on Jun 16, 2021

Make the gathered SMC statistics network namespace aware: for each
namespace, collect a separate set of statistics.
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: Add SMC statistics support · e0e4b8fa

Committed by Guvenc Gulce on Jun 16, 2021

Add the ability to collect SMC statistics information. Per-cpu
variables are used to collect the statistic information for better
performance and for reducing concurrency pitfalls. The code that is
collecting statistic data is implemented in macros to increase code
reuse and readability.
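
The per-CPU pattern described above can be sketched in plain userspace C, assuming a fixed number of CPUs modeled as array slots. struct sketch_stats, SKETCH_STAT_TX() and sketch_sum() are hypothetical names for illustration and do not correspond to the kernel's per-cpu API or to the macros added by this patch:

```c
#define SKETCH_NR_CPUS 4	/* assumed CPU count for the sketch */

/* Hypothetical statistics block: one slot per CPU for each counter. */
struct sketch_stats {
	unsigned long tx_cnt[SKETCH_NR_CPUS];
	unsigned long tx_bytes[SKETCH_NR_CPUS];
};

/* Macro wrapper so call sites stay short; touches only one CPU's slot. */
#define SKETCH_STAT_TX(stats, cpu, len)			\
	do {						\
		(stats)->tx_cnt[(cpu)]++;		\
		(stats)->tx_bytes[(cpu)] += (len);	\
	} while (0)

/* Readers sum all per-CPU slots to obtain the total. */
static unsigned long sketch_sum(const unsigned long *slots)
{
	unsigned long total = 0;

	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		total += slots[i];
	return total;
}
```

Each writer updates only its own slot, so the hot path needs no shared counter; the cost moves to the reader, which has to add up the slots.
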
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 May, 2021 1 commit

smc: disallow TCP_ULP in smc_setsockopt() · 86214366

Committed by Cong Wang on May 05, 2021

syzbot is able to set up kTLS on an SMC socket, which coincidentally
uses sk_user_data too. Later, kTLS treats it as a psock, which triggers a
refcnt warning. The root cause is that smc_setsockopt() simply calls
TCP setsockopt(), which includes TCP_ULP. I do not think it makes
sense to set up kTLS on top of SMC sockets, so we should just disallow
this setup.

It is hard to find a commit to blame, but we can apply this patch
from the introduction of TCP_ULP onward.
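
A minimal sketch of rejecting one option before delegating the rest is shown below. The function names are hypothetical, the option numbers are assumed to match Linux's SOL_TCP and TCP_ULP, and the choice of EOPNOTSUPP as the error code is an assumption of this sketch rather than something stated in the message above:

```c
#include <errno.h>

/* Assumed option numbers, matching Linux's SOL_TCP and TCP_ULP. */
#define SKETCH_SOL_TCP	6
#define SKETCH_TCP_ULP	31

/* Hypothetical stand-in for the underlying TCP setsockopt; always succeeds. */
static int sketch_tcp_setsockopt(int level, int optname)
{
	(void)level;
	(void)optname;
	return 0;
}

/* Reject TCP_ULP up front; pass everything else to the TCP handler. */
static int sketch_smc_setsockopt(int level, int optname)
{
	if (level == SKETCH_SOL_TCP && optname == SKETCH_TCP_ULP)
		return -EOPNOTSUPP;
	return sketch_tcp_setsockopt(level, optname);
}
```

Filtering in the wrapper keeps the delegated handler unchanged while making sure the unsupported option never reaches it.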

Reported-and-tested-by: syzbot+b54a1ce86ba4a623b7f0@syzkaller.appspotmail.com
Fixes: 734942cc ("tcp: ULP infrastructure")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Apr, 2021 1 commit

net/smc: Remove redundant assignment to rc · 6fd6c483

Committed by Jiapeng Chong on Apr 27, 2021

Variable rc is set to zero but this value is never read as it is
overwritten with a new value later on, hence it is a redundant
assignment and can be removed.

Cleans up the following clang-analyzer warning:

net/smc/af_smc.c:1079:3: warning: Value stored to 'rc' is never read
[clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Dec, 2020 3 commits

net/smc: Introduce generic netlink interface for diagnostic purposes · e8372d9d

Committed by Guvenc Gulce on Dec 01, 2020

Introduce generic netlink interface infrastructure to expose
the diagnostic information regarding smc linkgroups, links and devices.
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/smc: Refactor smc ism v2 capability handling · 49407ae2

Committed by Guvenc Gulce on Dec 01, 2020

Encapsulate the smc ism v2 capability boolean value
in a function for better information hiding.
Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/smc: use helper smc_conn_abort() in listen processing · 8cf3f3e4

Committed by Karsten Graul on Dec 01, 2020

The helper smc_connect_abort() can be used by the listen processing
functions, too. Rename this helper to smc_conn_abort() to make its
purpose clearer.
No functional change.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

20 Nov, 2020 1 commit

net/smc: fix matching of existing link groups · 0530bd6e

Committed by Karsten Graul on Nov 18, 2020

With the multi-subnet support of SMC-Dv2 the match for existing link
groups should not include the vlanid of the network device.
Set ini->smcd_version accordingly before the call to smc_conn_create()
and use this value in smc_conn_create() to skip the vlanid check.

Fixes: 5c21c4cc ("net/smc: determine accepted ISM devices")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

01 Nov, 2020 1 commit

net/smc: improve return codes for SMC-Dv2 · 3752404a

Committed by Karsten Graul on Oct 31, 2020

To allow better problem diagnosis, this patch improves the return codes
for SMC-Dv2. A few more CLC DECLINE codes are defined and
sent to the peer when an SMC connection cannot be established.
There are now multiple SMC variations offered by the client, and
the server may encounter problems initializing all of them.
Because only one diagnosis code can be sent to the client, the decision
was made to send the first code that was encountered. Because the server
tries the variations in order of importance (SMC-Dv2, SMC-D, SMC-R),
this makes sure that the diagnosis code of the most important variation
is sent.
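
The "first code wins" rule can be sketched as follows, assuming the per-variation results are already available in priority order. first_decline_code() and its parameters are hypothetical names for illustration, not functions from the SMC code:

```c
/*
 * rc_by_priority[i] is the result of trying variation i, ordered from
 * most to least important: 0 means success, nonzero is a decline code.
 * Returns 0 and sets *chosen to the first variation that worked, or
 * returns the first decline code seen and sets *chosen to -1.
 */
static int first_decline_code(const int *rc_by_priority, int n, int *chosen)
{
	int first_rc = 0;

	*chosen = -1;
	for (int i = 0; i < n; i++) {
		if (rc_by_priority[i] == 0) {
			*chosen = i;	/* this variation initialized */
			return 0;
		}
		if (!first_rc)
			first_rc = rc_by_priority[i];	/* keep only the first failure */
	}
	return first_rc;
}
```

Because the variations are tried in order of importance, keeping the first failure means the reported code always belongs to the most important variation that was attempted.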

v2: initialize rc in smc_listen_v2_check().
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20201031181938.69903-1-kgraul@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

27 Oct, 2020 1 commit

net/smc: fix null pointer dereference in smc_listen_decline() · 4a9baf45

Committed by Karsten Graul on Oct 23, 2020

smc_listen_work() calls smc_listen_decline() at the label out_decl,
passing the ini pointer variable. But this pointer can still be NULL
when the label out_decl is reached.
Fix this by checking the ini variable in smc_listen_work() and calling
smc_listen_decline() with the result directly.
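
The shape of such a fix can be sketched like this: the caller evaluates the possibly-NULL pointer and hands the callee a plain value. All names here (struct sketch_ini, its first_contact_local field, sketch_decline(), sketch_listen_error()) are hypothetical and chosen only for illustration:

```c
#include <stddef.h>

/* Hypothetical init info; may still be NULL on an early error path. */
struct sketch_ini {
	int first_contact_local;
};

/* Decline handler takes the plain value, so it never touches the pointer. */
static int sketch_decline(int reason, int local_first)
{
	return reason * 10 + local_first;	/* arbitrary encoding for the demo */
}

/* Error path: check the pointer in the caller and pass the result on. */
static int sketch_listen_error(const struct sketch_ini *ini, int reason)
{
	return sketch_decline(reason, ini ? ini->first_contact_local : 0);
}
```

Passing the value instead of the pointer removes the dereference from the callee entirely, so the callee stays safe no matter which error path reaches it.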

Fixes: a7c9c5f4 ("net/smc: CLC accept / confirm V2")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

10 Oct, 2020 3 commits

net/smc: restore smcd_version when all ISM V2 devices failed to init · f29fa003

Committed by Karsten Graul on Oct 07, 2020

Field ini->smcd_version is set to SMC_V2 before calling
smc_listen_ism_init(). This clears the V1 bit that may be set. When all
matching ISM V2 devices fail to initialize, the smcd_version field
needs to be restored to allow any possible V1 devices to initialize.
For consistency, always go to the not_found label when no device was
found.
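
A save-and-restore sketch of this logic is shown below. The bit values SKETCH_V1 and SKETCH_V2, struct sketch_init, and both functions are assumptions made for illustration; they model the described behaviour and are not the kernel's definitions:

```c
#define SKETCH_V1 1	/* assumed bit values for illustration */
#define SKETCH_V2 2

struct sketch_init {
	int smcd_version;
};

/* Hypothetical V2 device init: returns 0 on success, nonzero on failure. */
static int sketch_ism_v2_init(int v2_devices_ok)
{
	return v2_devices_ok ? 0 : -1;
}

static void sketch_find_v2(struct sketch_init *ini, int v2_devices_ok)
{
	int saved = ini->smcd_version;	/* may contain the V1 bit */

	ini->smcd_version = SKETCH_V2;	/* clears V1 while trying V2 */
	if (sketch_ism_v2_init(v2_devices_ok) == 0)
		return;			/* a V2 device initialized */

	ini->smcd_version = saved;	/* restore so V1 devices can still be tried */
}
```

Without the restore, a failed V2 attempt would leave the field at V2 only, and a later V1 check would find its bit missing.
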
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/smc: cleanup buffer usage in smc_listen_work() · 9047a617

Committed by Karsten Graul on Oct 07, 2020

coccinelle informs about
net/smc/af_smc.c:1770:10-11: WARNING: opportunity for kzfree/kvfree_sensitive

It's not that kzfree() would help here; the memset() is done to prepare
the buffer for another socket receive.
Fix that warning message by reordering the calls. While at it, eliminate
the unneeded variable cclc2 and use sizeof(*buf) as above in the same
function. No functional changes.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/smc: consolidate unlocking in same function · c60a2cef

Committed by Karsten Graul on Oct 07, 2020

Static code checkers warn of inconsistent returns because the lgr mutex
is locked in one function and unlocked in a function called by the
locking function:
net/smc/af_smc.c:823 smc_connect_rdma() warn: inconsistent returns 'smc_client_lgr_pending'.
net/smc/af_smc.c:897 smc_connect_ism() warn: inconsistent returns 'smc_server_lgr_pending'.

Make the code consistent by doing the unlock in the same function that
fetches the lock. No functional changes.
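
The pattern can be sketched with a lock modeled as a simple counter, so that the example stays single-threaded. struct sketch_mutex, lk_acquire(), lk_release(), sketch_setup() and sketch_connect() are hypothetical names and do not reflect the actual SMC functions:

```c
/* Hypothetical lock modeled as a depth counter for a single-threaded demo. */
struct sketch_mutex {
	int depth;
};

static void lk_acquire(struct sketch_mutex *m) { m->depth++; }
static void lk_release(struct sketch_mutex *m) { m->depth--; }

/* Helper does its work but leaves unlocking to its caller. */
static int sketch_setup(int fail)
{
	return fail ? -1 : 0;
}

/* Every return path passes through the single unlock in this function. */
static int sketch_connect(struct sketch_mutex *m, int fail)
{
	int rc;

	lk_acquire(m);
	rc = sketch_setup(fail);
	lk_release(m);
	return rc;
}
```

Keeping the lock and unlock calls in one function lets both readers and static checkers verify the pairing without following the call chain.
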
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

04 Oct, 2020 1 commit

net/smc: send ISM devices with unique chid in CLC proposal · 839d696f

Committed by Karsten Graul on Oct 02, 2020

When building a CLC proposal message, the list of ISM devices does
not need to contain multiple devices that have the same chid value,
because all these devices use the same function in the end.
Improve smc_find_ism_v2_device_clnt() to collect only ISM devices that
have unique chid values.
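
Collecting only devices with distinct chid values can be sketched as follows. struct sketch_ism_dev and collect_unique_chid() are hypothetical names for illustration; the real function works on the kernel's ISM device list rather than on plain arrays:

```c
#include <stddef.h>

/* Hypothetical ISM device reduced to the one field that matters here. */
struct sketch_ism_dev {
	unsigned short chid;
};

/*
 * Copy into out[] only the first device seen for each chid value.
 * Returns the number of devices collected (at most max_out).
 */
static size_t collect_unique_chid(const struct sketch_ism_dev *devs, size_t n,
				  struct sketch_ism_dev *out, size_t max_out)
{
	size_t count = 0;

	for (size_t i = 0; i < n && count < max_out; i++) {
		int seen = 0;

		for (size_t j = 0; j < count; j++) {
			if (out[j].chid == devs[i].chid) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			out[count++] = devs[i];
	}
	return count;
}
```

The quadratic scan is acceptable in a sketch like this because the output list is small and bounded by max_out.
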
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Sep, 2020 11 commits

net/smc: CLC decline - V2 enhancements · e8d726c8