Unverified commit 02df2e99, authored by openeuler-ci-bot, committed by Gitee

!948 Dependency of Kmesh on Kernel Modification

Merge Pull Request from: @bitcoffee 
 
Kmesh performs Layer 7 orchestration in the kernel to forward client requests to the actual backend service nodes. Orchestration is per flow: when the first message is sent, Kmesh parses the Layer 7 payload, completes orchestration, and only then establishes the real connection. This requires a pseudo link to be established in the connect phase and the actual link to be established in the sendmsg phase.
Therefore, the following modifications are involved:
1. The connect phase must support the ULP framework, so that the L4 connect function can be replaced with a user-defined connect function.
2. After the L4 connect function returns, the L3 layer can invoke the actual link establishment logic based on the error code and modify the return value of inet_stream_connect.
3. In the sendmsg phase, the sock status determines whether deferred link establishment is enabled.
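
The two-phase flow described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every `mock_*` name, the state enum, and the routing rule in `mock_pick_backend` are hypothetical and are not kernel or Kmesh APIs; only the flag name `bpf_defer_connect` is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum mock_state { MOCK_CLOSED, MOCK_SYN_SENT, MOCK_ESTABLISHED };

struct mock_sock {
    enum mock_state state;
    unsigned int bpf_defer_connect:1; /* flag name mirrors the patch */
    int backend_id;                   /* -1 until L7 routing has run */
};

/* Assumed L7 rule for illustration: route on the first payload byte. */
static int mock_pick_backend(const char *payload, size_t len)
{
    if (payload == NULL || len == 0)
        return -1;
    return payload[0] == 'G' ? 1 : 2;
}

/* Connect phase: set up the pseudo link only, no backend chosen yet. */
int mock_connect(struct mock_sock *sk)
{
    assert(sk != NULL);
    if (sk->state != MOCK_CLOSED)
        return -EISCONN;
    sk->state = MOCK_SYN_SENT;
    sk->bpf_defer_connect = 1;
    sk->backend_id = -1;
    return 0;
}

/* Poll: report writable while deferred so the caller goes on to send. */
bool mock_poll_writable(const struct mock_sock *sk)
{
    assert(sk != NULL);
    return sk->state == MOCK_ESTABLISHED ||
           (sk->state == MOCK_SYN_SENT && sk->bpf_defer_connect);
}

/* Sendmsg phase: the first message triggers the actual link setup. */
long mock_sendmsg(struct mock_sock *sk, const char *payload, size_t len)
{
    assert(sk != NULL);
    if (sk->state == MOCK_SYN_SENT && sk->bpf_defer_connect) {
        int backend = mock_pick_backend(payload, len);

        if (backend < 0)
            return -EINVAL;
        sk->backend_id = backend;
        sk->bpf_defer_connect = 0;
        sk->state = MOCK_ESTABLISHED;
    }
    if (sk->state != MOCK_ESTABLISHED)
        return -ENOTCONN;
    return (long)len;
}
```

In this model a caller runs `mock_connect`, sees the socket as writable through `mock_poll_writable`, and the backend is selected only when `mock_sendmsg` sees the first payload.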

Submission Instructions:
1. Add a writable tracepoint so that the return value of __inet_stream_connect can be modified in inet_stream_connect.
2. Add the bpf_defer_connect flag to indicate whether the eBPF deferred link establishment logic is enabled.
3. Add ULP framework support for eBPF programs: an eBPF program can now set TCP_ULP through bpf_setsockopt.
4. Add a sockops callback type, BPF_SOCK_OPS_TCP_DEFER_CONNECT_CB. It is used to invoke the eBPF program from the kernel module and to identify the call when Kmesh defers link establishment.
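
The idea behind item 1, a hook that receives a pointer to the return value and may rewrite it before the caller sees it, can be sketched in user space as follows. All `mock_*` names are hypothetical, and the policy in `mock_defer_hook` (mapping `-EINPROGRESS` to success) is an assumption chosen for illustration, not the behavior of the Kmesh eBPF program.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook type modelling a writable tracepoint handler:
 * it gets a pointer to the return value and may overwrite it. */
typedef void (*connect_ret_hook_t)(int *err);

static connect_ret_hook_t connect_ret_hook; /* NULL when nothing is attached */

void mock_attach_connect_ret(connect_ret_hook_t hook)
{
    connect_ret_hook = hook;
}

/* Models trace_connect_ret(&err): a no-op unless a hook is attached. */
static void mock_trace_connect_ret(int *err)
{
    if (connect_ret_hook != NULL)
        connect_ret_hook(err);
}

/* Stand-in for __inet_stream_connect: returns the result it is given. */
static int mock_inner_connect(int simulated_result)
{
    return simulated_result;
}

/* Models the patched inet_stream_connect: call, trace, then return. */
int mock_stream_connect(int simulated_result)
{
    int err = mock_inner_connect(simulated_result);

    mock_trace_connect_ret(&err);
    return err;
}

/* Assumed policy for illustration only: treat -EINPROGRESS as success. */
void mock_defer_hook(int *err)
{
    assert(err != NULL);
    if (*err == -EINPROGRESS)
        *err = 0;
}
```

Without an attached hook, `mock_stream_connect` returns the inner result unchanged, which matches the intent that the tracepoint has no effect unless a program is attached.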
 
Link:https://gitee.com/openeuler/kernel/pulls/948 

Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com> 
......@@ -240,10 +240,11 @@ struct inet_sock {
                     nodefrag:1;
     __u8            bind_address_no_port:1,
                     recverr_rfc4884:1,
-                    defer_connect:1; /* Indicates that fastopen_connect is set
+                    defer_connect:1, /* Indicates that fastopen_connect is set
                                       * and cookie exists so we defer connect
                                       * until first data frame is written
                                       */
+                    bpf_defer_connect:1;
     __u8            rcv_tos;
     __u8            convert_csum;
     int             uc_index;
......
......@@ -203,6 +203,20 @@ TRACE_EVENT(inet_sock_set_state,
           show_tcp_state_name(__entry->newstate))
 );
+#undef NET_DECLARE_TRACE
+#ifdef DECLARE_TRACE_WRITABLE
+#define NET_DECLARE_TRACE(call, proto, args, size) \
+    DECLARE_TRACE_WRITABLE(call, PARAMS(proto), PARAMS(args), size)
+#else
+#define NET_DECLARE_TRACE(call, proto, args, size) \
+    DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
+#endif
+NET_DECLARE_TRACE(connect_ret,
+    TP_PROTO(int *err),
+    TP_ARGS(err),
+    sizeof(int));
 #endif /* _TRACE_SOCK_H */
 /* This part must be outside protection */
......
......@@ -4872,6 +4872,7 @@ enum {
                      * by the kernel or the
                      * earlier bpf-progs.
                      */
+    BPF_SOCK_OPS_TCP_DEFER_CONNECT_CB,/* call ebpf to defer connect*/
 };
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
......
......@@ -4837,6 +4837,13 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
                         TCP_CA_NAME_MAX-1));
             name[TCP_CA_NAME_MAX-1] = 0;
             ret = tcp_set_congestion_control(sk, name, false, true);
+        } else if (optname == TCP_ULP) {
+            char name[TCP_ULP_NAME_MAX];
+            strncpy(name, optval, min_t(long, optlen,
+                        TCP_ULP_NAME_MAX - 1));
+            name[TCP_ULP_NAME_MAX - 1] = 0;
+            return tcp_set_ulp(sk, name);
         } else {
             struct inet_connection_sock *icsk = inet_csk(sk);
             struct tcp_sock *tp = tcp_sk(sk);
......
......@@ -729,6 +729,7 @@ int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr,
     lock_sock(sock->sk);
     err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0);
     release_sock(sock->sk);
+    trace_connect_ret(&err);
     return err;
 }
 EXPORT_SYMBOL(inet_stream_connect);
......
......@@ -590,7 +590,8 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
         if (tp->urg_data & TCP_URG_VALID)
             mask |= EPOLLPRI;
-    } else if (state == TCP_SYN_SENT && inet_sk(sk)->defer_connect) {
+    } else if (state == TCP_SYN_SENT &&
+           (inet_sk(sk)->defer_connect || inet_sk(sk)->bpf_defer_connect)) {
         /* Active TCP fastopen socket with defer_connect
          * Return EPOLLOUT so application can call write()
          * in order for kernel to generate SYN+data
......
......@@ -4872,6 +4872,7 @@ enum {
                      * by the kernel or the
                      * earlier bpf-progs.
                      */
+    BPF_SOCK_OPS_TCP_DEFER_CONNECT_CB,/* call ebpf to defer connect*/
 };
 /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
......