提交 9ecbfb06 编写于 作者: J John Fastabend 提交者: Alexei Starovoitov

bpf, sockmap: On receive programs try to fast track SK_PASS ingress

When we receive an skb and the ingress skb verdict program returns
SK_PASS we currently set the ingress flag and put it on the workqueue
so it can be turned into a sk_msg and put on the sk_msg ingress queue.
Then finally telling userspace with data_ready hook.

Here we observe that if the workqueue is empty then we can try to
convert into a sk_msg type and call data_ready directly without
bouncing through a workqueue. Its a common pattern to have a recv
verdict program for visibility that always returns SK_PASS. In this
case unless there is an ENOMEM error or we overrun the socket we
can avoid the workqueue completely only using it when we fall back
to error cases caused by memory pressure.

By doing this we eliminate another case where data may be dropped
if errors occur on memory limits in workqueue.

Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160226859704.5692.12929678876744977669.stgit@john-Precision-5820-Tower
上级 cfea28f8
...@@ -773,6 +773,7 @@ static void sk_psock_verdict_apply(struct sk_psock *psock, ...@@ -773,6 +773,7 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
{ {
struct tcp_skb_cb *tcp; struct tcp_skb_cb *tcp;
struct sock *sk_other; struct sock *sk_other;
int err = -EIO;
switch (verdict) { switch (verdict) {
case __SK_PASS: case __SK_PASS:
...@@ -784,8 +785,20 @@ static void sk_psock_verdict_apply(struct sk_psock *psock, ...@@ -784,8 +785,20 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
tcp = TCP_SKB_CB(skb); tcp = TCP_SKB_CB(skb);
tcp->bpf.flags |= BPF_F_INGRESS; tcp->bpf.flags |= BPF_F_INGRESS;
skb_queue_tail(&psock->ingress_skb, skb);
schedule_work(&psock->work); /* If the queue is empty then we can submit directly
* into the msg queue. If its not empty we have to
* queue work otherwise we may get OOO data. Otherwise,
* if sk_psock_skb_ingress errors will be handled by
* retrying later from workqueue.
*/
if (skb_queue_empty(&psock->ingress_skb)) {
err = sk_psock_skb_ingress(psock, skb);
}
if (err < 0) {
skb_queue_tail(&psock->ingress_skb, skb);
schedule_work(&psock->work);
}
break; break;
case __SK_REDIRECT: case __SK_REDIRECT:
sk_psock_skb_redirect(skb); sk_psock_skb_redirect(skb);
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册