提交 · 29a949325c6c90f1431db9af64592275c83d9b2a · openeuler / Kernel

11 9月, 2020 6 次提交

tcp: simplify tcp_set_congestion_control(): Always reinitialize · 29a94932

由 Neal Cardwell 提交于 9月 10, 2020

Now that the previous patches ensure that all call sites for
tcp_set_congestion_control() want to initialize congestion control, we
can simplify tcp_set_congestion_control() by removing the reinit
argument and the code to support it.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NKevin Yang <yyd@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>

29a94932

tcp: Simplify EBPF TCP_CONGESTION to always init CC · e7b10a4d

由 Neal Cardwell 提交于 9月 10, 2020

Now that the previous patch ensures we don't initialize the congestion
control twice, when EBPF sets the congestion control algorithm at
connection establishment we can simplify the code by simply
initializing the congestion control module at that time.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NKevin Yang <yyd@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>

e7b10a4d

tcp: Only init congestion control if not initialized already · 8919a9b3

由 Neal Cardwell 提交于 9月 10, 2020

Change tcp_init_transfer() to only initialize congestion control if it
has not been initialized already.

With this new approach, we can arrange things so that if the EBPF code
sets the congestion control by calling setsockopt(TCP_CONGESTION) then
tcp_init_transfer() will not re-initialize the CC module.

This is an approach that has the following beneficial properties:

(1) This allows CC module customizations made by the EBPF called in
    tcp_init_transfer() to persist, and not be wiped out by a later
    call to tcp_init_congestion_control() in tcp_init_transfer().

(2) Does not flip the order of EBPF and CC init, to avoid causing bugs
    for existing code upstream that depends on the current order.

(3) Does not cause 2 initializations for for CC in the case where the
    EBPF called in tcp_init_transfer() wants to set the CC to a new CC
    algorithm.

(4) Allows follow-on simplifications to the code in net/core/filter.c
    and net/ipv4/tcp_cong.c, which currently both have some complexity
    to special-case CC initialization to avoid double CC
    initialization if EBPF sets the CC.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NKevin Yang <yyd@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>

8919a9b3

net: Allow iterating sockmap and sockhash · 03653515

由 Lorenz Bauer 提交于 9月 09, 2020

Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and
hashtable implementation. sockmap and sockhash share the same iteration
context: a pointer to an arbitrary key and a pointer to a socket. Both
pointers may be NULL, and so BPF has to perform a NULL check before accessing
them. Technically it's not possible for sockhash iteration to yield a NULL
socket, but we ignore this to be able to use a single iteration point.

Iteration will visit all keys that remain unmodified during the lifetime of
the iterator. It may or may not visit newly added ones.

Switch from using rcu_dereference_raw to plain rcu_dereference, so we gain
another guard rail if CONFIG_PROVE_RCU is enabled.
Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200909162712.221874-3-lmb@cloudflare.com

03653515

net: sockmap: Remove unnecessary sk_fullsock checks · 654785a1

由 Lorenz Bauer 提交于 9月 09, 2020

The lookup paths for sockmap and sockhash currently include a check
that returns NULL if the socket we just found is not a full socket.
However, this check is not necessary. On insertion we ensure that
we have a full socket (caveat around sock_ops), so request sockets
are not a problem. Time-wait sockets are allocated separate from
the original socket and then fed into the hashdance. They don't
affect the sockets already stored in the sockmap.
Suggested-by: NJakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200909162712.221874-2-lmb@cloudflare.com

654785a1

bpf: Remove duplicate headers · e9091bb7

由 Chen Zhou 提交于 9月 08, 2020

Remove duplicate headers which are included twice.
Signed-off-by: NChen Zhou <chenzhou10@huawei.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200908132201.184005-1-chenzhou10@huawei.com

e9091bb7

03 9月, 2020 2 次提交

xsk: Fix use-after-free in failed shared_umem bind · 83cf5c68

由 Magnus Karlsson 提交于 9月 02, 2020

Fix use-after-free when a shared umem bind fails. The code incorrectly
tried to free the allocated buffer pool both in the bind code and then
later also when the socket was released. Fix this by setting the
buffer pool pointer to NULL after the bind code has freed the pool, so
that the socket release code will not try to free the pool. This is
the same solution as the regular, non-shared umem code path has. This
was missing from the shared umem path.

Fixes: b5aea28d ("xsk: Add shared umem support between queue ids")
Reported-by: syzbot+5334f62e4d22804e646a@syzkaller.appspotmail.com
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1599032164-25684-1-git-send-email-magnus.karlsson@intel.com

83cf5c68

xsk: Fix null check on error return path · 1d6fd78a

由 Gustavo A. R. Silva 提交于 9月 02, 2020

Currently, dma_map is being checked, when the right object identifier
to be null-checked is dma_map->dma_pages, instead.

Fix this by null-checking dma_map->dma_pages.

Fixes: 921b6869 ("xsk: Enable sharing of dma mappings")
Addresses-Coverity-ID: 1496811 ("Logically dead code")
Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20200902150750.GA7257@embeddedor

1d6fd78a

02 9月, 2020 7 次提交

xsk: Fix possible segfault at xskmap entry insertion · 968be23c

由 Magnus Karlsson 提交于 9月 02, 2020

Fix possible segfault when entry is inserted into xskmap. This can
happen if the socket is in a state where the umem has been set up, the
Rx ring created but it has yet to be bound to a device. In this case
the pool has not yet been created and we cannot reference it for the
existence of the fill ring. Fix this by removing the whole
xsk_is_setup_for_bpf_map function. Once upon a time, it was used to
make sure that the Rx and fill rings where set up before the driver
could call xsk_rcv, since there are no tests for the existence of
these rings in the data path. But these days, we have a state variable
that we test instead. When it is XSK_BOUND, everything has been set up
correctly and the socket has been bound. So no reason to have the
xsk_is_setup_for_bpf_map function anymore.

Fixes: 7361f9c3 ("xsk: Move fill and completion rings to buffer pool")
Reported-by: syzbot+febe51d44243fbc564ee@syzkaller.appspotmail.com
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1599037569-26690-1-git-send-email-magnus.karlsson@intel.com

968be23c

xsk: Fix possible segfault in xsk umem diagnostics · 53ea2076

由 Magnus Karlsson 提交于 9月 02, 2020

Fix possible segfault in the xsk diagnostics code when dumping
information about the umem. This can happen when a umem has been
created, but the socket has not been bound yet. In this case, the xsk
buffer pool does not exist yet and we cannot dump the information
that was moved from the umem to the buffer pool. Fix this by testing
for the existence of the buffer pool and if not there, do not dump any
of that information.

Fixes: c2d3d6a4 ("xsk: Move queue_id, dev and need_wakeup to buffer pool")
Fixes: 7361f9c3 ("xsk: Move fill and completion rings to buffer pool")
Reported-by: syzbot+3f04d36b7336f7868066@syzkaller.appspotmail.com
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/1599036743-26454-1-git-send-email-magnus.karlsson@intel.com

53ea2076

net: openvswitch: fixes crash if nf_conncount_init() fails · e0afe914

由 Eelco Chaudron 提交于 8月 31, 2020

If nf_conncount_init fails currently the dispatched work is not canceled,
causing problems when the timer fires. This change fixes this by not
scheduling the work until all initialization is successful.

Fixes: a65878d6 ("net: openvswitch: fixes potential deadlock in dp cleanup code")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
Reviewed-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0afe914

net/tls: Implement getsockopt SOL_TLS TLS_RX · ffa81fa4

由 Yutaro Hayakawa 提交于 9月 01, 2020

Implement the getsockopt SOL_TLS TLS_RX which is currently missing. The
primary usecase is to use it in conjunction with TCP_REPAIR to
checkpoint/restore the TLS record layer state.

TLS connection state usually exists on the user space library. So
basically we can easily extract it from there, but when the TLS
connections are delegated to the kTLS, it is not the case. We need to
have a way to extract the TLS state from the kernel for both of TX and
RX side.

The new TLS_RX getsockopt copies the crypto_info to user in the same
way as TLS_TX does.

We have described use cases in our research work in Netdev 0x14
Transport Workshop [1].

Also, there is an TLS implementation called tlse [2] which supports
TLS connection migration. They have support of kTLS and their code
shows that they are expecting the future support of this option.

[1] https://speakerdeck.com/yutarohayakawa/prism-proxies-without-the-pain
[2] https://github.com/eduardsui/tlseSigned-off-by: NYutaro Hayakawa <yhayakawa3720@gmail.com>
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ffa81fa4

net: openvswitch: remove unused keep_flows · e6896163

由 Tonghao Zhang 提交于 9月 01, 2020

keep_flows was introduced by [1], which used as flag to delete flows or not.
When rehashing or expanding the table instance, we will not flush the flows.
Now don't use it anymore, remove it.

[1] - https://github.com/openvswitch/ovs/commit/acd051f1761569205827dc9b037e15568a8d59f8
Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6896163

net: openvswitch: refactor flow free function · df68d64e

由 Tonghao Zhang 提交于 9月 01, 2020

Decrease table->count and ufid_count unconditionally,
because we only don't use count or ufid_count to count
when flushing the flows. To simplify the codes, we
remove the "count" argument of table_instance_flow_free.

To avoid a bug when deleting flows in the future, add
WARN_ON in flush flows function.

Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df68d64e

net: openvswitch: improve the coding style · cf3266ad

由 Tonghao Zhang 提交于 9月 01, 2020

Not change the logic, just improve the coding style.

Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf3266ad

01 9月, 2020 20 次提交

net: ipv4: remove unused arg exact_dif in compute_score · 34e1ec31

由 Miaohe Lin 提交于 8月 31, 2020

The arg exact_dif is not used anymore, remove it. inet_exact_dif_match()
is no longer needed after the above is removed, so remove it too.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

34e1ec31

net: ipv6: remove unused arg exact_dif in compute_score · 3f7d820b

由 Miaohe Lin 提交于 8月 31, 2020

The arg exact_dif is not used anymore, remove it. inet6_exact_dif_match()
is no longer needed after the above is removed, remove it too.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f7d820b

tipc: Remove unused macro TIPC_NACK_INTV · 622a63f6

由 YueHaibing 提交于 8月 29, 2020

There is no caller in tree any more.
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

622a63f6

tipc: Remove unused macro TIPC_FWD_MSG · ff007a9b

由 YueHaibing 提交于 8月 29, 2020

There is no caller in tree any more.
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff007a9b

mptcp: Remove unused macro MPTCP_SAME_STATE · b1fd4470

由 YueHaibing 提交于 8月 29, 2020

There is no caller in tree any more.
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1fd4470

net: clean up codestyle · 5af68891

由 Miaohe Lin 提交于 8月 29, 2020

This is a pure codestyle cleanup patch. No functional change intended.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5af68891

net: Use helper macro IP_MAX_MTU in __ip_append_data() · cbc08a33

由 Miaohe Lin 提交于 8月 29, 2020

What 0xFFFF means here is actually the max mtu of a ip packet. Use help
macro IP_MAX_MTU here.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cbc08a33

openvswitch: using ip6_fragment in ipv6_stub · a7c978c6

由 wenxu 提交于 8月 28, 2020

Using ipv6_stub->ipv6_fragment to avoid the netfilter dependency
Signed-off-by: Nwenxu <wenxu@ucloud.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a7c978c6

ipv6: add ipv6_fragment hook in ipv6_stub · 1d97898b

由 wenxu 提交于 8月 28, 2020

Add ipv6_fragment to ipv6_stub to avoid calling netfilter when
access ip6_fragment.
Signed-off-by: Nwenxu <wenxu@ucloud.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d97898b

xsk: Add shared umem support between devices · a1132430

由 Magnus Karlsson 提交于 8月 28, 2020

Add support to share a umem between different devices. This mode
can be invoked with the XDP_SHARED_UMEM bind flag. Previously,
sharing was only supported within the same device. Note that when
sharing a umem between devices, just as in the case of sharing a
umem between queue ids, you need to create a fill ring and a
completion ring and tie them to the socket (with two setsockopts,
one for each ring) before you do the bind with the
XDP_SHARED_UMEM flag. This so that the single-producer
single-consumer semantics of the rings can be upheld.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-13-git-send-email-magnus.karlsson@intel.com

a1132430

xsk: Add shared umem support between queue ids · b5aea28d

由 Magnus Karlsson 提交于 8月 28, 2020

Add support to share a umem between queue ids on the same
device. This mode can be invoked with the XDP_SHARED_UMEM bind
flag. Previously, sharing was only supported within the same
queue id and device, and you shared one set of fill and
completion rings. However, note that when sharing a umem between
queue ids, you need to create a fill ring and a completion ring
and tie them to the socket before you do the bind with the
XDP_SHARED_UMEM flag. This so that the single-producer
single-consumer semantics can be upheld.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-12-git-send-email-magnus.karlsson@intel.com

b5aea28d

xsk: Enable sharing of dma mappings · 921b6869

由 Magnus Karlsson 提交于 8月 28, 2020

Enable the sharing of dma mappings by moving them out from the buffer
pool. Instead we put each dma mapped umem region in a list in the umem
structure. If dma has already been mapped for this umem and device, it
is not mapped again and the existing dma mappings are reused.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-9-git-send-email-magnus.karlsson@intel.com

921b6869

xsk: Move addrs from buffer pool to umem · 7f7ffa4e

由 Magnus Karlsson 提交于 8月 28, 2020

Replicate the addrs pointer in the buffer pool to the umem. This mapping
will be the same for all buffer pools sharing the same umem. In the
buffer pool we leave the addrs pointer for performance reasons.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-8-git-send-email-magnus.karlsson@intel.com

7f7ffa4e

xsk: Move xsk_tx_list and its lock to buffer pool · a5aa8e52

由 Magnus Karlsson 提交于 8月 28, 2020

Move the xsk_tx_list and the xsk_tx_list_lock from the umem to
the buffer pool. This so that we in a later commit can share the
umem between multiple HW queues. There is one xsk_tx_list per
device and queue id, so it should be located in the buffer pool.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-7-git-send-email-magnus.karlsson@intel.com

a5aa8e52

xsk: Move queue_id, dev and need_wakeup to buffer pool · c2d3d6a4

由 Magnus Karlsson 提交于 8月 28, 2020

Move queue_id, dev, and need_wakeup from the umem to the
buffer pool. This so that we in a later commit can share the umem
between multiple HW queues. There is one buffer pool per dev and
queue id, so these variables should belong to the buffer pool, not
the umem. Need_wakeup is also something that is set on a per napi
level, so there is usually one per device and queue id. So move
this to the buffer pool too.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-6-git-send-email-magnus.karlsson@intel.com

c2d3d6a4

xsk: Move fill and completion rings to buffer pool · 7361f9c3

由 Magnus Karlsson 提交于 8月 28, 2020

Move the fill and completion rings from the umem to the buffer
pool. This so that we in a later commit can share the umem
between multiple HW queue ids. In this case, we need one fill and
completion ring per queue id. As the buffer pool is per queue id
and napi id this is a natural place for it and one umem
struture can be shared between these buffer pools.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-5-git-send-email-magnus.karlsson@intel.com

7361f9c3

xsk: Create and free buffer pool independently from umem · 1c1efc2a

由 Magnus Karlsson 提交于 8月 28, 2020

Create and free the buffer pool independently from the umem. Move
these operations that are performed on the buffer pool from the
umem create and destroy functions to new create and destroy
functions just for the buffer pool. This so that in later commits
we can instantiate multiple buffer pools per umem when sharing a
umem between HW queues and/or devices. We also erradicate the
back pointer from the umem to the buffer pool as this will not
work when we introduce the possibility to have multiple buffer
pools per umem.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-4-git-send-email-magnus.karlsson@intel.com

1c1efc2a

xsk: i40e: ice: ixgbe: mlx5: Rename xsk zero-copy driver interfaces · c4655761

由 Magnus Karlsson 提交于 8月 28, 2020

Rename the AF_XDP zero-copy driver interface functions to better
reflect what they do after the replacement of umems with buffer
pools in the previous commit. Mostly it is about replacing the
umem name from the function names with xsk_buff and also have
them take the a buffer pool pointer instead of a umem. The
various ring functions have also been renamed in the process so
that they have the same naming convention as the internal
functions in xsk_queue.h. This so that it will be clearer what
they do and also for consistency.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com

c4655761

xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umem · 1742b3d5

由 Magnus Karlsson 提交于 8月 28, 2020

Replace the explicit umem reference passed to the driver in AF_XDP
zero-copy mode with the buffer pool instead. This in preparation for
extending the functionality of the zero-copy mode so that umems can be
shared between queues on the same netdev and also between netdevs. In
this commit, only an umem reference has been added to the buffer pool
struct. But later commits will add other entities to it. These are
going to be entities that are different between different queue ids
and netdevs even though the umem is shared between them.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com

1742b3d5

netlink: policy: correct validation type check · c30a3c95

由 Johannes Berg 提交于 8月 31, 2020

In the policy export for binary attributes I erroneously used
a != NLA_VALIDATE_NONE comparison instead of checking for the
two possible values, which meant that if a validation function
pointer ended up aliasing the min/max as negatives, we'd hit
a warning in nla_get_range_unsigned().

Fix this to correctly check for only the two types that should
be handled here, i.e. range with or without warn-too-long.

Reported-by: syzbot+353df1490da781637624@syzkaller.appspotmail.com
Fixes: 8aa26c57 ("netlink: make NLA_BINARY validation more flexible")
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c30a3c95

29 8月, 2020 1 次提交

netlabel: remove unused param from audit_log_format() · 0f091e43

由 Alex Dewar 提交于 8月 28, 2020

Commit d3b990b7 ("netlabel: fix problems with mapping removal")
added a check to return an error if ret_val != 0, before ret_val is
later used in a log message. Now it will unconditionally print "...
res=1". So just drop the check.

Addresses-Coverity: ("Dead code")
Fixes: d3b990b7 ("netlabel: fix problems with mapping removal")
Signed-off-by: NAlex Dewar <alex.dewar90@gmail.com>
Acked-by: NPaul Moore <paul@paul-moore.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f091e43

28 8月, 2020 3 次提交

net: add option to not create fall-back tunnels in root-ns as well · 316cdaa1

由 Mahesh Bandewar 提交于 8月 26, 2020

The sysctl that was added  earlier by commit 79134e6c ("net: do
not create fallback tunnels for non-default namespaces") to create
fall-back only in root-ns. This patch enhances that behavior to provide
option not to create fallback tunnels in root-ns as well. Since modules
that create fallback tunnels could be built-in and setting the sysctl
value after booting is pointless, so added a kernel cmdline options to
change this default. The default setting is preseved for backward
compatibility. The kernel command line option of fb_tunnels=initns will
set the sysctl value to 1 and will create fallback tunnels only in initns
while kernel cmdline fb_tunnels=none will set the sysctl value to 2 and
fallback tunnels are skipped in every netns.
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Maciej Zenczykowski <maze@google.com>
Cc: Jian Yang <jianyang@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

316cdaa1

bpf: Relax max_entries check for most of the inner map types · 134fede4

由 Martin KaFai Lau 提交于 8月 27, 2020

Most of the maps do not use max_entries during verification time.
Thus, those map_meta_equal() do not need to enforce max_entries
when it is inserted as an inner map during runtime.  The max_entries
check is removed from the default implementation bpf_map_meta_equal().

The prog_array_map and xsk_map are exception.  Its map_gen_lookup
uses max_entries to generate inline lookup code.  Thus, they will
implement its own map_meta_equal() to enforce max_entries.
Since there are only two cases now, the max_entries check
is not refactored and stays in its own .c file.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200828011813.1970516-1-kafai@fb.com

134fede4

bpf: Add map_meta_equal map ops · f4d05259

由 Martin KaFai Lau 提交于 8月 27, 2020

Some properties of the inner map is used in the verification time.
When an inner map is inserted to an outer map at runtime,
bpf_map_meta_equal() is currently used to ensure those properties
of the inserting inner map stays the same as the verification
time.

In particular, the current bpf_map_meta_equal() checks max_entries which
turns out to be too restrictive for most of the maps which do not use
max_entries during the verification time. It limits the use case that
wants to replace a smaller inner map with a larger inner map. There are
some maps do use max_entries during verification though. For example,
the map_gen_lookup in array_map_ops uses the max_entries to generate
the inline lookup code.

To accommodate differences between maps, the map_meta_equal is added
to bpf_map_ops. Each map-type can decide what to check when its
map is used as an inner map during runtime.

Also, some map types cannot be used as an inner map and they are
currently black listed in bpf_map_meta_alloc() in map_in_map.c.
It is not unusual that the new map types may not aware that such
blacklist exists. This patch enforces an explicit opt-in
and only allows a map to be used as an inner map if it has
implemented the map_meta_equal ops. It is based on the
discussion in [1].

All maps that support inner map has its map_meta_equal points
to bpf_map_meta_equal in this patch. A later patch will
relax the max_entries check for most maps. bpf_types.h
counts 28 map types. This patch adds 23 ".map_meta_equal"
by using coccinelle. -5 for
BPF_MAP_TYPE_PROG_ARRAY
BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE
BPF_MAP_TYPE_STRUCT_OPS
BPF_MAP_TYPE_ARRAY_OF_MAPS
BPF_MAP_TYPE_HASH_OF_MAPS

The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc()
is moved such that the same error is returned.

[1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com

f4d05259

27 8月, 2020 1 次提交

tipc: fix use-after-free in tipc_bcast_get_mode · fdeba99b

由 Hoang Huu Le 提交于 8月 27, 2020

Syzbot has reported those issues as:

==================================================================
BUG: KASAN: use-after-free in tipc_bcast_get_mode+0x3ab/0x400 net/tipc/bcast.c:759
Read of size 1 at addr ffff88805e6b3571 by task kworker/0:6/3850

CPU: 0 PID: 3850 Comm: kworker/0:6 Not tainted 5.8.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events tipc_net_finalize_work

Thread 1's call trace:
[...]
  kfree+0x103/0x2c0 mm/slab.c:3757 <- bcbase releasing
  tipc_bcast_stop+0x1b0/0x2f0 net/tipc/bcast.c:721
  tipc_exit_net+0x24/0x270 net/tipc/core.c:112
[...]

Thread 2's call trace:
[...]
  tipc_bcast_get_mode+0x3ab/0x400 net/tipc/bcast.c:759 <- bcbase
has already been freed by Thread 1

  tipc_node_broadcast+0x9e/0xcc0 net/tipc/node.c:1744
  tipc_nametbl_publish+0x60b/0x970 net/tipc/name_table.c:752
  tipc_net_finalize net/tipc/net.c:141 [inline]
  tipc_net_finalize+0x1fa/0x310 net/tipc/net.c:131
  tipc_net_finalize_work+0x55/0x80 net/tipc/net.c:150
[...]

==================================================================
BUG: KASAN: use-after-free in tipc_named_reinit+0xef/0x290 net/tipc/name_distr.c:344
Read of size 8 at addr ffff888052ab2000 by task kworker/0:13/30628
CPU: 0 PID: 30628 Comm: kworker/0:13 Not tainted 5.8.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events tipc_net_finalize_work
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1f0/0x31e lib/dump_stack.c:118
 print_address_description+0x66/0x5a0 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report+0x132/0x1d0 mm/kasan/report.c:530
 tipc_named_reinit+0xef/0x290 net/tipc/name_distr.c:344
 tipc_net_finalize+0x85/0xe0 net/tipc/net.c:138
 tipc_net_finalize_work+0x50/0x70 net/tipc/net.c:150
 process_one_work+0x789/0xfc0 kernel/workqueue.c:2269
 worker_thread+0xaa4/0x1460 kernel/workqueue.c:2415
 kthread+0x37e/0x3a0 drivers/block/aoe/aoecmd.c:1234
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
[...]
Freed by task 14058:
 save_stack mm/kasan/common.c:48 [inline]
 set_track mm/kasan/common.c:56 [inline]
 kasan_set_free_info mm/kasan/common.c:316 [inline]
 __kasan_slab_free+0x114/0x170 mm/kasan/common.c:455
 __cache_free mm/slab.c:3426 [inline]
 kfree+0x10a/0x220 mm/slab.c:3757
 tipc_exit_net+0x29/0x50 net/tipc/core.c:113
 ops_exit_list net/core/net_namespace.c:186 [inline]
 cleanup_net+0x708/0xba0 net/core/net_namespace.c:603
 process_one_work+0x789/0xfc0 kernel/workqueue.c:2269
 worker_thread+0xaa4/0x1460 kernel/workqueue.c:2415
 kthread+0x37e/0x3a0 drivers/block/aoe/aoecmd.c:1234
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

Fix it by calling flush_scheduled_work() to make sure the
tipc_net_finalize_work() stopped before releasing bcbase object.

Reported-by: syzbot+6ea1f7a8df64596ef4d7@syzkaller.appspotmail.com
Reported-by: syzbot+e9cc557752ab126c1b99@syzkaller.appspotmail.com
Acked-by: NJon Maloy <jmaloy@redhat.com>
Signed-off-by: NHoang Huu Le <hoang.h.le@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fdeba99b

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功