Commits · 0146dca70b877b73c5fd9c67912b8a0ca8a7bac7 · openeuler / Kernel

28 Apr, 2020 1 commit

xfrm: add support for UDPv6 encapsulation of ESP · 0146dca7

Committed by Sabrina Dubroca on Apr 27, 2020

This patch adds support for encapsulation of ESP over UDPv6. The code
is very similar to the IPv4 encapsulation implementation, and makes it
easy to add espintcp on IPv6 as a follow-up.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

15 Apr, 2020 3 commits

Bluetooth: Clear HCI_LL_RPA_RESOLUTION flag on reset · 2eb71a3a

Committed by Marcel Holtmann on Apr 09, 2020

When the controller is reset or power cycled, the HCI_LL_RPA_RESOLUTION
flag, which indicates whether controller-based address resolution is
active, also needs to be cleared.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

Bluetooth: Enable LE Enhanced Connection Complete event. · ff3b8df2

Committed by Marcel Holtmann on Apr 09, 2020

In case LL Privacy is supported by the controller, it is also a good
idea to use the LE Enhanced Connection Complete event for getting all
information about the new connection and its addresses.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

Bluetooth: Sort list of LE features constants · 55beec10

Committed by Marcel Holtmann on Apr 09, 2020

The list of LE features constants has gotten a bit confused. It lost its
order and gained duplicates. Clean this up.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

14 Apr, 2020 1 commit

cfg80211: fix kernel-doc notation · a710d214

Committed by Lothar Rubusch on Apr 08, 2020

Add missing kernel-doc annotations and fix the related warnings
reported by 'make htmldocs'.
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Link: https://lore.kernel.org/r/20200408231013.28370-1-l.rubusch@gmail.com
[fix indentation, attribute references]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

08 Apr, 2020 3 commits

net: ipv6: do not consider routes via gateways for anycast address check · 03e2a984

Committed by Tim Stallard on Apr 03, 2020

The behaviour for what is considered an anycast address changed in
commit 45e4fd26 ("ipv6: Only create RTF_CACHE routes after
encountering pmtu exception"). This now considers the first
address in a subnet where there is a route via a gateway
to be an anycast address.

This breaks path MTU discovery and traceroutes when a host in a
remote network uses the address at the start of a prefix
(eg 2600:: advertised as 2600::/48 in the DFZ) as ICMP errors
will not be sent to anycast addresses.

This patch excludes any routes with a gateway, or via point to
point links, matching the previous behaviour of
rt6_is_gw_or_nonexthop in net/ipv6/route.c.

This can be tested with:

    ip link add v1 type veth peer name v2
    ip netns add test
    ip netns exec test ip link set lo up
    ip link set v2 netns test
    ip link set v1 up
    ip netns exec test ip link set v2 up
    ip addr add 2001:db8::1/64 dev v1 nodad
    ip addr add 2001:db8:100:: dev lo nodad
    ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad
    ip netns exec test ip route add unreachable 2001:db8:1::1
    ip netns exec test ip route add 2001:db8:100::/64 via 2001:db8::1
    ip netns exec test sysctl net.ipv6.conf.all.forwarding=1
    ip route add 2001:db8:1::1 via 2001:db8::2
    ping -I 2001:db8::1 2001:db8:1::1 -c1
    ping -I 2001:db8:100:: 2001:db8:1::1 -c1
    ip addr delete 2001:db8:100:: dev lo
    ip netns delete test

Currently the first ping will get back a destination unreachable ICMP
error, but the second will never get a response, with "icmp6_send:
acast source" logged. After this patch, both get destination
unreachable ICMP replies.

Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Tim Stallard <code@timstallard.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
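
The rule this commit describes can be modelled in a few lines of Python. This is only a sketch of the decision, not the kernel code: the function name `treated_as_anycast`, its parameters, and the prefix-length cutoff are assumptions made for the example.

```python
import ipaddress


def treated_as_anycast(addr: str, route_prefix: str,
                       via_gateway: bool, point_to_point: bool = False) -> bool:
    """Model of the check: is addr the first address of an on-link prefix?"""
    if via_gateway or point_to_point:
        # Behaviour after the patch: such routes never make an address anycast.
        return False
    net = ipaddress.IPv6Network(route_prefix)
    # Assumption: very long prefixes (/127, /128) are excluded so that host
    # routes do not count as anycast.
    if net.prefixlen >= 127:
        return False
    return ipaddress.IPv6Address(addr) == net.network_address


# The source address from the test case, reached through a gateway route.
via_gw = treated_as_anycast("2001:db8:100::", "2001:db8:100::/64", via_gateway=True)
on_link = treated_as_anycast("2001:db8:100::", "2001:db8:100::/64", via_gateway=False)
```

With the gateway route from the test commands the address is no longer classified as anycast, so the ICMP error can be sent to it; the same address on a directly connected prefix still is.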

net: sock.h: fix skb_steal_sock() kernel-doc · 045065f0

Committed by Lothar Rubusch on Apr 07, 2020

Fix warnings related to kernel-doc notation, and wording in
function description.
Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Bluetooth: debugfs option to unset MITM flag · c2aa30db

Committed by Archie Pusaka on Apr 07, 2020

The BT qualification test SM/MAS/PKE/BV-01-C needs us to turn off
the MITM flag when pairing, and at the same time also set the io
capability to something other than no input no output.

Currently the MITM flag is only unset when the io capability is set
to no input no output, therefore the test cannot be executed.

This patch introduces a debugfs option to force MITM flag to be
turned off.
Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

06 Apr, 2020 1 commit

netfilter: nf_tables: do not update stateful expressions if lookup is inverted · a26c1e49

Committed by Pablo Neira Ayuso on Mar 31, 2020

Initialize the set lookup matching element to NULL. Otherwise, the
NFT_LOOKUP_F_INV flag reverses the matching logic and leads to
dereferencing an uninitialized pointer to the matching element. Make sure
the element data area and stateful expression are accessed only if there
is a matching set element.

This patch undoes 24791b9a ("netfilter: nft_set_bitmap: initialize set
element extension in lookups") which is not required anymore.

Fixes: 339706bc ("netfilter: nft_lookup: update element stateful expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
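
The interaction between an inverted lookup and the matching element can be sketched as follows. This is a simplified Python model, not the nf_tables code: `Element`, `lookup_eval`, and the string verdicts are hypothetical names, and the `hits` counter merely stands in for a stateful expression.

```python
class Element:
    """A set element with optional data and a stand-in stateful counter."""

    def __init__(self, data=None):
        self.data = data
        self.hits = 0


def lookup_eval(elements: dict, key, invert: bool):
    """Return (verdict, data) for one lookup; verdict is 'continue' or 'break'."""
    ext = elements.get(key)            # None plays the role of the NULL initial value
    found = (ext is not None) ^ invert
    if not found:
        return "break", None
    if ext is not None:
        # Only touch element data and stateful state when an element matched.
        ext.hits += 1
        return "continue", ext.data
    # Inverted lookup succeeded precisely because nothing matched.
    return "continue", None


elements = {"a": Element("x")}
```

The last branch is the case the fix is about: with inversion, the rule continues even though there is no element, so nothing may be dereferenced.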

05 Apr, 2020 5 commits

Bluetooth: Add HCI device identifier for VIRTIO devices · d2a3f5f4

Committed by Marcel Holtmann on Apr 03, 2020

This patch assigns the next free HCI device identifier to Bluetooth
devices based on VIRTIO devices.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

Bluetooth: Add support for reading security information · bc292258

Committed by Marcel Holtmann on Apr 03, 2020

To allow userspace to make correct security policy decisions, the kernel
needs to export a few details of the supported security features and
encryption key size information. This command exports this information
and also allows future extensions if needed.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

Bluetooth: Add support for Read Local Simple Pairing Options · a4790360

Committed by Marcel Holtmann on Apr 03, 2020

With the Read Local Simple Pairing Options command it is possible to
retrieve the maximum encryption key size supported by the controller
and whether the controller correctly verifies the ECDH public key
during pairing.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

Bluetooth: Add framework for Microsoft vendor extension · 145373cb

Committed by Miao-chen Chou on Apr 03, 2020

Microsoft defined a set of HCI vendor extensions. Check the following
link for details:

https://docs.microsoft.com/en-us/windows-hardware/drivers/bluetooth/microsoft-defined-bluetooth-hci-commands-and-events

This provides the basic framework to enable the extension and read its
supported features. Drivers still have to declare support for this
extension before it can be utilized by the host stack.
Signed-off-by: Miao-chen Chou <mcchou@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

Bluetooth: add support to notify using SCO air mode · 1f8330ea

Committed by Sathish Narsimman on Apr 03, 2020

Notifying with HCI_NOTIFY_CONN_ADD for an SCO connection is generic in
the case of mSBC audio. To differentiate the SCO air mode, introduce
HCI_NOTIFY_ENABLE_SCO_CVSD and HCI_NOTIFY_ENABLE_SCO_TRANSP.
Signed-off-by: Sathish Narsimman <sathish.narasimman@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

02 Apr, 2020 1 commit

Bluetooth: Add BT_MODE socket option · 3ee7b7cd

Committed by Luiz Augusto von Dentz on Mar 27, 2020

This adds BT_MODE socket option which can be used to set L2CAP modes,
including modes only supported over LE which were not supported using
the L2CAP_OPTIONS.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

31 Mar, 2020 10 commits

devlink: Allow setting of packet trap group parameters · c064875a

Committed by Ido Schimmel on Mar 30, 2020

The previous patch allowed device drivers to publish their default
binding between packet trap policers and packet trap groups. However,
some users might not be content with this binding and would like to
change it.

In case user space passed a packet trap policer identifier when setting
a packet trap group, invoke the appropriate device driver callback and
pass the new policer identifier.

v2:
* Check for presence of 'DEVLINK_ATTR_TRAP_POLICER_ID' in
  devlink_trap_group_set() and bail if not present
* Add extack error message in case trap group was partially modified
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Add packet trap group parameters support · f9f54392

Committed by Ido Schimmel on Mar 30, 2020

Packet trap groups are used to aggregate logically related packet traps.
Currently, these groups allow user space to batch operations such as
setting the trap action of all member traps.

In order to prevent the CPU from being overwhelmed by too many trapped
packets, it is desirable to bind a packet trap policer to these groups.
For example, to limit all the packets that encountered an exception
during routing to 10Kpps.

Allow device drivers to bind default packet trap policers to packet trap
groups when the latter are registered with devlink.

The next patch will enable user space to change this default binding.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Add packet trap policers support · 1e8c6619

Committed by Ido Schimmel on Mar 30, 2020

Devices capable of offloading the kernel's datapath and performing
functions such as bridging and routing must also be able to send (trap)
specific packets to the kernel (i.e., the CPU) for processing.

For example, a device acting as a multicast-aware bridge must be able to
trap IGMP membership reports to the kernel for processing by the bridge
module.

In most cases, the underlying device is capable of handling packet rates
that are several orders of magnitude higher compared to those that can
be handled by the CPU.

Therefore, in order to prevent the underlying device from overwhelming
the CPU, devices usually include packet trap policers that are able to
police the trapped packets to rates that can be handled by the CPU.

This patch allows capable device drivers to register their supported
packet trap policers with devlink. User space can then tune the
parameters of these policers (currently, rate and burst size) and read
from the device the number of packets that were dropped by the policer,
if supported.

Subsequent patches in the series will allow device drivers to create
default binding between these policers and packet trap groups and allow
user space to change the binding.

v2:
* Add 'strict_start_type' in devlink policy
* Have device drivers provide max/min rate/burst size for each policer.
  Use them to check validity of user provided parameters
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
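
To illustrate what "rate" and "burst size" mean for a policer, here is a small token-bucket model in Python. The commit does not say how any device implements its policers, so the token bucket is a stand-in technique, and the class name `TrapPolicer` and its fields are hypothetical.

```python
class TrapPolicer:
    """Token-bucket model: `rate_pps` tokens per second, at most `burst` stored."""

    def __init__(self, rate_pps: int, burst: int):
        self.rate = rate_pps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous packet, in seconds
        self.dropped = 0            # counter a device could report to user space

    def admit(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` may be trapped to the CPU."""
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(float(self.burst),
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.dropped += 1
        return False


policer = TrapPolicer(rate_pps=10, burst=2)
first_burst = [policer.admit(0.0) for _ in range(3)]
```

Three back-to-back packets exhaust a burst of two, so the third is dropped and counted; after enough time has passed the bucket refills and packets are admitted again.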

bpf: Don't refcount LISTEN sockets in sk_assign() · 7ae215d2

Committed by Joe Stringer on Mar 29, 2020

Avoid taking a reference on listen sockets by checking the socket type
in sk_assign() and in the corresponding skb_steal_sock() code in the
transport layer, and by ensuring that the prefetch free (sock_pfree)
function uses the same logic to check whether the socket is refcounted.
Suggested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200329225342.16317-4-joe@wand.net.nz

net: Track socket refcounts in skb_steal_sock() · 71489e21

Committed by Joe Stringer on Mar 29, 2020

Refactor the UDP/TCP handlers slightly to allow skb_steal_sock() to make
the determination of whether the socket is reference counted in the case
where it is prefetched by earlier logic such as early_demux.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200329225342.16317-3-joe@wand.net.nz

bpf: Add socket assign support · cf7fbe66

Committed by Joe Stringer on Mar 29, 2020

Add support for TPROXY via a new bpf helper, bpf_sk_assign().

This helper requires the BPF program to discover the socket via a call
to bpf_sk*_lookup_*(), then pass this socket to the new helper. The
helper takes its own reference to the socket in addition to any existing
reference that may or may not currently be obtained for the duration of
BPF processing. For the destination socket to receive the traffic, the
traffic must be routed towards that socket via local route. The
simplest example route is below, but in practice you may want to route
traffic more narrowly (eg by CIDR):

  $ ip route add local default dev lo

This patch avoids trying to introduce an extra bit into the skb->sk, as
that would require more invasive changes to all code interacting with
the socket to ensure that the bit is handled correctly, such as all
error-handling cases along the path from the helper in BPF through to
the orphan path in the input. Instead, we opt to use the destructor
variable to switch on the prefetch of the socket.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200329225342.16317-2-joe@wand.net.nz

net: dsa: add port policers · 34297176

Committed by Vladimir Oltean on Mar 29, 2020

The approach taken to pass the port policer methods on to drivers is
pragmatic. It is similar to the port mirroring implementation (in that
the DSA core does all of the filter block interaction and only passes
simple operations for the driver to implement) and dissimilar to how
flow-based policers are going to be implemented (where the driver has
full control over the flow_cls_offload data structure).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Implicitly set auto recover flag when registering health reporter · ba7d16c7

Committed by Eran Ben Elisha on Mar 29, 2020

When a health reporter is registered with devlink, devlink will implicitly
set auto recover if and only if the reporter has a recover method. There is
no reason to explicitly get the auto recover flag from the driver.

Remove this flag from all drivers that called
devlink_health_reporter_create.

All existing health reporters set auto recovery to true if they have a
recover method.

The administrator can still unset auto recover via a netlink command, as
before this patch.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: expose HW stats types per action used by drivers · 93a129eb

Committed by Jiri Pirko on Mar 28, 2020

It may be up to the driver (in case ANY HW stats is passed) to select
which type of HW stats it is going to use. Add an infrastructure to
expose this information to the user.

$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 10 sec used 10 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: introduce nla_put_bitfield32() helper and use it · 8953b077

Committed by Jiri Pirko on Mar 28, 2020

Introduce a helper that takes a value and a selector, packs them
into a struct, and puts the struct into a netlink message.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
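
What such a helper produces on the wire can be sketched in Python: a 4-byte netlink attribute header (length, type) followed by two 32-bit words, value and selector. This is an illustration of the layout only, not the kernel helper; the Python function below merely borrows the name, and the attribute type `5` used in the example is an arbitrary placeholder.

```python
import struct

NLA_HDRLEN = 4  # size of the attribute header: u16 length + u16 type


def nla_put_bitfield32(attrtype: int, value: int, selector: int) -> bytes:
    """Encode one bitfield32 attribute in host byte order."""
    # Payload: 32-bit value followed by 32-bit selector (which bits are valid).
    payload = struct.pack("=II", value, selector)
    # The length field covers header plus payload; 12 bytes needs no padding.
    header = struct.pack("=HH", NLA_HDRLEN + len(payload), attrtype)
    return header + payload


# Hypothetical attribute type 5: bit 0 set, bits 0 and 1 selected.
attr = nla_put_bitfield32(5, value=0x1, selector=0x3)
```

The selector tells the receiver which bits of the value are meaningful, which lets a sender update some flags without stating the others.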

30 Mar, 2020 11 commits

net: ipv6: add rpl sr tunnel · a7a29f9c

Committed by Alexander Aring on Mar 27, 2020

This patch adds functionality to configure routes for RPL source routing.
No IPIP functionality is implemented yet; it can be added later once the
cases in which to use IPv6 encapsulation become clearer.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add net available in build_state · faee6769

Committed by Alexander Aring on Mar 27, 2020

The build_state callback of lwtunnel doesn't contain the net namespace
structure yet. This patch will add it so we can check on specific
address configuration at creation time of rpl source routes.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: add support for rpl sr exthdr · 8610c7c6

Committed by Alexander Aring on Mar 27, 2020

This patch adds RPL source routing receive handling. Everything works
only if the sysctl "rpl_seg_enabled" is set and source routing is enabled.
The behaviour is mostly the same as for IPv6 segment routing. To handle
compression and decompression, an rpl.c file is created which contains the
necessary functionality. The receive handling also takes care of
encapsulated IPv6, insofar as it is specified as a possible nexthdr in
RFC 6554.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

addrconf: add functionality to check on rpl requirements · f37c6059

Committed by Alexander Aring on Mar 27, 2020

This patch adds functionality to addrconf to check for a specific RPL
address configuration. According to RFC 6554:

To detect loops in the SRH, a router MUST determine if the SRH
includes multiple addresses assigned to any interface on that
router. If such addresses appear more than once and are separated by
at least one address not assigned to that router.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
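
The loop condition quoted from RFC 6554 can be expressed as a short Python check. This is a literal reading of the quoted text, not the addrconf implementation; the function name `srh_has_loop` and the use of plain address strings are assumptions made for the example.

```python
def srh_has_loop(segments, local_addrs) -> bool:
    """True if router-owned addresses occur more than once in `segments`
    with at least one foreign address in between."""
    seen_local = False        # a local address has appeared so far
    gap_after_local = False   # a foreign address followed a local one
    for seg in segments:
        if seg in local_addrs:
            if seen_local and gap_after_local:
                return True
            seen_local = True
        elif seen_local:
            gap_after_local = True
    return False


local = {"2001:db8::1", "2001:db8::2"}
looping = ["2001:db8::1", "2001:db8:ffff::9", "2001:db8::2"]
adjacent = ["2001:db8::1", "2001:db8::2", "2001:db8:ffff::9"]
```

Two local addresses that sit next to each other do not count as a loop under this reading, because no foreign hop separates them.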

mptcp: add and use MIB counter infrastructure · fc518953

Committed by Florian Westphal on Mar 27, 2020

The counters are exported via the same /proc file as the Linux TCP MIB
counters, so "netstat -s" or "nstat" will show them automatically.

The MPTCP MIB counters are allocated in a distinct pcpu area in order to
avoid bloating/wasting TCP pcpu memory.

Counters are allocated once the first MPTCP socket is created in a
network namespace and free'd on exit.

If no sockets have been allocated, all-zero mptcp counters are shown.

The MIB counter list is taken from the multipath-tcp.org kernel, but
only a few counters have been picked up so far.  The counter list can
be increased at any time later on.

v2 -> v3:
 - remove 'inline' in foo.c files (David S. Miller)
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
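
As a rough illustration of how a tool could read such counters, the Python sketch below parses text in the paired header/value layout used by files like /proc/net/netstat. The sample text, the `MPTcpExt` prefix, and the counter names in it are illustrative assumptions, not output captured from a real system.

```python
def parse_netstat(text: str) -> dict:
    """Parse 'Prefix: name ...' / 'Prefix: value ...' line pairs into nested dicts."""
    lines = text.strip().splitlines()
    counters = {}
    # Lines come in pairs: a header line with names, then a line with values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError("header and value lines do not match")
        counters[prefix] = dict(zip(names.split(), map(int, numbers.split())))
    return counters


# Illustrative sample; all-zero values as for a namespace without MPTCP sockets.
SAMPLE = """\
TcpExt: SyncookiesSent SyncookiesRecv
TcpExt: 0 0
MPTcpExt: MPCapableSYNRX MPCapableACKRX
MPTcpExt: 0 0
"""

stats = parse_netstat(SAMPLE)
```

On a real system the text would be read from the /proc file instead of `SAMPLE`.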

mptcp: Add handling of outgoing MP_JOIN requests · ec3edaa7

Committed by Peter Krystad on Mar 27, 2020

Subflow creation may be initiated by the path manager when
the primary connection is fully established and a remote
address has been received via ADD_ADDR.

Create an in-kernel sock and use kernel_connect() to
initiate connection.

Passive sockets can't acquire the mptcp socket lock at
subflow creation time, so an additional list protected by
a new spinlock is used to track the MPJ subflows.

This list is spliced into the conn_list tail every time the msk
socket lock is acquired, so that it will not interfere
with data flow on the original connection.

Data flow and connection failover are not addressed by this commit.
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: Add handling of incoming MP_JOIN requests · f296234c

Committed by Peter Krystad on Mar 27, 2020

Process the MP_JOIN option in a SYN packet with the same flow
as MP_CAPABLE but when the third ACK is received add the
subflow to the MPTCP socket subflow list instead of adding it to
the TCP socket accept queue.

The subflow is added at the end of the subflow list so it will not
interfere with the existing subflows operation and no data is
expected to be transmitted on it.
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: Add ADD_ADDR handling · 3df523ab

Committed by Peter Krystad on Mar 27, 2020

Add handling for sending and receiving the ADD_ADDR, ADD_ADDR6,
and RM_ADDR suboptions.
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: conntrack: add nf_ct_acct_add() · 9312eaba

Committed by wenxu on Mar 28, 2020

Add nf_ct_acct_add function to update the conntrack counter
with packets and bytes.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: skip set types that do not support for expressions · d56aab26

Committed by Pablo Neira Ayuso on Mar 27, 2020

The bitmap set does not support expressions, so skip it in the
estimation step.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

bpf, net: Fix build issue when net ns not configured · 5a95cbb8

Committed by Daniel Borkmann on Mar 29, 2020

Fix a redefinition of 'net_gen_cookie' error that was overlooked
when net ns is not configured.

Fixes: f318903c ("bpf: Add netns cookie and enable it for bpf cgroup hooks")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

29 Mar, 2020 2 commits

netfilter: nf_queue: place bridge physports into queue_entry struct · 119e52e6

Committed by Florian Westphal on Mar 27, 2020

The refcount is done via entry->skb, which does work fine.
Major problem: When putting the refcount of the bridge ports, we
must always put the references while the skb is still around.

However, we will need to put the references after okfn() to avoid
a possible 1 -> 0 -> 1 refcount transition, so we cannot use the
skb pointer anymore.

Place the physports in the queue entry structure instead to allow
for refcounting changes in the next patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_queue: make nf_queue_entry_release_refs static · dd3cc111

Committed by Florian Westphal on Mar 27, 2020

This is a preparation patch, no logical changes.
Move free_entry into core and rename it to something more sensible.

This will ease follow-up patches, which will complicate the refcount handling.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

28 Mar, 2020 2 commits

bpf: Allow to retrieve cgroup v1 classid from v2 hooks · 5a52ae4e

Committed by Daniel Borkmann on Mar 27, 2020

Today, Kubernetes is still operating on cgroups v1, however, it is
possible to retrieve the task's classid based on 'current' out of
connect(), sendmsg(), recvmsg() and bind-related hooks for orchestrators
which attach to the root cgroup v2 hook in a mixed env like in case
of Cilium, for example, in order to then correlate certain pod traffic
and use it as part of the key for BPF map lookups.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/555e1c69db7376c0947007b4951c260e1074efc3.1585323121.git.daniel@iogearbox.net

bpf: Add netns cookie and enable it for bpf cgroup hooks · f318903c

Committed by Daniel Borkmann on Mar 27, 2020

In Cilium we're mainly using BPF cgroup hooks today in order to implement
kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*),
ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic
between Cilium managed nodes. While this works in its current shape and avoids
packet-level NAT for inter Cilium managed node traffic, there is one major
limitation we're facing today, that is, lack of netns awareness.

In Kubernetes, the concept of Pods (which hold one or multiple containers)
has been built around network namespaces, so while we can use the global scope
of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing
NodePort ports on loopback addresses), we also have the need to differentiate
between the initial network namespace and non-initial ones. For example, ExternalIP
services mandate that non-local service IPs are not to be translated from the
host (initial) network namespace as one example. Right now, we have an ugly
work-around in place where non-local service IPs for ExternalIP services are
not xlated from connect() and friends BPF hooks but instead via less efficient
packet-level NAT on the veth tc ingress hook for Pod traffic.

On top of determining whether we're in initial or non-initial network namespace
we also have a need for a socket-cookie like mechanism for network namespaces
scope. Socket cookies have the nice property that they can be combined as part
of the key structure e.g. for BPF LRU maps without having to worry that the
cookie could be recycled. We are planning to use this for our sessionAffinity
implementation for services. Therefore, add a new bpf_get_netns_cookie() helper
which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would
provide the cookie for the initial network namespace while passing the context
instead of NULL would provide the cookie from the application's network namespace.
We're using a hole, so no size increase; the assignment happens only once.
Therefore this allows for a comparison on initial namespace as well as regular
cookie usage as we have today with socket cookies. We could later on enable
this helper for other program types as the need arises.

  (*) Both externalTrafficPolicy={Local|Cluster} types
  [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net
