提交 · 801c6058d14a82179a7ee17a4b532cac6fad067f · openeuler / Kernel

03 5月, 2021 1 次提交

bpf: Fix leakage of uninitialized bpf stack under speculation · 801c6058

由 Daniel Borkmann 提交于 4月 29, 2021

The current implemented mechanisms to mitigate data disclosure under
speculation mainly address stack and map value oob access from the
speculative domain. However, Piotr discovered that uninitialized BPF
stack is not protected yet, and thus old data from the kernel stack,
potentially including addresses of kernel structures, could still be
extracted from that 512 bytes large window. The BPF stack is special
compared to map values since it's not zero initialized for every
program invocation, whereas map values /are/ zero initialized upon
their initial allocation and thus cannot leak any prior data in either
domain. In the non-speculative domain, the verifier ensures that every
stack slot read must have a prior stack slot write by the BPF program
to avoid such data leaking issue.

However, this is not enough: for example, when the pointer arithmetic
operation moves the stack pointer from the last valid stack offset to
the first valid offset, the sanitation logic allows for any intermediate
offsets during speculative execution, which could then be used to
extract any restricted stack content via side-channel.

Given for unprivileged stack pointer arithmetic the use of unknown
but bounded scalars is generally forbidden, we can simply turn the
register-based arithmetic operation into an immediate-based arithmetic
operation without the need for masking. This also gives the benefit
of reducing the needed instructions for the operation. Given after
the work in 7fedb63a ("bpf: Tighten speculative pointer arithmetic
mask"), the aux->alu_limit already holds the final immediate value for
the offset register with the known scalar. Thus, a simple mov of the
immediate to AX register with using AX as the source for the original
instruction is sufficient and possible now in this case.
Reported-by: NPiotr Krysiuk <piotras@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Tested-by: NPiotr Krysiuk <piotras@gmail.com>
Reviewed-by: NPiotr Krysiuk <piotras@gmail.com>
Reviewed-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>

801c6058

30 4月, 2021 1 次提交

seg6: add counters support for SRv6 Behaviors · 94604548

由 Andrea Mayer 提交于 4月 27, 2021

This patch provides counters for SRv6 Behaviors as defined in [1],
section 6. For each SRv6 Behavior instance, counters defined in [1] are:

 - the total number of packets that have been correctly processed;
 - the total amount of traffic in bytes of all packets that have been
   correctly processed;

In addition, this patch introduces a new counter that counts the number of
packets that have NOT been properly processed (i.e. errors) by an SRv6
Behavior instance.

Counters are not only interesting for network monitoring purposes (i.e.
counting the number of packets processed by a given behavior) but they also
provide a simple tool for checking whether a behavior instance is working
as we expect or not.
Counters can be useful for troubleshooting misconfigured SRv6 networks.
Indeed, an SRv6 Behavior can silently drop packets for very different
reasons (i.e. wrong SID configuration, interfaces set with SID addresses,
etc) without any notification/message to the user.

Due to the nature of SRv6 networks, diagnostic tools such as ping and
traceroute may be ineffective: paths used for reaching a given router can
be totally different from the ones followed by probe packets. In addition,
paths are often asymmetrical and this makes it even more difficult to keep
up with the journey of the packets and to understand which behaviors are
actually processing our traffic.

When counters are enabled on an SRv6 Behavior instance, it is possible to
verify if packets are actually processed by such behavior and what is the
outcome of the processing. Therefore, the counters for SRv6 Behaviors offer
an non-invasive observability point which can be leveraged for both traffic
monitoring and troubleshooting purposes.

[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters

Troubleshooting using SRv6 Behavior counters
--------------------------------------------

Let's make a brief example to see how helpful counters can be for SRv6
networks. Let's consider a node where an SRv6 End Behavior receives an SRv6
packet whose Segment Left (SL) is equal to 0. In this case, the End
Behavior (which accepts only packets with SL >= 1) discards the packet and
increases the error counter.
This information can be leveraged by the network operator for
troubleshooting. Indeed, the error counter is telling the user that the
packet:

  (i) arrived at the node;
 (ii) the packet has been taken into account by the SRv6 End behavior;
(iii) but an error has occurred during the processing.

The error (iii) could be caused by different reasons, such as wrong route
settings on the node or due to an invalid SID List carried by the SRv6
packet. Anyway, the error counter is used to exclude that the packet did
not arrive at the node or it has not been processed by the behavior at
all.

Turning on/off counters for SRv6 Behaviors
------------------------------------------

Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters.
This is done through iproute2 which allows the user to create an SRv6
Behavior instance specifying the optional "count" attribute as shown in the
following example:

 $ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0

per-behavior counters can be shown by adding "-s" to the iproute2 command
line, i.e.:

 $ ip -s -6 route show 2001:db8::1
 2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Impact of counters for SRv6 Behaviors on performance
====================================================

To determine the performance impact due to the introduction of counters in
the SRv6 Behavior subsystem, we have carried out extensive tests.

We chose to test the throughput achieved by the SRv6 End.DX2 Behavior
because, among all the other behaviors implemented so far, it reaches the
highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a
Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100
bytes.

Three different tests were conducted in order to evaluate the overall
throughput of the SRv6 End.DX2 Behavior in the following scenarios:

 1) vanilla kernel (without the SRv6 Behavior counters patch) and a single
    instance of an SRv6 End.DX2 Behavior;
 2) patched kernel with SRv6 Behavior counters and a single instance of
    an SRv6 End.DX2 Behavior with counters turned off;
 3) patched kernel with SRv6 Behavior counters and a single instance of
    SRv6 End.DX2 Behavior with counters turned on.

All tests were performed on a testbed deployed on the CloudLab facilities
[2], a flexible infrastructure dedicated to scientific research on the
future of Cloud Computing.

Results of tests are shown in the following table:

Scenario (1): average 1504764,81 pps (~1504,76 kpps); std. dev 3956,82 pps
Scenario (2): average 1501469,78 pps (~1501,47 kpps); std. dev 2979,85 pps
Scenario (3): average 1501315,13 pps (~1501,32 kpps); std. dev 2956,00 pps

As can be observed, throughputs achieved in scenarios (2),(3) did not
suffer any observable degradation compared to scenario (1).

Thanks to Jakub Kicinski and David Ahern for their valuable suggestions
and comments provided during the discussion of the proposed RFCs.

[2] https://www.cloudlab.usSigned-off-by: NAndrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94604548

29 4月, 2021 2 次提交

net: selftest: fix build issue if INET is disabled · 4a52dd8f

由 Oleksij Rempel 提交于 4月 28, 2021

In case ethernet driver is enabled and INET is disabled, selftest will
fail to build.
Reported-by: NRandy Dunlap <rdunlap@infradead.org>
Fixes: 3e1e58d6 ("net: add generic selftest support")
Signed-off-by: NOleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210428130947.29649-1-o.rempel@pengutronix.deSigned-off-by: NJakub Kicinski <kuba@kernel.org>

4a52dd8f

icmp: standardize naming of RFC 8335 PROBE constants · e542d29c

由 Andreas Roeseler 提交于 4月 27, 2021

The current definitions of constants for PROBE, currently defined only
in the net-next kernel branch, are inconsistent, with
some beginning with ICMP and others with simply EXT. This patch
attempts to standardize the naming conventions of the constants for
PROBE before their release into a stable Kernel, and to update the
relevant definitions in net/ipv4/icmp.c.

Similarly, the definitions for the code field (previously
ICMP_EXT_MAL_QUERY, etc) use the same prefixes as the type field. This
patch adds _CODE_ to the prefix to clarify the distinction of these
constants.
Signed-off-by: NAndreas Roeseler <andreas.a.roeseler@gmail.com>
Acked-by: NDavid Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20210427153635.2591-1-andreas.a.roeseler@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

e542d29c

28 4月, 2021 11 次提交

Fix misc new gcc warnings · e7c6e405

由 Linus Torvalds 提交于 4月 27, 2021

It seems like Fedora 34 ends up enabling a few new gcc warnings, notably
"-Wstringop-overread" and "-Warray-parameter".

Both of them cause what seem to be valid warnings in the kernel, where
we have array size mismatches in function arguments (that are no longer
just silently converted to a pointer to element, but actually checked).

This fixes most of the trivial ones, by making the function declaration
match the function definition, and in the case of intel_pm.c, removing
the over-specified array size from the argument declaration.

At least one 'stringop-overread' warning remains in the i915 driver, but
that one doesn't have the same obvious trivial fix, and may or may not
actually be indicative of a bug.

[ It was a mistake to upgrade one of my machines to Fedora 34 while
  being busy with the merge window, but if this is the extent of the
  compiler upgrade problems, things are better than usual    - Linus ]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e7c6e405

bpf: Implement formatted output helpers with bstr_printf · 48cac3f4

由 Florent Revest 提交于 4月 27, 2021

BPF has three formatted output helpers: bpf_trace_printk, bpf_seq_printf
and bpf_snprintf. Their signatures specify that all arguments are
provided from the BPF world as u64s (in an array or as registers). All
of these helpers are currently implemented by calling functions such as
snprintf() whose signatures take a variable number of arguments, then
placed in a va_list by the compiler to call vsnprintf().

"d9c9e4db bpf: Factorize bpf_trace_printk and bpf_seq_printf" introduced
a bpf_printf_prepare function that fills an array of u64 sanitized
arguments with an array of "modifiers" which indicate what the "real"
size of each argument should be (given by the format specifier). The
BPF_CAST_FMT_ARG macro consumes these arrays and casts each argument to
its real size. However, the C promotion rules implicitely cast them all
back to u64s. Therefore, the arguments given to snprintf are u64s and
the va_list constructed by the compiler will use 64 bits for each
argument. On 64 bit machines, this happens to work well because 32 bit
arguments in va_lists need to occupy 64 bits anyway, but on 32 bit
architectures this breaks the layout of the va_list expected by the
called function and mangles values.

In "88a5c690 bpf: fix bpf_trace_printk on 32 bit archs", this problem
had been solved for bpf_trace_printk only with a "horrid workaround"
that emitted multiple calls to trace_printk where each call had
different argument types and generated different va_list layouts. One of
the call would be dynamically chosen at runtime. This was ok with the 3
arguments that bpf_trace_printk takes but bpf_seq_printf and
bpf_snprintf accept up to 12 arguments. Because this approach scales
code exponentially, it is not a viable option anymore.

Because the promotion rules are part of the language and because the
construction of a va_list is an arch-specific ABI, it's best to just
avoid variadic arguments and va_lists altogether. Thankfully the
kernel's snprintf() has an alternative in the form of bstr_printf() that
accepts arguments in a "binary buffer representation". These binary
buffers are currently created by vbin_printf and used in the tracing
subsystem to split the cost of printing into two parts: a fast one that
only dereferences and remembers values, and a slower one, called later,
that does the pretty-printing.

This patch refactors bpf_printf_prepare to construct binary buffers of
arguments consumable by bstr_printf() instead of arrays of arguments and
modifiers. This gets rid of BPF_CAST_FMT_ARG and greatly simplifies the
bpf_printf_prepare usage but there are a few gotchas that change how
bpf_printf_prepare needs to do things.

Currently, bpf_printf_prepare uses a per cpu temporary buffer as a
generic storage for strings and IP addresses. With this refactoring, the
temporary buffers now holds all the arguments in a structured binary
format.

To comply with the format expected by bstr_printf, certain format
specifiers also need to be pre-formatted: %pB and %pi6/%pi4/%pI4/%pI6.
Because vsnprintf subroutines for these specifiers are hard to expose,
we pre-format these arguments with calls to snprintf().
Reported-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NFlorent Revest <revest@chromium.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210427174313.860948-3-revest@chromium.org

48cac3f4

seq_file: Add a seq_bprintf function · 76d6a133

由 Florent Revest 提交于 4月 27, 2021

Similarly to seq_buf_bprintf in lib/seq_buf.c, this function writes a
printf formatted string with arguments provided in a "binary
representation" built by functions such as vbin_printf.
Signed-off-by: NFlorent Revest <revest@chromium.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210427174313.860948-2-revest@chromium.org

76d6a133

net: phy: Add support for microchip SMI0 MDIO bus · 800fcab8

由 Andrew Lunn 提交于 4月 27, 2021

SMI0 is a mangled version of MDIO. The main low level difference is
the MDIO C22 OP code is always 0, not 0x2 or 0x1 for Read/Write. The
read/write information is instead encoded in the PHY address.

Extend the bit-bang code to allow the op code to be overridden, but
default to normal C22 values. Add an extra compatible to the mdio-gpio
driver, and when this compatible is present, set the op codes to 0.

A higher level driver, sitting on top of the basic MDIO bus driver can
then implement the rest of the microchip SMI0 odderties.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NMichael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: NOleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

800fcab8

net: mscc: ocelot: support PTP Sync one-step timestamping · 39e5308b

由 Yangbo Lu 提交于 4月 27, 2021

Although HWTSTAMP_TX_ONESTEP_SYNC existed in ioctl for hardware timestamp
configuration, the PTP Sync one-step timestamping had never been supported.

This patch is to truely support it.

- ocelot_port_txtstamp_request()
  This function handles tx timestamp request by storing
  ptp_cmd(tx timestamp type) in OCELOT_SKB_CB(skb)->ptp_cmd,
  and additionally for two-step timestamp storing ts_id in
  OCELOT_SKB_CB(clone)->ptp_cmd.

- ocelot_ptp_rew_op()
  During xmit, this function is called to get rew_op (rewriter option) by
  checking skb->cb for tx timestamp request, and configure to transmitting.

Non-onestep-Sync packet with one-step timestamp request falls back to use
two-step timestamp.
Signed-off-by: NYangbo Lu <yangbo.lu@nxp.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

39e5308b

net: mscc: ocelot: convert to ocelot_port_txtstamp_request() · 682eaad9

由 Yangbo Lu 提交于 4月 27, 2021

Convert to a common ocelot_port_txtstamp_request() for TX timestamp
request handling.
Signed-off-by: NYangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

682eaad9

net: dsa: free skb->cb usage in core driver · c4b364ce

由 Yangbo Lu 提交于 4月 27, 2021

Free skb->cb usage in core driver and let device drivers decide to
use or not. The reason having a DSA_SKB_CB(skb)->clone was because
dsa_skb_tx_timestamp() which may set the clone pointer was called
before p->xmit() which would use the clone if any, and the device
driver has no way to initialize the clone pointer.

This patch just put memset(skb->cb, 0, sizeof(skb->cb)) at beginning
of dsa_slave_xmit(). Some new features in the future, like one-step
timestamp may need more bytes of skb->cb to use in
dsa_skb_tx_timestamp(), and p->xmit().
Signed-off-by: NYangbo Lu <yangbo.lu@nxp.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4b364ce

net: dsa: no longer clone skb in core driver · 5c5416f5

由 Yangbo Lu 提交于 4月 27, 2021

It was a waste to clone skb directly in dsa_skb_tx_timestamp().
For one-step timestamping, a clone was not needed. For any failure of
port_txtstamp (this may usually happen), the skb clone had to be freed.

So this patch moves skb cloning for tx timestamp out of dsa core, and
let drivers clone skb in port_txtstamp if they really need.
Signed-off-by: NYangbo Lu <yangbo.lu@nxp.com>
Tested-by: NKurt Kanzenbach <kurt@linutronix.de>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c5416f5

net: dsa: no longer identify PTP packet in core driver · cf536ea3

由 Yangbo Lu 提交于 4月 27, 2021

Move ptp_classify_raw out of dsa core driver for handling tx
timestamp request. Let device drivers do this if they want.
Not all drivers want to limit tx timestamping for only PTP
packet.
Signed-off-by: NYangbo Lu <yangbo.lu@nxp.com>
Tested-by: NKurt Kanzenbach <kurt@linutronix.de>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf536ea3

net: bridge: mcast: fix broken length + header check for MRDv6 Adv. · 99014088

由 Linus Lüssing 提交于 4月 25, 2021

The IPv6 Multicast Router Advertisements parsing has the following two
issues:

For one thing, ICMPv6 MRD Advertisements are smaller than ICMPv6 MLD
messages (ICMPv6 MRD Adv.: 8 bytes vs. ICMPv6 MLDv1/2: >= 24 bytes,
assuming MLDv2 Reports with at least one multicast address entry).
When ipv6_mc_check_mld_msg() tries to parse an Multicast Router
Advertisement its MLD length check will fail - and it will wrongly
return -EINVAL, even if we have a valid MRD Advertisement. With the
returned -EINVAL the bridge code will assume a broken packet and will
wrongly discard it, potentially leading to multicast packet loss towards
multicast routers.

The second issue is the MRD header parsing in
br_ip6_multicast_mrd_rcv(): It wrongly checks for an ICMPv6 header
immediately after the IPv6 header (IPv6 next header type). However
according to RFC4286, section 2 all MRD messages contain a Router Alert
option (just like MLD). So instead there is an IPv6 Hop-by-Hop option
for the Router Alert between the IPv6 and ICMPv6 header, again leading
to the bridge wrongly discarding Multicast Router Advertisements.

To fix these two issues, introduce a new return value -ENODATA to
ipv6_mc_check_mld() to indicate a valid ICMPv6 packet with a hop-by-hop
option which is not an MLD but potentially an MRD packet. This also
simplifies further parsing in the bridge code, as ipv6_mc_check_mld()
already fully checks the ICMPv6 header and hop-by-hop option.

These issues were found and fixed with the help of the mrdisc tool
(https://github.com/troglobit/mrdisc).

Fixes: 4b3087c7 ("bridge: Snoop Multicast Router Advertisements")
Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99014088

netfilter: nftables: add catch-all set element support · aaa31047

由 Pablo Neira Ayuso 提交于 4月 27, 2021

This patch extends the set infrastructure to add a special catch-all set
element. If the lookup fails to find an element (or range) in the set,
then the catch-all element is selected. Users can specify a mapping,
expression(s) and timeout to be attached to the catch-all element.

This patch adds a catchall list to the set, this list might contain more
than one single catch-all element (e.g. in case that the catch-all
element is removed and a new one is added in the same transaction).
However, most of the time, there will be either one element or no
elements at all in this list.

The catch-all element is identified via NFT_SET_ELEM_CATCHALL flag and
such special element has no NFTA_SET_ELEM_KEY attribute. There is a new
nft_set_elem_catchall object that stores a reference to the dummy
catch-all element (catchall->elem) whose layout is the same of the set
element type to reuse the existing set element codebase.

The set size does not apply to the catch-all element, users can define a
catch-all element even if the set is full.

The check for valid set element flags hava been updates to report
EOPNOTSUPP in case userspace requests flags that are not supported when
using new userspace nftables and old kernel.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

aaa31047

27 4月, 2021 6 次提交

iov_iter: Four fixes for ITER_XARRAY · 3d14ec1f

由 David Howells 提交于 4月 25, 2021

Fix four things[1] in the patch that adds ITER_XARRAY[2]:

 (1) Remove the address_space struct predeclaration.  This is a holdover
     from when it was ITER_MAPPING.

 (2) Fix _copy_mc_to_iter() so that the xarray segment updates count and
     iov_offset in the iterator before returning.

 (3) Fix iov_iter_alignment() to not loop in the xarray case.  Because the
     middle pages are all whole pages, only the end pages need be
     considered - and this can be reduced to just looking at the start
     position in the xarray and the iteration size.

 (4) Fix iov_iter_advance() to limit the size of the advance to no more
     than the remaining iteration size.
Reported-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NAl Viro <viro@zeniv.linux.org.uk>
Tested-by: NJeff Layton <jlayton@redhat.com>
Tested-by: NDave Wysochanski <dwysocha@redhat.com>
Link: https://lore.kernel.org/r/YIVrJT8GwLI0Wlgx@zeniv-ca.linux.org.uk [1]
Link: https://lore.kernel.org/r/161918448151.3145707.11541538916600921083.stgit@warthog.procyon.org.uk [2]

3d14ec1f

net: Fix typo in comment about ancillary data · 63fa73e2

由 Arnaldo Carvalho de Melo 提交于 4月 26, 2021

Ingo sent typo fixes for tools/ and this resulted in a warning when
building the perf/core branch that will be sent upstream in the next
merge window:

Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h

Fix the typo on the kernel file to address this.
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63fa73e2

netfilter: allow to turn off xtables compat layer · 47a6959f

由 Florian Westphal 提交于 4月 26, 2021

The compat layer needs to parse untrusted input (the ruleset)
to translate it to a 64bit compatible format.

We had a number of bugs in this department in the past, so allow users
to turn this feature off.

Add CONFIG_NETFILTER_XTABLES_COMPAT kconfig knob and make it default to y
to keep existing behaviour.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

47a6959f

netfilter: nfnetlink: consolidate callback types · 50f2db9e

由 Pablo Neira Ayuso 提交于 4月 23, 2021

Add enum nfnl_callback_type to identify the callback type to provide one
single callback.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

50f2db9e

netfilter: nfnetlink: pass struct nfnl_info to batch callbacks · 7dab8ee3

由 Pablo Neira Ayuso 提交于 4月 23, 2021

Update batch callbacks to use the nfnl_info structure. Rename one
clashing info variable to expr_info.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

7dab8ee3

P
netfilter: nfnetlink: pass struct nfnl_info to rcu callbacks · 797d4980
由 Pablo Neira Ayuso 提交于 4月 23, 2021
```
Update rcu callbacks to use the nfnl_info structure.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
```
797d4980

26 4月, 2021 19 次提交

netfilter: nfnetlink: add struct nfnl_info and pass it to callbacks · a6555365

由 Pablo Neira Ayuso 提交于 4月 23, 2021

Add a new structure to reduce callback footprint and to facilite
extensions of the nfnetlink callback interface in the future.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

a6555365

netfilter: nftables: add nft_pernet() helper function · d59d2f82

由 Pablo Neira Ayuso 提交于 4月 23, 2021

Consolidate call to net_generic(net, nf_tables_net_id) in this
wrapper function.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

d59d2f82

macvlan: Add nodst option to macvlan type source · 427f0c8c

由 Jethro Beekman 提交于 4月 25, 2021

The default behavior for source MACVLAN is to duplicate packets to
appropriate type source devices, and then do the normal destination MACVLAN
flow. This patch adds an option to skip destination MACVLAN processing if
any matching source MACVLAN device has the option set.

This allows setting up a "catch all" device for source MACVLAN: create one
or more devices with type source nodst, and one device with e.g. type vepa,
and incoming traffic will be received on exactly one device.

v2: netdev wants non-standard line length
Signed-off-by: NJethro Beekman <kernel@jbeekman.nl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

427f0c8c

net: ethernet: ixp4xx: Support device tree probing · 95aafe91

由 Linus Walleij 提交于 4月 25, 2021

This adds device tree probing to the IXP4xx ethernet
driver.

Add a platform data bool to tell us whether to
register an MDIO bus for the device or not, as well
as the corresponding NPE.

We need to drop the memory region request as part of
this since the OF core will request the memory for the
device.

Cc: Zoltan HERPAI <wigyori@uid0.hu>
Cc: Raylynn Knight <rayknight@me.com>
Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95aafe91

netfilter: remove all xt_table anchors from struct net · f7163c48

由 Florian Westphal 提交于 4月 21, 2021

No longer needed, table pointer arg is now passed via netfilter core.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

f7163c48

netfilter: ip6_tables: pass table pointer via nf_hook_ops · ee177a54

由 Florian Westphal 提交于 4月 21, 2021

Same patch as the ip_tables one: removal of all accesses to ip6_tables
xt_table pointers.  After this patch the struct net xt_table anchors
can be removed.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

ee177a54

netfilter: arp_tables: pass table pointer via nf_hook_ops · f9006acc

由 Florian Westphal 提交于 4月 21, 2021

Same change as previous patch.  Only difference:
no need to handle NULL template_ops parameter, the only caller
(arptable_filter) always passes non-NULL argument.

This removes all remaining accesses to net->ipv4.arptable_filter.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

f9006acc

netfilter: ip_tables: pass table pointer via nf_hook_ops · ae689334

由 Florian Westphal 提交于 4月 21, 2021

iptable_x modules rely on 'struct net' to contain a pointer to the
table that should be evaluated.

In order to remove these pointers from struct net, pass them via
the 'priv' pointer in a similar fashion as nf_tables passes the
rule data.

To do that, duplicate the nf_hook_info array passed in from the
iptable_x modules, update the ops->priv pointers of the copy to
refer to the table and then change the hookfn implementations to
just pass the 'priv' argument to the traverser.

After this patch, the xt_table pointers can already be removed
from struct net.

However, changes to struct net result in re-compile of the entire
network stack, so do the removal after arptables and ip6tables
have been converted as well.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

ae689334

netfilter: arptables: unregister the tables by name · 4d705399

由 Florian Westphal 提交于 4月 21, 2021

and again, this time for arptables.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

4d705399

netfilter: ip6tables: unregister the tables by name · 6c071754

由 Florian Westphal 提交于 4月 21, 2021

Same as the previous patch, but for ip6tables.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

6c071754

netfilter: iptables: unregister the tables by name · 20a9df33

由 Florian Westphal 提交于 4月 21, 2021

xtables stores the xt_table structs in the struct net.  This isn't
needed anymore, the structures could be passed via the netfilter hook
'private' pointer to the hook functions, which would allow us to remove
those pointers from struct net.

As a first step, reduce the number of accesses to the
net->ipv4.ip6table_{raw,filter,...} pointers.
This allows the tables to get unregistered by name instead of having to
pass the raw address.

The xt_table structure cane looked up by name+address family instead.

This patch is useless as-is (the backends still have the raw pointer
address), but it lowers the bar to remove those.

It also allows to put the 'was table registered in the first place' check
into ip_tables.c rather than have it in each table sub module.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

20a9df33

netfilter: x_tables: add xt_find_table · 1ef4d6d1

由 Florian Westphal 提交于 4月 21, 2021

This will be used to obtain the xt_table struct given address family and
table name.

Followup patches will reduce the number of direct accesses to the xt_table
structures via net->ipv{4,6}.ip(6)table_{nat,mangle,...} pointers, then
remove them.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

1ef4d6d1

netfilter: x_tables: remove ipt_unregister_table · 7716bf09

由 Florian Westphal 提交于 4月 21, 2021

Its the same function as ipt_unregister_table_exit.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

7716bf09

netfilter: ebtables: remove the 3 ebtables pointers from struct net · 4c95e072

由 Florian Westphal 提交于 4月 21, 2021

ebtables stores the table internal data (what gets passed to the
ebt_do_table() interpreter) in struct net.

nftables keeps the internal interpreter format in pernet lists
and passes it via the netfilter core infrastructure (priv pointer).

Do the same for ebtables: the nf_hook_ops are duplicated via kmemdup,
then the ops->priv pointer is set to the table that is being registered.

After that, the netfilter core passes this table info to the hookfn.

This allows to remove the pointers from struct net.

Same pattern can be applied to ip/ip6/arptables.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

4c95e072

netfilter: disable defrag once its no longer needed · de8c1211

由 Florian Westphal 提交于 4月 21, 2021

When I changed defrag hooks to no longer get registered by default I
intentionally made it so that registration can only be un-done by unloading
the nf_defrag_ipv4/6 module.

In hindsight this was too conservative; there is no reason to keep defrag
on while there is no feature dependency anymore.

Moreover, this won't work if user isn't allowed to remove nf_defrag module.

This adds the disable() functions for both ipv4 and ipv6 and calls them
from conntrack, TPROXY and the xtables socket module.

ipvs isn't converted here, it will behave as before this patch and
will need module removal.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

de8c1211

netfilter: nft_socket: add support for cgroupsv2 · e0bb96db

由 Pablo Neira Ayuso 提交于 4月 21, 2021

Allow to match on the cgroupsv2 id from ancestor level.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

e0bb96db

netfilter: nat: move nf_xfrm_me_harder to where it is used · 885e8c68

由 Florian Westphal 提交于 4月 19, 2021

remove the export and make it static.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

885e8c68

io_uring: add full-fledged dynamic buffers support · 634d00df

由 Pavel Begunkov 提交于 4月 25, 2021

Hook buffers into all rsrc infrastructure, including tagging and
updates.
Suggested-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/119ed51d68a491dae87eb55fb467a47870c86aad.1619356238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

634d00df

io_uring: add generic rsrc update with tags · c3bdad02

由 Pavel Begunkov 提交于 4月 25, 2021

Add IORING_REGISTER_RSRC_UPDATE, which also supports passing in rsrc
tags. Implement it for registered files.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d4dc66df204212f64835ffca2c4eb5e8363f2f05.1619356238.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

c3bdad02

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功