提交 · cee5f20fece32cd1722230cb05333f39db860698 · openeuler / Kernel

27 1月, 2020 6 次提交

devlink: add macro for "fw.roce" · 41c0d917

由 Vasundhara Volam 提交于 1月 27, 2020

Add definition and documentation for the new generic info "fw.roce".

v2: Remove board.nvm_cfg since fw.psid is similar.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

41c0d917

udp: Support UDP fraglist GRO/GSO. · 9fd1ff5d

由 Steffen Klassert 提交于 1月 25, 2020

This patch extends UDP GRO to support fraglist GRO/GSO
by using the previously introduced infrastructure.
If the feature is enabled, all UDP packets are going to
fraglist GRO (local input and forward).

After validating the csum,  we mark ip_summed as
CHECKSUM_UNNECESSARY for fraglist GRO packets to
make sure that the csum is not touched.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9fd1ff5d

net_sched: fix ops->bind_class() implementations · 2e24cd75

由 Cong Wang 提交于 1月 23, 2020

The current implementations of ops->bind_class() are merely
searching for classid and updating class in the struct tcf_result,
without invoking either of cl_ops->bind_tcf() or
cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
like cbq use them to count filters too. This is why syzbot triggered
the warning in cbq_destroy_class().

In order to fix this, we have to call cl_ops->bind_tcf() and
cl_ops->unbind_tcf() like the filter binding path. This patch does
so by refactoring out two helper functions __tcf_bind_filter()
and __tcf_unbind_filter(), which are lockless and accept a Qdisc
pointer, then teaching each implementation to call them correctly.

Note, we merely pass the Qdisc pointer as an opaque pointer to
each filter, they only need to pass it down to the helper
functions without understanding it at all.

Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2e24cd75

nf_tables: Add set type for arbitrary concatenation of ranges · 3c4287f6

由 Stefano Brivio 提交于 1月 22, 2020

This new set type allows for intervals in concatenated fields,
which are expressed in the usual way, that is, simple byte
concatenation with padding to 32 bits for single fields, and
given as ranges by specifying start and end elements containing,
each, the full concatenation of start and end values for the
single fields.

Ranges are expanded to composing netmasks, for each field: these
are inserted as rules in per-field lookup tables. Bits to be
classified are divided in 4-bit groups, and for each group, the
lookup table contains 4^2 buckets, representing all the possible
values of a bit group. This approach was inspired by the Grouper
algorithm:
	http://www.cse.usf.edu/~ligatti/projects/grouper/

Matching is performed by a sequence of AND operations between
bucket values, with buckets selected according to the value of
packet bits, for each group. The result of this sequence tells
us which rules matched for a given field.

In order to concatenate several ranged fields, per-field rules
are mapped using mapping arrays, one per field, that specify
which rules should be considered while matching the next field.
The mapping array for the last field contains a reference to
the element originally inserted.

The notes in nft_set_pipapo.c cover the algorithm in deeper
detail.

A pure hash-based approach is of no use here, as ranges need
to be classified. An implementation based on "proxying" the
existing red-black tree set type, creating a tree for each
field, was considered, but deemed impractical due to the fact
that elements would need to be shared between trees, at least
as long as we want to keep UAPI changes to a minimum.

A stand-alone implementation of this algorithm is available at:
	https://pipapo.lameexcu.se
together with notes about possible future optimisations
(in pipapo.c).

This algorithm was designed with data locality in mind, and can
be highly optimised for SIMD instruction sets, as the bulk of
the matching work is done with repetitive, simple bitwise
operations.

At this point, without further optimisations, nft_concat_range.sh
reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB
L2$):

TEST: performance
  net,port                                                      [ OK ]
    baseline (drop from netdev hook):              10190076pps
    baseline hash (non-ranged entries):             6179564pps
    baseline rbtree (match on first field only):    2950341pps
    set with  1000 full, ranged entries:            2304165pps
  port,net                                                      [ OK ]
    baseline (drop from netdev hook):              10143615pps
    baseline hash (non-ranged entries):             6135776pps
    baseline rbtree (match on first field only):    4311934pps
    set with   100 full, ranged entries:            4131471pps
  net6,port                                                     [ OK ]
    baseline (drop from netdev hook):               9730404pps
    baseline hash (non-ranged entries):             4809557pps
    baseline rbtree (match on first field only):    1501699pps
    set with  1000 full, ranged entries:            1092557pps
  port,proto                                                    [ OK ]
    baseline (drop from netdev hook):              10812426pps
    baseline hash (non-ranged entries):             6929353pps
    baseline rbtree (match on first field only):    3027105pps
    set with 30000 full, ranged entries:             284147pps
  net6,port,mac                                                 [ OK ]
    baseline (drop from netdev hook):               9660114pps
    baseline hash (non-ranged entries):             3778877pps
    baseline rbtree (match on first field only):    3179379pps
    set with    10 full, ranged entries:            2082880pps
  net6,port,mac,proto                                           [ OK ]
    baseline (drop from netdev hook):               9718324pps
    baseline hash (non-ranged entries):             3799021pps
    baseline rbtree (match on first field only):    1506689pps
    set with  1000 full, ranged entries:             783810pps
  net,mac                                                       [ OK ]
    baseline (drop from netdev hook):              10190029pps
    baseline hash (non-ranged entries):             5172218pps
    baseline rbtree (match on first field only):    2946863pps
    set with  1000 full, ranged entries:            1279122pps

v4:
 - fix build for 32-bit architectures: 64-bit division needs
   div_u64() (kbuild test robot <lkp@intel.com>)
v3:
 - rework interface for field length specification,
   NFT_SET_SUBKEY disappears and information is stored in
   description
 - remove scratch area to store closing element of ranges,
   as elements now come with an actual attribute to specify
   the upper range limit (Pablo Neira Ayuso)
 - also remove pointer to 'start' element from mapping table,
   closing key is now accessible via extension data
 - use bytes right away instead of bits for field lengths,
   this way we can also double the inner loop of the lookup
   function to take care of upper and lower bits in a single
   iteration (minor performance improvement)
 - make it clearer that set operations are actually atomic
   API-wise, but we can't e.g. implement flush() as one-shot
   action
 - fix type for 'dup' in nft_pipapo_insert(), check for
   duplicates only in the next generation, and in general take
   care of differentiating generation mask cases depending on
   the operation (Pablo Neira Ayuso)
 - report C implementation matching rate in commit message, so
   that AVX2 implementation can be compared (Pablo Neira Ayuso)
v2:
 - protect access to scratch maps in nft_pipapo_lookup() with
   local_bh_disable/enable() (Florian Westphal)
 - drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's
   already implied (Florian Westphal)
 - explain why partial allocation failures don't need handling
   in pipapo_realloc_scratch(), rename 'm' to clone and update
   related kerneldoc to make it clear we're not operating on
   the live copy (Florian Westphal)
 - add expicit check for priv->start_elem in
   nft_pipapo_insert() to avoid ending up in nft_pipapo_walk()
   with a NULL start element, and also zero it out in every
   operation that might make it invalid, so that insertion
   doesn't proceed with an invalid element (Florian Westphal)
Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

3c4287f6

netfilter: nf_tables: Support for sets with multiple ranged fields · f3a2181e

由 Stefano Brivio 提交于 1月 22, 2020

Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used
to specify the length of each field in a set concatenation.

This allows set implementations to support concatenation of multiple
ranged items, as they can divide the input key into matching data for
every single field. Such set implementations would be selected as
they specify support for NFT_SET_INTERVAL and allow desc->field_count
to be greater than one. Explicitly disallow this for nft_set_rbtree.

In order to specify the interval for a set entry, userspace would
include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass
range endpoints as two separate keys, represented by attributes
NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END.

While at it, export the number of 32-bit registers available for
packet matching, as nftables will need this to know the maximum
number of field lengths that can be specified.

For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", can be
expressed as two concatenated elements:

  NFTA_SET_ELEM_KEY:            192.0.2.0 . 22
  NFTA_SET_ELEM_KEY_END:        192.0.2.42 . 25

and NFTA_SET_DESC_CONCAT attribute would contain:

  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN:		4
  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN:		2

v4: No changes
v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY
v2: No changes
Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

f3a2181e

netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute · 7b225d0b

由 Pablo Neira Ayuso 提交于 1月 22, 2020

Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the
interval between kernel and userspace.

This patch also adds the NFT_SET_EXT_KEY_END extension to store the
closing element value in this interval.

v4: No changes
v3: New patch

[sbrivio: refactor error paths and labels; add corresponding
  nft_set_ext_type for new key; rebase]
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

7b225d0b

25 1月, 2020 3 次提交

net: sched: Make TBF Qdisc offloadable · ef6aadcc

由 Petr Machata 提交于 1月 24, 2020

Invoke ndo_setup_tc as appropriate to signal init / replacement, destroying
and dumping of TBF Qdisc.
Signed-off-by: NPetr Machata <petrm@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef6aadcc

mptcp: do not inherit inet proto ops · e42f1ac6

由 Florian Westphal 提交于 1月 24, 2020

We need to initialise the struct ourselves, else we expose tcp-specific
callbacks such as tcp_splice_read which will then trigger splat because
the socket is an mptcp one:

BUG: KASAN: slab-out-of-bounds in tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
Write of size 8 at addr ffff888116aa21d0 by task syz-executor.0/5478

CPU: 1 PID: 5478 Comm: syz-executor.0 Not tainted 5.5.0-rc6 #3
Call Trace:
 tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
 tcp_rcv_space_adjust+0x72/0x7f0 net/ipv4/tcp_input.c:612
 tcp_read_sock+0x622/0x990 net/ipv4/tcp.c:1674
 tcp_splice_read+0x20b/0xb40 net/ipv4/tcp.c:791
 do_splice+0x1259/0x1560 fs/splice.c:1205

To prevent build error with ipv6, add the recv/sendmsg function
declaration to ipv6.h.  The functions are already accessible "thanks"
to retpoline related work, but they are currently only made visible
by socket.c specific INDIRECT_CALLABLE macros.
Reported-by: NChristoph Paasch <cpaasch@apple.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e42f1ac6

netfilter: nf_tables: autoload modules from the abort path · eb014de4

由 Pablo Neira Ayuso 提交于 1月 21, 2020

This patch introduces a list of pending module requests. This new module
list is composed of nft_module_request objects that contain the module
name and one status field that tells if the module has been already
loaded (the 'done' field).

In the first pass, from the preparation phase, the netlink command finds
that a module is missing on this list. Then, a module request is
allocated and added to this list and nft_request_module() returns
-EAGAIN. This triggers the abort path with the autoload parameter set on
from nfnetlink, request_module() is called and the module request enters
the 'done' state. Since the mutex is released when loading modules from
the abort phase, the module list is zapped so this is iteration occurs
over a local list. Therefore, the request_module() calls happen when
object lists are in consistent state (after fulling aborting the
transaction) and the commit list is empty.

On the second pass, the netlink command will find that it already tried
to load the module, so it does not request it again and
nft_request_module() returns 0. Then, there is a look up to find the
object that the command was missing. If the module was successfully
loaded, the command proceeds normally since it finds the missing object
in place, otherwise -ENOENT is reported to userspace.

This patch also updates nfnetlink to include the reason to enter the
abort phase, which is required for this new autoload module rationale.

Fixes: ec7470b8 ("netfilter: nf_tables: store transaction list locally while requesting module")
Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

eb014de4

24 1月, 2020 6 次提交

mptcp: parse and emit MP_CAPABLE option according to v1 spec · cc7972ea

由 Christoph Paasch 提交于 1月 21, 2020

This implements MP_CAPABLE options parsing and writing according
to RFC 6824 bis / RFC 8684: MPTCP v1.

Local key is sent on syn/ack, and both keys are sent on 3rd ack.
MP_CAPABLE messages len are updated accordingly. We need the skbuff to
correctly emit the above, so we push the skbuff struct as an argument
all the way from tcp code to the relevant mptcp callbacks.

When processing incoming MP_CAPABLE + data, build a full blown DSS-like
map info, to simplify later processing.  On child socket creation, we
need to record the remote key, if available.
Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc7972ea

mptcp: Implement MPTCP receive path · 648ef4b8

由 Mat Martineau 提交于 1月 21, 2020

Parses incoming DSS options and populates outgoing MPTCP ACK
fields. MPTCP fields are parsed from the TCP option header and placed in
an skb extension, allowing the upper MPTCP layer to access MPTCP
options after the skb has gone through the TCP stack.

The subflow implements its own data_ready() ops, which ensures that the
pending data is in sequence - according to MPTCP seq number - dropping
out-of-seq skbs. The DATA_READY bit flag is set if this is the case.
This allows the MPTCP socket layer to determine if more data is
available without having to consult the individual subflows.

It additionally validates the current mapping and propagates EoF events
to the connection socket.
Co-developed-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Co-developed-by: NPeter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: NPeter Krystad <peter.krystad@linux.intel.com>
Co-developed-by: NDavide Caratti <dcaratti@redhat.com>
Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
Co-developed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

648ef4b8

mptcp: Write MPTCP DSS headers to outgoing data packets · 6d0060f6

由 Mat Martineau 提交于 1月 21, 2020

Per-packet metadata required to write the MPTCP DSS option is written to
the skb_ext area. One write to the socket may contain more than one
packet of data, which is copied to page fragments and mapped in to MPTCP
DSS segments with size determined by the available page fragments and
the maximum mapping length allowed by the MPTCP specification. If
do_tcp_sendpages() splits a DSS segment in to multiple skbs, that's ok -
the later skbs can either have duplicated DSS mapping information or
none at all, and the receiver can handle that.

The current implementation uses the subflow frag cache and tcp
sendpages to avoid excessive code duplication. More work is required to
ensure that it works correctly under memory pressure and to support
MPTCP-level retransmissions.

The MPTCP DSS checksum is not yet implemented.
Co-developed-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Co-developed-by: NPeter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: NPeter Krystad <peter.krystad@linux.intel.com>
Co-developed-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d0060f6

mptcp: Handle MP_CAPABLE options for outgoing connections · cec37a6e

由 Peter Krystad 提交于 1月 21, 2020

Add hooks to tcp_output.c to add MP_CAPABLE to an outgoing SYN request,
to capture the MP_CAPABLE in the received SYN-ACK, to add MP_CAPABLE to
the final ACK of the three-way handshake.

Use the .sk_rx_dst_set() handler in the subflow proto to capture when the
responding SYN-ACK is received and notify the MPTCP connection layer.
Co-developed-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Co-developed-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPeter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cec37a6e

mptcp: Handle MPTCP TCP options · eda7acdd

由 Peter Krystad 提交于 1月 21, 2020

Add hooks to parse and format the MP_CAPABLE option.

This option is handled according to MPTCP version 0 (RFC6824).
MPTCP version 1 MP_CAPABLE (RFC6824bis/RFC8684) will be added later in
coordination with related code changes.
Co-developed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Co-developed-by: NDavide Caratti <dcaratti@redhat.com>
Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
Signed-off-by: NPeter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eda7acdd

mptcp: Add MPTCP socket stubs · f870fa0b

由 Mat Martineau 提交于 1月 21, 2020

Implements the infrastructure for MPTCP sockets.

MPTCP sockets open one in-kernel TCP socket per subflow. These subflow
sockets are only managed by the MPTCP socket that owns them and are not
visible from userspace. This commit allows a userspace program to open
an MPTCP socket with:

  sock = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

The resulting socket is simply a wrapper around a single regular TCP
socket, without any of the MPTCP protocol implemented over the wire.
Co-developed-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Co-developed-by: NPeter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: NPeter Krystad <peter.krystad@linux.intel.com>
Co-developed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f870fa0b

23 1月, 2020 8 次提交

net: sched: add Flow Queue PIE packet scheduler · ec97ecf1

由 Mohit P. Tahiliani 提交于 1月 22, 2020

Principles:
  - Packets are classified on flows.
  - This is a Stochastic model (as we use a hash, several flows might
                                be hashed to the same slot)
  - Each flow has a PIE managed queue.
  - Flows are linked onto two (Round Robin) lists,
    so that new flows have priority on old ones.
  - For a given flow, packets are not reordered.
  - Drops during enqueue only.
  - ECN capability is off by default.
  - ECN threshold (if ECN is enabled) is at 10% by default.
  - Uses timestamps to calculate queue delay by default.

Usage:
tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ]
                    [ target TIME ] [ tupdate TIME ]
                    [ alpha NUMBER ] [ beta NUMBER ]
                    [ quantum BYTES ] [ memory_limit BYTES ]
                    [ ecnprob PERCENTAGE ] [ [no]ecn ]
                    [ [no]bytemode ] [ [no_]dq_rate_estimator ]

defaults:
  limit: 10240 packets, flows: 1024
  target: 15 ms, tupdate: 15 ms (in jiffies)
  alpha: 1/8, beta : 5/4
  quantum: device MTU, memory_limit: 32 Mb
  ecnprob: 10%, ecn: off
  bytemode: off, dq_rate_estimator: off
Signed-off-by: NMohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: NSachin D. Patil <sdp.sachin@gmail.com>
Signed-off-by: NV. Saicharan <vsaicharan1998@gmail.com>
Signed-off-by: NMohit Bhasi <mohitbhasi1998@gmail.com>
Signed-off-by: NLeslie Monis <lesliemonis@gmail.com>
Signed-off-by: NGautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec97ecf1

net: sched: pie: export symbols to be reused by FQ-PIE · 5205ea00

由 Mohit P. Tahiliani 提交于 1月 22, 2020

This patch makes the drop_early(), calculate_probability() and
pie_process_dequeue() functions generic enough to be used by
both PIE and FQ-PIE (to be added in a future commit). The major
change here is in the way the functions take in arguments. This
patch exports these functions and makes FQ-PIE dependent on
sch_pie.
Signed-off-by: NMohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: NLeslie Monis <lesliemonis@gmail.com>
Signed-off-by: NGautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5205ea00

pie: improve comments and commenting style · b42a3d7c

由 Mohit P. Tahiliani 提交于 1月 22, 2020

Improve the comments along with the commenting style used to
describe the members of the structures and their initial values
in the init functions.
Signed-off-by: NMohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: NLeslie Monis <lesliemonis@gmail.com>
Signed-off-by: NGautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b42a3d7c

pie: rearrange structure members and their initializations · 2dfb1952

由 Mohit P. Tahiliani 提交于 1月 22, 2020

Rearrange the members of the structure such that closely
referenced members appear together and/or fit in the same
cacheline. Also, change the order of their initializations to
match the order in which they appear in the structure.
Signed-off-by: NMohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: NLeslie Monis <lesliemonis@gmail.com>
Signed-off-by: NGautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2dfb1952

pie: use u8 instead of bool in pie_vars · 1dbfc5e0

由 Mohit P. Tahiliani 提交于 1月 22, 2020

Linux best practice recommends using u8 for true/false values in
structures.
Signed-off-by: NMohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: NLeslie Monis <lesliemonis@gmail.com>
Signed-off-by: NGautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1dbfc5e0

pie: rearrange macros in order of length · cf4eeee5

由 Mohit P. Tahiliani 提交于 1月 22, 2020

Rearrange macros in order of length and align the values to
improve readability.
Signed-off-by: NMohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: NLeslie Monis <lesliemonis@gmail.com>
Signed-off-by: NGautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf4eeee5

pie: use U64_MAX to denote (2^64 - 1) · 805a5a23

由 Mohit P. Tahiliani 提交于 1月 22, 2020

Use the U64_MAX macro to denote the constant (2^64 - 1).
Signed-off-by: NMohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: NLeslie Monis <lesliemonis@gmail.com>
Signed-off-by: NGautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

805a5a23

net: sched: pie: move common code to pie.h · 84bf557f

由 Mohit P. Tahiliani 提交于 1月 22, 2020

This patch moves macros, structures and small functions common
to PIE and FQ-PIE (to be added in a future commit) from the file
net/sched/sch_pie.c to the header file include/net/pie.h.
All the moved functions are made inline.
Signed-off-by: NMohit P. Tahiliani <tahiliani@nitk.edu.in>
Signed-off-by: NLeslie Monis <lesliemonis@gmail.com>
Signed-off-by: NGautam Ramakrishnan <gautamramk@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84bf557f

22 1月, 2020 1 次提交

xsk, net: Make sock_def_readable() have external linkage · 43a825af

由 Björn Töpel 提交于 1月 20, 2020

XDP sockets use the default implementation of struct sock's
sk_data_ready callback, which is sock_def_readable(). This function
is called in the XDP socket fast-path, and involves a retpoline. By
letting sock_def_readable() have external linkage, and being called
directly, the retpoline can be avoided.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200120092917.13949-1-bjorn.topel@gmail.com

43a825af

19 1月, 2020 3 次提交

devlink: Add overlay source MAC is multicast trap · c3cae491

由 Amit Cohen 提交于 1月 19, 2020

Add packet trap that can report NVE packets that the device decided to
drop because their overlay source MAC is multicast.
Signed-off-by: NAmit Cohen <amitc@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3cae491

devlink: Add tunnel generic packet traps · 13c056ec

由 Amit Cohen 提交于 1月 19, 2020

Add packet traps that can report packets that were dropped during tunnel
decapsulation.
Signed-off-by: NAmit Cohen <amitc@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13c056ec

devlink: Add non-routable packet trap · 95f0ead8

由 Amit Cohen 提交于 1月 19, 2020

Add packet trap that can report packets that reached the router, but are
non-routable. For example, IGMP queries can be flooded by the device in
layer 2 and reach the router. Such packets should not be routed and
instead dropped.
Signed-off-by: NAmit Cohen <amitc@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95f0ead8

17 1月, 2020 1 次提交

netns: Constify exported functions · 56f200c7

由 Guillaume Nault 提交于 1月 16, 2020

Mark function parameters as 'const' where possible.
Signed-off-by: NGuillaume Nault <gnault@redhat.com>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56f200c7

16 1月, 2020 8 次提交

netfilter: flowtable: refresh flow if hardware offload fails · f698fe40

由 Pablo Neira Ayuso 提交于 1月 06, 2020

If nf_flow_offload_add() fails to add the flow to hardware, then the
NF_FLOW_HW_REFRESH flag bit is set and the flow remains in the flowtable
software path.

If flowtable hardware offload is enabled, this patch enqueues a new
request to offload this flow to hardware.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

f698fe40

netfilter: flowtable: add nf_flowtable_hw_offload() helper function · a5449cdc

由 Pablo Neira Ayuso 提交于 1月 07, 2020

This function checks for the NF_FLOWTABLE_HW_OFFLOAD flag, meaning that
the flowtable hardware offload is enabled.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

a5449cdc

netfilter: flowtable: use atomic bitwise operations for flow flags · 355a8b13

由 Pablo Neira Ayuso 提交于 1月 05, 2020

Originally, all flow flag bits were set on only from the workqueue. With
the introduction of the flow teardown state and hardware offload this is
no longer true. Let's be safe and use atomic bitwise operation to
operation with flow flags.

Fixes: 59c466dd ("netfilter: nf_flow_table: add a new flow state for tearing down offloading")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

355a8b13

netfilter: flowtable: remove dying bit, use teardown bit instead · 445db8d0

由 Pablo Neira Ayuso 提交于 1月 05, 2020

The dying bit removes the conntrack entry if the netdev that owns this
flow is going down. Instead, use the teardown mechanism to push back the
flow to conntrack to let the classic software path decide what to do
with it.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

445db8d0

bpf: Sockmap/tls, push write_space updates through ulp updates · 33bfe20d

由 John Fastabend 提交于 1月 11, 2020

When sockmap sock with TLS enabled is removed we cleanup bpf/psock state
and call tcp_update_ulp() to push updates to TLS ULP on top. However, we
don't push the write_space callback up and instead simply overwrite the
op with the psock stored previous op. This may or may not be correct so
to ensure we don't overwrite the TLS write space hook pass this field to
the ULP and have it fixup the ctx.

This completes a previous fix that pushed the ops through to the ULP
but at the time missed doing this for write_space, presumably because
write_space TLS hook was added around the same time.

Fixes: 95fa1454 ("bpf: sockmap/tls, close can race with map free")
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com

33bfe20d

Bluetooth: monitor: Add support for ISO packets · f9a619db

由 Luiz Augusto von Dentz 提交于 1月 15, 2020

This enables passing ISO packets to the monitor socket.
Signed-off-by: NLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

f9a619db

Bluetooth: Add definitions for CIS connections · 4de0fc59

由 Luiz Augusto von Dentz 提交于 1月 15, 2020

These adds the HCI definitions for handling CIS connections along with
ISO data packets.
Signed-off-by: NLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

4de0fc59

Bluetooth: Implementation of MGMT_OP_SET_BLOCKED_KEYS. · 600a8749

由 Alain Michaud 提交于 1月 07, 2020

MGMT command is added to receive the list of blocked keys from
user-space.

The list is used to:
1) Block keys from being distributed by the device during
   the ke distribution phase of SMP.
2) Filter out any keys that were previously saved so
   they are no longer used.
Signed-off-by: NAlain Michaud <alainm@chromium.org>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

600a8749

15 1月, 2020 4 次提交

cfg80211: Fix radar event during another phy CAC · 26ec17a1

由 Orr Mazor 提交于 12月 22, 2019

In case a radar event of CAC_FINISHED or RADAR_DETECTED
happens during another phy is during CAC we might need
to cancel that CAC.

If we got a radar in a channel that another phy is now
doing CAC on then the CAC should be canceled there.

If, for example, 2 phys doing CAC on the same channels,
or on comptable channels, once on of them will finish his
CAC the other might need to cancel his CAC, since it is no
longer relevant.

To fix that the commit adds an callback and implement it in
mac80211 to end CAC.
This commit also adds a call to said callback if after a radar
event we see the CAC is no longer relevant
Signed-off-by: NOrr Mazor <Orr.Mazor@tandemg.com>
Reviewed-by: NSergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Link: https://lore.kernel.org/r/20191222145449.15792-1-Orr.Mazor@tandemg.com
[slightly reformat/reword commit message]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

26ec17a1

ipv6: Add "offload" and "trap" indications to routes · bb3c4ab9

由 Ido Schimmel 提交于 1月 14, 2020

In a similar fashion to previous patch, add "offload" and "trap"
indication to IPv6 routes.

This is done by using two unused bits in 'struct fib6_info' to hold
these indications. Capable drivers are expected to set these when
processing the various in-kernel route notifications.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb3c4ab9

ipv4: Add "offload" and "trap" indications to routes · 90b93f1b

由 Ido Schimmel 提交于 1月 14, 2020

When performing L3 offload, routes and nexthops are usually programmed
into two different tables in the underlying device. Therefore, the fact
that a nexthop resides in hardware does not necessarily mean that all
the associated routes also reside in hardware and vice-versa.

While the kernel can signal to user space the presence of a nexthop in
hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
for routes. In addition, the fact that a route resides in hardware does
not necessarily mean that the traffic is offloaded. For example,
unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
packets to the CPU so that the kernel will be able to generate the
appropriate ICMP error packet.

This patch adds an "offload" and "trap" indications to IPv4 routes, so
that users will have better visibility into the offload process.

'struct fib_alias' is extended with two new fields that indicate if the
route resides in hardware or not and if it is offloading traffic from
the kernel or trapping packets to it. Note that the new fields are added
in the 6 bytes hole and therefore the struct still fits in a single
cache line [1].

Capable drivers are expected to invoke fib_alias_hw_flags_set() with the
route's key in order to set the flags.

The indications are dumped to user space via a new flags (i.e.,
'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the
ancillary header.

v2:
* Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()

[1]
struct fib_alias {
        struct hlist_node  fa_list;                      /*     0    16 */
        struct fib_info *          fa_info;              /*    16     8 */
        u8                         fa_tos;               /*    24     1 */
        u8                         fa_type;              /*    25     1 */
        u8                         fa_state;             /*    26     1 */
        u8                         fa_slen;              /*    27     1 */
        u32                        tb_id;                /*    28     4 */
        s16                        fa_default;           /*    32     2 */
        u8                         offload:1;            /*    34: 0  1 */
        u8                         trap:1;               /*    34: 1  1 */
        u8                         unused:6;             /*    34: 2  1 */

        /* XXX 5 bytes hole, try to pack */

        struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */

        /* size: 56, cachelines: 1, members: 12 */
        /* sum members: 50, holes: 1, sum holes: 5 */
        /* sum bitfield members: 8 bits (1 bytes) */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
        /* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Reviewed-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

90b93f1b

ipv4: Encapsulate function arguments in a struct · 1e301fd0

由 Ido Schimmel 提交于 1月 14, 2020

fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages
using the passed arguments. Currently, the function takes 11 arguments,
6 of which are attributes of the route being dumped (e.g., prefix, TOS).

The next patch will need the function to also dump to user space an
indication if the route is present in hardware or not. Instead of
passing yet another argument, change the function to take a struct
containing the different route attributes.

v2:
* Name last argument of fib_dump_info()
* Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could
  later be passed to fib_alias_hw_flags_set()
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Reviewed-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1e301fd0

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功