提交 · 8fa9137180b2fd8482b671f7e0bd8cf7538cbf59 · openeuler / Kernel

27 3月, 2020 1 次提交

taprio: do not use BIT() in TCA_TAPRIO_ATTR_FLAG_* definitions · 673040c3

由 Eugene Syromiatnikov 提交于 3月 24, 2020

BIT() macro definition is internal to the Linux kernel and is not
to be used in UAPI headers; replace its usage with the _BITUL() macro
that is already used elsewhere in the header.

Fixes: 9c66d156 ("taprio: Add support for hardware offloading")
Signed-off-by: NEugene Syromiatnikov <esyr@redhat.com>
Acked-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

673040c3

24 3月, 2020 1 次提交

net: sched: rename more stats_types · 0dfb2d82

由 Jakub Kicinski 提交于 3月 19, 2020

Commit 53eca1f3 ("net: rename flow_action_hw_stats_types* ->
flow_action_hw_stats*") renamed just the flow action types and
helpers. For consistency rename variables, enums, struct members
and UAPI too (note that this UAPI was not in any official release,
yet).
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Reviewed-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0dfb2d82

20 3月, 2020 9 次提交

net: bridge: vlan options: move the tunnel command to the nested attribute · c443758b

由 Nikolay Aleksandrov 提交于 3月 20, 2020

Now that we have a nested tunnel info attribute we can add a separate
one for the tunnel command and require it explicitly from user-space. It
must be one of RTM_SETLINK/DELLINK. Only RTM_SETLINK requires a valid
tunnel id, DELLINK just removes it if it was set before. This allows us
to have all tunnel attributes and control in one place, thus removing
the need for an outside vlan info flag.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c443758b

net: bridge: vlan options: nest the tunnel id into a tunnel info attribute · fa388f29

由 Nikolay Aleksandrov 提交于 3月 20, 2020

While discussing the new API, Roopa mentioned that we'll be adding more
tunnel attributes and options in the future, so it's better to make it a
nested attribute, since this is still in net-next we can easily change it
and nest the tunnel id attribute under BRIDGE_VLANDB_ENTRY_TUNNEL_INFO.

The new format is:
 [BRIDGE_VLANDB_ENTRY]
     [BRIDGE_VLANDB_ENTRY_TUNNEL_INFO]
         [BRIDGE_VLANDB_TINFO_ID]

Any new tunnel attributes can be nested under
BRIDGE_VLANDB_ENTRY_TUNNEL_INFO.
Suggested-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa388f29

cfg80211: Configure PMK lifetime and reauth threshold for PMKSA entries · 7fc82af8

由 Veerendranath Jakkam 提交于 3月 13, 2020

Drivers that trigger roaming need to know the lifetime of the configured
PMKSA for deciding whether to trigger the full or PMKSA cache based
authentication. The configured PMKSA is invalid after the PMK lifetime
has expired and must not be used after that and the STA needs to
disassociate if the PMK expires. Hence the STA is expected to refresh
the PMK with a full authentication before this happens (e.g., when
reassociating to a new BSS the next time or by performing EAPOL
reauthentication depending on the AKM) to avoid unnecessary
disconnection.

The PMK reauthentication threshold is the percentage of the PMK lifetime
value and indicates to the driver to trigger a full authentication roam
(without PMKSA caching) after the reauthentication threshold time, but
before the PMK timer has expired. Authentication methods like SAE need
to be able to generate a new PMKSA entry without having to force a
disconnection after this threshold timeout. If no roaming occurs between
the reauthentication threshold time and PMK lifetime expiration,
disassociation is still forced.

The new attributes for providing these values correspond to the dot11
MIB variables dot11RSNAConfigPMKLifetime and
dot11RSNAConfigPMKReauthThreshold.

This type of functionality is already available in cases where user
space component is in control of roaming. This commit extends that same
capability into cases where parts or all of this functionality is
offloaded to the driver.
Signed-off-by: NVeerendranath Jakkam <vjakkam@codeaurora.org>
Signed-off-by: NJouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200312235903.18462-1-jouni@codeaurora.orgSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

7fc82af8

cfg80211: Add support for userspace to reset stations in IBSS mode · edafcf42

由 Nicolas Cavallari 提交于 3月 05, 2020

Sometimes, userspace is able to detect that a peer silently lost its
state (like, if the peer reboots). wpa_supplicant does this for IBSS-RSN
by registering for auth/deauth frames, but when it detects this, it is
only able to remove the encryption keys of the peer and close its port.

However, the kernel also hold other state about the station, such as BA
sessions, probe response parameters and the like.  They also need to be
resetted correctly.

This patch adds the NL80211_EXT_FEATURE_DEL_IBSS_STA feature flag
indicating the driver accepts deleting stations in IBSS mode, which
should send a deauth and reset the state of the station, just like in
mesh point mode.
Signed-off-by: NNicolas Cavallari <nicolas.cavallari@green-communications.fr>
Link: https://lore.kernel.org/r/20200305135754.12094-1-cavallar@lri.fr
[preserve -EINVAL return]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

edafcf42

nl80211: add PROTECTED_TWT nl80211 extended feature · 0c138a5c

由 Shaul Triebitz 提交于 1月 31, 2020

Add API for telling whether the driver supports protected TWT.
The protected_twt capability in the RSNXE will be based on this.
Signed-off-by: NShaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20200131111300.891737-23-luca@coelho.fiSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

0c138a5c

nl80211/cfg80211: add support for non EDCA based ranging measurement · efb5520d

由 Avraham Stern 提交于 1月 31, 2020

Add support for requesting that the ranging measurement will use
the trigger-based / non trigger-based flow instead of the EDCA based
flow.
Signed-off-by: NAvraham Stern <avraham.stern@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/20200131111300.891737-2-luca@coelho.fiSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

efb5520d

nl80211: add no pre-auth attribute and ext. feature flag for ctrl. port · 5631d96a

由 Markus Theil 提交于 3月 12, 2020

If the nl80211 control port is used before this patch, pre-auth frames
(0x88c7) are send to userspace uncoditionally. While this enables userspace
to only use nl80211 on the station side, it is not always useful for APs.
Furthermore, pre-auth frames are ordinary data frames and not related to
the control port. Therefore it should for example be possible for pre-auth
frames to be bridged onto a wired network on AP side without touching
userspace.

For backwards compatibility to code already using pre-auth over nl80211,
this patch adds a feature flag to disable this behavior, while it remains
enabled by default. An additional ext. feature flag is added to detect this
from userspace.

Thanks to Jouni for pointing out, that pre-auth frames should be handled as
ordinary data frames.
Signed-off-by: NMarkus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200312091055.54257-2-markus.theil@tu-ilmenau.deSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

5631d96a

mac80211_hwsim: add frame transmission support over virtio · 5d44fe7c

由 Erel Geron 提交于 3月 05, 2020

This allows communication with external entities.

It also required fixing up the netlink policy, since NLA_UNSPEC
attributes are no longer accepted.
Signed-off-by: NErel Geron <erelx.geron@intel.com>
[port to backports, inline the ID, use 29 as the ID as requested,
 drop != NULL checks, reduce ifdefs]
Link: https://lore.kernel.org/r/20200305143212.c6e4c87d225b.I7ce60bf143e863dcdf0fb8040aab7168ba549b99@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

5d44fe7c

net: bridge: vlan: include stats in dumps if requested · 56d09976

由 Nikolay Aleksandrov 提交于 3月 19, 2020

This patch adds support for vlan stats to be included when dumping vlan
information. We have to dump them only when explicitly requested (thus the
flag below) because that disables the vlan range compression and will make
the dump significantly larger. In order to request the stats to be
included we add a new dump attribute called BRIDGE_VLANDB_DUMP_FLAGS which
can affect dumps with the following first flag:
  - BRIDGE_VLANDB_DUMPF_STATS
The stats are intentionally nested and put into separate attributes to make
it easier for extending later since we plan to add per-vlan mcast stats,
drop stats and possibly STP stats. This is the last missing piece from the
new vlan API which makes the dumped vlan information complete.

A dump request which should include stats looks like:
 [BRIDGE_VLANDB_DUMP_FLAGS] |= BRIDGE_VLANDB_DUMPF_STATS

A vlandb entry attribute with stats looks like:
 [BRIDGE_VLANDB_ENTRY] = {
     [BRIDGE_VLANDB_ENTRY_STATS] = {
         [BRIDGE_VLANDB_STATS_RX_BYTES]
         [BRIDGE_VLANDB_STATS_RX_PACKETS]
         ...
     }
 }
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56d09976

19 3月, 2020 1 次提交

netfilter: revert introduction of egress hook · 357b6cc5

由 Daniel Borkmann 提交于 3月 18, 2020

This reverts the following commits:

  8537f786 ("netfilter: Introduce egress hook")
  5418d388 ("netfilter: Generalize ingress hook")
  b030f194 ("netfilter: Rename ingress hook include file")

>From the discussion in [0], the author's main motivation to add a hook
in fast path is for an out of tree kernel module, which is a red flag
to begin with. Other mentioned potential use cases like NAT{64,46}
is on future extensions w/o concrete code in the tree yet. Revert as
suggested [1] given the weak justification to add more hooks to critical
fast-path.

  [0] https://lore.kernel.org/netdev/cover.1583927267.git.lukas@wunner.de/
  [1] https://lore.kernel.org/netdev/20200318.011152.72770718915606186.davem@davemloft.net/Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: David Miller <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Nacked-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

357b6cc5

18 3月, 2020 5 次提交

net: phylink: pcs: add 802.3 clause 22 helpers · 74db1c18

由 Russell King 提交于 3月 17, 2020

Implement helpers for PCS accessed via the MII bus using 802.3 clause
22 cycles, conforming to 802.3 clause 37 and Cisco SGMII specifications
for the advertisement word.
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74db1c18

net: bridge: vlan options: add support for tunnel mapping set/del · 569da082

由 Nikolay Aleksandrov 提交于 3月 17, 2020

This patch adds support for manipulating vlan/tunnel mappings. The
tunnel ids are globally unique and are one per-vlan. There were two
trickier issues - first in order to support vlan ranges we have to
compute the current tunnel id in the following way:
 - base tunnel id (attr) + current vlan id - starting vlan id
This is in line how the old API does vlan/tunnel mapping with ranges. We
already have the vlan range present, so it's redundant to add another
attribute for the tunnel range end. It's simply base tunnel id + vlan
range. And second to support removing mappings we need an out-of-band way
to tell the option manipulating function because there are no
special/reserved tunnel id values, so we use a vlan flag to denote the
operation is tunnel mapping removal.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

569da082

net: bridge: vlan options: add support for tunnel id dumping · 188c67dd

由 Nikolay Aleksandrov 提交于 3月 17, 2020

Add a new option - BRIDGE_VLANDB_ENTRY_TUNNEL_ID which is used to dump
the tunnel id mapping. Since they're unique per vlan they can enter a
vlan range if they're consecutive, thus we can calculate the tunnel id
range map simply as: vlan range end id - vlan range start id. The
starting point is the tunnel id in BRIDGE_VLANDB_ENTRY_TUNNEL_ID. This
is similar to how the tunnel entries can be created in a range via the
old API (a vlan range maps to a tunnel range).
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

188c67dd

net_sched: sch_fq: enable use of hrtimer slack · 583396f4

由 Eric Dumazet 提交于 3月 16, 2020

Add a new attribute to control the fq qdisc hrtimer slack.

Default is set to 10 usec.

When/if packets are throttled, fq set up an hrtimer that can
lead to one interrupt per packet in the throttled queue.

By using a timer slack, we allow better use of timer interrupts,
by giving them a chance to call multiple timer callbacks
at each hardware interrupt.

Also, giving a slack allows FQ to dequeue batches of packets
instead of a single one, thus increasing xmit_more efficiency.

This has no negative effect on the rate a TCP flow can sustain,
since each TCP flow maintains its own precise vtime (tp->tcp_wstamp_ns)

v2: added strict netlink checking (as feedback from Jakub Kicinski)

Tested:
1000 concurrent flows all using paced packets.
1,000,000 packets sent per second.

Before the patch :

$ vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 60726784 23628 3485992 0 0 138 1 977 535 0 12 87 0 0
0 0 0 60714700 23628 3485628 0 0 0 0 1568827 26462 0 22 78 0 0
1 0 0 60716012 23628 3485656 0 0 0 0 1570034 26216 0 22 78 0 0
0 0 0 60722420 23628 3485492 0 0 0 0 1567230 26424 0 22 78 0 0
0 0 0 60727484 23628 3485556 0 0 0 0 1568220 26200 0 22 78 0 0
2 0 0 60718900 23628 3485380 0 0 0 40 1564721 26630 0 22 78 0 0
2 0 0 60718096 23628 3485332 0 0 0 0 1562593 26432 0 22 78 0 0
0 0 0 60719608 23628 3485064 0 0 0 0 1563806 26238 0 22 78 0 0
1 0 0 60722876 23628 3485236 0 0 0 130 1565874 26566 0 22 78 0 0
1 0 0 60722752 23628 3484908 0 0 0 0 1567646 26247 0 22 78 0 0

After the patch, slack of 10 usec, we can see a reduction of interrupts
per second, and a small decrease of reported cpu usage.

$ vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 60722564 23628 3484728 0 0 133 1 696 545 0 13 87 0 0
1 0 0 60722568 23628 3484824 0 0 0 0 977278 25469 0 20 80 0 0
0 0 0 60716396 23628 3484764 0 0 0 0 979997 25326 0 20 80 0 0
0 0 0 60713844 23628 3484960 0 0 0 0 981394 25249 0 20 80 0 0
2 0 0 60720468 23628 3484916 0 0 0 0 982860 25062 0 20 80 0 0
1 0 0 60721236 23628 3484856 0 0 0 0 982867 25100 0 20 80 0 0
1 0 0 60722400 23628 3484456 0 0 0 8 982698 25303 0 20 80 0 0
0 0 0 60715396 23628 3484428 0 0 0 0 981777 25176 0 20 80 0 0
0 0 0 60716520 23628 3486544 0 0 0 36 978965 27857 0 21 79 0 0
0 0 0 60719592 23628 3486516 0 0 0 22 977318 25106 0 20 80 0 0
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

583396f4

netfilter: Introduce egress hook · 8537f786

由 Lukas Wunner 提交于 3月 11, 2020

Commit e687ad60 ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key") introduced the ability to
classify packets on ingress.

Allow the same on egress.  Position the hook immediately before a packet
is handed to tc and then sent out on an interface, thereby mirroring the
ingress order.  This order allows marking packets in the netfilter
egress hook and subsequently using the mark in tc.  Another benefit of
this order is consistency with a lot of existing documentation which
says that egress tc is performed after netfilter hooks.

Egress hooks already exist for the most common protocols, such as
NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
they are executed earlier during packet processing.  However for more
exotic protocols, there is currently no provision to apply netfilter on
egress.  A common workaround is to enslave the interface to a bridge and
use ebtables, or to resort to tc.  But when the ingress hook was
introduced, consensus was that users should be given the choice to use
netfilter or tc, whichever tool suits their needs best:
https://lore.kernel.org/netdev/20150430153317.GA3230@salvia/
This hook is also useful for NAT46/NAT64, tunneling and filtering of
locally generated af_packet traffic such as dhclient.

There have also been occasional user requests for a netfilter egress
hook in the past, e.g.:
https://www.spinics.net/lists/netfilter/msg50038.html

Performance measurements with pktgen surprisingly show a speedup rather
than a slowdown with this commit:

* Without this commit:
  Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
  2920481pps 1401Mb/sec (1401830880bps) errors: 0

* With this commit:
  Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
  2941410pps 1411Mb/sec (1411876800bps) errors: 0

* Without this commit + tc egress:
  Result: OK: 39022386(c39019547+d2839) usec, 100000000 (60byte,0frags)
  2562631pps 1230Mb/sec (1230062880bps) errors: 0

* With this commit + tc egress:
  Result: OK: 37604447(c37601877+d2570) usec, 100000000 (60byte,0frags)
  2659259pps 1276Mb/sec (1276444320bps) errors: 0

* With this commit + nft egress:
  Result: OK: 41436689(c41434088+d2600) usec, 100000000 (60byte,0frags)
  2413320pps 1158Mb/sec (1158393600bps) errors: 0

Tested on a bare-metal Core i7-3615QM, each measurement was performed
three times to verify that the numbers are stable.

Commands to perform a measurement:
modprobe pktgen
echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000

Commands for testing tc egress:
tc qdisc add dev lo clsact
tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32

Commands for testing nft egress:
nft add table netdev t
nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
nft add rule netdev t co ip daddr 4.3.2.1/32 drop

All testing was performed on the loopback interface to avoid distorting
measurements by the packet handling in the low-level Ethernet driver.
Signed-off-by: NLukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

8537f786

16 3月, 2020 1 次提交

macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw) · 48ef50fa

由 Era Mayflower 提交于 3月 09, 2020

Netlink support of extended packet number cipher suites,
allows adding and updating XPN macsec interfaces.

Added support in:
    * Creating interfaces with GCM-AES-XPN-128 and GCM-AES-XPN-256 suites.
    * Setting and getting 64bit packet numbers with of SAs.
    * Setting (only on SA creation) and getting ssci of SAs.
    * Setting salt when installing a SAK.

Added 2 cipher suite identifiers according to 802.1AE-2018 table 14-1:
    * MACSEC_CIPHER_ID_GCM_AES_XPN_128
    * MACSEC_CIPHER_ID_GCM_AES_XPN_256

In addition, added 2 new netlink attribute types:
    * MACSEC_SA_ATTR_SSCI
    * MACSEC_SA_ATTR_SALT

Depends on: macsec: Support XPN frame handling - IEEE 802.1AEbw.
Signed-off-by: NEra Mayflower <mayflowerera@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

48ef50fa

15 3月, 2020 5 次提交

netfilter: Replace zero-length array with flexible-array member · 6daf1414

由 Gustavo A. R. Silva 提交于 2月 20, 2020

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

Lastly, fix checkpatch.pl warning
WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
in net/bridge/netfilter/ebtables.c

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

6daf1414

netfilter: nft_tunnel: add support for geneve opts · 925d8446

由 Xin Long 提交于 2月 10, 2020

Like vxlan and erspan opts, geneve opts should also be supported in
nft_tunnel. The difference is geneve RFC (draft-ietf-nvo3-geneve-14)
allows a geneve packet to carry multiple geneve opts. So with this
patch, nftables/libnftnl would do:

  # nft add table ip filter
  # nft add chain ip filter input { type filter hook input priority 0 \; }
  # nft add tunnel filter geneve_02 { type geneve\; id 2\; \
    ip saddr 192.168.1.1\; ip daddr 192.168.1.2\; \
    sport 9000\; dport 9001\; dscp 1234\; ttl 64\; flags 1\; \
    opts \"1:1:34567890,2:2:12121212,3:3:1212121234567890\"\; }
  # nft list tunnels table filter
    table ip filter {
    	tunnel geneve_02 {
    		id 2
    		ip saddr 192.168.1.1
    		ip daddr 192.168.1.2
    		sport 9000
    		dport 9001
    		tos 18
    		ttl 64
    		flags 1
    		geneve opts 1:1:34567890,2:2:12121212,3:3:1212121234567890
    	}
    }

v1->v2:
  - no changes, just post it separately.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

925d8446

netfilter: xtables: Add snapshot of hardidletimer target · 68983a35

由 Manoj Basapathi 提交于 2月 06, 2020

This is a snapshot of hardidletimer netfilter target.

This patch implements a hardidletimer Xtables target that can be
used to identify when interfaces have been idle for a certain period
of time.

Timers are identified by labels and are created when a rule is set
with a new label. The rules also take a timeout value (in seconds) as
an option. If more than one rule uses the same timer label, the timer
will be restarted whenever any of the rules get a hit.

One entry for each timer is created in sysfs. This attribute contains
the timer remaining for the timer to expire. The attributes are
located under the xt_idletimer class:

/sys/class/xt_idletimer/timers/<label>

When the timer expires, the target module sends a sysfs notification
to the userspace, which can then decide what to do (eg. disconnect to
save power)

Compared to IDLETIMER, HARDIDLETIMER can send notifications when
CPU is in suspend too, to notify the timer expiry.

v1->v2: Moved all functionality into IDLETIMER module to avoid
code duplication per comment from Florian.
Signed-off-by: NManoj Basapathi <manojbm@codeaurora.org>
Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

68983a35

net: sched: RED: Introduce an ECN nodrop mode · 0a7fad23

由 Petr Machata 提交于 3月 13, 2020

When the RED Qdisc is currently configured to enable ECN, the RED algorithm
is used to decide whether a certain SKB should be marked. If that SKB is
not ECN-capable, it is early-dropped.

It is also possible to keep all traffic in the queue, and just mark the
ECN-capable subset of it, as appropriate under the RED algorithm. Some
switches support this mode, and some installations make use of it.

To that end, add a new RED flag, TC_RED_NODROP. When the Qdisc is
configured with this flag, non-ECT traffic is enqueued instead of being
early-dropped.
Signed-off-by: NPetr Machata <petrm@mellanox.com>
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a7fad23

net: sched: Allow extending set of supported RED flags · 14bc175d

由 Petr Machata 提交于 3月 13, 2020

The qdiscs RED, GRED, SFQ and CHOKE use different subsets of the same pool
of global RED flags. These are passed in tc_red_qopt.flags. However none of
these qdiscs validate the flag field, and just copy it over wholesale to
internal structures, and later dump it back. (An exception is GRED, which
does validate for VQs -- however not for the main setup.)

A broken userspace can therefore configure a qdisc with arbitrary
unsupported flags, and later expect to see the flags on qdisc dump. The
current ABI therefore allows storage of several bits of custom data to
qdisc instances of the types mentioned above. How many bits, depends on
which flags are meaningful for the qdisc in question. E.g. SFQ recognizes
flags ECN and HARDDROP, and the rest is not interpreted.

If SFQ ever needs to support ADAPTATIVE, it needs another way of doing it,
and at the same time it needs to retain the possibility to store 6 bits of
uninterpreted data. Likewise RED, which adds a new flag later in this
patchset.

To that end, this patch adds a new function, red_get_flags(), to split the
passed flags of RED-like qdiscs to flags and user bits, and
red_validate_flags() to validate the resulting configuration. It further
adds a new attribute, TCA_RED_FLAGS, to pass arbitrary flags.
Signed-off-by: NPetr Machata <petrm@mellanox.com>
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

14bc175d

13 3月, 2020 14 次提交

bpf: Add bpf_xdp_output() helper · d831ee84

由 Eelco Chaudron 提交于 3月 06, 2020

Introduce new helper that reuses existing xdp perf_event output
implementation, but can be called from raw_tracepoint programs
that receive 'struct xdp_buff *' as a tracepoint argument.
Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NToke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/158348514556.2239.11050972434793741444.stgit@xdp-tutorial

d831ee84

bpf: Added new helper bpf_get_ns_current_pid_tgid · b4490c5c

由 Carlos Neira 提交于 3月 04, 2020

New bpf helper bpf_get_ns_current_pid_tgid,
This helper will return pid and tgid from current task
which namespace matches dev_t and inode number provided,
this will allows us to instrument a process inside a container.
Signed-off-by: NCarlos Neira <cneirabustos@gmail.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200304204157.58695-3-cneirabustos@gmail.com

b4490c5c

ethtool: add CHANNELS_NTF notification · 546379b9

由 Michal Kubecek 提交于 3月 12, 2020

Send ETHTOOL_MSG_CHANNELS_NTF notification whenever channel counts of
a network device are modified using ETHTOOL_MSG_CHANNELS_SET netlink
message or ETHTOOL_SCHANNELS ioctl request.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

546379b9

ethtool: set device channel counts with CHANNELS_SET request · e19c591e

由 Michal Kubecek 提交于 3月 12, 2020

Implement CHANNELS_SET netlink request to set channel counts of a network
device. These are traditionally set with ETHTOOL_SCHANNELS ioctl request.

Like the ioctl implementation, the generic ethtool code checks if supplied
values do not exceed driver defined limits; if they do, first offending
attribute is reported using extack. Checks preventing removing channels
used for RX indirection table or zerocopy AF_XDP socket are also
implemented.

Move ethtool_get_max_rxfh_channel() helper into common.c so that it can be
used by both ioctl and netlink code.

v2:
  - fix netdev reference leak in error path (found by Jakub Kicinsky)
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e19c591e

ethtool: provide channel counts with CHANNELS_GET request · 0c84979c

由 Michal Kubecek 提交于 3月 12, 2020

Implement CHANNELS_GET request to get channel counts of a network device.
These are traditionally available via ETHTOOL_GCHANNELS ioctl request.

Omit attributes for channel types which are not supported by driver or
device (zero reported for maximum).

v2: (all suggested by Jakub Kicinski)
  - minor cleanup in channels_prepare_data()
  - more descriptive channels_reply_size()
  - omit attributes with zero max count
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c84979c

ethtool: add RINGS_NTF notification · bc9d1c99

由 Michal Kubecek 提交于 3月 12, 2020

Send ETHTOOL_MSG_RINGS_NTF notification whenever ring sizes of a network
device are modified using ETHTOOL_MSG_RINGS_SET netlink message or
ETHTOOL_SRINGPARAM ioctl request.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc9d1c99

ethtool: set device ring sizes with RINGS_SET request · 2fc2929e

由 Michal Kubecek 提交于 3月 12, 2020

Implement RINGS_SET netlink request to set ring sizes of a network device.
These are traditionally set with ETHTOOL_SRINGPARAM ioctl request.

Like the ioctl implementation, the generic ethtool code checks if supplied
values do not exceed driver defined limits; if they do, first offending
attribute is reported using extack.

v2:
  - fix netdev reference leak in error path (found by Jakub Kicinsky)
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2fc2929e

ethtool: provide ring sizes with RINGS_GET request · e4a1717b

由 Michal Kubecek 提交于 3月 12, 2020

Implement RINGS_GET request to get ring sizes of a network device. These
are traditionally available via ETHTOOL_GRINGPARAM ioctl request.

Omit attributes for ring types which are not supported by driver or device
(zero reported for maximum).

v2: (all suggested by Jakub Kicinski)
  - minor cleanup in rings_prepare_data()
  - more descriptive rings_reply_size()
  - omit attributes with zero max size
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e4a1717b

ethtool: add PRIVFLAGS_NTF notification · 111dcba3

由 Michal Kubecek 提交于 3月 12, 2020

Send ETHTOOL_MSG_PRIVFLAGS_NTF notification whenever private flags of
a network device are modified using ETHTOOL_MSG_PRIVFLAGS_SET netlink
message or ETHTOOL_SPFLAGS ioctl request.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

111dcba3

ethtool: set device private flags with PRIVFLAGS_SET request · f265d799

由 Michal Kubecek 提交于 3月 12, 2020

Implement PRIVFLAGS_SET netlink request to set private flags of a network
device. These are traditionally set with ETHTOOL_SPFLAGS ioctl request.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f265d799

ethtool: provide private flags with PRIVFLAGS_GET request · e16c3386

由 Michal Kubecek 提交于 3月 12, 2020

Implement PRIVFLAGS_GET request to get private flags for a network device.
These are traditionally available via ETHTOOL_GPFLAGS ioctl request.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e16c3386

ethtool: add FEATURES_NTF notification · 9c6451ef

由 Michal Kubecek 提交于 3月 12, 2020

Send ETHTOOL_MSG_FEATURES_NTF notification whenever network device features
are modified using ETHTOOL_MSG_FEATURES_SET netlink message, ethtool ioctl
request or any other way resulting in call to netdev_update_features() or
netdev_change_features()
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9c6451ef

ethtool: set netdev features with FEATURES_SET request · 0980bfcd

由 Michal Kubecek 提交于 3月 12, 2020

Implement FEATURES_SET netlink request to set network device features.
These are traditionally set using ETHTOOL_SFEATURES ioctl request.

Actual change is subject to netdev_change_features() sanity checks so that
it can differ from what was requested. Unlike with most other SET requests,
in addition to error code and optional extack, kernel provides an optional
reply message (ETHTOOL_MSG_FEATURES_SET_REPLY) in the same format but with
different semantics: information about difference between user request and
actual result and difference between old and new state of dev->features.
This reply message can be suppressed by setting ETHTOOL_FLAG_OMIT_REPLY
flag in request header.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0980bfcd

ethtool: provide netdev features with FEATURES_GET request · 0524399d

由 Michal Kubecek 提交于 3月 12, 2020

Implement FEATURES_GET request to get network device features. These are
traditionally available via ETHTOOL_GFEATURES ioctl request.

v2:
  - style cleanup suggested by Jakub Kicinski
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Reviewed-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0524399d

12 3月, 2020 1 次提交

seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number · 26776253

由 Paolo Lungaroni 提交于 3月 11, 2020

The Internet Assigned Numbers Authority (IANA) has recently assigned
a protocol number value of 143 for Ethernet [1].

Before this assignment, encapsulation mechanisms such as Segment Routing
used the IPv6-NoNxt protocol number (59) to indicate that the encapsulated
payload is an Ethernet frame.

In this patch, we add the definition of the Ethernet protocol number to the
kernel headers and update the SRv6 L2 tunnels to use it.

[1] https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtmlSigned-off-by: NPaolo Lungaroni <paolo.lungaroni@cnit.it>
Reviewed-by: NAndrea Mayer <andrea.mayer@uniroma2.it>
Acked-by: NAhmed Abdelsalam <ahmed.abdelsalam@gssi.it>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26776253

10 3月, 2020 1 次提交

tcp: add bytes not sent to SCM_TIMESTAMPING_OPT_STATS · e08ab0b3

由 Yousuk Seung 提交于 3月 09, 2020

Add TCP_NLA_BYTES_NOTSENT to SCM_TIMESTAMPING_OPT_STATS that reports
bytes in the write queue but not sent. This is the same metric as
what is exported with tcp_info.tcpi_notsent_bytes.
Signed-off-by: NYousuk Seung <ysseung@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e08ab0b3

09 3月, 2020 1 次提交

sched: act: allow user to specify type of HW stats for a filter · 44f86580

由 Jiri Pirko 提交于 3月 07, 2020

Currently, user who is adding an action expects HW to report stats,
however it does not have exact expectations about the stats types.
That is aligned with TCA_ACT_HW_STATS_TYPE_ANY.

Allow user to specify the type of HW stats for an action and require it.

Pass the information down to flow_offload layer.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44f86580

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功