Commits · e68b6e50fa359cc5aad4d2f8ac2bdbc1a8f4fd59 · openeuler / Kernel

18 Nov, 2016 1 commit

net_sched: sch_fq: use hash_ptr() · 29c58472

Authored by Eric Dumazet on Nov 17, 2016

When I wrote sch_fq.c, hash_ptr() on 64bit arches was awful,
and I chose hash_32().

Linus Torvalds and George Spelvin fixed this issue, so we can
use hash_ptr() to get more entropy on 64bit arches with Terabytes
of memory, and avoid the cast games.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
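
To make the entropy argument concrete, here is a rough userspace model in Python (not the kernel code). It assumes the multiplicative hashes and golden-ratio constants from include/linux/hash.h; the two socket addresses and the 10-bit table size are invented for illustration.

```python
# Userspace model of the kernel's multiplicative hashes. The constants are
# assumed to match include/linux/hash.h; the addresses below are made up.
GOLDEN_RATIO_32 = 0x61C88647
GOLDEN_RATIO_64 = 0x61C8864680B583EB


def hash_32(val: int, bits: int) -> int:
    """Hash a 32-bit value down to `bits` bits."""
    return ((val * GOLDEN_RATIO_32) & 0xFFFFFFFF) >> (32 - bits)


def hash_64(val: int, bits: int) -> int:
    """Hash a 64-bit value down to `bits` bits."""
    return ((val * GOLDEN_RATIO_64) & 0xFFFFFFFFFFFFFFFF) >> (64 - bits)


# Two hypothetical socket addresses that differ only above bit 31.
sk_a = 0xFFFF880012345678
sk_b = 0xFFFF880112345678

# Old scheme: cast the pointer to 32 bits first, so the upper half is ignored.
old_a = hash_32(sk_a & 0xFFFFFFFF, 10)
old_b = hash_32(sk_b & 0xFFFFFFFF, 10)

# New scheme on 64-bit arches: hash the whole pointer value.
new_a = hash_64(sk_a, 10)
new_b = hash_64(sk_b, 10)

print(old_a == old_b)  # the truncated inputs are identical, so they collide
print(new_a != new_b)  # the full-width hash can tell the two apart
```

On machines where many sockets share their low 32 address bits, the second scheme spreads them over more buckets, which is the extra entropy the message refers to.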

10 Nov, 2016 5 commits

net/sched: act_tunnel_key: Add UDP dst port option · 75bfbca0

Authored by Hadar Hen Zion on Nov 07, 2016

The current tunnel set action supports only IP addresses and key
options. Add UDP dst port option.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/dst: Add dst port to dst_metadata utility functions · 24ba898d

Authored by Hadar Hen Zion on Nov 07, 2016

Add dst port parameter to __ip_tun_set_dst and __ipv6_tun_set_dst
utility functions.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_flower: Add UDP port to tunnel parameters · f4d997fd

Authored by Hadar Hen Zion on Nov 07, 2016

The current IP tunneling classification supports only IP addresses and key.
Enhance UDP based IP tunneling classification parameters by adding UDP
src and dst port.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_flower: Allow setting encapsulation fields as used key · 519d1052

Authored by Hadar Hen Zion on Nov 07, 2016

When encapsulation field is set, mark it as used key for the flow
dissector. This will be used by offloading drivers.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_tunnel_key: add helper inlines to access tcf_tunnel_key · 9ce183b4

Authored by Hadar Hen Zion on Nov 07, 2016

Needed for drivers to pick the relevant action when offloading tunnel
key act.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Nov, 2016 1 commit

qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device · 84c46dd8

Authored by Jesper Dangaard Brouer on Nov 03, 2016

It is a clear misconfiguration to attach a qdisc to a device with
tx_queue_len zero, because some qdiscs (namely, pfifo, bfifo, gred,
htb, plug and sfb) inherit/copy this value as their queue length.

Why should the kernel catch such a misconfiguration?  Because prior to
introducing the IFF_NO_QUEUE device flag, userspace found a loophole
in the qdisc config system that allowed them to achieve the equivalent
of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
a device.  The loophole on older kernels is setting tx_queue_len=0,
*prior* to device qdisc init (the config time is significant, simply
setting tx_queue_len=0 doesn't trigger the loophole).

This loophole is currently used by Docker[1] to get better performance
and scalability out of the veth device.  The Docker developers were
warned[1] that they needed to adjust the tx_queue_len if ever
attaching a qdisc.  The OpenShift project didn't remember this warning
and attached a qdisc; this was caught and fixed in [2].

[1] https://github.com/docker/libcontainer/pull/193
[2] https://github.com/openshift/origin/pull/11126

Instead of fixing every userspace program that used this loophole and
forgot to reset the tx_queue_len prior to attaching a qdisc, let's
catch the misconfiguration on the kernel side.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Nov, 2016 1 commit

net/sched: cls_flower: Support matching on SCTP ports · 5976c5f4

Authored by Simon Horman on Nov 03, 2016

Support matching on SCTP ports in the same way that matching
on TCP and UDP ports is already supported.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol ip parent ffff: \
        flower indev eth0 ip_proto sctp dst_port 80 \
        action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Nov, 2016 3 commits

netfilter: x_tables: move hook state into xt_action_param structure · 613dbd95

Authored by Pablo Neira Ayuso on Nov 03, 2016

Place pointer to hook state in xt_action_param structure instead of
copying the fields that we need. After this change xt_action_param fits
into one cacheline.

This patch also adds a set of new wrapper functions to fetch relevant
hook state structure fields.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net/sched: cls_flower: merge filter delete/destroy common code · 13fa876e

Authored by Roi Dayan on Nov 01, 2016

Move common code from fl_delete and fl_destroy to __fl_delete.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_flower: add missing unbind call when destroying flows · a1a8f7fe

Authored by Roi Dayan on Nov 01, 2016

tcf_unbind was called in fl_delete but was missing in fl_destroy when
force deleting flows.

Fixes: 77b9900e ('tc: introduce Flower classifier')
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Oct, 2016 2 commits

net_sched actions: use nla_parse_nested() · 4700e9ce

Authored by Johannes Berg on Oct 26, 2016

Use nla_parse_nested instead of open-coding the call to
nla_parse() with the attribute data/len.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Add nla_memdup() to wrap kmemdup() use on nlattr · b15ca182

Authored by Thomas Graf on Oct 26, 2016

Wrap several common instances of:
	kmemdup(nla_data(attr), nla_len(attr), GFP_KERNEL);
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Oct, 2016 2 commits

net sched filters: fix notification of filter delete with proper handle · 9ee78374

Authored by Jamal Hadi Salim on Oct 24, 2016

Daniel says:

While trying out [1][2], I noticed that tc monitor doesn't show the
correct handle on delete:

$ tc monitor
qdisc clsact ffff: dev eno1 parent ffff:fff1
filter dev eno1 ingress protocol all pref 49152 bpf handle 0x2a [...]
deleted filter dev eno1 ingress protocol all pref 49152 bpf handle 0xf3be0c80

Some context to explain the above:
The user identity of any tc filter is represented by a 32-bit
identifier encoded in tcm->tcm_handle. Example 0x2a in the bpf filter
above. A user wishing to delete, get or even modify a specific filter
uses this handle to reference it.
Every classifier is free to provide its own semantics for the 32 bit handle.
Example: classifiers like u32 use schemes like 800:1:801 to describe
the semantics of their filters represented as hash table, bucket and
node ids etc.
Classifiers also have internal per-filter representation which is different
from this externally visible identity. Most classifiers set this
internal representation to be a pointer address (which allows fast retrieval
of said filters in their implementations). This internal representation
is referenced with the "fh" variable in the kernel control code.

When a user successfully deletes a specific filter, by specifying the correct
tcm->tcm_handle, an event is generated to user space which indicates
which specific filter was deleted.

Before this patch, the "fh" value was sent to user space as the identity.
As an example, what is shown in the sample bpf filter delete event above
is 0xf3be0c80. This is in fact a 32-bit truncation of 0xffff8807f3be0c80,
which happens to be a 64-bit memory address of the internal filter
representation (address of the corresponding filter's struct cls_bpf_prog).

After this patch the appropriate user identifiable handle as encoded
in the originating request tcm->tcm_handle is generated in the event.
One of the cardinal rules of netlink is to be able to take an
event (such as a delete in this case) and reflect it back to the
kernel and successfully delete the filter. This patch achieves that.

Note, this issue has existed since the original TC action
infrastructure code patch back in 2004 as found in:
https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/

[1] http://patchwork.ozlabs.org/patch/682828/
[2] http://patchwork.ozlabs.org/patch/682829/

Fixes: 4e54c4816bfe ("[NET]: Add tc extensions infrastructure.")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
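
The truncation described in this message can be reproduced with plain integer arithmetic. The sketch below is only an illustration in Python: the two constants are the ones quoted in the message, while the variable names are made up.

```python
# Values quoted in the commit message; the variable names are illustrative.
fh = 0xFFFF8807F3BE0C80   # kernel address of the filter's internal state
tcm_handle = 0x2A         # 32-bit handle the user gave the filter

# The handle field is 32 bits wide, so a pointer stored there keeps only
# its low half.
reported_before = fh & 0xFFFFFFFF
reported_after = tcm_handle

print(hex(reported_before))  # 0xf3be0c80, the bogus handle from "tc monitor"
print(hex(reported_after))   # 0x2a, which can be sent back to the kernel
```

Only the second value identifies the filter to user space, which is why the delete event has to carry it.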

skbedit: allow the user to specify bitmask for mark · 4fe77d82

Authored by Antonio Quartulli on Oct 24, 2016

The user may want to use only some bits of the skb mark in
his skbedit rules because the remaining part might be used by
something else.

Introduce the "mask" parameter to the skbedit actor in order
to implement such functionality.

When the mask is specified, only those bits selected by the
latter are really changed by the actor, while the
rest is left untouched.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
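
A minimal sketch of the masked update described above, written in Python for illustration. The helper name apply_mark and the sample values are hypothetical, and the actual kernel code may be structured differently.

```python
def apply_mark(old_mark: int, new_mark: int, mask: int = 0xFFFFFFFF) -> int:
    """Replace only the bits of old_mark that are selected by mask."""
    kept = old_mark & ~mask & 0xFFFFFFFF   # bits owned by someone else
    return kept | (new_mark & mask)


# The low byte belongs to the skbedit rule, the rest to another subsystem.
masked = apply_mark(0xABCD1234, 0x000000FF, mask=0x000000FF)
unmasked = apply_mark(0xABCD1234, 0x000000FF)

print(hex(masked))    # 0xabcd12ff
print(hex(unmasked))  # 0xff
```

With the default all-ones mask the behaviour is the same as before the patch: the whole mark is overwritten.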

27 Oct, 2016 1 commit

sch_htb: do not report fake rate estimators · 73e42ff7

Authored by Eric Dumazet on Oct 21, 2016

When I prepared commit d250a5f9 ("pkt_sched: gen_estimator: Dont
report fake rate estimators"), htb still had an implicit rate estimator
for all its classes.

Then later, I made this rate estimator optional in commit 64153ce0
("net_sched: htb: do not setup default rate estimators"), but I forgot
to update htb use of gnet_stats_copy_rate_est()

After this patch, "tc -s qdisc ..." no longer reports fake rate
estimators for HTB classes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Oct, 2016 1 commit

net/sched: em_meta: Fix 'meta vlan' to correctly recognize zero VID frames · d65f2fa6

Authored by Shmulik Ladkani on Oct 21, 2016

META_COLLECTOR int_vlan_tag() assumes that if the accel tag (vlan_tci)
is zero, then no vlan accel tag is present.

This is incorrect for zero VID vlan accel packets, making the following
match fail:
  tc filter add ... basic match 'meta(vlan mask 0xfff eq 0)' ...

Apparently 'int_vlan_tag' was implemented before VLAN_TAG_PRESENT was
introduced in 05423b24 "vlan: allow null VLAN ID to be used"
(and at the time it was introduced, the 'vlan_tx_tag_get' call in em_meta
 was not adapted).

Fix this by testing skb_vlan_tag_present instead of testing
skb_vlan_tag_get's value.

Fixes: 05423b24 ("vlan: allow null VLAN ID to be used")
Fixes: 1a31f204 ("netsched: Allow meta match on vlan tag on receive")
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
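
The difference between the two tests can be modelled in a few lines of Python. This is a sketch under an assumption: that in kernels of that era bit 0x1000 of skb->vlan_tci acts as the VLAN_TAG_PRESENT flag. The helper names mirror the kernel macros but are plain Python functions here.

```python
# Assumed layout of skb->vlan_tci at the time: the CFI bit doubles as a
# "tag present" flag. The helpers below only mimic the kernel macros.
VLAN_TAG_PRESENT = 0x1000


def vlan_tag_present(vlan_tci: int) -> bool:
    return bool(vlan_tci & VLAN_TAG_PRESENT)


def vlan_tag_get(vlan_tci: int) -> int:
    return vlan_tci & ~VLAN_TAG_PRESENT & 0xFFFF


def has_tag_old(vlan_tci: int) -> bool:
    """Old check: a zero tag value is taken to mean 'no tag'."""
    return vlan_tag_get(vlan_tci) != 0


def has_tag_new(vlan_tci: int) -> bool:
    """Fixed check: rely on the presence flag only."""
    return vlan_tag_present(vlan_tci)


zero_vid_frame = VLAN_TAG_PRESENT | 0   # tagged, priority 0, VID 0
untagged_frame = 0

print(has_tag_old(zero_vid_frame))  # False: the zero VID frame is missed
print(has_tag_new(zero_vid_frame))  # True
print(has_tag_new(untagged_frame))  # False
```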

21 Oct, 2016 1 commit

net: use core MTU range checking in core net infra · 91572088

Authored by Jarod Wilson on Oct 20, 2016

geneve:
- Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
- This one isn't quite as straight-forward as others, could use some
  closer inspection and testing

macvlan:
- set min/max_mtu

tun:
- set min/max_mtu, remove tun_net_change_mtu

vxlan:
- Merge __vxlan_change_mtu back into vxlan_change_mtu
- Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
  change_mtu function
- This one is also not as straight-forward and could use closer inspection
  and testing from vxlan folks

bridge:
- set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
  change_mtu function

openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which
  is the largest possible size supported

sch_teql:
- set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)

macsec:
- min_mtu = 0, max_mtu = 65535

macvlan:
- min_mtu = 0, max_mtu = 65535

ntb_netdev:
- min_mtu = 0, max_mtu = 65535

veth:
- min_mtu = 68, max_mtu = 65535

8021q:
- min_mtu = 0, max_mtu = 65535

CC: netdev@vger.kernel.org
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Tom Herbert <tom@herbertland.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Paolo Abeni <pabeni@redhat.com>
CC: Jiri Benc <jbenc@redhat.com>
CC: WANG Cong <xiyou.wangcong@gmail.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Pravin B Shelar <pshelar@ovn.org>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Pravin Shelar <pshelar@nicira.com>
CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Oct, 2016 1 commit

net/sched: act_mirred: Use passed lastuse argument · 5712bf9c

Authored by Paul Blakey on Oct 19, 2016

The stats_update callback is called by NIC drivers doing hardware
offloading of the mirred action. Lastuse is passed as an argument
to specify when the stats were actually last updated and is not
always the current time.

Fixes: 9798e6fe ('net: act_mirred: allow statistic updates from offloaded actions')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Oct, 2016 3 commits

net/sched: act_mirred: Implement ingress actions · 53592b36

Authored by Shmulik Ladkani on Oct 13, 2016

Up until now, 'action mirred' supported only egress actions (either
TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).

This patch implements the corresponding ingress actions
TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.

This allows attaching filters whose target is to hand matching skbs into
the rx processing of a specified device.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header · dcf80034

Authored by Shmulik Ladkani on Oct 13, 2016

Move detection logic that tests whether device expects skb data to point
at mac_header upon xmit into a function.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit and make it a bool · 16577923

Authored by Shmulik Ladkani on Oct 13, 2016

'tcfm_ok_push' specifies whether a mac_len sized push is needed upon
egress to the target device (if action is performed at ingress).

Rename it to 'tcfm_mac_header_xmit' as this is actually an attribute of
the target device (and use a bool instead of int).

This allows decoupling the attribute from the action to be taken.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Oct, 2016 2 commits

net_sched: reorder pernet ops and act ops registrations · ab102b80

Authored by WANG Cong on Oct 11, 2016

Krister reported a kernel NULL pointer dereference after
tcf_action_init_1() invokes a_o->init(). It is a race condition
where one thread is calling tcf_register_action() to initialize
the netns data after putting act ops in the global list, while
another thread is searching the list and then calling
a_o->init(net, ...).

Fix this by moving the pernet ops registration before making
the action ops visible. This is fine because: a) we don't
rely on act_base in pernet ops->init(), b) in the worst case we
have a fully initialized netns but ops is still not ready so
new actions still can't be created.
Reported-by: Krister Johansen <kjlx@templeofstupid.com>
Tested-by: Krister Johansen <kjlx@templeofstupid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: do not broadcast RTM_GETTFILTER result · fa59b27c

Authored by Eric Dumazet on Oct 09, 2016

There are two ways to get tc filters from kernel to user space.

1) Full dump (tc_dump_tfilter())
2) RTM_GETTFILTER to get one precise filter, reducing overhead.

The second operation is unfortunately broadcasting its result,
polluting "tc monitor" users.

This patch makes sure only the requester gets the result, using
netlink_unicast() instead of rtnetlink_send()

Jamal cooked an iproute2 patch to implement "tc filter get" operation,
but other user space libraries already use RTM_GETTFILTER when a single
filter is queried, instead of dumping all filters.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Oct, 2016 1 commit

net/sched: act_vlan: Push skb->data to mac_header prior calling skb_vlan_*() functions · f39acc84

Authored by Shmulik Ladkani on Sep 29, 2016

Generic skb_vlan_push/skb_vlan_pop functions don't properly handle the
case where the input skb data pointer does not point at the mac header:

- They're doing push/pop, but fail to properly unwind data back to its
  original location.
  For example, in the skb_vlan_push case, any subsequent
  'skb_push(skb, skb->mac_len)' calls make the skb->data point 4 bytes
  BEFORE start of frame, leading to bogus frames that may be transmitted.

- They update rcsum per the added/removed 4 bytes tag.
  Alas if data is originally after the vlan/eth headers, then these
  bytes were already pulled out of the csum.

OTOH calling skb_vlan_push/skb_vlan_pop with skb->data at mac_header
presents no issues.

act_vlan is the only caller to skb_vlan_*() that has skb->data pointing
at network header (upon ingress).
Other callers (ovs, bpf) already adjust skb->data at mac_header.

This patch fixes act_vlan to point to the mac_header prior to calling
skb_vlan_*() functions, as other callers do.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Pravin Shelar <pshelar@ovn.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Sep, 2016 1 commit

net/sched: cls_flower: Use a proper mask value for enc key id parameter · eb523f42

Authored by Hadar Hen Zion on Sep 27, 2016

The current code uses the encapsulation key id value as the mask of that
parameter, which is wrong. Fix that by using a full mask.

Fixes: bc3103f1 ('net/sched: cls_flower: Classify packet in ip tunnels')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
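
Why using the key id as its own mask is wrong can be seen with a generic masked comparison. The Python sketch below is illustrative only; masked_match and the sample key ids are hypothetical and do not reflect how cls_flower stores its keys.

```python
FULL_MASK = 0xFFFFFFFF


def masked_match(packet_key_id: int, filter_key_id: int, mask: int) -> bool:
    """Generic masked comparison as used by mask-based classifiers."""
    return (packet_key_id & mask) == (filter_key_id & mask)


filter_key_id = 0x10   # tunnel the filter is meant to match
other_tunnel = 0x30    # a different tunnel that happens to share bit 0x10

buggy = masked_match(other_tunnel, filter_key_id, mask=filter_key_id)
fixed = masked_match(other_tunnel, filter_key_id, mask=FULL_MASK)

print(buggy)  # True: the wrong tunnel matches
print(fixed)  # False
```

With the value used as the mask, every key id that has the same bits set is accepted; a full mask compares all 32 bits.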

27 Sep, 2016 2 commits

act_ife: Fix false encoding · c006da0b

Authored by Yotam Gigi on Sep 26, 2016

On ife encode side, the action stores the different tlvs inside the ife
header, where each tlv length field should refer to the length of the
whole tlv (without additional padding) and not just the data length.

On ife decode side, the action iterates over the tlvs in the ife header
and parses them one by one, where in each iteration the current pointer is
advanced according to the tlv size.

Before, the encoding encoded only the data length inside the tlv, which led
to false parsing of the ife header. In addition, due to the fact that the
loop counter was unsigned, it could lead to an infinite parsing loop.

This fix changes the loop counter to be signed and fixes the encoding to
take into account the tlv type and size.

Fixes: 28a10c42 ("net sched: fix encoding to use real length")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
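
The length convention described above can be sketched as a small encoder/decoder pair. This is a Python model under stated assumptions, not the act_ife code: the header is assumed to be a 16-bit type followed by a 16-bit length in network byte order, records are assumed to be padded to 4 bytes, and the type numbers 1 and 2 are invented.

```python
import struct

TLV_HDR = struct.Struct("!HH")   # assumed header: 16-bit type, 16-bit length


def align4(n: int) -> int:
    return (n + 3) & ~3


def encode_tlv(tlv_type: int, data: bytes) -> bytes:
    # The length field covers header plus data, but not the trailing padding.
    total = TLV_HDR.size + len(data)
    padding = b"\x00" * (align4(total) - total)
    return TLV_HDR.pack(tlv_type, total) + data + padding


def decode_tlvs(buf: bytes):
    records = []
    pos = 0
    remaining = len(buf)   # a signed count: it may go negative, never wraps
    while remaining >= TLV_HDR.size:
        tlv_type, total = TLV_HDR.unpack_from(buf, pos)
        if total < TLV_HDR.size or total > remaining:
            raise ValueError("malformed TLV")
        records.append((tlv_type, buf[pos + TLV_HDR.size:pos + total]))
        step = align4(total)   # advance by the whole padded record
        pos += step
        remaining -= step
    return records


payload = encode_tlv(1, b"\x00\x00\x00\x2a") + encode_tlv(2, b"\x07\x08")
print(decode_tlvs(payload))
```

If the encoder wrote only len(data) into the length field, the decoder would step by too few bytes and read the middle of a record as the next header, which is the false parsing the message describes.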

act_ife: Fix external mac header on encode · 4b1d488a

Authored by Yotam Gigi on Sep 26, 2016

On ife encode side, external mac header is copied from the original packet
and may be overridden if the user requests. Before, the mac header copy
was done from memory region that might not be accessible anymore, as
skb_cow_head might free it and copy the packet. This led to random values
in the external mac header when the values were not set by the user.

This fix takes the internal mac header from the packet, after the call to
skb_cow_head.

Fixes: ef6980b6 ("net sched: introduce IFE action")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Sep, 2016 4 commits

net_sched: sch_fq: account for schedule/timers drifts · fefa569a

Authored by Eric Dumazet on Sep 22, 2016

It looks like the following patch can make FQ very precise, even in VM
or stressed hosts. It matters at high pacing rates.

We take into account the difference between the time that was programmed
when last packet was sent, and current time (a drift of tens of usecs is
often observed)

Add an EWMA of the unthrottle latency to help diagnostics.

This latency is the difference between current time and oldest packet in
delayed RB-tree. This accounts for the high resolution timer latency,
but can be different under stress, as fq_check_throttled() can be
opportunistically called from a dequeue() called after an enqueue()
for a different flow.

Tested:
// Start a 10Gbit flow
$ netperf --google-pacing-rate 1250000000 -H lpaa24 -l 10000 -- -K bbr &

Before patch :
$ sar -n DEV 10 5 | grep eth0 | grep Average
Average:         eth0  17106.04 756876.84   1102.75 1119049.02      0.00      0.00      0.52

After patch :
$ sar -n DEV 10 5 | grep eth0 | grep Average
Average:         eth0  17867.00 800245.90   1151.77 1183172.12      0.00      0.00      0.52

A new iproute2 tc can output the 'unthrottle latency' :

$ tc -s qd sh dev eth0 | grep latency
  0 gc, 0 highprio, 32490767 throttled, 2382 ns latency
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
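
For readers unfamiliar with the term, an EWMA of the unthrottle latency can be kept with two shifts per sample. The Python sketch below assumes a weight of 1/8 (shift of 3); that weight and the sample values are assumptions for illustration, not taken from the message.

```python
def ewma_update(avg_ns: int, sample_ns: int, shift: int = 3) -> int:
    """Fold one latency sample into the average with weight 1 / 2**shift."""
    avg_ns -= avg_ns >> shift
    avg_ns += sample_ns >> shift
    return avg_ns


# Feed a constant 8000 ns latency; the average creeps up to the sample value.
latency_ns = 0
first = ewma_update(latency_ns, 8000)
for _ in range(200):
    latency_ns = ewma_update(latency_ns, 8000)

print(first)       # 1000: one sample contributes an eighth of its value
print(latency_ns)  # 8000 after enough identical samples
```

The appeal of this form is that it needs no history buffer and no division, so it is cheap enough to update on every unthrottle event.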

sch_sfb: keep backlog updated with qlen · 3d4357fb

Authored by WANG Cong on Sep 18, 2016

Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_qfq: keep backlog updated with qlen · 2ed5c3f0

Authored by WANG Cong on Sep 18, 2016

Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: check NULL on error path in route4_change() · 21641c2e

Authored by WANG Cong on Sep 18, 2016

On error path in route4_change(), 'f' could be NULL,
so we should check NULL before calling tcf_exts_destroy().

Fixes: b9a24bb7 ("net_sched: properly handle failure case of tcf_exts_init()")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Sep, 2016 6 commits

net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action · 45a497f2

Authored by Shmulik Ladkani on Sep 19, 2016

TCA_VLAN_ACT_MODIFY allows one to change an existing tag.

It accepts same attributes as TCA_VLAN_ACT_PUSH (protocol, id,
priority).
If packet is vlan tagged, then the tag gets overwritten according to
user specified attributes.

For example, this allows user to replace a tag's vid while preserving
its priority bits (as opposed to "action vlan pop pipe action vlan push").
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
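
The "replace the vid, keep the priority" case can be illustrated with the standard 802.1Q TCI layout (3 priority bits, 1 CFI bit, 12 VID bits). The Python sketch below is a model only; modify_tci is a hypothetical helper and not the act_vlan implementation.

```python
# 802.1Q tag control information layout.
VLAN_VID_MASK = 0x0FFF
VLAN_PRIO_MASK = 0xE000
VLAN_PRIO_SHIFT = 13


def modify_tci(tci: int, vid: int, prio=None) -> int:
    """Overwrite the VID, and the priority only if one is given."""
    tci = (tci & ~VLAN_VID_MASK & 0xFFFF) | (vid & VLAN_VID_MASK)
    if prio is not None:
        prio_bits = (prio << VLAN_PRIO_SHIFT) & VLAN_PRIO_MASK
        tci = (tci & ~VLAN_PRIO_MASK & 0xFFFF) | prio_bits
    return tci


old_tci = (5 << VLAN_PRIO_SHIFT) | 100      # priority 5, VID 100
new_tci = modify_tci(old_tci, vid=200)      # only the VID is specified

print(new_tci & VLAN_VID_MASK)      # 200
print(new_tci >> VLAN_PRIO_SHIFT)   # 5, the priority bits survive
```

A pop followed by a push would instead build a fresh tag, so the original priority bits would be lost unless the user repeated them.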

net: act_mirred: allow statistic updates from offloaded actions · 9798e6fe

Authored by Jakub Kicinski on Sep 21, 2016

Implement .stats_update() callback.  The implementation
is generic and can be reused by other simple actions if
needed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cls_bpf: allow offloaded filters to update stats · 68d64063

Authored by Jakub Kicinski on Sep 21, 2016

Call into offloaded filters to update stats.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cls_bpf: add support for marking filters as hardware-only · eadb4148

Authored by Jakub Kicinski on Sep 21, 2016

Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_SW flag.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cls_bpf: limit hardware offload by software-only flag · 0d01d45f

Authored by Jakub Kicinski on Sep 21, 2016

Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_HW flag.
Unlike U32 and flower, cls_bpf already has some netlink
flags defined.  Create a new attribute to be able to use
the same flag values as the above.

Unlike U32 and flower, reject unknown flags.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
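
Rejecting unknown flags amounts to checking that no bit outside the known set is present. A minimal Python sketch follows; the two flag values are the ones from the tc uapi header as far as can be assumed here, and treating both of them as "known" is an assumption covering the whole series rather than this single commit.

```python
TCA_CLS_FLAGS_SKIP_HW = 1 << 0   # do not offload the filter to hardware
TCA_CLS_FLAGS_SKIP_SW = 1 << 1   # do not run the filter in software

# Assumption: both flags of this series count as known.
KNOWN_FLAGS = TCA_CLS_FLAGS_SKIP_HW | TCA_CLS_FLAGS_SKIP_SW


def flags_valid(flags: int) -> bool:
    """Accept a flag word only if every set bit is a known flag."""
    return (flags & ~KNOWN_FLAGS) == 0


print(flags_valid(TCA_CLS_FLAGS_SKIP_HW))   # True
print(flags_valid(1 << 5))                  # False: unknown bit is rejected
```

Rejecting unknown bits up front lets later kernels assign them a meaning without silently changing how old requests behave.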

net: cls_bpf: add hardware offload · 332ae8e2

Authored by Jakub Kicinski on Sep 21, 2016

This patch adds hardware offload capability to the cls_bpf classifier,
similar to what has been done with U32 and flower.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Sep, 2016 2 commits

net_sched: sch_fq: add low_rate_threshold parameter · 77879147

Authored by Eric Dumazet on Sep 19, 2016

This commit adds to the fq module a low_rate_threshold parameter to
insert a delay after all packets if the socket requests a pacing rate
below the threshold.

This helps achieve more precise control of the sending rate with
low-rate paths, especially policers. The basic issue is that if a
congestion control module detects a policer at a certain rate, it may
want fq to be able to shape to that policed rate. That way the sender
can avoid policer drops by having the packets arrive at the policer at
or just under the policed rate.

The default threshold of 550Kbps was chosen analytically so that for
policers or links at 500Kbps or 512Kbps fq would very likely invoke
this mechanism, even if the pacing rate was briefly slightly above the
available bandwidth. This value was then empirically validated with
two years of production testing on YouTube video servers.
Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
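
The arithmetic behind the threshold is easy to check. The Python sketch below is a back-of-the-envelope model only: the helper names are hypothetical, rates are expressed in bytes per second, and the 1500-byte packet size is an example value.

```python
NSEC_PER_SEC = 1_000_000_000
LOW_RATE_THRESHOLD = 550_000 // 8   # 550 Kbit/s in bytes per second


def is_low_rate(rate_bytes_per_sec: int) -> bool:
    """True when the requested pacing rate is below the threshold."""
    return rate_bytes_per_sec < LOW_RATE_THRESHOLD


def per_packet_delay_ns(pkt_len: int, rate_bytes_per_sec: int) -> int:
    """Gap to leave after one packet so the flow averages the given rate."""
    return pkt_len * NSEC_PER_SEC // rate_bytes_per_sec


policed_500k = 500_000 // 8   # 62500 bytes/s
policed_512k = 512_000 // 8   # 64000 bytes/s
fast_flow = 1_000_000 // 8    # 1 Mbit/s

print(is_low_rate(policed_500k), is_low_rate(policed_512k))  # True True
print(is_low_rate(fast_flow))                                # False
print(per_packet_delay_ns(1500, policed_500k))               # 24000000 (24 ms)
```

Both policer rates named in the message fall under the 550 Kbit/s default with some headroom, which matches the stated reason for choosing that value.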

net sched actions: fix GETing actions · aecc5cef

Authored by Jamal Hadi Salim on Sep 19, 2016

With the batch changes that translated transient actions into
a temporary list, lost in the translation was the fact that
tcf_action_destroy() will eventually delete the action from
the permanent location if the refcount is zero.

Example of what broke:
...add a gact action to drop
sudo $TC actions add action drop index 10
...now retrieve it, looks good
sudo $TC actions get action gact index 10
...retrieve it again and find it is gone!
sudo $TC actions get action gact index 10

Fixes: 22dc13c8 ("net_sched: convert tcf_exts from list to pointer array")
Fixes: 824a7e88 ("net_sched: remove an unnecessary list_del()")
Fixes: f07fed82 ("net_sched: remove the leftover cleanup_a()")
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
