提交 · 3c546728df983c3c1c232241249c238c143bcb9e · openanolis / cloud-kernel

12 7月, 2018 10 次提交

wimax/i2400m: remove redundant variables ack_status, bcf and protocol · 3c546728

由 Colin Ian King 提交于 7月 09, 2018

Variables ack_status, bcf and protocol are being assigned but are
never used hence they are redundant and can be removed.

Also declare ack_type as unsigned int rather than unsigned to clean
up a checkpatch warning.

Cleans up clang warnings:
warning: variable 'ack_status' set but not used [-Wunused-but-set-variable]
warning: variable 'bcf' set but not used [-Wunused-but-set-variable]
warning: variable 'protocol' set but not used [-Wunused-but-set-variable]
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c546728

net: sched: act_ife: fix memory leak in ife init · 01e866bf

由 Vlad Buslov 提交于 7月 09, 2018

Free params if tcf_idr_check_alloc() returned error.

Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

01e866bf

cxgb4: specify IQTYPE in fw_iq_cmd · 8dce04f1

由 Arjun Vynipadath 提交于 7月 09, 2018

congestion argument passed to t4_sge_alloc_rxq() is used
to differentiate between nic/ofld queues.
Signed-off-by: NArjun Vynipadath <arjun@chelsio.com>
Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8dce04f1

Merge branch 'net-ipv6-addr_gen_mode-fixes' · 57cd07fb

由 David S. Miller 提交于 7月 11, 2018

Sabrina Dubroca says:

====================
net/ipv6: addr_gen_mode fixes

This series fixes bugs in handling of the addr_gen_mode option, mainly
related to the sysctl. A minor netlink issue was also present in the
initial commit introducing the option on a per-netdevice basis.

v2: add patch 4, requested by David Ahern during review of v1
    add patch 5, missing documentation for the sysctl
    patches 1, 2, 3 are unchanged
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

57cd07fb

Documentation: ip-sysctl.txt: document addr_gen_mode · f168db5e

由 Sabrina Dubroca 提交于 7月 09, 2018

addr_gen_mode was introduced in without documentation, add it now.

Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f168db5e

net/ipv6: propagate net.ipv6.conf.all.addr_gen_mode to devices · f24c5987

由 Sabrina Dubroca 提交于 7月 09, 2018

This aligns the addr_gen_mode sysctl with the expected behavior of the
"all" variant.

Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
Suggested-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f24c5987

net/ipv6: reserve room for IFLA_INET6_ADDR_GEN_MODE · bdd72f41

由 Sabrina Dubroca 提交于 7月 09, 2018

inet6_ifla6_size() is called to check how much space is needed by
inet6_fill_link_af() and inet6_fill_ifinfo(), both of which include
the IFLA_INET6_ADDR_GEN_MODE attribute. Reserve some room for it.

Fixes: bc91b0f0 ("ipv6: addrconf: implement address generation modes")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bdd72f41

net/ipv6: don't reinitialize ndev->cnf.addr_gen_mode on new inet6_dev · 70c30d76

由 Sabrina Dubroca 提交于 7月 09, 2018

The value has already been copied from this netns's devconf_dflt, it
shouldn't be reset to the global kernel default.

Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70c30d76

net/ipv6: fix addrconf_sysctl_addr_gen_mode · c6dbf7aa

由 Sabrina Dubroca 提交于 7月 09, 2018

addrconf_sysctl_addr_gen_mode() has multiple problems. First, it ignores
the errors returned by proc_dointvec().

addrconf_sysctl_addr_gen_mode() calls proc_dointvec() directly, which
writes the value to memory, and then checks if it's valid and may return
EINVAL. If a bad value is given, the value displayed when reading
net.ipv6.conf.foo.addr_gen_mode next time will be invalid. In case the
value provided by the user was valid, addrconf_dev_config() won't be
called since idev->cnf.addr_gen_mode has already been updated.

Fix this in the usual way we deal with values that need to be checked
after the proc_do*() helper has returned: define a local ctl_table and
storage, call proc_dointvec() on that temporary area, then check and
store.

addrconf_sysctl_addr_gen_mode() also writes the new value to the global
ipv6_devconf_dflt, when we're writing to some netns's default, so that
new netns will inherit the value that was set by the change occuring in
any netns. That doesn't make any sense, so let's drop this assignment.

Finally, since addr_gen_mode is a __u32, switch to proc_douintvec().

Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6dbf7aa

net/sched: flower: Fix null pointer dereference when run tc vlan command · 5e9a0fe4

由 Jianbo Liu 提交于 7月 09, 2018

Zahari issued tc vlan command without setting vlan_ethtype, which will
crash kernel. To avoid this, we must check tb[TCA_FLOWER_KEY_VLAN_ETH_TYPE]
is not null before use it.
Also we don't need to dump vlan_ethtype or cvlan_ethtype in this case.

Fixes: d64efd09 ('net/sched: flower: Add supprt for matching on QinQ vlan headers')
Signed-off-by: NJianbo Liu <jianbol@mellanox.com>
Reported-by: NZahari Doychev <zahari.doychev@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5e9a0fe4

11 7月, 2018 10 次提交

selftests: forwarding: mirror_lib: Tighten up VLAN capture · db560d16

由 Petr Machata 提交于 7月 08, 2018

The function do_test_span_vlan_dir_ips() is used for testing whether
mirrored packets are VLAN-encapsulated. But since it only considers
VLAN encapsulation, it may end up matching unmirrored ARP traffic as
well. One consequence is a rare failure of mirror_gre_vlan_bridge_1q's
test_gretap_untagged_egress. Decreasing ping cadence in mirror_test()
makes the problem easily reproducible.

Therefore tighten up the match criterion to only count those 802.1q
packets where the next header is IP.
Signed-off-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db560d16

Merge branch 'cake-qdisc' · 5025b99c

由 David S. Miller 提交于 7月 10, 2018

Toke Høiland-Jørgensen says:

====================
sched: Add Common Applications Kept Enhanced (cake) qdisc

This patch series adds the CAKE qdisc, and has been split up to ease
review.

I have attempted to split out each configurable feature into its own patch.
The first commit adds the base shaper and packet scheduler, while
subsequent commits add the optional features. The full userspace API and
most data structures are included in this commit, but options not
understood in the base version will be ignored.

The result of applying the entire series is identical to the out of tree
version that have seen extensive testing in previous deployments, most
notably as an out of tree patch to OpenWrt. However, note that I have only
compile tested the individual patches; so the whole series should be
considered as a unit.

---
Changelog

v19:
  - Rebase to current net-next.
  - Don't rely on the value of sch->q.qlen to break loops; fixes possible
    infinite loop on multi-queue devices.
  - Don't overwrite NAT flag when setting flow mode.

v18:
  - Rework classification logic in the diffserv case to always hash if
    filter doesn't select a queue, and to run TC filters before
    selecting the diffserv tin (allowing filter to influence this).
  - Make sure we always call qdisc_watchdog_init() in cake_init(), so we
    don't crash in cake_destroy().

v17:
  - Rebase to newest net-next and move the conntrack callback to
    nf_ct_hook
  - Fix a compile error when NF_CONNTRACK is unset.

v16:
  - Move conntrack lookup function into conntrack core and read it via
    RCU so it is only active when the nf_conntrack module is loaded.
    This avoids the module dependency on conntrack for NAT mode. Thanks
    to Pablo for the idea.

v15:
  - Handle ECN flags in ACK filter

v14:
  - Handle seqno wraps and DSACKs in ACK filter

v13:
  - Avoid ktime_t to scalar compares
  - Add class dumping and basic stats
  - Fail with ENOTSUPP when requesting NAT mode and conntrack is not
    available.
  - Parse all TCP options in ACK filter and make sure to only drop safe
    ones. Also handle SACK ranges properly.

v12:
  - Get rid of custom time typedefs. Use ktime_t for time and u64 for
    duration instead.

v11:
  - Fix overhead compensation calculation for GSO packets
  - Change configured rate to be u64 (I ran out of bits before I ran out
    of CPU when testing the effects of the above)

v10:
  - Christmas tree gardening (fix variable declarations to be in reverse
    line length order)

v9:
  - Remove duplicated checks around kvfree() and just call it
    unconditionally.
  - Don't pass __GFP_NOWARN when allocating memory
  - Move options in cake_dump() that are related to optional features to
    later patches implementing the features.
  - Support attaching filters to the qdisc and use the classification
    result to select flow queue.
  - Support overriding diffserv priority tin from skb->priority

v8:
  - Remove inline keyword from function definitions
  - Simplify ACK filter; remove the complex state handling to make the
    logic easier to follow. This will potentially be a bit less efficient,
    but I have not been able to measure a difference.

v7:
  - Split up patch into a series to ease review.
  - Constify the ACK filter.

v6:
  - Fix 6in4 encapsulation checks in ACK filter code
  - Checkpatch fixes

v5:
  - Refactor ACK filter code and hopefully fix the safety issues
    properly this time.

v4:
  - Only split GSO packets if shaping at speeds <= 1Gbps
  - Fix overhead calculation code to also work for GSO packets
  - Don't re-implement kvzalloc()
  - Remove local header include from out-of-tree build (fixes kbuild-bot
    complaint).
  - Several fixes to the ACK filter:
    - Check pskb_may_pull() before deref of transport headers.
    - Don't run ACK filter logic on split GSO packets
    - Fix TCP sequence number compare to deal with wraparounds

v3:
  - Use IS_REACHABLE() macro to fix compilation when sch_cake is
    built-in and conntrack is a module.
  - Switch the stats output to use nested netlink attributes instead
    of a versioned struct.
  - Remove GPL boilerplate.
  - Fix array initialisation style.

v2:
  - Fix kbuild test bot complaint
  - Clean up the netlink ABI
  - Fix checkpatch complaints
  - A few tweaks to the behaviour of cake based on testing carried out
    while writing the paper.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5025b99c

sch_cake: Conditionally split GSO segments · 0c850344

由 Toke Høiland-Jørgensen 提交于 7月 06, 2018

At lower bandwidths, the transmission time of a single GSO segment can add
an unacceptable amount of latency due to HOL blocking. Furthermore, with a
software shaper, any tuning mechanism employed by the kernel to control the
maximum size of GSO segments is thrown off by the artificial limit on
bandwidth. For this reason, we split GSO segments into their individual
packets iff the shaper is active and configured to a bandwidth <= 1 Gbps.
Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c850344

sch_cake: Add overhead compensation support to the rate shaper · a729b7f0

由 Toke Høiland-Jørgensen 提交于 7月 06, 2018

This commit adds configurable overhead compensation support to the rate
shaper. With this feature, userspace can configure the actual bottleneck
link overhead and encapsulation mode used, which will be used by the shaper
to calculate the precise duration of each packet on the wire.

This feature is needed because CAKE is often deployed one or two hops
upstream of the actual bottleneck (which can be, e.g., inside a DSL or
cable modem). In this case, the link layer characteristics and overhead
reported by the kernel does not match the actual bottleneck. Being able to
set the actual values in use makes it possible to configure the shaper rate
much closer to the actual bottleneck rate (our experience shows it is
possible to get with 0.1% of the actual physical bottleneck rate), thus
keeping latency low without sacrificing bandwidth.

The overhead compensation has three tunables: A fixed per-packet overhead
size (which, if set, will be accounted from the IP packet header), a
minimum packet size (MPU) and a framing mode supporting either ATM or PTM
framing. We include a set of common keywords in TC to help users configure
the right parameters. If no overhead value is set, the value reported by
the kernel is used.
Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a729b7f0

sch_cake: Add DiffServ handling · 83f8fd69

由 Toke Høiland-Jørgensen 提交于 7月 06, 2018

This adds support for DiffServ-based priority queueing to CAKE. If the
shaper is in use, each priority tier gets its own virtual clock, which
limits that tier's rate to a fraction of the overall shaped rate, to
discourage trying to game the priority mechanism.

CAKE defaults to a simple, three-tier mode that interprets most code points
as "best effort", but places CS1 traffic into a low-priority "bulk" tier
which is assigned 1/16 of the total rate, and a few code points indicating
latency-sensitive or control traffic (specifically TOS4, VA, EF, CS6, CS7)
into a "latency sensitive" high-priority tier, which is assigned 1/4 rate.
The other supported DiffServ modes are a 4-tier mode matching the 802.11e
precedence rules, as well as two 8-tier modes, one of which implements
strict precedence of the eight priority levels.

This commit also adds an optional DiffServ 'wash' mode, which will zero out
the DSCP fields of any packet passing through CAKE. While this can
technically be done with other mechanisms in the kernel, having the feature
available in CAKE significantly decreases configuration complexity; and the
implementation cost is low on top of the other DiffServ-handling code.

Filters and applications can set the skb->priority field to override the
DSCP-based classification into tiers. If TC_H_MAJ(skb->priority) matches
CAKE's qdisc handle, the minor number will be interpreted as a priority
tier if it is less than or equal to the number of configured priority
tiers.
Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83f8fd69

sch_cake: Add NAT awareness to packet classifier · ea825115

由 Toke Høiland-Jørgensen 提交于 7月 06, 2018

When CAKE is deployed on a gateway that also performs NAT (which is a
common deployment mode), the host fairness mechanism cannot distinguish
internal hosts from each other, and so fails to work correctly.

To fix this, we add an optional NAT awareness mode, which will query the
kernel conntrack mechanism to obtain the pre-NAT addresses for each packet
and use that in the flow and host hashing.

When the shaper is enabled and the host is already performing NAT, the cost
of this lookup is negligible. However, in unlimited mode with no NAT being
performed, there is a significant CPU cost at higher bandwidths. For this
reason, the feature is turned off by default.

Cc: netfilter-devel@vger.kernel.org
Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea825115

netfilter: Add nf_ct_get_tuple_skb global lookup function · b60a6040

由 Toke Høiland-Jørgensen 提交于 7月 06, 2018

This adds a global netfilter function to extract a conntrack tuple from an
skb. The function uses a new function added to nf_ct_hook, which will try
to get the tuple from skb->_nfct, and do a full lookup if that fails. This
makes it possible to use the lookup function before the skb has passed
through the conntrack init hooks (e.g., in an ingress qdisc). The tuple is
copied to the caller to avoid issues with reference counting.

The function returns false if conntrack is not loaded, allowing it to be
used without incurring a module dependency on conntrack. This is used by
the NAT mode in sch_cake.

Cc: netfilter-devel@vger.kernel.org
Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b60a6040

sch_cake: Add optional ACK filter · 8b713881

由 Toke Høiland-Jørgensen 提交于 7月 06, 2018

The ACK filter is an optional feature of CAKE which is designed to improve
performance on links with very asymmetrical rate limits. On such links
(which are unfortunately quite prevalent, especially for DSL and cable
subscribers), the downstream throughput can be limited by the number of
ACKs capable of being transmitted in the *upstream* direction.

Filtering ACKs can, in general, have adverse effects on TCP performance
because it interferes with ACK clocking (especially in slow start), and it
reduces the flow's resiliency to ACKs being dropped further along the path.
To alleviate these drawbacks, the ACK filter in CAKE tries its best to
always keep enough ACKs queued to ensure forward progress in the TCP flow
being filtered. It does this by only filtering redundant ACKs. In its
default 'conservative' mode, the filter will always keep at least two
redundant ACKs in the queue, while in 'aggressive' mode, it will filter
down to a single ACK.

The ACK filter works by inspecting the per-flow queue on every packet
enqueue. Starting at the head of the queue, the filter looks for another
eligible packet to drop (so the ACK being dropped is always closer to the
head of the queue than the packet being enqueued). An ACK is eligible only
if it ACKs *fewer* bytes than the new packet being enqueued, including any
SACK options. This prevents duplicate ACKs from being filtered, to avoid
interfering with retransmission logic. In addition, we check TCP header
options and only drop those that are known to not interfere with sender
state. In particular, packets with unknown option codes are never dropped.

In aggressive mode, an eligible packet is always dropped, while in
conservative mode, at least two ACKs are kept in the queue. Only pure ACKs
(with no data segments) are considered eligible for dropping, but when an
ACK with data segments is enqueued, this can cause another pure ACK to
become eligible for dropping.

The approach described above ensures that this ACK filter avoids most of
the drawbacks of a naive filtering mechanism that only keeps flow state but
does not inspect the queue. This is the rationale for including the ACK
filter in CAKE itself rather than as separate module (as the TC filter, for
instance).

Our performance evaluation has shown that on a 30/1 Mbps link with a
bidirectional traffic test (RRUL), turning on the ACK filter on the
upstream link improves downstream throughput by ~20% (both modes) and
upstream throughput by ~12% in conservative mode and ~40% in aggressive
mode, at the cost of ~5ms of inter-flow latency due to the increased
congestion.

In *really* pathological cases, the effect can be a lot more; for instance,
the ACK filter increases the achievable downstream throughput on a link
with 100 Kbps in the upstream direction by an order of magnitude (from ~2.5
Mbps to ~25 Mbps).

Finally, even though we consider the ACK filter to be safer than most, we
do not recommend turning it on everywhere: on more symmetrical link
bandwidths the effect is negligible at best.

Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b713881

sch_cake: Add ingress mode · 7298de9c

由 Toke Høiland-Jørgensen 提交于 7月 06, 2018

The ingress mode is meant to be enabled when CAKE runs downlink of the
actual bottleneck (such as on an IFB device). The mode changes the shaper
to also account dropped packets to the shaped rate, as these have already
traversed the bottleneck.

Enabling ingress mode will also tune the AQM to always keep at least two
packets queued *for each flow*. This is done by scaling the minimum queue
occupancy level that will disable the AQM by the number of active bulk
flows. The rationale for this is that retransmits are more expensive in
ingress mode, since dropped packets have to traverse the bottleneck again
when they are retransmitted; thus, being more lenient and keeping a minimum
number of packets queued will improve throughput in cases where the number
of active flows are so large that they saturate the bottleneck even at
their minimum window size.

This commit also adds a separate switch to enable ingress mode rate
autoscaling. If enabled, the autoscaling code will observe the actual
traffic rate and adjust the shaper rate to match it. This can help avoid
latency increases in the case where the actual bottleneck rate decreases
below the shaped rate. The scaling filters out spikes by an EWMA filter.
Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7298de9c

sched: Add Common Applications Kept Enhanced (cake) qdisc · 046f6fd5

由 Toke Høiland-Jørgensen 提交于 7月 06, 2018

sch_cake targets the home router use case and is intended to squeeze the
most bandwidth and latency out of even the slowest ISP links and routers,
while presenting an API simple enough that even an ISP can configure it.

Example of use on a cable ISP uplink:

tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter

To shape a cable download link (ifb and tc-mirred setup elided)

tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash

CAKE is filled with:

* A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
  derived Flow Queuing system, which autoconfigures based on the bandwidth.
* A novel "triple-isolate" mode (the default) which balances per-host
  and per-flow FQ even through NAT.
* An deficit based shaper, that can also be used in an unlimited mode.
* 8 way set associative hashing to reduce flow collisions to a minimum.
* A reasonable interpretation of various diffserv latency/loss tradeoffs.
* Support for zeroing diffserv markings for entering and exiting traffic.
* Support for interacting well with Docsis 3.0 shaper framing.
* Extensive support for DSL framing types.
* Support for ack filtering.
* Extensive statistics for measuring, loss, ecn markings, latency
  variation.

A paper describing the design of CAKE is available at
https://arxiv.org/abs/1804.07617, and will be published at the 2018 IEEE
International Symposium on Local and Metropolitan Area Networks (LANMAN).

This patch adds the base shaper and packet scheduler, while subsequent
commits add the optional (configurable) features. The full userspace API
and most data structures are included in this commit, but options not
understood in the base version will be ignored.

Various versions baking have been available as an out of tree build for
kernel versions going back to 3.10, as the embedded router world has been
running a few years behind mainline Linux. A stable version has been
generally available on lede-17.01 and later.

sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
in the sqm-scripts, with sane defaults and vastly simpler configuration.

CAKE's principal author is Jonathan Morton, with contributions from
Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
Ryan Mounce, Tony Ambardar, Dean Scarff, Nils Andreas Svee, Dave Täht,
and Loganaden Velvindron.

Testing from Pete Heist, Georgios Amanakis, and the many other members of
the cake@lists.bufferbloat.net mailing list.

tc -s qdisc show dev eth2
 qdisc cake 8017: root refcnt 2 bandwidth 1Gbit diffserv3 triple-isolate split-gso rtt 100.0ms noatm overhead 38 mpu 84
 Sent 51504294511 bytes 37724591 pkt (dropped 6, overlimits 64958695 requeues 12)
  backlog 0b 0p requeues 12
  memory used: 1053008b of 15140Kb
  capacity estimate: 970Mbit
  min/max network layer size:           28 /    1500
  min/max overhead-adjusted size:       84 /    1538
  average network hdr offset:           14
                    Bulk  Best Effort        Voice
   thresh      62500Kbit        1Gbit      250Mbit
   target          5.0ms        5.0ms        5.0ms
   interval      100.0ms      100.0ms      100.0ms
   pk_delay          5us          5us          6us
   av_delay          3us          2us          2us
   sp_delay          2us          1us          1us
   backlog            0b           0b           0b
   pkts          3164050     25030267      9530280
   bytes      3227519915  35396974782  12879808898
   way_inds            0            8            0
   way_miss           21          366           25
   way_cols            0            0            0
   drops               5            0            1
   marks               0            0            0
   ack_drop            0            0            0
   sp_flows            1            3            0
   bk_flows            0            1            1
   un_flows            0            0            0
   max_len         68130        68130        68130
Tested-by: NPete Heist <peteheist@gmail.com>
Tested-by: NGeorgios Amanakis <gamanakis@gmail.com>
Signed-off-by: NDave Taht <dave.taht@gmail.com>
Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

046f6fd5

10 7月, 2018 20 次提交

net: Use __u32 in uapi net_stamp.h · 52b50921

由 Jesus Sanchez-Palencia 提交于 7月 09, 2018

We are not supposed to use u32 in uapi, so change the flags member of
struct sock_txtime from u32 to __u32 instead.

Fixes: 80b14dee ("net: Add a new socket option for a future transmit time")
Reported-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52b50921

Merge branch 'mlxsw-More-Spectrum-2-preparations' · 1497d2fd

由 David S. Miller 提交于 7月 09, 2018

aIdo Schimmel says:

====================
mlxsw: More Spectrum-2 preparations

This is the second and last set of preparations towards initial
Spectrum-2 support in mlxsw. It mainly re-arranges parts of the code
that need to work with both ASICs, but somewhat differ.

The first three patches allow different ASICs to register different set
of operations for KVD linear (KVDL) management. In Spectrum-2 there is
no linear memory and instead entries that reside there in Spectrum
(e.g., nexthops) are hashed and inserted to the hash-based KVD memory.

The fourth patch does a similar restructuring in the low-level multicast
router code. This is necessary because multicast routing is implemented
using regular circuit TCAM (C-TCAM) in Spectrum, whereas Spectrum-2 uses
an algorithmic TCAM (A-TCAM).

Next six patches prepare the ACL code for the introduction of A-TCAM in
follow-up patch sets.

Last two patches allow different ASICs to require different firmware
versions and add two resources that need to be queried from firmware by
Spectrum-2 specific code.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1497d2fd

mlxsw: resources: Add couple of Spectrum-2 KVD resources · a8b9f232

由 Jiri Pirko 提交于 7月 08, 2018

These resources are needed for Spectrum-2 KVD linear management
implementation.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8b9f232

mlxsw: spectrum: Prepare for multiple FW versions for Spectrum and Spectrum-2 · abfd6182

由 Jiri Pirko 提交于 7月 08, 2018

Prepare for Spectrum-2 FW version checking and
make mlxsw_sp_fw_rev_validate() per-ASIC as well as required FW revision
and FW filename.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

abfd6182

mlxsw: spectrum_acl: Implement priority setting for rules inserted to TCAM · ea8b2e28

由 Jiri Pirko 提交于 7月 08, 2018

For Spectrum-2, we need to insert priority to C-TCAM because HW
needs that info in order to correctly process scenarios where rules
are in both C-TCAM and A-TCAM.

So extend the mlxsw_sp_acl_ctcam_entry_add() args to accept indication
if priority needs to be filled up and implement the priority
computation and fill-up.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea8b2e28

mlxsw: reg: Add priority field for PTCEV2 register · 42df8358

由 Jiri Pirko 提交于 7月 08, 2018

This is going to be needed for Spectrum-2 C-TCAM implementation.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42df8358

mlxsw: spectrum_acl: Move block items encoding into Spectrum op · a5995cc8

由 Jiri Pirko 提交于 7月 08, 2018

Since Spectrum-2 encodes blocks into different HW layout, push this
code into Spectrum-specific op.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5995cc8

mlxsw: spectrum_acl: Convert mlxsw_afk_create args to ops · c17d2083

由 Jiri Pirko 提交于 7月 08, 2018

Since the flex keys for Spectrum-2 differ not only in blocks definitions
but also in encoding layout, prepare for the implementation and pass
Spectrum/Spectrum-2 specific ops down to mlxsw_afk_create.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c17d2083

mlxsw: spectrum_acl: Add tcam init/fini ops · bab5c1cf

由 Jiri Pirko 提交于 7月 08, 2018

Add ops to be called on driver instance init and fini.
This is needed in order to be possible to do Spectrum-2 specific init
and fini work.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bab5c1cf

mlxsw: spectrum_acl: Split TCAM handling 3 ways · 64eccd00

由 Jiri Pirko 提交于 7月 08, 2018

To allow easy and clean Spectrum-2 implementation for things that differ
from Spectrum, split the existing ACL TCAM code 3 ways:
1) common code that calls Spectrum/Spectrum-2 specific ops
2) Spectrum ops implementations
3) common C-TCAM code that is going to be shared between Spectrum and
   Spectrum-2 implementations
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

64eccd00

mlxsw: spectrum_mr_tcam: Push Spectrum-specific operations into a separate file · 8fae4392

由 Jiri Pirko 提交于 7月 08, 2018

Since Spectrum-2 has different handling of TCAM, push Spectrum MR TCAM
bits to a separate file accessible by ops which allows to implement
Spectrum-2 specific ops.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fae4392

mlxsw: spectrum_kvdl: Pass entry_count to free function · 0304c005

由 Jiri Pirko 提交于 7月 08, 2018

For the Spectrum-2 KVD linear manager implementation, entry_count will be
needed even for the free function. So pass it down.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0304c005

mlxsw: spectrum_kvdl: Pass entry type to alloc/free · 4b6b1869

由 Jiri Pirko 提交于 7月 08, 2018

Future Spectrum-2 KVD linear manager implementation needs to know type
of the entry to alloc and free. So define the types in an enum and
pass it down to alloc and free functions. Once the entry type
is passed down, KVDL common part knows sizes of each entry types,
so replace size function arg with entry count.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b6b1869

mlxsw: spectrum_kvdl: Push out KVD linear management into ops · ebcff743

由 Jiri Pirko 提交于 7月 08, 2018

In Spectrum-2 there is a different implementation of KVD linear
management. Unlike in Spectrum where there is a single index space,
in Spectrum-2 the indexes are per-resource. Also there is need to
explicitly tell HW that an entry is no longer used.
So push out the existing implementation into spectrum1_kvdl.c and
prepare ops infrastructure to allow new implementation in a follow-up.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ebcff743

net/mlx5: Use 2-factor allocator calls · eec4edc9

由 Kees Cook 提交于 7月 04, 2018

This restores the use of 2-factor allocation helpers that were already
fixed treewide. Please do not use open-coded multiplication; prefer,
instead, using 2-factor allocation helpers.
Signed-off-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eec4edc9

tcp: remove SG-related comment in tcp_sendmsg() · 95765a6c

由 Julian Wiedmann 提交于 7月 09, 2018

Since commit 74d4a8f8 ("tcp: remove sk_can_gso() use"), the code
doesn't care whether the interface supports SG.
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95765a6c

Merge branch 'fix-use-after-free-bugs-in-skb-list-processing' · 863f4fdb

由 David S. Miller 提交于 7月 09, 2018

Edward Cree says:

====================
fix use-after-free bugs in skb list processing

A couple of bugs in skb list handling were spotted by Dan Carpenter, with
 the help of Smatch; following up on them I found a couple more similar
 cases.  This series fixes them by changing the relevant loops to use the
 dequeue-enqueue model (rather than in-place list modification).

v3: fixed another similar bug in __netif_receive_skb_list_core().

v2: dropped patch #3 (new list.h helper), per DaveM's request.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

863f4fdb

net: core: fix use-after-free in __netif_receive_skb_list_core · 9af86f93

由 Edward Cree 提交于 7月 09, 2018

__netif_receive_skb_core can free the skb, so we have to use the dequeue-
 enqueue model when calling it from __netif_receive_skb_list_core.

Fixes: 88eb1944 ("net: core: propagate SKB lists through packet_type lookup")
Signed-off-by: NEdward Cree <ecree@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9af86f93

netfilter: fix use-after-free in NF_HOOK_LIST · 9f17dbf0

由 Edward Cree 提交于 7月 09, 2018

nf_hook() can free the skb, so we need to remove it from the list before
 calling, and add passed skbs to a sublist afterwards.

Fixes: 17266ee9 ("net: ipv4: listified version of ip_rcv")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NEdward Cree <ecree@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f17dbf0

net: core: fix uses-after-free in list processing · 8c057efa

由 Edward Cree 提交于 7月 09, 2018

In netif_receive_skb_list_internal(), all of skb_defer_rx_timestamp(),
 do_xdp_generic() and enqueue_to_backlog() can lead to kfree(skb).  Thus,
 we cannot wait until after they return to remove the skb from the list;
 instead, we remove it first and, in the pass case, add it to a sublist
 afterwards.
In the case of enqueue_to_backlog() we have already decided not to pass
 when we call the function, so we do not need a sublist.

Fixes: 7da517a3 ("net: core: Another step of skb receive list processing")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NEdward Cree <ecree@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c057efa

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功