1. 13 7月, 2018 1 次提交
    • A
      tcp: use monotonic timestamps for PAWS · cca9bab1
      Arnd Bergmann 提交于
      Using get_seconds() for timestamps is deprecated since it can lead
      to overflows on 32-bit systems. While the interface generally doesn't
      overflow until year 2106, the specific implementation of the TCP PAWS
      algorithm breaks in 2038 when the intermediate signed 32-bit timestamps
      overflow.
      
      A related problem is that the local timestamps in CLOCK_REALTIME form
      lead to unexpected behavior when settimeofday is called to set the system
      clock backwards or forwards by more than 24 days.
      
      While the first problem could be solved by using an overflow-safe method
      of comparing the timestamps, a nicer solution is to use a monotonic
      clocksource with ktime_get_seconds() that simply doesn't overflow (at
      least not until 136 years after boot) and that doesn't change during
      settimeofday().
      
      To make 32-bit and 64-bit architectures behave the same way here, and
      also save a few bytes in the tcp_options_received structure, I'm changing
      the type to a 32-bit integer, which is now safe on all architectures.
      
      Finally, the ts_recent_stamp field also (confusingly) gets used to store
      a jiffies value in tcp_synq_overflow()/tcp_synq_no_recent_overflow().
      This is currently safe, but changing the type to 32-bit requires
      some small changes there to keep it working.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cca9bab1
  2. 12 7月, 2018 3 次提交
    • P
      net: Add lag.h, net_lag_port_dev_txable() · eeed992b
      Petr Machata 提交于
      LAG devices (team or bond) recognize for each one of their slave devices
      whether LAG traffic is going to be sent through that device. Bond calls
      such devices "active", team calls them "txable". When this state
      changes, a NETDEV_CHANGELOWERSTATE notification is distributed, together
      with a netdev_notifier_changelowerstate_info structure that for LAG
      devices includes a tx_enabled flag that refers to the new state. The
      notification thus makes it possible to react to the changes in txability
      in drivers.
      
      However there's no way to query txability from the outside on demand.
      That is problematic namely for mlxsw, which when resolving ERSPAN packet
      path, may encounter a LAG device, and needs to determine which of the
      slaves it should choose.
      
      To that end, introduce a new function, net_lag_port_dev_txable(), which
      determines whether a given slave device is "active" or
      "txable" (depending on the flavor of the LAG device). That function then
      dispatches to per-LAG-flavor helpers, bond_is_active_slave_dev() resp.
      team_port_dev_txable().
      
      Because there currently is no good place where net_lag_port_dev_txable()
      should be added, introduce a new header file, lag.h, which should from
      now on hold any logic common to both team and bond. (But keep
      netif_is_lag_master() together with the rest of netif_is_*_master()
      functions).
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeed992b
    • P
      team: Publish team_port_get_rcu() · 3443b00e
      Petr Machata 提交于
      A follow-up patch adds a new entry point, team_port_dev_txable(). Making
      it an ordinary exported function would mean that any module that may
      need the service in one of the supported configurations also
      unconditionally needs to pull in the team module, whether or not the
      user actually intends to create team interfaces.
      
      To prevent that, team_port_dev_txable() is defined in if_team.h, and
      therefore all dependencies of that function also need to be
      publicly-visible.
      
      Therefore move team_port_get_rcu() from team.c to if_team.h.
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3443b00e
    • D
      tcp: expose both send and receive intervals for rate sample · 4929c942
      Deepti Raghavan 提交于
      Congestion control algorithms, which access the rate sample
      through the tcp_cong_control function, only have access to the maximum
      of the send and receive interval, for cases where the acknowledgment
      rate may be inaccurate due to ACK compression or decimation. Algorithms
      may want to use send rates and receive rates as separate signals.
      Signed-off-by: NDeepti Raghavan <deeptir@mit.edu>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4929c942
  3. 11 7月, 2018 2 次提交
    • T
      netfilter: Add nf_ct_get_tuple_skb global lookup function · b60a6040
      Toke Høiland-Jørgensen 提交于
      This adds a global netfilter function to extract a conntrack tuple from an
      skb. The function uses a new function added to nf_ct_hook, which will try
      to get the tuple from skb->_nfct, and do a full lookup if that fails. This
      makes it possible to use the lookup function before the skb has passed
      through the conntrack init hooks (e.g., in an ingress qdisc). The tuple is
      copied to the caller to avoid issues with reference counting.
      
      The function returns false if conntrack is not loaded, allowing it to be
      used without incurring a module dependency on conntrack. This is used by
      the NAT mode in sch_cake.
      
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b60a6040
    • T
      sched: Add Common Applications Kept Enhanced (cake) qdisc · 046f6fd5
      Toke Høiland-Jørgensen 提交于
      sch_cake targets the home router use case and is intended to squeeze the
      most bandwidth and latency out of even the slowest ISP links and routers,
      while presenting an API simple enough that even an ISP can configure it.
      
      Example of use on a cable ISP uplink:
      
      tc qdisc add dev eth0 cake bandwidth 20Mbit nat docsis ack-filter
      
      To shape a cable download link (ifb and tc-mirred setup elided)
      
      tc qdisc add dev ifb0 cake bandwidth 200mbit nat docsis ingress wash
      
      CAKE is filled with:
      
      * A hybrid Codel/Blue AQM algorithm, "Cobalt", tied to an FQ_Codel
        derived Flow Queuing system, which autoconfigures based on the bandwidth.
      * A novel "triple-isolate" mode (the default) which balances per-host
        and per-flow FQ even through NAT.
      * An deficit based shaper, that can also be used in an unlimited mode.
      * 8 way set associative hashing to reduce flow collisions to a minimum.
      * A reasonable interpretation of various diffserv latency/loss tradeoffs.
      * Support for zeroing diffserv markings for entering and exiting traffic.
      * Support for interacting well with Docsis 3.0 shaper framing.
      * Extensive support for DSL framing types.
      * Support for ack filtering.
      * Extensive statistics for measuring, loss, ecn markings, latency
        variation.
      
      A paper describing the design of CAKE is available at
      https://arxiv.org/abs/1804.07617, and will be published at the 2018 IEEE
      International Symposium on Local and Metropolitan Area Networks (LANMAN).
      
      This patch adds the base shaper and packet scheduler, while subsequent
      commits add the optional (configurable) features. The full userspace API
      and most data structures are included in this commit, but options not
      understood in the base version will be ignored.
      
      Various versions baking have been available as an out of tree build for
      kernel versions going back to 3.10, as the embedded router world has been
      running a few years behind mainline Linux. A stable version has been
      generally available on lede-17.01 and later.
      
      sch_cake replaces a combination of iptables, tc filter, htb and fq_codel
      in the sqm-scripts, with sane defaults and vastly simpler configuration.
      
      CAKE's principal author is Jonathan Morton, with contributions from
      Kevin Darbyshire-Bryant, Toke Høiland-Jørgensen, Sebastian Moeller,
      Ryan Mounce, Tony Ambardar, Dean Scarff, Nils Andreas Svee, Dave Täht,
      and Loganaden Velvindron.
      
      Testing from Pete Heist, Georgios Amanakis, and the many other members of
      the cake@lists.bufferbloat.net mailing list.
      
      tc -s qdisc show dev eth2
       qdisc cake 8017: root refcnt 2 bandwidth 1Gbit diffserv3 triple-isolate split-gso rtt 100.0ms noatm overhead 38 mpu 84
       Sent 51504294511 bytes 37724591 pkt (dropped 6, overlimits 64958695 requeues 12)
        backlog 0b 0p requeues 12
        memory used: 1053008b of 15140Kb
        capacity estimate: 970Mbit
        min/max network layer size:           28 /    1500
        min/max overhead-adjusted size:       84 /    1538
        average network hdr offset:           14
                          Bulk  Best Effort        Voice
         thresh      62500Kbit        1Gbit      250Mbit
         target          5.0ms        5.0ms        5.0ms
         interval      100.0ms      100.0ms      100.0ms
         pk_delay          5us          5us          6us
         av_delay          3us          2us          2us
         sp_delay          2us          1us          1us
         backlog            0b           0b           0b
         pkts          3164050     25030267      9530280
         bytes      3227519915  35396974782  12879808898
         way_inds            0            8            0
         way_miss           21          366           25
         way_cols            0            0            0
         drops               5            0            1
         marks               0            0            0
         ack_drop            0            0            0
         sp_flows            1            3            0
         bk_flows            0            1            1
         un_flows            0            0            0
         max_len         68130        68130        68130
      Tested-by: NPete Heist <peteheist@gmail.com>
      Tested-by: NGeorgios Amanakis <gamanakis@gmail.com>
      Signed-off-by: NDave Taht <dave.taht@gmail.com>
      Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      046f6fd5
  4. 10 7月, 2018 7 次提交
  5. 08 7月, 2018 8 次提交
  6. 07 7月, 2018 9 次提交
  7. 06 7月, 2018 1 次提交
  8. 05 7月, 2018 8 次提交
  9. 04 7月, 2018 1 次提交
    • J
      net/sched: Make etf report drops on error_queue · 4b15c707
      Jesus Sanchez-Palencia 提交于
      Use the socket error queue for reporting dropped packets if the
      socket has enabled that feature through the SO_TXTIME API.
      
      Packets are dropped either on enqueue() if they aren't accepted by the
      qdisc or on dequeue() if the system misses their deadline. Those are
      reported as different errors so applications can react accordingly.
      
      Userspace can retrieve the errors through the socket error queue and the
      corresponding cmsg interfaces. A struct sock_extended_err* is used for
      returning the error data, and the packet's timestamp can be retrieved by
      adding both ee_data and ee_info fields as e.g.:
      
          ((__u64) serr->ee_data << 32) + serr->ee_info
      
      This feature is disabled by default and must be explicitly enabled by
      applications. Enabling it can bring some overhead for the Tx cycles
      of the application.
      Signed-off-by: NJesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b15c707